License: arXiv.org perpetual non-exclusive license
arXiv:2312.16341v1 [stat.ML] 26 Dec 2023

Harnessing the Power of Federated Learning in Federated Contextual Bandits

Chengshuai Shi
Department of Electrical and Computer Engineering
University of Virginia
Ruida Zhou
Department of Electrical and Computer Engineering
University of California, Los Angeles
Kun Yang
Department of Electrical and Computer Engineering
University of Virginia
Cong Shen
Department of Electrical and Computer Engineering
University of Virginia
Abstract

Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.

1 Introduction

Federated learning (FL), initially proposed by McMahan et al. (2017); Konečnỳ et al. (2016), has garnered significant attention for its effectiveness in enabling distributed machine learning with heterogeneous agents (Li et al., 2020a; Kairouz et al., 2021). As FL has gained popularity, numerous endeavors have sought to extend its applicability beyond the original realm of supervised learning, e.g., to unsupervised and semi-supervised learning (Zhang et al., 2020; van Berlo et al., 2020; Zhuang et al., 2022; Lubana et al., 2022). Among these directions, the exploration of federated contextual bandits (FCB) has emerged as a particularly compelling area of research, representing a pivotal fusion of FL and sequential decision-making, which has found various practical applications in cognitive radio and recommendation systems, among others.

Over the past several years, substantial progress has been made in the field of FCB (Wang et al., 2019; Li & Wang, 2022b; Li et al., 2022; 2023; Dai et al., 2023), particularly those involving varying function approximations (e.g., linear models, as discussed in Huang et al. (2021b); Dubey & Pentland (2020); Li & Wang (2022a); He et al. (2022); Amani et al. (2022); Fan et al. (2023)). Despite their different focuses, it can be observed that these existing designs all employ certain FL components to enable the participating agents to collaboratively update their CB parameterization via locally collected interaction data.

However, these FL components adopted in the previous FCB works are often over-simplified. In particular, the canonical FL framework (traced back to the celebrated FedAvg algorithm (McMahan et al., 2017)) typically takes an optimization view of incorporating the local data through multi-round aggregation of model parameters (such as gradients). In contrast, the FL protocol in many existing FCB works is one-shot aggregation of some compressed local data per epoch (e.g., combining local estimates and local covariance matrices in the study of federated linear bandits). Admittedly, for some simple cases, such straightforward aggregation is sufficient and allows problem-specific finetuning for tight performance bounds. However, such a deviation from the canonical FL studies prohibits existing FCB designs from leveraging the vast FL advances, and thus largely limits the connection between FL and FCB.

Motivated by this disconnection between FL and FCB, this work, instead of pursuing tighter performance bounds, aims to utilize the canonical FL framework as the FL component of FCB to harness the full power of FL studies in FCB. We propose FedIGW, an exploratory design that demonstrates the ability to leverage a comprehensive array of FL advancements, encompassing canonical algorithmic approaches (like FedAvg (McMahan et al., 2017) and SCAFFOLD (Karimireddy et al., 2020)), rigorous convergence analyses, and critical appendages (such as personalization, robustness, and privacy). To the best of our knowledge, this is the first paper that explicitly focuses on the close connection between FL and FCB, which we hope can inspire a new line of FCB studies. The distinctive contributions of FedIGW can be summarized as follows:

• Flexible incorporation of FL protocols. In the FCB setting with stochastic contexts and a realizable reward function, FedIGW employs the inverse gap weighting (IGW) algorithm for CB while versatile FL protocols can be incorporated (e.g., FedAvg and SCAFFOLD), provided they can solve a standard FL problem. These two parts iterate according to designed epochs: FL, drawing from previously gathered interaction data, supplies estimated reward functions for the forthcoming IGW interactions. A pivotal advantage is that the flexible FL component in FedIGW provides substantial adaptability, meaning that existing and future FL protocols can be seamlessly leveraged. Experimental results using real-world data with several different FL choices corroborate the practicability and flexibility of FedIGW.

• Modularized plug-in of FL analyses. A general theoretical analysis of FedIGW is developed to demonstrate its provably efficient performance. The influence of the adopted FL protocol is captured through its optimization error, delineating the excess risk of the learned reward function. Notably, any theoretical breakthroughs in FL convergence rates can be immediately integrated into the obtained analysis framework and supply the corresponding guarantees of FedIGW. Concretized results are further provided through the utilization of FedAvg and SCAFFOLD in FedIGW.

• Seamless integration of FL appendages. Beyond its inherent generality and efficiency, FedIGW exhibits exceptional extensibility. Various appendages from FL studies can be flexibly integrated without necessitating alterations to the CB component. We explore the extension of FedIGW to personalized learning and the incorporation of privacy and robustness guarantees. Similar investigations in prior FCB works would entail substantial algorithmic modifications, while FedIGW can effortlessly leverage corresponding FL advancements to obtain these appealing attributes.

Key related works. Most of the previous studies on FCB are discussed in Sec. 2.2, and more comprehensively reviewed in Appendix B. We note that these FCB designs with tailored FL protocols in previous works sometimes can achieve near-optimal performance bounds in specific settings, while our proposed FedIGW is more practical and extendable. We believe these two types of designs are valuable supplements to each other. A high-level comparison between the proposed FedIGW and existing FCB designs is listed in Table 1. Additionally, while this work was being developed, the paper (Agarwal et al., 2023) was posted, which also proposes to have a decoupled FL component in FCB. However, Agarwal et al. (2023) mainly focuses on empirical investigations, while our work offers valuable complementary contributions by conducting thorough theoretical analyses.

2 Federated Contextual Bandits

This section introduces the problem of federated contextual bandits (FCB). A concise formulation is first provided. Then, the existing works are re-visited with a focus on revealing the disconnection between FL and FCB.

2.1 Problem Formulation

Agents. In the FCB setting, a total of $M$ agents simultaneously participate in solving a contextual bandit (CB) problem. For generality, we consider an asynchronous system: each of the $M$ agents has a clock indicating her time step, which is denoted as $t_m = 1, 2, \cdots$ for agent $m$. For convenience, we also introduce a global time step $t$. Denote by $t_m(t)$ agent $m$'s local time step when the global time is $t$, and by $t(t_m, m)$ the global time step when agent $m$'s local time is $t_m$.

At each of her local time steps $t_m = 1, 2, \cdots$, agent $m$ observes a context $x_{m,t_m}$, selects an action $a_{m,t_m}$ from an action set $\mathcal{A}_{m,t_m}$, and then receives the associated reward $r_{m,t_m}(a_{m,t_m})$ (possibly depending on both $x_{m,t_m}$ and $a_{m,t_m}$) as in the standard CB (Lattimore & Szepesvári, 2020). Each agent's goal is to collect as many rewards as possible given a time horizon.

Federation. While many efficient single-agent (centralized) algorithms have been proposed for CB (Lattimore & Szepesvári, 2020), FCB targets building a federation among agents to perform collaborative learning such that the performance can be improved over learning independently. In particular, common interests shared among agents motivate their collaboration. Thus, FCB studies typically assume that the agents' environments are either fully (Wang et al., 2019; Huang et al., 2021b; Dubey & Pentland, 2020; He et al., 2022; Amani et al., 2022; Li et al., 2022; Li & Wang, 2022b; Dai et al., 2023) or partially (Li & Wang, 2022a; Agarwal et al., 2020) shared in the global federation.

In federated learning, the following two modes are commonly considered: (1) There exists a central server in the system, and the agents can share information with the server, which can then broadcast aggregated information back to the agents; or (2) There exists a communication graph between agents, who can share information with their neighbors on the graph. In the later discussions, we mainly consider the first scenario, i.e., collaborating through the server, which is also the main focus in FL, while both modes can be effectively encompassed in the proposed FedIGW design.

2.2 The Current Disconnection Between FCB and FL

The exploration of FCB traces its origins to distributed multi-armed bandits (Wang et al., 2019). Since then, FCB research has predominantly focused on enhancing performance in broader problem domains, encompassing various types of reward functions, such as linear (Wang et al., 2019; Huang et al., 2021b; Dubey & Pentland, 2020), kernelized (Li et al., 2022; 2023), generalized linear (Li & Wang, 2022b) and neural (Dai et al., 2023) (see Appendix B for a comprehensive review).

Table 1: A comparison between existing FCB designs and the proposed FedIGW.

| | Existing FCB designs | FedIGW |
|---|---|---|
| FL components | Develop tailored FL protocols | Leverage versatile FL protocols, such as FedAvg and SCAFFOLD |
| Theoretical guarantees | Analyse tailored FL protocols for the focused instance | Plug in FL convergence rates in a modularized fashion |
| Extensions (e.g., personalization, robustness, privacy) | Require further tailored protocols | Integrate corresponding FL advances directly |

Table 2: A compact summary of investigations on FCB with their adopted FL and CB components; a more comprehensive review is in Appendix B.

| Reference | Setting | FL | CB |
|---|---|---|---|
| Globally Shared Full Model (see Section 3) | | | |
| Wang et al. (2019) | Tabular | Mean Averaging | AE |
| Wang et al. (2019); Huang et al. (2021b) | Linear | Linear Regression | AE |
| Li & Wang (2022a); He et al. (2022) | Linear | Ridge Regression | UCB |
| Li & Wang (2022b) | Gen. Linear | Distributed AGD | UCB |
| Li et al. (2022; 2023) | Kernel | Nyström Approximation | UCB |
| Dai et al. (2023) | Neural | NTK Approximation | UCB |
| FedIGW (this work) | Realizable | Flexible (e.g., FedAvg) | IGW |
| Globally Shared Partial Model (see Section 6.1) | | | |
| Li & Wang (2022a) | Linear | Alternating Minimization | UCB |
| Agarwal et al. (2020) | Realizable | FedRes.SGD | $\varepsilon$-greedy |
| FedIGW (this work) | Realizable | Flexible (e.g., LSGD-PFL) | IGW |

AE: arm elimination; Gen. Linear: generalized linear model; AGD: accelerated gradient descent

Figure 1: The FCB design principle of periodically alternating between the employed CB and FL components.

Upon a holistic review of these works, it becomes apparent that each of them employs a particular FL protocol to update the parameters required by CB. To be more specific, a periodically alternating design between CB and FL is commonly adopted as reflected in Fig. 1: CB (collects one epoch of data in parallel) $\rightarrow$ FL (proceeds with CB data together and outputs CB's parameterization) $\rightarrow$ updated CB (collects another epoch of data in parallel) $\rightarrow \cdots$. A compact summary, including the components of FL and CB employed in previous FCB works, is presented in Table 2.

However, with a deeper look into the existing works, it is evident that the adopted FL components are not well investigated and even have some mismatches with canonical FL designs (McMahan et al., 2017; Konečnỳ et al., 2016). For example, in federated linear bandits (Wang et al., 2019; Dubey & Pentland, 2020; Li & Wang, 2022a; He et al., 2022; Amani et al., 2022; Fan et al., 2023) and their extensions (Li et al., 2022; 2023; Li & Wang, 2022b; Dai et al., 2023), the adopted FL protocols typically involve the direct transmission of local reward aggregates and covariance matrices, constituting a one-shot aggregation of compressed local data per epoch (albeit with subtle variations, such as synchronous or asynchronous communications); a concrete example is given in Appendix A.2. Due to both efficiency and privacy concerns, such choices are rare (and even undesirable) in canonical FL studies, where agents typically communicate and aggregate their model parameters (e.g., gradients) over multiple rounds, as in the renowned FedAvg algorithm (McMahan et al., 2017) (see details in Appendix A.2).
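To make the contrast concrete, the following is a minimal sketch (not the protocol of any cited work) that places the two styles side by side on a toy linear reward model. The function names `one_shot_ridge` and `fedavg_least_squares`, the regularizer `lam`, the step size `lr`, and the synthetic data are all assumptions introduced for illustration. In the first routine, each agent uploads its local covariance matrix and reward aggregate once; in the second, agents run local gradient steps and the server averages model parameters over multiple rounds.

```python
import numpy as np

def one_shot_ridge(local_data, lam=1.0):
    """One-shot aggregation: each agent uploads X^T X and X^T r once per epoch."""
    d = local_data[0][0].shape[1]
    V = lam * np.eye(d)
    b = np.zeros(d)
    for X, r in local_data:
        V += X.T @ X   # local covariance matrix
        b += X.T @ r   # local reward aggregate
    return np.linalg.solve(V, b)

def fedavg_least_squares(local_data, rounds=200, local_steps=5, lr=0.05):
    """Multi-round FedAvg-style training: local gradient steps, then parameter averaging."""
    d = local_data[0][0].shape[1]
    theta = np.zeros(d)
    sizes = np.array([len(r) for _, r in local_data], dtype=float)
    weights = sizes / sizes.sum()
    for _ in range(rounds):
        local_models = []
        for X, r in local_data:
            w = theta.copy()
            for _ in range(local_steps):
                grad = X.T @ (X @ w - r) / len(r)  # gradient of the local squared loss
                w -= lr * grad
            local_models.append(w)
        # The server only sees model parameters, never raw data or covariance matrices.
        theta = sum(wt * w for wt, w in zip(weights, local_models))
    return theta

# Synthetic data (hypothetical): 4 agents sharing one linear reward model.
rng = np.random.default_rng(0)
theta_true = np.array([0.5, -0.3])
local_data = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    r = X @ theta_true + 0.01 * rng.normal(size=50)
    local_data.append((X, r))

theta_one_shot = one_shot_ridge(local_data)
theta_fedavg = fedavg_least_squares(local_data)
```

On this toy problem both routines recover roughly the same parameter, but only the second communicates in the way canonical FL protocols do, which is why it can be swapped for other FL algorithms without touching the rest of the pipeline.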

We believe that this disparity represents a significant drawback in current FCB studies, as it limits the connection between FL and FCB to a merely philosophical one, i.e., benefiting individual learning by collaborating through a federation, while vast FL studies cannot be leveraged to benefit FCB as illustrated in Fig. 2. Driven by this gap, this work aims to take one step towards establishing a closer relationship between FCB and FL through the introduction of an exploratory design, FedIGW, that is detailed in the subsequent sections. This approach provides the flexibility to integrate any FL protocol following the standard FL framework, which allows us to effectively harness the progress made in FL studies, encompassing canonical algorithmic designs, convergence analyses, and useful appendages.

Figure 2: Comparison between the FL components in existing FCB approaches and the FedIGW design proposed in this work, where the former requires tailored FL protocols while the latter can flexibly leverage both existing and forthcoming protocols in canonical FL studies. Additional comparisons regarding the FL components can be found in Appendix A.2.

3 FedIGW: Flexible Incorporation of FL Protocols

In this section, we present FedIGW, a novel FCB algorithm proposed in this work. Before delving into the algorithmic details, a more concrete system model with stochastic contexts and a realizable reward function is introduced. Subsequently, we outline the specifics of FedIGW, emphasizing its principal strength in seamlessly integrating canonical FL protocols.

3.1 System Model

Built on the formulation in Sec. 2, for each agent $m \in [M]$, denote by $\mathcal{X}_m$ a context space and by $\mathcal{A}_m$ a finite set of $K_m$ actions. At each time step $t_m$ of each agent $m$, the environment samples a context $x_{m,t_m} \in \mathcal{X}_m$ and a context-dependent reward vector $r_{m,t_m} \in [0,1]^{\mathcal{A}_m}$ according to a fixed but unknown distribution $\mathcal{D}_m$. Agent $m$, as in Sec. 2, then observes the context $x_{m,t_m}$, picks an action $a_{m,t_m} \in \mathcal{A}_m$, and receives the reward $r_{m,t_m}(a_{m,t_m})$. The expected reward of playing action $a_m$ given context $x_m$ is denoted as $\mu_m(x_m, a_m) := \mathbb{E}[r_{m,t_m}(a_m) \mid x_{m,t_m} = x_m]$.

With no prior information about the rewards, the agents gradually learn their optimal policies, denoted by $\pi^*_m(x_m) := \operatorname*{arg\,max}_{a_m \in \mathcal{A}_m} \mu_m(x_m, a_m)$ for agent $m$ with context $x_m$. Following a standard notation (Wang et al., 2019; Huang et al., 2021b; Dubey & Pentland, 2020; Li & Wang, 2022a; He et al., 2022; Amani et al., 2022; Li & Wang, 2022b; Li et al., 2022; 2023; Dai et al., 2023), the overall regret of $M$ agents in this environment is

$$\textup{Reg}(T) := \mathbb{E}\left[\sum_{m \in [M]} \sum_{t_m \in [T_m]} \big[\mu_m(x_{m,t_m}, \pi^*_m(x_{m,t_m})) - \mu_m(x_{m,t_m}, a_{m,t_m})\big]\right],$$

where $T_m = t_m(T)$ is the effective time horizon for agent $m$ given a global horizon $T$, and the expectation is taken over the randomness in contexts and rewards and the agents' algorithms. This overall regret can be interpreted as the sum of each agent $m$'s individual regret with respect to (w.r.t.) her optimal strategy $\pi^*_m$. Hence, the regret should ideally be sub-linear w.r.t. the number of agents $M$, which indicates that the agents' learning processes are accelerated on average due to federation.
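As a small worked illustration of the quantity inside the expectation, the sketch below sums the per-step gaps $\mu_m(x, \pi^*_m(x)) - \mu_m(x, a)$ over agents and local steps for one fixed trajectory. The helper `overall_regret` and the toy reward function `mu_demo` are hypothetical; in practice $\mu_m$ is unknown and the regret is an expectation rather than a single-trajectory sum.

```python
def overall_regret(mu, contexts, actions, action_sets):
    """Sum over agents m and local steps t_m of mu(x, pi*(x)) - mu(x, a)."""
    total = 0.0
    for m in range(len(contexts)):
        for x, a in zip(contexts[m], actions[m]):
            best = max(mu(x, b) for b in action_sets[m])  # value of the optimal action
            total += best - mu(x, a)
    return total

# Toy expected reward (assumption): matching the context pays 1.0, otherwise 0.5.
def mu_demo(x, a):
    return 1.0 if a == x else 0.5

# Two agents with different effective horizons (T_1 = 3, T_2 = 2).
contexts = [[0, 1, 0], [1, 1]]
actions = [[0, 0, 0], [1, 0]]
action_sets = [[0, 1], [0, 1]]

demo_regret = overall_regret(mu_demo, contexts, actions, action_sets)
```

Each agent makes one suboptimal choice with a gap of 0.5, so the two agents together accumulate a regret of 1.0 on this trajectory.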

Realizability. Despite not knowing the true expected reward functions, we consider the scenario that they are the same across agents and are within a function class $\mathcal{F}$, to which the agents have access. This assumption, rigorously stated in the following, is often referred to as the realizability assumption.

Assumption 3.1 (Realizability).

There exists $f^*$ in $\mathcal{F}$ such that $f^*(x_m, a_m) = \mu_m(x_m, a_m)$ for all $m \in [M]$, $x_m \in \mathcal{X}_m$, and $a_m \in \mathcal{A}_m$.

This assumption is a natural extension from its commonly-adopted single-agent version (Agarwal et al., 2012; Simchi-Levi & Xu, 2022; Xu & Zeevi, 2020; Sen et al., 2021) to a federated one. Note that it does not imply that the agents' environments are the same since they may face different contexts $\mathcal{X}_m$, arms $\mathcal{A}_m$, and distributions $\mathcal{D}_m^{\mathcal{X}_m}$, where $\mathcal{D}_m^{\mathcal{X}_m}$ is the marginal distribution of the joint distribution $\mathcal{D}_m$ on the context space $\mathcal{X}_m$. We study a general FCB setting only with this assumption, which incorporates many previously studied FCB scenarios as special cases. For example, the federated linear bandits (Huang et al., 2021b; Dubey & Pentland, 2020; Li & Wang, 2022a; He et al., 2022; Amani et al., 2022) are with a linear function class $\mathcal{F}$.

Algorithm 1 FedIGW (Agent $m$)
1: epoch number $l = 1$, reward function $\widehat{f}_m^l(\cdot,\cdot) = 0$, local dataset $\mathcal{S}^l_m = \emptyset$
2: for time step $t_m = 1, 2, \cdots$ do
3:     observe context $x_{m,t_m}$    $\triangleright$ CB: IGW
4:     compute $\widehat{a}^*_m = \operatorname*{arg\,max}_{a_m \in \mathcal{A}_m} \widehat{f}^l(a_m, x_{m,t_m})$ and action selection distribution
       $$p^l_m(a_m \mid x_{m,t_m}) \leftarrow \begin{cases} 1 / \left(K_m + \gamma^l \left(\widehat{f}^l(\widehat{a}^*_m, x_{m,t_m}) - \widehat{f}^l(a_m, x_{m,t_m})\right)\right) & \text{if } a_m \neq \widehat{a}^*_m \\ 1 - \sum_{a'_m \neq \widehat{a}^*_m} p^l_m(a'_m \mid x_{m,t_m}) & \text{if } a_m = \widehat{a}^*_m \end{cases}$$
5:     select action $a_{m,t_m} \sim p_m^l(\cdot \mid x_{m,t_m})$; observe reward $r_{m,t_m}(a_{m,t_m})$
6:     update the local dataset $\mathcal{S}^l_m \leftarrow \mathcal{S}^l_m \cup \{(x_{m,t_m}, a_{m,t_m}, r_{m,t_m}(a_{m,t_m}))\}$
7:     if $t_m = t_m(\tau^l)$ then    $\triangleright$ FL
8:         perform FL $\widehat{f}^{l+1} \leftarrow \texttt{FLroutine}(\mathcal{S}_m^l)$
9:         update dataset $\mathcal{S}^{l+1}_m \leftarrow \emptyset$; update epoch $l \leftarrow l + 1$
10:     end if
11: end for

3.2 Algorithm Design

The FedIGW algorithm proceeds in epochs, which are separated at time slots $\tau^{1},\tau^{2},\cdots$ w.r.t. the global time step $t$, i.e., the $l$-th epoch starts from $t=\tau^{l-1}+1$ and ends at $t=\tau^{l}$. The overall number of epochs is denoted as $l(T)$. In each epoch $l$, we describe the FL and CB components as follows, while emphasizing that the FL component is decoupled and follows the standard FL framework.

CB: inverse gap weighting (IGW). For CB, we use inverse gap weighting (Abe & Long, 1999), which has received growing interest in the single-agent setting recently (Foster & Rakhlin, 2020; Simchi-Levi & Xu, 2022; Krishnamurthy et al., 2021; Ghosh et al., 2021) but has not been fully investigated in the federated setting. At any time step in epoch $l$, when encountering the context $x_{m}$, agent $m$ first identifies the optimal arm by $\widehat{a}^{*}_{m}=\operatorname*{arg\,max}_{a_{m}\in\mathcal{A}_{m}}\widehat{f}^{l}(x_{m},a_{m})$ from an estimated reward function $\widehat{f}^{l}$ (provided by the to-be-discussed FL component). Then, she randomly selects her action $a_{m}$ according to the following distribution, which is inversely proportional to each action’s estimated reward gap from the identified optimal action $\widehat{a}^{*}_{m}$:

$$p^{l}_{m}(a_{m}|x_{m})\leftarrow\begin{cases}1/\left(K_{m}+\gamma^{l}\left(\widehat{f}^{l}(\widehat{a}^{*}_{m},x_{m})-\widehat{f}^{l}(a_{m},x_{m})\right)\right)&\text{if $a_{m}\neq\widehat{a}^{*}_{m}$}\\ 1-\sum_{a^{\prime}_{m}\neq\widehat{a}^{*}_{m}}p^{l}_{m}(a^{\prime}_{m}|x_{m})&\text{if $a_{m}=\widehat{a}^{*}_{m}$}\end{cases},$$

where $\gamma^{l}$ is the learning rate in epoch $l$ that controls the exploration-exploitation tradeoff.
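The IGW distribution above can be sketched in a few lines. This is a minimal illustration, assuming the estimated rewards for all arms under the current context have already been evaluated into an array; the function names `igw_distribution` and `sample_action` are ours and do not come from the paper.

```python
import numpy as np

def igw_distribution(f_hat_values, gamma):
    """Inverse gap weighting over K arms (illustrative sketch).

    f_hat_values: estimated rewards f_hat(a, x) for each arm a at the current context x.
    gamma: the epoch's learning rate gamma^l, assumed non-negative.
    """
    f_hat_values = np.asarray(f_hat_values, dtype=float)
    num_arms = f_hat_values.shape[0]
    best_arm = int(np.argmax(f_hat_values))           # greedy arm \hat{a}^*
    gaps = f_hat_values[best_arm] - f_hat_values      # estimated reward gaps, all >= 0
    probs = 1.0 / (num_arms + gamma * gaps)           # first case of the formula
    probs[best_arm] = 0.0                             # exclude the greedy arm from the sum
    probs[best_arm] = 1.0 - probs.sum()               # second case: remaining mass
    return probs

def sample_action(f_hat_values, gamma, rng):
    """Draw one arm index from the IGW distribution."""
    probs = igw_distribution(f_hat_values, gamma)
    return int(rng.choice(len(probs), p=probs))
```

For instance, with estimated rewards `[1.0, 0.5, 0.0]` and `gamma = 2.0`, the two non-greedy arms receive probabilities 1/4 and 1/5, and the greedy arm receives the remaining 0.55. A larger `gamma` shifts more mass to the greedy arm, while `gamma = 0` gives uniform exploration.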

Besides being a valuable supplement to the currently dominating UCB-based studies in FCB, the main merit of leveraging IGW as the CB component is that it only requires an estimated reward function instead of other complicated data analytics, e.g., upper confidence bounds.

FL: flexible choices. By IGW, each agent $m$ performs local stochastic arm sampling and collects a set of data samples $\mathcal{S}^{l}_{m}:=\{(x_{m,t_{m}},a_{m,t_{m}},r_{m,t_{m}}):t_{m}\in[t_{m}(\tau^{l-1})+1,t_{m}(\tau^{l})]\}$ in epoch $l$. To enhance the performance of IGW in the subsequent epoch $l+1$, an improved estimate $\widehat{f}^{l+1}$ based on all agents’ data is desired. This objective aligns precisely with the aim of canonical FL studies, which is to aggregate local data for better global estimates (McMahan et al., 2017; Konečnỳ et al., 2016). Thus, the agents can target solving the following standard FL problem:

$$\min_{f\in\mathcal{F}}\widehat{\mathcal{L}}(f;\mathcal{S}^{l}_{[M]}):=\sum_{m\in[M]}(n_{m}/n)\cdot\widehat{\mathcal{L}}_{m}(f;\mathcal{S}^{l}_{m}), \tag{1}$$

where $n_{m}:=|\mathcal{S}^{l}_{m}|$ is the number of samples in dataset $\mathcal{S}^{l}_{m}$, $n:=\sum_{m\in[M]}n_{m}$ is the total number of samples, and $\widehat{\mathcal{L}}_{m}(f;\mathcal{S}^{l}_{m}):=(1/n_{m})\cdot\sum_{i\in[n_{m}]}\ell_{m}(f(x_{m}^{i},a_{m}^{i});r_{m}^{i})$ is the empirical local loss of agent $m$ with $\ell_{m}(\cdot;\cdot):\mathbb{R}^{2}\to\mathbb{R}$ as the loss function and $(x_{m}^{i},a_{m}^{i},r_{m}^{i})$ as the $i$-th sample in $\mathcal{S}^{l}_{m}$.
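To make the weighting in Eqn. (1) concrete, the following sketch evaluates the global objective for a candidate reward function. It assumes a quadratic loss (one possible choice of the loss, used here only for illustration) and non-empty local datasets stored as lists of `(x, a, r)` tuples; the helper names are hypothetical.

```python
import numpy as np

def local_loss(predict, dataset):
    """Empirical local loss of one agent under an assumed quadratic loss.

    predict: callable (x, a) -> estimated reward f(x, a).
    dataset: non-empty list of (x, a, r) samples collected by this agent.
    """
    return float(np.mean([(predict(x, a) - r) ** 2 for (x, a, r) in dataset]))

def global_loss(predict, datasets):
    """Objective of Eqn. (1): sum over agents of (n_m / n) times the local loss."""
    sizes = [len(d) for d in datasets]
    n = sum(sizes)
    return sum((n_m / n) * local_loss(predict, d) for n_m, d in zip(sizes, datasets))
```

With the weights $n_{m}/n$, the global objective coincides with the average loss over the pooled samples, so agents that collected more data in the epoch contribute proportionally more.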

As Eqn. (1) exactly follows the standard formulation of FL, the agents and the server can employ any FL protocol to solve this optimization, such as FedAvg (McMahan et al., 2017), SCAFFOLD (Karimireddy et al., 2020) and FedProx (Li et al., 2020a). These widely adopted FL protocols typically perform iterative communications of local model parameters (e.g., gradients), instead of the one-shot aggregations of compressed local data used in previous FCB studies. To highlight the remarkable flexibility, we denote the adopted FL protocol as $\texttt{FLroutine}(\cdot)$. With datasets $\mathcal{S}^{l}_{[M]}:=\{\mathcal{S}^{l}_{m}:m\in[M]\}$, the output function of this FL process, denoted as $\widehat{f}^{l+1}\leftarrow\texttt{FLroutine}(\mathcal{S}^{l}_{[M]})$, is used as the estimated reward function for IGW sampling in the next epoch $l+1$.
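As one possible instance of $\texttt{FLroutine}(\cdot)$, a FedAvg-style loop is sketched below. The sketch makes several assumptions that are not in the text: a linear reward model on precomputed features, a quadratic loss, full-batch local gradient steps, and full participation in every round. The function name and hyperparameters are illustrative only.

```python
import numpy as np

def fedavg_routine(datasets, dim, rounds=50, local_steps=5, lr=0.1):
    """FedAvg-style sketch for a linear model with squared loss.

    datasets: list of (features, rewards) per agent, with features of shape (n_m, dim)
              and rewards of shape (n_m,); only model parameters are communicated.
    Returns the aggregated parameter vector after the final round.
    """
    w_global = np.zeros(dim)
    sizes = np.array([len(rewards) for _, rewards in datasets], dtype=float)
    weights = sizes / sizes.sum()                      # n_m / n as in Eqn. (1)
    for _ in range(rounds):
        local_models = []
        for feats, rewards in datasets:
            w = w_global.copy()                        # start from the broadcast model
            for _ in range(local_steps):
                # Gradient of the local mean squared error at w.
                grad = 2.0 * feats.T @ (feats @ w - rewards) / len(rewards)
                w -= lr * grad
            local_models.append(w)
        # Server step: weighted average of the local models.
        w_global = sum(p * w for p, w in zip(weights, local_models))
    return w_global
```

Swapping in another protocol only changes the body of this function; the caller still receives a single estimated reward model for the next epoch.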

The FedIGW algorithm for agent $m$ is summarized in Alg. 1. The key, as aforementioned, is that the component of FL in FedIGW is highly flexible as it only requires an estimated reward function for later IGW interactions. In particular, any existing or forthcoming FL protocol following the standard FL framework in Eqn. (1) can be leveraged as the $\texttt{FLroutine}(\cdot)$ in FedIGW.
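The epoch structure of FedIGW can be illustrated with a small synchronous simulation. Everything specific here is an assumption made for the sketch: a linear reward model with one weight vector per arm, Gaussian reward noise, the doubling schedule $\tau^{l}=2^{l}$, the rule `gamma = sqrt(K * n)` for the learning rate, and a pooled least-squares solver standing in for the FL protocol (it is not a real FL protocol, only a placeholder returning a near-exact minimizer of the pooled squared loss).

```python
import numpy as np

def igw_probs(scores, gamma):
    """IGW distribution from per-arm estimated rewards."""
    best = int(np.argmax(scores))
    probs = 1.0 / (len(scores) + gamma * (scores[best] - scores))
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs

def pooled_least_squares(datasets, num_arms, dim):
    """Placeholder for FLroutine: lightly regularized least squares on pooled data."""
    samples = [s for d in datasets for s in d]
    # Feature map: the context x placed in the block of the chosen arm a.
    feats = np.vstack([np.kron(np.eye(num_arms)[a], x) for (x, a, r) in samples])
    rewards = np.array([r for (x, a, r) in samples])
    reg = 1e-6 * np.eye(num_arms * dim)
    w = np.linalg.solve(feats.T @ feats + reg, feats.T @ rewards)
    return w.reshape(num_arms, dim)

def run_fedigw(num_agents, num_arms, dim, horizon, w_true, rng,
               fl_routine=pooled_least_squares):
    w_hat = np.zeros((num_arms, dim))       # uninformative estimate for epoch 1
    gamma = 1.0                             # assumed initial learning rate
    datasets = [[] for _ in range(num_agents)]
    epoch, next_boundary = 1, 2             # tau^1 = 2 under the doubling schedule
    total_regret, fl_calls = 0.0, 0
    for t in range(1, horizon + 1):
        for m in range(num_agents):
            x = rng.normal(size=dim)
            x /= np.linalg.norm(x)
            a = int(rng.choice(num_arms, p=igw_probs(w_hat @ x, gamma)))
            means = w_true @ x
            r = means[a] + 0.1 * rng.normal()
            datasets[m].append((x, a, r))
            total_regret += means.max() - means[a]
        if t == next_boundary:              # end of epoch l: one FL call
            n_prev = sum(len(d) for d in datasets)
            w_hat = fl_routine(datasets, num_arms, dim)
            gamma = np.sqrt(num_arms * n_prev)   # assumed schedule, for illustration
            datasets = [[] for _ in range(num_agents)]
            epoch += 1
            next_boundary = 2 ** epoch
            fl_calls += 1
    return total_regret, fl_calls
```

The `fl_routine` argument is the only place where the FL protocol enters, which mirrors the decoupling emphasized in the text: any callable that maps the agents' datasets to an estimated reward model can be passed in.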

Remark 3.2.

The main underlying reason for selecting IGW as the CB component is that it is a regression-based CB algorithm, i.e., IGW only requires a learned reward function $\widehat{f}^{l}$ for the CB interaction in one epoch $l$. The canonical FL framework with an optimization perspective is exactly targeted at learning such a function via collaboratively solving Eqn. (1), which thus can be integrated with IGW. In contrast, previous FCB designs are dominated by UCB-based CB components as reflected in Table 2. However, obtaining upper confidence bound (UCB) estimates for an unknown reward function is not usually the target of the canonical FL framework. Thus, tailored FL components are developed to fulfill this purpose, e.g., sharing covariance matrices to obtain UCBs for linear reward functions.

4 Theoretical Guarantees: Modularized Plug-in of FL Analyses

In this section, we theoretically analyze the performance of the FedIGW algorithm, where the impact of the adopted FL choice is modularized as a plug-in component of its optimization error.

4.1 A General Guarantee

Denoting $E^{l}_{m}:=t_{m}(\tau^{l})-t_{m}(\tau^{l-1})$ as the length of epoch $l$ for agent $m$, $E^{l}_{[M]}:=\{E^{l}_{m}:m\in[M]\}$ as the epoch length set, $\underline{c}:=\min_{m\in[M],l\in[2,l(T)]}E^{l}_{m}/E^{l-1}_{m}$, $\overline{c}:=\max_{m\in[M],l\in[2,l(T)]}E^{l}_{m}/E^{l-1}_{m}$ and $c:=\overline{c}/\underline{c}$, the following global regret guarantee can be established.

Theorem 4.1.

Using a learning rate $\gamma^{l}=O\left(\sqrt{\sum_{m\in[M]}E^{l-1}_{m}K_{m}/(\sum_{m\in[M]}E^{l-1}_{m}\mathcal{E}(E^{l-1}_{[M]}))}\right)$ in epoch $l$, denoting $\bar{K}^{l}:=\sum_{m\in[M]}E^{l}_{m}K_{m}/\sum_{m\in[M]}E^{l}_{m}$, the regret of FedIGW can be bounded as

$$\textup{Reg}(T)=O\left(\sum_{m\in[M]}E^{1}_{m}+\sum_{l\in[2,l(T)]}c^{\frac{5}{2}}\sqrt{\bar{K}^{l}\mathcal{E}(E^{l-1}_{[M]})}\sum_{m\in[M]}E^{l}_{m}\right). \tag{2}$$

Here $\mathcal{E}(E^{l}_{[M]})$ (abbreviated from $\mathcal{E}(\mathcal{F};E^{l}_{[M]})$) denotes the excess risk of the output from the adopted $\texttt{FLroutine}(\mathcal{S}^{l}_{[M]})$ using the datasets $\mathcal{S}^{l}_{[M]}$, whose formal definition is deferred to Definition C.1.

In Eqn. (2), the first term bounds the regret in the first epoch. The bound on the regret incurred within each later epoch (i.e., each summand of the sum over $l$ in the second term) can be interpreted as the epoch length times the expected per-step suboptimality. The latter depends on the estimation quality of $\widehat{f}^{l}$ and thus on $\mathcal{E}(E^{l-1}_{[M]})$, since $\widehat{f}^{l}$ is learned with the interaction data collected in epoch $l-1$, as in the design of FedIGW shown in Alg. 1.
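To see how the pieces of Eqn. (2) combine, the sketch below evaluates its right-hand side for given epoch lengths and a user-supplied excess-risk bound. The constant hidden in $O(\cdot)$ is set to one, so the returned value is only meaningful for comparing how the bound scales; the function name and the input layout are our own choices.

```python
import math

def regret_bound(epoch_lengths, num_arms, excess_risk):
    """Right-hand side of Eqn. (2) with the hidden constant set to 1.

    epoch_lengths: list over epochs; entry l-1 is the list of E^l_m over agents m.
    num_arms: list of K_m over agents m.
    excess_risk: callable mapping the epoch-length list of epoch l-1 to a bound
                 on the excess risk of the model learned from that epoch's data.
    """
    # Ratios E^l_m / E^{l-1}_m over all agents and all epochs l >= 2.
    ratios = [cur / prev
              for prev_row, cur_row in zip(epoch_lengths, epoch_lengths[1:])
              for prev, cur in zip(prev_row, cur_row)]
    c = max(ratios) / min(ratios) if ratios else 1.0
    bound = float(sum(epoch_lengths[0]))               # first-epoch term
    for prev_row, cur_row in zip(epoch_lengths, epoch_lengths[1:]):
        total = sum(cur_row)                           # sum over m of E^l_m
        k_bar = sum(e * k for e, k in zip(cur_row, num_arms)) / total
        bound += c ** 2.5 * math.sqrt(k_bar * excess_risk(prev_row)) * total
    return bound
```

Passing different `excess_risk` callables corresponds to plugging in the guarantees of different FL protocols or function classes while the rest of the bound stays unchanged.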

4.2 Some Concretized Discussions

Theorem 4.1 is notably general in the sense that a corresponding regret can be established as long as an upper bound on the excess risk $\mathcal{E}(E^{l-1}_{[M]})$ can be obtained for a certain class of reward functions and the adopted FL protocol. In the following, we provide several more concrete illustrations, and especially, a modularized framework to leverage FL convergence analyses. To ease the notation, we discuss synchronous systems with a shared number of arms in the following, i.e., $t_{m}=t,\forall m\in[M]$, and $K_{m}=K,\forall m\in[M]$, while noting similar results can be easily obtained for general systems. With this simplification, we can unify all $E^{l}_{m}$ as $E^{l}$ and $\bar{K}^{l}$ as $K$.

To initiate the concretized discussions, we start by considering a finite function class $\mathcal{F}$, i.e., $|\mathcal{F}|<\infty$, which can be extended to a function class $\mathcal{F}$ with a finite covering number of the metric space $(\mathcal{F},l_{\infty})$. In particular, the following corollary can be obtained by establishing $\mathcal{E}(n_{[M]})=O(\log(|\mathcal{F}|n)/n)$ in the considered case as in Lemma D.2.

Corollary 4.2 (A Finite Function Class).

If $|\mathcal{F}|<\infty$ and the adopted FL protocol provides an exact minimizer for Eqn. (1) with quadratic losses, then with $\tau^{l}=2^{l}$, FedIGW incurs a regret of $\textup{Reg}(T)=O(\sqrt{KMT\log(|\mathcal{F}|MT)})$ and makes a total of $O(\log(T))$ calls to the adopted FL protocol.

We note that the obtained regret approaches the optimal regret $\Omega(\sqrt{KMT\log(|\mathcal{F}|)/\log(K)})$ of a single agent playing for $MT$ rounds (Agarwal et al., 2012) up to logarithmic factors, which demonstrates the statistical efficiency of the proposed FedIGW. Moreover, the total of $O(\log(T))$ calls to the FL protocol indicates that only a limited number of agent-server information exchanges is required, which further illustrates its communication efficiency.
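A rough calculation, which is our own reading and not the authors' proof, indicates how the rate in Corollary 4.2 arises from Eqn. (2) in the synchronous setting. It assumes that with $\tau^{l}=2^{l}$ consecutive epoch lengths differ by at most a constant factor (so $c=O(1)$), and that the model used in epoch $l$ is trained on $n=ME^{l-1}$ samples with excess risk $O(\log(|\mathcal{F}|n)/n)$.

```latex
\begin{align*}
\text{Reg}(T)
&= O\Big(M E^{1} + \sum_{l=2}^{l(T)} \sqrt{K\,\mathcal{E}(E^{l-1}_{[M]})}\; M E^{l}\Big)
   && \text{(Eqn. (2) with $c = O(1)$)} \\
&= O\Big(M E^{1} + \sum_{l=2}^{l(T)} \sqrt{\frac{K \log(|\mathcal{F}| M E^{l-1})}{M E^{l-1}}}\; M E^{l}\Big)
   && \text{($n = M E^{l-1}$ samples)} \\
&= O\Big(M E^{1} + \sqrt{K M \log(|\mathcal{F}| M T)} \sum_{l=2}^{l(T)} \sqrt{E^{l-1}}\Big)
   && \text{($E^{l} = O(E^{l-1})$ and $E^{l-1} \le T$)} \\
&= O\Big(M E^{1} + \sqrt{K M T \log(|\mathcal{F}| M T)}\Big)
   && \text{(geometric sum, $\textstyle\sum_{l} \sqrt{E^{l-1}} = O(\sqrt{T})$)}
\end{align*}
```

The first-epoch term $ME^{1}$ is $O(M)$ under the doubling schedule and is of lower order whenever $M=O(KT)$. Since $\tau^{l}=2^{l}\leq T$ allows at most $\log_{2}T$ epoch boundaries, the FL protocol is called $O(\log(T))$ times.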

As the finite function class is not often practically useful, we then focus on the canonical FL setting that each $f\in\mathcal{F}$ is parameterized by a $d$-dimensional parameter $\omega\in\mathbb{R}^{d}$ as $f_{\omega}$, e.g., a neural network. To facilitate discussions, we abbreviate $\mathcal{S}:=\mathcal{S}_{[M]}$ while denoting $\omega^{*}_{\mathcal{S}}:=\operatorname*{arg\,min}_{\omega}\widehat{\mathcal{L}}(f_{\omega};\mathcal{S})$ as the empirical optimal parameter given a fixed dataset $\mathcal{S}$ and $\widehat{\omega}_{\mathcal{S}}$ as the output of the adopted FL protocol. We further assume $f^{*}$ is parameterized by the true model parameter $\omega^{*}$, and for a fixed $\omega$, define $\mathcal{L}(f_{\omega}):=\mathbb{E}_{\mathcal{S}}[\widehat{\mathcal{L}}(f_{\omega};\mathcal{S})]$ as its expected loss w.r.t. the data distribution.

Following standard learning-theoretic analyses, the key quantity, i.e., the excess risk $\mathcal{E}(\mathcal{F};n_{[M]})$, can be bounded via a combination of errors stemming from optimization and generalization.

Lemma 4.3.

If the loss function $\ell_{m}(\cdot;\cdot)$ is $\mu_{f}$-strongly convex in its first coordinate for all $m\in[M]$, it holds that $\mathcal{E}(\mathcal{F};n_{[M]})\leq 2\left(\varepsilon_{\text{opt}}(\mathcal{F};n_{[M]})+\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]})\right)/\mu_{f}$, where $\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]}):=\mathbb{E}_{\mathcal{S},\xi}[\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})-\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})]$ and $\varepsilon_{\text{opt}}(\mathcal{F};n_{[M]}):=\mathbb{E}_{\mathcal{S},\xi}[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})-\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})]$.

For the generalization error term $\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]})$, we can utilize standard results in learning theory (e.g., uniform convergence). For simplicity, we leverage a distribution-independent upper bound on the Rademacher complexity, denoted as $\mathfrak{R}(\mathcal{F};n_{[M]})$ (rigorously defined in Eqn. (4)), which gives $\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]})\leq 2\mathfrak{R}(\mathcal{F};n_{[M]})$ via the classical uniform convergence result (see Lemma D.5). We do not further particularize this upper bound, while noting that it can be specified following standard procedures (Mohri et al., 2018; Bartlett et al., 2005).

On the other hand, the optimization error term $\varepsilon_{\text{opt}}(\mathcal{F};n_{[M]})$ is exactly the standard convergence error in the analysis of FL protocols. Thus, once any theoretical breakthrough on the convergence of one FL protocol is reported, the obtained result can be immediately incorporated into our analysis framework to characterize the performance of FedIGW using that FL protocol. In particular, the following corollary is established to demonstrate the modularized plug-in of analyses of different FL protocols, where FedAvg (McMahan et al., 2017) and SCAFFOLD (Karimireddy et al., 2020) are adopted as further specific instances. To the best of our knowledge, this is the first time that convergence analyses of FL protocols can directly benefit the analysis of FCB designs.

Corollary 4.4 (Modularized Plug-in of FL Analyses; A Simplified Version of Corollary D.6).

Under the condition of Lemma 4.3, the regret of FedIGW can be bounded as

$$\textup{Reg}(T)=O\left(ME^{1}+\sum_{l\in[2,l(T)]}\sqrt{K\left(\mathfrak{R}^{l-1}+\varepsilon_{\text{opt}}^{l}\right)/\mu_{f}}\,ME^{l}\right),$$

where $\mathfrak{R}^{l}:=\mathfrak{R}(\mathcal{F};\{E^{l}:m\in[M]\})$ and, using $\rho^{l}$ rounds of communications (i.e., global aggregations) and $\kappa^{l}$ rounds of local updates in epoch $l$, under a few other standard conditions,

  • with FedAvg as the adopted $\texttt{FLroutine}(\cdot)$, it holds that $\varepsilon_{\text{opt}}^{l}\leq\tilde{O}((\rho^{l}\kappa^{l}M)^{-1}+(\rho^{l})^{-2})$;

  • with SCAFFOLD as the adopted $\texttt{FLroutine}(\cdot)$, it holds that $\varepsilon_{\text{opt}}^{l}\leq\tilde{O}((\rho^{l}\kappa^{l}M)^{-1})$.
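To make the plug-in structure of the corollary concrete, the following minimal sketch evaluates the regret bound with constants and logarithmic factors dropped. The function names, the doubling epoch schedule, the placeholder values standing in for $\mathfrak{R}^{l}$, and the choices of $\rho^{l}$ and $\kappa^{l}$ are all illustrative assumptions, not quantities taken from the paper.

```python
import math

def eps_opt_fedavg(rho, kappa, M):
    """FedAvg rate from the corollary, with constants and log factors dropped."""
    return 1.0 / (rho * kappa * M) + 1.0 / rho**2

def eps_opt_scaffold(rho, kappa, M):
    """SCAFFOLD rate from the corollary, with constants and log factors dropped."""
    return 1.0 / (rho * kappa * M)

def regret_bound(E, rad, eps_opt, M, K, mu_f):
    """Evaluate M*E^1 + sum_{l>=2} sqrt(K*(R^{l-1} + eps_opt^l)/mu_f) * M*E^l.

    E, rad and eps_opt are lists whose position i holds the epoch-(i+1) value.
    """
    total = M * E[0]
    for i in range(1, len(E)):  # i is the list position of epoch l = i + 1
        total += math.sqrt(K * (rad[i - 1] + eps_opt[i]) / mu_f) * M * E[i]
    return total

# Hypothetical setup (every value below is an illustrative assumption).
M, K, mu_f, d = 10, 5, 1.0, 20
E = [2 ** l for l in range(1, 9)]          # doubling epoch lengths
rad = [d / (M * e) for e in E]             # placeholder standing in for R^l
rho = [math.isqrt(M * e) + 1 for e in E]   # communication rounds rho^l
kappa = [10] * len(E)                      # local updates kappa^l

bound_fedavg = regret_bound(
    E, rad, [eps_opt_fedavg(r, k, M) for r, k in zip(rho, kappa)], M, K, mu_f)
bound_scaffold = regret_bound(
    E, rad, [eps_opt_scaffold(r, k, M) for r, k in zip(rho, kappa)], M, K, mu_f)
print(f"FedAvg bound: {bound_fedavg:.1f}, SCAFFOLD bound: {bound_scaffold:.1f}")
```

Swapping in another FL protocol only requires replacing the `eps_opt_*` function, which mirrors the modularity claimed by the corollary.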

From this corollary, we can see that FedIGW enables a general analysis framework to seamlessly leverage theoretical advances in FL, in particular, convergence analyses. Thus, besides FedAvg and SCAFFOLD, when switching the FL component in FedIGW to FedProx (Li et al., 2020a), FedOPT (Reddi et al., 2020), and other existing or forthcoming FL designs, we can effortlessly plug in their optimization errors to obtain corresponding performance guarantees of FedIGW. This convenience highlights the theoretically intimate relationship between FedIGW and canonical FL studies.

Moreover, Corollary 4.4 can also guide how to run the adopted FL protocol. As the generalization error is an inherent property that cannot be bypassed by better optimization results, there is no need to continue the iterative FL process once the optimization error no longer dominates the generalization error, as reflected in the more specialized Corollary D.7.
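This stopping guidance can be sketched numerically: find the smallest number of communication rounds at which the optimization error falls to the level of the generalization error. The helper `min_rounds`, the target value `eps_gen`, and the constant-free rate expressions below are assumptions made for illustration only.

```python
def min_rounds(eps_opt, eps_gen, max_rounds=100_000):
    """Smallest rho with eps_opt(rho) <= eps_gen, or None if none is found."""
    for rho in range(1, max_rounds + 1):
        if eps_opt(rho) <= eps_gen:
            return rho
    return None

# Hypothetical values: M agents, kappa local updates, target generalization error.
M, kappa, eps_gen = 10, 10, 0.0015
rounds_fedavg = min_rounds(lambda rho: 1 / (rho * kappa * M) + 1 / rho**2, eps_gen)
rounds_scaffold = min_rounds(lambda rho: 1 / (rho * kappa * M), eps_gen)
print(rounds_fedavg, rounds_scaffold)
```

Under these assumed rates, running more rounds than the returned value would shrink only the term that is already no larger than the generalization error.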

Remark 4.5 (A Linear Reward Function Class).

As a more specific instance, we consider linear reward functions as in federated linear bandits, i.e., $f_{\omega}(\cdot)=\langle\omega,\phi(\cdot)\rangle$ and $f^{*}(\cdot)=\langle\omega^{*},\phi(\cdot)\rangle$, where $\phi(\cdot)\in\mathbb{R}^{d}$ is a known feature map. In this case, the FL problem can be formulated as a standard ridge regression with $\ell_{m}(f_{\omega}(x_{m},a_{m});r_{m}):=\left(\langle\omega,\phi(x_{m},a_{m})\rangle-r_{m}\right)^{2}+\lambda\|\omega\|_{2}^{2}$.
With a properly chosen regularization parameter $\lambda=O(1/n)$, the generalization error can be bounded as $\varepsilon_{\text{gen}}(n_{[M]})=\tilde{O}(d/n)$ (Hsu et al., 2012), while a same-order optimization error can be achieved by many efficient distributed algorithms (Nesterov, 2003) with roughly $O(\sqrt{n}\log(n/d))$ rounds of communications. Then, with an exponentially growing epoch length, FedIGW can have a regret of $\tilde{O}(\sqrt{dMKT})$ with at most $\tilde{O}(\sqrt{MT})$ rounds of communications as illustrated in Appendix D.3, both of which are efficient with sublinear dependencies on the number of agents $M$ and time horizon $T$. It is worth noting that during this process, no raw or compressed data is communicated; only processed model parameters (e.g., gradients) are exchanged. This aligns with FL studies and differs from previous designs for federated linear bandits (Dubey & Pentland, 2020; Li & Wang, 2022a; He et al., 2022; Fan et al., 2023), which often communicate covariance matrices or aggregated rewards.
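The ridge-regression formulation in this remark can be illustrated with a minimal sketch in which agents exchange only gradients. It uses plain distributed gradient descent as a stand-in for an FL protocol (it is not the accelerated method referenced above), and the synthetic data, step size, and function names are assumptions for illustration.

```python
import numpy as np

def local_gradient(omega, Phi, r, lam):
    """Gradient of the local ridge loss mean((Phi @ omega - r)**2) + lam * ||omega||^2."""
    return 2.0 * Phi.T @ (Phi @ omega - r) / len(r) + 2.0 * lam * omega

def federated_ridge(datasets, lam, step, rounds):
    """Gradient descent on the sample-weighted sum of local ridge losses.

    Each round, every agent sends only its local gradient; the server sums
    them with weights n_m / n and takes one step.
    """
    n = sum(len(r) for _, r in datasets)
    omega = np.zeros(datasets[0][0].shape[1])
    for _ in range(rounds):
        grad = sum((len(r) / n) * local_gradient(omega, Phi, r, lam)
                   for Phi, r in datasets)
        omega = omega - step * grad
    return omega

# Synthetic data for M = 3 hypothetical agents sharing one true parameter.
rng = np.random.default_rng(0)
d, n_m = 5, 100
omega_true = rng.normal(size=d)
datasets = []
for _ in range(3):
    Phi = rng.normal(size=(n_m, d))
    datasets.append((Phi, Phi @ omega_true + 0.1 * rng.normal(size=n_m)))

n = 3 * n_m
lam = 1.0 / n
omega_fed = federated_ridge(datasets, lam, step=0.1, rounds=500)

# Closed-form ridge solution on the pooled data, used only as a reference.
Phi_all = np.vstack([Phi for Phi, _ in datasets])
r_all = np.concatenate([r for _, r in datasets])
omega_pooled = np.linalg.solve(Phi_all.T @ Phi_all / n + lam * np.eye(d),
                               Phi_all.T @ r_all / n)
print(np.max(np.abs(omega_fed - omega_pooled)))
```

The pooled closed-form solution is computed here only to check the sketch; in the federated setting the raw features and rewards would stay with the agents.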

Figure 3: The averaged reward collected by each agent via FedIGW (using different FL protocols) and the state-of-the-art FN-UCB with $M=10$ participating agents on Bibtex (left) and Delicious (right) datasets.

5 Experimental Results

In this section, we report the empirical performance of FedIGW on two real-world multi-label classification datasets, Bibtex (Katakis et al., 2008) and Delicious (Tsoumakas et al., 2008), which are also used in other practical CB investigations such as Cortes (2018). The aim of CB in these experiments is to recommend one of the correct labels at each time step. Specifically, at each time step, a context is randomly sampled from the dataset while the true labels are concealed from the agents. The agents then determine which label to select (i.e., pull one arm) with their CB algorithms; thus the number of arms is the number of possible labels in each dataset. Upon pulling one arm, a reward of $1$ is granted if the pulled arm corresponds to one of the true labels, while a reward of $0$ is granted otherwise. From Table 3, we can observe that these tasks are challenging given their high-dimensional contexts ($\geq 500$) and large numbers of arms ($>150$). Additional experimental details and results are discussed in Appendix G.
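The conversion from multi-label classification to a bandit problem described above can be sketched as follows. The data here is a tiny synthetic stand-in (not Bibtex or Delicious), and `bandit_round` and the uniform placeholder policy are hypothetical names introduced for illustration.

```python
import numpy as np

def bandit_round(X, Y, policy, rng):
    """Sample a context, hide its labels, and score the arm chosen by `policy`."""
    i = rng.integers(len(X))
    arm = policy(X[i])                   # the agent only sees the context
    reward = 1.0 if Y[i, arm] == 1 else 0.0
    return X[i], arm, reward

# Tiny synthetic stand-in for a multi-label dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                      # 50 contexts, dimension 8
Y = (rng.random((50, 4)) < 0.3).astype(int)       # 4 labels, hence 4 arms
num_arms = Y.shape[1]

uniform_policy = lambda x: rng.integers(num_arms)  # placeholder for a CB algorithm
rewards = [bandit_round(X, Y, uniform_policy, rng)[2] for _ in range(200)]
print(sum(rewards) / len(rewards))
```

In an actual experiment, the placeholder policy would be replaced by the agent's CB algorithm, and each agent would record the (context, arm, reward) triples as its local dataset.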

Table 3: The context dimension and number of arms in Bibtex and Delicious.
Task | Context dimension | Number of arms
Bibtex | 1835 | 159
Delicious | 500 | 983

Varying FL choices. Fig. 3 first compares the averaged rewards collected by each agent with FedIGW using different FL choices, including FedAvg (McMahan et al., 2017), SCAFFOLD (Karimireddy et al., 2020), and FedProx (Li et al., 2020a). To the best of our knowledge, this is the first time that FedAvg, let alone other FL protocols, is practically integrated in FCB experiments, which demonstrates the generality and flexibility of FedIGW. It can be observed that using the more developed SCAFFOLD and FedProx provides improved performance (i.e., collects more rewards) compared with the basic FedAvg, which we attribute to the ability of FedIGW to flexibly leverage algorithmic advances in FL protocols.

Comparison with baselines. To further verify the performance of FedIGW, experiments are conducted to compare it with one state-of-the-art federated contextual bandit baseline. Specifically, the federated neural-upper confidence bound (FN-UCB) design proposed in Dai et al. (2023) is adopted as the FCB baseline due to its capability of leveraging neural networks to approximate rewards and its previously reported strong performance. In Fig. 3, both FedIGW and FN-UCB use MLPs of the same size to approximate reward functions for fair comparisons. It can be observed that after convergence, FedIGW (even with the basic FedAvg) significantly outperforms FN-UCB, with each agent collecting about twice the rewards on average, demonstrating its superiority.

6 Flexible Extensions: Seamless Integration of FL Appendages

Another notable advantage offered by the flexible FL choices is to bring appealing appendages from FL studies to directly benefit FCB. In the following, we discuss how to leverage techniques of personalization, robustness, and privacy from FL in FedIGW while presenting intriguing avenues for future exploration.

6.1 Personalized Learning

In many cases, each agent’s true reward function is not globally realizable as in Assumption 3.1, but instead only locally realizable in her own function class as in the following assumption.

Assumption 6.1 (Local Realizability).

For each $m\in[M]$, there exists $f^{*}_{m}$ in $\mathcal{F}_{m}$ such that $f^{*}_{m}(x_{m},a_{m})=\mu_{m}(x_{m},a_{m})$ for all $x_{m}\in\mathcal{X}_{m}$ and $a_{m}\in\mathcal{A}_{m}$.

Following discussions in Sec. 4.2, we consider that each function $f$ in $\mathcal{F}_{m}$ is parameterized by a $d_{m}$-dimensional parameter $\omega_{m}\in\mathbb{R}^{d_{m}}$, which is denoted as $f_{\omega_{m}}$. Correspondingly, the true reward function $f^{*}_{m}$ is parameterized by $\omega^{*}_{m}$ and denoted as $f_{\omega^{*}_{m}}$. To preserve the incentive for collaboration, and motivated by popular personalized FL studies (Hanzely et al., 2021; Agarwal et al., 2020), we study a middle case where only some parameters are globally shared among $\{f_{\omega_{m}^{*}}:m\in[M]\}$ while the other parameters are potentially heterogeneous among agents, which can be formulated via the following assumption.

Assumption 6.2.

For all $m\in[M]$, the true parameter $\omega^{*}_{m}$ can be decomposed as $[\omega^{\alpha,*},\omega^{\beta,*}_{m}]$ with $\omega^{\alpha,*}\in\mathbb{R}^{d^{\alpha}}$ and $\omega^{\beta,*}_{m}\in\mathbb{R}^{d^{\beta}_{m}}$, where $d^{\alpha}\leq\min_{m\in[M]}d_{m}$ and $d^{\beta}_{m}:=d_{m}-d^{\alpha}$.
In other words, there are $d^{\alpha}$-dimensional globally shared parameters among $\{\omega^{*}_{m}:m\in[M]\}$.

A similar setting is studied in Li & Wang (2022a) for linear reward functions and in Agarwal et al. (2020) for realizable cases with a naive $\varepsilon$-greedy design for CB. For FedIGW, we can directly adopt a personalized FL protocol (such as LSGD-PFL in Hanzely et al. (2021)) to solve a standard personalized FL problem:

$$\min_{\omega^{\alpha},\omega^{\beta}_{[M]}}\widehat{\mathcal{L}}(f_{\omega^{\alpha},\omega^{\beta}_{[M]}};\mathcal{S}_{[M]}):=\sum_{m\in[M]}(n_{m}/n)\cdot\widehat{\mathcal{L}}_{m}(f_{\omega^{\alpha},\omega^{\beta}_{m}};\mathcal{S}_{m}).$$

With outputs $\widehat{\omega}^{\alpha}$ and $\widehat{\omega}^{\beta}_{[M]}$, the corresponding $M$ functions $\{f_{\widehat{\omega}^{\alpha},\widehat{\omega}^{\beta}_{m}}:m\in[M]\}$ (instead of the single one $\widehat{f}$ in Sec. 3.2) can be used by the $M$ agents, separately, for their CB interactions following the IGW algorithm. Concrete results and more details can be found in Appendix E.1.
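As a minimal sketch of the personalized objective above, the following code runs plain gradient descent with a shared block and per-agent local blocks for linear models with squared loss. This is not LSGD-PFL; the update rule, the noiseless synthetic data, the step size, and the name `personalized_gd` are assumptions chosen to keep the example short.

```python
import numpy as np

def personalized_gd(datasets, d_alpha, step, rounds):
    """Gradient descent on sum_m (n_m / n) * (mean squared error of agent m).

    Agent m's parameter is [shared block (d_alpha dims), local block]. Only the
    shared-block gradients are aggregated; local blocks never leave the agent.
    """
    n = sum(len(r) for _, r in datasets)
    shared = np.zeros(d_alpha)
    local = [np.zeros(Phi.shape[1] - d_alpha) for Phi, _ in datasets]
    for _ in range(rounds):
        shared_grad = np.zeros(d_alpha)
        for m, (Phi, r) in enumerate(datasets):
            omega_m = np.concatenate([shared, local[m]])
            grad = (len(r) / n) * 2.0 * Phi.T @ (Phi @ omega_m - r) / len(r)
            shared_grad += grad[:d_alpha]                 # sent to the server
            local[m] = local[m] - step * grad[d_alpha:]   # kept on the agent
        shared = shared - step * shared_grad
    return shared, local

# Noiseless synthetic data for 3 hypothetical agents.
rng = np.random.default_rng(1)
d_alpha, d_beta, n_m = 4, 2, 200
shared_true = rng.normal(size=d_alpha)
local_true = [rng.normal(size=d_beta) for _ in range(3)]
datasets = []
for m in range(3):
    Phi = rng.normal(size=(n_m, d_alpha + d_beta))
    datasets.append((Phi, Phi @ np.concatenate([shared_true, local_true[m]])))

shared_hat, local_hat = personalized_gd(datasets, d_alpha, step=0.2, rounds=2000)
print(np.max(np.abs(shared_hat - shared_true)))
```

Each agent would then form its own estimate from `shared_hat` and its entry of `local_hat` and use it for action selection.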

Remark 6.3 (A Linear Reward Function Class).

Similar to Remark 4.5, we also consider linear reward functions for the personalized setting with $f^{*}_{m}(\cdot):=\langle\omega^{*}_{m},\phi(\cdot)\rangle$ and $\{\omega^{*}_{m}:m\in[M]\}$ satisfying Assumption 6.2. Then, FedIGW can still achieve a regret of $\tilde{O}(\sqrt{\tilde{d}MKT})$ with $\tilde{O}(\sqrt{MT})$ rounds of communications, where $\tilde{d}:=d^{\alpha}+\sum_{m\in[M]}d^{\beta}_{m}$; see more details in Appendix E.1.1.

6.2 Robustness, Privacy, and Beyond

Another important direction in FCB studies is to improve robustness against malicious attacks and provide privacy guarantees for local agents. Some progress has been made in attaining these desirable attributes for FCB, but existing approaches typically require substantial modifications to their base FCB designs, such as robustness in Demirel et al. (2022); Jadbabaie et al. (2022); Mitra et al. (2022) and privacy guarantees in Dubey & Pentland (2020); Zhou & Chowdhury (2023); Li & Song (2022); Huang et al. (2023).

With FedIGW, it is more convenient to achieve these attributes as suitable techniques from FL studies can be seamlessly applied. In particular, robustness and privacy protection have been extensively studied for FL in Yin et al. (2018); Pillutla et al. (2022); Fu et al. (2019) and Wei et al. (2020); Yin et al. (2021); Liu et al. (2022), respectively, among other works. As long as such FL protocols can provide an estimated function (which is the default goal of FL), they can be adopted in FedIGW to achieve additional robustness and privacy guarantees in FCB; see more details in Appendix E.2.
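As one illustration of how a robust FL component could be swapped in, the sketch below contrasts plain averaging with coordinate-wise median aggregation, a well-known robust aggregation rule. It is not claimed to be the method of any specific work cited above; the function name `aggregate` and the toy updates are assumptions.

```python
import numpy as np

def aggregate(updates, rule="mean"):
    """Aggregate agent updates (one per row) by mean or coordinate-wise median."""
    updates = np.asarray(updates, dtype=float)
    if rule == "mean":
        return updates.mean(axis=0)
    if rule == "median":
        return np.median(updates, axis=0)
    raise ValueError(f"unknown rule: {rule}")

# Four honest agents send similar updates; one hypothetical attacker sends a large one.
updates = [[1.0, -1.0], [1.1, -0.9], [0.9, -1.1], [1.0, -1.0], [1000.0, 1000.0]]
mean_update = aggregate(updates, "mean")
median_update = aggregate(updates, "median")
print(mean_update, median_update)
```

In this toy case the mean is pulled far from the honest updates by the single outlier, while the median stays close to them; only the aggregation step changes, and the rest of the FL routine is untouched.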

Other Possibilities. There have been many studies on fairness guarantees (Mohri et al., 2019; Du et al., 2021), client selection (Balakrishnan et al., 2022; Fraboni et al., 2021), and practical communication designs (Chen et al., 2021; Wei & Shen, 2022; Zheng et al., 2020) in FL, among many other directions, which are all conceivably applicable in FedIGW. In addition, Marfoq et al. (2023) study FL with data streams, i.e., data that arrives sequentially instead of being static, which is a suitable design for FCB as CB essentially provides data streams. If similar ideas can be leveraged in FCB, the two components of CB and FL can truly operate in parallel.

7 Conclusions

In this work, we studied the problem of federated contextual bandits (FCB). It is first recognized that existing FCB designs are largely disconnected from canonical FL studies in their adopted FL protocols, which hinders the integration of crucial FL advancements. To bridge this gap, we introduced a novel design, FedIGW, capable of accommodating a wide range of FL protocols, provided they address a standard FL problem. A comprehensive theoretical performance guarantee was provided for FedIGW, highlighting its efficiency and versatility. Notably, we demonstrated the modularized incorporation of convergence analysis from FL by employing examples of the renowned FedAvg (McMahan et al., 2017) and SCAFFOLD (Karimireddy et al., 2020). Empirical validations on real-world datasets further underscored its practicality and flexibility. Moreover, we explored how advancements in FL can seamlessly bestow additional desirable attributes upon FedIGW. Specifically, we delved into the incorporation of personalization, robustness, and privacy, presenting intriguing opportunities for future research.

It would be valuable to pursue further exploration of alternative CB algorithms within FCB, e.g., Xu & Zeevi (2020); Foster et al. (2020); Wei & Luo (2021), and investigate whether the FedIGW design can be extended to more general federated RL (Dubey & Pentland, 2021; Min et al., 2023).

References

  • Abbasi-Yadkori et al. (2011) Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. Advances in neural information processing systems, 24, 2011.
  • Abe & Long (1999) Naoki Abe and Philip M Long. Associative reinforcement learning using linear probabilistic concepts. In ICML, pp.  3–11. Citeseer, 1999.
  • Agarwal et al. (2012) Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, and Robert Schapire. Contextual bandit learning with predictable rewards. In Artificial Intelligence and Statistics, pp.  19–26. PMLR, 2012.
  • Agarwal et al. (2020) Alekh Agarwal, John Langford, and Chen-Yu Wei. Federated residual learning. arXiv preprint arXiv:2003.12880, 2020.
  • Agarwal et al. (2023) Alekh Agarwal, H Brendan McMahan, and Zheng Xu. An empirical evaluation of federated contextual bandit algorithms. arXiv preprint arXiv:2303.10218, 2023.
  • Amani et al. (2022) Sanae Amani, Tor Lattimore, András György, and Lin F Yang. Distributed contextual linear bandits with minimax optimal communication cost. arXiv preprint arXiv:2205.13170, 2022.
  • Auer et al. (2002) Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. SIAM journal on computing, 32(1):48–77, 2002.
  • Balakrishnan et al. (2022) Ravikumar Balakrishnan, Tian Li, Tianyi Zhou, Nageen Himayat, Virginia Smith, and Jeff Bilmes. Diverse client selection for federated learning via submodular maximization. In International Conference on Learning Representations, 2022.
  • Bartlett et al. (2005) Peter Bartlett, Olivier Bousquet, and Shahar Mendelson. Local rademacher complexities. Annals of Statistics, 33(4):1497–1537, 2005.
  • Boursier & Perchet (2019) Etienne Boursier and Vianney Perchet. Sic-mmab: synchronisation involves communication in multiplayer multi-armed bandits. In Advances in Neural Information Processing Systems, pp.  12071–12080, 2019.
  • Chakrabarti et al. (2008) Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, and Eli Upfal. Mortal multi-armed bandits. Advances in neural information processing systems, 21, 2008.
  • Chan et al. (2021) Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S Song, Peter Bartlett, and Michael I Jordan. Parallelizing contextual bandits. arXiv preprint arXiv:2105.10590, 2021.
  • Chen et al. (2021) Mingzhe Chen, Deniz Gündüz, Kaibin Huang, Walid Saad, Mehdi Bennis, Aneta Vulgarakis Feljan, and H Vincent Poor. Distributed learning in wireless networks: Recent progress and future challenges. IEEE Journal on Selected Areas in Communications, 39(12):3579–3605, 2021.
  • Chen et al. (2022) Zhirui Chen, PN Karthik, Vincent YF Tan, and Yeow Meng Chee. Federated best arm identification with heterogeneous clients. arXiv preprint arXiv:2210.07780, 2022.
  • Chou et al. (2020) Chi-Ning Chou, Juspreet Singh Sandhu, Mien Brabeeba Wang, and Tiancheng Yu. A general framework for analyzing stochastic dynamics in learning algorithms. arXiv preprint arXiv:2006.06171, 2020.
  • Cisneros-Velarde et al. (2023) Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo, and Mladen Kolar. One policy is enough: Parallel exploration with a single policy is near-optimal for reward-free reinforcement learning. In International Conference on Artificial Intelligence and Statistics, pp.  1965–2001. PMLR, 2023.
  • Cortes (2018) David Cortes. Adapting multi-armed bandits policies to contextual bandits scenarios. arXiv preprint arXiv:1811.04383, 2018.
  • Dai et al. (2023) Zhongxiang Dai, Yao Shu, Arun Verma, Flint Xiaofeng Fan, Bryan Kian Hsiang Low, and Patrick Jaillet. Federated neural bandit. The Eleventh International Conference on Learning Representations, 2023.
  • Demirel et al. (2022) Ilker Demirel, Yigit Yildirim, and Cem Tekin. Federated multi-armed bandits under byzantine attacks. arXiv preprint arXiv:2205.04134, 2022.
  • Du et al. (2021) Wei Du, Depeng Xu, Xintao Wu, and Hanghang Tong. Fairness-aware agnostic federated learning. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), pp.  181–189. SIAM, 2021.
  • Dubey & Pentland (2020) Abhimanyu Dubey and Alex Pentland. Differentially-private federated linear bandits. Advances in Neural Information Processing Systems, 33:6003–6014, 2020.
  • Dubey & Pentland (2021) Abhimanyu Dubey and Alex Pentland. Provably efficient cooperative multi-agent reinforcement learning with function approximation. arXiv preprint arXiv:2103.04972, 2021.
  • Fan et al. (2023) Li Fan, Ruida Zhou, Chao Tian, and Cong Shen. Federated linear bandits with finite adversarial actions. arXiv preprint arXiv:2311.00973, 2023.
  • Fan et al. (2021) Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan, and Bryan Kian Hsiang Low. Fault-tolerant federated reinforcement learning with theoretical guarantee. Advances in Neural Information Processing Systems, 34:1007–1021, 2021.
  • Foster & Rakhlin (2020) Dylan Foster and Alexander Rakhlin. Beyond ucb: Optimal and efficient contextual bandits with regression oracles. In International Conference on Machine Learning, pp.  3199–3210. PMLR, 2020.
  • Foster et al. (2020) Dylan J Foster, Alexander Rakhlin, David Simchi-Levi, and Yunzong Xu. Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective. arXiv preprint arXiv:2010.03104, 2020.
  • Fraboni et al. (2021) Yann Fraboni, Richard Vidal, Laetitia Kameni, and Marco Lorenzi. Clustered sampling: Low-variance and improved representativity for clients selection in federated learning. In International Conference on Machine Learning, pp.  3407–3416. PMLR, 2021.
  • Fu et al. (2019) Shuhao Fu, Chulin Xie, Bo Li, and Qifeng Chen. Attack-resistant federated learning with residual-based reweighting. arXiv preprint arXiv:1912.11464, 2019.
  • Ghosh et al. (2021) Avishek Ghosh, Abishek Sankararaman, and Kannan Ramchandran. Model selection for generic contextual bandits. arXiv preprint arXiv:2107.03455, 2021.
  • Girgis et al. (2021) Antonious Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, and Ananda Theertha Suresh. Shuffled model of differential privacy in federated learning. In International Conference on Artificial Intelligence and Statistics, pp.  2521–2529. PMLR, 2021.
  • Han et al. (2020) Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W Glynn, and Yinyu Ye. Sequential batch learning in finite-action linear contextual bandits. arXiv preprint arXiv:2004.06321, 2020.
  • Hanzely et al. (2021) Filip Hanzely, Boxin Zhao, and Mladen Kolar. Personalized federated learning: A unified framework and universal optimization techniques. arXiv preprint arXiv:2102.09743, 2021.
  • He et al. (2022) Jiafan He, Tianhao Wang, Yifei Min, and Quanquan Gu. A simple and provably efficient algorithm for asynchronous federated contextual linear bandits. Advances in Neural Information Processing Systems, 2022.
  • Hillel et al. (2013) Eshcar Hillel, Zohar S Karnin, Tomer Koren, Ronny Lempel, and Oren Somekh. Distributed exploration in multi-armed bandits. Advances in Neural Information Processing Systems, 26, 2013.
  • Hsu et al. (2012) Daniel Hsu, Sham M Kakade, and Tong Zhang. Random design analysis of ridge regression. In Conference on learning theory, pp.  9–1. JMLR Workshop and Conference Proceedings, 2012.
  • Huang et al. (2021a) Baihe Huang, Xiaoxiao Li, Zhao Song, and Xin Yang. Fl-ntk: A neural tangent kernel-based framework for federated learning analysis. In International Conference on Machine Learning, pp.  4423–4434. PMLR, 2021a.
  • Huang et al. (2021b) Ruiquan Huang, Weiqiang Wu, Jing Yang, and Cong Shen. Federated linear contextual bandits. Advances in Neural Information Processing Systems, 34:27057–27068, 2021b.
  • Huang et al. (2023) Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hejazinia, and Jing Yang. Federated linear contextual bandits with user-level differential privacy. In International Conference on Machine Learning, pp.  14060–14095. PMLR, 2023.
  • Jadbabaie et al. (2022) Ali Jadbabaie, Haochuan Li, Jian Qian, and Yi Tian. Byzantine-robust federated linear bandits. In 2022 IEEE 61st Conference on Decision and Control (CDC), pp.  5206–5213. IEEE, 2022.
  • Jin et al. (2022) Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, and Zhihua Zhang. Federated reinforcement learning with environment heterogeneity. In International Conference on Artificial Intelligence and Statistics, pp.  18–37. PMLR, 2022.
  • Kairouz et al. (2021) Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2):1–210, 2021.
  • Karbasi et al. (2021) Amin Karbasi, Vahab Mirrokni, and Mohammad Shadravan. Parallelizing thompson sampling. Advances in Neural Information Processing Systems, 34:10535–10548, 2021.
  • Karimireddy et al. (2020) Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp.  5132–5143. PMLR, 2020.
  • Katakis et al. (2008) Ioannis Katakis, Grigorios Tsoumakas, and Ioannis Vlahavas. Multilabel text classification for automated tag suggestion. ECML PKDD discovery challenge, 75:2008, 2008.
  • Konečnỳ et al. (2016) Jakub Konečnỳ, H Brendan McMahan, Daniel Ramage, and Peter Richtárik. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527, 2016.
  • Krishnamurthy et al. (2021) Sanath Kumar Krishnamurthy, Vitor Hadad, and Susan Athey. Adapting to misspecification in contextual bandits with offline regression oracles. In International Conference on Machine Learning, pp.  5805–5814. PMLR, 2021.
  • Landgren et al. (2016) Peter Landgren, Vaibhav Srivastava, and Naomi Ehrich Leonard. On distributed cooperative decision-making in multiarmed bandits. In 2016 European Control Conference (ECC), pp.  243–248. IEEE, 2016.
  • Lattimore & Szepesvári (2020) Tor Lattimore and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020.
  • Li & Wang (2022a) Chuanhao Li and Hongning Wang. Asynchronous upper confidence bound algorithms for federated linear bandits. In International Conference on Artificial Intelligence and Statistics, pp.  6529–6553. PMLR, 2022a.
  • Li & Wang (2022b) Chuanhao Li and Hongning Wang. Communication efficient federated learning for generalized linear bandits. Advances in Neural Information Processing Systems, 2022b.
  • Li et al. (2022) Chuanhao Li, Huazheng Wang, Mengdi Wang, and Hongning Wang. Communication efficient distributed learning for kernelized contextual bandits. Advances in Neural Information Processing Systems, 2022.
  • Li et al. (2023) Chuanhao Li, Huazheng Wang, Mengdi Wang, and Hongning Wang. Learning kernelized contextual bandits in a distributed and asynchronous environment. The Eleventh International Conference on Learning Representations, 2023.
  • Li & Song (2022) Tan Li and Linqi Song. Privacy-preserving communication-efficient federated multi-armed bandits. IEEE Journal on Selected Areas in Communications, 40(3):773–787, 2022.
  • Li et al. (2020a) Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE signal processing magazine, 37(3):50–60, 2020a.
  • Li et al. (2021) Tian Li, Shengyuan Hu, Ahmad Beirami, and Virginia Smith. Ditto: Fair and robust federated learning through personalization. In International Conference on Machine Learning, pp.  6357–6368. PMLR, 2021.
  • Li et al. (2020b) Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of fedavg on non-iid data. In International Conference on Learning Representations, 2020b.
  • Lin & Moothedath (2023) Jiabin Lin and Shana Moothedath. Federated stochastic bandit learning with unobserved context. arXiv preprint arXiv:2303.17043, 2023.
  • Liu & Zhao (2010) Keqin Liu and Qing Zhao. Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing, 58(11):5667–5681, 2010.
  • Liu et al. (2022) Ziyao Liu, Jiale Guo, Wenzhuo Yang, Jiani Fan, Kwok-Yan Lam, and Jun Zhao. Privacy-preserving aggregation in federated learning: A survey. IEEE Transactions on Big Data, 2022.
  • Lubana et al. (2022) Ekdeep Lubana, Chi Ian Tang, Fahim Kawsar, Robert Dick, and Akhil Mathur. Orchestra: Unsupervised federated learning via globally consistent clustering. In International Conference on Machine Learning, pp.  14461–14484. PMLR, 2022.
  • Marfoq et al. (2023) Othmane Marfoq, Giovanni Neglia, Laetitia Kameni, and Richard Vidal. Federated learning for data streams. arXiv preprint arXiv:2301.01542, 2023.
  • Martínez-Rubio et al. (2019) David Martínez-Rubio, Varun Kanade, and Patrick Rebeschini. Decentralized cooperative stochastic bandits. Advances in Neural Information Processing Systems, 32, 2019.
  • McMahan et al. (2017) Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp.  1273–1282. PMLR, 2017.
  • Min et al. (2023) Yifei Min, Jiafan He, Tianhao Wang, and Quanquan Gu. Multi-agent reinforcement learning: Asynchronous communication and linear function approximation. arXiv preprint arXiv:2305.06446, 2023.
  • Mitra et al. (2022) Aritra Mitra, Arman Adibi, George J Pappas, and Hamed Hassani. Collaborative linear bandits with adversarial agents: Near-optimal regret bounds. Advances in Neural Information Processing Systems, 2022.
  • Mohri et al. (2018) Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT press, 2018.
  • Mohri et al. (2019) Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. Agnostic federated learning. In International Conference on Machine Learning, pp.  4615–4625. PMLR, 2019.
  • Nesterov (2003) Yurii Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2003.
  • Neu & Olkhovskaya (2020) Gergely Neu and Julia Olkhovskaya. Efficient and robust algorithms for adversarial linear contextual bandits. In Conference on Learning Theory, pp.  3049–3068. PMLR, 2020.
  • Pillutla et al. (2022) Krishna Pillutla, Sham M Kakade, and Zaid Harchaoui. Robust aggregation for federated learning. IEEE Transactions on Signal Processing, 70:1142–1154, 2022.
  • Réda et al. (2022) Clémence Réda, Sattar Vakili, and Emilie Kaufmann. Near-optimal collaborative learning in bandits. In NeurIPS 2022-36th Conference on Neural Information Processing System, 2022.
  • Reddi et al. (2020) Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečnỳ, Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295, 2020.
  • Salgia & Zhao (2022) Sudeep Salgia and Qing Zhao. Distributed linear bandits under communication constraints. arXiv preprint arXiv:2211.02212, 2022.
  • Sen et al. (2021) Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel N Hill, and Inderjit S Dhillon. Top-k extreme contextual bandits with arm hierarchy. In International Conference on Machine Learning, pp.  9422–9433. PMLR, 2021.
  • Shi & Shen (2021) Chengshuai Shi and Cong Shen. Federated multi-armed bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp.  9603–9611, 2021.
  • Shi et al. (2021) Chengshuai Shi, Cong Shen, and Jing Yang. Federated multi-armed bandits with personalization. In International Conference on Artificial Intelligence and Statistics, pp.  2917–2925. PMLR, 2021.
  • Simchi-Levi & Xu (2022) David Simchi-Levi and Yunzong Xu. Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Mathematics of Operations Research, 47(3):1904–1931, 2022.
  • Szorenyi et al. (2013) Balazs Szorenyi, Róbert Busa-Fekete, István Hegedus, Róbert Ormándi, Márk Jelasity, and Balázs Kégl. Gossip-based distributed stochastic bandit algorithms. In International conference on machine learning, pp.  19–27. PMLR, 2013.
  • Tsoumakas et al. (2008) Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD’08), volume 21, pp.  53–59, 2008.
  • van Berlo et al. (2020) Bram van Berlo, Aaqib Saeed, and Tanir Ozcelebi. Towards federated unsupervised representation learning. In Proceedings of the third ACM international workshop on edge systems, analytics and networking, pp.  31–36, 2020.
  • Wang et al. (2019) Yuanhao Wang, Jiachen Hu, Xiaoyu Chen, and Liwei Wang. Distributed bandit learning: Near-optimal regret with efficient communication. arXiv preprint arXiv:1904.06309, 2019.
  • Wei & Luo (2021) Chen-Yu Wei and Haipeng Luo. Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach. In Conference on Learning Theory, pp.  4300–4354. PMLR, 2021.
  • Wei et al. (2020) Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H Yang, Farhad Farokhi, Shi Jin, Tony QS Quek, and H Vincent Poor. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security, 15:3454–3469, 2020.
  • Wei et al. (2021) Kang Wei, Jun Li, Ming Ding, Chuan Ma, Hang Su, Bo Zhang, and H Vincent Poor. User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Transactions on Mobile Computing, 21(9):3388–3401, 2021.
  • Wei & Shen (2022) Xizixiang Wei and Cong Shen. Federated learning over noisy channels: Convergence analysis and design examples. IEEE Transactions on Cognitive Communications and Networking, 8(2):1253–1268, 2022.
  • Xin et al. (2020) Ran Xin, Usman A Khan, and Soummya Kar. Variance-reduced decentralized stochastic optimization with accelerated convergence. IEEE Transactions on Signal Processing, 68:6255–6271, 2020.
  • Xu & Zeevi (2020) Yunbei Xu and Assaf Zeevi. Upper counterfactual confidence bounds: a new optimism principle for contextual bandits. arXiv preprint arXiv:2007.07876, 2020.
  • Ye et al. (2020) Haishan Ye, Wei Xiong, and Tong Zhang. Pmgt-vr: A decentralized proximal-gradient algorithmic framework with variance reduction. arXiv preprint arXiv:2012.15010, 2020.
  • Yi & Vojnović (2023) Jialin Yi and Milan Vojnović. Doubly adversarial federated bandits. arXiv preprint arXiv:2301.09223, 2023.
  • Yin et al. (2018) Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning, pp.  5650–5659. PMLR, 2018.
  • Yin et al. (2021) Xuefei Yin, Yanming Zhu, and Jiankun Hu. A comprehensive survey of privacy-preserving federated learning: A taxonomy, review, and future directions. ACM Computing Surveys (CSUR), 54(6):1–36, 2021.
  • Zhang et al. (2020) Fengda Zhang, Kun Kuang, Zhaoyang You, Tao Shen, Jun Xiao, Yin Zhang, Chao Wu, Yueting Zhuang, and Xiaolin Li. Federated unsupervised representation learning. arXiv preprint arXiv:2010.08982, 2020.
  • Zhang (2023) Tong Zhang. Mathematical Analysis of Machine Learning Algorithms. Cambridge University Press, 2023.
  • Zheng et al. (2020) Sihui Zheng, Cong Shen, and Xiang Chen. Design and analysis of uplink and downlink communications for federated learning. IEEE Journal on Selected Areas in Communications, 39(7):2150–2167, 2020.
  • Zhou et al. (2020) Dongruo Zhou, Lihong Li, and Quanquan Gu. Neural contextual bandits with ucb-based exploration. In International Conference on Machine Learning, pp.  11492–11502. PMLR, 2020.
  • Zhou & Chowdhury (2023) Xingyu Zhou and Sayak Ray Chowdhury. On differentially private federated linear contextual bandits. arXiv preprint arXiv:2302.13945, 2023.
  • Zhu et al. (2023) Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, and Michael I Jordan. Byzantine-robust federated learning with optimal statistical rates. In International Conference on Artificial Intelligence and Statistics, pp.  3151–3178. PMLR, 2023.
  • Zhu et al. (2022) Yinglun Zhu, Dylan J Foster, John Langford, and Paul Mineiro. Contextual bandits with large action spaces: Made practical. In International Conference on Machine Learning, pp.  27428–27453. PMLR, 2022.
  • Zhu et al. (2021) Zhaowei Zhu, Jingxuan Zhu, Ji Liu, and Yang Liu. Federated bandit: A gossiping approach. In Abstract Proceedings of the 2021 ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, pp.  3–4, 2021.
  • Zhuang et al. (2022) Weiming Zhuang, Yonggang Wen, and Shuai Zhang. Divergence-aware federated self-supervised learning. arXiv preprint arXiv:2204.04385, 2022.
  • Zierahn et al. (2023) Lukas Zierahn, Dirk van der Hoeven, Nicolo Cesa-Bianchi, and Gergely Neu. Nonstochastic contextual combinatorial bandits. In International Conference on Artificial Intelligence and Statistics, pp.  8771–8813. PMLR, 2023.

Appendix A Additional Discussions

A.1 Societal Impacts

This work focuses on providing a new design for federated contextual bandits (FCB), which establishes a close relationship between FCB and FL. We do not foresee major negative societal impacts as FCB is a well-established research domain and this work largely investigates its theoretical aspects. Moreover, as discussed in Section 6.2, FedIGW can conveniently incorporate appendages from FL studies to obtain appealing properties of privacy, robustness, fairness, and beyond, which we believe can contribute to a positive societal impact.

A.2 Examples of FL Components in FCB Studies

An example of the FL components adopted in previous FCB studies is provided in the following, together with the renowned FedAvg protocol for comparison. Specifically, as in Remark 4.5, we consider the study of federated linear bandits with a known $d$-dimensional feature mapping $\phi(\cdot,\cdot)$. Then, Alg. 2 illustrates the FL component commonly adopted in Wang et al. (2019); Li & Wang (2022a); Dubey & Pentland (2020); He et al. (2022): the agents share compressed local data (e.g., covariance matrices) with the server for aggregation, which happens in a one-shot fashion. A simplified version of FedAvg (McMahan et al., 2017) is presented in Alg. 3, with client $m$’s local loss function denoted as $\hat{\mathcal{L}}_m(\cdot;\cdot)$ following Sec. 3. It can be observed that FedAvg takes an optimization perspective and performs multiple rounds of gradient descent in a distributed manner.

Algorithm 2 The FL component commonly adopted in existing studies on federated linear bandits: one-shot aggregation of compressed local data
1: $M$ clients with client $m$’s interaction dataset denoted as $\mathcal{S}_m = \{(x_{m,\tau_m}, a_{m,\tau_m}, r_{m,\tau_m}) : \tau_m \in [n_m]\}$
2: Client $m$: with $\phi(x_{m,\tau_m}, a_{m,\tau_m})$ denoted as $\phi_{m,\tau_m}$, compute $V_m \leftarrow \sum_{\tau_m \in [n_m]} \phi_{m,\tau_m} \phi_{m,\tau_m}^{\top}$ and $b_m \leftarrow \sum_{\tau_m \in [n_m]} r_{m,\tau_m} \phi_{m,\tau_m}$
3: Client $m$: send $V_m$ and $b_m$ to the server
4: Server: receive $V_m$ and $b_m$ from each client $m$
5: Server: compute $V \leftarrow \sum_{m \in [M]} V_m$ and $b \leftarrow \sum_{m \in [M]} b_m$
6: Server: send $V$ and $b$ to all clients
7: Client $m$: receive $V$ and $b$ from the server
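The one-shot aggregation in Alg. 2 can be illustrated with a minimal Python sketch under stated assumptions: the features $\phi_{m,\tau_m}$ are precomputed as rows of a NumPy array, and the function names (`local_statistics`, `server_aggregate`, `ridge_estimate`) and the regularizer `lam` are hypothetical names introduced only for illustration. The final ridge-regression step is one common use of the aggregated $V$ and $b$ and is not part of Alg. 2 itself.

```python
import numpy as np


def local_statistics(features, rewards):
    """Client side: compress an interaction dataset into (V_m, b_m).

    features: array of shape (n_m, d); row tau is phi(x_{m,tau}, a_{m,tau})
    rewards:  array of shape (n_m,)
    """
    V_m = features.T @ features   # sum over tau of phi phi^T
    b_m = features.T @ rewards    # sum over tau of r * phi
    return V_m, b_m


def server_aggregate(stats):
    """Server side: one-shot summation of the clients' statistics."""
    V = sum(V_m for V_m, _ in stats)
    b = sum(b_m for _, b_m in stats)
    return V, b


def ridge_estimate(V, b, lam=1.0):
    """Assumed follow-up step: ridge regression from the aggregates."""
    d = V.shape[0]
    return np.linalg.solve(V + lam * np.eye(d), b)


# Synthetic data for three clients (illustrative only).
rng = np.random.default_rng(0)
d = 3
theta_true = np.array([0.5, -0.2, 0.1])
stats, all_features, all_rewards = [], [], []
for n_m in (40, 60, 80):
    phi = rng.normal(size=(n_m, d))
    r = phi @ theta_true + 0.01 * rng.normal(size=n_m)
    stats.append(local_statistics(phi, r))
    all_features.append(phi)
    all_rewards.append(r)

V, b = server_aggregate(stats)
theta_hat = ridge_estimate(V, b)
```

Note that each client transmits $d^2 + d$ real numbers in a single exchange, and the aggregates coincide with those computed on the pooled data.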
Algorithm 3 The (simplified) FedAvg algorithm as an example of the canonical FL framework: multiple-round aggregation of local model parameters
1: $M$ clients with client $m$’s interaction dataset denoted as $\mathcal{S}_m = \{(x_{m,\tau_m}, a_{m,\tau_m}, r_{m,\tau_m}) : \tau_m \in [n_m]\}$, learning rate $\eta$
2: for $i = 1, 2, \cdots$ do
3:     Client $m$: update $\hat{\omega}_m^{\prime} \leftarrow \hat{\omega}_m - \nabla \hat{\mathcal{L}}_m(\hat{\omega}_m; \mathcal{S}_m)$
4:     Client $m$: send $\Delta_m \leftarrow \hat{\omega}_m^{\prime} - \hat{\omega}_m$ to the server
5:     Server: receive $\Delta_m$ from each client $m$
6:     Server: with $\sum_{m \in [M]} n_m$ denoted as $n$, send $\hat{\omega} \leftarrow \hat{\omega} + \eta \sum_{m \in [M]} \frac{n_m}{n} \Delta_m$ to all clients
7:     Client $m$: receive $\hat{\omega}$ from the server and set $\hat{\omega}_m \leftarrow \hat{\omega}$
8: end for
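For comparison with the one-shot aggregation, the multi-round structure of Alg. 3 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the implementation used in the experiments: the local loss is taken to be the average squared error of a linear model (the loss from Sec. 3 is not reproduced here), each client takes a single unit-step gradient update per round, and the server moves the global model along the weighted average of the client updates $\Delta_m$, which amounts to a descent step on the pooled loss. The function names and the fixed number of rounds are hypothetical.

```python
import numpy as np


def local_gradient(omega, features, rewards):
    """Gradient of the assumed local loss (1 / (2 n_m)) * sum (phi^T omega - r)^2."""
    residual = features @ omega - rewards
    return features.T @ residual / len(rewards)


def fedavg(datasets, d, eta=0.5, rounds=200):
    """Simplified FedAvg: one local gradient step per round, weighted averaging."""
    n = sum(len(rewards) for _, rewards in datasets)
    omega = np.zeros(d)                       # global model kept by the server
    for _ in range(rounds):
        aggregate = np.zeros(d)
        for features, rewards in datasets:
            omega_m = omega.copy()            # client starts from the global model
            omega_m_new = omega_m - local_gradient(omega_m, features, rewards)
            delta_m = omega_m_new - omega_m   # update sent to the server
            aggregate += len(rewards) / n * delta_m
        omega = omega + eta * aggregate       # descent step on the pooled loss
    return omega


# Synthetic data for three clients (illustrative only).
rng = np.random.default_rng(1)
d = 3
theta_true = np.array([0.5, -0.2, 0.1])
datasets = []
for n_m in (40, 60, 80):
    phi = rng.normal(size=(n_m, d))
    datasets.append((phi, phi @ theta_true + 0.01 * rng.normal(size=n_m)))

omega_fed = fedavg(datasets, d)

# Reference solution: least squares on the pooled data.
pooled_phi = np.vstack([phi for phi, _ in datasets])
pooled_r = np.concatenate([r for _, r in datasets])
omega_ls = np.linalg.lstsq(pooled_phi, pooled_r, rcond=None)[0]
```

In this sketch only $d$ real numbers per client are exchanged in each round, and no raw or compressed data leaves a client; with the weights $n_m / n$, the iterates approach the pooled least-squares solution.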

A.3 Limitations and Future Works

While this work proposes a novel, broadly applicable FCB design, i.e., FedIGW, there are still many interesting directions that are worth further exploring.

• Paralleling CB and FL. As mentioned in Section 2.2, current FL studies largely focus on learning from batched and static datasets. To accommodate such protocols, FCB designs typically follow a periodically alternating scheme as shown in Fig. 1, which is thus the focus of this work. While such alternating designs are capable of achieving statistical and communication efficiency, there is still room for improvement: (1) the CB interactions need to wait for the completion of a full FL process, which may be slow when computation resources are limited and communication delays are large; (2) it is desirable to use the CB data in a more timely fashion instead of accumulating it until the end of an epoch.

As one variant of the periodically alternating scheme, we can have FedIGW interleave CB and FL as shown in Fig. 4(a). This approach provides some buffer to perform FL without agents waiting for its completion. Specifically, in epoch $l$, on the one hand, the agents perform FL with datasets from epoch $l-1$; on the other hand, they perform CB interactions following IGW with an estimated function $\widehat{f}^{l-2}$ learned during epoch $l-1$ via datasets from epoch $l-2$. In other words, there will be a one-epoch delay compared with the basic form of FedIGW, and this delay is used for the FL process.
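The epoch bookkeeping of the interleaved variant can be made explicit with a small Python sketch. The function names are hypothetical, the alternating schedule is inferred from the one-epoch delay stated above, and returning `None` for early epochs (where no earlier data or estimate exists) is an assumption; how those epochs are handled is not specified here.

```python
def alternating_schedule(epoch):
    """Basic form (assumed): CB in epoch l uses the estimate from epoch l-1 data."""
    return epoch - 1 if epoch >= 2 else None


def interleaved_schedule(epoch):
    """Interleaved form: returns (fl_data_epoch, cb_estimate_epoch) for epoch l.

    fl_data_epoch:     epoch whose data FL trains on during epoch l (l-1)
    cb_estimate_epoch: epoch whose data produced the estimate used by IGW (l-2)
    """
    fl_data_epoch = epoch - 1 if epoch >= 2 else None
    cb_estimate_epoch = epoch - 2 if epoch >= 3 else None
    return fl_data_epoch, cb_estimate_epoch


# Print the schedule for the first few epochs.
for l in range(1, 6):
    fl_data, cb_est = interleaved_schedule(l)
    print(f"epoch {l}: FL on data from {fl_data}, IGW with estimate from {cb_est}")
```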

Furthermore, a better approach is to have FL and CB fully paralleled as shown in Fig. 4(b). Then, neither of them needs to wait for the other, while CB data can be processed in a more timely manner. As mentioned in Section 6.2, we believe that the framework of FL with data streams proposed in the recent work of Marfoq et al. (2023) could be a suitable tool, as the sequential CB interactions essentially provide data streams. We believe this direction is not only worth further exploring in FCB but, perhaps more importantly, calls for more investigation in FL with data streams, where FCB can also serve as an important motivating application.

Figure 4: Different Styles of Connecting FL and CB in FCB. (a) Interleaving. (b) Fully Paralleling.

• Incorporating other FL advances. Given the flexible FL choice in FedIGW, although this work has provided detailed discussions on incorporating many aspects of FL advancements (including canonical algorithmic designs, convergence analysis, and useful appendages), there are still many directions worth further exploration. For example, as mentioned in Section 2.1, this work and most FCB investigations focus on collaborating through a central server, while the case of communicating via a connected graph, where certain consensus errors commonly appear (Xin et al., 2020; Ye et al., 2020), is less explored. It is worth noting that the design and analysis framework of FedIGW are both applicable in the latter setting. In particular, the consensus error can be modeled as one part of the optimization error in Lemma 4.3. This further validates the value of the proposed FedIGW design and the general analysis framework, while further specifications are left for future work.

Also, it would be valuable to leverage extra tools to save computation in the adopted FL protocol. Using local updates as in Chou et al. (2020) is one promising direction. These approaches are all feasible in FedIGW as long as the agents can obtain a learned reward function to perform IGW interactions. Their specific impacts can be captured via the established analysis framework through their own optimization errors.

• Leveraging other CB designs. With previous FCB studies largely focused on the CB component, this work is motivated to incorporate more advances from FL. Thus, we propose the FedIGW design which can leverage canonical protocols, convergence analyses, and flexible appendages from FL.

However, we also note that many CB algorithms remain under-explored in FCB, where UCB-based designs dominate. For example, the simple greedy algorithm is shown in Han et al. (2020) to be efficient when the context generation provides certain exploration capabilities. Moreover, various attempts have been made in Xu & Zeevi (2020); Foster & Rakhlin (2020); Foster et al. (2020); Zhu et al. (2022) to design generally applicable CB algorithms with tight performance guarantees, e.g., handling infinite arms. It would be interesting to investigate how to bring these designs to the federated setting and whether such connections provide new opportunities and insights.

• Complex environments. This work focuses on a stationary environment with stochastic rewards, which is well motivated by practical applications and commonly adopted in FCB studies. To further broaden the applicability of FCB, we believe that it is also important to study adversarial or non-stationary environments. Many advances have been made in standard single-agent bandits, e.g., Auer et al. (2002); Neu & Olkhovskaya (2020); Zierahn et al. (2023); Wei & Luo (2021). A recent work (Yi & Vojnović, 2023) investigates the federated adversarial environment in the tabular setting, and further investigations are desired to provide more concrete designs and analyses.

• Extension to RL. It would also be meaningful to extend the current study of FCB to federated reinforcement learning (RL) as a further step in understanding the combination of FL and sequential decision-making. Some results have been reported in Dubey & Pentland (2021); Min et al. (2023); Jin et al. (2022); Fan et al. (2021); Cisneros-Velarde et al. (2023). We hope this work can serve as a starting point for more principled and generally applicable studies in federated RL.

Appendix B Additional Related Works

The studies on federated multi-armed bandits (FMAB) and federated contextual bandits (FCB) can be viewed as a version of the general multi-agent bandits (Liu & Zhao, 2010; Boursier & Perchet, 2019) and parallelizing bandits (Chan et al., 2021; Karbasi et al., 2021) that is more suitable for modern applications. We provide a more detailed review in the following.

• Tabular. There have been many studies on cooperative designs in multi-armed bandits (i.e., the tabular setting), e.g., Hillel et al. (2013); Szorenyi et al. (2013); Landgren et al. (2016); Martínez-Rubio et al. (2019), focusing on different learning targets and different communication protocols (e.g., through a communication graph or with some randomly selected peers). Notably, in Wang et al. (2019), communication-efficient designs are proposed via periodically aggregating local estimates and performing arm elimination globally. We here also discuss another line of work on FMAB (Shi & Shen, 2021; Shi et al., 2021; Réda et al., 2022; Zhu et al., 2021; Chen et al., 2022). In their considered setting, the global rewards are (weighted) averages of local observations; however, the former are not directly observable. With maximizing global rewards as the learning target, the agents need to collaboratively perform explorations and aggregate local information. Despite the model differences, the design principle of FedIGW may still be beneficial for studying this setting. In particular, it is worth considering replacing the UCB-based explorations commonly adopted in Shi & Shen (2021); Shi et al. (2021); Réda et al. (2022); Zhu et al. (2021); Chen et al. (2022) with regression-based ones as in FedIGW to facilitate incorporation of FL studies.

\bullet Linear. The most commonly studied FCB setting is federated linear bandits, and there have been many investigations in this direction. In particular, different environments have been tackled in different works, e.g., the finite-armed fixed-context setting (Wang et al., 2019; Huang et al., 2021b), the finite-armed stochastic-context setting (Amani et al., 2022), the finite-armed adversarial-context setting (Fan et al., 2023), the infinite-armed fixed-context setting (Salgia & Zhao, 2022), and the infinite-armed adversarial-context setting (Wang et al., 2019; Dubey & Pentland, 2020; Li & Wang, 2022a; He et al., 2022). Furthermore, many other settings, e.g., unobserved context (Lin & Moothedath, 2023), and additional properties, e.g., privacy (Dubey & Pentland, 2020; Zhou & Chowdhury, 2023) and robustness (Jadbabaie et al., 2022), have been investigated. As summarized in the main paper, these works mainly select arm elimination (AE) (Lattimore & Szepesvári, 2020) or LinUCB (Abbasi-Yadkori et al., 2011) as their CB designs, which require both model estimates and confidence bounds. Thus, in their designed FL protocols, compressed local data (e.g., aggregated local rewards and covariance matrices) are often directly shared to solve a global ridge regression and to construct tighter confidence bounds. Compared with these studies, FedIGW can effectively solve the finite-armed stochastic-context setting without sharing any raw or compressed local data, communicating only processed model parameters (e.g., gradients). More detailed discussions and concrete results are provided in Appendix D.3.

A detailed comparison of the obtained regrets and the amounts of communicated real numbers is provided in Table 4. It can be observed that adapting FedIGW to the specific case of linear bandits does not provide the same near-optimal performance as in previous works. This is not a surprise as (single-agent) IGW itself has not yet been shown to achieve the lower-bound performance of linear bandits, while the previous works are largely built upon the nearly optimal LinUCB design (Abbasi-Yadkori et al., 2011). However, as noted in Remark 3.2, IGW only requires a learned reward function, instead of complicated data analytics such as UCB, which grants it great flexibility to better incorporate FL advancements and handle more general scenarios beyond the linear setting.

\bullet Generalized Linear and Kernelized. As extensions of the linear reward functions, Li & Wang (2022b) consider the generalized-linear class, and Li et al. (2022; 2023) study the kernelized one. The adopted basic techniques are similar to the aforementioned ones in federated linear bandits, while efforts are focused on fine-tuning communications (e.g., via Nyström approximation (Li et al., 2022; 2023)). It is worth noting that Li & Wang (2022b) invoke the distributed accelerated gradient descent algorithm to solve their distributed optimization problem with a generalized linear function class, which can be viewed as a preliminary attempt at involving FL or distributed optimization designs in FCB. However, the motivation there is the lack of a closed-form solution as in the linear case, and Li & Wang (2022b) additionally need to share the local covariance matrices to construct better confidence bounds. This work, instead, formally proposes FedIGW, which can rely solely on the canonical FL framework and accommodates flexible FL choices.

\bullet Neural. A recent work of Dai et al. (2023) extends the advances on single-agent neural bandits (Zhou et al., 2020) to the federated setting, where the neural tangent kernel (NTK) analyses are incorporated. With NTK to “linearize” the considered over-parameterized neural network, Dai et al. (2023) still largely follows the designs in the aforementioned federated linear bandits while some additional attempts have been made, e.g., an extra one-round averaging of model parameters besides aggregating NTK. This work, instead, takes a step further to fully leverage FL protocols, which often perform multiple (instead of one) rounds of model aggregations that are often necessary to guarantee convergence. Also, the optimization and generalization errors of a FedAvg variant with overparameterized neural networks are provided in Huang et al. (2021a), which is conceivably compatible with FedIGW for the corresponding analyses. Moreover, as shown by the additional experimental results in Sec. 5, FedIGW empirically outperforms FN-UCB (Dai et al., 2023) on different tasks and is more computationally efficient.
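To make the contrast between one-round averaging and multi-round model aggregation concrete, the following is a minimal sketch of a FedAvg-style loop on a toy linear least-squares problem. The function names (`local_gd`, `fedavg`, `make_agent_data`), the synthetic data, and all hyperparameters are illustrative assumptions and not the implementation used in this work or in Dai et al. (2023).

```python
import random

def local_gd(w, data, lr, steps):
    """Run a few full-batch gradient steps on one agent's mean squared loss."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, r in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - r
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def fedavg(datasets, dim, rounds, lr=0.1, local_steps=5):
    """Each round: local updates on every agent, then sample-size-weighted averaging."""
    n_total = sum(len(d) for d in datasets)
    w_global = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_gd(w_global, d, lr, local_steps) for d in datasets]
        w_global = [
            sum(len(d) * w[j] for d, w in zip(datasets, local_models)) / n_total
            for j in range(dim)
        ]
    return w_global

def make_agent_data(rng, theta, n, noise=0.05):
    """Synthetic (feature, reward) pairs from a hypothetical linear reward model."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in theta]
        r = sum(t * xi for t, xi in zip(theta, x)) + rng.gauss(0, noise)
        data.append((x, r))
    return data

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

rng = random.Random(0)
theta_star = [0.5, -0.3]  # assumed ground-truth parameter for the toy problem
datasets = [make_agent_data(rng, theta_star, n) for n in (40, 80, 120)]
w_one_round = fedavg(datasets, dim=2, rounds=1)
w_many_rounds = fedavg(datasets, dim=2, rounds=50)
```

In this toy setting, a single aggregation round leaves the model far from the underlying parameter, whereas repeated rounds bring it close, which is the behavior the multi-round protocols are meant to secure.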

Table 4: A comparison of settings and results of federated linear bandits; note that FedIGW is not specifically designed and optimized to handle linear reward functions as previous designs.
Reference | Arms | Context | Regret | # of Numbers Communicated
Wang et al. (2019) | Infinite | Fixed | $\tilde{O}(d\sqrt{MT})$ | $O((dM + d\log\log(d))\log(T))$
He et al. (2022) | Infinite | Adversarial | $\tilde{O}(d\sqrt{MT})$ | $O(d^{3}M^{2}\log(MT))$
Huang et al. (2021b) | Finite | Fixed | $\tilde{O}(\sqrt{dMT})$ | $O((d^{2} + dK)M\log(T))$
Amani et al. (2022)$^{\dagger}$ | Finite | Stochastic | $\tilde{O}(\sqrt{dMT})$ | $O(dM\log\log(MT))$
FedIGW$^{\ddagger}$ | Finite | Stochastic | $\tilde{O}(\sqrt{dKMT})$ | $O(d^{2}M\log(T))$
FedIGW$^{\flat}$ | Finite | Stochastic | $\tilde{O}(\sqrt{dKMT})$ | $O(d\log(d)\sqrt{M^{3}T})$

\dagger: assuming a homogeneous and known context distribution for all agents;

\ddagger: solving the global ridge regression via directly sharing aggregated local rewards and covariance matrices as in the other listed works;

\flat: solving the global ridge regression via distributed accelerated gradient descent;
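As a hedged illustration of the \ddagger option, the sketch below lets each agent compress its data into a covariance matrix and a reward-weighted feature vector, which a server sums to solve the regularized normal equations. The helper names (`local_statistics`, `solve`, `federated_ridge`), the regularization weight `lam`, and the synthetic data are assumptions made for illustration; the \flat option would exchange gradients instead and is not shown.

```python
import random

def local_statistics(data, dim):
    """Compressed local data: sum of x x^T and sum of r * x over one agent's samples."""
    cov = [[0.0] * dim for _ in range(dim)]
    vec = [0.0] * dim
    for x, r in data:
        for i in range(dim):
            vec[i] += r * x[i]
            for j in range(dim):
                cov[i][j] += x[i] * x[j]
    return cov, vec

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def federated_ridge(datasets, dim, lam=1.0):
    """Server sums the agents' statistics and solves (sum_m V_m + lam * I) theta = sum_m b_m."""
    cov = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    vec = [0.0] * dim
    for data in datasets:
        c_m, b_m = local_statistics(data, dim)  # d^2 + d real numbers per agent
        for i in range(dim):
            vec[i] += b_m[i]
            for j in range(dim):
                cov[i][j] += c_m[i][j]
    return solve(cov, vec)

rng = random.Random(1)
theta_star = [0.4, -0.2, 0.1]  # assumed ground-truth parameter for the toy problem
datasets = []
for n_m in (30, 50, 70):
    data = []
    for _ in range(n_m):
        x = [rng.uniform(-1, 1) for _ in theta_star]
        r = sum(t * xi for t, xi in zip(theta_star, x)) + rng.gauss(0, 0.05)
        data.append((x, r))
    datasets.append(data)

theta_fed = federated_ridge(datasets, dim=3)
# Reference: the same ridge problem solved on the pooled data of all agents.
pooled = [sample for data in datasets for sample in data]
theta_central = federated_ridge([pooled], dim=3)
```

Because the statistics are additive, the aggregated solution coincides with the pooled one up to floating-point error, at the price of each agent sending on the order of $d^{2}$ numbers per communication.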

Appendix C Proofs for Section 4.1

C.1 Notations

We first introduce notations that are repeatedly used. For the output function from the adopted FL protocol, we characterize its performance via the following definition of its excess risk, which is commonly adopted in the analysis of IGW-type CB algorithms (Simchi-Levi & Xu, 2022; Sen et al., 2021; Ghosh et al., 2021).

Definition C.1.

Let $p_{[M]} := \{p_m : m \in [M]\}$ be a set of $M$ arbitrary independent arm selection distributions. Given an overall dataset $\mathcal{S}_{[M]} := \{\mathcal{S}_m : m \in [M]\}$ where each dataset $\mathcal{S}_m$ consists of $n_m$ training samples of the form $(x_m, a_m; r_m(a_m))$ independently and identically drawn according to $(x_m, r_m) \sim \mathcal{D}_m$, $a_m \sim p_m(\cdot|x_m)$, the federated protocol $\texttt{FLroutine}(\mathcal{S}_{[M]}) = \{\texttt{FLroutine}_m(\mathcal{S}_m) : m \in [M]\}$ returns a predictor $\widehat{f}(\cdot)$, and its excess risk is defined as

$$\mathcal{E}(\mathcal{F}; n_{[M]}) := \mathbb{E}_{S_{[M]}, \xi}\left[\sum_{m \in [M]} \frac{n_m}{n} \cdot \mathbb{E}_{x_m \sim \mathcal{D}^{\mathcal{X}_m}_m,\, a_m \sim p_m(\cdot|x_m)}\left[\left(\widehat{f}(x_m, a_m) - f^{*}(x_m, a_m)\right)^{2}\right]\right],$$

where $n_{[M]} := \{n_m : m \in [M]\}$ and $\xi$ denotes the random source in the potentially stochastic FL algorithm. We often abbreviate $\mathcal{E}(\mathcal{F}; n_{[M]})$ as $\mathcal{E}(n_{[M]})$ to simplify notations.

This definition measures in expectation (w.r.t. the random data generation and the stochastic FL process) how far the output of the adopted FL protocol is from the true reward function on the weighted data distribution of all agents. Note that the excess risk bound $\mathcal{E}(n_{[M]})$ would typically rely on some other parameters in the adopted FL protocol (e.g., the step size and the number of iterations in gradient-based approaches), which are currently not specified for generality.
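For intuition, the inner quantity of the excess risk (the sample-size-weighted squared prediction error for one fixed predictor) can be estimated by Monte Carlo sampling, as in the sketch below. The outer expectation over the dataset and the FL randomness is not simulated here. The function `excess_risk_estimate`, the toy reward functions, the context samplers, and the choice of $n$ as the total number of samples across agents are assumptions made for illustration.

```python
import random

def excess_risk_estimate(f_hat, f_star, samplers, kernels, n_per_agent,
                         draws=20000, seed=0):
    """Estimate sum_m (n_m / n) * E[(f_hat - f_star)^2] for a fixed predictor f_hat.

    samplers[m](rng) draws a context for agent m; kernels[m](x) returns the
    arm-selection probabilities p_m(. | x). n is taken as sum(n_per_agent).
    """
    rng = random.Random(seed)
    n_total = sum(n_per_agent)
    total = 0.0
    for sample_x, kernel, n_m in zip(samplers, kernels, n_per_agent):
        acc = 0.0
        for _ in range(draws):
            x = sample_x(rng)
            probs = kernel(x)
            a = rng.choices(range(len(probs)), weights=probs)[0]
            acc += (f_hat(x, a) - f_star(x, a)) ** 2
        total += (n_m / n_total) * acc / draws
    return total

# Hypothetical two-agent, two-arm example with heterogeneous contexts and kernels.
offsets = [0.1, 0.3]  # per-arm prediction error of the toy predictor

def f_star(x, a):
    return x * (a + 1) / 2

def f_hat(x, a):
    return f_star(x, a) + offsets[a]

samplers = [lambda rng: rng.uniform(0, 1), lambda rng: rng.uniform(0, 2)]
kernels = [lambda x: [0.5, 0.5], lambda x: [0.9, 0.1]]
risk = excess_risk_estimate(f_hat, f_star, samplers, kernels, n_per_agent=[100, 300])
```

In this toy example the exact value is $0.25 \cdot 0.05 + 0.75 \cdot 0.018 = 0.026$, so the estimate should land close to that number; the weights $n_m / n$ make the agent with more samples dominate the average.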

Then, let $\Upsilon^{l}$ denote the sigma-algebra generated by the history up to epoch $l$, i.e., $\{(x_{m,t_m}, a_{m,t_m}, r_{m,t_m}) : m \in [M], t_m \in [t_m(\tau^{l})]\}$, and the randomness in the adopted FL protocol up to epoch $l$, i.e., $\{\xi_i : i \in [l]\}$, where $\xi_i$ denotes the random source in epoch $i$.
Then, we denote $l_m(t_m) := \min\{l \in \mathbb{N} : t_m \leq t_m(\tau^{l})\}$ as the epoch that agent $m$'s $t_m$ belongs to. Also, let $\Psi_m := \mathcal{A}_m^{\mathcal{X}_m}$ denote the set of deterministic functions from $\mathcal{X}_m$ to $\mathcal{A}_m$ for agent $m$ and $\Psi_{[M]} := \times_{m \in [M]} \Psi_m$ the Cartesian product of $\{\Psi_m : m \in [M]\}$.
Furthermore, for any action selection kernel $p_{[M]} = \{p_m : m \in [M]\}$, where $p_m(a_m|x_m)$ is the probability of selecting action $a_m \in \mathcal{A}$ given context $x_m$, and any policy $\pi_{[M]} = \{\pi_m : m \in [M]\} \in \Psi$, we define

$$\begin{aligned}
V_m(p_m, \pi_m) &:= \mathbb{E}_{x_m \sim \mathcal{D}^{\mathcal{X}_m}_m}\left[\frac{1}{p_m(\pi_m(x_m)|x_m)}\right],\\
\mathcal{R}_m(\pi_m) &:= \mathbb{E}_{x_m \sim \mathcal{D}_m^{\mathcal{X}_m}}\left[f^{*}(x_m, \pi_m(x_m))\right],\\
\widehat{\mathcal{R}}_m^{l}(\pi_m \mid \Upsilon^{l-1}) &:= \mathbb{E}_{x_m \sim \mathcal{D}_m^{\mathcal{X}_m}}\left[\widehat{f}^{l}(x_m, \pi_m(x_m)) \mid \Upsilon^{l-1}\right],\\
\textup{Reg}_m(\pi_m) &:= \mathcal{R}_m(\pi^{*}_m) - \mathcal{R}_m(\pi_m),\\
\widehat{\textup{Reg}}^{l}_m(\pi_m \mid \Upsilon^{l-1}) &:= \widehat{\mathcal{R}}^{l}_{m,t_m}(\widehat{\pi}^{l}_m \mid \Upsilon^{l-1}) - \widehat{\mathcal{R}}^{l}_{m,t_m}(\pi_m \mid \Upsilon^{l-1}),
\end{aligned}$$

where $\widehat{\pi}^{l}_m(x_m) := \operatorname*{arg\,max}_{a_m \in \mathcal{A}_m} \widehat{f}^{l}(x_m, a_m)$ for a given $\widehat{f}^{l}$ (determined by $\Upsilon^{l-1}$).
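The quantities above can be evaluated exactly when an agent has finitely many contexts. The sketch below does so for a hypothetical agent with three contexts and three arms, using the standard inverse gap weighting kernel from the single-agent literature as the action selection kernel; the exact kernel used by FedIGW is specified in the main paper, so this choice, the reward tables, and the value of `gamma` are assumptions made for illustration.

```python
import itertools

def igw_kernel(f_hat_row, gamma):
    """Inverse gap weighting over one context's predicted rewards (standard form)."""
    k = len(f_hat_row)
    best = max(range(k), key=lambda a: f_hat_row[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            probs[a] = 1.0 / (k + gamma * (f_hat_row[best] - f_hat_row[a]))
    probs[best] = 1.0 - sum(probs)  # remaining mass goes to the greedy arm
    return probs

# Hypothetical agent: 3 contexts with known probabilities, 3 arms.
ctx_prob = [0.5, 0.3, 0.2]
f_star = [[0.9, 0.5, 0.1], [0.2, 0.8, 0.4], [0.3, 0.3, 0.7]]  # true rewards
f_hat = [[0.8, 0.6, 0.2], [0.3, 0.7, 0.5], [0.4, 0.2, 0.6]]   # learned predictor
gamma = 10.0
num_arms = 3

kernel = [igw_kernel(row, gamma) for row in f_hat]

def greedy(table):
    """Deterministic policy picking the arm with the largest table entry per context."""
    return tuple(max(range(num_arms), key=lambda a: row[a]) for row in table)

def value(table, policy):
    """E_x[ table(x, policy(x)) ]: gives R(pi) for f_star and R_hat(pi) for f_hat."""
    return sum(q * table[x][policy[x]] for x, q in enumerate(ctx_prob))

def v_term(policy):
    """V(p, pi) = E_x[ 1 / p(pi(x) | x) ]."""
    return sum(q / kernel[x][policy[x]] for x, q in enumerate(ctx_prob))

pi_star, pi_hat = greedy(f_star), greedy(f_hat)

def reg(policy):
    return value(f_star, pi_star) - value(f_star, policy)

def reg_hat(policy):
    return value(f_hat, pi_hat) - value(f_hat, policy)

# All deterministic policies from contexts to arms (the set Psi_m in this toy case).
policies = list(itertools.product(range(num_arms), repeat=len(ctx_prob)))
```

With this form of kernel, $V_m(p_m, \pi_m)$ never exceeds $K + \gamma \cdot \widehat{\textup{Reg}}_m(\pi_m)$ for any deterministic policy, which can be verified by enumerating `policies`; policies that look worse under the learned predictor are explored less, and $V_m$ grows accordingly.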

The following proofs are largely inspired by the single-agent contextual bandits work (Simchi-Levi & Xu, 2022), while major changes have been made to accommodate the more complex federated system considered in this work.

C.2 Proofs of Theorem 4.1

First, the following lemma characterizes the relation between the excess errors and the selected learning rates.

Lemma C.2.

For all $l > 1$, it holds that

$$\begin{aligned}
&\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m \in [M]} \frac{E^{l-1}_m}{\sum_{m' \in [M]} E^{l-1}_{m'}} \cdot \mathbb{E}_{x_m \sim \mathcal{D}^{\mathcal{X}_m}_m,\, a_m \sim p^{l-1}_m(\cdot|x_m)}\left[\left(\widehat{f}^{l}(x_m, a_m) - f^{*}(x_m, a_m)\right)^{2} \mid \Upsilon^{l-1}\right]\right]\\
&\leq \mathcal{E}(\mathcal{F}; E^{l-1}_{[M]}) = \frac{\sum_{m \in [M]} E^{l-1}_m K_m}{\sum_{m \in [M]} E^{l-1}_m (\gamma^{l})^{2}}.
\end{aligned}$$
Proof.

The inequality follows from Assumption C.1, while the equality is based on the choice of $\gamma^{l}$ in Theorem 4.1, i.e.,

$$\gamma^{l} = \sqrt{\frac{\sum_{m \in [M]} E^{l-1}_m K_m}{\sum_{m \in [M]} E^{l-1}_m\, \mathcal{E}(\mathcal{F}; E^{l-1}_{[M]})}},$$

which leads to the lemma. ∎
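The equality in Lemma C.2 is pure arithmetic once $\gamma^{l}$ is fixed, which the following sketch checks numerically. Here $E^{l-1}_m$ is treated as the number of samples agent $m$ contributes from epoch $l-1$ and $K_m$ as its number of arms, as read off from the notation; the concrete numbers and the helper name `gamma_l` are hypothetical.

```python
import math

def gamma_l(epoch_lengths, num_arms, excess_risk):
    """gamma^l = sqrt( sum_m E_m K_m / (sum_m E_m * excess_risk) )."""
    numerator = sum(e * k for e, k in zip(epoch_lengths, num_arms))
    denominator = sum(epoch_lengths) * excess_risk
    return math.sqrt(numerator / denominator)

# Hypothetical values for three agents in epoch l-1.
epoch_lengths = [64, 128, 32]   # E_m^{l-1}
num_arms = [3, 5, 4]            # K_m
excess_risk = 0.02              # assumed value of the excess risk bound

gamma = gamma_l(epoch_lengths, num_arms, excess_risk)

# Right-hand side of Lemma C.2: sum_m E_m K_m / (sum_m E_m * gamma^2).
recovered = sum(e * k for e, k in zip(epoch_lengths, num_arms)) / (
    sum(epoch_lengths) * gamma ** 2
)
```

The value `recovered` reproduces `excess_risk` up to floating-point error, and a smaller excess risk yields a larger $\gamma^{l}$.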

Then, the following lemma bounds the gap between the estimated rewards $\widehat{\mathcal{R}}_m^{l}$ and the true rewards $\mathcal{R}_m$.

Lemma C.3.

For any epoch $l > 1$, for any $\pi_m \in \Psi_m$, conditioned on $\Upsilon^{l-1}$, it holds that

$$\left|\widehat{\mathcal{R}}_m^{l}(\pi_m \mid \Upsilon^{l-1}) - \mathcal{R}_m(\pi_m)\right| \leq \sqrt{V_m(p^{l-1}_m, \pi_m \mid \Upsilon^{l-1})}\, \sqrt{\mathcal{E}_m^{l-1}(\Upsilon^{l-1})},$$

where $\mathcal{E}_m^{l-1}(\Upsilon^{l-1}) := \mathbb{E}_{x_m \sim \mathcal{D}^{\mathcal{X}_m}_m,\, a^{l-1}_m \sim p_m^{l-1}(\cdot|x_m)}\left[\left(\widehat{f}^{l}(x_m, a_m^{l-1}) - f^{*}(x_m, a_m^{l-1})\right)^{2} \mid \Upsilon^{l-1}\right]$.
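The bound in Lemma C.3 can be sanity-checked numerically on a small finite instance by enumerating all deterministic policies. In the sketch below, the context distribution, the reward tables, and the randomly drawn action selection kernel are hypothetical and serve only to illustrate the inequality; they are not tied to any particular epoch of the algorithm.

```python
import itertools
import math
import random

rng = random.Random(0)
num_contexts, num_arms = 3, 3
ctx_prob = [0.5, 0.3, 0.2]  # assumed context distribution of one agent

# Hypothetical true rewards and a perturbed predictor.
f_star = [[rng.random() for _ in range(num_arms)] for _ in range(num_contexts)]
f_hat = [[v + rng.uniform(-0.2, 0.2) for v in row] for row in f_star]

def random_kernel():
    """An arbitrary arm-selection distribution with strictly positive probabilities."""
    weights = [rng.uniform(0.1, 1.0) for _ in range(num_arms)]
    total = sum(weights)
    return [w / total for w in weights]

kernel = [random_kernel() for _ in range(num_contexts)]

def delta(x, a):
    """Prediction error f_hat(x, a) - f_star(x, a)."""
    return f_hat[x][a] - f_star[x][a]

# Squared prediction error under contexts x and actions a ~ p(. | x).
err_under_kernel = sum(
    ctx_prob[x] * sum(kernel[x][a] * delta(x, a) ** 2 for a in range(num_arms))
    for x in range(num_contexts)
)

def gap_and_bound(policy):
    """Return (|R_hat(pi) - R(pi)|, sqrt(V(p, pi)) * sqrt(err_under_kernel))."""
    lhs = abs(sum(ctx_prob[x] * delta(x, policy[x]) for x in range(num_contexts)))
    v = sum(ctx_prob[x] / kernel[x][policy[x]] for x in range(num_contexts))
    return lhs, math.sqrt(v) * math.sqrt(err_under_kernel)

results = [
    gap_and_bound(pi)
    for pi in itertools.product(range(num_arms), repeat=num_contexts)
]
```

Every pair in `results` satisfies the claimed inequality, since the left-hand side is controlled by the Cauchy-Schwarz inequality after reweighting by the probability that the kernel selects the policy's action.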

Proof.

For simplicity, we abbreviate 𝔼xm𝒟m𝒳m,aml1pml1(|xm)[]\mathbb{E}_{x_{m}\sim\mathcal{D}^{\mathcal{X}_{m}}_{m},a^{l-1}_{m}\sim p_{m}^{% l-1}(\cdot|x_{m})}[\cdot]blackboard_E start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∼ caligraphic_D start_POSTSUPERSCRIPT caligraphic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∼ italic_p start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT ( ⋅ | italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ ⋅ ] as 𝔼xm,aml1[]subscript𝔼subscript𝑥𝑚subscriptsuperscript𝑎𝑙1𝑚delimited-[]\mathbb{E}_{x_{m},a^{l-1}_{m}}[\cdot]blackboard_E start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ⋅ ], and for any policy πmΨmsubscript𝜋𝑚subscriptΨ𝑚\pi_{m}\in\Psi_{m}italic_π start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∈ roman_Ψ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, and any epoch l>1𝑙1l>1italic_l > 1, we define

$$\Delta^{l}_{m}(\pi_{m}(x_{m})):=\widehat{f}^{l}(x_{m},\pi_{m}(x_{m}))-f^{*}(x_{m},\pi_{m}(x_{m})),$$

which indicates that

$$\widehat{\mathcal{R}}^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})-\mathcal{R}_{m}(\pi_{m})=\mathbb{E}_{x_{m}}\left[\Delta^{l}_{m}(\pi_{m}(x_{m}))\mid\Upsilon^{l-1}\right],$$

and

$$\mathbb{E}_{x_{m},a_{m}^{l-1}}\left[\left(\Delta^{l}_{m}(a_{m}^{l-1})\right)^{2}\mid\Upsilon^{l-1}\right]\geq\mathbb{E}_{x_{m}}\left[p^{l-1}_{m}(\pi_{m}(x_{m})|x_{m})\left(\Delta^{l}_{m}(\pi_{m}(x_{m}))\right)^{2}\mid\Upsilon^{l-1}\right].$$

Furthermore, conditioned on $\Upsilon^{l-1}$, we can obtain that

$$\begin{aligned}
&V_{m}(p^{l-1}_{m},\pi_{m}\mid\Upsilon^{l-1})\cdot\mathbb{E}_{x_{m},a_{m}^{l-1}}\left[\left(\Delta^{l}_{m}(a_{m}^{l-1})\right)^{2}\mid\Upsilon^{l-1}\right]\\
&=\mathbb{E}_{x_{m}}\left[\frac{1}{p^{l-1}_{m}(\pi_{m}(x_{m})|x_{m})}\mid\Upsilon^{l-1}\right]\mathbb{E}_{x_{m},a_{m}^{l-1}}\left[\left(\Delta^{l}_{m}(a_{m}^{l-1})\right)^{2}\mid\Upsilon^{l-1}\right]\\
&\geq\left(\mathbb{E}_{x_{m}}\left[\sqrt{\frac{1}{p^{l-1}_{m}(\pi_{m}(x_{m})|x_{m})}\mathbb{E}_{a_{m}^{l-1}}\left[\left(\Delta^{l}_{m}(a_{m}^{l-1})\right)^{2}\right]}\mid\Upsilon^{l-1}\right]\right)^{2}\\
&\geq\left(\mathbb{E}_{x_{m}}\left[\sqrt{\frac{1}{p^{l-1}_{m}(\pi_{m}(x_{m})|x_{m})}p^{l-1}_{m}(\pi_{m}(x_{m})|x_{m})\left(\Delta^{l}_{m}(\pi_{m}(x_{m}))\right)^{2}}\mid\Upsilon^{l-1}\right]\right)^{2}\\
&=\left(\mathbb{E}_{x_{m}}\left[\left|\Delta^{l}_{m}(\pi_{m}(x_{m}))\right|\mid\Upsilon^{l-1}\right]\right)^{2}\\
&\geq\left|\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\mathcal{R}_{m}(\pi_{m})\right|^{2}.
\end{aligned}$$

As a result, it holds that

$$\left|\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\mathcal{R}_{m}(\pi_{m})\right|\leq\sqrt{V_{m}(p^{l-1}_{m},\pi_{m}\mid\Upsilon^{l-1})}\sqrt{\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})},$$

where in the last step we use the definition

$$\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})=\mathbb{E}_{x_{m},a_{m}^{l-1}}\left[\left(\widehat{f}^{l}(x_{m},a_{m}^{l-1})-f^{*}(x_{m},a_{m}^{l-1})\right)^{2}\mid\Upsilon^{l-1}\right].$$

This concludes the proof. ∎
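The chain above uses the Cauchy-Schwarz inequality (first $\geq$), the lower bound on the squared error displayed just before it (second $\geq$), and $\left(\mathbb{E}|Z|\right)^{2}\geq\left|\mathbb{E}Z\right|^{2}$ (last $\geq$). As a numerical sanity check only, the following sketch evaluates both sides of the resulting bound on a random finite instance. The finite context set, the uniform error values, the mixing of $p$ with a uniform distribution, and all names such as `lemma_c3_sides` are illustrative assumptions and not part of the paper.

```python
import numpy as np


def lemma_c3_sides(num_contexts=50, num_actions=4, seed=0):
    """Return (lhs, rhs) of the Lemma C.3 bound on a random finite instance.

    lhs = |E_x[Delta(x, pi(x))]|
    rhs = sqrt(V) * sqrt(E), with
          V = E_x[1 / p(pi(x) | x)] and E = E_{x, a ~ p}[Delta(x, a)^2].
    """
    rng = np.random.default_rng(seed)

    # Assumed finite context distribution (stands in for D_m).
    ctx_prob = rng.dirichlet(np.ones(num_contexts))

    # Assumed action-sampling distribution p(a | x); mixing with the uniform
    # distribution keeps every probability strictly positive.
    raw = rng.dirichlet(np.ones(num_actions), size=num_contexts)
    p = 0.9 * raw + 0.1 / num_actions

    # Assumed prediction errors Delta(x, a) = f_hat(x, a) - f_star(x, a).
    delta = rng.uniform(-1.0, 1.0, size=(num_contexts, num_actions))

    # An arbitrary deterministic policy pi(x).
    pi = rng.integers(num_actions, size=num_contexts)
    rows = np.arange(num_contexts)

    lhs = abs(np.sum(ctx_prob * delta[rows, pi]))
    v = np.sum(ctx_prob / p[rows, pi])
    err = np.sum(ctx_prob[:, None] * p * delta ** 2)
    return lhs, np.sqrt(v) * np.sqrt(err)


if __name__ == "__main__":
    for seed in range(5):
        lhs, rhs = lemma_c3_sides(seed=seed)
        print(f"seed={seed}: lhs={lhs:.4f}, rhs={rhs:.4f}, holds={lhs <= rhs}")
```

A passing check on random instances does not replace the proof; it only guards against transcription errors in the definitions of $V_{m}$ and $\mathcal{E}_{m}^{l-1}$.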

Furthermore, the following lemma characterizes the relation between the virtual loss $\widehat{\textup{Reg}}^{l}_{m}$ and the true loss $\textup{Reg}^{l}_{m}$.

Lemma C.4.

For any epoch $l\geq 1$ and any policies $\pi_{[M]}\in\Psi_{[M]}$, it holds that

$$\begin{aligned}
\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})&\leq 2\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right]+\eta^{l},\\
\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right]&\leq 2\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})+\eta^{l},
\end{aligned}$$

with

$$\eta^{l}:=\frac{9c^{2}}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m}.$$
Proof.

First, we note that for $l=1$, it holds that

$$\begin{aligned}
\sum_{m\in[M]}E^{1}_{m}\textup{Reg}_{m}(\pi_{m})&\leq\sum_{m\in[M]}E^{1}_{m}\leq\eta^{1}=9c^{2}\sum_{m\in[M]}E^{1}_{m}K_{m};\\
\sum_{m\in[M]}E^{1}_{m}\widehat{\textup{Reg}}^{l}_{m}(\pi_{m})&=0\leq\eta^{1}=9c^{2}\sum_{m\in[M]}E^{1}_{m}K_{m},
\end{aligned}$$

which means the lemma holds for the first epoch.

We then perform an inductive proof and start by assuming that for epoch $l-1$ and any policies $\pi_{m}\in\Psi_{m}$, it holds that

$$\begin{aligned}
\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})&\leq 2\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-2}}\left[\widehat{\textup{Reg}}_{m}^{l-1}(\pi_{m}\mid\Upsilon^{l-2})\right]+\eta^{l-1},\\
\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-2}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-2})\right]&\leq 2\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})+\eta^{l-1}.
\end{aligned}$$

Then, it can be observed that

$$\begin{aligned}
&\textup{Reg}_{m}(\pi_{m})-\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\\
&=\mathcal{R}_{m}(\pi^{*}_{m})-\mathcal{R}_{m}(\pi_{m})-\left(\widehat{\mathcal{R}}_{m}^{l}(\widehat{\pi}^{l}_{m}\mid\Upsilon^{l-1})-\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right)\\
&\leq\mathcal{R}_{m}(\pi^{*}_{m})-\mathcal{R}_{m}(\pi_{m})-\left(\widehat{\mathcal{R}}_{m}^{l}(\pi^{*}_{m}\mid\Upsilon^{l-1})-\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right)\\
&=\mathcal{R}_{m}(\pi^{*}_{m})-\widehat{\mathcal{R}}_{m}^{l}(\pi^{*}_{m}\mid\Upsilon^{l-1})+\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\mathcal{R}_{m}(\pi_{m})\\
&\overset{(a)}{\leq}\sqrt{V_{m}(p^{l-1}_{m},\pi_{m}^{*}\mid\Upsilon^{l-1})}\sqrt{\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})}+\sqrt{V_{m}(p^{l-1}_{m},\pi_{m}\mid\Upsilon^{l-1})}\sqrt{\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})}\\
&\leq\frac{V_{m}(p^{l-1}_{m},\pi_{m}^{*}\mid\Upsilon^{l-1})}{8c\gamma^{l}}+\frac{V_{m}(p^{l-1}_{m},\pi_{m}\mid\Upsilon^{l-1})}{8c\gamma^{l}}+4c\gamma^{l}\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\\
&\overset{(b)}{\leq}\frac{K_{m}+\gamma^{l-1}\widehat{\textup{Reg}}^{l-1}_{m}(\pi^{*}_{m}\mid\Upsilon^{l-1})}{8c\gamma^{l}}+\frac{K_{m}+\gamma^{l-1}\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-1})}{8c\gamma^{l}}+4c\gamma^{l}\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1}),
\end{aligned}$$

where inequality (a) is from Lemma C.3 and inequality (b) is from Lemma C.10.

Then, weighting each agent's bound by $E^{l}_{m}$, summing over all $M$ agents, and taking the expectation over $\Upsilon^{l-1}$, we obtain

\begin{align*}
&\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\left(\textup{Reg}_{m}(\pi_{m})-\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right)\right] \\
&\leq\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{\gamma^{l-1}}{8c\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi^{*}_{m}\mid\Upsilon^{l-1})\right] \\
&\quad+\frac{\gamma^{l-1}}{8c\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]+4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right] \\
&\overset{(d)}{\leq}\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{\overline{c}\gamma^{l-1}}{8c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi^{*}_{m}\mid\Upsilon^{l-1})\right] \\
&\quad+\frac{\overline{c}\gamma^{l-1}}{8c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]+4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right] \\
&\overset{(e)}{\leq}\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})+\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\cdot\eta^{l-1} \\
&\quad+4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right] \\
&\overset{(f)}{\leq}\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{1}{4}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})+\frac{9c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{4\gamma^{l}}+\frac{4c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{\gamma^{l}},
\end{align*}

where inequality (d) is from the definition $\overline{c}:=\max_{m\in[M],\,l\in[2,l(T)]}E^{l}_{m}/E^{l-1}_{m}$. Inequality (e) is from the induction assumption that

\begin{align*}
\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi^{*}_{m}\mid\Upsilon^{l-1})\right]
&=\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-2}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi^{*}_{m}\mid\Upsilon^{l-2})\right] \\
&\leq 2\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi^{*}_{m})+\eta^{l-1}=\eta^{l-1}, \\
\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]
&=\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-2}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-2})\right] \\
&\leq 2\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})+\eta^{l-1}.
\end{align*}
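As a reading aid, the way the two induction bounds combine in step (e) can be spelled out as follows. This is a sketch reconstructed from the displays above rather than text from the original proof; it only uses that both regret sums carry the same prefactor $\overline{c}\gamma^{l-1}/(8c\gamma^{l})$.

\begin{align*}
\frac{\overline{c}\gamma^{l-1}}{8c\gamma^{l}}\left[\eta^{l-1}+\left(2\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})+\eta^{l-1}\right)\right]
=\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})+\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\cdot\eta^{l-1}.
\end{align*}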

Inequality (f) is based on the definitions $\underline{c}:=\min_{m\in[M],\,l\in[2,l(T)]}E^{l}_{m}/E^{l-1}_{m}$, $c:=\overline{c}/\underline{c}$, and $\eta^{l}:=9c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}/\gamma^{l}$, together with the assumption that $\gamma^{l}\geq\gamma^{l-1}$ and Lemma C.2, which indicates that

\begin{align*}
\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l-1}_{m}\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right]\leq\frac{\sum_{m\in[M]}E^{l-1}_{m}K_{m}}{(\gamma^{l})^{2}}.
\end{align*}
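To make the constants in step (f) easier to trace, the three terms can be bounded one at a time. The following is a sketch under the stated definitions, not part of the original text: it uses $E^{l-1}_{m}\leq E^{l}_{m}/\underline{c}$, $E^{l}_{m}\leq\overline{c}E^{l-1}_{m}$, $\overline{c}/\underline{c}=c$, $\gamma^{l-1}\leq\gamma^{l}$, and additionally assumes that $\textup{Reg}_{m}(\pi_{m})$ and $\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})$ are nonnegative.

\begin{align*}
\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})
&\leq\frac{\overline{c}}{4c\,\underline{c}}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})
=\frac{1}{4}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m}), \\
\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\cdot\eta^{l-1}
&=\frac{9c\,\overline{c}\sum_{m\in[M]}E^{l-1}_{m}K_{m}}{4\gamma^{l}}
\leq\frac{9c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{4\gamma^{l}}, \\
4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right]
&\leq 4c\,\overline{c}\,\gamma^{l}\cdot\frac{\sum_{m\in[M]}E^{l-1}_{m}K_{m}}{(\gamma^{l})^{2}}
\leq\frac{4c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{\gamma^{l}}.
\end{align*}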

Thus, we can obtain that

\begin{align*}
\frac{3}{4}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})
&\leq\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right]+\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}} \\
&\quad+\frac{25c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{4\gamma^{l}} \\
\Rightarrow\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})
&\leq\frac{4}{3}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right]+\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{3c\gamma^{l}} \\
&\quad+\frac{25c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{3\gamma^{l}} \\
&\leq 2\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\right]+\eta^{l}.
\end{align*}
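The last inequality above can be checked directly. The following is a sketch, not taken from the original proof. After multiplying through by $4/3$, the coefficient of $\sum_{m\in[M]}E^{l}_{m}K_{m}/\gamma^{l}$ is at most $\frac{1}{3c}+\frac{25c^{2}}{3}$. Since $\overline{c}\geq\underline{c}>0$ implies $c\geq 1$, we have

\begin{align*}
\frac{1}{3c}+\frac{25c^{2}}{3}\leq\frac{c^{2}}{3}+\frac{25c^{2}}{3}=\frac{26c^{2}}{3}\leq 9c^{2},
\end{align*}

which matches the constant in $\eta^{l}=9c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}/\gamma^{l}$. Replacing the factor $4/3$ by $2$ additionally assumes that the estimated regret $\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})$ is nonnegative, which holds if $\widehat{\pi}^{l}_{m}$ maximizes the estimated reward.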

Also, it similarly holds that

Reg^ml(πmΥl1)Regm(πm)superscriptsubscript^Reg𝑚𝑙conditionalsubscript𝜋𝑚superscriptΥ𝑙1subscriptReg𝑚subscript𝜋𝑚\displaystyle\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\textup% {Reg}_{m}(\pi_{m})over^ start_ARG Reg end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ( italic_π start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∣ roman_Υ start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT ) - Reg start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_π start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT )
=^ml(π^mlΥl1)^ml(πmΥl1)(m(πm*)m(πm))absentsuperscriptsubscript^𝑚𝑙conditionalsubscriptsuperscript^𝜋𝑙𝑚superscriptΥ𝑙1superscriptsubscript^𝑚𝑙conditionalsubscript𝜋𝑚superscriptΥ𝑙1subscript𝑚subscriptsuperscript𝜋𝑚subscript𝑚subscript𝜋𝑚\displaystyle=\widehat{\mathcal{R}}_{m}^{l}(\widehat{\pi}^{l}_{m}\mid\Upsilon^% {l-1})-\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\left(\mathcal% {R}_{m}(\pi^{*}_{m})-\mathcal{R}_{m}(\pi_{m})\right)= over^ start_ARG caligraphic_R end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ( over^ start_ARG italic_π end_ARG start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∣ roman_Υ start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT ) - over^ start_ARG caligraphic_R end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ( italic_π start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∣ roman_Υ start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT ) - ( caligraphic_R start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_π start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) - caligraphic_R start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_π start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) )
$$\begin{aligned}
&\leq \widehat{\mathcal{R}}_{m}^{l}(\widehat{\pi}^{l}_{m}\mid\Upsilon^{l-1})-\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\left(\mathcal{R}_{m}(\widehat{\pi}^{l}_{m})-\mathcal{R}_{m}(\pi_{m})\right) \\
&= \widehat{\mathcal{R}}_{m}^{l}(\widehat{\pi}^{l}_{m}\mid\Upsilon^{l-1})-\mathcal{R}_{m}(\widehat{\pi}^{l}_{m})+\mathcal{R}_{m}(\pi_{m})-\widehat{\mathcal{R}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1}) \\
&\leq \sqrt{V_{m}(p^{l-1}_{m},\widehat{\pi}_{m}^{l}\mid\Upsilon^{l-1})}\sqrt{\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})}+\sqrt{V_{m}(p^{l-1}_{m},\pi_{m}\mid\Upsilon^{l-1})}\sqrt{\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})} \\
&\leq \frac{K_{m}+\gamma^{l-1}\widehat{\textup{Reg}}^{l-1}_{m}(\widehat{\pi}_{m}^{l}\mid\Upsilon^{l-1})}{8c\gamma^{l}}+\frac{K_{m}+\gamma^{l-1}\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-1})}{8c\gamma^{l}}+4c\gamma^{l}\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1}).
\end{aligned}$$
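The last inequality in this chain appears to be an instance of the weighted AM-GM inequality. The sketch below spells out that step with generic symbols $V$ and $\mathcal{E}$ standing for the two factors under the square roots. The bound on $V_{m}$ used afterwards is an assumption here: it is presumably established earlier in the appendix and is not restated in this passage.

```latex
\begin{aligned}
\sqrt{V\,\mathcal{E}} &\leq \frac{V}{2\lambda}+\frac{\lambda\,\mathcal{E}}{2}
  && \text{for any } \lambda>0, \\
\sqrt{V\,\mathcal{E}} &\leq \frac{V}{8c\gamma^{l}}+2c\gamma^{l}\,\mathcal{E}
  && \text{with } \lambda=4c\gamma^{l}.
\end{aligned}
```

Applying this to both square-root products gives $2\cdot 2c\gamma^{l}\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})=4c\gamma^{l}\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})$, and the two fractions follow if each $V_{m}(p^{l-1}_{m},\pi\mid\Upsilon^{l-1})$ is bounded by $K_{m}+\gamma^{l-1}\widehat{\textup{Reg}}^{l-1}_{m}(\pi\mid\Upsilon^{l-1})$ (the assumed earlier bound).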

Then, summing over the $M$ agents, we obtain

$$\begin{aligned}
&\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\left(\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})-\textup{Reg}_{m}(\pi_{m})\right)\right] \\
&\leq \frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{\overline{c}\gamma^{l-1}}{8c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\widehat{\pi}_{m}^{l}\mid\Upsilon^{l-1})\right] \\
&\quad +\frac{\overline{c}\gamma^{l-1}}{8c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]+4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right] \\
&\leq \frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\textup{Reg}_{m}(\widehat{\pi}^{l}_{m}\mid\Upsilon^{l-1})\right] \\
&\quad +\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\pi_{m})+\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\cdot\eta^{l-1}+4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right] \\
&\overset{(g)}{\leq} \frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{\gamma^{l-1}}{4\gamma^{l}}\cdot\eta^{l}+\frac{\gamma^{l-1}}{4\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m}) \\
&\quad +\frac{\overline{c}\gamma^{l-1}}{4c\gamma^{l}}\cdot\eta^{l-1}+4c\gamma^{l}\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathcal{E}_{m}^{l-1}(\Upsilon^{l-1})\right] \\
&\leq \frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{9c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{4\gamma^{l}}+\frac{1}{4}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m}) \\
&\quad +\frac{9c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{4\gamma^{l}}+\frac{4c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{\gamma^{l}},
\end{aligned}$$

where inequality (g) follows from the previously derived bound

$$\sum_{m\in[M]}E^{l-1}_{m}\textup{Reg}_{m}(\widehat{\pi}^{l}_{m}\mid\Upsilon^{l-1})\leq 2\underline{c}\sum_{m\in[M]}E^{l}_{m}\widehat{\textup{Reg}}^{l}_{m}(\widehat{\pi}^{l}_{m}\mid\Upsilon^{l-1})+\underline{c}\eta^{l}=\underline{c}\eta^{l}.$$

Thus, it holds that

$$\begin{aligned}
\sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\widehat{\pi}_{m}^{l}\mid\Upsilon^{l-1})\right] &\leq \frac{5}{4}\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m}) \\
&\quad +\frac{\sum_{m\in[M]}E^{l}_{m}K_{m}}{4c\gamma^{l}}+\frac{17c^{2}\sum_{m\in[M]}E^{l}_{m}K_{m}}{2\gamma^{l}} \\
\Rightarrow \sum_{m\in[M]}E^{l}_{m}\mathbb{E}_{\Upsilon^{l-1}}\left[\widehat{\textup{Reg}}^{l-1}_{m}(\widehat{\pi}_{m}^{l}\mid\Upsilon^{l-1})\right] &\leq 2\sum_{m\in[M]}E^{l}_{m}\textup{Reg}_{m}(\pi_{m})+\eta^{l}.
\end{aligned}$$

With these two parts, the lemma can be obtained by induction. ∎
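As a quick sanity check on the constants in the second part of this proof, the sketch below adds up the coefficients with exact rational arithmetic. It assumes that the display following "Thus, it holds that" is obtained by moving the $\textup{Reg}_{m}(\pi_{m})$ term to the right-hand side of the preceding chain; the variable names are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Coefficients of (c^2 / gamma^l) * sum_m E_m^l K_m in the last line of the
# chain: the three c^2-terms are 9/4, 9/4 and 4.
k_coeffs = [Fraction(9, 4), Fraction(9, 4), Fraction(4)]
k_total = sum(k_coeffs)

# Coefficient of sum_m E_m^l Reg_m(pi_m): the chain leaves 1/4 on the right,
# and moving Reg_m(pi_m) across the inequality (assumed step) adds 1.
reg_coeff = Fraction(1, 4) + 1

print(k_total)    # 17/2
print(reg_coeff)  # 5/4
```

Both values match the coefficients $\frac{17c^{2}}{2\gamma^{l}}$ and $\frac{5}{4}$ in the display above.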

Furthermore, the following lemma provides a characterization of the per-epoch loss of the federation.

Lemma C.5.

For every epoch $l>1$, conditioned on $\Upsilon^{l-1}$, it holds that

$$\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\sum_{\pi_{m}\in\Psi_{m}}Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})\textup{Reg}_{m}(\pi_{m})\right]\leq\frac{11c^{2}}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m},$$

where $Q^{l}_{m}(\cdot\mid\Upsilon^{l-1})$ is a probability measure on $\Psi_{m}$ defined in Lemma C.7.

Proof.

For any probability measures $\{\tilde{Q}^{l}_{m}(\cdot):m\in[M]\}$, where each $\tilde{Q}^{l}_{m}(\cdot)$ is a measure on $\Psi_{m}$, it holds that

$$\begin{aligned}
&\sum_{m\in[M]}E^{l}_{m}\sum_{\pi_{m}\in\Psi_{m}}\tilde{Q}^{l}_{m}(\pi_{m})\textup{Reg}_{m}(\pi_{m}) \\
&\overset{(a)}{\leq} 2\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{\pi_{[M]}\in\Psi_{[M]}}\tilde{Q}^{l}(\pi_{[M]})\sum_{m\in[M]}E^{l}_{m}\widehat{\textup{Reg}}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]+\eta^{l} \\
&= 2\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\sum_{\pi_{m}\in\Psi_{m}}\tilde{Q}^{l}_{m}(\pi_{m})\widehat{\textup{Reg}}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]+\eta^{l},
\end{aligned}$$

where inequality (a) is from Lemma C.4 and $\tilde{Q}^{l}(\pi_{[M]}):=\prod_{m\in[M]}\tilde{Q}^{l}_{m}(\pi_{m})$. Thus, we obtain

$$\begin{aligned}
&\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\sum_{\pi_{m}\in\Psi_{m}}Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})\textup{Reg}_{m}(\pi_{m})\right] \\
&\leq 2\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\sum_{\pi_{m}\in\Psi_{m}}Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})\widehat{\textup{Reg}}_{m}(\pi_{m}\mid\Upsilon^{l-1})\right]+\eta^{l} \\
&\overset{(b)}{\leq} \frac{2}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m}+\frac{9c^{2}}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m} \\
&\leq \frac{11c^{2}}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m},
\end{aligned}$$

where inequality (b) is from Lemma C.9. ∎

With the preceding lemmas, we can obtain Theorem 4.1, which is restated below.

Theorem C.6 (Restatement of Theorem 4.1).

Using a learning rate

$$
\gamma^{l}=O\left(\sqrt{\sum_{m\in[M]}E^{l-1}_{m}K_{m}\Big/\left(\sum_{m\in[M]}E^{l-1}_{m}\,\mathcal{E}(E^{l-1}_{[M]})\right)}\right)
$$

in epoch $l$ and denoting $\bar{K}^{l}:=\sum_{m\in[M]}E^{l}_{m}K_{m}/\sum_{m\in[M]}E^{l}_{m}$, the regret of FedIGW can be bounded as

$$
\textup{Reg}(T)=O\left(\sum_{m\in[M]}E^{1}_{m}+\sum_{l\in[2,l(T)]}c^{\frac{5}{2}}\sqrt{\bar{K}^{l}\,\mathcal{E}(E^{l-1}_{[M]})}\sum_{m\in[M]}E^{l}_{m}\right).
$$
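To make the quantities in the theorem concrete, the following minimal sketch computes the weighted action count $\bar{K}^{l}$ and the learning rate $\gamma^{l}$ (with the hidden constant in $O(\cdot)$ set to 1) for a small instance. The function names, the three-agent instance, and the numerical value of the excess-risk bound are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def weighted_mean_actions(epoch_lengths, num_actions):
    """K-bar: number of actions K_m averaged with weights given by the epoch lengths E_m."""
    total = sum(epoch_lengths)
    return sum(e * k for e, k in zip(epoch_lengths, num_actions)) / total

def learning_rate(prev_epoch_lengths, num_actions, excess_risk):
    """gamma^l with constant 1: sqrt(sum_m E^{l-1}_m K_m / (sum_m E^{l-1}_m * excess_risk))."""
    numerator = sum(e * k for e, k in zip(prev_epoch_lengths, num_actions))
    denominator = sum(prev_epoch_lengths) * excess_risk
    return math.sqrt(numerator / denominator)

# Hypothetical instance: M = 3 agents.
num_actions = [2, 4, 6]          # K_m
prev_lengths = [100, 100, 200]   # E^{l-1}_m
excess_risk = 0.01               # assumed value of the excess-risk bound after epoch l-1

k_bar_prev = weighted_mean_actions(prev_lengths, num_actions)
gamma = learning_rate(prev_lengths, num_actions, excess_risk)
print(k_bar_prev, gamma)
```

Because the same weights $E^{l-1}_{m}$ appear in the numerator and the denominator, this choice of $\gamma^{l}$ equals the square root of the weighted action count of epoch $l-1$ divided by the excess-risk bound, which is the familiar single-agent form.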
Proof of Theorem 4.1.

The expected regret can be bounded as

$$
\begin{aligned}
\textup{Reg}(T)&=\mathbb{E}\left[\sum_{m\in[M]}\sum_{t_{m}\in[T_{m}]}\left(f^{*}(x_{m,t_{m}},\pi^{*}_{m}(x_{m,t_{m}}))-f^{*}(x_{m,t_{m}},a_{m,t_{m}})\right)\right] \\
&\leq\mathbb{E}\left[\sum_{l\in[2,l(T)]}\sum_{m\in[M]}\sum_{t_{m}\in[t_{m}(\tau^{l-1})+1,\,t_{m}(\tau^{l})]}\left(f^{*}(x_{m,t_{m}},\pi^{*}_{m}(x_{m,t_{m}}))-f^{*}(x_{m,t_{m}},a_{m,t_{m}})\right)\right]+\sum_{m\in[M]}E^{1}_{m} \\
&=\sum_{l\in[2,l(T)]}\mathbb{E}_{\Upsilon^{l-1}}\left[\mathbb{E}_{x_{m},a^{l}_{m}}\left[\sum_{m\in[M]}E^{l}_{m}\left(f^{*}(x_{m},\pi^{*}_{m}(x_{m}))-f^{*}(x_{m},a^{l}_{m})\right)\mid\Upsilon^{l-1}\right]\mid\Upsilon^{l-1}\right]+\sum_{m\in[M]}E^{1}_{m} \\
&\overset{(a)}{=}\sum_{l\in[2,l(T)]}\mathbb{E}_{\Upsilon^{l-1}}\left[\sum_{m\in[M]}E^{l}_{m}\sum_{\pi_{m}\in\Psi_{m}}Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})\,\textup{Reg}_{m}(\pi_{m})\mid\Upsilon^{l-1}\right]+\sum_{m\in[M]}E^{1}_{m} \\
&\overset{(b)}{\leq}\sum_{l\in[2,l(T)]}\frac{11c^{2}}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m}+\sum_{m\in[M]}E^{1}_{m} \\
&\overset{(c)}{=}\sum_{l\in[2,l(T)]}11c^{2}\sqrt{\frac{\sum_{m\in[M]}E^{l-1}_{m}\,\mathcal{E}(\mathcal{F};E^{l-1}_{[M]})}{\sum_{m\in[M]}E^{l-1}_{m}K_{m}}}\sum_{m\in[M]}E^{l}_{m}K_{m}+\sum_{m\in[M]}E^{1}_{m} \\
&\leq\sum_{l\in[2,l(T)]}11c^{2}\sqrt{\overline{K}\,\mathcal{E}(\mathcal{F};E^{l-1}_{[M]})}\sum_{m\in[M]}E^{l-1}_{m}+\sum_{m\in[M]}E^{1}_{m},
\end{aligned}
$$

where equality (a) is from Lemma C.8, inequality (b) is from Lemma C.5, and equality (c) follows from the choice of $\gamma^{l}$. The proof is then concluded. ∎
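The substitution in step (c) can be spelled out as a short worked computation. The sketch below takes the constant in the choice of $\gamma^{l}$ to be 1, abbreviates the excess-risk bound for epoch $l-1$ as $\mathcal{E}$, and uses $\bar{K}^{l-1}$ for the weighted action count with $E^{l-1}_{m}$ in place of $E^{l}_{m}$; this last symbol is introduced here only for illustration.

$$
\begin{aligned}
\frac{1}{\gamma^{l}}\sum_{m\in[M]}E^{l}_{m}K_{m}
&=\sqrt{\frac{\mathcal{E}\sum_{m\in[M]}E^{l-1}_{m}}{\sum_{m\in[M]}E^{l-1}_{m}K_{m}}}\;\sum_{m\in[M]}E^{l}_{m}K_{m} \\
&=\sqrt{\frac{\mathcal{E}}{\bar{K}^{l-1}}}\;\bar{K}^{l}\sum_{m\in[M]}E^{l}_{m}.
\end{aligned}
$$

Passing from this expression to the form in the theorem statement requires relating quantities of epoch $l$ to those of epoch $l-1$, which presumably relies on the epoch-length conditions involving $c$ stated earlier in the paper; that step is not reproduced here.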

C.3 Supporting Lemmas

The following supporting lemmas can be obtained by adapting the corresponding proofs in Simchi-Levi & Xu (2022).

Lemma C.7 (Lemma 3, Simchi-Levi & Xu (2022)).

For any epoch $l\in\mathbb{N}$, conditioned on $\Upsilon^{l-1}$, there exists a probability measure $Q^{l}_{m}(\cdot\mid\Upsilon^{l-1})$ on $\Psi_{m}$ such that

$$
\forall a_{m}\in\mathcal{A}_{m},\ \forall x_{m}\in\mathcal{X}_{m},\qquad p^{l}_{m}(a_{m}\mid x_{m},\Upsilon^{l-1})=\sum_{\pi_{m}\in\Psi_{m}}\mathbb{1}\{\pi_{m}(x_{m})=a_{m}\}\,Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1}).
$$
Lemma C.8 (Lemma 4, Simchi-Levi & Xu (2022)).

Fix any epoch $l\in\mathbb{N}$, we have

$$
\begin{aligned}
&\mathbb{E}_{x_{m}\sim\mathcal{D}_{m}^{\mathcal{X}_{m}},\,a_{m}^{l}\sim p_{m}^{l}(\cdot\mid x_{m})}\left[f^{*}(x_{m},\pi^{*}_{m}(x_{m}))-f^{*}(x_{m},a_{m}^{l})\mid\Upsilon^{l-1}\right] \\
&=\sum_{\pi_{m}\in\Psi_{m}}Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})\,\textup{Reg}_{m}(\pi_{m}).
\end{aligned}
$$
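Lemmas C.7 and C.8 can be checked numerically on a finite toy instance. The sketch below assumes a finite context set and uses the product measure $Q(\pi)=\prod_{x}p(\pi(x)\mid x)$ over deterministic policies, which is one standard construction and not necessarily the one used in the cited proof. The contexts, probabilities, and reward means are made-up values.

```python
from itertools import product

contexts = [0, 1]
context_prob = [0.4, 0.6]                            # marginal distribution over contexts
actions = [0, 1, 2]
p = {0: [0.5, 0.3, 0.2], 1: [0.1, 0.6, 0.3]}         # action kernel p(a | x)
f_star = {0: [0.9, 0.2, 0.5], 1: [0.1, 0.4, 0.8]}    # true mean rewards f*(x, a)

# Every deterministic policy is a tuple (pi(0), pi(1)).
policies = list(product(actions, repeat=len(contexts)))

# Product measure over policies: Q(pi) = prod_x p(pi(x) | x).
Q = {}
for pi in policies:
    weight = 1.0
    for x in contexts:
        weight *= p[x][pi[x]]
    Q[pi] = weight

def induced_prob(a, x):
    """Lemma C.7 right-hand side: sum over policies of 1{pi(x) = a} * Q(pi)."""
    return sum(q for pi, q in Q.items() if pi[x] == a)

def policy_regret(pi):
    """Reg(pi) = E_x[ max_a f*(x, a) - f*(x, pi(x)) ]."""
    return sum(context_prob[x] * (max(f_star[x]) - f_star[x][pi[x]]) for x in contexts)

# Lemma C.8: expected instantaneous regret under p equals the Q-average of policy regrets.
lhs = sum(context_prob[x] * p[x][a] * (max(f_star[x]) - f_star[x][a])
          for x in contexts for a in actions)
rhs = sum(Q[pi] * policy_regret(pi) for pi in policies)
print(lhs, rhs)
```

The equality of `lhs` and `rhs` follows from linearity of expectation once the marginals of $Q$ match $p$, which is exactly what Lemma C.7 provides.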
Lemma C.9 (Lemma 5, Simchi-Levi & Xu (2022)).

Fix any epoch $l\in\mathbb{N}$, conditioned on $\Upsilon^{l-1}$, we have

$$
\sum_{\pi_{m}\in\Psi_{m}}Q^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1})\,\widehat{\textup{Reg}}_{m}^{l}(\pi_{m}\mid\Upsilon^{l-1})\leq\frac{K_{m}}{\gamma^{l}}.
$$
Lemma C.10 (Lemma 6, Simchi-Levi & Xu (2022)).

Fix any epoch $l\in\mathbb{N}$, for any policy $\pi_{m}\in\Psi_{m}$, we have

$$
V_{m}(p^{l}_{m},\pi_{m}\mid\Upsilon^{l-1})\leq K_{m}+\gamma^{l}\,\widehat{\textup{Reg}}^{l}_{m}(\pi_{m}\mid\Upsilon^{l-1}).
$$
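Both Lemma C.9 and Lemma C.10 have simple per-context analogues that can be verified directly from the inverse gap weighting rule. The sketch below assumes the standard IGW form from Simchi-Levi & Xu (2022), where each non-greedy action gets probability $1/(K+\gamma\cdot\text{gap})$ and the greedy action receives the remaining mass, and it assumes that $V_{m}$ is an expected inverse selection probability; neither definition appears in this section. The predicted rewards and $\gamma$ are made-up values.

```python
def igw_probabilities(predictions, gamma):
    """Inverse gap weighting over K actions given predicted rewards for one context."""
    K = len(predictions)
    best = max(range(K), key=lambda a: predictions[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (predictions[best] - predictions[a]))
    probs[best] = 1.0 - sum(probs)   # greedy action takes the remaining mass
    return probs, best

predictions = [0.2, 0.9, 0.5, 0.7]   # hypothetical predicted rewards for one context
gamma = 10.0
probs, best = igw_probabilities(predictions, gamma)
K = len(predictions)
gaps = [predictions[best] - q for q in predictions]

# Per-context analogue of Lemma C.9: the expected estimated gap is at most K / gamma.
expected_gap = sum(pa * g for pa, g in zip(probs, gaps))

# Per-context analogue of Lemma C.10: 1 / p(a) <= K + gamma * gap(a) for every action.
inverse_prob_ok = all(1.0 / pa <= K + gamma * g + 1e-9 for pa, g in zip(probs, gaps))
print(probs, expected_gap, inverse_prob_ok)
```

For non-greedy actions the second inequality holds with equality by construction; for the greedy action it holds because its probability is at least $1/K$.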

Appendix D Proofs for Section 4.2

D.1 Proofs of Corollary 4.2

First, with realizability, i.e., Assumption 3.1, the following characterization can be obtained.

Lemma D.1 (Lemma 4.2, Agarwal et al. (2012)).

Fix a function $f\in\mathcal{F}$. Suppose we sample $x_{m},r_{m}$ from the data distribution $\mathcal{D}_{m}$, and an action $a_{m}$ from an arbitrary distribution such that $r_{m}$ and $a_{m}$ are conditionally independent given $x_{m}$. Define the random variable

$$
\ell_{m}(f):=\left(f(x_{m},a_{m})-r_{m}(a_{m})\right)^{2}-\left(f^{*}(x_{m},a_{m})-r_{m}(a_{m})\right)^{2}.
$$

Then, we have

$$
\mathbb{E}_{x_{m},r_{m},a_{m}}\left[\ell_{m}(f)\right]=\mathbb{E}_{x_{m},a_{m}}\left[\left(f(x_{m},a_{m})-f^{*}(x_{m},a_{m})\right)^{2}\right]
$$

and

$$
\mathbb{V}_{x_{m},r_{m},a_{m}}\left[\ell_{m}(f)\right]\leq 4\,\mathbb{E}_{x_{m},r_{m},a_{m}}\left[\ell_{m}(f)\right],
$$

where $\mathbb{V}[\cdot]$ denotes the variance of a random variable.
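The two claims of Lemma D.1 can be verified by exact enumeration on a small discrete instance. The sketch below assumes binary rewards whose mean is $f^{*}(x,a)$ and values of $f$ in $[0,1]$; the specific distributions and function values are made up for illustration.

```python
from itertools import product

context_prob = {0: 0.3, 1: 0.7}                  # distribution of the context x
action_prob = {0: [0.5, 0.5], 1: [0.2, 0.8]}     # arbitrary sampling distribution of a given x
f_star = {0: [0.2, 0.6], 1: [0.9, 0.4]}          # true mean reward E[r(a) | x]
f = {0: [0.5, 0.1], 1: [0.7, 0.4]}               # a fixed candidate function

mean_loss = 0.0       # E[l(f)]
second_moment = 0.0   # E[l(f)^2]
sq_distance = 0.0     # E[(f - f*)^2]
for x, a, r in product([0, 1], [0, 1], [0, 1]):
    mu = f_star[x][a]
    # Joint probability of (x, a, r) with a Bernoulli(mu) reward.
    prob = context_prob[x] * action_prob[x][a] * (mu if r == 1 else 1.0 - mu)
    loss = (f[x][a] - r) ** 2 - (mu - r) ** 2
    mean_loss += prob * loss
    second_moment += prob * loss ** 2
    sq_distance += prob * (f[x][a] - mu) ** 2

variance = second_moment - mean_loss ** 2
print(mean_loss, sq_distance, variance)
```

The variance bound reflects the factorization $\ell(f)=(f-f^{*})(f+f^{*}-2r)$, whose second factor has magnitude at most 2 when all quantities lie in $[0,1]$.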

Next, we establish the excess risk bound required in Definition C.1 via the following lemma.

Lemma D.2.

Under the setup of Assumption C.1, if the adopted FL protocol provides an exact minimizer for the optimization problem in Eqn. (1) with quadratic losses, i.e.,

$$
\widehat{f}=\operatorname*{arg\,min}_{f\in\mathcal{F}}\frac{1}{n}\sum_{m\in[M]}\sum_{i\in[n_{m}]}\left(f(x_{m}^{i},a_{m}^{i})-y_{m}^{i}\right)^{2},
$$

then, with probability at least $1-\delta$, it holds that

$$
\sum_{m\in[M]}\frac{n_{m}}{n}\cdot\mathbb{E}_{x_{m}\sim\mathcal{D}^{\mathcal{X}_{m}}_{m},\,a_{m}\sim p_{m}(\cdot\mid x_{m})}\left[\left(\widehat{f}(x_{m},a_{m})-f^{*}(x_{m},a_{m})\right)^{2}\right]\leq\frac{25\log(|\mathcal{F}|/\delta)}{n}.
$$

As a result, Definition C.1 holds with

$$
\mathcal{E}(\delta,n_{[M]})\leq O\left(\log(|\mathcal{F}|n)/n\right).
$$
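As an illustration of the setting of Lemma D.2, the following sketch runs an exact pooled least-squares minimization over a small finite function class and evaluates the right-hand side $25\log(|\mathcal{F}|/\delta)/n$. It is a centralized stand-in for an FL protocol that returns the exact minimizer; the function class, the two-agent data generator, and all names are hypothetical.

```python
import math
import random

def pooled_erm(function_class, datasets):
    """Return the function minimizing the squared loss pooled over all agents' samples."""
    def pooled_loss(f):
        return sum((f(x, a) - y) ** 2 for data in datasets for (x, a, y) in data)
    return min(function_class, key=pooled_loss)

def excess_risk_bound(class_size, n, delta):
    """Right-hand side of the lemma: 25 * log(|F| / delta) / n."""
    return 25.0 * math.log(class_size / delta) / n

# Hypothetical finite class: f_t(x, a) = 0.1 + 0.8 * t * x if a == 1, else 0.5.
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
function_class = [lambda x, a, t=t: 0.1 + 0.8 * t * x if a == 1 else 0.5 for t in thetas]
true_theta = 0.75   # realizability: the true reward function belongs to the class

rng = random.Random(0)
datasets = []
for n_m in (200, 300):   # two agents holding n_m samples each
    data = []
    for _ in range(n_m):
        x, a = rng.random(), rng.randint(0, 1)
        mean = 0.1 + 0.8 * true_theta * x if a == 1 else 0.5
        y = mean + rng.uniform(-0.1, 0.1)   # bounded zero-mean noise, keeps y in [0, 1]
        data.append((x, a, y))
    datasets.append(data)

f_hat = pooled_erm(function_class, datasets)
n = sum(len(d) for d in datasets)
bound = excess_risk_bound(len(function_class), n, delta=0.05)
print(bound)
```

With $|\mathcal{F}|=5$, $n=500$, and $\delta=0.05$, the bound evaluates to $25\log(100)/500\approx 0.23$; choosing $\delta$ on the order of $1/n$ yields the $\log(|\mathcal{F}|n)/n$ rate stated in the lemma.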
Proof.

For simplicity, we abbreviate the quadratic loss associated with a fixed function $f\in\mathcal{F}$ as

$$
\ell_{m}^{i}(f)=\ell_{m}(f(x_{m}^{i},a_{m}^{i});r_{m}^{i}):=\left(f(x_{m}^{i},a_{m}^{i})-r_{m}^{i}\right)^{2},\qquad\forall m\in[M].
$$

Then, for a fixed $f\in\mathcal{F}$, with probability at least $1-\delta$, it holds that

\[
\begin{aligned}
&\sum_{m\in[M]}\sum_{i\in[n_{m}]}\mathbb{E}_{x_{m}^{i},r_{m}^{i},a_{m}^{i}}\left[\ell_{m}^{i}(f)-\ell_{m}^{i}(f^{*})\right]-\sum_{m\in[M]}\sum_{i\in[n_{m}]}\left[\ell_{m}^{i}(f)-\ell_{m}^{i}(f^{*})\right]\\
&\overset{(a)}{\leq}2\sqrt{\sum_{m\in[M]}\sum_{i\in[n_{m}]}\mathbb{V}_{x_{m}^{i},r_{m}^{i},a_{m}^{i}}\left[\ell_{m}^{i}(f)-\ell_{m}^{i}(f^{*})\right]\log(1/\delta)}+\frac{4}{3}\log(1/\delta)\\
&\overset{(b)}{\leq}4\sqrt{\sum_{m\in[M]}\sum_{i\in[n_{m}]}\mathbb{E}_{x_{m}^{i},r_{m}^{i},a_{m}^{i}}\left[\ell_{m}^{i}(f)-\ell_{m}^{i}(f^{*})\right]\log(1/\delta)}+\frac{4}{3}\log(1/\delta),
\end{aligned}
\]

where inequality (a) leverages Bernstein’s inequality and inequality (b) is based on Lemma D.1.

With

\[
\begin{aligned}
X(f)&=\sqrt{\sum_{m\in[M]}\sum_{i\in[n_{m}]}\mathbb{E}_{x_{m}^{i},r_{m}^{i},a_{m}^{i}}\left[\ell_{m}^{i}(f)-\ell_{m}^{i}(f^{*})\right]};\\
Z(f)&=\sum_{m\in[M]}\sum_{i\in[n_{m}]}\left[\ell_{m}^{i}(f)-\ell_{m}^{i}(f^{*})\right];\qquad C=\sqrt{\log(1/\delta)},
\end{aligned}
\]

applying a union bound over $\mathcal{F}$ to the above inequality shows that, with probability at least $1-|\mathcal{F}|\delta$, for all $f\in\mathcal{F}$, it holds that

\[
X(f)^{2}-Z(f)\leq 4CX(f)+\frac{4}{3}C^{2}\quad\Rightarrow\quad(X(f)-2C)^{2}-Z(f)\leq\frac{16}{3}C^{2}.
\]

Since $\widehat{f}$ minimizes the empirical loss and thus satisfies $Z(\widehat{f})\leq 0$, we can obtain that

\[
X(\widehat{f})^{2}\leq 25C^{2}.
\]
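For completeness, the step from the completed square to the constant 25 can be spelled out as below. This is a routine calculation sketched here rather than taken from the original text; it only uses $Z(\widehat{f})\leq 0$ and $X(\widehat{f})\geq 0$.

```latex
% With Z(\widehat{f}) \le 0, the completed-square inequality gives
\[
(X(\widehat{f})-2C)^{2}\leq\frac{16}{3}C^{2}
\;\Rightarrow\;
X(\widehat{f})\leq\Big(2+\frac{4}{\sqrt{3}}\Big)C\approx 4.31\,C
\;\Rightarrow\;
X(\widehat{f})^{2}\leq\Big(2+\frac{4}{\sqrt{3}}\Big)^{2}C^{2}\approx 18.6\,C^{2}\leq 25\,C^{2}.
\]
```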

In other words, after rescaling $\delta$ to $\delta/|\mathcal{F}|$ (so that $C^{2}=\log(|\mathcal{F}|/\delta)$), with probability at least $1-\delta$, it holds that

\[
\begin{aligned}
&\sum_{m\in[M]}\sum_{i\in[n_{m}]}\mathbb{E}_{x_{m}^{i},r_{m}^{i},a_{m}^{i}}\left[\left(\widehat{f}(x_{m}^{i},a_{m}^{i})-r_{m}^{i}\right)^{2}-\left(f^{*}(x_{m}^{i},a_{m}^{i})-r_{m}^{i}\right)^{2}\right]\\
&=\sum_{m\in[M]}n_{m}\mathbb{E}_{x_{m}^{i},a_{m}^{i}}\left[\left(\widehat{f}(x_{m}^{i},a_{m}^{i})-f^{*}(x_{m}^{i},a_{m}^{i})\right)^{2}\right]\leq 25\log(|\mathcal{F}|/\delta),
\end{aligned}
\]

where the equality follows from the realizability in Assumption 3.1: since $f^{*}$ is the conditional mean of the reward, the cross term vanishes in expectation. The first half of the lemma is then proved.

With $\delta=1/n$, the second half can be obtained as

\[
\mathbb{E}_{S_{[M]}}\left[\sum_{m\in[M]}\frac{n_{m}}{n}\cdot\mathbb{E}_{x_{m},a_{m}}\left[\left(\widehat{f}(x_{m},a_{m})-f^{*}(x_{m},a_{m})\right)^{2}\right]\right]\leq\frac{25\log(|\mathcal{F}|n)}{n}+\frac{1}{n},
\]

which concludes the proof. ∎
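The high-probability bound above can be sanity-checked numerically. The following is a minimal sketch under assumptions that are not part of the original text: a small tabular context-action space, a hypothetical finite class made of constant shifts of $f^{*}$, Bernoulli rewards, uniform action sampling, and pooled least squares standing in for the exact minimizer of Eqn. (1). The function name `federated_erm_check` and all parameter values are illustrative.

```python
import numpy as np


def federated_erm_check(n_clients=4, n_contexts=5, n_actions=3,
                        n_per_client=500, delta=0.05, seed=0):
    """Compare the weighted squared error of a pooled ERM with 25*log(|F|/delta)/n."""
    rng = np.random.default_rng(seed)
    # Hypothetical true reward table f*(x, a), kept in [0.3, 0.7].
    f_star = rng.uniform(0.3, 0.7, size=(n_contexts, n_actions))
    # Hypothetical finite class F: f* plus constant offsets (offset 0 recovers f*).
    offsets = 0.02 * np.arange(-15, 16)
    funcs = f_star[None, :, :] + offsets[:, None, None]
    n_funcs = len(offsets)

    n_total = n_clients * n_per_client
    emp_loss = np.zeros(n_funcs)   # pooled empirical quadratic loss of each f
    pop_err = np.zeros(n_funcs)    # sum_m (n_m / n) * E[(f - f*)^2] for each f
    for _ in range(n_clients):
        ctx_dist = rng.dirichlet(np.ones(n_contexts))        # client-specific contexts
        x = rng.choice(n_contexts, size=n_per_client, p=ctx_dist)
        a = rng.integers(n_actions, size=n_per_client)        # uniform action sampling
        r = rng.binomial(1, f_star[x, a])                     # rewards with mean f*(x, a)
        preds = funcs[:, x, a]                                # shape (n_funcs, n_per_client)
        emp_loss += ((preds - r[None, :]) ** 2).sum(axis=1)
        # Exact expectation over x ~ ctx_dist and a ~ uniform.
        sq = (funcs - f_star[None, :, :]) ** 2
        client_err = (sq.mean(axis=2) * ctx_dist[None, :]).sum(axis=1)
        pop_err += (n_per_client / n_total) * client_err

    k_hat = int(np.argmin(emp_loss))                          # exact ERM over the finite class
    bound = 25 * np.log(n_funcs / delta) / n_total
    return float(pop_err[k_hat]), float(bound)


if __name__ == "__main__":
    err, bound = federated_erm_check()
    print(f"weighted squared error = {err:.5f}, bound = {bound:.5f}")
```

In this toy setting the realized error typically sits far below the bound, which is consistent with the constant 25 being chosen for a clean proof rather than tightness.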

Based on the established excess risk bound, Corollary 4.2 can be obtained as follows.

Corollary D.3 (Restatement of Corollary 4.2).

If $|\mathcal{F}|<\infty$ and the adopted FL protocol provides an exact minimizer for Eqn. (1) with quadratic losses, then with $\tau^{l}=2^{l}$, FedIGW incurs a regret of

\[
\textup{Reg}(T)=O\left(\sqrt{KMT\log(|\mathcal{F}|MT)}\right)
\]

and makes a total of $O(\log(T))$ calls to the adopted FL protocol.

Proof of Corollary 4.2.

With Theorem 4.1 and Lemma D.2, under the choice of $\tau^{l}=2^{l}$, the regret can be bounded as

\[
\begin{aligned}
\textup{Reg}(T)&=O\left(ME^{1}+\sum_{l\in[2,l(T)]}\sqrt{KME^{l}\log(|\mathcal{F}|ME^{l})}\right)\\
&=O\left(\sum_{l\in[2,\lceil\log_{2}(T)\rceil]}\sqrt{KM2^{l}\log(|\mathcal{F}|MT)}\right)\\
&=O\left(\sqrt{KMT\log(|\mathcal{F}|MT)}\right),
\end{aligned}
\]

and the exponentially growing epoch length naturally leads to $O(\log(T))$ calls to the adopted FL protocol, which concludes the proof. ∎
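The two facts used in this proof, namely that the doubling schedule yields about $\log_{2}(T)$ epochs and that $\sum_{l}\sqrt{2^{l}}$ is of order $\sqrt{T}$, can be checked with a short script. This is an illustrative sketch only: the helper names `doubling_epochs` and `sqrt_sum` are hypothetical, the last epoch is assumed to be truncated at the horizon $T$, and the constant $2/(\sqrt{2}-1)$ comes from the geometric-series bound rather than from the paper.

```python
import math


def doubling_epochs(T):
    """Epoch end times tau^l = 2^l, with the last epoch truncated at horizon T."""
    ends = []
    l = 1
    while True:
        tau = min(2 ** l, T)
        ends.append(tau)
        if tau >= T:
            break
        l += 1
    return ends


def sqrt_sum(T):
    """Sum of sqrt(2^l) over l = 2, ..., ceil(log2(T)), as in the regret bound."""
    last = math.ceil(math.log2(T))
    return sum(math.sqrt(2 ** l) for l in range(2, last + 1))


if __name__ == "__main__":
    for T in (10, 1000, 1024, 10 ** 6):
        n_calls = len(doubling_epochs(T))          # one FL protocol call per epoch
        ratio = sqrt_sum(T) / math.sqrt(T)         # stays below 2 / (sqrt(2) - 1)
        print(T, n_calls, round(ratio, 3))
```

The ratio printed in the last column stays bounded as $T$ grows, which is what allows the per-epoch terms to be summed into a single $\sqrt{T}$ factor.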

D.2 Proofs of Corollary 4.4 and Additional Results

In the following, we first prove Lemma 4.3. Note that this result is general and does not rely on the specific parameterization of $\mathcal{F}$, although we presented it with the $d$-dimensional parameterization considered in Section 4.2.

Lemma D.4 (Complete Version of Lemma 4.3).

If the loss function $\ell_{m}(\cdot;\cdot)$ is $\mu_{f}$-strongly convex in its first coordinate for all $m\in[M]$, i.e.,

\[
\ell_{m}(z_{1}^{\prime};z_{2})-\ell_{m}(z_{1};z_{2})\geq\frac{\textup{d}\ell_{m}(z_{1};z_{2})}{\textup{d}z_{1}}\cdot(z_{1}^{\prime}-z_{1})+\frac{\mu_{f}}{2}(z_{1}^{\prime}-z_{1})^{2},\quad\text{for any $z_{1},z_{1}^{\prime}$ and $z_{2}$,}
\]

and

\[
\inf_{y\in\mathbb{R}}\mathbb{E}_{r_{m}}\left[\ell_{m}(y;r_{m}(a_{m}))\,|\,x_{m},a_{m}\right]=\mathbb{E}_{r_{m}}\left[\ell_{m}(f_{\omega^{*}}(x_{m},a_{m});r_{m}(a_{m}))\,|\,x_{m},a_{m}\right]\tag{3}
\]

for all $m\in[M]$ and $(x_{m},a_{m})\in\mathcal{X}_{m}\times\mathcal{A}_{m}$, then Definition C.1 holds with

\[
\mathcal{E}(\mathcal{F};n_{[M]})\geq 2\left(\varepsilon_{\text{opt}}(\mathcal{F};n_{[M]})+\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]})\right)/\mu_{f},
\]

where

\[
\begin{aligned}
\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]})&:=\mathbb{E}_{\mathcal{S},\xi}\left[\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})-\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})\right];\\
\varepsilon_{\text{opt}}(\mathcal{F};n_{[M]})&:=\mathbb{E}_{\mathcal{S},\xi}\left[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})-\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})\right].
\end{aligned}
\]
Proof.

First, for any $\widehat{\omega}_{\mathcal{S}}$, it holds that

\[
\begin{aligned}
&\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})-\mathcal{L}(f_{\omega^{*}})\\
&=\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m,i},a_{m,i},r_{m,i}}\left[\ell_{m}(f_{\widehat{\omega}_{\mathcal{S}}}(x_{m,i},a_{m,i});r_{m,i})-\ell_{m}(f_{\omega^{*}}(x_{m,i},a_{m,i});r_{m,i})\right]\\
&\geq\frac{\mu_{f}}{2}\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m,i},a_{m,i}}\left[\left(f_{\widehat{\omega}_{\mathcal{S}}}(x_{m,i},a_{m,i})-f_{\omega^{*}}(x_{m,i},a_{m,i})\right)^{2}\right],
\end{aligned}
\]

where the inequality is due to the strong convexity of $\ell_{m}(\cdot;\cdot)$ w.r.t. its first coordinate and the optimality of $f_{\omega^{*}}$ assumed in Eqn. (3). Thus, we obtain that

\[
\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m,i},a_{m,i}}\left[\left(f_{\widehat{\omega}_{\mathcal{S}}}(x_{m,i},a_{m,i})-f_{\omega^{*}}(x_{m,i},a_{m,i})\right)^{2}\right]\leq\frac{2}{\mu_{f}}\left(\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})-\mathcal{L}(f_{\omega^{*}})\right).
\]
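The way Eqn. (3) removes the first-order term in the strong-convexity inequality can be sketched as follows. The shorthand $z^{*}=f_{\omega^{*}}(x_{m},a_{m})$ and $\widehat{z}=f_{\widehat{\omega}_{\mathcal{S}}}(x_{m},a_{m})$ is introduced here for illustration only, and the sketch assumes that differentiation and conditional expectation can be interchanged.

```latex
% Strong convexity with z_1 = z^* and z_1' = \widehat{z}, taken in conditional
% expectation over r_m. Since z^* minimizes y -> E[\ell_m(y; r_m) | x_m, a_m] over
% the whole real line by Eqn. (3), the expected derivative at z^* is zero.
\[
\mathbb{E}_{r_{m}}\!\left[\ell_{m}(\widehat{z};r_{m})-\ell_{m}(z^{*};r_{m})\,\middle|\,x_{m},a_{m}\right]
\geq
\underbrace{\mathbb{E}_{r_{m}}\!\left[\frac{\textup{d}\ell_{m}(z^{*};r_{m})}{\textup{d}z_{1}}\,\middle|\,x_{m},a_{m}\right]}_{=\,0\text{ by Eqn. (3)}}\cdot(\widehat{z}-z^{*})
+\frac{\mu_{f}}{2}(\widehat{z}-z^{*})^{2}.
\]
```

Averaging this pointwise inequality over contexts, actions, and clients with weights $n_{m}/n$ gives the lower bound on the excess population loss.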

Furthermore, it holds that

\[
\begin{aligned}
&\mathbb{E}_{\mathcal{S},\xi}\left[\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})\right]-\mathcal{L}(f_{\omega^{*}})\\
&=\mathbb{E}_{\mathcal{S},\xi}\left[\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})\right]-\mathbb{E}_{\mathcal{S},\xi}\left[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})\right]+\mathbb{E}_{\mathcal{S},\xi}\left[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})\right]-\mathcal{L}(f_{\omega^{*}})\\
&\leq\mathbb{E}_{\mathcal{S},\xi}\left[\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})\right]-\mathbb{E}_{\mathcal{S},\xi}\left[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})\right]+\mathbb{E}_{\mathcal{S},\xi}\left[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})\right]-\mathbb{E}_{\mathcal{S},\xi}\left[\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})\right],
\end{aligned}
\]

where the last inequality is due to

\[
\mathcal{L}(f_{\omega^{*}})=\mathbb{E}_{\mathcal{S}}\left[\widehat{\mathcal{L}}(f_{\omega^{*}};\mathcal{S})\right]\geq\mathbb{E}_{\mathcal{S}}\left[\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})\right].
\]

The proof is then concluded. ∎

Then, for the generalization error analyses, the following lemma can be obtained via standard proofs (e.g., Theorem 6.4 in Zhang (2023); Theorem 3.3 in Mohri et al. (2018)).

Lemma D.5.

It holds that

\[
\varepsilon_{\text{gen}}(\mathcal{F};n_{[M]}):=\mathbb{E}_{\mathcal{S},\xi}\left[\mathcal{L}(f_{\widehat{\omega}_{\mathcal{S}}})-\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})\right]\leq 2\mathfrak{R}(\mathcal{F};n_{[M]}).
\]

Here, the distribution-independent upper bound $\mathfrak{R}(\mathcal{F};n_{[M]})$ on the Rademacher complexity is defined as

\[
\mathfrak{R}(\mathcal{F};n_{[M]}):=\sup\left\{\mathbb{E}_{\mathcal{S}_{[M]},\bm{\sigma}}\left[\sup_{\omega}\left\{\sum_{m\in[M]}\frac{1}{n}\sum_{i\in[n_{m}]}\sigma_{m,i}\cdot\ell_{m}(f_{\omega}(x_{m,i},a_{m,i});r_{m,i})\right\}\right]\right\}, \tag{4}
\]

where the outside supremum is over possible distributions of dataset $\mathcal{S}$ defined in Definition C.1 and the expectation is w.r.t. the generation of dataset $\mathcal{S}_{[M]}$ following a fixed distribution and independent Rademacher random variables $\bm{\sigma}:=\{\sigma_{m,i}:m\in[M],i\in[n_{m}]\}$.
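To make the quantity inside Eqn. (4) concrete, the following is a minimal Monte Carlo sketch of the inner expectation for one fixed dataset. It does not evaluate the outer supremum over data distributions, and it replaces the parameter space by a finite set of candidate parameters so that the inner supremum becomes a maximum. The helper `rademacher_estimate`, the synthetic features, and the candidate set are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def rademacher_estimate(losses, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) * sum_j sigma_j * losses[f, j] ].

    losses has shape (num_functions, n): losses[f, j] is the loss of candidate
    function f on the j-th pooled sample (samples of all agents stacked).
    """
    rng = np.random.default_rng(seed)
    n = losses.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # independent Rademacher signs
    correlations = sigma @ losses.T / n                  # shape (n_draws, num_functions)
    return correlations.max(axis=1).mean()               # max over the class, then average

# Hypothetical pooled dataset: three agents with n_m = 20, 30, 50 samples.
rng = np.random.default_rng(1)
d, n = 3, 20 + 30 + 50
phi = rng.normal(size=(n, d))
phi /= np.maximum(1.0, np.linalg.norm(phi, axis=1, keepdims=True))  # enforce ||phi|| <= 1
omega_star = np.array([0.5, -0.3, 0.2])
rewards = phi @ omega_star + 0.1 * rng.normal(size=n)

# Finite stand-in for the function class: 50 candidate parameters, squared loss.
candidates = rng.uniform(-1.0, 1.0, size=(50, d))
losses = (candidates @ phi.T - rewards) ** 2             # shape (50, n)

small_class = rademacher_estimate(losses[:10])
full_class = rademacher_estimate(losses)
print(small_class, full_class)
```

Because both calls reuse the same sign draws, the estimate for the full candidate set is never smaller than the one for its subset, which mirrors the fact that a richer function class has a larger Rademacher complexity.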

The optimization errors of FedAvg (McMahan et al., 2017) and SCAFFOLD (Karimireddy et al., 2020) are presented in Appendices F.1 and F.2. Combining the generalization error and the optimization error via Lemma 4.3 and plugging the result into Theorem 4.1 yields Corollary 4.4, which is restated below.

Corollary D.6 (Restatement of Corollary 4.4).

Under the condition of Lemma 4.3, the regret of FedIGW can be bounded as

\[
\textup{Reg}(T)=O\left(ME^{1}+\sum_{l\in[2,l(T)]}\sqrt{K\left(\mathfrak{R}^{l-1}+\varepsilon_{\text{opt}}^{l}\right)/\mu_{f}}\,ME^{l}\right),
\]

where $\mathfrak{R}^{l}:=\mathfrak{R}(\mathcal{F};\{E^{l}:m\in[M]\})$. Moreover, using $\rho^{l}$ rounds of agent-server communications (i.e., global aggregations) and $\kappa^{l}$ rounds of local updates in epoch $l$, under certain assumptions,

  • with FedAvg as $\texttt{FLroutine}(\cdot)$, if $\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}^{l}_{[M]})$ is $\mu_{\omega}$-strongly convex and $\beta_{\omega}$-smooth w.r.t. $\omega$ for all $m\in[M]$ while the gradients are unbiased, have a $\sigma^{2}_{b}$-bounded variance and have a $G_{b}$-bounded dissimilarity, the output $f_{\widehat{\omega}^{l}}$ satisfies $\varepsilon_{\text{opt}}^{l}:=\varepsilon_{\text{opt}}(\mathcal{F};n^{l}_{[M]})\leq\tilde{O}\left(\sigma_{b}^{2}(\mu_{\omega}\rho^{l}\kappa^{l}M)^{-1}+\beta_{\omega}G_{b}^{2}(\mu_{\omega}\rho^{l})^{-2}\right)$, when $\rho^{l}\geq\Omega(\beta_{\omega}/\mu_{\omega})$ (see Lemma F.1 for the full statement);

  • with SCAFFOLD as $\texttt{FLroutine}(\cdot)$, if $\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}^{l}_{[M]})$ is $\mu_{\omega}$-strongly convex and $\beta_{\omega}$-smooth w.r.t. $\omega$ for all $m\in[M]$ while the gradients are unbiased and have a $\sigma^{2}_{b}$-bounded variance, the output $f_{\widehat{\omega}^{l}}$ satisfies $\varepsilon_{\text{opt}}^{l}:=\varepsilon_{\text{opt}}(\mathcal{F};n^{l}_{[M]})\leq\tilde{O}\left(\sigma_{b}^{2}(\mu_{\omega}\rho^{l}\kappa^{l}M)^{-1}\right)$, when $\rho^{l}\geq\Omega(\beta_{\omega}/\mu_{\omega})$ (see Lemma F.6 for the full statement).

By further setting a suitable number of global aggregations for each epoch such that the optimization error is of the same order as the generalization error, the following more specific corollary can be obtained for FedAvg and SCAFFOLD; it can be readily extended to other FL designs.

Corollary D.7.

Under the conditions of Lemma 4.3 and Corollary D.6, FedIGW incurs a regret of

\[
\textup{Reg}(T)=O\left(ME^{1}+\sum_{l\in[2,l(T)]}\sqrt{K\mathfrak{R}^{l-1}/\mu_{f}}\,ME^{l}\right)
\]

with the following bounds on the rounds of communications

\[
\begin{aligned}
&\tilde{O}\left(\sum_{l\in[l(T)]}\frac{\beta_{\omega}}{\mu_{\omega}}+\frac{\sigma_{b}^{2}}{\mu_{\omega}\mathfrak{R}^{l}\kappa^{l}M}+\sqrt{\frac{\beta_{\omega}G_{b}^{2}}{\mu_{\omega}^{2}\mathfrak{R}^{l}}}\right)\qquad\text{(using FedAvg)}; \\
&\tilde{O}\left(\sum_{l\in[l(T)]}\frac{\beta_{\omega}}{\mu_{\omega}}+\frac{\sigma_{b}^{2}}{\mu_{\omega}\mathfrak{R}^{l}\kappa^{l}M}\right)\qquad\text{(using SCAFFOLD)},
\end{aligned}
\]

where $\mathfrak{R}^{l}:=\mathfrak{R}(\mathcal{F}_{[M]},\{E^{l}:m\in[M]\})$ and $\kappa^{l}$ is the number of local updates in epoch $l$.

Proof.

From Corollary D.6, when using FedAvg as the adopted FL protocol in FedIGW, the optimization error in epoch $l$ is of the form

\[
\tilde{O}\left(\frac{\sigma_{b}^{2}}{\mu_{\omega}\rho^{l}\kappa^{l}M}+\frac{\beta_{\omega}G_{b}^{2}}{\mu_{\omega}^{2}(\rho^{l})^{2}}\right),
\]

when $\rho^{l}=\Omega(\beta_{\omega}/\mu_{\omega})$. Thus, if the number of communication rounds satisfies

\[
\rho^{l}=\tilde{\Theta}\left(\frac{\beta_{\omega}}{\mu_{\omega}}+\frac{\sigma_{b}^{2}}{\mu_{\omega}\mathfrak{R}^{l}\kappa^{l}M}+\sqrt{\frac{\beta_{\omega}G_{b}^{2}}{\mu_{\omega}^{2}\mathfrak{R}^{l}}}\right),
\]

we are guaranteed to have the optimization error on the order of $O(\mathfrak{R}^{l})$.

Then, the regret in Corollary 4.4 is of order

\[
\textup{Reg}(T)=O\left(ME^{1}+\sum_{l\in[2,l(T)]}\sqrt{K\mathfrak{R}^{l-1}/\mu_{f}}\,ME^{l}\right)
\]

while the overall communication rounds can be bounded as

\[
\sum_{l\in[l(T)]}\rho^{l}=\tilde{O}\left(\sum_{l\in[l(T)]}\frac{\beta_{\omega}}{\mu_{\omega}}+\frac{\sigma_{b}^{2}}{\mu_{\omega}\mathfrak{R}^{l}\kappa^{l}M}+\sqrt{\frac{\beta_{\omega}G_{b}^{2}}{\mu_{\omega}^{2}\mathfrak{R}^{l}}}\right),
\]

which concludes the proof for FedAvg. The result of using SCAFFOLD can be similarly obtained. ∎
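The balancing step in the proof above can be checked numerically. The sketch below drops all constants and logarithmic factors, so it only illustrates the order-wise argument: with the stated choice of communication rounds, each of the two FedAvg error terms is at most the generalization term, so their sum is at most twice that term. The function names and the numerical values of the problem constants are hypothetical.

```python
import math

def fedavg_opt_error(rho, kappa, M, mu, beta, sigma2, G2):
    """FedAvg optimization-error bound of Corollary D.6 (constants and logs dropped).

    sigma2 stands for sigma_b^2 and G2 for G_b^2.
    """
    return sigma2 / (mu * rho * kappa * M) + beta * G2 / (mu**2 * rho**2)

def fedavg_rounds(R, kappa, M, mu, beta, sigma2, G2):
    """Communication rounds per epoch as chosen in the proof (constants and logs dropped)."""
    return (beta / mu
            + sigma2 / (mu * R * kappa * M)
            + math.sqrt(beta * G2 / (mu**2 * R)))

# Hypothetical values: R plays the role of the generalization term in one epoch.
R = 0.01
params = dict(kappa=5, M=10, mu=0.5, beta=2.0, sigma2=1.0, G2=4.0)

rho = fedavg_rounds(R, **params)
err = fedavg_opt_error(rho, **params)

# rho >= sigma2 / (mu * R * kappa * M) makes the first error term at most R;
# rho >= sqrt(beta * G2 / (mu^2 * R)) makes the second error term at most R.
print(f"rho = {rho:.2f}, optimization error = {err:.5f}, 2R = {2 * R}")
```

For SCAFFOLD the dissimilarity term is absent, so under the same simplifications the square-root term can be removed from `fedavg_rounds` and the error is at most the generalization term itself.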

D.3 A Linear Reward Function Class

We here provide a detailed discussion of the linear reward function class considered in Remark 4.5 at the end of Section 4.2. Specifically, following standard assumptions in linear bandits (Abbasi-Yadkori et al., 2011) and federated linear bandits (Li & Wang, 2022a; He et al., 2022; Amani et al., 2022), we consider $\mu_{m}(x_{m},a_{m})=\langle\phi(x_{m},a_{m}),\omega^{*}\rangle$, where $\phi(\cdot)$ is a known $d$-dimensional mapping and $\omega^{*}$ is an unknown $d$-dimensional system parameter. Then, it is sufficient to consider a linear function class $\mathcal{F}$, where $f_{\omega}(\cdot)=\langle\omega,\phi(\cdot)\rangle$ and $f^{*}(\cdot)=\langle\omega^{*},\phi(\cdot)\rangle$. Moreover, for convenience, we assume that $\|\phi(x_{m},a_{m})\|_{2}\leq 1$ and $\|\omega^{*}\|_{2}\leq 1$.

As mentioned in Remark 4.5, the FL problem can be formulated as a standard ridge regression with

\[
\ell_{m}(f_{\omega}(x_{m},a_{m});r_{m}):=\left(\langle\omega,\phi(x_{m},a_{m})\rangle-r_{m}\right)^{2}+\lambda\|\omega\|_{2}^{2}.
\]

In other words, Eqn. (1) can be restated as

\[
\min_{\omega\in\mathbb{R}^{d}}\widehat{\mathcal{L}}(f_{\omega};\mathcal{S}):=\sum_{m\in[M]}\frac{1}{n}\sum_{i\in[n_{m}]}\left(\langle\omega,\phi(x_{m}^{i},a_{m}^{i})\rangle-r_{m}^{i}\right)^{2}+\lambda\|\omega\|_{2}^{2}, \tag{5}
\]

which has the exact minimizer

\[
\omega^{*}_{\mathcal{S}}=\left(\frac{1}{n}\sum_{m\in[M]}\sum_{i\in[n_{m}]}\phi(x_{m}^{i},a_{m}^{i})\phi(x_{m}^{i},a_{m}^{i})^{\top}+\lambda I\right)^{-1}\left(\frac{1}{n}\sum_{m\in[M]}\sum_{i\in[n_{m}]}\phi(x_{m}^{i},a_{m}^{i})r_{m}^{i}\right). \tag{6}
\]
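As a sanity check of the closed form in Eqn. (6), the following sketch pools synthetic data from several agents, evaluates the minimizer, and verifies that the gradient of the objective in Eqn. (5) vanishes there. The data-generating process, the regularization value, and the helper names are assumptions made only for this illustration.

```python
import numpy as np

def ridge_minimizer(features, rewards, lam):
    """Closed-form minimizer of (1/n) * sum_j (<w, phi_j> - r_j)^2 + lam * ||w||_2^2."""
    n, d = features.shape
    gram = features.T @ features / n + lam * np.eye(d)   # (1/n) sum phi phi^T + lam * I
    moment = features.T @ rewards / n                     # (1/n) sum phi * r
    return np.linalg.solve(gram, moment)

def ridge_gradient(w, features, rewards, lam):
    """Gradient of the pooled ridge objective with respect to w."""
    n = features.shape[0]
    return 2.0 * features.T @ (features @ w - rewards) / n + 2.0 * lam * w

# Hypothetical setup: three agents with n_m = 15, 25, 40 samples and a shared parameter.
rng = np.random.default_rng(0)
d, lam = 4, 0.1
omega_star = np.array([0.5, -0.5, 0.3, 0.1])             # satisfies ||omega_star||_2 <= 1
agent_features, agent_rewards = [], []
for n_m in (15, 25, 40):
    phi = rng.normal(size=(n_m, d))
    phi /= np.maximum(1.0, np.linalg.norm(phi, axis=1, keepdims=True))  # ||phi|| <= 1
    agent_features.append(phi)
    agent_rewards.append(phi @ omega_star + 0.05 * rng.normal(size=n_m))

# The objective only depends on the pooled samples, so the agents' data are stacked.
features = np.concatenate(agent_features)
rewards = np.concatenate(agent_rewards)

omega_hat = ridge_minimizer(features, rewards, lam)
grad = ridge_gradient(omega_hat, features, rewards, lam)
print(omega_hat, np.linalg.norm(grad))
```

In a federated deployment the pooled data are not available at a single node; this centralized computation only serves as the reference point that an FL routine such as FedAvg aims to approximate.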

We provide an excess risk bound required in Definition C.1 through the following decomposition:

\[
\begin{aligned}
&\mathbb{E}_{\mathcal{S},\xi}\left[\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m},a_{m}}\left(\langle\widehat{\omega}_{\mathcal{S}},\phi(x_{m},a_{m})\rangle-\langle\omega^{*},\phi(x_{m},a_{m})\rangle\right)^{2}\right] \\
&\leq 2\mathbb{E}_{\mathcal{S},\xi}\left[\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m},a_{m}}\left(\langle\widehat{\omega}_{\mathcal{S}},\phi(x_{m},a_{m})\rangle-\langle\omega^{*}_{\mathcal{S}},\phi(x_{m},a_{m})\rangle\right)^{2}\right] \\
&\quad+2\mathbb{E}_{\mathcal{S},\xi}\left[\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m},a_{m}}\left(\langle\omega^{*}_{\mathcal{S}},\phi(x_{m},a_{m})\rangle-\langle\omega^{*},\phi(x_{m},a_{m})\rangle\right)^{2}\right] \\
&= 2\mathbb{E}_{\mathcal{S},\xi}\left[\left\|\widehat{\omega}_{\mathcal{S}}-\omega^{*}_{\mathcal{S}}\right\|_{\Sigma}^{2}\right] \\
&\quad+2\mathbb{E}_{\mathcal{S}}\left[\sum_{m\in[M]}\frac{n_{m}}{n}\mathbb{E}_{x_{m},a_{m}}\left(\langle\omega^{*}_{\mathcal{S}},\phi(x_{m},a_{m})\rangle-\langle\omega^{*},\phi(x_{m},a_{m})\rangle\right)^{2}\right] \\
&\leq 2\mathbb{E}_{\mathcal{S},\xi}\left[\lambda_{\max}(\Sigma)\left\|\widehat{\omega}_{\mathcal{S}}-\omega^{*}_{\mathcal{S}}\right\|_{2}^{2}\right] \quad =:\text{term (A)}
\end{aligned}
\]
+2𝔼𝒮[m[M]nmn𝔼xm,am(ω𝒮*,ϕ(xm,am)ω*,ϕ(xm,am))2]2subscript𝔼𝒮delimited-[]subscript𝑚delimited-[]𝑀subscript𝑛𝑚𝑛subscript𝔼subscript𝑥𝑚subscript𝑎𝑚superscriptsubscriptsuperscript𝜔𝒮italic-ϕsubscript𝑥𝑚subscript𝑎𝑚superscript𝜔italic-ϕsubscript𝑥𝑚subscript𝑎𝑚2\displaystyle\quad+2\mathbb{E}_{\mathcal{S}}\left[\sum_{m\in[M]}\frac{n_{m}}{n% }\mathbb{E}_{x_{m},a_{m}}\left(\langle\omega^{*}_{\mathcal{S}},\phi(x_{m},a_{m% })\rangle-\langle\omega^{*},\phi(x_{m},a_{m})\rangle\right)^{2}\right]+ 2 blackboard_E start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT [ ∑ start_POSTSUBSCRIPT italic_m ∈ [ italic_M ] end_POSTSUBSCRIPT divide start_ARG italic_n start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG blackboard_E start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( ⟨ italic_ω start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT , italic_ϕ ( italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ⟩ - ⟨ italic_ω start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT , italic_ϕ ( italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ⟩ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] =:term (B)\displaystyle=:\text{term (B)}= : term (B)

where

$$
\Sigma:=\sum_{m\in[M]}\frac{n_m}{n}\mathbb{E}_{x_m,a_m}\left[\phi(x_m,a_m)\phi(x_m,a_m)^{\top}\right]
$$

and $\lambda_{\max}(\Sigma)$ denotes the maximum eigenvalue of $\Sigma$. With $\|\phi(x,a)\|_2\leq 1$, it can be verified that $\lambda_{\max}(\Sigma)\leq 1$. In the above decomposition, term (A) can be interpreted as the optimization error, while term (B) is the generalization error.
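The bound on $\lambda_{\max}(\Sigma)$ can be checked in one line. The following is a short sketch, assuming (as the weights $n_m/n$ suggest) that $n=\sum_{m\in[M]}n_m$; the unit vector $v$ is introduced here only for illustration. For any $v$ with $\|v\|_2=1$, the Cauchy-Schwarz inequality gives

$$
v^{\top}\Sigma v=\sum_{m\in[M]}\frac{n_m}{n}\mathbb{E}_{x_m,a_m}\left[\langle v,\phi(x_m,a_m)\rangle^{2}\right]\leq\sum_{m\in[M]}\frac{n_m}{n}\mathbb{E}_{x_m,a_m}\left[\|v\|_2^{2}\,\|\phi(x_m,a_m)\|_2^{2}\right]\leq\sum_{m\in[M]}\frac{n_m}{n}=1,
$$

and taking the supremum over unit vectors $v$ yields $\lambda_{\max}(\Sigma)\leq 1$.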

We can then plug the aforementioned explicit formula of $\omega^{*}_{\mathcal{S}}$ into term (B) and demonstrate that $\text{term (B)}=\tilde{O}(d/n)$ with $\lambda=1/n$, under the assumption that $\|\omega^{*}\|_2\leq 1$ and $r_m\in[0,1]$ (e.g., following Theorem 9.35 in Zhang (2023)).

For the ridge regression problem in Eqn. (5), previous designs on federated linear bandits (Wang et al., 2019; Dubey & Pentland, 2020; Li & Wang, 2022a; He et al., 2022; Amani et al., 2022) typically have agents collaboratively compute the exact minimizer in Eqn. (6) by directly communicating their local reward aggregates, i.e., $\sum_{i\in[n_m]}\phi(x_m^i,a_m^i)r_m^i$, and local covariance matrices, i.e., $\sum_{i\in[n_m]}\phi(x_m^i,a_m^i)\phi(x_m^i,a_m^i)^{\top}$.
Thus, one round of agent-server communication is sufficient, in which $O(Md^2)$ real numbers are shared. However, directly sharing such compressed data is often undesirable in FL studies due to privacy concerns. We refer to this protocol as the “direct method” for simplicity in the following discussions.
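To make the direct method concrete, the following is a minimal NumPy sketch rather than the implementation of any cited work. Since Eqns. (5) and (6) are not restated here, it assumes the ridge objective $\frac{1}{2n}\sum_{m,i}(\langle\omega,\phi(x_m^i,a_m^i)\rangle-r_m^i)^2+\frac{\lambda}{2}\|\omega\|_2^2$; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def local_statistics(features, rewards):
    """Agent side: compress local data into d + d^2 real numbers."""
    reward_aggregate = features.T @ rewards   # sum_i phi_i * r_i, shape (d,)
    covariance = features.T @ features        # sum_i phi_i phi_i^T, shape (d, d)
    return reward_aggregate, covariance

def server_ridge_solution(statistics, n, lam):
    """Server side: add up the agents' statistics and solve the ridge system."""
    total_reward = sum(b for b, _ in statistics)
    total_cov = sum(A for _, A in statistics)
    d = total_reward.shape[0]
    # Assumed objective: (1/(2n)) * sum (<w, phi> - r)^2 + (lam/2) * ||w||^2.
    return np.linalg.solve(total_cov / n + lam * np.eye(d), total_reward / n)

# Synthetic data (illustrative only): M = 3 agents sharing one true parameter.
rng = np.random.default_rng(0)
d, sizes = 4, [30, 50, 20]
omega_true = rng.normal(size=d)
omega_true /= np.linalg.norm(omega_true)

datasets = []
for n_m in sizes:
    phi = rng.normal(size=(n_m, d))
    phi /= np.linalg.norm(phi, axis=1, keepdims=True)   # enforce ||phi||_2 = 1
    r = phi @ omega_true + 0.1 * rng.normal(size=n_m)
    datasets.append((phi, r))

n = sum(sizes)
lam = 1.0 / n
omega_direct = server_ridge_solution(
    [local_statistics(p, r) for p, r in datasets], n, lam
)

# Reference: the same ridge problem solved on the pooled data.
phi_all = np.vstack([p for p, _ in datasets])
r_all = np.concatenate([r for _, r in datasets])
omega_pooled = np.linalg.solve(
    phi_all.T @ phi_all / n + lam * np.eye(d), phi_all.T @ r_all / n
)
```

Each agent uploads one $d$-vector and one $d\times d$ matrix, which matches the $O(Md^2)$ count above, and the server output coincides with the pooled-data solution.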

With the flexible choice of FL protocol in FedIGW, many other efficient optimization algorithms can be accommodated. In particular, a distributed version of accelerated gradient descent (AGD) (Nesterov, 2003) takes only $O(\sqrt{\kappa}\log(1/\varepsilon'))$ rounds of gradient communication to reach an optimization error of $\varepsilon'$, where $\kappa$ is the condition number (i.e., the ratio between the smoothness and strong-convexity parameters of the considered problem). With $\lambda=1/n$, it holds that $\kappa=O(n)$; thus $O(\sqrt{n}\log(d/n))$ rounds of gradient communication are sufficient to obtain an optimization error of order $\tilde{O}(d/n)$, where each agent's gradient is $d$-dimensional.
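The distributed AGD protocol can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: it uses the constant-momentum form of Nesterov's method for strongly convex objectives, assumes the ridge objective $\frac{1}{2n}\sum_{m,i}(\langle\omega,\phi_m^i\rangle-r_m^i)^2+\frac{\lambda}{2}\|\omega\|_2^2$, and uses $1+\lambda$ as a smoothness bound (valid when $\|\phi\|_2\leq 1$) and $\lambda$ as a strong-convexity lower bound. All names and the synthetic data are hypothetical.

```python
import numpy as np

def local_gradient_sum(features, rewards, omega):
    """Agent side: d real numbers, sum_i phi_i * (<phi_i, omega> - r_i)."""
    return features.T @ (features @ omega - rewards)

def distributed_agd(datasets, lam, rounds):
    """Server loop: one d-dimensional gradient per agent per round."""
    n = sum(len(r) for _, r in datasets)
    d = datasets[0][0].shape[1]
    smooth = 1.0 + lam   # smoothness bound, assuming ||phi||_2 <= 1
    strong = lam         # lower bound on the strong-convexity parameter
    kappa = smooth / strong
    momentum = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    omega = np.zeros(d)
    omega_prev = np.zeros(d)
    for _ in range(rounds):
        lookahead = omega + momentum * (omega - omega_prev)
        grad = sum(local_gradient_sum(p, r, lookahead) for p, r in datasets) / n
        grad = grad + lam * lookahead
        omega_prev, omega = omega, lookahead - grad / smooth
    return omega

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
d, sizes = 4, [30, 50, 20]
omega_true = rng.normal(size=d)
omega_true /= np.linalg.norm(omega_true)
datasets = []
for n_m in sizes:
    phi = rng.normal(size=(n_m, d))
    phi /= np.linalg.norm(phi, axis=1, keepdims=True)
    datasets.append((phi, phi @ omega_true + 0.1 * rng.normal(size=n_m)))

n = sum(sizes)
lam = 1.0 / n
omega_agd = distributed_agd(datasets, lam, rounds=800)

# Closed-form minimizer of the same objective, for comparison.
phi_all = np.vstack([p for p, _ in datasets])
r_all = np.concatenate([r for _, r in datasets])
omega_exact = np.linalg.solve(
    phi_all.T @ phi_all / n + lam * np.eye(d), phi_all.T @ r_all / n
)
```

Unlike the direct method, only gradients are exchanged here, at the cost of multiple communication rounds.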

With the above illustration, the following corollary regarding the performance of FedIGW with a linear reward function class is a straightforward extension of Theorem 4.1.

Corollary D.8.

In the considered linear reward function class with shared true parameters, using the direct method or distributed AGD as the adopted FL protocol to solve the FL problem in Eqn. (5) and $\tau^{l}=2^{l}$, FedIGW obtains a regret of

$$
\textup{Reg}(T)=\tilde{O}\left(\sum_{l\in[\log_2(T)]}\sqrt{\frac{Kd}{M2^{l-1}}}\,M2^{l}\right)=\tilde{O}\left(\sqrt{MKdT}\right)
$$

and the amount of real numbers communicated can be bounded as

$$
\begin{aligned}
O\left(\sum_{l\in[\log_2(T)]}Md^{2}\right)&=O(Md^{2}\log(T)) &&\text{(using the direct method)};\\
O\left(\sum_{l\in[\log_2(T)]}Md\sqrt{M2^{l}}\log(d/(M2^{l}))\right)&=O(d\log(d)\sqrt{M^{3}T}) &&\text{(using distributed AGD)}.
\end{aligned}
$$
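The closed forms in Corollary D.8 follow from a geometric sum. A brief sketch of the arithmetic, writing $L:=\log_2(T)$ (a shorthand introduced here) and leaving logarithmic factors aside:

$$
\sqrt{\frac{Kd}{M2^{l-1}}}\,M2^{l}=\sqrt{2KdM}\,2^{l/2},\qquad\sum_{l=1}^{L}2^{l/2}=\frac{\sqrt{2}\,(2^{L/2}-1)}{\sqrt{2}-1}\leq\frac{\sqrt{2}}{\sqrt{2}-1}\sqrt{T},
$$

so the regret sum is at most $\frac{2}{\sqrt{2}-1}\sqrt{KdMT}=O(\sqrt{MKdT})$. The same geometric sum gives $\sum_{l=1}^{L}Md\sqrt{M2^{l}}=d\sqrt{M^{3}}\sum_{l=1}^{L}2^{l/2}=O(d\sqrt{M^{3}T})$ for distributed AGD, before the logarithmic factor is accounted for.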

Appendix E Details of Section 6

E.1 Personalized Learning: Details of Section 6.1

Additional details for the personalized learning setting in Section 6.1 are discussed here. In particular, the overall algorithm structure still follows Algorithm 1, while the major difference is that a personalized FL problem is considered:

$$
\min_{\omega^{\alpha},\omega^{\beta}_{[M]}}\widehat{\mathcal{L}}(f_{\omega^{\alpha},\omega^{\beta}_{[M]}};\mathcal{S}_{[M]}):=\sum_{m\in[M]}\frac{n_m}{n}\widehat{\mathcal{L}}_{m}(f_{\omega^{\alpha},\omega^{\beta}_{m}};\mathcal{S}_{m}),
$$

where

$$
\widehat{\mathcal{L}}_{m}(f_{\omega^{\alpha},\omega^{\beta}_{m}};\mathcal{S}_{m}):=\frac{1}{n_m}\sum_{i\in[n_m]}\ell_{m}(f_{\omega^{\alpha},\omega^{\beta}_{m}}(x_m^i,a_m^i);r_m^i).
$$

Furthermore, to bound the generalization error, similar to the Rademacher complexity in Eqn. (4), a slightly different Rademacher complexity is introduced as

$$
\mathfrak{P}(\mathcal{F}_{[M]};n_{[M]})=\sup\left\{\mathbb{E}_{\mathcal{S},\bm{\sigma}}\left[\sup_{\omega^{\alpha},\omega^{\beta}_{[M]}}\left\{\sum_{m\in[M]}\frac{1}{n}\sum_{i\in[n_m]}\sigma_{m,i}\cdot\ell_{m}(f_{\omega_{m}}(x_m^i,a_m^i);r_m^i)\right\}\right]\right\},
$$

which is suitable for the considered personalized setting with parameters $[\omega^{\alpha},\omega^{\beta}_{[M]}]$ involved. A similar notation is also adopted in Mohri et al. (2019).

The following corollary can then be established for the personalized version of FedIGW with the LSGD-PFL algorithm (Hanzely et al., 2021) adopted to solve the personalized FL task.

Corollary E.1.

Under the conditions of Lemmas 4.3 and F.7, with LSGD-PFL as the adopted personalized FL protocol, FedIGW incurs a regret of

$$
\textup{Reg}(T)=O\left(ME^{1}+\sum_{l\in[2,l(T)]}\sqrt{K\mathfrak{P}^{l-1}/\mu_{f}}\,ME^{l}\right)
$$

with

$$
\tilde{O}\left(\sum_{l\in[l(T)]}\max\{\beta_{\omega^{\beta}}(\kappa^{l})^{-1},\beta_{\omega^{\alpha}}\}\mu^{-1}_{\omega}+\sigma^{2}_{b}(\mu_{\omega}\kappa^{l}M\mathfrak{P}^{l})^{-1}+\sqrt{\beta_{\omega^{\alpha}}(G^{2}+\sigma^{2})(\mu_{\omega}^{2}\mathfrak{P}^{l})^{-1}}\right)
$$

rounds of communications, where $\mathfrak{P}^{l}:=\mathfrak{P}(\mathcal{F}_{[M]},\{E^{l}:m\in[M]\})$ and $\kappa^{l}$ is the number of local updates in epoch $l$.

The proof largely follows that of Corollary D.7: decomposing the excess risk into generalization and optimization errors; using the Rademacher complexity to characterize the generalization error; using FL convergence analyses to characterize the optimization error; and combining them such that the optimization error does not dominate the generalization error. As the LSGD-PFL protocol (Hanzely et al., 2021) is adopted to solve the personalized FL task as an illustration, its corresponding convergence analysis, presented in Lemma F.7, is incorporated.

E.1.1 A Linear Reward Function Class

As an extension of the linear reward function in Appendix D.3, we consider that

$$
\mu_{m}(x_m,a_m)=\langle\phi(x_m,a_m),\omega^{*}_{m}\rangle,\qquad\forall m\in[M],\ (x_m,a_m)\in\mathcal{X}_{m}\times\mathcal{A}_{m},
$$

and the true model parameters $\{\omega^{*}_{m}:m\in[M]\}$ follow Assumption 6.2, i.e., $\omega^{*}_{m}=[\omega^{*,\alpha},\omega^{*,\beta}_{m}]$ with $\omega^{*,\alpha}$ shared among all agents.

It can be further realized that the above problem setting is identical to a $\tilde{d}$-dimensional linear system, where $\tilde{d}:=d^{\alpha}+\sum_{m\in[M]}d^{\beta}_{m}$: the overall true model parameter is

$$
\tilde{\omega}^{*}=\left[\omega^{*,\alpha},\omega^{*,\beta}_{1},\cdots,\omega^{*,\beta}_{M}\right]\in\mathbb{R}^{\tilde{d}},
$$

and a corresponding feature mapping $\tilde{\phi}(\cdot)$ is

$$
\tilde{\phi}(x_m,a_m)=\left[\phi(x_m,a_m)_{[1:d^{\alpha}]},\bm{O}_{d^{\beta}_{1}},\cdots,\bm{O}_{d^{\beta}_{m-1}},\phi(x_m,a_m)_{[d^{\alpha}+1:d_{m}]},\bm{O}_{d^{\beta}_{m+1}},\cdots,\bm{O}_{d^{\beta}_{M}}\right],
$$

i.e., an expanded version of the original feature, where $\phi(x_m,a_m)_{[i:j]}\in\mathbb{R}^{j-i+1}$ denotes the sub-vector containing the $i$-th through $j$-th elements of $\phi(x_m,a_m)$ and $\bm{O}_{i}\in\mathbb{R}^{i}$ is an $i$-dimensional null vector.
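The expanded feature mapping can be illustrated with a small NumPy sketch. The helper name `expand_feature`, the 0-based agent index, and the toy dimensions are assumptions made for illustration; the check confirms that the inner product in the $\tilde{d}$-dimensional system reproduces agent $m$'s original expected reward.

```python
import numpy as np

def expand_feature(phi, m, d_alpha, d_betas):
    """Embed agent m's feature (length d_alpha + d_betas[m]) into R^{d_tilde}.

    The agent index m is 0-based in this sketch.
    """
    d_tilde = d_alpha + sum(d_betas)
    phi_tilde = np.zeros(d_tilde)
    phi_tilde[:d_alpha] = phi[:d_alpha]                   # shared block
    start = d_alpha + sum(d_betas[:m])                    # offset of agent m's block
    phi_tilde[start:start + d_betas[m]] = phi[d_alpha:]   # agent-specific block
    return phi_tilde

# Toy dimensions (illustrative only): one shared block and three private blocks.
rng = np.random.default_rng(0)
d_alpha, d_betas = 2, [1, 3, 2]
omega_alpha = rng.normal(size=d_alpha)
omega_betas = [rng.normal(size=d_b) for d_b in d_betas]
omega_tilde = np.concatenate([omega_alpha] + omega_betas)

matches = []
for m, d_b in enumerate(d_betas):
    phi = rng.normal(size=d_alpha + d_b)
    omega_m = np.concatenate([omega_alpha, omega_betas[m]])
    expanded_reward = expand_feature(phi, m, d_alpha, d_betas) @ omega_tilde
    matches.append(bool(np.isclose(expanded_reward, phi @ omega_m)))
```

All coordinates belonging to other agents are zero, so each agent's data only informs the shared block and its own private block.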

With this reformulated problem, discussions from Appendix D.3 can be directly leveraged. In particular, Corollary D.8 implies the following result.

Corollary E.2.

In the considered linear reward function class with partially shared true parameters, using distributed AGD as the adopted FL protocol to solve the FL problem in Eqn. (5) with the reformulated feature mapping $\tilde{\phi}(\cdot)$ and $\tau^{l}=2^{l}$, FedIGW incurs a regret of

$$
\textup{Reg}(T)=\tilde{O}\left(\sqrt{MK\tilde{d}T}\right)
$$

and the amount of real numbers communicated can be bounded as $O(d^{\alpha}\log(d^{\alpha})\sqrt{M^{3}T})$.

E.2 Robustness, Privacy, and Beyond: Details of Section 6.2

We here provide some additional discussions on incorporating FL appendages to provide robustness and privacy guarantees for FedIGW, among other directions, e.g., fairness guarantees (Mohri et al., 2019; Du et al., 2021), client selection (Balakrishnan et al., 2022; Fraboni et al., 2021), and practical communication designs (Chen et al., 2021; Wei & Shen, 2022; Zheng et al., 2020). The key is that as long as an FL protocol can provide an estimated function $\widehat{f}$ (which is used in IGW interactions), it can be adopted in FedIGW; thus the desirable properties of the selected FL protocol are naturally inherited by FedIGW.

For example, Yin et al. (2018); Pillutla et al. (2022); Fu et al. (2019); Li et al. (2021); Zhu et al. (2023) studied how to handle malicious agents, who can deviate arbitrarily from the FL protocol and tamper with their updates during learning. The commonly adopted approach is to invoke certain robust estimators (e.g., median and trimmed mean). Under suitable assumptions, existing approaches have shown that as long as the proportion of malicious agents does not exceed a threshold (typically, $1/2$), the estimators calculated by federation can still converge, up to a certain amount of error caused by the malicious agents. A recent work (Zhu et al., 2023) provides a summary of convergence rates with different robust estimators, which can be leveraged to establish theoretical understandings of FedIGW with robustness.
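The two robust estimators mentioned above can be sketched as server-side aggregation rules. This is a generic NumPy illustration, not the procedure of any specific cited paper; the function names, the `num_trim` parameter, and the toy updates are hypothetical.

```python
import numpy as np

def coordinate_median(updates):
    """Coordinate-wise median over the agents' updates (rows)."""
    return np.median(updates, axis=0)

def trimmed_mean(updates, num_trim):
    """Per coordinate, drop the num_trim smallest and largest values, then average."""
    num_agents = updates.shape[0]
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[num_trim:num_agents - num_trim].mean(axis=0)

# Toy example: 8 honest agents near the target, 2 malicious agents far away.
rng = np.random.default_rng(0)
target = np.ones(3)
honest = target + 0.01 * rng.normal(size=(8, 3))
malicious = np.full((2, 3), 1000.0)
updates = np.vstack([honest, malicious])

plain_mean = updates.mean(axis=0)                  # pulled far away by the outliers
median_agg = coordinate_median(updates)            # stays close to the target
trimmed_agg = trimmed_mean(updates, num_trim=2)    # stays close to the target
```

In this toy run the plain average is dominated by the two tampered updates, while both robust rules remain near the honest agents' values.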

On the privacy side, many mechanisms have also been studied in FL (Wei et al., 2020; Yin et al., 2021; Liu et al., 2022) to guarantee differential privacy (DP), where the most common approach is to inject noise of a suitable scale. Convergence rates have also been established under suitable assumptions, e.g., in Wei et al. (2020); Girgis et al. (2021); Wei et al. (2021). With those analyses, the theoretical behavior of FedIGW with DP can be established similarly to Corollaries D.7 and E.1.
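A minimal sketch of the noise-injection idea is given below: each agent clips its update to a bounded L2 norm and adds Gaussian noise before communication. The clipping step, the function names, and the parameters `clip_norm` and `noise_multiplier` are assumptions made for illustration; calibrating the noise to a target privacy budget is outside the scope of this sketch.

```python
import math
import random


def clip_by_l2_norm(update, clip_norm):
    """Rescale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add independent Gaussian noise per coordinate.
    The noise standard deviation is noise_multiplier * clip_norm."""
    clipped = clip_by_l2_norm(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]


rng = random.Random(0)
raw_update = [3.0, 4.0]  # hypothetical local update with L2 norm 5
clipped = clip_by_l2_norm(raw_update, clip_norm=1.0)
noisy = privatize(raw_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Larger values of `noise_multiplier` give stronger privacy at the cost of a noisier aggregate, which is the trade-off the cited convergence analyses quantify.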

Appendix F Algorithm Sketches and Convergence Analyses of FL Designs

F.1 FedAvg

The FedAvg algorithm (McMahan et al., 2017) is one of the most standard and widely adopted FL protocols. Following it, agents perform local stochastic gradient descent (SGD) on their local objective functions for a certain number of steps and then communicate the updated local models to the server; the server aggregates the local models into a global one via a weighted average, which is then communicated back to the agents for further local SGD steps.
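The procedure described above can be sketched as follows for a toy linear regression problem. This is a simplified illustration, not the implementation used in the experiments: the squared loss, the noiseless synthetic data, the weighting by local dataset size, and all names and hyperparameters are assumptions.

```python
import random


def local_sgd(w, data, lr, steps, rng):
    """Run `steps` single-sample SGD updates on the loss 0.5 * (w.x - y)^2."""
    w = list(w)
    for _ in range(steps):
        x, y = rng.choice(data)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fedavg(datasets, dim, rounds, local_steps, lr, seed=0):
    """Alternate local SGD on every agent with weighted server averaging."""
    rng = random.Random(seed)
    w_global = [0.0] * dim
    total = sum(len(d) for d in datasets)
    for _ in range(rounds):
        # Each agent starts from the current global model.
        local_models = [
            local_sgd(w_global, d, lr, local_steps, rng) for d in datasets
        ]
        # The server averages local models, weighted by local dataset size.
        w_global = [
            sum(len(d) * w[j] for d, w in zip(datasets, local_models)) / total
            for j in range(dim)
        ]
    return w_global


def make_data(n, true_w, rng):
    """Hypothetical noiseless regression data y = true_w . x."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
    return data


true_w = [2.0, -1.0]
data_rng = random.Random(1)
datasets = [make_data(n, true_w, data_rng) for n in (20, 40, 30)]
w_hat = fedavg(datasets, dim=2, rounds=50, local_steps=10, lr=0.1)
```

Here `rounds` plays the role of the number of communications and `local_steps` the number of local updates between two communications.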

Many theoretical analyses have been provided for FedAvg (e.g., Li et al. (2020b)). We adopt the one from Karimireddy et al. (2020) in the following.

Lemma F.1 (Theorem V in Karimireddy et al. (2020) without client sampling).

For any dataset $\mathcal{S}$, if

  • $\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}_{m})$ is $\mu_{\omega}$-strongly convex w.r.t. $\omega$ (see Definition F.2) for all $m\in[M]$;

  • $\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}_{m})$ is $\beta_{\omega}$-smooth w.r.t. $\omega$ (see Definition F.3) for all $m\in[M]$;

  • the stochastic gradients are unbiased and have a $\sigma^{2}_{b}$-bounded variance (see Definition F.4);

  • the gradients have $G_{b}$-bounded dissimilarity (see Definition F.5),

with FedAvg as the adopted FL protocol, the output $\widehat{\omega}$ satisfies that

$$\mathbb{E}_{\xi}[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})-\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})\mid\mathcal{S}]\leq\tilde{O}\left(\frac{\sigma_{b}^{2}}{\mu_{\omega}\rho\kappa M}+\frac{\beta_{\omega}G_{b}^{2}}{\mu_{\omega}^{2}\rho^{2}}+\mu_{\omega}\|\omega^{0}-\omega^{*}_{\mathcal{S}}\|_{2}^{2}\exp\left(-\frac{\mu_{\omega}\rho}{16\beta_{\omega}}\right)\right)$$

when $\rho\geq\frac{8\beta_{\omega}}{\mu_{\omega}}$, where $\rho$ denotes the number of communication rounds (i.e., the number of global aggregations), $\kappa$ is the number of local updates (i.e., SGD steps) between two consecutive communications, and $\omega^{0}$ is the initialization. Note that the last term, which decays exponentially w.r.t. $\rho$, is omitted in Corollary D.6 and the following derivations for simplicity.
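For reference, dropping the exponentially decaying term as described in the note above leaves the following simplified form of the bound in Lemma F.1, written in the same notation:

$$\mathbb{E}_{\xi}[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})-\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})\mid\mathcal{S}]\leq\tilde{O}\left(\frac{\sigma_{b}^{2}}{\mu_{\omega}\rho\kappa M}+\frac{\beta_{\omega}G_{b}^{2}}{\mu_{\omega}^{2}\rho^{2}}\right)$$

The first term decreases with the total number of stochastic gradient steps $\rho\kappa M$ across agents, while the second term depends on the dissimilarity $G_{b}$ and can only be reduced by increasing the number of communications $\rho$.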

A few definitions used above are made precise in the following, which are inherited from Karimireddy et al. (2020) and presented here for completeness:

Definition F.2 (Strongly Convex).

$\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S})$ is $\mu_{\omega}$-strongly convex w.r.t. $\omega$ for $\mu_{\omega}>0$ if

$$\widehat{\mathcal{L}}_{m}(f_{\omega^{\prime}};\mathcal{S})-\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S})\geq\left\langle\nabla_{\omega}\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}),\omega^{\prime}-\omega\right\rangle+\frac{\mu_{\omega}}{2}\left\|\omega^{\prime}-\omega\right\|_{2}^{2},\quad\text{for any $\omega$ and $\omega^{\prime}$.}$$

Definition F.3 (Smooth).

$\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S})$ is $\beta_{\omega}$-smooth w.r.t. $\omega$ for $\beta_{\omega}>0$ if

$$\widehat{\mathcal{L}}_{m}(f_{\omega^{\prime}};\mathcal{S})-\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S})\leq\left\langle\nabla_{\omega}\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}),\omega^{\prime}-\omega\right\rangle+\frac{\beta_{\omega}}{2}\left\|\omega^{\prime}-\omega\right\|_{2}^{2},\quad\text{for any $\omega$ and $\omega^{\prime}$.}$$

Definition F.4 (Stochastic Gradients with Bounded Variances).

The stochastic gradients have a $\sigma^{2}_{b}$-bounded variance if

$$\frac{1}{n_{m}}\sum_{i\in[n_{m}]}\left\|\nabla_{\omega}\ell_{m}(f_{\omega}(x_{m}^{i},a_{m}^{i});r_{m}^{i})-\nabla_{\omega}\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}_{m})\right\|_{2}^{2}\leq\sigma_{b}^{2},\quad\text{for any $\omega$ and $m$.}$$

Definition F.5 (Gradients with Bounded Dissimilarity).

The gradients have a $G_{b}$-bounded dissimilarity if

$$\frac{1}{M}\sum_{m\in[M]}\left\|\nabla_{\omega}\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}_{m})\right\|_{2}^{2}\leq G_{b}^{2},\quad\text{for any $\omega$.}$$
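To make Definitions F.4 and F.5 concrete, the sketch below evaluates their left-hand sides at one fixed parameter for a toy linear model. The per-sample loss $0.5(\omega^{\top}x-y)^{2}$, the two-agent data, and all function names are assumptions for illustration. Since the definitions require the inequalities for every $\omega$, the values computed at a single point only give lower bounds on admissible $\sigma_{b}^{2}$ and $G_{b}^{2}$.

```python
def grad_sample(w, x, y):
    """Gradient w.r.t. w of the per-sample loss 0.5 * (w.x - y)^2."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def grad_local(w, data):
    """Gradient of the local empirical loss (mean of per-sample losses)."""
    g = [0.0] * len(w)
    for x, y in data:
        g = [a + b / len(data) for a, b in zip(g, grad_sample(w, x, y))]
    return g


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(a * a for a in v)


def variance_at(w, data):
    """Left-hand side of Definition F.4 for one agent at a fixed w."""
    g_bar = grad_local(w, data)
    total = 0.0
    for x, y in data:
        diff = [a - b for a, b in zip(grad_sample(w, x, y), g_bar)]
        total += sq_norm(diff)
    return total / len(data)


def dissimilarity_at(w, datasets):
    """Left-hand side of Definition F.5 at a fixed w."""
    return sum(sq_norm(grad_local(w, d)) for d in datasets) / len(datasets)


# Hypothetical data for two agents, each with two (x, y) samples.
agent_1 = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
agent_2 = [([1.0, 1.0], 0.0), ([1.0, -1.0], 2.0)]
w = [0.0, 0.0]

var_1 = variance_at(w, agent_1)
var_2 = variance_at(w, agent_2)
dissim = dissimilarity_at(w, [agent_1, agent_2])
```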

F.2 SCAFFOLD

The SCAFFOLD algorithm, proposed in Karimireddy et al. (2020), enhances FedAvg by leveraging variance reduction to correct drifts in heterogeneous agents’ local updates. The following result is established in Karimireddy et al. (2020) to characterize the convergence of the SCAFFOLD protocol.
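The drift correction can be sketched with per-agent and server control variates, following the update rules of Karimireddy et al. (2020) with full participation. To keep the example deterministic, exact local gradients replace stochastic ones, and the objectives are hypothetical scalar quadratics; both are simplifying assumptions, as are the function names and step sizes.

```python
def scaffold(local_grads, w0, rounds, local_steps, lr, global_lr=1.0,
             use_control=True):
    """SCAFFOLD-style training with control variates (full participation).
    With use_control=False the correction is skipped, which reduces to
    FedAvg with local gradient steps."""
    num_agents = len(local_grads)
    w = w0
    c = 0.0                         # server control variate
    c_local = [0.0] * num_agents    # per-agent control variates
    for _ in range(rounds):
        delta_w, delta_c = [], []
        for m, grad in enumerate(local_grads):
            correction = (c - c_local[m]) if use_control else 0.0
            y = w
            for _ in range(local_steps):
                y -= lr * (grad(y) + correction)
            # Control variate refresh from the average local progress.
            c_new = c_local[m] - c + (w - y) / (local_steps * lr)
            delta_w.append(y - w)
            delta_c.append(c_new - c_local[m])
            c_local[m] = c_new
        w += global_lr * sum(delta_w) / num_agents
        c += sum(delta_c) / num_agents
    return w


# Hypothetical heterogeneous objectives 0.5 * a * (w - b)^2 for two agents.
local_grads = [
    lambda w: 1.0 * (w - 0.0),
    lambda w: 3.0 * (w - 4.0),
]
# Minimizer of the average objective: (1*0 + 3*4) / (1 + 3) = 3.
w_scaffold = scaffold(local_grads, w0=0.0, rounds=50, local_steps=5, lr=0.1)
w_plain = scaffold(local_grads, w0=0.0, rounds=50, local_steps=5, lr=0.1,
                   use_control=False)
```

In this toy problem, the run without control variates settles at a point biased by the agents' differing curvatures, whereas the corrected run approaches the minimizer of the average objective.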

Lemma F.6 (Theorem VII in Karimireddy et al. (2020) without client sampling).

For any dataset $\mathcal{S}$, if

  • $\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}_{m})$ is $\mu_{\omega}$-strongly convex w.r.t. $\omega$ (see Definition F.2) for all $m\in[M]$;

  • $\widehat{\mathcal{L}}_{m}(f_{\omega};\mathcal{S}_{m})$ is $\beta_{\omega}$-smooth w.r.t. $\omega$ (see Definition F.3) for all $m\in[M]$;

  • the stochastic gradients are unbiased and have a $\sigma^{2}_{b}$-bounded variance (see Definition F.4),

with SCAFFOLD as the adopted FL protocol, the output $\widehat{\omega}$ satisfies that

$$\mathbb{E}_{\xi}[\widehat{\mathcal{L}}(f_{\widehat{\omega}_{\mathcal{S}}};\mathcal{S})-\widehat{\mathcal{L}}(f_{\omega^{*}_{\mathcal{S}}};\mathcal{S})\mid\mathcal{S}]\leq\tilde{O}\left(\frac{\sigma_{b}^{2}}{\mu_{\omega}\rho\kappa M}+\mu_{\omega}\tilde{D}^{2}\exp\left(-\min\left\{\frac{\rho}{30},\frac{\mu_{\omega}\rho}{162\beta_{\omega}}\right\}\right)\right)$$

when $\rho\geq\max\{\frac{162\beta_{\omega}}{\mu_{\omega}},30\}$, where $\rho$ denotes the number of communication rounds (i.e., the number of global aggregations), $\kappa$ is the number of local updates (i.e., SGD steps) between two consecutive communications, and $\tilde{D}^{2}$ is a distance measure w.r.t. the initialization defined in Karimireddy et al. (2020). Note that the last term, which decays exponentially w.r.t. $\rho$, is omitted in Corollary D.6 and the following derivations for simplicity.
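To compare Lemmas F.1 and F.6 numerically, the sketch below evaluates their leading terms after the exponentially decaying terms are dropped. Constants and logarithmic factors hidden by $\tilde{O}$ are ignored, and the parameter values are hypothetical, so the outputs indicate scaling only, not actual error levels.

```python
def fedavg_leading_terms(sigma_b, G_b, mu, beta, rho, kappa, M):
    """Leading terms of the FedAvg bound (Lemma F.1), up to constants/logs."""
    statistical = sigma_b ** 2 / (mu * rho * kappa * M)
    heterogeneity = beta * G_b ** 2 / (mu ** 2 * rho ** 2)
    return statistical + heterogeneity


def scaffold_leading_term(sigma_b, mu, rho, kappa, M):
    """Leading term of the SCAFFOLD bound (Lemma F.6), up to constants/logs."""
    return sigma_b ** 2 / (mu * rho * kappa * M)


# Hypothetical problem constants; rho satisfies both lemmas' conditions
# (rho >= 8 * beta / mu = 32 and rho >= max(162 * beta / mu, 30) = 648).
params = dict(sigma_b=1.0, mu=0.5, rho=1000, kappa=10, M=10)
fedavg_value = fedavg_leading_terms(G_b=2.0, beta=2.0, **params)
scaffold_value = scaffold_leading_term(**params)
```

The gap between the two values is exactly the dissimilarity term involving $G_{b}$, which is absent from the SCAFFOLD bound.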

F.3 LSGD-PFL

The LSGD-PFL protocol, summarized in Hanzely et al. (2021), is a general design for personalized federated learning problems. It largely follows FedAvg (McMahan et al., 2017), except that only the globally shared parameters are communicated and aggregated. The following lemma is provided in Hanzely et al. (2021) to characterize the convergence of LSGD-PFL.
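A minimal sketch of this idea is shown below for a toy model with one shared parameter (a slope) and one personal parameter per agent (an intercept). Only the shared parameter is averaged across agents; the personal parameters never leave their agents. The model, the noiseless data, the unweighted average, and all names are assumptions for illustration, not the design analyzed in Hanzely et al. (2021).

```python
import random


def lsgd_pfl(datasets, rounds, local_steps, lr, seed=0):
    """Local SGD on shared and personal parameters; only the shared
    parameter is communicated and averaged."""
    rng = random.Random(seed)
    shared = 0.0
    personal = [0.0] * len(datasets)
    for _ in range(rounds):
        local_shared = []
        for m, data in enumerate(datasets):
            a, b = shared, personal[m]
            for _ in range(local_steps):
                x, y = rng.choice(data)
                err = a * x + b - y       # residual of the model a * x + b
                a -= lr * err * x         # step on the shared parameter
                b -= lr * err             # step on the personal parameter
            local_shared.append(a)
            personal[m] = b               # kept locally, never aggregated
        shared = sum(local_shared) / len(local_shared)
    return shared, personal


# Hypothetical data: common slope 2.0, agent-specific intercepts.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
intercepts = [1.0, -1.0, 0.5]
datasets = [[(x, 2.0 * x + b) for x in grid] for b in intercepts]

shared_hat, personal_hat = lsgd_pfl(datasets, rounds=100, local_steps=5, lr=0.1)
```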

Lemma F.7 (Theorem 1 in Hanzely et al. (2021)).

For any dataset $\mathcal{S}$, if

  • $\widehat{\mathcal{L}}_{m}(f_{\omega_{m}};\mathcal{S}_{m})$ is $\mu_{\omega}$-strongly convex w.r.t. $\omega_{m}$ (see Definition F.2) for all $m\in[M]$;

  • $\widehat{\mathcal{L}}_{m}(f_{\omega^{\alpha},\omega_{m}^{\beta}};\mathcal{S}_{m})$ is $\beta_{\omega^{\alpha}}$-smooth w.r.t. $\omega^{\alpha}$ and $M\beta_{\omega^{\beta}}$-smooth w.r.t. $\omega^{\beta}_{m}$ (see Definition F.3) for all $m\in[M]$;

  • the stochastic gradients w.r.t. $\omega^{\alpha}$ are unbiased and have a $\sigma_{b}^{2}$-bounded variance (see Definition F.4);

  • the stochastic gradients w.r.t. $\{\omega^{\beta}_{m}:m\in[M]\}$ are unbiased and have a $\sigma_{b}^{2}$-bounded variance (see Definition F.4);

  • the gradients w.r.t. $\omega$ have $G_{b}$-bounded dissimilarity (see Definition F.5),

with LSGD-PFL as the adopted FL protocol, the output $\widehat{\omega}$ has $\varepsilon_{\text{opt}}(\mathcal{F}_{[M]};n_{[M]})\leq\varepsilon^{\prime}$ after

$$\tilde{O}\left(\frac{\max\{\beta_{\omega^{\beta}}\kappa^{-1},\beta_{\omega^{\alpha}}\}}{\mu_{\omega}}+\frac{\sigma^{2}_{b}}{\mu_{\omega}\kappa M\varepsilon^{\prime}}+\frac{1}{\mu_{\omega}}\sqrt{\frac{\beta_{\omega^{\alpha}}(G^{2}+\sigma^{2})}{\varepsilon^{\prime}}}\right)$$

rounds of communications, where $\kappa$ is the number of local updates.

Appendix G Experiment Details

This section provides a comprehensive description of the experimental settings and procedures. The code and detailed instructions are included in the supplementary materials so that the experiments can be executed and the results reproduced.

Experimental details. In the experiments, the system is designed as a synchronous one, i.e., $t_{m}(t)=t,\forall m\in[M]$, and for both tasks, two-layer multi-layer perceptrons (MLPs) with a hidden layer of constant width $256$ are used to approximate the reward functions.

For practical convenience, instead of selecting a theoretically sound but sophisticated choice of $\gamma$ for FedIGW as in Theorem 4.1, we set it as a constant hyper-parameter and perform some preliminary manual selection, with the final adopted values reported in Table 5. We believe this approach is more practically appealing, as $\gamma$ does not need to be continually rescaled; a similar choice of using constant $\gamma$’s is also adopted in Agarwal et al. (2023).
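To illustrate the role of a constant $\gamma$, the sketch below computes inverse gap weighting action probabilities in the standard form used by Simchi-Levi & Xu (2022): each non-greedy action $a$ receives probability $1/(K+\gamma(\widehat{f}_{\max}-\widehat{f}(a)))$, and the greedy action receives the remaining mass. This form is assumed to match the IGW step of FedIGW; the predicted rewards are hypothetical.

```python
def igw_probabilities(predicted_rewards, gamma):
    """Inverse gap weighting over K actions with exploration parameter gamma."""
    K = len(predicted_rewards)
    best = max(range(K), key=lambda a: predicted_rewards[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (K + gamma * gap)
    # The greedy action takes whatever probability mass is left.
    probs[best] = 1.0 - sum(probs)
    return probs


# Hypothetical predicted rewards for three actions.
probs = igw_probabilities([0.50, 0.49, 0.10], gamma=7000)
uniform = igw_probabilities([0.50, 0.49, 0.10], gamma=0)
```

With a large $\gamma$ such as the value in Table 5, almost all probability concentrates on the greedy action, and actions with larger predicted gaps are explored less often; with $\gamma=0$ the distribution is uniform.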

Table 5: Hyperparameter choices for FedIGW on Bibtex and Delicious

Task      | Learning Rate | Batch Size | Communications | Parameter $\gamma$
Bibtex    | 0.1           | 64         | 100            | 7000
Delicious | 0.2           | 64         | 100            | 7000

Multiple standard FL protocols, including FedAvg (McMahan et al., 2017), SCAFFOLD (Karimireddy et al., 2020), and FedProx (Li et al., 2020a), are adopted as the FL component in FedIGW. During each FL process, the local batch size, the number of communications, and the local learning rate are specified in Table 5. Moreover, the epoch length is designed to grow exponentially as in Corollaries 4.2, D.8 and E.2, i.e., $\tau^{l}=2^{l}$, while being capped at an upper limit of $4096$ to maintain timely updates.
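The capped exponential schedule can be sketched as follows. Treating $\tau^{l}$ as the length of epoch $l$, starting the index at $l=1$, and truncating the final epoch at the horizon are assumptions of this sketch rather than details stated here.

```python
def epoch_lengths(horizon, cap=4096, first_epoch=1):
    """Epoch lengths 2^l, capped at `cap`, until `horizon` steps are covered."""
    lengths = []
    l = first_epoch
    remaining = horizon
    while remaining > 0:
        length = min(2 ** l, cap, remaining)  # the last epoch may be truncated
        lengths.append(length)
        remaining -= length
        l += 1
    return lengths


short_schedule = epoch_lengths(20)
long_schedule = epoch_lengths(20000)
```

Once the cap is reached, every subsequent epoch has the same length, so the estimated reward function keeps being refreshed at a fixed cadence.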

Additional comparisons with single-agent baselines. In Fig. 3, comparisons between FedIGW and the state-of-the-art FN-UCB are provided, demonstrating the superiority of FedIGW. Here we further report Fig. 5, containing comparisons between FedIGW and two single-agent baselines:

  • AGR. The adaptive greedy (AGR) algorithm (Chakrabarti et al., 2008) is selected as one of the single-agent baselines due to its strong empirical performance on Bibtex and Delicious reported in Cortes (2018). The algorithmic details can be found in Cortes (2018), and we also leveraged the code provided in Cortes (2018) to build this baseline.

  • FALCON. The other single-agent baseline, FALCON, is proposed in Simchi-Levi & Xu (2022), which is essentially the single-agent version of FedIGW. We still adopt the same algorithmic configurations as FedIGW (i.e., epoch length, parameter $\gamma$, local batch size, and local learning rate) except that the MLP is optimized locally instead of in a federation, i.e., there are no communications.

It can be observed that FedIGW (with $M=10$ participating agents and the basic FedAvg) can outperform the two single-agent baselines on both tasks, demonstrating the benefit of learning in a federation.

Figure 5: The averaged reward collected by each agent via FedIGW (with FedAvg and $M=10$ participating agents) and two single-agent baselines on Bibtex (left) and Delicious (right) datasets.