Enhancing Federated Learning with Adaptive Differential Privacy and Priority-Based Aggregation

1st Mahtab Talaei¹

Department of Electrical and Computer Engineering
Isfahan University of Technology
Isfahan, Iran
[email protected]
   2nd Iman Izadi
Department of Electrical and Computer Engineering
Isfahan University of Technology
Isfahan, Iran
[email protected]
Abstract

Federated learning (FL), a novel branch of distributed machine learning (ML), develops global models through a private procedure without direct access to local datasets. However, it is still possible to access the model updates (gradient updates of deep neural networks) transferred between clients and servers, potentially revealing sensitive local information to adversaries using model inversion attacks. Differential privacy (DP) offers a promising approach to addressing this issue by adding noise to the parameters. On the other hand, heterogeneities in data structure, storage, communication, and computational capabilities of devices can cause convergence problems and delays in developing the global model. A personalized weighted averaging of local parameters based on the resources of each device can yield a better aggregated model in each round. In this paper, to efficiently preserve privacy, we propose a personalized DP framework that injects noise based on clients’ relative impact factors and aggregates parameters while considering heterogeneities and adjusting properties. To fulfill the DP requirements, we first analyze the convergence boundary of the FL algorithm when impact factors are personalized and fixed throughout the learning process. We then further study the convergence property considering time-varying (adaptive) impact factors.

Index Terms:
Federated Learning, Differential Privacy, Personalized Impact Factors, Adaptive Impact Factors, Systems and Statistical Heterogeneities
¹ Mahtab Talaei was affiliated with the Department of Electrical and Computer Engineering at Isfahan University of Technology during the research for this paper. At the time of submission, she is affiliated with the Division of Systems Engineering at Boston University, Boston, USA.

I Introduction

Smart distributed systems such as smartphones, automated vehicles, multi-agent systems, and wearable devices are growing rapidly in our daily lives. Their underlying mechanisms, which combine sensing and communication, generate an unprecedented amount of data every day. Therefore, utilizing these sources of rich information to enhance services offered to the people and organizations owning the data, without violating their privacy, matters a great deal. The developments in the computational and communication capabilities of intelligent distributed devices, along with their ability to collect and store large datasets, have opened up effective alternatives for managing and analyzing local databases.

A common traditional practice to develop predictive machine learning (ML) models is to transmit raw data over networks and generate models in a centralized manner. While this method has provided data owners with valuable services throughout the years, its efficiency for today’s crowdsourced data is called into question. The communication costs of sending large volumes of data on the one hand, and privacy concerns about sharing personal information on the other, have made room for decentralized ML algorithms such as federated learning (FL) [1].

Federated machine learning is a promising solution in settings dealing with large volumes of data as well as privacy concerns about clients’ sensitive information [2]. In this framework, each device builds its model using local datasets, and the essential model parameters, rather than raw data, are transmitted to the cloud server. The server aggregates these parameters and updates the global model throughout a recursive downloading and uploading cycle [3, 4, 5]. Hence, each client benefits from a larger database during the learning process, without direct access to it. While offering great advantages over conventional ML methods, FL has its own challenges. Expensive communication, systems heterogeneity, statistical heterogeneity, and security risks are considered the four main issues when developing FL models [6].

Deep learning (DL) models are widely used in FL, especially for feature extraction in large image, voice, and text datasets. To optimize local DL models inside the clients, stochastic gradient descent (SGD) is generally adopted [7, 8, 9]. Sending frequent gradient updates, with the massive number of both parameters and clients in FL, leads to an extreme rise in communication costs. Increasing the number of local updates [10, 11] is one natural way to relieve communication bottlenecks at the price of more local computation. On the other hand, quantization [12, 13, 14] and sparsification [15, 16] methods mitigate this challenge by reducing the size of the messages transmitted in each round.

Dealing with systems that have different computational capabilities, network capacities, and power resources is an inevitable challenge in FL. Several approaches, including resource-based client selection [17, 18], robust and fault-tolerant algorithms [19], and asynchronous communications [20], address these challenges. On the other hand, heterogeneity in the data distributions of the clients causes problems in the training and convergence of FL algorithms. Using multi-task learning methods [21, 22] and avoiding local minima by adding a proximal term to the objective function [23] help handle unbalanced and non-IID data in FL [6].

Even though the idea of FL was first proposed for its strong privacy guarantees, it has been shown that local datasets can still be revealed to adversaries using model inversion attacks on shared updates [24], especially when DL is used in local models [25, 26]. To mitigate this challenge, differential privacy (DP) is one of the most widely used protection algorithms due to its solid theoretical guarantees [27, 28]. To reduce the risk of data leakage in ML algorithms, DP deliberately adds noise with a Gaussian, Laplace, or exponential distribution to the data. The work in [29] proposes a global DP algorithm in FL and gives a theoretical explanation for the convergence behavior of the suggested scheme.

As discussed earlier, nodes in a distributed architecture differ in data structure, dataset size, network condition, reliability, availability, and computation capabilities, which can even be time-varying. A privacy-preserving approach in FL is not effective unless it pays attention to these personalized characteristics. Hence, multiple works on DP labelled “adaptive” aim to compensate for systems and statistical heterogeneities in FL. These works can be divided into two general directions based on the adaptability criterion. One direction injects an adaptive noise distribution into local parameters to enhance local protection. It considers each client separately without involving their heterogeneity. For instance, adaptive clipping [30] finds the best clipping constant for DP in each device based on its local behavior. In [31], noise with a Laplace distribution is added to model updates based on the neurons’ contributions in the clients. The work in [32] achieves a trade-off between privacy and accuracy by adding more noise to less important parameters and less noise to more important ones.

The second direction, however, concentrates on personalized training in heterogeneous networks. The work in [33] trains differentially private models in each client and uploads the local updates to the server. Both directions fail to consider local characteristics in the aggregation process. More specifically, they assume the same impact factor for all devices during aggregation, regardless of their local dissimilarities. This assumption not only simplifies the convergence analysis of the algorithm but also changes the DP requirements [8]. To the best of our knowledge, the privacy and convergence analysis of FL with non-identical and time-varying impact factors has not yet been studied in the existing literature.

In this paper, we combine the heterogeneity and privacy concerns in a novel FL scheme. Regardless of multi-task learning algorithms used in FL, each local model possesses a weight or an impact in the global cost function. This impact can be assigned considering many factors by the server or the clients. It can also change (increase or decrease) or even become zero during the learning process. We, therefore, propose a DP algorithm considering the non-identical impact factors, namely, personalized aggregation in differentially private federated learning (PADPFL). We further establish the convergence analysis of the algorithm and the influence of the additive noise on it.

In summary, the main contributions of this paper are as follows.

  • We propose a noise injection paradigm, PADPFL, that satisfies DP requirements with Gaussian distribution when clients have different impact factors in the aggregation process.

  • We perform a convergence analysis of the proposed algorithm for Non-IID clients when using fixed non-identical impact factors throughout training the global model.

  • We perform a convergence analysis of the proposed algorithm for Non-IID clients when using adaptive (time-varying) impact factors throughout training the global model.

  • We conduct evaluations on real-world datasets to verify the effectiveness of PADPFL, and observe the trade-off between model accuracy, privacy budget, and impact factors.

The remainder of this paper is organized as follows. In Section II, we review some preliminaries on FL and DP. In Section III, we introduce our approach for differentially private federated learning on the client and server sides. Next, we analyse the convergence bound on the global loss function of the proposed solution for fixed and time-varying impact factors in Sections IV and V, respectively. Simulations and results are presented in Section IV, and the summary and conclusion are given in Section VI.

II Preliminaries

In this section, we briefly review some key materials of FL and DP.

II-A Federated Learning

The goal in a standard FL problem is to develop a global ML model for tens to millions of clients without direct access to their local datasets [34]. The only messages transmitted from the clients to the cloud server in this framework are the training parameter updates of the local loss functions. To formalize this goal, consider $N$ clients as depicted in Fig. 1. We wish to find the weight matrix $x$ minimizing the following loss function:

$$\min_{x} L(x), \quad \text{where} \quad L(x) := \sum_{i=1}^{N} p_i\, l_i(x, \mathcal{D}_i), \tag{1}$$

where $l_i$ and $\mathcal{D}_i$ represent the local loss function and training database of the $i$-th client, respectively. Moreover, the coefficient $p_i$ is the relative impact factor of device $i$ in the global model, so that

$$\sum_{i=1}^{N} p_i = 1, \qquad 0 \leqslant p_i \leqslant 1. \tag{2}$$

To solve (1), the matrix of global parameters at iteration $t$ is updated by weighted averaging of the trained local parameters $x_1^{(t)}, x_2^{(t)}, \dots, x_N^{(t)}$ [35]:

$$x^{(t)} := \sum_{i=1}^{N} p_i\, x_i^{(t)}. \tag{3}$$

Figure 1: An FL training model.
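As an illustration of the aggregation rule in (3) under constraint (2), the following is a minimal sketch rather than the authors' implementation. It assumes that each client's parameters have been flattened into a plain list of floats; the function name `aggregate` and the example values are hypothetical.

```python
import math

def aggregate(local_params, impact_factors):
    """Weighted average of flattened local parameter vectors, as in (3).

    Assumption: every client sends a list of floats of the same length.
    """
    if len(local_params) != len(impact_factors):
        raise ValueError("one impact factor is required per client")
    # Constraint (2): every p_i lies in [0, 1] and the factors sum to one.
    if any(p < 0.0 or p > 1.0 for p in impact_factors):
        raise ValueError("each impact factor must lie in [0, 1]")
    if not math.isclose(sum(impact_factors), 1.0, abs_tol=1e-9):
        raise ValueError("impact factors must sum to 1")
    dim = len(local_params[0])
    # Component-wise sum_i p_i * x_i.
    return [
        sum(p * x[k] for p, x in zip(impact_factors, local_params))
        for k in range(dim)
    ]

# Hypothetical example with three clients and two parameters each.
x_locals = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
p = [0.5, 0.3, 0.2]
x_global = aggregate(x_locals, p)
```

Validating (2) before averaging keeps the global model a convex combination of the local models, which is the only restriction the text places on the impact factors.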

To address heterogeneities, FedProx [23] is utilized in the learning process. Therefore, defining

$$h_i(x_i^{(t+1)}; x^{(t)}) = l_i(x_i^{(t+1)}) + \frac{\mu}{2}\lVert x_i^{(t+1)} - x^{(t)}\rVert^{2}, \quad \gamma \in [0,1], \tag{4}$$

$x_0$ is the $\gamma_i$-inexact solution for $\min_{x} h_i(x, x^{(t)})$.
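To make the proximal objective in (4) concrete, the sketch below runs plain gradient descent on $h_i$ for a toy quadratic local loss. It is an illustration under stated assumptions, not the paper's training procedure: the stopping rule uses the $\gamma$-inexactness criterion as we understand it from the FedProx paper [23], namely $\lVert \nabla h_i(x; x^{(t)}) \rVert \leq \gamma \lVert \nabla h_i(x^{(t)}; x^{(t)}) \rVert$, and the names `fedprox_local_update`, `grad_loss`, `lr`, and `max_steps` are hypothetical.

```python
import math

def fedprox_local_update(grad_loss, x_global, mu, gamma, lr=0.1, max_steps=1000):
    """Approximately minimize h_i(x; x_global) = l_i(x) + (mu/2)*||x - x_global||^2.

    grad_loss(x) must return the gradient of the local loss l_i at x.
    """
    def grad_h(x):
        # Gradient of the proximal objective: grad l_i(x) + mu * (x - x_global).
        g = grad_loss(x)
        return [gk + mu * (xk - x0k) for gk, xk, x0k in zip(g, x, x_global)]

    def norm(v):
        return math.sqrt(sum(vk * vk for vk in v))

    x = list(x_global)
    # At x = x_global the proximal term vanishes, so this equals
    # gamma * ||grad l_i(x_global)||.
    target = gamma * norm(grad_h(x))
    for _ in range(max_steps):
        g = grad_h(x)
        if norm(g) <= target:  # assumed gamma-inexactness test
            break
        x = [xk - lr * gk for xk, gk in zip(x, g)]
    return x

# Toy local loss l_i(x) = 0.5 * ||x - a||^2 with a hypothetical optimum a.
a = [2.0, -1.0]

def grad_loss(x):
    return [xk - ak for xk, ak in zip(x, a)]

x_new = fedprox_local_update(grad_loss, x_global=[0.0, 0.0], mu=1.0, gamma=0.01)
```

For this toy loss the exact minimizer of $h_i$ is $a/(1+\mu)$, so with $\mu = 1$ the local solution is pulled halfway from the local optimum $a$ back towards the global model.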

II-B Differential Privacy

DP gives a rigorous mathematical definition of privacy and strongly guarantees preserving data in ML algorithms. A randomized mechanism $\mathcal{M}$ is differentially private if its output is robust to any change of one sample in the original dataset. The following definition formally clarifies this statement for $(\epsilon,\delta)$-DP [27]:

Definition 1 ($(\epsilon,\delta)$-DP).

A randomized mechanism $\mathcal{M}:\mathcal{X}\rightarrow\mathcal{R}$ satisfies $(\epsilon,\delta)$-differential privacy for two non-negative numbers $\epsilon$ and $\delta$ if for all adjacent datasets $\mathcal{D}$ and $\mathcal{D}'$ with $d(\mathcal{D},\mathcal{D}')=1$, and for all subsets $S\subseteq\mathcal{R}$, there holds

$$\Pr[\mathcal{M}(\mathcal{D})\in S] \leqslant e^{\epsilon}\,\Pr[\mathcal{M}(\mathcal{D}')\in S] + \delta, \tag{5}$$

where the randomized algorithm $\mathcal{M}$ maps an input $x\in\mathcal{X}$ discretely to $\mathcal{M}(x)=y$ with probability $(\mathcal{M}(x))_y,\ \forall y\in\mathcal{R}$. The probability space is defined over the coin flips of the mechanism $\mathcal{M}$. Note that the difference between two datasets $\mathcal{D}$ and $\mathcal{D}'$, $d(\mathcal{D},\mathcal{D}')$, is typically defined as the number of records on which they differ.

Informally, this definition implies that the output distributions of a differentially private mechanism on two adjacent datasets differ by more than a factor of $e^{\epsilon}$ only with a probability of at most $\delta$. Thus, smaller values of $\delta$ increase the probability that this bound holds, and smaller values of $\epsilon$ tighten the bound itself. The smaller $\epsilon$ and $\delta$, the lower the risk of privacy violation.

Based on [27], considering $f$ as an arbitrary $d$-dimensional function applied on a dataset, for $\epsilon\in(0,1)$ and $c\geqslant\sqrt{2\ln(1.25/\delta)}$, a Gaussian mechanism with parameter $\sigma\geqslant c\,\Delta f/\epsilon$ that deliberately adds Gaussian noise distributed as $\mathcal{N}(0,\sigma^{2})$ to each output component of $f$ is $(\epsilon,\delta)$-differentially private. Here, $\Delta f$ is the sensitivity of the function $f$, defined by $\Delta f=\max_{\mathcal{D},\mathcal{D}'}\lVert f(\mathcal{D})-f(\mathcal{D}')\rVert$.
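The calibration rule above can be sketched in a few lines. This is a minimal illustration, assuming the smallest admissible constant $c=\sqrt{2\ln(1.25/\delta)}$ and a flattened output vector; the function names `gaussian_sigma` and `gaussian_mechanism` and the numerical values are hypothetical.

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Smallest noise scale sigma = c * sensitivity / epsilon allowed above."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the stated guarantee requires epsilon in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Assumption: take c at its lower bound sqrt(2 * ln(1.25 / delta)).
    c = math.sqrt(2.0 * math.log(1.25 / delta))
    return c * sensitivity / epsilon

def gaussian_mechanism(values, sensitivity, epsilon, delta, rng=random):
    """Add independent N(0, sigma^2) noise to each output component."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Hypothetical query output with sensitivity 0.02.
rng = random.Random(0)
sigma = gaussian_sigma(sensitivity=0.02, epsilon=0.5, delta=1e-5)
noisy = gaussian_mechanism([0.2, -0.1, 0.4], 0.02, 0.5, 1e-5, rng=rng)
```

Because $\sigma$ scales as $\Delta f/\epsilon$, halving the privacy budget $\epsilon$ doubles the noise standard deviation, which is the accuracy and privacy trade-off referred to throughout the paper.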

III Personalized Differential Privacy in Federated Learning

In this section, we propose the personalized noise injection for preserving DP. We first describe the threat model and then propose the algorithm.

III-A Threat Model and Design Goals

We consider the cloud server to be an “honest-but-curious” entity, i.e., the central server can use model inversion attacks to recover training data. Additionally, local and global parameter updates can be revealed to adversaries in the uploading and downloading channels. For this reason, the goal of our approach is to prevent both the server and external adversaries from inferring any extra information about users from the weights transmitted between the server and the clients. Preserving global privacy is the primary goal of our approach, but it also yields a level of local privacy.

Following [29], we also assume that downloading channels are exposed to more external attacks than uploading channels, as they are broadcast channels. Hence, considering $T$ aggregation times, local parameters can be revealed while uploading at most $R$ times ($R\leqslant T$).

III-B Proposed Privacy-Preserving Scheme

Considering systems and statistical heterogeneities, each client influences the global loss function ($L$) differently. Apart from multi-task learning methods that aim to develop personalized local models, the aforementioned impact factor ($p_i$) can play a vital role in training accurate models. The impact factors assigned to the clients at each iteration can strengthen or weaken the effect of local models in the global loss function.

Although using non-identical impact factors may seem a straightforward approach, its importance is underestimated in the literature. Assuming the natural setting $p_i=\frac{1}{N}$ or even $p_i=\frac{m_i}{m}$, where $m_i=|\mathcal{D}_i|$ and $m=\sum_i m_i$ is the total number of samples, is far from reality and oversimplifies the problem. In fact, clients participate in learning in different ways, as their data and structures are not the same. To compensate for these heterogeneities, we assign different impacts to clients while aggregating models. The heterogeneities between the clients can come from several sources, including:

  • data quality

  • data reliability

  • dataset size

  • link quality

  • revelation probability

  • accessibility

  • client reliability

The first three items relate to the clients, while the remaining ones depend primarily on the knowledge of the central server.

While working collaboratively with possibly millions of users, even when datasets have the same nature, the quality of information used in local models is not the same. For instance, while modelling image datasets, the resolution of data varies between the clients, and hence, training local models based on low-quality images reduces the global model performance. Moreover, the reliability of local datasets can influence the validity of models, as they may contain irrelevant information. So, users can send additional bits to the cloud server, based on local model performances, to help the server assign relevant impact factors. On the other hand, the size of local datasets does not necessarily affect impact factors directly. In Non-IID structures, we may have valuable informative datasets that are relatively small in size, but global models should be biased in favor of them. Therefore, the assumption $p_i=\frac{m_i}{m}$ would not be a wise choice.

Since distributed learning requires training and inference of models over a wireless system, uncertainty and stochasticity are inherent to it. Link errors and delays can adversely influence the convergence speed of the learning algorithm [36]. However, utilizing varying impact factors can mitigate this challenge to some extent. When client $k$ cannot synchronize itself with the others due to network faults, the server can distribute $p_k$ among the other clients for that iteration to keep pace with the algorithm. Additionally, different levels of accessibility and reliability of local parameters motivate non-identical impact factors. When several groups of IoT devices and sensor networks perform the measurements and model updates cooperatively in an FL task, weighting the updates appropriately, based on the devices’ accuracy, reliability, or accessibility, must be a priority for the server.
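The redistribution of $p_k$ described above can be sketched as follows. The text does not specify how the freed weight is shared, so this sketch assumes a proportional split among the remaining clients, which keeps constraint (2) satisfied; the function name `redistribute` and the example values are hypothetical.

```python
import math

def redistribute(impact_factors, dropped):
    """Zero the impact factors of dropped clients and rescale the rest.

    Assumption: the freed weight is shared in proportion to the remaining
    clients' current impact factors, so the result still sums to one.
    """
    remaining = sum(p for i, p in enumerate(impact_factors) if i not in dropped)
    if remaining <= 0.0:
        raise ValueError("no active client with a positive impact factor")
    return [
        0.0 if i in dropped else p / remaining
        for i, p in enumerate(impact_factors)
    ]

# Hypothetical round in which client 1 misses the synchronization deadline.
p = [0.4, 0.3, 0.2, 0.1]
p_round = redistribute(p, dropped={1})
```

Other policies, such as an equal split of $p_k$ or a split driven by link quality, would fit the same interface; the only requirement taken from the text is that the adjusted factors still satisfy (2).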

Along with all the aforementioned heterogeneity sources, impact factors can vary between global iterations. In other words, the impacts assigned to the local model weights are not fixed throughout the learning process. They can change to accommodate different situations. For instance, a client sending accurate updates may run out of battery or encounter noisy links in the middle of the learning process. Hence, a wise server should adaptively reduce the impact factor of that client's parameters to preserve the global model performance.

In this paper, we achieve $(\epsilon,\delta)$-DP using the Gaussian mechanism, which provides theoretical privacy guarantees for sharing DL model updates. Here, we calculate the amount of the client-side and server-side additive noise based on the sensitivity parameter when impact factors are non-identical. The differential privacy requirements are satisfied for each iteration, and the only limitation on impact factors is (2).

Client-side DP

Assume that local model parameters are sent to the cloud server as the updates. Setting the batch size equal to the local dataset size $\lvert\mathcal{D}_i\rvert=m_i$, the function $f$ to be protected is defined as

$$\begin{aligned} f_i(\mathcal{D}_i)\triangleq x_i &= \arg\min_{x} l_i(x,\mathcal{D}_i)\\ &= \frac{1}{m_i}\sum_{j=1}^{m_i}\arg\min_{x} l_i(x,\mathcal{D}_{i,j}),\quad \forall i \end{aligned} \tag{6}$$

By clipping the local weights using a bounding limit $B$, $\lVert x_i\rVert\leq B$ [8], the sensitivity of $f_i$ is calculated as

$$\begin{aligned} \Delta f_i &= \max_{\mathcal{D}_i,\mathcal{D}'_i}\lVert f_{i,\mathcal{D}_i}-f_{i,\mathcal{D}'_i}\rVert\\ &= \max_{\mathcal{D}_i,\mathcal{D}'_i}\bigg\lVert\frac{1}{m_i}\bigg(\sum_{j=1}^{m_i}\arg\min_{x} l_i(x,\mathcal{D}_{i,j}) - \sum_{j=1}^{m_i}\arg\min_{x} l_i(x,\mathcal{D}'_{i,j})\bigg)\bigg\rVert\\ &= \frac{1}{m_i}\max\big\lVert\arg\min_{x} l_i(x,\mathcal{D}_{i,k})-\arg\min_{x} l_i(x,\mathcal{D}'_{i,k})\big\rVert\\ &= \frac{2B}{m_i}, \end{aligned} \tag{7}$$

where, based on the definition of sensitivity here, the $i$-th client’s datasets $\mathcal{D}_i$ and $\mathcal{D}'_i$ differ only in one sample (the $k$-th sample).

To ensure (ϵ,δ)italic-ϵ𝛿(\epsilon,\delta)( italic_ϵ , italic_δ )-DP for each client in FL, we have to add Gaussian noise with parameter σ=c2Bmiϵ𝜎𝑐2𝐵subscript𝑚𝑖italic-ϵ\sigma=c\frac{2B}{m_{i}\epsilon}italic_σ = italic_c divide start_ARG 2 italic_B end_ARG start_ARG italic_m start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_ϵ end_ARG to the weight matrices of all clients before uploading. Considering the maximum revelation times R𝑅Ritalic_R, σ𝜎\sigmaitalic_σ should be multiplied with it to guarantee the desired protection level of local parameters. Hence, to have a united noise parameter, we define the standard deviation (SD) of the additive noise in the client-side as

$$\sigma_{C_i} = \frac{2BR\,c}{\min\{m_i\}\,\epsilon}, \quad \forall i. \tag{8}$$
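As a minimal sketch, the client-side SD in (8) can be computed as follows. The helper names are hypothetical, and the choice $c = \sqrt{2\ln(1.25/\delta)}$ is an assumption taken from the classical Gaussian mechanism; this section does not restate how $c$ is defined, so substitute the paper's own value if it differs. `B` is the bound that appears in (7).

```python
import math

def gaussian_constant(delta):
    # Assumption: classical Gaussian-mechanism constant, not taken from this section.
    return math.sqrt(2.0 * math.log(1.25 / delta))

def client_noise_std(B, R, c, sample_counts, epsilon):
    """SD of the client-side Gaussian noise following (8)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    m_min = min(sample_counts)  # min{m_i} over all clients
    return 2.0 * B * R * c / (m_min * epsilon)

# Illustrative values only; they are not results from the paper.
sigma_c = client_noise_std(B=1.0, R=10, c=gaussian_constant(1e-5),
                           sample_counts=[500, 800, 1200], epsilon=5.0)
print(sigma_c)
```

Because the denominator uses $\min\{m_i\}$, every client shares the same SD, which is set by the client with the smallest dataset.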

Server-side DP

The function to be protected on the server side is the global aggregated weight transmitted to the clients, defined by

$$f \triangleq x = \sum_{i=1}^{N} p_i x_i. \tag{9}$$

Based on the analysis provided in [8] and the client-side sensitivity in (7), the sensitivity of $f$ is bounded as

$$\Delta f \leq 2B\,\frac{\max\{p_i\}}{\min\{m_i\}}. \tag{10}$$

Here, to find the SD of the additive Gaussian noise in the server, we first calculate the distribution of the aggregated local noises. The aggregated noisy weight at each iteration is given as

$$x = \sum_{i=1}^{N} p_i \tilde{x}_i = \sum_{i=1}^{N} p_i (x_i + n_i) = \sum_{i=1}^{N} p_i x_i + \underbrace{\sum_{i=1}^{N} p_i n_i}_{n}, \tag{11}$$

where, for the independent normally distributed $n_i \sim \mathcal{N}(0, \sigma_{C_i}^2)$, we have

$$n \sim \mathcal{N}\Big(0, \underbrace{\sum_{i=1}^{N} \sigma_{C_i}^2 p_i^2}_{\sigma^2_{AC}}\Big). \tag{12}$$

Therefore, we have the following theorem to ensure $(\epsilon,\delta)$-DP from the server perspective.

Theorem 1 (server-side DP).

Taking $T$ as the number of aggregations, which is also the maximum number of revelations in the broadcast channels, the SD of the server-side noise is given by

$$\sigma_S = \begin{cases} \dfrac{2Bc\sqrt{T^2 \max\{p_i\}^2 - R^2 \sum_{i=1}^{N} p_i^2}}{\min\{m_i\}\,\epsilon}, & \text{if } T > \dfrac{R\sqrt{\sum_{i} p_i^2}}{\max\{p_i\}}, \\ 0, & \text{otherwise.} \end{cases} \tag{13}$$
Proof.

The standard deviation of the total desired noise, based on (10), is $\sigma_A = \frac{2B\,T\,c\,\max\{p_i\}}{\min\{m_i\}\,\epsilon}$. Hence, the variance of the server-side Gaussian noise is calculated by

$$\sigma^2_S = \sigma^2_A - \sigma^2_{AC}, \tag{14}$$

which results in (13). ∎
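The case distinction in (13) follows from requiring $\sigma_A^2 - \sigma_{AC}^2 > 0$ in (14). A minimal sketch of the computation is given below; the function name and the sample values are hypothetical, and $c$ is passed in as a plain number because its definition lies outside this section.

```python
import math

def server_noise_std(B, c, T, R, p, sample_counts, epsilon):
    """SD of the extra server-side Gaussian noise following (13)."""
    m_min = min(sample_counts)
    p_max = max(p)
    sum_p_sq = sum(pi * pi for pi in p)
    # sigma_A^2 - sigma_AC^2 up to the common factor (2Bc / (m_min * epsilon))^2.
    gap = (T * p_max) ** 2 - (R ** 2) * sum_p_sq
    if gap <= 0:
        # The aggregated client noise already reaches the target level.
        return 0.0
    return 2.0 * B * c * math.sqrt(gap) / (m_min * epsilon)

# Illustrative values only: two clients with equal impact factors.
print(server_noise_std(B=1.0, c=1.0, T=2, R=1, p=[0.5, 0.5],
                       sample_counts=[100, 200], epsilon=1.0))
```

With equal impact factors the threshold $R\sqrt{\sum_i p_i^2}/\max\{p_i\}$ equals $R\sqrt{N}$, so under this formula the server adds noise only when $T$ exceeds that value.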

Applying the client- and server-side noise with the calculated Gaussian distributions theoretically satisfies $(\epsilon,\delta)$-DP in the uploading and downloading channels for each iteration. Since the involved clients add noise to their local parameters before uploading them to the server, a level of local privacy is also achieved. The server subsequently chooses the relative impact factors based on the information acquired and updates the global parameters. It then determines the extra server-side noise, $n_s \sim \mathcal{N}(0, \sigma_S^2)$, and transmits $\tilde{x} = x + n_s$ for the upcoming training cycle.
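The round described above can be sketched as follows, assuming NumPy and treating each client's parameters as a flat array. The function `noisy_round` and its arguments are hypothetical names introduced for illustration; how the server selects the impact factors is not modeled here, so they are passed in directly.

```python
import numpy as np

def noisy_round(local_weights, p, sigma_c, sigma_s, rng):
    """One aggregation round: client noise, weighted average, server noise."""
    p = np.asarray(p, dtype=float)
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("impact factors must sum to one")
    # Each client perturbs its parameters before uploading.
    uploads = [w + rng.normal(0.0, sigma_c, size=w.shape) for w in local_weights]
    # The server aggregates with the chosen impact factors, as in (9) and (11).
    x = sum(pi * u for pi, u in zip(p, uploads))
    # The server adds its own noise before broadcasting the global parameters.
    return x + rng.normal(0.0, sigma_s, size=x.shape)

rng = np.random.default_rng(0)
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
x_tilde = noisy_round(weights, p=[0.25, 0.75], sigma_c=0.01, sigma_s=0.0, rng=rng)
print(x_tilde)
```

Setting both SDs to zero recovers the plain weighted average, which is a convenient sanity check when wiring this into a training loop.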

IV Convergence Analysis of the Personalized DP in FL

In this section, we analyze the convergence properties of the proposed algorithm for personalized DP in FL. Our main purpose is to derive an upper bound on the convergence of the algorithm under personalized impact factors. The assumptions required for our analysis concern the global and local loss functions, which are related by $L(x)=\sum_{i=1}^{N}p_i l_i(x)$, and are as follows:

Assumption 1.
  1. $l_i(x)$ is convex.

  2. $l_i(x)$ is $\rho$-Lipschitz smooth, i.e., $\lVert \nabla l_i(a) - \nabla l_i(b) \rVert \leqslant \rho \lVert a - b \rVert,\ \forall a, b$.

  3. $L(x^{(0)}) - L(x^{*}) = \Theta$, where $x^{(0)}$ and $x^{*}$ represent the initial and optimal model parameters, respectively.

  4. $\lVert \nabla l_i(x) - \nabla L(x) \rVert \leqslant \varepsilon,\ \forall i, x$, where $\varepsilon$ is the divergence measure.

Note that the non-i.i.d. distribution of local datasets breaks the general assumption of $p_i = \frac{m_i}{m}$. Hence, the expectation over clients $\mathbb{E}\{l_i(x)\}$ is not assumed to be equal to the global expectation $\mathbb{E}\{L(x)\}$. The only assumption on the relative impact factors is $\sum_{i=1}^{N} p_i = 1$.

As the first step of our convergence analysis, we present the following lemma on the local dissimilarity measure under non-identical impact factors.

Lemma 1 ($A$-local dissimilarity).

For the local loss functions $l_i$ with impact factors $p_i$ in the FL global function $L$, there exists $A$ as a measure of dissimilarity at $x$ such that

$$\sum_{i=1}^{N} p_i \|\nabla l_i(x)\| \leqslant \|\nabla L(x)\|\,A \quad \forall i. \tag{15}$$
Proof.

Due to Assumption 1, we have

$$\|\nabla l_i(x) - \nabla L(x)\|^2 \leqslant \varepsilon^2 \tag{16}$$

and

$$\|\nabla l_i(x) - \nabla L(x)\|^2 = \|\nabla l_i(x)\|^2 - 2\nabla l_i(x)^{\top}\nabla L(x) + \|\nabla L(x)\|^2. \tag{17}$$

Expanding the left-hand side of (16) via (17), multiplying by $p_i$, and summing over all $i$ yields

$$\sum_{i=1}^{N} p_i \|\nabla l_i(x)\|^2 - 2\sum_{i=1}^{N} p_i \nabla l_i(x)^{\top}\nabla L(x) + \|\nabla L(x)\|^2 \sum_{i=1}^{N} p_i \leqslant \varepsilon^2 \sum_{i=1}^{N} p_i. \tag{18}$$

Considering $\sum_{i=1}^{N} p_i = 1$ and $\sum_{i=1}^{N} p_i \nabla l_i(x) = \nabla L(x)$, we have

$$\begin{split} \sum_{i=1}^{N} p_i \|\nabla l_i(x)\|^2 &\leqslant 2\nabla L(x)^{\top}\nabla L(x) - \|\nabla L(x)\|^2 + \varepsilon^2 \\ &= \|\nabla L(x)\|^2 + \varepsilon^2 = \|\nabla L(x)\|^2 A_1(x)^2. \end{split} \tag{19}$$

Note that when $\|\nabla L(x)\|^2 \neq 0$, there exists

$$A_1(x) = \sqrt{1 + \frac{\varepsilon^2}{\|\nabla L(x)\|^2}} \geqslant 1. \tag{20}$$

Therefore, we have

$$\sum_{i=1}^{N} p_i \|\nabla l_i(x)\|^2 \leqslant \|\nabla L(x)\|^2 A_1^2, \tag{21}$$

where $A_1$ is the upper bound of $A_1(x)$. Considering (21), there also exists $A \geqslant 1$ such that

$$\sum_{i=1}^{N} p_i \|\nabla l_i(x)\| \leqslant \|\nabla L(x)\|\,A. \tag{22}$$

This completes the proof. ∎
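The bounds in (19) and (22) can be checked numerically at a single point $x$. The sketch below uses random vectors as stand-ins for the local gradients and assumes nonnegative impact factors, which this section does not state explicitly; the value `eps` is the smallest $\varepsilon$ that satisfies item 4 of Assumption 1 at this point, and $A = A_1(x)$ is used as one admissible local choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3
grads = rng.normal(size=(N, d))   # stand-ins for the local gradients at x
p = rng.random(N)
p /= p.sum()                      # assumed nonnegative, summing to one

G = p @ grads                     # global gradient: sum_i p_i * grad l_i(x)
eps = np.max(np.linalg.norm(grads - G, axis=1))

lhs_sq = p @ np.sum(grads ** 2, axis=1)   # left-hand side of (19) and (21)
A1 = np.sqrt(1.0 + eps ** 2 / (G @ G))    # A_1(x) from (20)
lhs = p @ np.linalg.norm(grads, axis=1)   # left-hand side of (22)

print(lhs_sq <= (G @ G) * A1 ** 2, lhs <= np.linalg.norm(G) * A1)
```

The second comparison relies on Jensen's inequality for the square root, which is where the nonnegativity of the weights is used.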

The following lemma gives an upper bound on the expected per-iteration increment of the global loss when DP noise injection is adopted.

Lemma 2 (Per-iteration expected increment).

The expected difference of global loss functions in two consecutive iterations $(t)$ and $(t+1)$, or the per-iteration expected increment in the value of the loss function, has the following upper limit:

$$\begin{split} \mathbb{E}\{L(\tilde{x}^{(t+1)}) - L(\tilde{x}^{(t)})\} \leqslant{} & \lambda_2 \|L(\tilde{x}^{(t)})\|^2 \\ & + \lambda_1 \mathbb{E}\{\|n^{(t+1)}\|\}\,\|L(\tilde{x}^{(t)})\| + \lambda_0 \mathbb{E}\{\|n^{(t+1)}\|^2\}, \end{split} \tag{23}$$

where

$$\lambda_2 = -\frac{1}{\mu} + \frac{A}{\mu}\left(\gamma + \frac{\rho(1+\gamma)}{\overline{\mu}}\right) + \frac{\rho A^2 (1+\gamma)^2}{2\overline{\mu}^2},$$
$$\lambda_1 = 1 + \frac{\rho A (1+\gamma)}{\overline{\mu}}, \quad \lambda_0 = \frac{\rho}{2},$$

and $n^{(t)} = \sum_{i=1}^{N} p_i n_i^{(t)} + n_s^{(t)}$ is the aggregated noise of the clients and server in each cycle.
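For experimenting with the bound, the three coefficients of Lemma 2 can be evaluated directly from their definitions. In this sketch the function name is hypothetical, and $\mu$, $\overline{\mu}$, and $\gamma$ are treated as plain numeric inputs, since their definitions fall outside this section.

```python
def lemma2_coefficients(mu, mu_bar, gamma, rho, A):
    """Return (lambda_2, lambda_1, lambda_0) as defined in Lemma 2."""
    lam2 = (-1.0 / mu
            + (A / mu) * (gamma + rho * (1.0 + gamma) / mu_bar)
            + rho * A ** 2 * (1.0 + gamma) ** 2 / (2.0 * mu_bar ** 2))
    lam1 = 1.0 + rho * A * (1.0 + gamma) / mu_bar
    lam0 = rho / 2.0
    return lam2, lam1, lam0

# Illustrative values only; they are not taken from the paper.
lam2, lam1, lam0 = lemma2_coefficients(mu=2.0, mu_bar=1.0, gamma=0.0, rho=1.0, A=1.0)
print(lam2, lam1, lam0)
```

The sign of $\lambda_2$ depends on the parameter values, so inspecting it for a given configuration shows whether the first term of the bound works for or against a decrease in the loss.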

Proof.

Considering the aggregation process with artificial noises of the client and server side in the $(t+1)$-th aggregation, we have

$$\tilde{x}^{(t+1)} = \sum_{i=1}^{N} p_i x_i^{(t+1)} + n^{(t+1)}, \tag{24}$$

where

$$n^{(t)} = \sum_{i=1}^{N} p_i n_i^{(t)} + n_s^{(t)}. \tag{25}$$

Because $l_i(\cdot)$ is $\rho$-Lipschitz smooth, we have

$$l_i(\tilde{x}^{(t+1)}) \leqslant l_i(\tilde{x}^{(t)}) + \nabla l_i(\tilde{x}^{(t)})^{\top}(\tilde{x}^{(t+1)} - \tilde{x}^{(t)}) + \frac{\rho}{2}\|\tilde{x}^{(t+1)} - \tilde{x}^{(t)}\|^2 \tag{26}$$

for all $\tilde{x}^{(t+1)}, \tilde{x}^{(t)}$. Multiplying (26) by $p_i$ and summing over all $i$ yields

$$\begin{split} \sum_{i=1}^{N} p_i l_i(\tilde{x}^{(t+1)}) \leqslant{} & \sum_{i=1}^{N} p_i l_i(\tilde{x}^{(t)}) \\ & + \sum_{i=1}^{N} p_i \nabla l_i(\tilde{x}^{(t)})^{\top}(\tilde{x}^{(t+1)} - \tilde{x}^{(t)}) \\ & + \frac{\rho}{2}\|\tilde{x}^{(t+1)} - \tilde{x}^{(t)}\|^2 \sum_{i=1}^{N} p_i. \end{split} \tag{27}$$

Considering the definition of the global loss function $L(\cdot)$ and $\sum_{i=1}^{N} p_i = 1$, we have

$$L(\tilde{x}^{(t+1)}) - L(\tilde{x}^{(t)}) \leqslant \nabla L(\tilde{x}^{(t)})^{\top}(\tilde{x}^{(t+1)} - \tilde{x}^{(t)}) + \frac{\rho}{2}\|\tilde{x}^{(t+1)} - \tilde{x}^{(t)}\|^2 \tag{28}$$

and therefore,

$$\begin{split} & \mathbb{E}\{L(\tilde{x}^{(t+1)}) - L(\tilde{x}^{(t)})\} \leqslant \\ & \mathbb{E}\left\{\left\langle \nabla L(\tilde{x}^{(t)}), (\tilde{x}^{(t+1)} - \tilde{x}^{(t)}) \right\rangle\right\} + \frac{\rho}{2}\mathbb{E}\{\|\tilde{x}^{(t+1)} - \tilde{x}^{(t)}\|^2\}. \end{split} \tag{29}$$
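The smoothness inequality behind (26) and (28) can be illustrated on a simple case. The sketch below assumes a hypothetical convex quadratic loss $l(x) = \frac{1}{2}x^{\top}Qx$, for which $\rho$ is the largest eigenvalue of $Q$; this loss is chosen only for illustration and is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
M = rng.normal(size=(d, d))
Q = M.T @ M                          # symmetric positive semidefinite, so the loss is convex
rho = np.linalg.eigvalsh(Q).max()    # smoothness constant of this quadratic

def loss(x):
    return 0.5 * x @ Q @ x

def grad(x):
    return Q @ x

x_t = rng.normal(size=d)             # plays the role of the current iterate
x_next = rng.normal(size=d)          # plays the role of the next iterate
step = x_next - x_t

# Right-hand side of the smoothness bound in (26).
upper = loss(x_t) + grad(x_t) @ step + 0.5 * rho * (step @ step)
print(loss(x_next) <= upper)
```

For a quadratic the gap between the two sides equals $\frac{1}{2}s^{\top}(\rho I - Q)s$ with $s$ the step, which is nonnegative by the choice of $\rho$.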

Defining

$$h(x_i^{(t+1)}; \tilde{x}^{(t)}) \triangleq l_i(x_i^{(t+1)}) + \frac{\mu}{2}\|x_i^{(t+1)} - \tilde{x}^{(t)}\|^2, \tag{30}$$

we have

$$
\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})=\nabla l_{i}(x_{i}^{(t+1)})+\mu(x_{i}^{(t+1)}-\tilde{x}^{(t)}).
\tag{31}
$$

Multiplying (31) by $p_i$ and summing over all $i$, with $\sum_{i=1}^{N}p_i=1$ used in the last step, yields

$$
\begin{split}
\sum_{i=1}^{N}p_{i}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})
&=\sum_{i=1}^{N}p_{i}\nabla l_{i}(x_{i}^{(t+1)})+\mu\sum_{i=1}^{N}p_{i}(x_{i}^{(t+1)}-\tilde{x}^{(t)})\\
&=\sum_{i=1}^{N}p_{i}\nabla l_{i}(x_{i}^{(t+1)})+\mu\sum_{i=1}^{N}p_{i}x_{i}^{(t+1)}-\mu\tilde{x}^{(t)}
\end{split}
\tag{32}
$$

and therefore,

$$
\begin{split}
&\tilde{x}^{(t+1)}-\tilde{x}^{(t)}=\sum_{i=1}^{N}p_{i}x_{i}^{(t+1)}+n^{(t+1)}-\tilde{x}^{(t)}\\
&=\frac{1}{\mu}\left[\sum_{i=1}^{N}p_{i}\left(\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\right)\right]+n^{(t+1)}.
\end{split}
\tag{33}
$$
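Identity (33) can be checked numerically. The sketch below is only an illustration under stated assumptions: the local losses are hypothetical quadratics $l_i(x)=\tfrac{1}{2}\|x-c_i\|^2$, the local iterates are arbitrary random vectors, and all names (`centers`, `x_local`, `noise`) are introduced here rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, mu = 4, 3, 0.5

# Impact factors p_i, normalised so that they sum to one.
p = rng.random(N)
p /= p.sum()

centers = rng.normal(size=(N, d))       # hypothetical minimisers c_i of l_i
x_tilde = rng.normal(size=d)            # current global model x~^(t)
x_local = rng.normal(size=(N, d))       # arbitrary local iterates x_i^(t+1)
noise = rng.normal(scale=0.1, size=d)   # aggregated DP noise n^(t+1)


def grad_l(i, x):
    """Gradient of the assumed local loss l_i(x) = 0.5 * ||x - c_i||^2."""
    return x - centers[i]


def grad_h(i, x):
    """Gradient of the proximal objective, as in (31)."""
    return grad_l(i, x) + mu * (x - x_tilde)


# Left-hand side of (33): the noisy aggregated step.
lhs = p @ x_local + noise - x_tilde

# Right-hand side of (33): weighted gradient differences scaled by 1/mu.
rhs = sum(p[i] * (grad_h(i, x_local[i]) - grad_l(i, x_local[i]))
          for i in range(N)) / mu + noise

identity_gap = float(np.linalg.norm(lhs - rhs))
```

The gap is zero up to floating-point error for any choice of local iterates, because (33) is purely algebraic and relies only on $\sum_i p_i = 1$.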

Substituting (33) into (29), we obtain

$$
\begin{split}
&\mathbb{E}\{L(\tilde{x}^{(t+1)})-L(\tilde{x}^{(t)})\}\leqslant\mathbb{E}\Bigg\{\frac{1}{\mu}\Big\langle\nabla L(\tilde{x}^{(t)}),\\
&\sum_{i=1}^{N}p_{i}\Big(\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\Big)\Big\rangle\\
&+\left\langle\nabla L(\tilde{x}^{(t)}),n^{(t+1)}\right\rangle\Bigg\}+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}\\
&=\mathbb{E}\Bigg\{\frac{1}{\mu}\Big\langle\nabla L(\tilde{x}^{(t)}),\sum_{i=1}^{N}p_{i}\Big(\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})\\
&-\nabla l_{i}(x_{i}^{(t+1)})+\nabla l_{i}(\tilde{x}^{(t)})\Big)-\sum_{i=1}^{N}p_{i}\nabla l_{i}(\tilde{x}^{(t)})\Big\rangle\\
&+\Big\langle\nabla L(\tilde{x}^{(t)}),n^{(t+1)}\Big\rangle\Bigg\}+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}\\
&=-\frac{1}{\mu}\|\nabla L(\tilde{x}^{(t)})\|^{2}+\mathbb{E}\Bigg\{\frac{1}{\mu}\Big\langle\nabla L(\tilde{x}^{(t)}),\\
&\sum_{i=1}^{N}p_{i}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})+\sum_{i=1}^{N}p_{i}\Big(\nabla l_{i}(\tilde{x}^{(t)})\\
&-\nabla l_{i}(x_{i}^{(t+1)})\Big)\Big\rangle\Bigg\}+\mathbb{E}\left\{\Big\langle\nabla L(\tilde{x}^{(t)}),n^{(t+1)}\Big\rangle\right\}\\
&+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}
\end{split}
\tag{34}
$$

Now, let us bound $\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|$. We know

$$
\|x_{i}^{(t+1)}-\tilde{x}^{(t)}\|\leqslant\|x_{i}^{(t+1)}-\hat{x}_{i}^{(t+1)}\|+\|\hat{x}_{i}^{(t+1)}-\tilde{x}^{(t)}\|,
\tag{35}
$$

where $\hat{x}_{i}^{(t+1)}=\arg\min_{x}h_{i}(x;\tilde{x}^{(t)})$. Define $\overline{\mu}=\mu-\rho_{-}>0$. Due to the $\overline{\mu}$-strong convexity of $h_{i}(x;\tilde{x}^{(t)})$, we have

$$
\|\hat{x}_{i}^{(t+1)}-x_{i}^{(t+1)}\|\leqslant\frac{\gamma}{\overline{\mu}}\|\nabla l_{i}(\tilde{x}^{(t)})\|
\tag{36}
$$

and

$$
\|\hat{x}_{i}^{(t+1)}-\tilde{x}^{(t)}\|\leqslant\frac{1}{\overline{\mu}}\|\nabla l_{i}(\tilde{x}^{(t)})\|
\tag{37}
$$

where $\gamma\in[0,1]$ is the inexactness level of a $\gamma$-inexact solution of $\min_{x}h_{i}(x;\tilde{x}^{(t)})$ [37]. Such a solution, $x_{0}$, satisfies

$$
\|\nabla h(x_{0};\tilde{x})\|\leqslant\gamma\|\nabla h(\tilde{x};\tilde{x})\|.
\tag{38}
$$

Now we can use (36) and (37) to obtain

$$
\|x_{i}^{(t+1)}-\tilde{x}^{(t)}\|\leqslant\frac{1+\gamma}{\overline{\mu}}\|\nabla l_{i}(\tilde{x}^{(t)})\|.
\tag{39}
$$
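As an illustration of (36), (37), and (39), the following sketch computes a $\gamma$-inexact solution in the sense of (38) by plain gradient descent and compares the resulting distances with the bounds. It assumes a hypothetical convex quadratic local loss, so that $\rho_{-}=0$ and $\overline{\mu}=\mu$; the solver, the loss, and all variable names are assumptions made here for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, mu, gamma = 5, 1.0, 0.3

# Hypothetical convex quadratic local loss l_i(x) = 0.5 (x - c)^T Q (x - c).
M = rng.normal(size=(d, d))
Q = M @ M.T / d                  # positive semidefinite Hessian
c = rng.normal(size=d)
x_tilde = rng.normal(size=d)     # current global model x~^(t)
mu_bar = mu                      # assumption: l_i convex, so rho_- = 0


def grad_l(x):
    return Q @ (x - c)


def grad_h(x):
    """Gradient of h(x; x~) = l_i(x) + (mu/2) ||x - x~||^2."""
    return grad_l(x) + mu * (x - x_tilde)


# Exact minimiser x_hat of h, available in closed form for a quadratic.
x_hat = np.linalg.solve(Q + mu * np.eye(d), Q @ c + mu * x_tilde)

# Gradient descent from x~ until the gamma-inexactness condition (38) holds.
step = 1.0 / (np.linalg.eigvalsh(Q).max() + mu)
target = gamma * np.linalg.norm(grad_h(x_tilde))
x = x_tilde.copy()
for _ in range(10_000):
    if np.linalg.norm(grad_h(x)) <= target:
        break
    x = x - step * grad_h(x)

g0 = np.linalg.norm(grad_l(x_tilde))
# Slack (bound minus actual distance) for (36), (37), and (39).
gap_36 = gamma / mu_bar * g0 - np.linalg.norm(x_hat - x)
gap_37 = g0 / mu_bar - np.linalg.norm(x_hat - x_tilde)
gap_39 = (1 + gamma) / mu_bar * g0 - np.linalg.norm(x - x_tilde)
```

All three slacks should be non-negative in this setting. A smaller `gamma` forces more local iterations and tightens (36), while (37) does not depend on `gamma` at all.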

Therefore,

$$
\begin{split}
\|\tilde{x}^{(t+1)}&-\tilde{x}^{(t)}\|=\|x^{(t+1)}+n^{(t+1)}-\tilde{x}^{(t)}\|\\
&\leqslant\|x^{(t+1)}-\tilde{x}^{(t)}\|+\|n^{(t+1)}\|\\
&=\left\|\sum_{i=1}^{N}p_{i}(x_{i}^{(t+1)}-\tilde{x}^{(t)})\right\|+\|n^{(t+1)}\|\\
&\leqslant\sum_{i=1}^{N}p_{i}\|x_{i}^{(t+1)}-\tilde{x}^{(t)}\|+\|n^{(t+1)}\|\\
&\leqslant\sum_{i=1}^{N}p_{i}\left(\frac{1+\gamma}{\overline{\mu}}\|\nabla l_{i}(\tilde{x}^{(t)})\|\right)+\|n^{(t+1)}\|\\
&\leqslant\frac{A(1+\gamma)}{\overline{\mu}}\|\nabla L(\tilde{x}^{(t)})\|+\|n^{(t+1)}\|.
\end{split}
\tag{40}
$$
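The chain of inequalities in (40) can be exercised on a small synthetic round. The sketch below makes several assumptions that are not from the paper: local losses are hypothetical convex quadratics (so $\overline{\mu}=\mu$), each client solves its proximal problem exactly ($\gamma=0$), the noise is Gaussian with an arbitrary scale `sigma`, and `A` is taken as the smallest constant for which $\sum_i p_i\|\nabla l_i(\tilde{x}^{(t)})\|\leqslant A\|\nabla L(\tilde{x}^{(t)})\|$ holds at this particular point.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, mu, sigma = 5, 4, 2.0, 0.05
gamma, mu_bar = 0.0, mu          # assumptions: exact local solves, convex l_i

p = rng.random(N)
p /= p.sum()                     # impact factors summing to one

# Hypothetical local losses l_i(x) = 0.5 (x - c_i)^T Q_i (x - c_i).
Qs = []
for _ in range(N):
    M = rng.normal(size=(d, d))
    Qs.append(M @ M.T / d)
cs = rng.normal(size=(N, d))
x_tilde = rng.normal(size=d)

# Local gradients at x~ and the global gradient, with L = sum_i p_i l_i.
grads = np.array([Qs[i] @ (x_tilde - cs[i]) for i in range(N)])
grad_L = p @ grads

# Smallest dissimilarity constant valid at this point (an assumption).
A = (p @ np.linalg.norm(grads, axis=1)) / np.linalg.norm(grad_L)

# Exact proximal solutions x_i^(t+1) = argmin l_i(x) + (mu/2) ||x - x~||^2.
x_local = np.array([
    np.linalg.solve(Qs[i] + mu * np.eye(d), Qs[i] @ cs[i] + mu * x_tilde)
    for i in range(N)
])

noise = rng.normal(scale=sigma, size=d)   # stand-in for n^(t+1)
x_next = p @ x_local + noise              # noisy aggregated model

step_norm = np.linalg.norm(x_next - x_tilde)
bound_40 = A * (1 + gamma) / mu_bar * np.linalg.norm(grad_L) \
    + np.linalg.norm(noise)
```

Here `step_norm` should not exceed `bound_40`. By the triangle inequality `A` is at least 1, and it grows as the weighted local gradients point in more diverse directions, which loosens the bound.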

Since $l_{i}(\cdot)$ is $\rho$-Lipschitz smooth, we have

$$
\|\nabla l_{i}(\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\|\leqslant\rho\|\tilde{x}^{(t)}-x_{i}^{(t+1)}\|.
\tag{41}
$$

Using the triangle inequality, (38), (40), and (41), we obtain

$$
\begin{split}
&\Bigg\|\sum_{i=1}^{N}p_{i}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})+\sum_{i=1}^{N}p_{i}\Big(\nabla l_{i}(\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\Big)\Bigg\|\\
&\leqslant\left\|\sum_{i=1}^{N}p_{i}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})\right\|+\left\|\sum_{i=1}^{N}p_{i}\Big(\nabla l_{i}(\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\Big)\right\|\\
&\leqslant\sum_{i=1}^{N}p_{i}\left\|\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})\right\|+\sum_{i=1}^{N}p_{i}\Big\|\nabla l_{i}(\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\Big\|\\
&\leqslant\gamma\sum_{i=1}^{N}p_{i}\|\nabla l_{i}(\tilde{x}^{(t)})\|+\rho\sum_{i=1}^{N}p_{i}\|\tilde{x}^{(t)}-x_{i}^{(t+1)}\|\\
&\leqslant A\gamma\|\nabla L(\tilde{x}^{(t)})\|+\frac{\rho A(1+\gamma)}{\overline{\mu}}\|\nabla L(\tilde{x}^{(t)})\|.
\end{split}
\tag{42}
$$

Then, from (42) and the Cauchy-Schwarz inequality we have

$$
\begin{split}
&\Big\langle\nabla L(\tilde{x}^{(t)}),\sum_{i=1}^{N}p_{i}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})\\
&+\sum_{i=1}^{N}p_{i}\Big(\nabla l_{i}(\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\Big)\Big\rangle\leqslant\|\nabla L(\tilde{x}^{(t)})\|\\
&\left[\left(A\gamma+\frac{\rho A(1+\gamma)}{\overline{\mu}}\right)\|\nabla L(\tilde{x}^{(t)})\|\right]\\
&=\left(A\gamma+\frac{\rho A(1+\gamma)}{\overline{\mu}}\right)\|\nabla L(\tilde{x}^{(t)})\|^{2}
\end{split}
\tag{43}
$$

Substituting (40) and (43) into (34) yields

$$
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(t+1)}) - L(\tilde{x}^{(t)})\} &\leqslant -\frac{1}{\mu}\|\nabla L(\tilde{x}^{(t)})\|^{2} \\
&+ \left(\frac{A\gamma}{\mu} + \frac{\rho A(1+\gamma)}{\mu\overline{\mu}}\right)\|\nabla L(\tilde{x}^{(t)})\|^{2} \\
&+ \mathbb{E}\{\|\nabla L(\tilde{x}^{(t)})\|\,\|n^{(t+1)}\|\} \\
&+ \frac{\rho}{2}\,\mathbb{E}\left\{\left[\frac{A(1+\gamma)}{\overline{\mu}}\|\nabla L(\tilde{x}^{(t)})\| + \|n^{(t+1)}\|\right]^{2}\right\}.
\end{split}
\tag{44}
$$

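Expanding the square in the last term of (44), taking $\|\nabla L(\tilde{x}^{(t)})\|$ outside the expectation over the noise, and collecting terms, we obtain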

$$
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(t+1)}) - L(\tilde{x}^{(t)})\} &\leqslant \lambda_2\|\nabla L(\tilde{x}^{(t)})\|^{2} \\
&+ \lambda_1\,\mathbb{E}\{\|n^{(t+1)}\|\}\,\|\nabla L(\tilde{x}^{(t)})\| + \lambda_0\,\mathbb{E}\{\|n^{(t+1)}\|^{2}\},
\end{split}
\tag{45}
$$

where

$$
\lambda_2 = -\frac{1}{\mu} + \frac{A}{\mu}\left(\gamma + \frac{\rho(1+\gamma)}{\overline{\mu}}\right) + \frac{\rho A^{2}(1+\gamma)^{2}}{2\overline{\mu}^{2}},
$$
$$
\lambda_1 = 1 + \frac{\rho A(1+\gamma)}{\overline{\mu}} \quad\text{and}\quad \lambda_0 = \frac{\rho}{2}.
$$

This completes the proof. ∎

As expected, Lemma 2 indicates the adverse effect of differential privacy on the expected per-iteration increment of the global loss value. …
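The effect of the noise terms in (45) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the function names and all parameter values are hypothetical and chosen only so that $\lambda_2$ is negative. It evaluates the coefficients defined after (45) and compares the right-hand side of (45) with and without noise.

```python
from dataclasses import dataclass


@dataclass
class Lemma2Coefficients:
    lam2: float
    lam1: float
    lam0: float


def lemma2_coefficients(A, gamma, rho, mu, mu_bar):
    """Coefficients lambda_2, lambda_1, lambda_0 as defined after (45)."""
    a = A * (1.0 + gamma) / mu_bar  # recurring factor A(1 + gamma) / mu_bar
    lam2 = (-1.0 / mu
            + (A / mu) * (gamma + rho * (1.0 + gamma) / mu_bar)
            + 0.5 * rho * a ** 2)
    lam1 = 1.0 + rho * a
    lam0 = 0.5 * rho
    return Lemma2Coefficients(lam2, lam1, lam0)


def per_iteration_bound(coef, grad_norm, e_noise, e_noise_sq):
    """Right-hand side of (45) for a given gradient norm and noise moments."""
    return (coef.lam2 * grad_norm ** 2
            + coef.lam1 * e_noise * grad_norm
            + coef.lam0 * e_noise_sq)


# Hypothetical values, for illustration only.
coef = lemma2_coefficients(A=1.0, gamma=0.1, rho=1.0, mu=1.0, mu_bar=10.0)
noiseless = per_iteration_bound(coef, grad_norm=2.0, e_noise=0.0, e_noise_sq=0.0)
noisy = per_iteration_bound(coef, grad_norm=2.0, e_noise=0.5, e_noise_sq=0.4)
print(f"lambda_2 = {coef.lam2:.5f}, lambda_1 = {coef.lam1:.2f}, lambda_0 = {coef.lam0:.2f}")
print(f"bound without noise: {noiseless:.4f}, with noise: {noisy:.4f}")
```

With a negative $\lambda_2$ the noiseless bound guarantees an expected decrease of the loss, while the two noise terms shift the bound upward, which is the adverse effect noted above.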

As the final step, we use the per-iteration increment to establish the convergence analysis of the proposed algorithm.

Theorem 2 (Convergence upper bound of personalized ….).

The upper limit of the difference between the loss function value at the $T$-th iteration and the optimal loss function value, which defines the convergence property, is given by

$$
\mathbb{E}\{L(\tilde{x}^{(T)}) - L(x^{*})\} \leqslant \Theta + k_2 T + \frac{k_1 T^{2}}{\epsilon} + \frac{k_0 T^{3}}{\epsilon^{2}},
\tag{46}
$$

where $k_2 = \lambda_2\beta^{2}$, $k_1 = \frac{2\lambda_1\beta B c \max\{p_i\}}{\max\{m_i\}}\sqrt{\frac{2N}{\pi}}$, and $k_0 = \frac{4\lambda_0 B^{2} c^{2} \max\{p_i\}^{2}}{\max\{m_i\}^{2}}$.

Proof.

Since the additive noise is independent and identically distributed across iterations, we define $\mathbb{E}\{\|n^{(t)}\|\} = \mathbb{E}\{\|n\|\}$ and $\mathbb{E}\{\|n^{(t)}\|^{2}\} = \mathbb{E}\{\|n\|^{2}\}$. Applying (45) recursively for $0 \leqslant t \leqslant T$ yields

$$
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(T)}) - L(\tilde{x}^{(0)})\} &\leqslant T\lambda_2\|\nabla L(\tilde{x}^{(t)})\|^{2} \\
&+ T\lambda_1\|\nabla L(\tilde{x}^{(t)})\|\,\mathbb{E}\{\|n\|\} + T\lambda_0\,\mathbb{E}\{\|n\|^{2}\}.
\end{split}
\tag{47}
$$

Considering $\|\nabla L(\tilde{x}^{(t)})\| \leqslant \beta$ and adding $\mathbb{E}\{L(\tilde{x}^{(0)}) - L(x^{*})\}$ to both sides of (47), we have

$$
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(T)}) - L(x^{*})\} &\leqslant \Theta + \lambda_2 T\beta^{2} \\
&+ \lambda_1 T\beta\,\mathbb{E}\{\|n\|\} + \lambda_0 T\,\mathbb{E}\{\|n\|^{2}\}.
\end{split}
\tag{48}
$$

Since we have $\sigma_A = \frac{\Delta f\, T c}{\epsilon}$, we obtain

$$
\mathbb{E}\{\|n\|\} = \frac{\Delta f\, T c}{\epsilon}\sqrt{\frac{2N}{\pi}} \quad\text{and}\quad \mathbb{E}\{\|n\|^{2}\} = \frac{\Delta f^{2} T^{2} c^{2} N}{\epsilon^{2}}.
\tag{49}
$$

Setting $\Delta f = 2B\frac{\max\{p_i\}}{\max\{m_i\}}$ and substituting (49) into (48), we have

$$
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(T)}) - L(x^{*})\} &\leqslant \Theta + \lambda_2 T\beta^{2} \\
&+ \frac{2\lambda_1 T^{2}\beta B c \max\{p_i\}}{\epsilon \max\{m_i\}}\sqrt{\frac{2N}{\pi}} + \frac{4\lambda_0 T^{3} B^{2} c^{2} \max\{p_i\}^{2}}{\epsilon^{2}\max\{m_i\}^{2}} \\
&= \Theta + k_2 T + \frac{k_1 T^{2}}{\epsilon} + \frac{k_0 T^{3}}{\epsilon^{2}},
\end{split}
\tag{50}
$$

where $k_2 = \lambda_2\beta^{2}$, $k_1 = \frac{2\lambda_1\beta B c \max\{p_i\}}{\max\{m_i\}}\sqrt{\frac{2N}{\pi}}$, and $k_0 = \frac{4\lambda_0 B^{2} c^{2} \max\{p_i\}^{2}}{\max\{m_i\}^{2}}$. This completes the proof. ∎
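The noise moments in (49) can be sanity-checked by simulation. The sketch below is an illustration under a stated assumption rather than the noise mechanism of the paper: it treats $n$ as a scalar sum of $N$ independent zero-mean Gaussian terms, each with standard deviation $\sigma_A = \Delta f\, T c/\epsilon$, which is one reading under which the closed forms in (49) hold. The function names and numeric values are hypothetical.

```python
import math
import random


def closed_form_noise_moments(delta_f, T, c, eps, n_clients):
    """E{|n|} and E{n^2} as stated in (49), with sigma_A = delta_f * T * c / eps."""
    sigma = delta_f * T * c / eps
    return sigma * math.sqrt(2.0 * n_clients / math.pi), sigma ** 2 * n_clients


def empirical_noise_moments(sigma, n_clients, n_samples=20000, seed=0):
    """Monte Carlo estimate of E{|n|} and E{n^2}.

    Assumption (not from the paper): n is a scalar sum of n_clients
    independent N(0, sigma^2) draws.
    """
    rng = random.Random(seed)
    abs_sum = 0.0
    sq_sum = 0.0
    for _ in range(n_samples):
        n = sum(rng.gauss(0.0, sigma) for _ in range(n_clients))
        abs_sum += abs(n)
        sq_sum += n * n
    return abs_sum / n_samples, sq_sum / n_samples


# Hypothetical values: sigma_A = 0.02 * 25 * 2 / 5 = 0.2 and N = 10.
ref_abs, ref_sq = closed_form_noise_moments(delta_f=0.02, T=25, c=2.0, eps=5.0, n_clients=10)
emp_abs, emp_sq = empirical_noise_moments(sigma=0.2, n_clients=10)
print(f"E|n|:  closed form {ref_abs:.4f}, simulated {emp_abs:.4f}")
print(f"E n^2: closed form {ref_sq:.4f}, simulated {emp_sq:.4f}")
```

Both moments scale with $T/\epsilon$ and $T^{2}/\epsilon^{2}$ respectively, which is where the extra powers of $T$ in the last two terms of (46) come from.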

The last two terms on the right-hand side of (46) depend directly on the amount of noise: lower $\epsilon$ values strengthen the privacy protection but adversely affect the convergence property. The first two terms, however, do not depend on the noise: $\Theta$ is constant and $k_2 T$ depends only on the number of iterations. …
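The trade-off between $T$ and $\epsilon$ in (46) can be explored numerically. The following is a minimal sketch with hypothetical constants: $\Theta$, $k_2$, $k_1$, and $k_0$ are passed in directly rather than derived from a model, and $k_2$ is taken negative so that additional iterations initially reduce the bound. The helper names are illustrative only.

```python
def convergence_bound(theta, k2, k1, k0, T, eps):
    """Right-hand side of (46)."""
    return theta + k2 * T + k1 * T ** 2 / eps + k0 * T ** 3 / eps ** 2


def best_round_count(theta, k2, k1, k0, eps, max_rounds):
    """Number of rounds in 1..max_rounds that minimizes the bound (46)."""
    return min(range(1, max_rounds + 1),
               key=lambda T: convergence_bound(theta, k2, k1, k0, T, eps))


# Hypothetical constants, for illustration only.
theta, k2, k1, k0 = 100.0, -0.5, 0.01, 0.001

for eps in (1.0, 10.0):
    T_best = best_round_count(theta, k2, k1, k0, eps, max_rounds=200)
    bound = convergence_bound(theta, k2, k1, k0, T_best, eps)
    print(f"eps = {eps:5.1f}: best T = {T_best:3d}, bound = {bound:.3f}")
```

For these constants the minimizing number of rounds moves from $T = 10$ at $\epsilon = 1$ to $T = 100$ at $\epsilon = 10$: a stricter privacy budget makes each additional round costlier, so the bound favors fewer rounds.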

In the above analysis, we saw that with a suitable choice of the impact factors, $T$, and $N$, we can be confident about the convergence of the FL algorithm while $(\epsilon, \delta)$-DP is used. The number of clients involved in learning need not be fixed throughout training, which enhances the compatibility of the proposed approach. In the next section, we present the analysis of the same algorithm when impact factors change adaptively throughout the learning process.

V Convergence Analysis of DP in FL with Adaptive Impact Factors

In this section, we extend the previous analysis to the case where impact factors are not fixed during training. In fact, the impacts assigned to clients can vary in each iteration based on the devices' resources or network conditions. The amount of Gaussian noise calculated in Section 3 can still be used here, since noise generation is independent across iterations. However, the convergence analysis provided in the previous section needs to be generalized.

Here, we change $p_i$ to $p_i^{(t)}$ to represent this adaptability in our equations. Without loss of generality, we assume the relation between two consecutive impact factors to be

$$
p_i^{(t+1)} = p_i^{(t)} + \alpha_i^{(t)} \quad \forall i,
\tag{51}
$$

where $\alpha_i^{(t)}$ is the amount of change that the relative impact factor assigned to the $i$-th client undergoes for the $(t+1)$-th iteration. Hence, $|\alpha_i| \leqslant 1$ and $\sum_{i=1}^{N}\alpha_i^{(t)} = 0$.
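The update rule (51) and its constraints can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes that the impact factors are nonnegative relative weights that sum to one, and the helper names `update_impact_factors` and `changes_toward` are hypothetical.

```python
def update_impact_factors(p, alpha, tol=1e-9):
    """Apply (51): p_i^(t+1) = p_i^(t) + alpha_i^(t) for every client i.

    Assumption (for illustration): impact factors are relative weights in
    [0, 1], so the changes must sum to zero and keep every weight in range.
    """
    if len(p) != len(alpha):
        raise ValueError("p and alpha must have the same length")
    if abs(sum(alpha)) > tol:
        raise ValueError("changes alpha_i must sum to zero")
    new_p = [p_i + a_i for p_i, a_i in zip(p, alpha)]
    if any(v < -tol or v > 1.0 + tol for v in new_p):
        raise ValueError("updated impact factors must stay in [0, 1]")
    return new_p


def changes_toward(p, target):
    """Changes alpha_i that move the current weights p onto target weights."""
    return [t_i - p_i for p_i, t_i in zip(p, target)]


# Hypothetical example: client 1 loses weight to client 2, client 3 is unchanged.
p_t = [0.5, 0.3, 0.2]
target = [0.4, 0.4, 0.2]
alpha_t = changes_toward(p_t, target)
p_next = update_impact_factors(p_t, alpha_t)
print(p_next)
```

Because both the current and the target weights sum to one, the changes produced by `changes_toward` sum to zero automatically, which matches the constraint stated above.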

In order to perform the analysis of the adaptive form, we first present an extension to Lemma 2 and then present the convergence upper bound in Theorem 3.

Lemma 3 (Per-iteration expected increment: Extension).

The per-iteration expected increment in the value of the loss function, when adaptive $p_i$ is adopted, has the following upper limit:

$$
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(t+1)}) - L(\tilde{x}^{(t)})\} &\leqslant \lambda'_2\|\nabla L(\tilde{x}^{(t)})\|^{2} + \lambda'_1\,\mathbb{E}\{\|n^{(t+1)}\|\}\,\|\nabla L(\tilde{x}^{(t)})\| \\
&+ \lambda'_0\,\mathbb{E}\{\|n^{(t+1)}\|^{2}\} + \frac{1}{2}\max\{l_i\},
\end{split}
\tag{52}
$$

where

$$
\lambda'_2 = -\frac{1}{\mu} + \frac{A'}{\mu}\left(\gamma + \frac{\rho(1+\gamma)}{\overline{\mu}}\right) + \frac{\rho {A'}^{2}(1+\gamma)^{2}}{2\overline{\mu}^{2}},
$$
$$
\lambda'_1 = 1 + \frac{\rho A'(1+\gamma)}{\overline{\mu}}, \quad \lambda'_0 = \frac{\rho}{2},
$$

and $n^{(t)} = \sum_{i=1}^{N} p_i n_i^{(t)} + n_s^{(t)}$ is the aggregated noise of the clients and server in each cycle.

Proof.

From (15) we have

$$
\sum_{i=1}^{N} p_i^{(t)}\|\nabla l_i(x^{(t)})\| \leqslant \|\nabla L(x^{(t)})\|\,A.
\tag{53}
$$

Adding $\sum_{i=1}^{N}\alpha_i^{(t)}\|\nabla l_i(x^{(t)})\|$ to both sides of (53) yields

$$
\sum_{i=1}^{N} p_i^{(t+1)}\|\nabla l_i(x^{(t)})\| \leqslant \sum_{i=1}^{N}\alpha_i^{(t)}\|\nabla l_i(x^{(t)})\| + \|\nabla L(x^{(t)})\|\,A.
\tag{54}
$$

Hence, we have

$$
\sum_{i=1}^{N} p_i^{(t+1)}\|\nabla l_i(x^{(t)})\| \leqslant \|\nabla L(x^{(t)})\|\,A',
\tag{55}
$$

where

$$
A' = \frac{\sum_{i=1}^{N}\alpha_i^{(t)}\|\nabla l_i(x^{(t)})\|}{\|\nabla L(x^{(t)})\|} + A.
\tag{56}
$$
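A small numerical check of (53)-(56) may help fix ideas. The sketch below is illustrative only: the gradient norms, weights, and changes are hypothetical, and $A$ is taken as the smallest constant for which (53) holds, so that (55) then holds with equality.

```python
def smallest_A(p, client_grad_norms, global_grad_norm):
    """Smallest A satisfying (53) for the given (hypothetical) gradient norms."""
    return sum(p_i * g_i for p_i, g_i in zip(p, client_grad_norms)) / global_grad_norm


def adapted_A(A, alpha, client_grad_norms, global_grad_norm):
    """A' as defined in (56)."""
    shift = sum(a_i * g_i for a_i, g_i in zip(alpha, client_grad_norms))
    return shift / global_grad_norm + A


# Hypothetical values, for illustration only.
p_t = [0.5, 0.3, 0.2]            # impact factors p_i^(t)
alpha_t = [-0.1, 0.1, 0.0]       # changes alpha_i^(t); they sum to zero
client_norms = [1.0, 2.0, 3.0]   # ||grad l_i(x^(t))|| for each client
global_norm = 1.5                # ||grad L(x^(t))||

A = smallest_A(p_t, client_norms, global_norm)
A_prime = adapted_A(A, alpha_t, client_norms, global_norm)

# Left-hand side of (55), using p_i^(t+1) = p_i^(t) + alpha_i^(t) from (51).
p_next = [p_i + a_i for p_i, a_i in zip(p_t, alpha_t)]
lhs_55 = sum(p_i * g_i for p_i, g_i in zip(p_next, client_norms))
print(f"A = {A:.4f}, A' = {A_prime:.4f}")
print(f"(55): {lhs_55:.4f} <= {global_norm * A_prime:.4f}")
```

In this example weight moves toward a client with a larger gradient norm, so $A'$ exceeds $A$; moving weight in the opposite direction would make $A'$ smaller than $A$.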

Therefore, we can bound $\|\tilde{x}^{(t+1)} - \tilde{x}^{(t)}\|$ as

\begin{equation}
\begin{split}
\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\| &= \|x^{(t+1)}+n^{(t+1)}-\tilde{x}^{(t)}\|\\
&\leqslant \|x^{(t+1)}-\tilde{x}^{(t)}\|+\|n^{(t+1)}\|\\
&= \left\|\sum_{i=1}^{N}p_{i}^{(t+1)}(x_{i}^{(t+1)}-\tilde{x}^{(t)})\right\|+\|n^{(t+1)}\|\\
&\leqslant \sum_{i=1}^{N}p_{i}^{(t+1)}\|x_{i}^{(t+1)}-\tilde{x}^{(t)}\|+\|n^{(t+1)}\|\\
&\leqslant \sum_{i=1}^{N}p_{i}^{(t+1)}\left(\frac{1+\gamma}{\overline{\mu}}\|\nabla l_{i}(\tilde{x}^{(t)})\|\right)+\|n^{(t+1)}\|\\
&\leqslant \frac{A^{\prime}(1+\gamma)}{\overline{\mu}}\|\nabla L(\tilde{x}^{(t)})\|+\|n^{(t+1)}\|.
\end{split}
\tag{57}
\end{equation}
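The first three steps of the chain in (57) rely only on the triangle inequality and on the impact factors summing to one. The following minimal sketch checks them numerically on small hand-picked vectors. The function name `update_norm_chain` and all numeric values are illustrative assumptions and do not come from the paper.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def update_norm_chain(p, local_models, x_prev, noise):
    """Return the three quantities compared in the first steps of (57).

    p            : impact factors p_i^{(t+1)}, assumed to sum to one
    local_models : local parameter vectors x_i^{(t+1)}
    x_prev       : previous noisy global model x~^{(t)}
    noise        : noise vector n^{(t+1)}
    """
    dim = len(x_prev)
    # Aggregated model x^{(t+1)} = sum_i p_i x_i^{(t+1)}
    x_next = [sum(p_i * x_i[d] for p_i, x_i in zip(p, local_models))
              for d in range(dim)]
    # Noisy global model x~^{(t+1)} = x^{(t+1)} + n^{(t+1)}
    x_tilde_next = [a + b for a, b in zip(x_next, noise)]

    lhs = norm([a - b for a, b in zip(x_tilde_next, x_prev)])
    mid = norm([a - b for a, b in zip(x_next, x_prev)]) + norm(noise)
    rhs = sum(p_i * norm([a - b for a, b in zip(x_i, x_prev)])
              for p_i, x_i in zip(p, local_models)) + norm(noise)
    return lhs, mid, rhs

# Illustrative values only (three clients, two parameters).
p = [0.2, 0.3, 0.5]
models = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
lhs, mid, rhs = update_norm_chain(p, models, [0.5, 0.5], [0.1, -0.2])
print(lhs <= mid <= rhs)
```

The last two steps of (57) depend on the bounds established earlier in the proof and are not exercised by this check.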

Multiplying (26) by $p_{i}^{(t)}$ and summing over all $i$ yields

\begin{equation}
\begin{split}
\sum_{i=1}^{N}p_{i}^{(t)}l_{i}(\tilde{x}^{(t+1)}) &\leqslant \sum_{i=1}^{N}p_{i}^{(t)}l_{i}(\tilde{x}^{(t)})\\
&\quad+\sum_{i=1}^{N}p_{i}^{(t)}\nabla l_{i}(\tilde{x}^{(t)})^{\top}(\tilde{x}^{(t+1)}-\tilde{x}^{(t)})\\
&\quad+\frac{\rho}{2}\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\sum_{i=1}^{N}p_{i}^{(t)}.
\end{split}
\tag{58}
\end{equation}

Considering (51), we have

\begin{equation}
\begin{split}
L(\tilde{x}^{(t+1)})-L(\tilde{x}^{(t)}) &\leqslant \sum_{i=1}^{N}\alpha_{i}^{(t)}l_{i}(\tilde{x}^{(t+1)})\\
&\quad+\left\langle\nabla L(\tilde{x}^{(t)}),(\tilde{x}^{(t+1)}-\tilde{x}^{(t)})\right\rangle+\frac{\rho}{2}\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\}.
\end{split}
\tag{59}
\end{equation}

Without loss of generality, we assume $\mathbb{E}\{l_{i}(x^{(t)})\}=\frac{1}{N}l_{i}(x^{(t)})$, and therefore

\begin{equation}
\begin{split}
\mathbb{E}\left\{\sum_{i=1}^{N}\alpha_{i}^{(t)}l_{i}(x^{(t)})\right\} &= \sum_{i=1}^{N}\alpha_{i}^{(t)}\mathbb{E}\left\{l_{i}(x^{(t)})\right\}\\
&= \sum_{i=1}^{N}\alpha_{i}^{(t)}\left(\frac{1}{N}l_{i}(x^{(t)})\right)\\
&\leqslant \frac{1}{2}\max\{l_{i}(x^{(t)})\}.
\end{split}
\tag{60}
\end{equation}

Then, (59) and (60) give

\begin{equation}
\begin{split}
\mathbb{E}\left\{L(\tilde{x}^{(t+1)})-L(\tilde{x}^{(t)})\right\} &\leqslant \frac{1}{2}\max\{l_{i}\}+\mathbb{E}\left\{\left\langle\nabla L(\tilde{x}^{(t)}),(\tilde{x}^{(t+1)}-\tilde{x}^{(t)})\right\rangle\right\}\\
&\quad+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}.
\end{split}
\tag{61}
\end{equation}

Defining $h(\cdot)$ as in (30), multiplying (31) by $p_{i}^{(t+1)}$, and summing over all $i$ yields

\begin{equation}
\begin{split}
\sum_{i=1}^{N}p_{i}^{(t+1)}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)}) &= \sum_{i=1}^{N}p_{i}^{(t+1)}\nabla l_{i}(x_{i}^{(t+1)})+\mu\sum_{i=1}^{N}p_{i}^{(t+1)}(x_{i}^{(t+1)}-\tilde{x}^{(t)})\\
&= \sum_{i=1}^{N}p_{i}^{(t+1)}\nabla l_{i}(x_{i}^{(t+1)})+\mu\sum_{i=1}^{N}p_{i}^{(t+1)}x_{i}^{(t+1)}-\mu\tilde{x}^{(t)}
\end{split}
\tag{62}
\end{equation}

and therefore,

\begin{equation}
\begin{split}
\tilde{x}^{(t+1)}-\tilde{x}^{(t)} &= \sum_{i=1}^{N}p_{i}^{(t+1)}x_{i}^{(t+1)}+n^{(t+1)}-\tilde{x}^{(t)}\\
&= \frac{1}{\mu}\left[\sum_{i=1}^{N}p_{i}^{(t+1)}\left(\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\right)\right]+n^{(t+1)}.
\end{split}
\tag{63}
\end{equation}
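The step from (62) to (63) uses only the form of $\nabla h$ and the fact that the impact factors sum to one. As an illustration, the sketch below evaluates both sides of (63) for hypothetical quadratic local losses $l_i(x)=\frac{a_i}{2}\|x-c_i\|^2$. These losses, the helper names, and the numeric values are assumptions chosen for the example and are not taken from the paper.

```python
def grad_l(a, c, x):
    """Gradient of the hypothetical local loss 0.5 * a * ||x - c||^2."""
    return [a * (xj - cj) for xj, cj in zip(x, c)]

def grad_h(a, c, x, x_prev, mu):
    """Gradient of h as it appears in (62): grad l_i(x) + mu * (x - x_prev)."""
    return [g + mu * (xj - pj)
            for g, xj, pj in zip(grad_l(a, c, x), x, x_prev)]

def model_change_two_ways(p, a, c, local_models, x_prev, noise, mu):
    """Return the model change computed from each line of (63)."""
    dim = len(x_prev)
    # First line of (63): sum_i p_i x_i + n - x_prev
    direct = [sum(p_i * x_i[d] for p_i, x_i in zip(p, local_models))
              + noise[d] - x_prev[d] for d in range(dim)]
    # Second line of (63): (1/mu) * sum_i p_i (grad h - grad l_i) + n
    via_grads = []
    for d in range(dim):
        s = 0.0
        for p_i, a_i, c_i, x_i in zip(p, a, c, local_models):
            gh = grad_h(a_i, c_i, x_i, x_prev, mu)[d]
            gl = grad_l(a_i, c_i, x_i)[d]
            s += p_i * (gh - gl)
        via_grads.append(s / mu + noise[d])
    return direct, via_grads

# Illustrative values only; the impact factors sum to one.
p = [0.5, 0.25, 0.25]
a = [1.0, 2.0, 0.5]
c = [[0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
models = [[0.2, 0.9], [0.8, 1.1], [-0.5, 1.5]]
direct, via_grads = model_change_two_ways(
    p, a, c, models, [0.1, 1.0], [0.01, -0.02], mu=2.0)
print(direct, via_grads)
```

The two vectors agree up to floating-point rounding only when the impact factors sum to one, which is the condition that lets $\mu\tilde{x}^{(t)}$ be pulled out of the sum in (62).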

Substituting (63) into (61), we obtain

\begin{equation}
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(t+1)})-L(\tilde{x}^{(t)})\}
&\leqslant \mathbb{E}\Bigg\{\frac{1}{\mu}\Big\langle\nabla L(\tilde{x}^{(t)}),\sum_{i=1}^{N}p_{i}^{(t+1)}\left(\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\right)\Big\rangle\\
&\quad+\left\langle\nabla L(\tilde{x}^{(t)}),n^{(t+1)}\right\rangle\Bigg\}+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}+\frac{1}{2}\max\{l_{i}\}\\
&= \mathbb{E}\Bigg\{\frac{1}{\mu}\Big\langle\nabla L(\tilde{x}^{(t)}),\sum_{i=1}^{N}p_{i}^{(t+1)}\left(\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})-\nabla l_{i}(x_{i}^{(t+1)})\right)\\
&\quad+\sum_{i=1}^{N}p_{i}^{(t)}\nabla l_{i}(\tilde{x}^{(t)})-\sum_{i=1}^{N}p_{i}^{(t)}\nabla l_{i}(\tilde{x}^{(t)})\Big\rangle\\
&\quad+\Big\langle\nabla L(\tilde{x}^{(t)}),n^{(t+1)}\Big\rangle\Bigg\}+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}+\frac{1}{2}\max\{l_{i}\}\\
&= -\frac{1}{\mu}\|\nabla L(\tilde{x}^{(t)})\|^{2}+\mathbb{E}\Bigg\{\frac{1}{\mu}\Big\langle\nabla L(\tilde{x}^{(t)}),\sum_{i=1}^{N}p_{i}^{(t+1)}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})-\nabla L(x_{i}^{(t+1)})\\
&\quad+\nabla L(\tilde{x}^{(t)})\Big\rangle\Bigg\}+\mathbb{E}\left\{\Big\langle\nabla L(\tilde{x}^{(t)}),n^{(t+1)}\Big\rangle\right\}\\
&\quad+\frac{\rho}{2}\mathbb{E}\left\{\|\tilde{x}^{(t+1)}-\tilde{x}^{(t)}\|^{2}\right\}+\frac{1}{2}\max\{l_{i}\}
\end{split}
\tag{64}
\end{equation}

The $\rho$-Lipschitz property of the local loss functions implies that the global loss function is also $\rho$-Lipschitz. Hence,

\begin{equation}
\|\nabla L(\tilde{x}^{(t)})-\nabla L(x_{i}^{(t+1)})\|\leqslant\rho\|\tilde{x}^{(t)}-x_{i}^{(t+1)}\|
\tag{65}
\end{equation}

Therefore, using the triangle inequality, (55), and (65), we obtain

\begin{equation}
\begin{split}
&\left\|\sum_{i=1}^{N}p_{i}^{(t+1)}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})+\nabla L(\tilde{x}^{(t)})-\nabla L(x_{i}^{(t+1)})\right\|\\
&\leqslant\left\|\sum_{i=1}^{N}p_{i}^{(t+1)}\nabla h(x_{i}^{(t+1)};\tilde{x}^{(t)})\right\|+\left\|\sum_{i=1}^{N}p_{i}^{(t+1)}\left(\nabla L(\tilde{x}^{(t)})-\nabla L(x_{i}^{(t+1)})\right)\right\|\\
&\leqslant\left(A^{\prime}\gamma+\frac{\rho A^{\prime}(1+\gamma)}{\overline{\mu}}\right)\|\nabla L(\tilde{x}^{(t)})\|
\end{split}
\tag{66}
\end{equation}

Substituting (66) and (64) into (34) yields

\begin{equation}
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(t+1)})-L(\tilde{x}^{(t)})\} &\leqslant -\frac{1}{\mu}\|\nabla L(\tilde{x}^{(t)})\|^{2}\\
&\quad+\left(\frac{A^{\prime}\gamma}{\mu}+\frac{\rho A^{\prime}(1+\gamma)}{\mu\overline{\mu}}\right)\|\nabla L(\tilde{x}^{(t)})\|^{2}\\
&\quad+\mathbb{E}\{\|\nabla L(\tilde{x}^{(t)})\|\|n^{(t+1)}\|\}\\
&\quad+\frac{\rho}{2}\mathbb{E}\left\{\left[\frac{A^{\prime}(1+\gamma)}{\overline{\mu}}\|\nabla L(\tilde{x}^{(t)})\|+\|n^{(t+1)}\|\right]^{2}\right\}\\
&\quad+\frac{1}{2}\max\{l_{i}\}.
\end{split}
\tag{67}
\end{equation}

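Expanding the squared term and collecting the coefficients of each power of $\|\nabla L(\tilde{x}^{(t)})\|$, we get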

\begin{equation}
\begin{split}
\mathbb{E}\{L(\tilde{x}^{(t+1)})-L(\tilde{x}^{(t)})\} &\leqslant \lambda^{\prime}_{2}\|\nabla L(\tilde{x}^{(t)})\|^{2}+\lambda^{\prime}_{1}\mathbb{E}\{\|n^{(t+1)}\|\}\|\nabla L(\tilde{x}^{(t)})\|\\
&\quad+\lambda^{\prime}_{0}\mathbb{E}\{\|n^{(t+1)}\|^{2}\}+\frac{1}{2}\max\{l_{i}\},
\end{split}
\tag{68}
\end{equation}

where

$$
\lambda'_{2}=-\frac{1}{\mu}+\frac{A'}{\mu}\left(\gamma+\frac{\rho(1+\gamma)}{\overline{\mu}}\right)+\frac{\rho{A'}^{2}(1+\gamma)^{2}}{2\overline{\mu}^{2}},
$$

$$
\lambda'_{1}=1+\frac{\rho A'(1+\gamma)}{\overline{\mu}},\qquad\lambda'_{0}=\frac{\rho}{2}.
$$

This completes the proof. ∎

Theorem 3 (Convergence upper bound of adaptive personalized ….).

Using adaptive $p_i$ assignment, the upper bound on the difference between the loss function value at the $T$-th round and the optimal loss function value, which defines the convergence property, is given by

$$
\mathbb{E}\{L(\tilde{x}^{(T)})-L(x^{*})\}\leqslant\Theta+k'_{2}T+\frac{k'_{1}T^{2}}{\epsilon}+\frac{k'_{0}T^{3}}{\epsilon^{2}},
\tag{69}
$$

where $k'_{2}=\lambda'_{2}\beta^{2}+\frac{\max\{l_{i}\}}{2}$, $k'_{1}=\frac{2\lambda'_{1}\beta Bc\max\{p_{i}\}}{\max\{m_{i}\}}\sqrt{\frac{2N}{\pi}}$, and $k'_{0}=\frac{4\lambda'_{0}B^{2}c^{2}\max\{p_{i}\}^{2}}{\max\{m_{i}\}^{2}}$.

Proof.

The proof follows by extending the proof of Theorem 2 and using Lemma 3. ∎
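
As a rough illustration of how the bound in (69) trades the number of rounds $T$ against the privacy budget $\epsilon$, the following Python sketch evaluates $\lambda'_2$, $\lambda'_1$, $\lambda'_0$ and the right-hand side of (69) directly from the expressions above. The function names and every numerical value are hypothetical placeholders chosen only to make the example runnable; they are not taken from the experiments, and the symbols keep the meanings given earlier in the paper.

```python
import math


def lambda_primes(mu, mu_bar, a_prime, gamma, rho):
    """Coefficients lambda'_2, lambda'_1, lambda'_0 as written below (68)."""
    lam2 = (-1.0 / mu
            + (a_prime / mu) * (gamma + rho * (1.0 + gamma) / mu_bar)
            + rho * a_prime**2 * (1.0 + gamma)**2 / (2.0 * mu_bar**2))
    lam1 = 1.0 + rho * a_prime * (1.0 + gamma) / mu_bar
    lam0 = rho / 2.0
    return lam2, lam1, lam0


def convergence_bound(T, eps, theta, lams, beta, B, c, p_max, m_max, N, l_max):
    """Right-hand side of (69): Theta + k'_2 T + k'_1 T^2/eps + k'_0 T^3/eps^2."""
    lam2, lam1, lam0 = lams
    k2 = lam2 * beta**2 + l_max / 2.0
    k1 = 2.0 * lam1 * beta * B * c * p_max / m_max * math.sqrt(2.0 * N / math.pi)
    k0 = 4.0 * lam0 * B**2 * c**2 * p_max**2 / m_max**2
    return theta + k2 * T + k1 * T**2 / eps + k0 * T**3 / eps**2


# Placeholder constants (assumptions, not values reported in the paper).
lams = lambda_primes(mu=10.0, mu_bar=5.0, a_prime=1.0, gamma=0.1, rho=2.0)
common = dict(theta=0.5, lams=lams, beta=1.0, B=1.0, c=1.0,
              p_max=1.0 / 30.0, m_max=150.0, N=60, l_max=0.1)

for eps in (5.0, 20.0):
    print(eps, convergence_bound(T=30, eps=eps, **common))
```

Because $\epsilon$ appears only in the denominators of the last two terms, a larger $\epsilon$ gives a smaller bound for the same $T$ whenever $k'_1$ and $k'_0$ are positive, as they are with these placeholder constants.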

VI Simulation Results

In this section, we evaluate our approach under different privacy budgets and impact factors. We present four scenarios to study the effect of noise and impact factors on the convergence bound and the accuracy of the models.

VI-A Experimental Setting

We evaluate our approach on the real-world Modified National Institute of Standards and Technology (MNIST) dataset [38]. MNIST is a widely used dataset for handwritten digit recognition that consists of 60000 training and 10000 testing samples. We use a multi-layer perceptron (MLP) neural network in the local clients, and the model weights are communicated to the server for aggregation at each cycle. The designed MLP model classifies the input images using a ReLU activation function in the hidden layer and a softmax over 10 classes in the output layer. To run the SGD algorithm in the local optimizers, we set the learning rate to $0.02$.
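
The local model described above can be sketched in a few lines of NumPy. This is only an illustrative reconstruction: the hidden width (64), the weight initialization scale, and the helper names `init_mlp`, `forward`, `cross_entropy`, and `sgd_step` are assumptions, since the text specifies only one ReLU hidden layer, a 10-class softmax output, and a learning rate of $0.02$. The input size 784 corresponds to flattened $28\times 28$ MNIST images.

```python
import numpy as np


def init_mlp(rng, n_in=784, n_hidden=64, n_out=10):
    """Small random weights and zero biases (hidden width is an assumption)."""
    return {
        "W1": rng.normal(0.0, 0.01, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """ReLU hidden layer followed by a numerically stable softmax."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return h, exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, y):
    """Mean negative log-likelihood of the true labels."""
    return float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())


def sgd_step(params, x, y, lr=0.02):
    """One gradient step on the batch (x, y) with the stated learning rate."""
    h, probs = forward(params, x)
    n = len(y)
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1.0      # gradient of softmax + cross-entropy
    d_logits /= n
    d_z1 = (d_logits @ params["W2"].T) * (h > 0)   # back-propagate through ReLU
    grads = {
        "W1": x.T @ d_z1,
        "b1": d_z1.sum(axis=0),
        "W2": h.T @ d_logits,
        "b2": d_logits.sum(axis=0),
    }
    return {k: params[k] - lr * grads[k] for k in params}


# Usage on random stand-in data (not MNIST) just to exercise the code path.
rng = np.random.default_rng(0)
x_demo = rng.random((32, 784))
y_demo = rng.integers(0, 10, 32)
params = init_mlp(rng)
params = sgd_step(params, x_demo, y_demo)
```

In a federated round, each client would run steps like `sgd_step` on its own data and send the resulting `params` to the server for aggregation.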

We establish our evaluation using the four scenarios listed in Table I. In each scenario, a small randomly chosen subset of MNIST is distributed non-identically among $60$ clients. The dataset is purposely reduced in size to avoid overfitting. The personalized DP noise is injected on both the client side and the server side, and the effect of non-identical impact factors during the aggregation process is examined for each scenario. We set $\delta=0.01$ for the privacy budget and choose different protection levels ($\epsilon$) throughout this experiment for $30$ global iterations. We discuss each scenario further in the following section.

Table I: Simulation Scenarios

| Scenario | Number of clients ($N$) | Description |
|---|---|---|
| 1 | $60$ | Part 1: $20$ clients with severely noisy data |
|   |      | Part 2: $20$ clients with moderately noisy data |
|   |      | Part 3: $20$ clients without noise |
| 2 | $60$ | Part 1: $20$ clients with severely noisy data |
|   |      | Part 2: $35$ clients with slightly noisy data |
|   |      | Part 3: $5$ clients without noise |
| 3 | $60$ | Part 1: $20$ clients with $50$ samples |
|   |      | Part 2: $20$ clients with $120$ samples |
|   |      | Part 3: $20$ clients with $271$ samples |
| 4 | $60$ | $t\leqslant 10$: |
|   |      | Part 1: $20$ clients with severely noisy data |
|   |      | Part 2: $20$ clients with moderately noisy data |
|   |      | Part 3: $20$ clients with slightly noisy data |
|   |      | $10<t\leqslant 30$: |
|   |      | Part 1: $20$ clients with slightly noisy data |
|   |      | Part 2: $20$ clients with moderately noisy data |
|   |      | Part 3: $20$ clients with severely noisy data |
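
The noisy-data parts in Table I are produced with salt-and-pepper noise, as stated in the scenario descriptions. A minimal sketch of such a corruption step is given below, assuming images scaled to $[0,1]$. The helper name `salt_and_pepper` and the density values attached to "severe", "moderate", and "slight" are hypothetical, because the actual densities are not reported.

```python
import numpy as np


def salt_and_pepper(images, density, rng):
    """Return a copy of `images` in which a fraction `density` of the pixels
    is replaced, half by 0.0 (pepper) and half by 1.0 (salt)."""
    noisy = images.copy()
    u = rng.random(images.shape)
    noisy[u < density / 2.0] = 0.0
    noisy[(u >= density / 2.0) & (u < density)] = 1.0
    return noisy


# Hypothetical densities; the paper does not state the values it used.
DENSITIES = {"severe": 0.5, "moderate": 0.3, "slight": 0.1, "none": 0.0}

rng = np.random.default_rng(0)
clean = np.full((20, 28, 28), 0.5)   # stand-in for one part's images
parts = {name: salt_and_pepper(clean, d, rng) for name, d in DENSITIES.items()}
```

Drawing one uniform number per pixel keeps the salt and pepper regions disjoint, so the corrupted fraction matches `density` in expectation.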

VI-B Numerical Results

After distributing the dataset among $60$ clients, we randomly divide the clients into three parts. The details of each scenario are presented in the following items.

  1. Scenario 1: In the first scenario, we apply the presented privacy protection scheme to clients with heterogeneous data quality. Clients are identical in terms of dataset size, i.e., $m=150$ for all parts, and we deliberately add salt-and-pepper noise with various densities to each part to vary the data quality between the clients. We first set different impact factors in the non-private mode for comparison. Fig. 2 depicts the importance of impact factors for the performance of FL models. As shown, equal $p_i$ (the green curve) leads to the worst accuracy. The sequence of numbers attached to each curve in Fig. 2 represents the relation between the impact factors of the three parts. For instance, “0-1-2” means that we have set $p_i$ equal to $0$, $\frac{1}{N}$, and $\frac{2}{N}$ for the first, second, and third part, respectively.

    Considering “0-1-2” as the optimal ratio between the impact factors, Fig. 3 compares the results after applying DP. Here, Gaussian noise is injected using (8) and (13) for protection levels $\epsilon=5$ and $\epsilon=20$ with non-identical impacts. As expected from (46), the values of the loss function decrease for larger $\epsilon$, i.e., when less noise is required. In this experiment, we also compare the results when identical $p_i$ is adopted for $\epsilon=20$. As shown in Fig. 3, the model performance using identical $p_i$ is even worse than that obtained under the stricter protection level $\epsilon=5$ when personalized DP is used.

    Figure 2: The comparison of loss function values in the non-private mode for five different ways of assigning impact factors to clients of each part in the first scenario.

    Figure 3: The comparison of loss function values for protection levels $\epsilon=5$ and $\epsilon=20$ when impact factors of the first, second, and third parts in the first scenario are set as $0$, $\frac{1}{60}$, and $\frac{1}{30}$, respectively. The loss value of the model with identical impacts is also presented for $\epsilon=20$ as a reference.
  2. Scenario 2: In this scenario, we go further and change the distribution of clients in addition to the data quality. Hence, we divide the clients into three parts of $20$, $35$, and $5$ clients, and deliberately add salt-and-pepper noise to them based on the densities in Table I. Fig. 4 depicts the loss function values in the non-private mode using different impact factor assignments. As shown, equal $p_i$ is not a suitable choice in the presence of heterogeneities. In this case, the weighting of clients is based on a balance between involving a sufficient number of clients in learning and exploiting the most accurate samples. The red curve, which corresponds to the “0-2-1” impact assignment, sets the impact factors of the first, second, and third parts equal to $0$, $\frac{8}{5N}$, and $\frac{4}{5N}$, respectively.

    Considering “0-2-1” as the base ratio between the impact factors, the model performance in the private mode is compared in Fig. 5. It is clear from Fig. 5 that a careful choice of $p_i$ significantly improves model accuracy in distributed architectures, especially when DP is used.

    Figure 4: The comparison of loss function values in the non-private mode for five different ways of assigning impact factors to clients of each part in the second scenario.

    Figure 5: The comparison of loss function values for protection levels $\epsilon=5$ and $\epsilon=20$ when impact factors of the first, second, and third parts in the second scenario are set as $0$, $\frac{2}{75}$, and $\frac{1}{75}$, respectively. The loss value of the model with identical impacts is also presented for $\epsilon=20$ as a reference.
  3. Scenario 3: The third scenario is designed to show the effect of the dataset size and the impact factors on the convergence performance of the FL model. As given in Table I, we set $m$ equal to $50$, $120$, and $271$ for the clients of parts $1$, $2$, and $3$, respectively. As depicted in Fig. 6, assigning higher weights to the third part (clients with larger datasets) yields better results. The common method of defining impact factors based on dataset size, i.e., setting $p_i=\frac{m_i}{m}$, is probably derived from this observation. However, it can adversely affect the global model accuracy in heterogeneous structures.

    Reducing clients’ training samples increases the sensitivity and hence the amount of noise required for preserving DP. Fig. 7 compares the results after Gaussian noise is added for protection levels $\epsilon=5$ and $\epsilon=20$. As shown for $\epsilon=20$, the accuracy of the model when identical $p_i$ is set for all parts is still worse than when impact factors are assigned proportional to $1$, $3$, and $5$ for parts $1$, $2$, and $3$, respectively. This result is achieved despite the fact that more noise is added in the latter experiment due to its higher sensitivity.

    Figure 6: The comparison of loss function values in the non-private mode for three different ways of assigning impact factors to clients of each part in the third scenario.

    Figure 7: The comparison of loss function values for protection levels $\epsilon=5$ and $\epsilon=20$ when impact factors of the first, second, and third parts in the third scenario are set as $\frac{1}{180}$, $\frac{1}{60}$, and $\frac{1}{36}$, respectively. The loss value of the model with identical impacts is also presented for $\epsilon=20$ as a reference.
  4. Scenario 4: We have designed the fourth scenario to compare convergence properties when adaptive impact factors are used. As presented in Table I, the quality of the clients’ datasets is changed after the $10$-th aggregation round, and therefore the $p_i$ of each part should be changed to obtain the best model performance. Fig. 8 depicts the results of three types of impact factor assignments. The green curve, which belongs to the experiment giving the heaviest weight to the slightly noisy datasets, yields the fastest and most accurate result. Considering the green curve as the basis for the private mode, we compare loss function values for protection levels of $\epsilon=5$ and $\epsilon=20$ in Fig. 9.

    Figure 8: The comparison of loss function values in the non-private mode for three different ways of assigning impact factors to clients of each part in the fourth scenario.

    Figure 9: The comparison of loss function values for protection levels $\epsilon=5$ and $\epsilon=20$ when impact factors of the first, second, and third parts in the fourth scenario are respectively set as $0$, $\frac{1}{60}$, and $\frac{1}{30}$ for $t\leqslant 10$, and $\frac{1}{30}$, $\frac{1}{60}$, and $0$ for $10<t\leqslant 30$.
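
The impact factors quoted in the scenarios and figure captions all follow from a ratio such as “0-1-2” together with the part sizes, under the constraint that the per-client factors sum to one over all clients. The sketch below reproduces that arithmetic with exact fractions and adds a plain weighted average as the non-private aggregation step. The helper names `impact_factors`, `scenario4_factors`, and `aggregate` are hypothetical, and the DP noise of (8) and (13) is deliberately omitted.

```python
from fractions import Fraction


def impact_factors(ratios, part_sizes):
    """Per-client impact factor of each part, scaled so that the factors
    of all clients sum to one: p_k = r_k / sum_j (n_j * r_j)."""
    total = sum(r * n for r, n in zip(ratios, part_sizes))
    if total == 0:
        raise ValueError("at least one part needs a non-zero ratio")
    return [Fraction(r, total) for r in ratios]


def scenario4_factors(t):
    """Time-varying assignment of the fourth scenario: the ratio is
    reversed after the 10th aggregation round."""
    ratios = (0, 1, 2) if t <= 10 else (2, 1, 0)
    return impact_factors(ratios, (20, 20, 20))


def aggregate(client_weights, client_factors):
    """Non-private weighted average of client weight vectors."""
    dim = len(client_weights[0])
    return [sum(float(p) * w[j] for p, w in zip(client_factors, client_weights))
            for j in range(dim)]


print(impact_factors((0, 1, 2), (20, 20, 20)))   # scenario 1
print(impact_factors((0, 2, 1), (20, 35, 5)))    # scenario 2
print(impact_factors((1, 3, 5), (20, 20, 20)))   # scenario 3
```

With $N=60$, these calls give $0$, $\frac{1}{60}$, $\frac{1}{30}$ for the first scenario, $0$, $\frac{2}{75}$, $\frac{1}{75}$ for the second, and $\frac{1}{180}$, $\frac{1}{60}$, $\frac{1}{36}$ for the third, which match the values in the captions of Figs. 3, 5, and 7.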

VII Conclusion

In this paper, we presented a personalized privacy-preserving approach for federated learning models. Considering the systems and statistical heterogeneities in distributed architectures, we first focused on the roles that impact factors play in obtaining the best model performance. We further clarified that the impacts are not necessarily fixed during the training of the global model and can change over time. Hence, the influence each client has on learning can increase, decrease, or become zero, as long as $\sum_{i=1}^{N}p_{i}=1$ holds. Then, we proposed the requirements for preserving $(\epsilon,\delta)$-DP at both the clients and the server when personalized aggregation is applied. We developed the convergence analysis of the proposed scheme for both fixed and time-varying impact factors. Our simulation results on four scenarios illustrate the importance of assigning non-identical impact factors to compensate for the weaknesses of local datasets, clients, links, and the server.

References

  • [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, vol. 54, 2017, pp. 1273–1282.
  • [2] V. Smith, C. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,” in NIPS, 2017, pp. 4424–4434.
  • [3] K. A. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, B. McMahan, T. V. Overveldt, D. Petrou, D. Ramage, and J. Roselander, “Towards federated learning at scale: System design,” in MLSys, 2019.
  • [4] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 2, pp. 12:1–12:19, 2019.
  • [5] J. Qian, S. P. Gochhayat, and L. K. Hansen, “Distributed active learning strategies on edge computing,” in CSCloud/EdgeCom.   IEEE, 2019, pp. 221–226.
  • [6] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, 2020.
  • [7] A. Agarwal and J. C. Duchi, “Distributed delayed stochastic optimization,” in CDC.   IEEE, 2012, pp. 5451–5452.
  • [8] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. S. Quek, and H. V. Poor, “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 3454–3469, 2020.
  • [9] M. Talaei and I. Izadi, “Comments on “federated learning with differential privacy: Algorithms and performance analysis”,” arXiv preprint arXiv:2406.05858, 2024.
  • [10] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, ser. Proceedings of Machine Learning Research, vol. 54.   PMLR, 2017, pp. 1273–1282.
  • [11] S. Salehkaleybar, A. Sharif-Nassab, and S. J. Golestani, “One-shot federated learning: Theoretical limits and algorithms to achieve them,” J. Mach. Learn. Res., vol. 22, pp. 189:1–189:47, 2021.
  • [12] Y. Du, S. Yang, and K. Huang, “High-dimensional stochastic gradient quantization for communication-efficient edge learning,” IEEE Trans. Signal Process., vol. 68, pp. 2128–2142, 2020.
  • [13] N. Shlezinger, M. Chen, Y. C. Eldar, H. V. Poor, and S. Cui, “Uveqfed: Universal vector quantization for federated learning,” IEEE Trans. Signal Process., vol. 69, pp. 500–514, 2021.
  • [14] A. Reisizadeh, A. Mokhtari, H. Hassani, A. Jadbabaie, and R. Pedarsani, “Fedpaq: A communication-efficient federated learning method with periodic averaging and quantization,” in AISTATS, ser. Proceedings of Machine Learning Research, vol. 108.   PMLR, 2020, pp. 2021–2031.
  • [15] H. Wang, S. Sievert, S. Liu, Z. B. Charles, D. S. Papailiopoulos, and S. J. Wright, “ATOMO: communication-efficient learning via atomic sparsification,” in NeurIPS, 2018, pp. 9872–9883.
  • [16] N. Strom, “Scalable distributed DNN training using commodity GPU cloud computing,” in INTERSPEECH.   ISCA, 2015, pp. 1488–1492.
  • [17] T. Nishio and R. Yonetani, “Client selection for federated learning with heterogeneous resources in mobile edge,” in ICC.   IEEE, 2019, pp. 1–7.
  • [18] L. Ye and V. Gupta, “Client scheduling for federated learning over wireless networks: A submodular optimization approach,” in CDC.   IEEE, 2021, pp. 63–68.
  • [19] A. Ghosh, J. Hong, D. Yin, and K. Ramchandran, “Robust federated learning in a heterogeneous environment,” CoRR, vol. abs/1906.06629, 2019.
  • [20] B. Recht, C. Ré, S. J. Wright, and F. Niu, “Hogwild: A lock-free approach to parallelizing stochastic gradient descent,” in NIPS, 2011, pp. 693–701.
  • [21] T. Li, S. Hu, A. Beirami, and V. Smith, “Federated multi-task learning for competing constraints,” CoRR, vol. abs/2012.04221, 2020.
  • [22] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of fedavg on non-iid data,” in ICLR.   OpenReview.net, 2020.
  • [23] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
  • [24] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava, “A survey on security and privacy of federated learning,” Future Gener. Comput. Syst., vol. 115, pp. 619–640, 2021.
  • [25] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song, “The secret sharer: Evaluating and testing unintended memorization in neural networks,” in USENIX Security Symposium.   USENIX Association, 2019, pp. 267–284.
  • [26] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy analysis of deep learning: Stand-alone and federated learning under passive and active white-box inference attacks,” CoRR, vol. abs/1812.00910, 2018.
  • [27] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3-4, pp. 211–407, 2014.
  • [28] M. Talaei and I. Izadi, “Adaptive differential privacy in federated learning: A priority-based approach,” arXiv preprint arXiv:2401.02453, 2024.
  • [29] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. S. Quek, and H. V. Poor, “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 3454–3469, 2020.
  • [30] O. Thakkar, G. Andrew, and H. B. McMahan, “Differentially private learning with adaptive clipping,” CoRR, vol. abs/1905.03871, 2019.
  • [31] X. Liu, H. Li, G. Xu, R. Lu, and M. He, “Adaptive privacy-preserving federated learning,” Peer-to-Peer Netw. Appl., vol. 13, no. 6, pp. 2356–2366, 2020.
  • [32] M. Gong, K. Pan, Y. Xie, A. K. Qin, and Z. Tang, “Preserving differential privacy in deep neural networks with relevance-based adaptive noise imposition,” Neural Networks, vol. 125, pp. 131–141, 2020.
  • [33] R. Hu, Y. Guo, H. Li, Q. Pei, and Y. Gong, “Personalized federated learning with differential privacy,” IEEE Internet Things J., vol. 7, no. 10, pp. 9530–9539, 2020.
  • [34] B. Recht, C. Ré, S. J. Wright, and F. Niu, “Hogwild: A lock-free approach to parallelizing stochastic gradient descent,” in NIPS, 2011, pp. 693–701.
  • [35] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, ser. Proceedings of Machine Learning Research, vol. 54.   PMLR, 2017, pp. 1273–1282.
  • [36] M. Chen, D. Gündüz, K. Huang, W. Saad, M. Bennis, A. V. Feljan, and H. V. Poor, “Distributed learning in wireless networks: Recent progress and future challenges,” IEEE J. Sel. Areas Commun., vol. 39, no. 12, pp. 3579–3605, 2021.
  • [37] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” in MLSys.   mlsys.org, 2020.
  • [38] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.