Revisiting Decentralized ProxSkip: Achieving Linear Speedup

Corresponding author: **de Cao.

Luyao Guo¹, Sulaiman A. Alghunaim², Kun Yuan³, Laurent Condat⁴, **de Cao¹
¹Southeast University ²Kuwait University ³Peking University
⁴King Abdullah University of Science and Technology (KAUST)

{ly_guo, jdcao}@seu.edu.cn
Abstract

The ProxSkip algorithm for decentralized and federated learning is gaining increasing attention due to its proven benefits in accelerating communication complexity while maintaining robustness against data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not achieve linear speedup, where convergence performance increases linearly with respect to the number of nodes. So far, questions remain open about how ProxSkip behaves in the non-convex setting and whether linear speedup is achievable.

In this paper, we revisit decentralized ProxSkip and address both questions. We demonstrate that the leading communication complexity of ProxSkip is $\mathcal{O}(\nicefrac{p\sigma^{2}}{n\epsilon^{2}})$ for non-convex and convex settings, and $\mathcal{O}(\nicefrac{p\sigma^{2}}{n\epsilon})$ for the strongly convex setting, where $n$ represents the number of nodes, $p$ denotes the probability of communication, $\sigma^{2}$ signifies the level of stochastic noise, and $\epsilon$ denotes the desired accuracy level. This result illustrates that ProxSkip achieves linear speedup and can asymptotically reduce communication overhead in proportion to the probability of communication. Additionally, for the strongly convex setting, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes.

1 Introduction

In this work, we consider the following decentralized optimization problem, solved by a group of agents $[n]:=\{1,2,\ldots,n\}$ connected over a network:

$$
\begin{aligned}
f^{\star}&=\min_{\mathbf{x}\in\mathbb{R}^{d}}\Big[f(\mathbf{x}):=\frac{1}{n}\sum_{i=1}^{n}f_{i}(\mathbf{x})\Big],\\
&\text{with } f_{i}(\mathbf{x})=\mathbb{E}_{\xi_{i}\sim\mathcal{D}_{i}}[F_{i}(\mathbf{x},\xi_{i})],
\end{aligned}
\tag{1}
$$

where $\{\mathcal{D}_{i}\}_{i=1}^{n}$ represent data distributions, which can be heterogeneous across the $n$ nodes, and $f_{i}:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is a smooth local function accessed by node $i$. In this setup, a network of nodes (also referred to as agents, workers, or clients) collaboratively seeks to minimize the average of the nodes’ objectives. Solving problem (1) in a decentralized manner has garnered considerable attention in recent years [2, 3, 4, 5, 6]. Decentralized methods do not rely on a central coordinator and communicate only with neighbors in an arbitrary communication topology. Nevertheless, decentralized optimization algorithms may still face challenges arising from communication bottlenecks.
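To make the setting of problem (1) concrete, the following is a minimal NumPy sketch of heterogeneous local objectives with a stochastic gradient oracle. Everything in it is an assumption chosen for illustration and does not come from the paper: the least-squares loss, the way heterogeneity is generated (shifted feature means), and the names `features`, `targets`, `local_loss`, `local_grad`, `stochastic_grad`, and `global_loss`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 4, 3, 50  # nodes, dimension, samples per node (illustrative sizes)

# Heterogeneous data (assumption): node i draws features around its own mean i.
features = [rng.standard_normal((m, d)) + i for i in range(n)]
targets = [A @ np.ones(d) + 0.1 * rng.standard_normal(m) for A in features]

def local_loss(i, x):
    """f_i(x): expectation of F_i(x, xi) over node i's empirical distribution D_i."""
    r = features[i] @ x - targets[i]
    return 0.5 * np.mean(r ** 2)

def local_grad(i, x):
    """Exact gradient of f_i at x."""
    r = features[i] @ x - targets[i]
    return features[i].T @ r / m

def stochastic_grad(i, x, sample):
    """Gradient of F_i(x, xi) for one sample index xi from D_i."""
    a, y = features[i][sample], targets[i][sample]
    return (a @ x - y) * a

def global_loss(x):
    """f(x) = (1/n) * sum_i f_i(x), the objective in problem (1)."""
    return np.mean([local_loss(i, x) for i in range(n)])
```

Averaging `stochastic_grad` over all sample indices of a node recovers `local_grad`, which is the unbiasedness property that stochastic analyses of this kind typically rely on.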

Table 1: Comparisons with existing convergence rates of ProxSkip for decentralized optimization. $\sigma^{2}$ is the variance of the stochastic gradient. $\zeta_{0}:=\max\{1-\alpha\mu,1-p^{2}\}$; the definitions of $\zeta_{\mathrm{new}}$ and $\zeta$ can be found in [50] and Theorem 1, respectively. CVX, N-CVX, and S-CVX mean convex, non-convex, and strongly convex, respectively.

| Reference | Convergence rate (N-CVX) | Convergence rate (CVX) | Convergence rate (S-CVX) | Decentralized | Linear speedup |
|---|---|---|---|---|---|
| [47, 48] | no results | no results | $\zeta_{0}^{T}$, $\sigma^{2}=0$ | | |
| [46, 49] | no results | no results | $\zeta_{0}^{T}+\mathcal{O}(\alpha\sigma^{2})$ | | |
| [50] | no results | no results | $\zeta_{\mathrm{new}}^{T}+\mathcal{O}(\alpha\sigma^{2})$ | | |
| [46] | no results | no results | $\zeta^{T}$, $\sigma^{2}=0$ | | |
| [51] | no results | $\mathcal{O}\left(\frac{1}{\alpha T}\right)$ | $\zeta^{T}$, $\sigma^{2}=0$ | | |
| this work | $\mathcal{O}\left(\frac{1}{\alpha T}+\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}\right)$ | $\mathcal{O}\left(\frac{1}{\alpha T}+\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}\right)$ | $\mathcal{O}\left(\zeta^{T}+\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}\right)$ | | |

To reduce communication costs, many techniques have been proposed. These techniques include compressing models and gradients [7, 8, 9, 10, 11, 12, 13], using accelerated schemes [14, 15, 16, 17, 18, 19], and implementing local updates [20, 21, 22, 23, 24]. By applying these strategies, it is possible to reduce the amount of information exchanged between different nodes during training, thereby improving the efficiency of distributed training setups.

In this work, we mainly focus on performing local updates as a means to reduce communication frequency. In centralized settings (federated learning), Local-SGD/FedAvg [22, 23, 25, 26] has emerged as one of the most widely adopted learning methods that employ local updates. However, when dealing with heterogeneous data, Local-SGD/FedAvg encounters the challenge of “client-drift.” This phenomenon arises from the diversity of functions on each node, causing each client to converge towards the minima of its respective function $f_{i}$, which may be significantly distant from the global optimum $f^{\star}$. To tackle this issue, several algorithms have been proposed, including Scaffold [27], Scaffold with momentum [28], FedLin [29], FedPD [30], FedDyn [31], VRL-SGD [32], FedGATE [33], and SCALLION/SCAFCOM [34]. In decentralized settings, Local-DSGD has been introduced in [35]. Similarly to Local-SGD, it also encounters the issue of client-drift when dealing with heterogeneous data. To mitigate the drift in Local-DSGD, several algorithms have been proposed. Notably, gradient-tracking (GT) based approaches, such as local-GT [39] and $K$-GT [40], have been developed. Additionally, algorithms based on Exact-Diffusion/NIDS/D2 [41, 42, 43, 44], such as LED [45], have been introduced. Distinct from these periodic local update methods [27, 29, 30, 35, 39, 40, 45], methods incorporating probabilistic local updates have been proposed, such as ProxSkip [46] and its extended versions, including TAMUNA [47], CompressedScaffnew [48], VR-ProxSkip [49], ODEProx [50], and RandProx [51].

It is known that ProxSkip does not depend on the heterogeneity of the data and exhibits linear convergence on distributed strongly convex problems in the absence of stochastic noise [46]. When the network is sufficiently well-connected, ProxSkip [46] and its extensions [47, 48, 49, 50, 51] are gaining increasing attention due to their proven benefits in accelerating communication complexity. When deploying ProxSkip within the context of machine learning, it becomes imperative to comprehend its behavior on non-convex tasks and its susceptibility to stochastic noise. However, existing ProxSkip convergence analyses focus on convex settings, and their main limitation is the inability to prove linear speedup in terms of the number of nodes. Notice that although [50] presents the ODEProx algorithm and gives a more rigorous analysis of ProxSkip in the strongly convex setting, this new analysis shares the same limitation as the original ProxSkip analysis, namely, the inability to achieve linear speedup. Achieving linear speedup is highly desirable for a decentralized/federated learning algorithm as it enables effective utilization of the massive parallelism inherent in large decentralized/federated learning systems. Consequently, two fundamental open questions emerge:

(1) How does ProxSkip behave on non-convex tasks?

(2) Can we establish a linear speedup bound for ProxSkip in the presence of stochastic noise?

In this paper, we revisit ProxSkip for decentralized learning and provide answers to both questions. Specifically, we develop a new analysis with a novel proof technique under non-convex, convex, and strongly convex settings. Through this analysis, we obtain several new results that are comparable to the bounds of state-of-the-art decentralized algorithms while achieving linear speedup bounds.

We highlight our contributions as follows:

  • We establish the non-asymptotic convergence rate of ProxSkip for problem (1) under stochastic non-convex, convex, and strongly convex settings. In particular, we prove that ProxSkip at iteration $T$ converges with rate

    \begin{align}
    \text{N-CVX/CVX: } & \mathcal{O}\bigg(\frac{1}{\alpha T}+\frac{\alpha^{2}}{(1-\lambda_{2})^{2}T}+\frac{\alpha\sigma^{2}}{n}+\frac{\sigma^{2}\alpha^{2}}{1-\lambda_{2}}\bigg), \tag{2}\\
    \text{S-CVX: } & \mathcal{O}\bigg((1-\alpha\mu)^{T}a_{0}+\frac{\alpha\sigma^{2}}{\mu n}+\frac{\sigma^{2}\alpha^{2}}{\mu(1-\lambda_{2})}\bigg), \tag{3}
    \end{align}

    where $\alpha$ is the stepsize of ProxSkip, $\sigma^{2}$ denotes the variance of the stochastic gradient, $1-\lambda_{2}$ is a topology-dependent quantity that approaches $0$ for a large and sparse network, $\mu$ is the strong convexity constant, and $a_{0}$ is a constant that depends on the initialization. To the best of our knowledge, this is the first work that establishes the convergence rate of probabilistic decentralized methods for non-convex settings. We offer a comparison of convergence rates of ProxSkip for problem (1) in Table 1.

  • We prove that, after enough transient time, the expected communication complexity of ProxSkip is $\mathcal{O}(\nicefrac{p\sigma^{2}}{n\epsilon^{2}})$ (or $\tilde{\mathcal{O}}(\nicefrac{p\sigma^{2}}{n\epsilon})$ for S-CVX), where $\epsilon$ denotes the desired accuracy level, demonstrating that ProxSkip achieves linear speedup with respect to the number of nodes $n$. In addition, for the strongly convex setting, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes. The proposed new proof technique overcomes the analytical limitations of [46, 47, 48, 49, 50, 51]. To the best of our knowledge, we prove for the first time that ProxSkip can achieve linear speedup.

  • We elucidate the effects of noise, local steps, and data heterogeneity on the convergence of ProxSkip in stochastic non-convex, convex, and strongly convex settings. We demonstrate the robustness of ProxSkip against data heterogeneity while enhancing communication efficiency by local updates. Furthermore, we show that the convergence rates exhibited by ProxSkip in stochastic settings are comparable with those of existing state-of-the-art decentralized algorithms incorporating local updates [35, 40, 45] (see Table 2).
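As a back-of-the-envelope reading of how the rate (2) relates to the stated communication complexity, the following sketch balances the two leading terms of (2). It is a heuristic under stated assumptions, not the paper's proof: the specific stepsize $\alpha=\sqrt{n/(\sigma^{2}T)}$ is chosen here only for illustration, the accuracy criterion is taken to be that the bound in (2) is at most $\epsilon$, and $T$ is assumed large enough that the remaining terms are negligible.

```latex
\begin{align*}
% Leading terms of (2) with the illustrative stepsize \alpha = \sqrt{n/(\sigma^2 T)}:
\frac{1}{\alpha T}+\frac{\alpha\sigma^{2}}{n}
  &= \frac{\sigma}{\sqrt{nT}}+\frac{\sigma}{\sqrt{nT}}
   = \frac{2\sigma}{\sqrt{nT}}, \\
% Remaining terms of (2) under the same stepsize decay faster in T:
\frac{\sigma^{2}\alpha^{2}}{1-\lambda_{2}}
  &= \frac{n}{(1-\lambda_{2})\,T},
\qquad
\frac{\alpha^{2}}{(1-\lambda_{2})^{2}T}
   = \frac{n}{\sigma^{2}(1-\lambda_{2})^{2}T^{2}}, \\
% Requiring the leading part to be at most \epsilon:
\frac{2\sigma}{\sqrt{nT}}\le\epsilon
  &\iff T\ge\frac{4\sigma^{2}}{n\epsilon^{2}}, \\
% Each iteration communicates with probability p:
\mathbb{E}[\text{number of communications}]
  &= pT = \mathcal{O}\Big(\frac{p\sigma^{2}}{n\epsilon^{2}}\Big).
\end{align*}
```

Under these assumptions the iteration count scales as $1/n$, which is the linear speedup, and the expected number of communication rounds is a fraction $p$ of the iterations.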

2 Setup

All vectors are column vectors unless otherwise stated. We let $\mathbf{x}_{i}^{t}\in\mathbb{R}^{d}$ represent the local state of node $i$ at the $t$-th iteration. For the sake of convenience in notation, we use bold capital letters to denote stacked variables. For instance,

$$
\begin{aligned}
\mathbf{X}^{t} &:= [\mathbf{x}_{1}^{t},\mathbf{x}_{2}^{t},\ldots,\mathbf{x}_{n}^{t}]^{\sf T}\in\mathbb{R}^{n\times d},\\
\mathbf{G}^{t} &:= [\mathbf{g}_{1}^{t},\mathbf{g}_{2}^{t},\ldots,\mathbf{g}_{n}^{t}]^{\sf T}\in\mathbb{R}^{n\times d},\\
\nabla F(\mathbf{X}^{t}) &:= [\nabla f_{1}(\mathbf{x}^{t}_{1}),\nabla f_{2}(\mathbf{x}^{t}_{2}),\ldots,\nabla f_{n}(\mathbf{x}^{t}_{n})]^{\sf T}\in\mathbb{R}^{n\times d}.
\end{aligned}
$$

2.1 Network graph

Algorithm 1 ProxSkip for decentralized stochastic optimization
Input: $\alpha>0$, $\beta>0$, $0<p\leq 1$, $\chi\geq 1$, initial iterates $\mathbf{x}^{0}_{i}=\mathbf{x}^{0}\in\mathbb{R}^{d}$, $i=1,\dots,n$, initial dual variables $\mathbf{y}^{0}_{i}=0$, $i=1,\dots,n$, weights for averaging $\mathbf{W}_{a}=\mathbf{I}-\nicefrac{1}{2\chi}(\mathbf{I}-\mathbf{W}):=(\widetilde{W}_{ij})_{i,j=1}^{n}$.
Flip coins $[\theta_{0},\ldots,\theta_{T-1}]$, where $\theta_{t}\in\{0,1\}$, with $\mathbf{P}(\theta_{t}=1)=p$.
for $t=0,1,\dotsc,T-1$, every node $i$ do
    Sample $\xi_{i}^{t}$, compute gradient $\mathbf{g}_{i}^{t}=\nabla F_{i}(\mathbf{x}^{t}_{i},\xi^{t}_{i})$
    $\hat{\mathbf{z}}^{t}_{i}=\mathbf{x}^{t}_{i}-\alpha\mathbf{g}^{t}_{i}-\mathbf{y}^{t}_{i}$    ▷ update the prediction variate $\hat{\mathbf{z}}^{t}_{i}$
    if $\theta_{t}=1$ then
        $\mathbf{x}^{t+1}_{i}=\sum_{j=1}^{n}\widetilde{W}_{ij}\hat{\mathbf{z}}^{t}_{j}$    ▷ communicate with probability $p$
        $\mathbf{y}^{t+1}_{i}=\mathbf{y}^{t}_{i}+\beta(\hat{\mathbf{z}}^{t}_{i}-\mathbf{x}^{t+1}_{i})$    ▷ update the control variate $\mathbf{y}^{t+1}_{i}$
    else
        $\mathbf{y}^{t+1}_{i}=\mathbf{y}^{t}_{i}$, $\mathbf{x}^{t+1}_{i}=\hat{\mathbf{z}}^{t}_{i}$    ▷ skip communication
    end if
end for

In this work, we focus on decentralized scenarios (undirected and connected network), where a network of $n$ nodes is interconnected by a graph with a set of edges $\mathcal{E}$, where node $i$ is connected to node $j$ if $(i,j)\in\mathcal{E}$. To describe the algorithm, we introduce the global mixing matrix $\mathbf{W}=[W_{ij}]$, where $W_{ij}=W_{ji}=0$ if $(i,j)\notin\mathcal{E}$, and $W_{ij}>0$ otherwise. We impose the following standard assumption on the mixing matrix.

Assumption 1.

The mixing matrix $\mathbf{W}\in[0,1]^{n\times n}$ is symmetric, doubly stochastic, and primitive. Let $\lambda_{1}=1$ denote the largest eigenvalue of the mixing matrix $\mathbf{W}$, and the remaining eigenvalues are denoted as $1>\lambda_{2}\geq\lambda_{3}\geq\cdots\geq\lambda_{n}>-1$.

We introduce two quantities as follows: $\mathbf{W}_{a}=\mathbf{I}-\nicefrac{1}{2\chi}(\mathbf{I}-\mathbf{W})$ and $\mathbf{W}_{b}=(\mathbf{I}-\mathbf{W})^{1/2}$, where $\chi\geq 1$. Under Assumption 1, the matrix $\mathbf{W}_{a}$ is positive semi-definite and doubly stochastic. Furthermore, we have $\mathbf{I}-\mathbf{W}_{a}=\nicefrac{1}{2\chi}\mathbf{W}_{b}^{2}$, and $\mathbf{W}_{a}$ is well-conditioned when $\chi$ is large.
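The properties of $\mathbf{W}_{a}$ and $\mathbf{W}_{b}$ stated above can be checked numerically. The sketch below is illustrative only: the ring topology with weight $1/3$ on each node and its two neighbors is an assumed example of a matrix satisfying Assumption 1, and the function names `ring_mixing_matrix` and `averaging_and_root_matrices` are hypothetical.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Assumed example topology: ring with weight 1/3 on self and both neighbors (n >= 3)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def averaging_and_root_matrices(W, chi):
    """Return W_a = I - (I - W) / (2 chi) and W_b = (I - W)^{1/2}."""
    I = np.eye(W.shape[0])
    W_a = I - (I - W) / (2.0 * chi)
    # Symmetric square root of I - W via its eigendecomposition;
    # tiny negative eigenvalues from rounding are clipped to zero.
    eigvals, eigvecs = np.linalg.eigh(I - W)
    eigvals = np.clip(eigvals, 0.0, None)
    W_b = (eigvecs * np.sqrt(eigvals)) @ eigvecs.T
    return W_a, W_b

W = ring_mixing_matrix(8)
W_a, W_b = averaging_and_root_matrices(W, chi=2.0)
```

Since the eigenvalues of $\mathbf{W}_{a}$ are $1-(1-\lambda_{i})/(2\chi)$ and $\lambda_{i}>-1$, they all exceed $1-1/\chi\geq 0$, which is why a larger $\chi$ pushes the spectrum of $\mathbf{W}_{a}$ toward $1$.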

2.2 Algorithm description

The ProxSkip algorithm [46] for problem (1) can be written as

\begin{align}
\hat{\mathbf{Z}}^{t} &= \mathbf{X}^{t}-\alpha\mathbf{G}^{t}-\mathbf{Y}^{t}, \tag{4a}\\
\mathbf{X}^{t+1} &= (1-\theta_{t})\hat{\mathbf{Z}}^{t}+\theta_{t}\mathbf{W}_{a}\hat{\mathbf{Z}}^{t}, \tag{4b}\\
\mathbf{Y}^{t+1} &= \mathbf{Y}^{t}+\beta(\hat{\mathbf{Z}}^{t}-\mathbf{X}^{t+1}). \tag{4c}
\end{align}

Here, $\alpha>0$ is the stepsize (learning rate), $\beta>0$, and $\mathbf{G}^{t}=[\mathbf{g}_{1}^{t},\mathbf{g}_{2}^{t},\ldots,\mathbf{g}_{n}^{t}]^{\sf T}\in\mathbb{R}^{n\times d}$, where $\mathbf{g}_{i}^{t}$ is a stochastic estimate of the gradient $\nabla f_{i}(\mathbf{x}_{i}^{t})$. The random variable $\theta_{t}$ equals $1$ with probability $p$ and $0$ with probability $1-p$, and $\mathbf{Y}^{t}$ is the control variate. At each iteration $t\geq 0$, communication takes place with probability $p\in(0,1]$. In the absence of communication, the update $\mathbf{X}^{t+1}=\mathbf{X}^{t}-\alpha\mathbf{G}^{t}-\mathbf{Y}^{t}$ is performed, while the control variate remains unchanged, i.e., $\mathbf{Y}^{t+1}=\mathbf{Y}^{t}$. This allows multiple iterations of local computation between communication rounds. Decomposing the updates for individual nodes, we provide a detailed implementation in Algorithm 1.
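The recursion can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The local step $\hat{\mathbf{Z}}^{t}=\mathbf{X}^{t}-\alpha\mathbf{G}^{t}-\mathbf{Y}^{t}$ is inferred from the no-communication case described above, and update (4c) is taken from the text. The function names, the argument `W_mix`, and the form of the communication step (multiplying $\hat{\mathbf{Z}}^{t}$ by a doubly stochastic mixing matrix) are assumptions made here for illustration.

```python
import numpy as np

def proxskip_step(X, Y, G, W_mix, alpha, beta, theta):
    """One iteration of the sketched recursion.

    X, Y, G : (n, d) arrays (iterates, control variates, stochastic gradients).
    W_mix   : (n, n) doubly stochastic matrix (hypothetical mixing step).
    theta   : 1 if the nodes communicate in this iteration, 0 otherwise.
    """
    # Local step: needs no communication.
    Z_hat = X - alpha * G - Y
    # Assumed communication step: mix rows of Z_hat only when theta == 1.
    X_next = W_mix @ Z_hat if theta == 1 else Z_hat
    # Control-variate update (4c); it is a no-op when X_next == Z_hat.
    Y_next = Y + beta * (Z_hat - X_next)
    return X_next, Y_next

def sample_theta(rng, p):
    """Draw theta_t: 1 with probability p, 0 with probability 1 - p."""
    return int(rng.random() < p)
```

With `theta = 0` the function returns `Y` unchanged, which is the mechanism that lets several local steps run between communication rounds.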

2.3 Assumptions

We further use the following standard assumptions:

Assumption 2.

A solution exists to problem (1), and $f^{*}>-\infty$. Moreover, $f_{i}$ is $L$-smooth, i.e.,

$$\|\nabla f_{i}(\mathbf{x})-\nabla f_{i}(\mathbf{y})\|\leq L\|\mathbf{x}-\mathbf{y}\|,\quad\text{for any }\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}.$$
Assumption 3.

Each function $f_{i}$ is $\mu$-strongly convex for some constant $\mu\geq 0$, i.e.,

$$f_{i}(\mathbf{x})-f_{i}(\mathbf{y})+\frac{\mu}{2}\|\mathbf{x}-\mathbf{y}\|^{2}\leq\langle\nabla f_{i}(\mathbf{x}),\mathbf{x}-\mathbf{y}\rangle,\quad\text{for any }\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}.$$
Assumption 4.

For every iteration $t\geq 0$, the local stochastic gradient $\mathbf{g}^{t}_{i}=\nabla F_{i}(\mathbf{x}_{i}^{t},\xi^{t}_{i})$ is an unbiased estimate, i.e.,

$$\mathbb{E}_{\xi^{t}_{i}}\big[\nabla F_{i}(\mathbf{x}_{i}^{t},\xi^{t}_{i})\mid\mathbf{x}_{i}^{t}\big]=\nabla f_{i}(\mathbf{x}_{i}^{t}),$$

and there exists a constant $\sigma>0$ such that

$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\xi_{i}^{t}}\big[\|\nabla F_{i}(\mathbf{x}_{i}^{t},\xi^{t}_{i})-\nabla f_{i}(\mathbf{x}_{i}^{t})\|^{2}\big]\leq\sigma^{2}.$$
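As a concrete instance of the unbiasedness condition, consider a hypothetical local least-squares loss $f_{i}(\mathbf{x})=\frac{1}{2m}\|A\mathbf{x}-\mathbf{b}\|^{2}$ whose stochastic gradient uses one of the $m$ rows of $A$, sampled uniformly. The loss and the names `full_grad`, `row_grad`, and `mean_and_variance` are our illustrative choices, not taken from the paper. The sketch enumerates all rows, so the expectation is computed exactly rather than estimated.

```python
import numpy as np

def full_grad(A, b, x):
    """Gradient of f(x) = ||A x - b||^2 / (2 m)."""
    m = A.shape[0]
    return A.T @ (A @ x - b) / m

def row_grad(A, b, x, j):
    """Stochastic gradient built from row j only (j sampled uniformly)."""
    return A[j] * (A[j] @ x - b[j])

def mean_and_variance(A, b, x):
    """Exact mean and variance of row_grad over the uniform choice of j."""
    m = A.shape[0]
    grads = np.stack([row_grad(A, b, x, j) for j in range(m)])
    mean = grads.mean(axis=0)
    var = np.mean(np.sum((grads - full_grad(A, b, x)) ** 2, axis=1))
    return mean, var
```

The returned mean coincides with `full_grad(A, b, x)`, which is the unbiasedness requirement. The returned variance is evaluated at a single point only; the uniform bound $\sigma^{2}$ in Assumption 4 is an additional requirement that this sketch does not verify.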

3 Convergence results

We now present our novel convergence results for ProxSkip. In Section 3.1, we recall the existing results in [46]. In Section 3.2, the convergence rates and communication complexities for non-convex and convex functions are presented in Theorem 2 and Corollary 1, respectively. In Section 3.3, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes.

3.1 Preliminary

We start by recalling the existing convergence results of ProxSkip [46, 51].

Theorem 1.

Suppose that Assumptions 1, 2, and 4 hold, and $f_{i}$ is $\mu$-strongly convex for some $0<\mu\leq L$. If $0<\alpha\leq 1/L$, $\beta=p$, and $\chi\geq 1$, it holds that

$$\mathbb{E}\left[\big\|\bar{\mathbf{x}}^{t+1}-\mathbf{x}^{\star}\big\|^{2}\right]\leq\zeta^{t+1}a_{0}+\frac{\alpha^{2}\sigma^{2}}{1-\zeta}, \tag{5}$$

where $a_{0}$ is a constant that depends on the initialization and $\zeta=\max\big\{1-\alpha\mu,\,1-\frac{(1-\lambda_{2})p^{2}}{2\chi}\big\}<1$.

When $\sigma^{2}=0$, by setting $\alpha=1/L$ and $\chi=1$, we can deduce from (5) that the communication complexity of ProxSkip to achieve $\epsilon$-accuracy, i.e., $\mathbb{E}[\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}]\leq\epsilon$, is $\mathcal{O}\big(\big(p\kappa+\frac{1}{p(1-\lambda_{2})}\big)\log\frac{1}{\epsilon}\big)$, where $\kappa=L/\mu$. If the network is sufficiently well-connected, i.e., $\frac{1}{(1-\lambda_{2})\kappa}<1$, and we set $p=\sqrt{\frac{1}{(1-\lambda_{2})\kappa}}$, the communication complexity becomes $\mathcal{O}\big(\sqrt{\frac{\kappa}{1-\lambda_{2}}}\,\log\frac{1}{\epsilon}\big)$, which matches the optimal communication complexity established in [52].
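The choice of $p$ can be checked numerically. The sketch below evaluates the factor $p\kappa+\frac{1}{p(1-\lambda_{2})}$ from the complexity above, with the logarithmic factor and constants omitted. The function names and the variable `gap` (standing for $1-\lambda_{2}$) are ours, and the numbers in the usage note are illustrative values rather than results from the paper.

```python
import math

def comm_complexity(p, kappa, gap):
    """Factor p * kappa + 1 / (p * gap), where gap = 1 - lambda_2."""
    return p * kappa + 1.0 / (p * gap)

def best_p(kappa, gap):
    """Minimizer of comm_complexity over p in (0, 1].

    The unconstrained minimizer is sqrt(1 / (gap * kappa)); if it exceeds 1,
    the factor is decreasing on (0, 1], so p = 1 is the best feasible value.
    """
    return min(1.0, math.sqrt(1.0 / (gap * kappa)))
```

For example, with $\kappa=100$ and $1-\lambda_{2}=0.25$, `best_p` returns $0.2$ and the factor equals $40=2\sqrt{\kappa/(1-\lambda_{2})}$, compared with $104$ when communicating at every iteration ($p=1$).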

When $\sigma^{2}\neq 0$, based on (5) and the fact that, when $\zeta=1-\alpha\mu$, $\frac{\alpha^{2}\sigma^{2}}{1-\zeta}=\frac{\alpha^{2}\sigma^{2}}{\alpha\mu}=\mathcal{O}(\alpha\sigma^{2})$, we can conclude that the local solution $\mathbf{x}_{i}^{t}$ generated by ProxSkip converges to the global minimizer $\mathbf{x}^{\star}$ at a linear rate until it reaches an $\mathcal{O}(\alpha\sigma^{2})$-neighborhood of $\mathbf{x}^{\star}$. However, (5) alone is not sufficient to obtain the desired linear speedup term $\mathcal{O}(\frac{\alpha\sigma^{2}}{n})+\mathcal{O}(\alpha^{2})$. This indicates that directly extending the analysis techniques of [46] or [51] to the stochastic scenario ensures convergence but does not guarantee linear speedup. Therefore, further analysis is required to achieve the desired linear speedup.

3.2 Main theorem—Convergence rate of ProxSkip

We are now ready to present the new convergence results for ProxSkip.

Theorem 2.

Suppose that Assumptions 1, 2, and 4 hold. Let $\mathbf{x}_{i}^{t}$ be the iterates of Algorithm 1, let $\bar{\mathbf{x}}^{t}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}^{t}$ denote their average, and let $\mathbf{x}^{\star}$ be a solution of (1). For sufficiently small $\alpha$, $\chi=\mathcal{O}(\max\{1,(1-p)/(1-\lambda_{2})\})$, and $\beta=1$, we have the following convergence results.

Non-convex: Let $F_{0}=f(\bar{\mathbf{x}}^{0})-f^{\star}$ and $\varsigma^{2}_{0}=\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_{i}(\bar{\mathbf{x}}^{0})-\nabla f(\bar{\mathbf{x}}^{0})\|^{2}$. It holds that

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]\leq\mathcal{O}\bigg(\underbrace{\frac{F_{0}}{\alpha T}+\frac{\chi^{2}L^{2}\varsigma^{2}_{0}\alpha^{2}}{(1-\lambda_{2})^{2}T}}_{\mathrm{deterministic\ part}}+\underbrace{\frac{\alpha\sigma^{2}L}{n}+\frac{\chi L^{2}\sigma^{2}\alpha^{2}}{1-\lambda_{2}}}_{\mathrm{stochastic\ part}}\bigg). \tag{6}$$

Convex: Let $R_{0}^{2}=\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}$. Under the additional Assumption 3 with $\mu\geq 0$, it holds that

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\mathcal{O}\bigg(\underbrace{\frac{R_{0}^{2}}{\alpha T}+\frac{\chi^{2}L\varsigma^{2}_{0}\alpha^{2}}{(1-\lambda_{2})^{2}T}}_{\mathrm{deterministic\ part}}+\underbrace{\frac{\alpha\sigma^{2}}{n}+\frac{\chi L\sigma^{2}\alpha^{2}}{1-\lambda_{2}}}_{\mathrm{stochastic\ part}}\bigg). \tag{7}$$

Strongly convex: Under the additional Assumption 3 with $\mu>0$, it holds that

$$\mathbb{E}\left[\big\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\big\|^{2}\right]\leq\underbrace{\Big(1-\frac{\alpha\mu}{4}\Big)^{T}\Big(\big\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\big\|^{2}+\frac{8\chi\alpha^{2}\varsigma_{0}^{2}}{1-\lambda_{2}}\Big)}_{\mathrm{deterministic\ part}}+\mathcal{O}\bigg(\underbrace{\frac{\alpha\sigma^{2}}{\mu n}+\frac{\chi L\sigma^{2}\alpha^{2}}{\mu(1-\lambda_{2})}}_{\mathrm{stochastic\ part}}\bigg). \tag{8}$$

For the non-convex setting, Theorem 2 demonstrates that ProxSkip converges to a neighborhood of some stationary point. Without any additional assumptions, a stationary point is the best guarantee possible and is a satisfactory criterion to measure the performance of distributed methods with non-convex objectives [35]. For the convex case, Theorem 2 shows that ProxSkip converges to a neighborhood of some optimal solution. When $\sigma^{2}=0$, i.e., in the deterministic case, ProxSkip converges exactly, with sublinear rates in the non-convex (N-CVX) and convex (CVX) settings and a linear rate in the strongly convex (S-CVX) setting.

Note that the stochastic parts of the convergence rates (6), (7), and (8) can all be rewritten as $\mathcal{O}(\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2})$. It follows from Theorem 2 that

$$\begin{array}{rl}\text{N-CVX: }&\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\right]\leq\mathcal{O}(\frac{1}{\alpha T})+\mathcal{O}(\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}),\\ \text{CVX: }&\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\mathcal{O}(\frac{1}{\alpha T})+\mathcal{O}(\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}),\\ \text{S-CVX: }&\mathbb{E}\left[\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\|^{2}\right]\leq\mathcal{O}((1-\alpha\mu)^{T})+\mathcal{O}(\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}).\end{array}$$

Thus, it is established in Theorem 2 that the linear speedup term $\mathcal{O}(\frac{\alpha\sigma^{2}}{n})$ can be achieved. When the stepsize is sufficiently small, the term $\mathcal{O}(\frac{\alpha\sigma^{2}}{n})$ dominates the convergence rates (6), (7), and (8), which improve linearly with the number of nodes $n$.

For the non-convex and convex settings, setting $\alpha=\sqrt{\frac{n}{T}}$ with $T$ sufficiently large, the rates are bounded by

$$\mathcal{O}\left(\sqrt{\frac{1}{nT}}+\sqrt{\frac{\sigma^{2}}{nT}}+\frac{n\chi\sigma^{2}}{(1-\lambda_{2})T}+\frac{n\chi^{2}}{(1-\lambda_{2})^{2}T^{2}}\right)=\mathcal{O}\left(\sqrt{\frac{1}{nT}}\right).$$

For the strongly convex setting, letting $\alpha=\frac{4\ln T^{2}}{\mu T}$ for sufficiently large $T$, it holds that $\big(1-\frac{\alpha\mu}{4}\big)^{T}\leq\exp(-\frac{\alpha\mu T}{4})=\frac{1}{T^{2}}$, where $\exp(\cdot)$ denotes the exponential function and the inequality uses $1-x\leq\exp(-x)$. Thus, the rate is bounded by

$$\tilde{\mathcal{O}}\left(\frac{\sigma^{2}}{nT}+\frac{1}{T^{2}}+\frac{\chi\sigma^{2}}{(1-\lambda_{2})T^{2}}+\frac{\chi}{(1-\lambda_{2})T^{4}}\right)=\tilde{\mathcal{O}}\left(\frac{1}{nT}\right).$$

When $T$ is sufficiently large, the term $\frac{1}{\sqrt{nT}}$ (or $\frac{1}{nT}$ for the strongly convex setting) dominates the rate. In this scenario, ProxSkip requires $T=\Omega\left(\frac{1}{n\epsilon^{2}}\right)$ (or $T=\Omega\left(\frac{1}{n\epsilon}\right)$) iterations to reach a desired $\epsilon$-accurate solution, so the number of iterations required for a given accuracy decreases linearly with $n$.
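The two stepsize choices can be sanity-checked numerically on the simplified bounds $\mathcal{O}(\frac{1}{\alpha T})+\mathcal{O}(\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2})$ and $(1-\frac{\alpha\mu}{4})^{T}$. The sketch below is arithmetic only: all constants hidden in the $\mathcal{O}$-notation are set to one, the function names are ours, and it does not check the "sufficiently small $\alpha$" condition of Theorem 2.

```python
import math

def nonconvex_terms(n, T, sigma2):
    """Terms of the simplified N-CVX/CVX bound for alpha = sqrt(n / T)."""
    alpha = math.sqrt(n / T)
    return {
        "1/(alpha*T)": 1.0 / (alpha * T),      # equals 1 / sqrt(n * T)
        "alpha*sigma2/n": alpha * sigma2 / n,  # linear-speedup term
        "alpha^2*sigma2": alpha ** 2 * sigma2, # higher-order term, n * sigma2 / T
    }

def strongly_convex_contraction(mu, T):
    """Return alpha = 4 ln(T^2) / (mu T) and the factor (1 - alpha*mu/4)^T."""
    alpha = 4.0 * math.log(T ** 2) / (mu * T)
    return alpha, (1.0 - alpha * mu / 4.0) ** T
```

For instance, with $n=4$, $T=10^{4}$, and $\sigma^{2}=1$, the first two terms both equal $1/\sqrt{nT}=0.005$, while the higher-order term is $n/T=0.0004$. With the strongly convex stepsize, the contraction factor stays below $1/T^{2}$.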

In addition, based on Theorem 2, we can even get a tighter rate by carefully selecting the stepsize to obtain the following result.

Corollary 1.

Under the same settings as in Theorem 2, we have the following convergence results.

Non-convex: It holds that

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}]\leq\epsilon\quad\text{after}\quad\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p\sqrt{\chi}}{\sqrt{1-\lambda_{2}}}\frac{\sigma}{\epsilon^{3/2}}+\frac{\chi}{(1-\lambda_{2})\epsilon}\right) \tag{9}$$

expected communication rounds.

Convex: Under the additional Assumption 3 with $\mu\geq 0$, it holds that

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[f(\bar{\mathbf{x}}^{t})-f^{\star}]\leq\epsilon\quad\text{after}\quad\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p\sqrt{\chi}}{\sqrt{1-\lambda_{2}}}\frac{\sigma}{\epsilon^{3/2}}+\frac{\chi}{(1-\lambda_{2})\epsilon}\right) \tag{10}$$

expected communication rounds.

Strongly Convex: Under the additional Assumption 3 with $\mu>0$, it holds that

$$\mathbb{E}[\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\|^{2}]\leq\epsilon\quad\text{after}\quad\tilde{\mathcal{O}}\left(\frac{p\sigma^{2}}{n\epsilon}+\frac{p\sqrt{\chi}}{\sqrt{1-\lambda_{2}}}\frac{\sigma}{\sqrt{\epsilon}}+\frac{\chi\log\frac{1}{\epsilon}}{1-\lambda_{2}}\right) \tag{11}$$

expected communication rounds. Here, the notation $\tilde{\mathcal{O}}(\cdot)$ ignores logarithmic factors.
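To make the scaling in (9), (10), and (11) concrete, the following sketch evaluates the three terms of each bound numerically. It is only an illustration: every constant hidden by $\mathcal{O}(\cdot)$ is set to 1, and the function names and the parameter values in the demo are hypothetical choices, not quantities taken from the paper.

```python
import math


def rounds_nonconvex(p, sigma, n, eps, chi, lam2):
    """Return the three terms of bound (9)/(10), with hidden constants set to 1."""
    rho = 1.0 - lam2  # spectral gap 1 - lambda_2
    noise = p * sigma**2 / (n * eps**2)  # leading term, scales as p / n
    mixed = p * math.sqrt(chi / rho) * sigma / eps**1.5
    network = chi / (rho * eps)
    return noise, mixed, network


def rounds_strongly_convex(p, sigma, n, eps, chi, lam2):
    """Return the three terms of bound (11), with hidden constants set to 1."""
    rho = 1.0 - lam2
    noise = p * sigma**2 / (n * eps)
    mixed = p * math.sqrt(chi / rho) * sigma / math.sqrt(eps)
    network = chi * math.log(1.0 / eps) / rho
    return noise, mixed, network


if __name__ == "__main__":
    # Illustrative values only: doubling n halves the leading (noise) term.
    for n in (8, 16, 32):
        terms = rounds_nonconvex(p=0.2, sigma=1.0, n=n, eps=1e-3, chi=5.0, lam2=0.8)
        print(n, ["%.3e" % v for v in terms])
```

Only the first term depends on $n$, and it is the only one that grows as $\epsilon^{-2}$ (or $\epsilon^{-1}$), which is why it dominates for small $\epsilon$.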

We provide Table 2 to compare the convergence results of ProxSkip with existing state-of-the-art decentralized optimization algorithms with local updates, such as local-DSGD [35], $K$-GT [40], and LED [45], in terms of the number of communication rounds needed to achieve accuracy $\epsilon>0$.

Table 2: A comparison with existing methods employing local steps. $\rho=1-\lambda_{2}$, $K$ denotes the number of local steps, and LS-NI denotes linear speedup with network-independent stepsizes.

| Method | # communication rounds (N-CVX/CVX) | # communication rounds (S-CVX) | LS-NI (S-CVX) |
|---|---|---|---|
| local-DSGD [35] | $\mathcal{O}\left(\frac{\sigma^{2}}{nK\epsilon^{2}}+\left(\frac{\sigma}{\sqrt{\rho K}}+\frac{\varsigma}{\rho}\right)\frac{1}{\epsilon^{3/2}}+\frac{1}{\rho\epsilon}\right)$ ᵃ | $\tilde{\mathcal{O}}\left(\frac{\sigma^{2}}{nK\epsilon}+\left(\frac{\sigma}{\sqrt{\rho K}}+\frac{\varsigma}{\rho}\right)\frac{1}{\sqrt{\epsilon}}+\frac{1}{\rho}\log\frac{1}{\epsilon}\right)$ | |
| $K$-GT [40] | $\mathcal{O}\left(\frac{\sigma^{2}}{nK\epsilon^{2}}+\left(\frac{\sigma}{\rho^{2}\sqrt{K}}\right)\frac{1}{\epsilon^{3/2}}+\frac{1}{\rho^{2}\epsilon}\right)$ ᵇ | no results | |
| Periodical GT [40] | $\mathcal{O}\left(\frac{\sigma^{2}}{nK\epsilon^{2}}+\left(\frac{\sigma}{\rho^{2}}\right)\frac{1}{\epsilon^{3/2}}+\frac{1}{\rho^{2}\epsilon}\right)$ ᵇ | no results | |
| LED [45] | $\mathcal{O}\left(\frac{\sigma^{2}}{nK\epsilon^{2}}+\left(\frac{\sigma}{\sqrt{\rho K}}\right)\frac{1}{\epsilon^{3/2}}+\frac{1}{\rho\epsilon}\right)$ | $\tilde{\mathcal{O}}\left(\frac{\sigma^{2}}{nK\epsilon}+\left(\frac{\sigma}{\sqrt{\rho K}}\right)\frac{1}{\sqrt{\epsilon}}+\frac{1}{\rho}\log\frac{1}{\epsilon}\right)$ | |
| ProxSkip | $\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p\sqrt{\chi}}{\sqrt{\rho}}\frac{\sigma}{\epsilon^{3/2}}+\frac{\chi}{\rho\epsilon}\right)$ | $\tilde{\mathcal{O}}\left(\frac{p\sigma^{2}}{n\epsilon}+\frac{p\sqrt{\chi}}{\sqrt{\rho}}\frac{\sigma}{\sqrt{\epsilon}}+\frac{\chi}{\rho}\log\frac{1}{\epsilon}\right)$ | |

  • ᵃ $\varsigma^{2}$ is the function heterogeneity constant such that $\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_{i}(\mathbf{x})-\nabla f(\mathbf{x}^{\star})\|^{2}\leq\varsigma^{2}$.

  • ᵇ The result is for the non-convex setting; no corresponding result is given for the convex setting.

Achieving acceleration by $p$ and $n$. According to (9), (10), and (11), when $\epsilon$ is sufficiently small, the convergence rate of ProxSkip is dominated by noise and is unaffected by the graph parameter $1-\lambda_{2}$. After a sufficiently long transient time, ProxSkip, with communication complexity $\mathcal{O}\big(\frac{p\sigma^{2}}{n\epsilon^{2}}\big)$ (or $\tilde{\mathcal{O}}\big(\frac{p\sigma^{2}}{n\epsilon}\big)$ for the strongly convex setting), achieves linear speedup with respect to both the probability of communication $p$ and the number of nodes $n$.

Removing dependence on data heterogeneity. According to Table 2, the second term of the communication complexity of local-DSGD [35], a popular algorithm for decentralized optimization, is as follows:

$$\text{N-CVX/CVX: }\mathcal{O}\left(\left(\frac{\sigma}{\sqrt{\rho K}}+\frac{\varsigma}{\rho}\right)\epsilon^{-3/2}\right),\quad\text{S-CVX: }\tilde{\mathcal{O}}\left(\left(\frac{\sigma}{\sqrt{\rho K}}+\frac{\varsigma}{\rho}\right)\epsilon^{-1/2}\right).$$

Here, $\varsigma^{2}$ represents the function heterogeneity constant such that $\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_{i}(\mathbf{x})-\nabla f(\mathbf{x}^{\star})\|^{2}\leq\varsigma^{2}$. We note that ProxSkip lacks the additional term $\frac{\varsigma}{\rho}\frac{1}{\epsilon^{3/2}}$ ($\frac{\varsigma}{\rho}\frac{1}{\sqrt{\epsilon}}$ for the strongly convex case). Thus, ProxSkip effectively eliminates dependence on the data heterogeneity level $\varsigma^{2}$.
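The effect of the missing $\varsigma$ term can be seen by evaluating the two second-order terms side by side. The sketch below does this for the N-CVX/CVX case, with all hidden constants set to 1; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def second_term_local_dsgd(sigma, varsigma, rho, K, eps):
    """Second term of local-DSGD in Table 2 (N-CVX/CVX), constants set to 1."""
    return (sigma / math.sqrt(rho * K) + varsigma / rho) / eps**1.5


def second_term_proxskip(sigma, rho, chi, p, eps):
    """Second term of ProxSkip in bounds (9)/(10), constants set to 1."""
    return p * math.sqrt(chi / rho) * sigma / eps**1.5


if __name__ == "__main__":
    rho, K, eps, sigma = 0.2, 5, 1e-2, 1.0  # illustrative values only
    for varsigma in (0.0, 1.0, 10.0):
        dsgd = second_term_local_dsgd(sigma, varsigma, rho, K, eps)
        prox = second_term_proxskip(sigma, rho, chi=1.0 / rho, p=1.0 / K, eps=eps)
        print(f"varsigma={varsigma:5.1f}  local-DSGD={dsgd:.3e}  ProxSkip={prox:.3e}")
```

The local-DSGD term grows linearly with $\varsigma$, while the ProxSkip term does not take $\varsigma$ as an input at all.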

Comparable with existing decentralized algorithms incorporating local updates. When $p<\lambda_{2}$, we have $\chi=\mathcal{O}(\max\{1,\frac{1-p}{1-\lambda_{2}}\})=\mathcal{O}(\frac{1}{1-\lambda_{2}})$; when $p>\lambda_{2}$, $\chi=\mathcal{O}(\max\{1,\frac{1-p}{1-\lambda_{2}}\})=\mathcal{O}(1)$. Highlighting the network quantities, the second and third terms of the communication complexity of ProxSkip are $\mathcal{O}\left(p\rho^{-1}+\rho^{-2}\right)$ when $p<1-\rho$, and $\mathcal{O}\left(p\rho^{-1/2}+\rho^{-1}\right)$ when $p\geq 1-\rho$. Compared with GT-based methods [40], the network-dependent bounds are improved. Let $p=\frac{1}{K}$. Since the first term of the communication complexity of ProxSkip, local-DSGD [35], $K$-GT [40], Periodical-GT [40], and LED [45] is then $\mathcal{O}(\frac{\sigma^{2}}{nK\epsilon^{2}})$ (or $\tilde{\mathcal{O}}(\frac{\sigma^{2}}{nK\epsilon})$ for the strongly convex setting), where $K$ denotes the number of local steps, the convergence rates of ProxSkip are comparable with those of these existing decentralized algorithms incorporating local updates.
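The two regimes for $\chi$ discussed above can be checked with a short sketch. It assumes the hidden constant in $\chi=\mathcal{O}(\max\{1,\frac{1-p}{1-\lambda_{2}}\})$ equals 1; the function names are hypothetical.

```python
import math


def chi_order(p, lam2):
    """Order of chi used in the discussion: max{1, (1 - p) / (1 - lambda_2)}."""
    return max(1.0, (1.0 - p) / (1.0 - lam2))


def network_terms(p, lam2):
    """Network-dependent factors of the second and third terms (constants dropped)."""
    rho = 1.0 - lam2
    chi = chi_order(p, lam2)
    return p * math.sqrt(chi / rho), chi / rho


if __name__ == "__main__":
    # p < lambda_2: chi behaves like 1 / rho; p >= lambda_2: chi collapses to 1.
    for p, lam2 in ((0.1, 0.99), (0.5, 0.9), (0.95, 0.9)):
        second, third = network_terms(p, lam2)
        print(f"p={p}, lam2={lam2}: chi={chi_order(p, lam2):.2f}, "
              f"second~{second:.2f}, third~{third:.2f}")
```

In the first regime the factors stay below $p\rho^{-1}$ and $\rho^{-2}$; in the second they reduce to $p\rho^{-1/2}$ and $\rho^{-1}$.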

3.3 Achieving linear speedup with network-independent stepsizes

Theorem 3.

Suppose that Assumptions 1, 2, and 4 hold, and $f_{i}$ is $\mu$-strongly convex for some $0<\mu\leq L$. If $0<\alpha\leq\frac{1}{2L}$ and $\beta=p$, there exists $\chi=\mathcal{O}(\max\{\frac{1}{p},\frac{1}{1-\lambda_{2}},\frac{1-p}{1-\lambda_{2}}\})$ such that

$$\mathbb{E}\left[\big\|\bar{\mathbf{x}}^{t+1}-\mathbf{x}^{\star}\big\|^{2}\right]\leq\zeta_{0}^{t+1}a_{0}+\frac{\alpha\sigma^{2}}{n\mu}+\mathcal{O}(\alpha^{2}\sigma^{2}), \tag{12}$$

where $a_{0}$ is a constant that depends on the initialization and $\zeta_{0}=\max\left\{1-\alpha\mu,\sqrt{1-\frac{(1-\lambda_{2})p^{2}}{2\chi}}\right\}<1$.

According to this rate, a linear speedup term of $\mathcal{O}(\frac{\alpha\sigma^{2}}{n})+\mathcal{O}(\alpha^{2})$ can be achieved. Importantly, the upper bound on the stepsize is independent of the network topology, which is a favorable property for practical implementation. Referring to Table 2, in the strongly convex setting, while local-DSGD [35], $K$-GT [40], and LED [45] achieve linear speedup bounds, this property hinges on network-dependent stepsizes, i.e., stepsizes that depend on $1-\lambda_{2}$. In contrast, the stepsize condition for ProxSkip is $0<\alpha\leq\frac{1}{2L}$, which is independent of $1-\lambda_{2}$.
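The structure of bound (12) can be explored numerically with the sketch below. It is an illustration under stated assumptions: the constant hidden in $\chi$ is set to 1, and `a0` and `c2` (the constant in front of $\alpha^{2}\sigma^{2}$) are placeholder values, since the theorem does not specify them here. All function names are hypothetical.

```python
import math


def chi_thm3(p, lam2):
    """Order of chi in Theorem 3 (hidden constant set to 1, an assumption)."""
    rho = 1.0 - lam2
    return max(1.0 / p, 1.0 / rho, (1.0 - p) / rho)


def zeta0(alpha, mu, p, lam2):
    """Contraction factor zeta_0 from Theorem 3."""
    rho = 1.0 - lam2
    consensus_rate = math.sqrt(1.0 - rho * p**2 / (2.0 * chi_thm3(p, lam2)))
    return max(1.0 - alpha * mu, consensus_rate)


def bound_12(t, alpha, mu, sigma, n, p, lam2, a0=1.0, c2=1.0):
    """Right-hand side of (12); a0 and c2 are placeholder constants."""
    transient = zeta0(alpha, mu, p, lam2) ** (t + 1) * a0
    floor = alpha * sigma**2 / (n * mu) + c2 * alpha**2 * sigma**2
    return transient + floor


if __name__ == "__main__":
    L, mu = 10.0, 1.0
    alpha = 1.0 / (2.0 * L)  # largest admissible stepsize; lam2 does not enter
    for lam2 in (0.5, 0.9, 0.99):
        z = zeta0(alpha, mu, p=0.2, lam2=lam2)
        b = bound_12(10_000, alpha, mu, 1.0, 16, 0.2, lam2)
        print(f"lam2={lam2}: zeta0={z:.6f}, bound at t=10000: {b:.3e}")
```

The stepsize is fixed before the topology is chosen; the network only affects how fast the transient term $\zeta_{0}^{t+1}a_{0}$ decays, not the admissible range of $\alpha$.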

Notably, [53] proves for the first time that NIDS/ED/D$^{2}$ can achieve linear speedup with network-independent stepsizes. However, whether a linear speedup bound can be achieved with network-independent stepsizes in the presence of local updates has remained an open question. Theorem 3 answers this question affirmatively.

3.4 Proof sketch of the main theorem

Existing convergence analyses of ProxSkip [46, 47, 48, 49, 50, 51] rely on primal-dual methodologies. However, these analyses use only first-order (stochastic) gradient information and therefore do not fully exploit the available function information. We propose a new proof: to fully utilize both function and gradient information, we use matrix factorization techniques to equivalently transform the ProxSkip iteration into an “SGD + consensus” form.

Here, we provide a proof sketch for Theorem 2 concerning non-convex objectives.

Step 1. (Lemma 1) We first give the equivalent form of update (4) as follows.

$$\bar{\mathbf{x}}^{t+1}=\bar{\mathbf{x}}^{t}-\alpha\bar{\mathbf{g}}^{t},\quad\mathcal{E}^{t+1}=\boldsymbol{\Gamma}\mathcal{E}^{t}+\boldsymbol{\Theta}_{1}^{t}+\boldsymbol{\Theta}_{2}^{t}, \tag{13}$$

where $\bar{\mathbf{g}}^{t}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{g}_{i}^{t}$, $\|\boldsymbol{\Gamma}\|<1$, $\mathcal{E}^{t}$ measures “consensus”, $\boldsymbol{\Theta}_{1}^{t}$ is related to the stochastic gradient, and $\boldsymbol{\Theta}_{2}^{t}$ measures “communication error” ($\boldsymbol{\Theta}_{2}^{t}=0$ if $\theta_{t}=1$). This description is informal, but it helps convey the structure of the proof. See Lemma 1 in the Appendix for details.
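The first relation in (13) says that the network average behaves like plain SGD. The sketch below illustrates why such a relation is natural, using a simpler DSGD-style step $\mathbf{X}^{+}=\mathbf{W}(\mathbf{X}-\alpha\mathbf{G})$ with a doubly stochastic $\mathbf{W}$. This is explicitly not the ProxSkip update (4), which is not restated in this section; the ring-graph mixing matrix and all function names are illustrative assumptions.

```python
import random


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix of a ring graph (illustrative choice)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i + 1) % n] += 0.25
        W[i][(i - 1) % n] += 0.25
    return W


def column_mean(X):
    """Average the rows of X, i.e., average over the nodes."""
    n = len(X)
    return [sum(row[k] for row in X) / n for k in range(len(X[0]))]


def mixed_gradient_step(W, X, G, alpha):
    """One step X_next = W (X - alpha * G); a DSGD-style update, not ProxSkip itself."""
    n, d = len(X), len(X[0])
    Y = [[X[i][k] - alpha * G[i][k] for k in range(d)] for i in range(n)]
    return [[sum(W[i][j] * Y[j][k] for j in range(n)) for k in range(d)] for i in range(n)]


if __name__ == "__main__":
    random.seed(0)
    n, d, alpha = 5, 3, 0.1
    X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    G = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # stand-in gradients
    X_next = mixed_gradient_step(ring_mixing_matrix(n), X, G, alpha)
    sgd_mean = [x - alpha * g for x, g in zip(column_mean(X), column_mean(G))]
    print(column_mean(X_next))
    print(sgd_mean)  # agrees with the line above up to rounding
```

Because the columns of $\mathbf{W}$ sum to one, mixing leaves the average untouched, so only the averaged gradient moves $\bar{\mathbf{x}}^{t}$; the disagreement between nodes is what a consensus recursion such as the second relation in (13) tracks.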

Step 2. (Lemma 2) Based on this equivalent update of ProxSkip and by the $L$-smoothness of $f_{i}$, we establish the following descent inequality.

$$\mathbb{E}\left[f(\bar{\mathbf{x}}^{t+1})\right]\leq f(\bar{\mathbf{x}}^{t})-\frac{\alpha}{2}\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}+\frac{2\alpha L^{2}}{n}\big\|\mathcal{E}^{t}\big\|_{\mathrm{F}}^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}.$$

Rearranging and averaging both sides over $t=0,1,\ldots,T-1$, we have

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]\leq\frac{2F_{0}}{\alpha T}+\frac{4L^{2}}{nT}\sum_{t=0}^{T-1}\big\|\mathcal{E}^{t}\big\|_{\mathrm{F}}^{2}+\frac{\alpha L\sigma^{2}}{n}. \tag{14}$$
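For completeness, the step from the descent inequality to (14) can be spelled out as follows. The derivation takes total expectations and assumes that $F_{0}$ is an upper bound on $f(\bar{\mathbf{x}}^{0})-\mathbb{E}[f(\bar{\mathbf{x}}^{T})]$, for instance $F_{0}=f(\bar{\mathbf{x}}^{0})-f^{\star}$; this reading of $F_{0}$ is an assumption, since the symbol is not restated in this section.

```latex
% Move the gradient term to the left-hand side of the descent inequality:
\frac{\alpha}{2}\,\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big]
  \leq \mathbb{E}\big[f(\bar{\mathbf{x}}^{t})\big]-\mathbb{E}\big[f(\bar{\mathbf{x}}^{t+1})\big]
  +\frac{2\alpha L^{2}}{n}\,\mathbb{E}\big[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\big]
  +\frac{L\alpha^{2}\sigma^{2}}{2n}.

% Sum over t = 0, ..., T-1 (the function values telescope) and multiply by 2/(\alpha T):
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big]
  \leq \frac{2\big(f(\bar{\mathbf{x}}^{0})-\mathbb{E}[f(\bar{\mathbf{x}}^{T})]\big)}{\alpha T}
  +\frac{4L^{2}}{nT}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\big]
  +\frac{\alpha L\sigma^{2}}{n}.
```

The coefficients match (14) term by term: $\frac{2}{\alpha}\cdot\frac{2\alpha L^{2}}{n}=\frac{4L^{2}}{n}$ and $\frac{2}{\alpha}\cdot\frac{L\alpha^{2}\sigma^{2}}{2n}=\frac{\alpha L\sigma^{2}}{n}$.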

Step 3. (Lemma 2) Subsequently, we establish the following consensus inequality.

$$\mathbb{E}\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\right]\leq\tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+A_{1}n\alpha^{4}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+A_{2}n\alpha^{2}\sigma^{2}+A_{3}\alpha^{4}\sigma^{2},$$

where $0<\tilde{\gamma}<1$ and $A_{1},A_{2},A_{3}>0$. Unrolling this recurrence, we have

$$\mathbb{E}\left[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\right]\leq\frac{\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}}{1-\tilde{\gamma}}+\frac{A_{2}n\alpha^{2}\sigma^{2}+A_{3}\alpha^{4}\sigma^{2}}{1-\tilde{\gamma}}+A_{1}n\alpha^{4}\sum_{k=0}^{t-1}\tilde{\gamma}^{t-1-k}\|\nabla f(\bar{\mathbf{x}}^{k})\|^{2}.$$
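The unrolling step can be sanity-checked on a scalar surrogate of the consensus recursion. In the sketch below, `e` stands for $\mathbb{E}[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}]$, `c` for $A_{2}n\alpha^{2}\sigma^{2}+A_{3}\alpha^{4}\sigma^{2}$, `b` for $A_{1}n\alpha^{4}$, and `g[k]` for $\|\nabla f(\bar{\mathbf{x}}^{k})\|^{2}$. The recursion is run with equality, which is the worst case allowed by the inequality; the function names and numerical values are hypothetical.

```python
import random


def unrolled_bound(gamma, e0, c, b, g, t):
    """Right-hand side of the unrolled consensus bound at iteration t."""
    tail = sum(gamma ** (t - 1 - k) * g[k] for k in range(t))
    return e0 / (1.0 - gamma) + c / (1.0 - gamma) + b * tail


def check_unrolling(gamma, e0, c, b, g):
    """Run e_{t+1} = gamma * e_t + b * g[t] + c and compare with the bound at every t."""
    e = e0
    for t in range(len(g) + 1):
        # Small relative slack absorbs floating-point rounding.
        if e > unrolled_bound(gamma, e0, c, b, g, t) * (1.0 + 1e-12):
            return False
        if t < len(g):
            e = gamma * e + b * g[t] + c
    return True


if __name__ == "__main__":
    random.seed(0)
    g = [random.random() for _ in range(200)]  # stand-in squared gradient norms
    print(check_unrolling(gamma=0.9, e0=2.0, c=0.3, b=0.5, g=g))
```

The bound holds because $\tilde{\gamma}^{t}\leq 1\leq\frac{1}{1-\tilde{\gamma}}$ and $\sum_{k=0}^{t-1}\tilde{\gamma}^{t-1-k}\leq\frac{1}{1-\tilde{\gamma}}$, so the initial and constant terms are bounded uniformly in $t$.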

Step 4. Since $\sum_{t=0}^{T-1}\sum_{k=0}^{t-1}\tilde{\gamma}^{t-1-k}\|\nabla f(\bar{\mathbf{x}}^{k})\|^{2}\leq\frac{1}{1-\tilde{\gamma}}\sum_{t=0}^{T-1}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}$, choosing $\alpha$ such that $\frac{4L^{2}A_{1}\alpha^{4}}{1-\tilde{\gamma}}\leq\frac{1}{2}$ gives $\frac{4L^{2}}{nT}\sum_{t=0}^{T-1}\big\|\mathcal{E}^{t}\big\|_{\mathrm{F}}^{2}\leq\mathcal{O}(\alpha^{2}\sigma^{2})+\frac{1}{2T}\sum_{t=0}^{T-1}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}$. Combining this with (14), we complete the proof, i.e.,

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\right]\leq\mathcal{O}\left(\frac{1}{\alpha T}\right)+\mathcal{O}\left(\frac{\alpha\sigma^{2}}{n}+\alpha^{2}\sigma^{2}\right).$$
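The summation bound used in Step 4 follows from exchanging the order of summation and bounding a geometric series by $\frac{1}{1-\tilde{\gamma}}$. As a minimal numerical sketch (not part of the original proof), the snippet below checks this inequality on random nonnegative values. The sequence `g` stands in for the squared gradient norms and `gamma` for $\tilde{\gamma}$; both are illustrative assumptions, as are the function names.

```python
import random


def weighted_history_sum(g, gamma):
    """Left-hand side: sum over t < T and k < t of gamma**(t-1-k) * g[k]."""
    T = len(g)
    return sum(gamma ** (t - 1 - k) * g[k] for t in range(T) for k in range(t))


def geometric_upper_bound(g, gamma):
    """Right-hand side: (1 / (1 - gamma)) * sum of g[t] over t < T."""
    return sum(g) / (1.0 - gamma)


rng = random.Random(0)                   # fixed seed, illustrative only
g = [rng.random() for _ in range(200)]   # stand-ins for squared gradient norms
gamma = 0.9                              # stand-in for the contraction factor
lhs = weighted_history_sum(g, gamma)
rhs = geometric_upper_bound(g, gamma)
print(lhs <= rhs)
```

The check relies only on `g` being nonnegative and `gamma` lying in $[0,1)$, which matches the setting of Step 4.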

4 Experiments

We empirically verify the theoretical results of ProxSkip for stochastic decentralized optimization. The experimental results for the deterministic case can be found in [46].

Setup. Similarly to [46], we demonstrate our findings on the logistic regression problem with a regularizer. The objective function is $f(\mathbf{x})=\frac{1}{n}\sum_{i=1}^{n}\big\{\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}\ln(1+e^{-(\mathcal{A}_{ij}^{\mathsf{T}}\mathbf{x})\mathcal{B}_{ij}})\big\}+r(\mathbf{x})$.
Here, $r(\mathbf{x})$ is the regularizer, and each node $i$ holds its own training data $(\mathcal{A}_{ij},\mathcal{B}_{ij})\in\mathbb{R}^{d}\times\{-1,1\}$, $j=1,\cdots,m_{i}$, consisting of sample vectors $\mathcal{A}_{ij}$ and corresponding classes $\mathcal{B}_{ij}$. We use the dataset ijcnn1 from the widely used LIBSVM library [54], which has $d=22$ attributes and $\sum_{i=1}^{n}m_{i}=49950$ samples. Moreover, the training samples are randomly and evenly distributed over all $n$ agents.
We control the stochastic noise $\sigma^{2}$ by adding Gaussian noise to every stochastic gradient, i.e., the stochastic gradients are generated as $\nabla F_{i}(\mathbf{x})=\nabla f_{i}(\mathbf{x})+\omega_{i}$, where $\omega_{i}\sim\mathcal{N}(0,\sigma^{2}\mathbf{I}_{d})$ and $\sigma^{2}=10^{-3}$.
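The noisy gradient oracle described in the setup can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' experiment code: it covers only the data-fitting term of one node's loss (the gradient of the regularizer $r$ would be added separately), and the function names and the tiny data set are hypothetical.

```python
import math
import random


def logistic_loss_grad(x, samples, labels):
    """Gradient of (1/m) * sum_j ln(1 + exp(-b_j * <a_j, x>)) with respect to x."""
    m, d = len(samples), len(x)
    grad = [0.0] * d
    for a, b in zip(samples, labels):
        margin = b * sum(ai * xi for ai, xi in zip(a, x))
        # weight = 1 / (1 + exp(margin)), evaluated without overflow
        if margin >= 0:
            e = math.exp(-margin)
            weight = e / (1.0 + e)
        else:
            weight = 1.0 / (1.0 + math.exp(margin))
        for k in range(d):
            grad[k] -= b * weight * a[k] / m
    return grad


def noisy_gradient(x, samples, labels, sigma2, rng):
    """Exact local gradient plus independent N(0, sigma2) noise per coordinate."""
    std = math.sqrt(sigma2)
    exact = logistic_loss_grad(x, samples, labels)
    return [gk + rng.gauss(0.0, std) for gk in exact]


# Hypothetical toy data: two samples in dimension d = 2 with labels in {-1, 1}.
samples = [[1.0, 0.0], [0.0, 2.0]]
labels = [1, -1]
rng = random.Random(0)
g = noisy_gradient([0.0, 0.0], samples, labels, sigma2=1e-3, rng=rng)
```

Setting `sigma2=0.0` recovers the exact gradient, which is a convenient way to compare the stochastic and deterministic cases.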

For all experiments, we first compute the solution $\mathbf{x}^{\star}$ to (1) by centralized methods, and then run the algorithms over a randomly generated connected network with $n$ agents and $\frac{\iota n(n-1)}{2}$ undirected edges, where $\iota$ is the connectivity ratio. The mixing matrix $\mathbf{W}$ is generated with the Metropolis-Hastings rule. All stochastic results are averaged over 10 runs.
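For readers unfamiliar with the Metropolis-Hastings rule, the following sketch builds a mixing matrix from an undirected edge list. It assumes the common variant $W_{ij}=1/(1+\max\{d_i,d_j\})$ for neighbors and $W_{ii}=1-\sum_{j\neq i}W_{ij}$; the paper does not state which variant it uses, so this choice, the function name, and the example graph are assumptions.

```python
def metropolis_hastings_weights(n, edges):
    """Build a symmetric, doubly stochastic mixing matrix from undirected edges.

    Assumes `edges` has no self-loops and no duplicate pairs.
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(degree[i], degree[j]))
        W[i][j] = w
        W[j][i] = w
    for i in range(n):
        # The diagonal entry is still 0 here, so sum(W[i]) is the neighbor mass.
        W[i][i] = 1.0 - sum(W[i])
    return W


# Hypothetical example: a path graph on 4 nodes.
path_edges = [(0, 1), (1, 2), (2, 3)]
W = metropolis_hastings_weights(4, path_edges)
```

The resulting matrix is symmetric with rows summing to one, so the all-ones vector is an eigenvector with eigenvalue 1.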

Achieving linear speedup by $n$ and $1/p$. We choose the regularizer $r(\mathbf{x})=\frac{1}{2}\|\mathbf{x}\|^{2}$ to demonstrate the results in the convex setting. The results are shown in Fig. 1, where the $y$-axis shows the relative error $\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|/\|\mathbf{x}^{\star}\|$. Here, we set $\alpha=\frac{1}{2L}$, which is independent of the network topology. We show the performance of ProxSkip for different numbers of nodes $n$, network connectivity ratios $\iota$, and communication probabilities $p$. The results show that, as the number of nodes increases, the relative error of ProxSkip decreases under a constant and network-independent stepsize, which validates our results on linear speedup. Moreover, Fig. 1 shows that we can save communication rounds by reducing $p$, i.e., increasing the number of local steps reduces the amount of communication required to achieve the same level of accuracy.

Figure 1: Experimental results for ProxSkip on the logistic regression problem with a strongly convex regularizer $r(\mathbf{x})=\frac{1}{2}\|\mathbf{x}\|^{2}$ over the ijcnn1 dataset.

Comparing with existing decentralized algorithms. We choose the regularizer $r(\mathbf{x})=\sum_{j=1}^{d}\frac{\mathbf{x}(j)^{2}}{1+\mathbf{x}(j)^{2}}$, $n=10$, and $\iota=0.1$ to demonstrate the results in the non-convex setting, where $\mathbf{x}=\mathrm{col}\{\mathbf{x}(j)\}_{j=1}^{d}\in\mathbb{R}^{d}$. In this case, we compare ProxSkip with the decentralized methods local-DSGD [35], $K$-GT [40], and LED [45] for different numbers of local steps $1/p=10,5,1$. We use the same stepsize $\alpha=0.01$ for all algorithms. Fig. 2 shows that ProxSkip and LED perform similarly, and that they outperform the other methods as the number of local steps increases.
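To make the non-convex regularizer concrete, the sketch below evaluates $r$, its gradient, and its per-coordinate second derivative $(2-6v^{2})/(1+v^{2})^{3}$, which is negative whenever $v^{2}>1/3$ and therefore confirms that $r$ is not convex. The function names and test point are hypothetical and chosen only for illustration.

```python
def nonconvex_reg(x):
    """r(x) = sum_j x_j^2 / (1 + x_j^2)."""
    return sum(v * v / (1.0 + v * v) for v in x)


def nonconvex_reg_grad(x):
    """Coordinate-wise derivative: 2 x_j / (1 + x_j^2)^2."""
    return [2.0 * v / (1.0 + v * v) ** 2 for v in x]


def nonconvex_reg_curvature(v):
    """Second derivative of v^2 / (1 + v^2) for a single coordinate v."""
    return (2.0 - 6.0 * v * v) / (1.0 + v * v) ** 3


point = [1.0, 0.0, -2.0]
value = nonconvex_reg(point)
grad = nonconvex_reg_grad(point)
curvatures = [nonconvex_reg_curvature(v) for v in point]
```

Each term of $r$ is bounded in $[0,1)$, so the regularizer stays bounded while still penalizing large coordinates.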

Figure 2: Experimental results for ProxSkip on the logistic regression problem with a non-convex regularizer $r(\mathbf{x})=\sum_{j=1}^{d}\frac{\mathbf{x}(j)^{2}}{1+\mathbf{x}(j)^{2}}$ over the ijcnn1 dataset.

5 Conclusion

This paper revisits the convergence bounds of ProxSkip for stochastic decentralized optimization. We present a new analysis with a novel proof technique applicable to stochastic non-convex, convex, and strongly convex settings. Through this comprehensive analysis, we derive several new results that rival the bounds of state-of-the-art decentralized algorithms [35, 40, 45]. We establish that the leading communication complexity of ProxSkip is $\mathcal{O}(pn^{-1}\sigma^{2})$, indicating that ProxSkip can achieve acceleration by $p$ and $n$. Our proposed proof technique overcomes the analytical limitations of prior work [46, 47, 48, 49, 50, 51] and might be of independent interest in the community.

References

  • [1]
  • [2] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang, and J. Liu, “Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent,” in Proc. Adv. Neural Inf. Process. Sys., pp. 5330–5340, 2017.
  • [3] M. Assran, N. Loizou, N. Ballas, and M. Rabbat, “Stochastic gradient push for distributed deep learning,” in Proc. Int. Conf. Mach. Learn., pp. 344–353, 2019.
  • [4] A. Koloskova, T. Lin, and S. Stich, “An improved analysis of gradient tracking for decentralized machine learning,” in Proc. Adv. Neural Inf. Process. Sys., pp. 11422–11435, 2021.
  • [5] S. A. Alghunaim and K. Yuan, “A unified and refined convergence analysis for non-convex decentralized learning,” IEEE Trans. Signal Process., vol. 70, pp. 3264–3279, 2022.
  • [6] L. Guo, X. Shi, S. Yang, and J. Cao, “DISA: A Dual inexact splitting algorithm for distributed convex composite optimization,” IEEE Trans. Autom. Control, doi: 10.1109/TAC.2023.3301289, 2023.
  • [7] D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, “QSGD: Communication-efficient SGD via gradient quantization and encoding,” in Proc. Adv. Neural Inf. Process. Sys., 2017.
  • [8] J. Bernstein, Y.-X. Wang, K. Azizzadenesheli, and A. Anandkumar, “signSGD: Compressed optimisation for non-convex problems,” in Proc. Int. Conf. Mach. Learn., pp. 560–569, 2018.
  • [9] J. Wangni, J. Wang, J. Liu, and T. Zhang, “Gradient sparsification for communication-efficient distributed optimization,” in Proc. Adv. Neural Inf. Process. Sys., 2018.
  • [10] A. Koloskova, S. Stich, and M. Jaggi, “Decentralized stochastic optimization and gossip algorithms with compressed communication,” in Proc. Int. Conf. Mach. Learn., pp. 3478–3487, 2019.
  • [11] S. P. Karimireddy, Q. Rebjock, S. Stich, and M. Jaggi, “Error feedback fixes signSGD and other gradient compression schemes,” in Proc. Int. Conf. Mach. Learn., pp. 3252–3261, 2019.
  • [12] I. Fatkhullin, A. Tyurin, and P. Richtárik, “Momentum provably improves error feedback!,” in Proc. Adv. Neural Inf. Process. Sys., 2023.
  • [13] A. Tyurin and P. Richtárik, “2Direction: Theoretically faster distributed training with bidirectional communication compression,” in Proc. Adv. Neural Inf. Process. Sys., 2023.
  • [14] A. Sadiev, D. Kovalev, and P. Richtárik, “Communication acceleration of local gradient methods via an accelerated primal-dual algorithm with inexact Prox,” in Proc. Adv. Neural Inf. Process. Sys., pp. 21777–21791, 2022.
  • [15] D. Kovalev, A. Salim, and P. Richtárik, “Optimal and practical algorithms for smooth and strongly convex decentralized optimization,” in Proc. Adv. Neural Inf. Process. Sys., pp. 18342–18352, 2020.
  • [16] H. Li, C. Fang, W. Yin and Z. Lin, “Decentralized accelerated gradient methods with increasing penalty parameters,” IEEE Trans. Signal Process., vol. 68, pp. 4855–4870, 2020.
  • [17] H. Li, Z. Lin, and Y. Fang, “Variance reduced EXTRA and DIGing and their optimal acceleration for strongly convex decentralized optimization,” J. Mach. Learn. Res., vol. 23, 2022.
  • [18] H. Hendrikx, F. Bach, and L. Massoulié, “An optimal algorithm for decentralized finite-sum optimization,” SIAM J. Optim., vol. 31, no. 4, pp. 2753–2783, 2021.
  • [19] Z. Song, L. Shi, S. Pu, and M. Yan, “Optimal gradient tracking for decentralized optimization,” Math. Program., 2023. doi: 10.1007/s10107-023-01997-7.
  • [20] T. Lin, S. Stich, K. K. Patel, and M. Jaggi, “Don’t use large mini-batches, use local SGD,” in Proc. Int. Conf. Learn. Represent., 2018, arXiv:1808.07217. [Online]. Available: https://arxiv.longhoe.net/abs/1808.07217.
  • [21] B. Woodworth, K. K. Patel, S. Stich, Z. Dai, B. Bullins, H. B. McMahan, O. Shamir, and N. Srebro, “Is Local SGD Better than Minibatch SGD?,” in Proc. Int. Conf. Mach. Learn., pp. 10334–10343, 2020.
  • [22] J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A.T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” 2016, arXiv:1610.05492. [Online]. Available: https://arxiv.longhoe.net/abs/1610.05492.
  • [23] S. Stich, “Local SGD converges fast and communicates little,” in Proc. Int. Conf. Learn. Represent., 2018, arXiv:1805.09767. https://arxiv.longhoe.net/abs/1805.09767.
  • [24] H. Yang, M. Fang, and J. Liu, “Achieving linear speedup with partial worker participation in non-IID federated learning,” in Proc. Int. Conf. Learn. Represent., 2021.
  • [25] A. Khaled, K. Mishchenko, and P. Richtárik, “Tighter theory for local SGD on identical and heterogeneous data,” in Proc. Int. Conf. Artif. Intell. Statist., pp. 4519–4529, 2020.
  • [26] J. Wang and G. Joshi, “Cooperative SGD: A unified framework for the design and analysis of local update SGD algorithms,” J. Mach. Learn. Res., vol. 22, no. 213, 2021.
  • [27] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic controlled averaging for federated learning,” in Proc. Int. Conf. Mach. Learn., pp. 5132–5143, 2020.
  • [28] Z. Cheng, X. Huang, and K. Yuan, “Momentum benefits non-IID federated learning simply and provably,” in Proc. Int. Conf. Learn. Represent., 2024.
  • [29] A. Mitra, R. Jaafar, G. J. Pappas, and H. Hassani, “Linear convergence in federated learning: Tackling client heterogeneity and sparse gradients,” in Proc. Adv. Neural Inf. Process. Sys., pp. 14606–14619, 2021.
  • [30] X. Zhang, M. Hong, S. Dhople, W. Yin, and Y. Liu, “FedPD: A federated learning framework with adaptivity to Non-IID data,” IEEE Trans. Signal Process., vol. 69, pp. 6055–6070, 2021.
  • [31] A. E. Durmus, Z. Yue, M. Ramon, M. Matthew, W. Paul, and S. Venkatesh, “Federated learning based on dynamic regularization,” in Proc. Int. Conf. Learn. Represent., 2021.
  • [32] X. Liang, S. Shen, J. Liu, Z. Pan, E. Chen, and Y. Cheng, “Variance reduced local SGD with lower communication complexity,” 2019, arXiv:1912.12844. https://arxiv.longhoe.net/abs/1912.12844.
  • [33] F. Haddadpour, M. M. Kamani, A. Mokhtari, and M. Mahdavi, “Federated learning with compression: Unified analysis and sharp guarantees,” in Proc. Int. Conf. Artif. Intell. Statist., pp. 2350–2358, 2021.
  • [34] X. Huang, P. Li, and X. Li, “Stochastic controlled averaging for federated learning with communication compression,” in Proc. Int. Conf. Learn. Represent., 2024.
  • [35] A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. Stich, “A unified theory of decentralized SGD with changing topology and local updates,” in Proc. Int. Conf. Mach. Learn., pp. 5381–5393, 2020.
  • [36] S. Pu and A. Nedić, “Distributed stochastic gradient tracking methods,” Math. Program., vol. 187, pp. 409–457, 2021.
  • [37] A. Nedić, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM J. Optim., vol. 27, no. 4, pp. 2597–2633, 2017.
  • [38] G. Qu and N. Li, “Harnessing smoothness to accelerate distributed optimization,” IEEE Trans. Control Netw. Syst., vol. 5, no. 3, pp. 1245–1260, Sep. 2018.
  • [39] E. D. H. Nguyen, S. A. Alghunaim, K. Yuan, and C. A. Uribe, “On the performance of gradient tracking with local updates,” 2022, arXiv:2210.04757, [Online]. Available: https://arxiv.longhoe.net/abs/2210.04757.
  • [40] Y. Liu, T. Lin, A. Koloskova, and S. U. Stich, “Decentralized gradient tracking with local steps,” 2023, arXiv:2301.01313, [Online]. Available: https://arxiv.longhoe.net/abs/2301.01313.
  • [41] K. Yuan, B. Ying, X. Zhao, and A. H. Sayed, “Exact diffusion for distributed optimization and learning-part I: Algorithm development,” IEEE Trans. Signal Process., vol. 67, no. 3, pp. 708–723, Feb. 2019.
  • [42] Z. Li, W. Shi, and M. Yan, “A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates,” IEEE Trans. Signal Process., vol. 67, no. 17, pp. 4494–4506, Sep. 2019.
  • [43] H. Tang, X. Lian, M. Yan, C. Zhang, and J. Liu, “D2: Decentralized training over decentralized data,” in Proc. Int. Conf. Mach. Learn., pp. 4848–4856, 2018.
  • [44] L. Guo, X. Shi, J. Cao, and Z. Wang, “Decentralized inexact proximal gradient method with network-independent stepsizes for convex composite optimization,” IEEE Trans. Signal Process., vol. 71, pp. 786–801, 2023.
  • [45] S. A. Alghunaim, “Local exact-diffusion for decentralized optimization and learning,” 2023, arXiv:2302.00620, [Online]. Available: https://arxiv.longhoe.net/abs/2302.00620.
  • [46] K. Mishchenko, G. Malinovsky, S. Stich, and P. Richtárik, “ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!,” in Proc. Int. Conf. Mach. Learn., pp. 15750–15769, 2022.
  • [47] L. Condat, I. Agarský, G. Malinovsky, and P. Richtárik, “TAMUNA: Doubly accelerated federated learning with local training, compression, and partial participation,” 2023, arXiv:2302.09832, [Online]. Available: https://arxiv.longhoe.net/abs/2302.09832.
  • [48] L. Condat, I. Agarský, and P. Richtárik, “Provably doubly accelerated federated learning: The first theoretically successful combination of local training and communication compression,” 2023, arXiv:2210.13277, [Online]. Available: https://arxiv.longhoe.net/abs/2210.13277.
  • [49] G. Malinovsky, K. Yi, and P. Richtárik, “Variance reduced ProxSkip: Algorithm, theory and application to federated learning,” in Proc. Adv. Neural Inf. Process. Sys., pp. 15176–15189, 2022.
  • [50] Z. Hu and H. Huang, “Tighter analysis for ProxSkip,” in Proc. Int. Conf. Mach. Learn., pp. 13469–13496, 2023.
  • [51] L. Condat and P. Richtárik, “RandProx: Primal-dual optimization algorithms with randomized proximal updates,” in Proc. Int. Conf. Learn. Represent., 2023.
  • [52] K. Scaman, F. Bach, S. Bubeck, Y.-T. Lee, and L. Massoulié, “Optimal algorithms for smooth and strongly convex distributed optimization in networks,” in Proc. Int. Conf. Mach. Learn., pp. 3027–3036, 2017.
  • [53] H. Yuan, S. A. Alghunaim, and K. Yuan, “Achieving linear speedup with network-independent learning rates in decentralized stochastic optimization,” in Proc. IEEE Conf. Decis. Control, pp. 139–144, 2023.
  • [54] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, 2011, Art. no. 27.

Appendix

Appendix A Preliminaries

A.1 Basic Facts

The stochastic processes of randomized communication and gradient estimation generate two sequences of $\sigma$-algebras. We denote by $\mathcal{G}^{t}$ the $\sigma$-algebra of gradient estimation at the $t$-th iteration and by $\mathcal{F}^{t}$ the $\sigma$-algebra of randomized communication at the same step. The sequences $\{\mathcal{G}^{t}\}_{t\geq 0}$ and $\{\mathcal{F}^{t}\}_{t\geq 0}$ satisfy

$$\mathcal{G}^{0}\subset\mathcal{F}^{0}\subset\mathcal{G}^{1}\subset\mathcal{F}^{1}\subset\mathcal{G}^{2}\subset\mathcal{F}^{2}\subset\cdots\subset\mathcal{G}^{t}\subset\mathcal{F}^{t}\subset\cdots.$$

With this notation, we can clarify the stochastic dependencies among the variables generated by Algorithm 1, i.e., $(\mathbf{G}^{t},\hat{\mathbf{Z}}^{t})$ is measurable in $\mathcal{G}^{t}$ and $(\mathbf{Y}^{t+1},\mathbf{X}^{t+1})$ is measurable in $\mathcal{F}^{t}$.

The Bregman divergence of $f$ at points $(x,y)$ is defined by

$$D_{f}(x,y):=f(x)-f(y)-\langle\nabla f(y),x-y\rangle.$$

It is easy to verify that $\langle\nabla f(x)-\nabla f(y),x-y\rangle=D_{f}(x,y)+D_{f}(y,x)$. If $f$ is convex, then by the definition of a convex function we have $D_{f}(x,y)\geq 0$ and $D_{f}(y,x)\geq 0$. Thus

$$\langle\nabla f(x)-\nabla f(y),x-y\rangle\geq D_{f}(x,y),\text{ and }\langle\nabla f(x)-\nabla f(y),x-y\rangle\geq D_{f}(y,x). \tag{15}$$
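The identity and the two inequalities in (15) can be checked numerically on any smooth convex function. The sketch below uses $f(x)=\sum_{i}\ln(1+e^{x_{i}})$ as a test function; this choice, the helper names, and the test points are assumptions made purely for illustration and do not appear in the paper.

```python
import math


def softplus_sum(x):
    """f(x) = sum_i ln(1 + exp(x_i)), a smooth convex test function."""
    return sum(math.log1p(math.exp(v)) for v in x)


def softplus_sum_grad(x):
    """Gradient of f: the logistic sigmoid applied coordinate-wise."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def bregman(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return f(x) - f(y) - inner(grad_f(y), diff)


x, y = [0.5, -1.0, 2.0], [-0.3, 0.8, 1.1]
d_xy = bregman(softplus_sum, softplus_sum_grad, x, y)
d_yx = bregman(softplus_sum, softplus_sum_grad, y, x)
gx, gy = softplus_sum_grad(x), softplus_sum_grad(y)
lhs = inner([a - b for a, b in zip(gx, gy)], [a - b for a, b in zip(x, y)])
```

Here `lhs` equals `d_xy + d_yx` up to rounding, and since both divergences are nonnegative, `lhs` dominates each of them, which is exactly (15).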

For an $L$-smooth and $\mu$-strongly convex function $f$, by [46, Appendix A] we have

\begin{align}
\frac{\mu}{2}\|x-y\|^{2}\leq{}&D_{f}(x,y)\leq\frac{L}{2}\|x-y\|^{2}, \tag{16}\\
\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}\leq{}&D_{f}(x,y)\leq\frac{1}{2\mu}\|\nabla f(x)-\nabla f(y)\|^{2}. \tag{17}
\end{align}

Under the $L$-smoothness condition, we have

$$f(y)\leq f(x)+\langle\nabla f(x),y-x\rangle+\frac{L}{2}\|x-y\|^{2},\ \forall x,y\in\mathbb{R}^{d}. \tag{18}$$

A.2 Notations

For any $n\times m$ matrices $\mathbf{a}$ and $\mathbf{b}$, their inner product is denoted as $\langle\mathbf{a},\mathbf{b}\rangle=\mathrm{Trace}(\mathbf{a}^{\mathsf{T}}\mathbf{b})$. For a given matrix $\mathbf{a}$, the Frobenius norm is given by $\|\mathbf{a}\|_{\mathrm{F}}$, while the spectral norm is given by $\|\mathbf{a}\|$. Define the gradient and communication noise as

\begin{align*}
\text{gradient noise: }&\mathbf{S}^{t}=[\mathbf{s}^{t}_{1},\ldots,\mathbf{s}^{t}_{n}]^{\mathsf{T}}=\mathbf{G}^{t}-\nabla F(\mathbf{X}^{t}),\text{ where }\mathbf{s}^{t}_{i}=\mathbf{g}_{i}^{t}-\nabla f_{i}(\mathbf{x}_{i}^{t});\\
\text{communication noise: }&\mathbf{E}^{t}=\frac{\theta_{t}-1}{2\chi}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}.
\end{align*}

We also define the following notation to simplify the analysis:

$$\bar{\mathbf{x}}^{t}\triangleq\Big(\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}^{t}\Big)^{\mathsf{T}},\quad\bar{\mathbf{X}}^{t}=\mathbf{1}\otimes\bar{\mathbf{x}}^{t},\quad\bar{\mathbf{s}}^{t}\triangleq\Big(\frac{1}{n}\sum_{i=1}^{n}\mathbf{s}_{i}^{t}\Big)^{\mathsf{T}},\quad\overline{\nabla F}(\mathbf{X}^{t})\triangleq\Big(\frac{1}{n}\sum_{i=1}^{n}\nabla f_{i}(\mathbf{x}^{t}_{i})\Big)^{\mathsf{T}}.$$

With Assumption 1 and [5, Section IV-B], the mixing matrix $\mathbf{W}$ can be decomposed as

$$
\mathbf{W}=\mathbf{P}\mathbf{\Lambda}\mathbf{P}^{-1}=\left[\begin{array}{cc}\mathbf{1}&\hat{\mathbf{P}}\end{array}\right]\left[\begin{array}{cc}\mathbf{I}&0\\ 0&\hat{\mathbf{\Lambda}}\end{array}\right]\left[\begin{array}{c}\frac{1}{n}\mathbf{1}^{\sf T}\\ \hat{\mathbf{P}}^{\sf T}\end{array}\right],
$$

where $\hat{\mathbf{\Lambda}}=\mathrm{diag}\{\lambda_{2},\ldots,\lambda_{n}\}$, and the matrix $\hat{\mathbf{P}}\in\mathbb{R}^{n\times(n-1)}$ satisfies

$$
\hat{\mathbf{P}}^{\sf T}\hat{\mathbf{P}}=\mathbf{I},\quad \mathbf{1}^{\sf T}\hat{\mathbf{P}}=0,\quad \hat{\mathbf{P}}\hat{\mathbf{P}}^{\sf T}=\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}^{\sf T}.
$$

Therefore, the matrices $\mathbf{W}_{a}$ and $\mathbf{W}_{b}$ can be decomposed as

$$
\mathbf{W}_{a}=\left[\begin{array}{cc}\mathbf{1}&\hat{\mathbf{P}}\end{array}\right]\underbrace{\left[\begin{array}{cc}\mathbf{1}&0\\ 0&\hat{\mathbf{\Lambda}}_{a}\end{array}\right]}_{:=\mathbf{\Lambda}_{a}}\left[\begin{array}{c}\frac{1}{n}\mathbf{1}^{\sf T}\\ \hat{\mathbf{P}}^{\sf T}\end{array}\right],\quad
\mathbf{W}_{b}^{2}=\left[\begin{array}{cc}\mathbf{1}&\hat{\mathbf{P}}\end{array}\right]\underbrace{\left[\begin{array}{cc}0&0\\ 0&\hat{\mathbf{\Lambda}}_{b}^{2}\end{array}\right]}_{:=\mathbf{\Lambda}_{b}^{2}}\left[\begin{array}{c}\frac{1}{n}\mathbf{1}^{\sf T}\\ \hat{\mathbf{P}}^{\sf T}\end{array}\right], \tag{29}
$$

where $\hat{\mathbf{\Lambda}}_{a}=\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\hat{\mathbf{\Lambda}})$ and $\hat{\mathbf{\Lambda}}_{b}=\sqrt{\mathbf{I}-\hat{\mathbf{\Lambda}}}$. Since $\lambda_{i}\in(-1,1)$ for $i=2,\dots,n$, it holds that $1-\frac{1}{2\chi}(1-\lambda_{i})\in[0,1)$ and $0\preceq\mathbf{W}_{a}\prec\mathbf{I}$ for $\chi\geq 1$.
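The identities above can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes a ring graph with uniform weights 1/3, and it assumes the forms $\mathbf{W}_{a}=\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\mathbf{W})$ and $\mathbf{W}_{b}^{2}=\mathbf{I}-\mathbf{W}$, which are read off from the spectra in (29). The helper names `ring_mixing_matrix` and `decomposition_report` are hypothetical.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic ring matrix (assumed example, n >= 3):
    weight 1/3 on each node itself and on its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def decomposition_report(n=8, chi=2.0):
    W = ring_mixing_matrix(n)
    I = np.eye(n)
    ones = np.ones((n, 1))

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors,
    # so the last eigenpair is (1, +-ones/sqrt(n)) for a connected graph.
    lam, V = np.linalg.eigh(W)
    P_hat = V[:, :-1]    # n x (n-1) orthonormal basis of the complement of span{1}
    lam_hat = lam[:-1]   # lambda_2, ..., lambda_n (ascending order here)

    # Assumed forms of W_a and W_b^2, inferred from (29).
    W_a = I - (I - W) / (2.0 * chi)
    W_b_sq = I - W
    lam_hat_a = 1.0 - (1.0 - lam_hat) / (2.0 * chi)

    return {
        "PtP_is_identity": np.allclose(P_hat.T @ P_hat, np.eye(n - 1)),
        "ones_orthogonal_to_P": np.allclose(ones.T @ P_hat, 0.0),
        "PPt_is_projector": np.allclose(P_hat @ P_hat.T, I - ones @ ones.T / n),
        "W_a_block_matches": np.allclose(P_hat.T @ W_a @ P_hat, np.diag(lam_hat_a)),
        "W_b_sq_block_matches": np.allclose(P_hat.T @ W_b_sq @ P_hat, np.diag(1.0 - lam_hat)),
        "W_b_sq_annihilates_ones": np.allclose(W_b_sq @ ones, 0.0),
        "lam_hat_a_in_[0,1)": bool(np.all((lam_hat_a >= 0.0) & (lam_hat_a < 1.0))),
    }

report = decomposition_report()
```

Each entry of `report` corresponds to one of the stated properties; other graphs can be tested by swapping in a different symmetric doubly stochastic matrix.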

Appendix B Proof of Theorem 2 and Corollary 1

B.1 Transformation and Some Descent Inequalities

Here, we introduce an auxiliary variable $\mathbf{R}^{t}=\mathbf{Y}^{t}+\alpha\nabla F(\bar{\mathbf{X}}^{t})$, where $\bar{\mathbf{X}}^{t}=\mathbf{1}\otimes\bar{\mathbf{x}}^{t}$. It follows from (4b) and (4c) that, when $\beta=1$ and $p=1$, $\mathbf{Y}^{t+1}=\mathbf{Y}^{t}+\frac{1}{2\chi}\mathbf{W}_{b}^{2}\hat{\mathbf{Z}}^{t}$. For any fixed point $(\mathbf{X},\mathbf{Y})$ of update (4), it holds that $\hat{\mathbf{Z}}=\mathbf{X}$, $\mathbf{Y}+\alpha\nabla F(\mathbf{X})=0$, and $\mathbf{W}_{b}\mathbf{X}=0$. Thus, $\mathbf{R}=0$ implies that $\frac{1}{n}\sum_{i=1}^{n}\nabla f_{i}(\mathbf{x})=0$, i.e., $\mathbf{x}$ is a stationary point of problem (1). Using this new variable, we give the following error dynamics of Algorithm 1.

Lemma 1.

Suppose that Assumption 1 holds. There exist an invertible matrix $\mathbf{Q}$ and a diagonal matrix $\mathbf{\Gamma}$ such that

\begin{align}
\bar{\mathbf{x}}^{t+1}&=\bar{\mathbf{x}}^{t}-\alpha\overline{\nabla F}(\mathbf{X}^{t})-\alpha\bar{\mathbf{s}}^{t}, \tag{30a}\\
\mathcal{E}^{t+1}&=\underbrace{\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\sf T}\mathbf{\Sigma}_{1}^{t}\\ \frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}\hat{\mathbf{P}}^{\sf T}\mathbf{\Sigma}_{1}^{t}+\hat{\mathbf{P}}^{\sf T}\mathbf{\Sigma}_{2}^{t}\end{array}\right]}_{:=\mathbb{G}^{t}}+\underbrace{\mathbf{Q}^{-1}\left[\begin{array}{c}-\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\sf T}\mathbf{E}^{t}\\ \hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\sf T}\mathbf{E}^{t}\end{array}\right]}_{:=\mathbb{F}^{t}}, \tag{30f}
\end{align}

where $\gamma\triangleq\|\mathbf{\Gamma}\|=\sqrt{1-\frac{1}{2\chi}(1-\lambda_{2})}<1$,

$$
\mathcal{E}^{t}\triangleq\mathbf{Q}^{-1}\left[\begin{array}{c}\hat{\mathbf{P}}^{\sf T}\mathbf{X}^{t}\\ \hat{\mathbf{P}}^{\sf T}\mathbf{R}^{t}\end{array}\right],\quad
\left\{\begin{array}{l}\mathbf{\Sigma}_{1}^{t}=\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})+\mathbf{S}^{t},\\ \mathbf{\Sigma}_{2}^{t}=\nabla F(\bar{\mathbf{X}}^{t})-\nabla F(\bar{\mathbf{X}}^{t+1}).\end{array}\right.
$$

Moreover, we have

$$
\|\mathbf{Q}\|^{2}\leq 2\quad\text{and}\quad\|\mathbf{Q}^{-1}\|^{2}\leq\frac{2\chi}{(1+\lambda_{n})(1-\lambda_{2})}.
$$

In addition, we have

$$
\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}. \tag{31}
$$
Proof.

See Appendix D. ∎

Based on Lemma 1, we give the following descent inequalities.

Lemma 2.

Suppose that Assumptions 1, 2, and 4 hold. If $\alpha\leq\frac{1}{2L}$, it holds that

\begin{align}
\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t+1})\;|\;\mathcal{G}^{t}\right]\leq{}& f(\bar{\mathbf{x}}^{t})-\frac{\alpha}{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{2\alpha L^{2}}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}, \tag{32}\\
\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq{}& \tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{4n\alpha^{4}L^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2} \nonumber\\
&+\frac{2\alpha^{4}L^{2}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2n\alpha^{2}\sigma^{2}(2\chi^{2}+(1-p))}{\chi^{2}}, \tag{33}
\end{align}

where

$$
\tilde{\gamma}=\gamma+\frac{32\alpha^{2}L^{2}+16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2(1-p)\big(3+\frac{24\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\big)}{\chi^{2}}. \tag{34}
$$

Moreover, if $f_{i}$ is $\mu$-convex ($\mu\geq 0$) and $\alpha\leq\frac{1}{4L}$, it holds that

\begin{align}
\mathbb{E}\!\left[\|\bar{\mathbf{x}}^{t+1}-\mathbf{x}^{\star}\|^{2}\;|\;\mathcal{G}^{t}\right]\leq{}&(1-\mu\alpha)\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}+\frac{6\alpha L}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2} \nonumber\\
&+\frac{\alpha^{2}\sigma^{2}}{n}-\alpha\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big), \tag{35}
\end{align}

where $\mathbf{x}^{\star}$ solves problem (1).

Proof.

See Appendix E. ∎

B.2 Convergence Analysis: Non-convex

With Lemma 1 and Lemma 2, we further have the following theorem.

Theorem 4.

Suppose that Assumptions 1, 2, and 4 hold. If $\beta=1$, and $\alpha$ and $\chi$ satisfy $\chi\geq\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$ and

$$
0<\alpha\leq\min\left\{\frac{1}{2L},\ \frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\frac{1}{2L},\ \sqrt[4]{\frac{(1-\lambda_{2})^{3}}{12\chi^{3}}}\frac{1}{4L}\right\}, \tag{36}
$$

it holds that $\tilde{\gamma}\leq\frac{1+\gamma}{2}<1$ and

\begin{align}
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]\leq{}&\frac{4(f(\bar{\mathbf{x}}^{0})-f^{\star})}{\alpha T}+\frac{128\chi^{2}L^{2}\alpha^{2}\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}T}+\frac{2L\alpha\sigma^{2}}{n} \nonumber\\
&+\frac{\alpha^{2}L^{2}\sigma^{2}\big(\chi^{3}+256\chi(2\chi^{2}+(1-p))\big)}{2(1-\lambda_{2})\chi^{2}}, \tag{37}
\end{align}

where $\varsigma^{2}_{0}=\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_{i}(\bar{\mathbf{x}}^{0})-\nabla f(\bar{\mathbf{x}}^{0})\|^{2}$.

Proof.

See Appendix F. ∎
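As a sanity check on how the conditions on $\chi$ and $\alpha$ interact, the following sketch evaluates the lower bound on $\chi$, the stepsize bound (36), $\gamma$ from Lemma 1, and $\tilde{\gamma}$ from (34), and confirms $\tilde{\gamma}\leq\frac{1+\gamma}{2}$ for one parameter choice. The function names and the numerical values of $L$, $p$, $\lambda_{2}$, and $\lambda_{n}$ are assumptions made for illustration; they do not come from the paper's experiments.

```python
import math

def chi_min(p, lam2):
    """Smallest chi allowed by the theorem: max{288(1-p)/(1-lam2), 1}."""
    return max(288.0 * (1.0 - p) / (1.0 - lam2), 1.0)

def stepsize_bound(L, chi, lam2, lamn):
    """Right-hand side of the stepsize condition (36)."""
    gap = 1.0 - lam2
    return min(
        1.0 / (2.0 * L),
        gap / (32.0 * math.sqrt(3.0) * chi * L),
        math.sqrt((1.0 + lamn) * gap / (2.0 * chi)) / (2.0 * L),
        (gap ** 3 / (12.0 * chi ** 3)) ** 0.25 / (4.0 * L),
    )

def contraction_gamma(chi, lam2):
    """gamma = sqrt(1 - (1 - lam2) / (2 chi)) from Lemma 1."""
    return math.sqrt(1.0 - (1.0 - lam2) / (2.0 * chi))

def gamma_tilde(alpha, L, chi, p, lam2, lamn):
    """Perturbed contraction factor, equation (34)."""
    g = contraction_gamma(chi, lam2)
    gap = 1.0 - lam2
    a2L2 = (alpha * L) ** 2
    grad_part = (32.0 * a2L2 + 16.0 * a2L2 ** 2 * (2.0 * chi / gap)) / (1.0 - g)
    skip_part = 2.0 * (1.0 - p) * (3.0 + 24.0 * chi * a2L2 / ((1.0 + lamn) * gap)) / chi ** 2
    return g + grad_part + skip_part

# Illustrative (assumed) problem constants.
L, p, lam2, lamn = 1.0, 0.2, 0.8, -0.3
chi = chi_min(p, lam2)
alpha = stepsize_bound(L, chi, lam2, lamn)
g = contraction_gamma(chi, lam2)
g_tilde = gamma_tilde(alpha, L, chi, p, lam2, lamn)
contraction_ok = g_tilde <= (1.0 + g) / 2.0 < 1.0
```

With these values the second entry of (36) is the binding one, which is consistent with the network-dependent stepsize $\alpha=\mathcal{O}\big(\frac{1-\lambda_{2}}{\chi L}\big)$ used in the subsequent corollary.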

Based on Theorem 2, we can even get a tighter rate by carefully selecting the stepsize similar to [35, Lemma 17], [40, Lemma C.13], or [45, Corollary 1].

Corollary 2.

Suppose that Assumptions 1, 2, and 4 hold. If $\beta=1$ and $\chi$ satisfies $\chi\geq\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$, then there exists a constant $\alpha=\mathcal{O}\left(\frac{1-\lambda_{2}}{\chi L}\right)$ such that

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}]\leq\mathcal{O}\left(\left(\frac{\sigma^{2}}{nT}\right)^{\frac{1}{2}}+\left(\frac{\chi\sigma^{2}}{(1-\lambda_{2})T^{2}}\right)^{\frac{1}{3}}+\frac{\chi}{(1-\lambda_{2})T}\right). \tag{38}
$$
Proof.

See Appendix G. ∎
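For readers who want the intermediate step, a rate of the form (38) converts into an iteration count by requiring each of its three terms to be at most $\epsilon$. The short derivation below is a sketch that ignores absolute constants and treats $L$ and the initial optimality gap as constants, as the $\mathcal{O}$-notation in (38) does.

```latex
\begin{align*}
\left(\frac{\sigma^{2}}{nT}\right)^{\frac{1}{2}} \le \epsilon
  &\iff T \ge \frac{\sigma^{2}}{n\epsilon^{2}}, \\
\left(\frac{\chi\sigma^{2}}{(1-\lambda_{2})T^{2}}\right)^{\frac{1}{3}} \le \epsilon
  &\iff T \ge \sqrt{\frac{\chi}{1-\lambda_{2}}}\,\frac{\sigma}{\epsilon^{3/2}}, \\
\frac{\chi}{(1-\lambda_{2})T} \le \epsilon
  &\iff T \ge \frac{\chi}{(1-\lambda_{2})\epsilon}.
\end{align*}
% Summing the three thresholds gives the iteration complexity
\[
T = \mathcal{O}\left(\frac{\sigma^{2}}{n\epsilon^{2}}
  + \sqrt{\frac{\chi}{1-\lambda_{2}}}\,\frac{\sigma}{\epsilon^{3/2}}
  + \frac{\chi}{(1-\lambda_{2})\epsilon}\right).
\]
```

Multiplying this count by the communication probability $p$ gives the expected number of communication rounds; the dependence on $\chi$ is what separates the two regimes of $p$ relative to $\lambda_{2}$.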

When p<λ2𝑝subscript𝜆2p<\lambda_{2}italic_p < italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, we have max{288(1p)1λ2,1}=𝒪(11λ2)2881𝑝1subscript𝜆21𝒪11subscript𝜆2\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}=\mathcal{O}\left(\frac{1}{% 1-\lambda_{2}}\right)roman_max { divide start_ARG 288 ( 1 - italic_p ) end_ARG start_ARG 1 - italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG , 1 } = caligraphic_O ( divide start_ARG 1 end_ARG start_ARG 1 - italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ). Choosing χ=max{288(1p)1λ2,1}𝜒2881𝑝1subscript𝜆21\chi=\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}italic_χ = roman_max { divide start_ARG 288 ( 1 - italic_p ) end_ARG start_ARG 1 - italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG , 1 }. Since in each iteration we trigger communication with probability p𝑝pitalic_p, for any desired accuracy ϵ>0italic-ϵ0\epsilon>0italic_ϵ > 0, the expected number of communication rounds required to achieve 1Tt=0T1𝔼[f(𝐱¯t)2]ϵ1𝑇superscriptsubscript𝑡0𝑇1𝔼delimited-[]superscriptnorm𝑓superscript¯𝐱𝑡2italic-ϵ\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[\|\nabla f(\bar{{\bf{x}}}^{t})\|^{2}]\leq\epsilondivide start_ARG 1 end_ARG start_ARG italic_T end_ARG ∑ start_POSTSUBSCRIPT italic_t = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T - 1 end_POSTSUPERSCRIPT blackboard_E [ ∥ ∇ italic_f ( over¯ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ≤ italic_ϵ is bounded by:

$$p<\lambda_{2}:\ p\times\text{(iteration complexity)}=\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p}{1-\lambda_{2}}\frac{\sigma}{\epsilon^{3/2}}+\frac{1}{(1-\lambda_{2})^{2}}\frac{1}{\epsilon}\right).$$

When $p\geq\lambda_{2}$, we have $\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}=\mathcal{O}(1)$. If we choose $\chi=\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$, then for any desired accuracy $\epsilon>0$, the expected communication complexity is bounded by

$$p\geq\lambda_{2}:\ p\times\text{(iteration complexity)}=\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p}{\sqrt{1-\lambda_{2}}}\frac{\sigma}{\epsilon^{3/2}}+\frac{1}{1-\lambda_{2}}\frac{1}{\epsilon}\right).$$
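The two bounds above follow from requiring each of the three terms in the rate (38) to be at most $\epsilon$ and multiplying the resulting iteration count by $p$. The following Python sketch makes that arithmetic explicit. It is only an illustration: the function names are hypothetical, every constant hidden in the $\mathcal{O}(\cdot)$ notation is set to 1, and the last term is kept as $p\chi/((1-\lambda_2)\epsilon)$ rather than being upper bounded using $p\leq 1$.

```python
def chi_lower_bound(p, lam2):
    """Smallest chi allowed by the corollaries: max{288(1-p)/(1-lam2), 1}."""
    return max(288.0 * (1.0 - p) / (1.0 - lam2), 1.0)


def comm_rounds_nonconvex(p, lam2, sigma, n, eps):
    """Expected communication rounds p*T, with T read off term by term from (38).

    Assumption: all constants hidden in O(.) are taken to be 1.
    """
    chi = chi_lower_bound(p, lam2)
    gap = 1.0 - lam2
    # (sigma^2 / (n T))^{1/2} <= eps
    t_noise = sigma**2 / (n * eps**2)
    # (chi sigma^2 / ((1 - lam2) T^2))^{1/3} <= eps
    t_mixed = (chi / gap) ** 0.5 * sigma / eps**1.5
    # chi / ((1 - lam2) T) <= eps
    t_network = chi / (gap * eps)
    return {
        "chi": chi,
        "noise": p * t_noise,      # leading term, scales as 1/n (linear speedup)
        "mixed": p * t_mixed,
        "network": p * t_network,
    }


if __name__ == "__main__":
    # Sparse communication (p < lam2) versus frequent communication (p >= lam2).
    for p in (0.1, 0.95):
        print(p, comm_rounds_nonconvex(p=p, lam2=0.9, sigma=1.0, n=8, eps=1e-2))
```

Only the `noise` entry depends on $n$, which is the sense in which the leading term exhibits linear speedup; the other two entries depend on the network through $1-\lambda_2$ and $\chi$.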

B.3 Convergence Analysis: Convex

By Lemma 1 and Lemma 2, we can also deduce the following theorem.

Theorem 5.

Suppose that Assumptions 1, 2, and 4 hold. Under the additional Assumption 3 with $\mu\geq 0$, if $\beta=1$, and $\alpha$ and $\chi$ satisfy $\chi\geq\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$ and

$$0<\alpha\leq\min\left\{\frac{1}{2L},\ \frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\frac{1}{2L},\ \sqrt[4]{\frac{(1-\lambda_{2})^{3}}{24\chi^{3}}}\frac{1}{4L}\right\}, \tag{39}$$

it holds that

$$\begin{aligned}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq{}&\frac{2\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}}{\alpha T}+\frac{192\chi^{2}\alpha^{2}L\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}T}\\&+\frac{2\alpha\sigma^{2}}{n}+\frac{L\alpha^{2}\sigma^{2}\big(\chi^{3}+384\chi(2\chi^{2}+(1-p))\big)}{2(1-\lambda_{2})\chi^{2}}.\end{aligned} \tag{40}$$
Proof.

See Appendix H. ∎

Similarly to the analysis of the non-convex setting, Theorem 5 yields the following result.

Corollary 3.

Suppose that Assumptions 1, 2, and 4 hold. Under the additional Assumption 3 with $\mu\geq 0$, if $\beta=1$ and $\chi\geq\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$, there exists a constant $\alpha=\mathcal{O}\left(\frac{1-\lambda_{2}}{\chi L}\right)$ such that

$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\mathcal{O}\left(\left(\frac{\sigma^{2}}{nT}\right)^{\frac{1}{2}}+\left(\frac{\chi\sigma^{2}}{(1-\lambda_{2})T^{2}}\right)^{\frac{1}{3}}+\frac{\chi}{(1-\lambda_{2})T}\right). \tag{41}$$
Proof.

See Appendix I. ∎

Similarly to the analysis of the non-convex setting, when $p<\lambda_{2}$ and $\chi=\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}\leq\mathcal{O}\left(\frac{1}{1-\lambda_{2}}\right)$, for any desired accuracy $\epsilon>0$, the expected communication complexity is bounded by

$$p\times\text{(iteration complexity)}=\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p}{1-\lambda_{2}}\frac{\sigma\sqrt{L}}{\epsilon^{3/2}}+\frac{1}{(1-\lambda_{2})^{2}}\frac{L}{\epsilon}\right).$$

When $p\geq\lambda_{2}$, we have $\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}=\mathcal{O}(1)$. If we choose $\chi=\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$, then for any desired accuracy $\epsilon>0$, the expected communication complexity is bounded by

$$p\times\text{(iteration complexity)}=\mathcal{O}\left(\frac{p\sigma^{2}}{n\epsilon^{2}}+\frac{p}{\sqrt{1-\lambda_{2}}}\frac{\sigma\sqrt{L}}{\epsilon^{3/2}}+\frac{1}{1-\lambda_{2}}\frac{L}{\epsilon}\right).$$

B.4 Convergence Analysis: Strongly Convex

By Lemma 1 and Lemma 2, we can also deduce the following theorem.

Theorem 6.

Suppose that Assumptions 1, 2, and 4 hold. Under the additional Assumption 3 with $\mu>0$, if $\beta=1$, and $\alpha$ and $\chi$ satisfy $\chi\geq\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$ and

$$0<\alpha\leq\min\left\{\frac{1}{2L},\ \frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\frac{1}{2L},\ \frac{72\mu}{L^{2}},\ \frac{1-\gamma}{12L+\mu/2},\ \frac{\sqrt[3]{4\mu(1-\gamma)}}{L}\right\}, \tag{42}$$

where $\gamma=\sqrt{1-\frac{1}{2\chi}(1-\lambda_{2})}<1$, it holds that

$$\begin{aligned}\mathbb{E}\left[\big\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\big\|^{2}\right]\leq{}&\Big(1-\frac{\alpha\mu}{4}\Big)^{t}\Big(\big\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\big\|^{2}+\frac{8\chi\alpha^{2}\varsigma_{0}^{2}}{1-\lambda_{2}}\Big)\\&+\frac{2\alpha\sigma^{2}}{\mu n}+\frac{7L\alpha^{2}\sigma^{2}\big(192\chi^{2}+(4\chi^{2}+2(1-p))\big)}{12\mu(1-\lambda_{2})\chi}.\end{aligned} \tag{43}$$
Proof.

See Appendix J. ∎
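To get a feel for which entry of the stepsize condition (42) is binding, the right-hand side can be evaluated numerically. The sketch below is an illustration only: the function name and the sample values of $L$, $\mu$, $\lambda_2$, $\lambda_n$, and $\chi$ are hypothetical, and the six entries are transcribed from (42) as stated, with $\gamma$ as defined in Theorem 6.

```python
import math


def stepsize_bound_strongly_convex(L, mu, lam2, lam_n, chi):
    """Evaluate the right-hand side of (42) and return (bound, gamma).

    lam2 and lam_n denote the second-largest and smallest eigenvalues of the
    mixing matrix; lam_n > -1 is assumed so that the square root is real.
    """
    gap = 1.0 - lam2
    gamma = math.sqrt(1.0 - gap / (2.0 * chi))
    candidates = [
        1.0 / (2.0 * L),
        gap / (32.0 * math.sqrt(3.0) * chi * L),
        math.sqrt((1.0 + lam_n) * gap / (2.0 * chi)) / (2.0 * L),
        72.0 * mu / L**2,
        (1.0 - gamma) / (12.0 * L + mu / 2.0),
        (4.0 * mu * (1.0 - gamma)) ** (1.0 / 3.0) / L,
    ]
    return min(candidates), gamma


if __name__ == "__main__":
    # Hypothetical problem constants; chi = 288 matches p close to 0 and lam2 = 0.
    alpha_max, gamma = stepsize_bound_strongly_convex(
        L=1.0, mu=0.1, lam2=0.5, lam_n=-0.5, chi=288.0
    )
    print(alpha_max, gamma)
```

For these sample values the entry $(1-\lambda_2)/(32\sqrt{3}\chi L)$ is the smallest, so the admissible stepsize shrinks with the network through $(1-\lambda_2)/\chi$.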

Based on Theorem 6, we can obtain a tighter rate by carefully selecting the stepsize, similarly to [35].

Corollary 4.

Suppose that Assumptions 1, 2, and 4 hold. Under the additional Assumption 3 with $\mu>0$, if $\beta=1$ and $\chi\geq\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}$, there exists a constant $\alpha=\mathcal{O}\left(\frac{\mu(1-\lambda_{2})}{\chi L^{2}}\right)$ such that

$$\mathbb{E}[\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\|^{2}]\leq\tilde{\mathcal{O}}\left(\frac{\sigma^{2}}{nT}+\frac{\sigma^{2}\chi}{(1-\lambda_{2})T^{2}}+\exp\Big[-\frac{(1-\lambda_{2})T}{\chi}\Big]\right). \tag{44}$$
Proof.

See Appendix K. ∎

Similarly to the analysis of the non-convex and convex settings, we have $\chi=\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}\leq\mathcal{O}\left(\frac{1}{1-\lambda_{2}}\right)$ if $p<\lambda_{2}$, and $\chi=\max\left\{\frac{288(1-p)}{1-\lambda_{2}},1\right\}=\mathcal{O}(1)$ if $p\geq\lambda_{2}$. Thus, for any desired accuracy $\epsilon>0$, the expected number of communication rounds required to achieve $\mathbb{E}[\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\|^{2}]\leq\epsilon$ is bounded by

$$p\times\text{(iteration complexity)}=\tilde{\mathcal{O}}\left(\frac{p\sigma^{2}}{n\epsilon}+\frac{p}{1-\lambda_{2}}\frac{\sigma}{\sqrt{\epsilon}}+\frac{\log(1/\epsilon)}{(1-\lambda_{2})^{2}}\right),\quad p\in(0,\lambda_{2}),$$

and

$$p\times\text{(iteration complexity)}=\tilde{\mathcal{O}}\left(\frac{p\sigma^{2}}{n\epsilon}+\frac{p}{\sqrt{1-\lambda_{2}}}\frac{\sigma}{\sqrt{\epsilon}}+\frac{\log(1/\epsilon)}{1-\lambda_{2}}\right),\quad p\in[\lambda_{2},1].$$

Appendix C Proof of Theorem 3

Next, we prove that ProxSkip can achieve linear speedup with network-independent stepsizes. To facilitate the analysis, we introduce new iterates $\{\mathbf{U}^{t}\}$ defined through $\mathbf{Y}^{t}=\alpha\mathbf{W}_{b}\mathbf{U}^{t}$; similar techniques can be found, e.g., in [41, 42, 6]. Since $\mathbf{I}-\mathbf{W}_{a}=\frac{1}{2\chi}\mathbf{W}_{b}^{2}$, from (4b) and (4c), we have

$$\left\{\begin{array}{rl}\mathbf{X}^{t+1}=&(1-\theta_{t})\hat{\mathbf{Z}}^{t}+\theta_{t}\mathbf{W}_{a}\hat{\mathbf{Z}}^{t}\\ \alpha\mathbf{W}_{b}\mathbf{U}^{t+1}=&\alpha\mathbf{W}_{b}\mathbf{U}^{t}+\beta(\hat{\mathbf{Z}}^{t}-\mathbf{X}^{t+1})\end{array}\right.\Longleftrightarrow\left\{\begin{array}{rl}\mathbf{W}_{b}\mathbf{U}^{t+1}=&\mathbf{W}_{b}\mathbf{U}^{t}+\frac{\beta\theta_{t}}{2\chi\alpha}\mathbf{W}_{b}^{2}\hat{\mathbf{Z}}^{t}\\ \mathbf{X}^{t+1}=&\hat{\mathbf{Z}}^{t}-\frac{\alpha}{\beta}\mathbf{W}_{b}(\mathbf{U}^{t+1}-\mathbf{U}^{t})\end{array}\right.$$

Therefore, letting $\mathbf{Y}^{0}=0$, we have the following equivalent form of ProxSkip (4), in the sense that both generate an identical sequence $(\mathbf{X}^{t},\hat{\mathbf{Z}}^{t})$.

\begin{align}
\hat{\mathbf{Z}}^{t}&=\mathbf{X}^{t}-\alpha\mathbf{G}^{t}-\alpha\mathbf{W}_{b}\mathbf{U}^{t}, \tag{45a}\\
\mathbf{U}^{t+1}&=\mathbf{U}^{t}+\frac{\beta\theta_{t}}{2\chi\alpha}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}, \tag{45b}\\
\mathbf{X}^{t+1}&=\hat{\mathbf{Z}}^{t}-\frac{\alpha}{\beta}\mathbf{W}_{b}(\mathbf{U}^{t+1}-\mathbf{U}^{t}). \tag{45c}
\end{align}
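The equivalence between the two forms can be checked numerically. The NumPy sketch below runs both recursions with the same random $\theta_t$ and compares the resulting $(\mathbf{X}^{t},\hat{\mathbf{Z}}^{t})$. Several ingredients are assumptions made only for this illustration: step (4a) is taken to be $\hat{\mathbf{Z}}^{t}=\mathbf{X}^{t}-\alpha\mathbf{G}^{t}-\mathbf{Y}^{t}$ (as implied by (45a) and $\mathbf{Y}^{t}=\alpha\mathbf{W}_{b}\mathbf{U}^{t}$), the network is a ring with $\mathbf{W}_{b}=(\mathbf{I}-\mathbf{W})^{1/2}$, the local losses are the toy quadratics $f_i(\mathbf{x})=\frac{1}{2}\|\mathbf{x}-\mathbf{b}_i\|^2$ with exact gradients, and the parameter values are arbitrary rather than chosen to satisfy the conditions of the theorems.

```python
import numpy as np


def ring_matrices(n, chi):
    """Ring mixing matrix W, with W_b = (I - W)^{1/2} and W_a = I - W_b^2 / (2 chi)."""
    I = np.eye(n)
    W = 0.5 * I + 0.25 * (np.roll(I, 1, axis=0) + np.roll(I, -1, axis=0))
    vals, vecs = np.linalg.eigh(I - W)  # I - W is symmetric positive semidefinite
    W_b = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    W_a = I - (W_b @ W_b) / (2.0 * chi)
    return W_a, W_b


def max_equivalence_gap(n=5, d=3, T=50, alpha=0.1, beta=1.0, chi=20.0, p=0.3, seed=0):
    """Largest entrywise difference in (X, Z_hat) between the Y-form and the U-form."""
    rng = np.random.default_rng(seed)
    W_a, W_b = ring_matrices(n, chi)
    B = rng.normal(size=(n, d))      # toy data: f_i(x) = 0.5 * ||x - b_i||^2
    grad = lambda X: X - B           # exact gradients, stacked row-wise
    X0 = rng.normal(size=(n, d))
    X_y, Y = X0.copy(), np.zeros((n, d))   # Y-form state, Y^0 = 0
    X_u, U = X0.copy(), np.zeros((n, d))   # U-form state, U^0 = 0
    gap = 0.0
    for _ in range(T):
        theta = float(rng.random() < p)    # shared Bernoulli(p) communication flag
        # Y-form: assumed (4a), then (4b) and (4c)
        Z_y = X_y - alpha * grad(X_y) - Y
        X_next = (1.0 - theta) * Z_y + theta * (W_a @ Z_y)
        Y = Y + beta * (Z_y - X_next)
        # U-form: (45a), (45b), (45c)
        Z_u = X_u - alpha * grad(X_u) - alpha * (W_b @ U)
        U_next = U + (beta * theta / (2.0 * chi * alpha)) * (W_b @ Z_u)
        X_u_next = Z_u - (alpha / beta) * (W_b @ (U_next - U))
        gap = max(gap, np.abs(Z_y - Z_u).max(), np.abs(X_next - X_u_next).max())
        X_y, X_u, U = X_next, X_u_next, U_next
    return gap


if __name__ == "__main__":
    print(max_equivalence_gap())  # expected to be at the level of floating-point round-off
```

The check relies only on the identity $\mathbf{I}-\mathbf{W}_{a}=\frac{1}{2\chi}\mathbf{W}_{b}^{2}$, so any symmetric $\mathbf{W}_{b}$ with $\mathbf{W}_{a}$ defined this way would serve equally well.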

This equivalent form is more convenient for the subsequent convergence analysis. The optimality condition of problem (1) is given in the following lemma.

Lemma 3.

Suppose that Assumption 1 holds. If there exists a point $(\mathbf{X}^{\star},\mathbf{U}^{\star})$ that satisfies

\begin{align}
0&=\nabla F(\mathbf{X}^{\star})+\mathbf{W}_{b}\mathbf{U}^{\star}, \tag{46a}\\
0&=\mathbf{W}_{b}\mathbf{Z}^{\star}, \tag{46b}
\end{align}

then it holds that $\mathbf{X}^{\star}=[\mathbf{x}^{\star},\mathbf{x}^{\star},\ldots,\mathbf{x}^{\star}]^{\sf T}$, where $\mathbf{x}^{\star}\in\mathbb{R}^{d}$ is a stationary point of problem (1).
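As a numerical sanity check of this optimality condition, the sketch below iterates the recursion (45) with exact gradients until it is numerically at a fixed point and then evaluates the residuals of (46). Everything specific here is an assumption for illustration: a ring network with $\mathbf{W}_{b}=(\mathbf{I}-\mathbf{W})^{1/2}$, toy quadratics $f_i(\mathbf{x})=\frac{1}{2}\|\mathbf{x}-\mathbf{b}_i\|^2$ (so the stationary point is the average of the $\mathbf{b}_i$), communication at every iteration ($\theta_t=1$), $\beta=1$, and arbitrary values of $\alpha$ and $\chi$. At a fixed point (45c) gives $\hat{\mathbf{Z}}=\mathbf{X}$, and the code evaluates (46b) at $\hat{\mathbf{Z}}$.

```python
import numpy as np


def fixed_point_residuals(n=5, d=2, T=500, alpha=0.1, chi=2.0, seed=0):
    """Run (45) with theta_t = 1, beta = 1, and exact gradients; return residuals of (46)."""
    rng = np.random.default_rng(seed)
    I = np.eye(n)
    W = 0.5 * I + 0.25 * (np.roll(I, 1, axis=0) + np.roll(I, -1, axis=0))  # ring graph
    vals, vecs = np.linalg.eigh(I - W)
    W_b = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T  # assumed W_b = (I - W)^{1/2}
    B = rng.normal(size=(n, d))          # toy data: f_i(x) = 0.5 * ||x - b_i||^2
    grad = lambda X: X - B               # stacked exact gradients, i.e. G^t = grad F(X^t)
    X = rng.normal(size=(n, d))
    U = np.zeros((n, d))
    for _ in range(T):
        Z = X - alpha * grad(X) - alpha * (W_b @ U)      # (45a)
        U_next = U + (W_b @ Z) / (2.0 * chi * alpha)     # (45b) with beta = theta_t = 1
        X = Z - alpha * (W_b @ (U_next - U))             # (45c) with beta = 1
        U = U_next
    Z = X - alpha * grad(X) - alpha * (W_b @ U)
    res_46a = np.abs(grad(X) + W_b @ U).max()            # residual of (46a)
    res_46b = np.abs(W_b @ Z).max()                      # residual of (46b)
    consensus = np.abs(X - X.mean(axis=0)).max()         # rows of X agree
    stationarity = np.abs(X.mean(axis=0) - B.mean(axis=0)).max()  # x* = mean of b_i
    return res_46a, res_46b, consensus, stationarity


if __name__ == "__main__":
    print(fixed_point_residuals())
```

In this toy setting all four quantities are expected to be at round-off level, which matches the claim that a point satisfying (46) is a consensual stationary point.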

From Lemma 3, when $\mathbf{G}^{t}=\nabla F(\mathbf{X}^{t})$, any fixed point of (45) satisfies the condition (46). We also define the following notation to simplify the analysis:

$$\widetilde{\mathbf{Z}}^{t}\triangleq\hat{\mathbf{Z}}^{t}-\mathbf{X}^{\star},\quad\widetilde{\mathbf{X}}^{t}\triangleq\mathbf{X}^{t}-\mathbf{X}^{\star},\quad\widetilde{\mathbf{U}}^{t}\triangleq\alpha(\mathbf{U}^{t}-\mathbf{U}^{\star}),\quad\bar{\mathbf{e}}^{t}\triangleq\bar{\mathbf{x}}^{t}-(\mathbf{x}^{\star})^{\sf T},$$

where $({\bf{X}}^{\star},{\bf{U}}^{\star})$ satisfies (46). Similarly to Lemma 1, we give another error dynamics of ProxSkip, which is used to prove linear speedup with network-independent stepsizes under strong convexity.

Lemma 4.

Suppose that Assumption 1 holds. If $\beta=p$ and $\chi p\geq 1$, there exist an invertible matrix ${\bf{Q}}^{\mathrm{s}}$ and a diagonal matrix ${\bf{\Gamma}}$ such that

\begin{align}
\bar{{\bf{e}}}^{t+1} &= \bar{{\bf{e}}}^{t}-\alpha\overline{\nabla F}({\bf{X}}^{t})-\alpha\bar{{\bf{s}}}^{t}, \tag{47a}\\
\mathcal{E}_{\mathrm{s}}^{t+1} &= \underbrace{{\bf{\Gamma}}\mathcal{E}_{\mathrm{s}}^{t}-\alpha\upsilon({\bf{Q}}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{{\bf{\Lambda}}}_{a}\hat{{\bf{P}}}^{\sf T}(\nabla F({\bf{X}}^{t})-\nabla F({\bf{X}}^{\star})+{\bf{S}}^{t})\\ \frac{p}{2\chi}\hat{{\bf{\Lambda}}}_{b}\hat{{\bf{P}}}^{\sf T}(\nabla F({\bf{X}}^{t})-\nabla F({\bf{X}}^{\star})+{\bf{S}}^{t})\end{array}\right]}_{:=\mathbb{G}_{\mathrm{s}}^{t}}+\underbrace{\upsilon({\bf{Q}}^{\mathrm{s}})^{-1}\left[\begin{array}{c}-\hat{{\bf{\Lambda}}}_{b}\hat{{\bf{P}}}^{\sf T}{\bf{E}}^{t}\\ p\hat{{\bf{P}}}^{\sf T}{\bf{E}}^{t}\end{array}\right]}_{:=\mathbb{F}_{\mathrm{s}}^{t}}, \tag{47f}
\end{align}

where $\upsilon$ is an arbitrary strictly positive constant,

\begin{equation*}
\mathcal{E}_{\mathrm{s}}^{t}\triangleq\upsilon({\bf{Q}}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{{\bf{P}}}^{\sf T}\widetilde{{\bf{X}}}^{t}\\ \hat{{\bf{P}}}^{\sf T}\widetilde{{\bf{U}}}^{t}\end{array}\right]
\quad\text{and}\quad
\gamma\triangleq\|{\bf{\Gamma}}\|=\sqrt{1-\frac{1}{2\chi}(1-\lambda_{2})}<1.
\end{equation*}

Moreover, we have $\|{\bf{Q}}^{\mathrm{s}}\|^{2}\|({\bf{Q}}^{\mathrm{s}})^{-1}\|^{2}\leq\frac{8\chi^{2}}{p^{2}(1+\lambda_{n})}$.

Proof.

See Appendix L. ∎

With these error dynamics, similarly to Lemma 2, we obtain the following descent inequalities.

Lemma 5.

Suppose that Assumptions 1, 2, and 4 hold, and $f_{i}$ is $\mu$-strongly convex for some $0<\mu\leq L$. Let $\upsilon=1/\|({\bf{Q}}^{\mathrm{s}})^{-1}\|$. If $\alpha\leq\frac{1}{2L}$, it holds that

\begin{align}
\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t+1}-{\bf{x}}^{\star}\big\|^{2}\;|\;\mathcal{G}^{t}\right] &\leq (1-\mu\alpha)\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\|^{2}+\frac{2\alpha L\vartheta_{\mathrm{s}}}{n}\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}, \tag{48}\\
\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right] &\leq \tilde{\gamma}_{\mathrm{s}}\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+D_{1}\|\widetilde{{\bf{X}}}^{t}\|_{\mathrm{F}}^{2}+D_{2}n\alpha^{2}\sigma^{2}, \tag{49}
\end{align}

where $\vartheta_{\mathrm{s}}=\|{\bf{Q}}^{\mathrm{s}}\|^{2}\|({\bf{Q}}^{\mathrm{s}})^{-1}\|^{2}$, $\tilde{\gamma}_{\mathrm{s}}=\gamma+\frac{3(1-p)(2+p^{2})}{\chi^{2}}$, and

\begin{equation*}
D_{1}=\frac{\alpha^{2}L^{2}(2\chi^{2}+p^{2})}{2\chi^{2}(1-\gamma)}+\frac{3\alpha^{2}L^{2}(1-p)(2+p^{2})}{2\chi^{2}},\quad
D_{2}=\frac{(1-p)(2+p^{2})+(p^{2}+2\chi^{2})}{2\chi^{2}}.
\end{equation*}
Proof.

See Appendix M. ∎

With Lemmas 4 and 5, we have the following theorem.

Theorem 7.

Suppose that Assumptions 1, 2, and 4 hold, and $f_{i}$ is $\mu$-strongly convex for some $0<\mu\leq L$. If $0<\alpha\leq\frac{1}{2L}$, $\beta=p$, and

\begin{equation}
\chi>\max\left\{\frac{1}{p},\frac{36}{1-\lambda_{2}},\frac{72(1-p)}{1-\lambda_{2}}\right\}, \tag{50}
\end{equation}

it holds that $\tilde{\gamma}_{\mathrm{s}}<1$ and

\begin{equation}
\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t+1}-{\bf{x}}^{\star}\big\|^{2}\right]\leq\zeta_{0}^{t+1}a_{0}+\mathcal{O}\left(\frac{\alpha^{4}\sigma^{2}L^{3}\chi^{4}}{\mu p^{2}(1-\lambda_{2})^{2}(1-\zeta)}+\frac{\alpha^{2}\sigma^{2}L\chi^{3}}{\mu p^{2}(1-\lambda_{2})}\right)+\frac{\alpha\sigma^{2}}{n\mu}, \tag{51}
\end{equation}

where $a_{0}$ is a constant that depends on the initialization and $\zeta_{0}=\max\left\{1-\alpha\mu,\sqrt{1-\frac{(1-\lambda_{2})p^{2}}{2\chi}}\right\}<1$.

Proof.

See Appendix N. ∎
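As an informal sanity check (not part of the proof in Appendix N), the claim $\tilde{\gamma}_{\mathrm{s}}<1$ can be probed numerically. One way to see why condition (50) suffices: since $\sqrt{1-x}\leq 1-x/2$, we have $\gamma\leq 1-\frac{1-\lambda_{2}}{4\chi}$, and $\frac{3(1-p)(2+p^{2})}{\chi^{2}}\leq\frac{9(1-p)}{\chi^{2}}<\frac{1-\lambda_{2}}{8\chi}$ whenever $\chi>\frac{72(1-p)}{1-\lambda_{2}}$. The sketch below evaluates $\gamma$ and $\tilde{\gamma}_{\mathrm{s}}$ from the formulas in Lemmas 4 and 5. The function names and the sample values of $p$ and $\lambda_{2}$ are illustrative assumptions, not taken from the paper.

```python
import math

def contraction_factors(p, lam2, chi):
    """Return (gamma, gamma_tilde) using the formulas of Lemmas 4 and 5."""
    gamma = math.sqrt(1.0 - (1.0 - lam2) / (2.0 * chi))
    gamma_tilde = gamma + 3.0 * (1.0 - p) * (2.0 + p ** 2) / chi ** 2
    return gamma, gamma_tilde

def chi_lower_bound(p, lam2):
    """Right-hand side of condition (50)."""
    gap = 1.0 - lam2
    return max(1.0 / p, 36.0 / gap, 72.0 * (1.0 - p) / gap)

# Hypothetical (p, lambda_2) pairs chosen only for illustration.
for p, lam2 in [(0.1, 0.9), (0.5, 0.5), (0.9, 0.99)]:
    chi = 1.01 * chi_lower_bound(p, lam2)  # any chi strictly above the bound
    gamma, gamma_tilde = contraction_factors(p, lam2, chi)
    print(f"p={p}, lam2={lam2}, chi={chi:.1f}, "
          f"gamma={gamma:.8f}, gamma_tilde={gamma_tilde:.8f}")
```

For each sampled pair, the printed $\tilde{\gamma}_{\mathrm{s}}$ should lie strictly between $\gamma$ and $1$.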

Appendix D Proof of Lemma 1

Proof.

It follows from (4b), $\mathbf{I}-\mathbf{W}_{a}=\frac{1}{2\chi}\mathbf{W}_{b}^{2}$, and ${\bf{E}}^{t}=\frac{1}{2\chi}(\theta_{t}-1){\bf{W}}_{b}\hat{{\bf{Z}}}^{t}$ that

\begin{align*}
\mathbf{X}^{t+1} &= (1-\theta_{t})\hat{\mathbf{Z}}^{t}+\theta_{t}\mathbf{W}_{a}\hat{\mathbf{Z}}^{t}\\
&= {\bf{W}}_{a}\hat{\mathbf{Z}}^{t}+(1-\theta_{t})\hat{\mathbf{Z}}^{t}+\theta_{t}\mathbf{W}_{a}\hat{\mathbf{Z}}^{t}-{\bf{W}}_{a}\hat{\mathbf{Z}}^{t}\\
&= {\bf{W}}_{a}\hat{\mathbf{Z}}^{t}+(1-\theta_{t})({\bf{I}}-{\bf{W}}_{a})\hat{\mathbf{Z}}^{t}\\
&= {\bf{W}}_{a}\hat{\mathbf{Z}}^{t}-{\bf{W}}_{b}{\bf{E}}^{t}.
\end{align*}
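The identity $\mathbf{X}^{t+1}={\bf{W}}_{a}\hat{\mathbf{Z}}^{t}-{\bf{W}}_{b}{\bf{E}}^{t}$ can be checked numerically on a small instance. The following is a minimal pure-Python sketch under stated assumptions: the symmetric matrix `W_b`, the value of `chi`, and the entries of `Z_hat` are hypothetical, and `W_a` is built from the relation $\mathbf{I}-\mathbf{W}_{a}=\frac{1}{2\chi}\mathbf{W}_{b}^{2}$ rather than from an actual network. In the algorithm $\theta_{t}$ is presumably a $\{0,1\}$-valued random variable, but the algebra holds for any real $\theta_{t}$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def axpy(A, B, s):
    """Entrywise A + s * B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    """Entrywise s * A."""
    return [[s * a for a in row] for row in A]

chi = 2.0  # hypothetical value
W_b = [[1.0, -0.5, -0.5],
       [-0.5, 1.0, -0.5],
       [-0.5, -0.5, 1.0]]  # hypothetical symmetric matrix
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
W_a = axpy(I3, matmul(W_b, W_b), -1.0 / (2.0 * chi))  # I - W_b^2 / (2 chi)
Z_hat = [[1.0, 2.0], [-3.0, 0.5], [4.0, -1.0]]  # 3 nodes, dimension 2

def max_gap(theta):
    """Largest entrywise gap between the two expressions for X^{t+1}."""
    W_a_Z = matmul(W_a, Z_hat)
    lhs = axpy(scale(Z_hat, 1.0 - theta), W_a_Z, theta)  # (1-theta) Z + theta W_a Z
    E = scale(matmul(W_b, Z_hat), (theta - 1.0) / (2.0 * chi))  # definition of E^t
    rhs = axpy(W_a_Z, matmul(W_b, E), -1.0)  # W_a Z - W_b E
    return max(abs(l - r) for row_l, row_r in zip(lhs, rhs)
               for l, r in zip(row_l, row_r))

for theta in (0, 1):
    print(theta, max_gap(theta))
```

Both gaps should be zero up to floating-point rounding.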

Since $\beta=1$, it follows from (4b), (4c), and $\mathbf{I}-\mathbf{W}_{a}=\frac{1}{2\chi}\mathbf{W}_{b}^{2}$ that

\begin{equation*}
{\bf{Y}}^{t+1}={\bf{Y}}^{t}+\frac{1}{2\chi}{\bf{W}}_{b}^{2}\hat{{\bf{Z}}}^{t}+{\bf{W}}_{b}{\bf{E}}^{t}={\bf{Y}}^{t}+({\bf{I}}-{\bf{W}}_{a})\hat{{\bf{Z}}}^{t}+{\bf{W}}_{b}{\bf{E}}^{t}.
\end{equation*}

By ${\bf{R}}^{t}={\bf{Y}}^{t}+\alpha\nabla F(\bar{{\bf{X}}}^{t})$, ${\bf{\Sigma}}_{2}^{t}=\nabla F(\bar{{\bf{X}}}^{t})-\nabla F(\bar{{\bf{X}}}^{t+1})$, and ${\bf{E}}^{t}=\frac{1}{2\chi}(\theta_{t}-1){\bf{W}}_{b}\hat{{\bf{Z}}}^{t}$, we have

\begin{align*}
{\bf{R}}^{t+1}-{\bf{R}}^{t} &= {\bf{Y}}^{t+1}-{\bf{Y}}^{t}+\alpha(\nabla F(\bar{{\bf{X}}}^{t+1})-\nabla F(\bar{{\bf{X}}}^{t}))\\
&= ({\bf{I}}-{\bf{W}}_{a})\hat{{\bf{Z}}}^{t}+{\bf{W}}_{b}{\bf{E}}^{t}+\alpha(\nabla F(\bar{{\bf{X}}}^{t+1})-\nabla F(\bar{{\bf{X}}}^{t}))\\
&= ({\bf{I}}-{\bf{W}}_{a})\hat{{\bf{Z}}}^{t}+{\bf{W}}_{b}{\bf{E}}^{t}-\alpha{\bf{\Sigma}}_{2}^{t}.
\end{align*}

Note that ${\bf{\Sigma}}_{1}^{t}=\nabla F({\bf{X}}^{t})-\nabla F(\bar{{\bf{X}}}^{t})+{\bf{S}}^{t}$. Algorithm 1 (update (4)) is equivalent to

\begin{align*}
\hat{{\bf{Z}}}^{t} &= {\bf{X}}^{t}-{\bf{R}}^{t}-\alpha{\bf{\Sigma}}_{1}^{t},\\
{\bf{X}}^{t+1} &= {\bf{W}}_{a}\hat{{\bf{Z}}}^{t}-{\bf{W}}_{b}{\bf{E}}^{t},\\
{\bf{R}}^{t+1} &= {\bf{R}}^{t}+({\bf{I}}-{\bf{W}}_{a})\hat{{\bf{Z}}}^{t}-\alpha{\bf{\Sigma}}_{2}^{t}+{\bf{W}}_{b}{\bf{E}}^{t},
\end{align*}

which can also be rewritten as (since ${\bf{W}}_{a}={\bf{I}}-\frac{1}{2\chi}{\bf{W}}_{b}^{2}$)

\begin{equation*}
\left[\begin{array}{c}{\bf{X}}^{t+1}\\ {\bf{R}}^{t+1}\end{array}\right]=
\left[\begin{array}{cc}{\bf{W}}_{a}&-{\bf{W}}_{a}\\ {\bf{I}}-{\bf{W}}_{a}&{\bf{W}}_{a}\end{array}\right]
\left[\begin{array}{c}{\bf{X}}^{t}\\ {\bf{R}}^{t}\end{array}\right]
-\alpha\left[\begin{array}{c}{\bf{W}}_{a}{\bf{\Sigma}}_{1}^{t}\\ \frac{1}{2\chi}{\bf{W}}_{b}^{2}{\bf{\Sigma}}_{1}^{t}+{\bf{\Sigma}}_{2}^{t}\end{array}\right]
+\left[\begin{array}{c}-{\bf{W}}_{b}{\bf{E}}^{t}\\ {\bf{W}}_{b}{\bf{E}}^{t}\end{array}\right].
\end{equation*}
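Because ${\bf{W}}_{a}$ is a polynomial in ${\bf{W}}_{b}$, the two matrices share eigenvectors, so the equivalence between the three-line update and the block form can be checked one eigen-mode at a time with scalars. The sketch below does this for a single mode. The names `lam_a`, `lam_b` (eigenvalues of ${\bf{W}}_{a}$ and ${\bf{W}}_{b}$ on that mode) and all numeric values are hypothetical and serve only as an illustration.

```python
def three_line_update(x, r, s1, s2, e, lam_a, lam_b, alpha):
    """Scalar version of the Z-hat / X / R updates on one eigen-mode."""
    z = x - r - alpha * s1
    x_next = lam_a * z - lam_b * e
    r_next = r + (1.0 - lam_a) * z - alpha * s2 + lam_b * e
    return x_next, r_next

def block_update(x, r, s1, s2, e, lam_a, lam_b, alpha, chi):
    """Scalar version of the 2x2 block form on the same eigen-mode."""
    x_next = lam_a * x - lam_a * r - alpha * lam_a * s1 - lam_b * e
    r_next = ((1.0 - lam_a) * x + lam_a * r
              - alpha * (lam_b ** 2 / (2.0 * chi) * s1 + s2) + lam_b * e)
    return x_next, r_next

# Hypothetical values for one mode.
chi, alpha, lam_b = 3.0, 0.05, 0.8
lam_a = 1.0 - lam_b ** 2 / (2.0 * chi)  # from I - W_a = W_b^2 / (2 chi)
state = dict(x=1.3, r=-0.4, s1=0.7, s2=-0.2, e=0.25)

a = three_line_update(lam_a=lam_a, lam_b=lam_b, alpha=alpha, **state)
b = block_update(lam_a=lam_a, lam_b=lam_b, alpha=alpha, chi=chi, **state)
print(a, b)
```

The two printed pairs should agree up to floating-point rounding, which mirrors the substitution of $\hat{{\bf{Z}}}^{t}$ into the second and third lines of the update.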

Multiplying both sides of the above by $\mathrm{diag}\{{\bf{P}}^{-1},{\bf{P}}^{-1}\}$ on the left, and using (29) and

\begin{equation*}
{\bf{P}}^{-1}{\bf{X}}^{t}=\left[\begin{array}{c}\bar{{\bf{x}}}^{t}\\ \hat{{\bf{P}}}^{\sf T}{\bf{X}}^{t}\end{array}\right],\quad
{\bf{P}}^{-1}{\bf{R}}^{t}=\left[\begin{array}{c}\alpha\overline{\nabla F}(\bar{{\bf{X}}}^{t})\\ \hat{{\bf{P}}}^{\sf T}{\bf{R}}^{t}\end{array}\right],\quad
{\bf{P}}^{-1}\nabla F({\bf{X}}^{t})=\left[\begin{array}{c}\overline{\nabla F}({\bf{X}}^{t})\\ \hat{{\bf{P}}}^{\sf T}\nabla F({\bf{X}}^{t})\end{array}\right],\quad
{\bf{P}}^{-1}{\bf{E}}^{t}=\left[\begin{array}{c}0\\ \hat{{\bf{P}}}^{\sf T}{\bf{E}}^{t}\end{array}\right],
\end{equation*}

we have

$$
\begin{aligned}
\bar{\mathbf{x}}^{t+1} &= \bar{\mathbf{x}}^{t}-\alpha\overline{\nabla F}(\mathbf{X}^{t})-\alpha\bar{\mathbf{s}}^{t},\\
\begin{bmatrix}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t+1}\\ \hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t+1}\end{bmatrix} &=
\begin{bmatrix}\hat{\mathbf{\Lambda}}_{a} & -\hat{\mathbf{\Lambda}}_{a}\\ \mathbf{I}-\hat{\mathbf{\Lambda}}_{a} & \hat{\mathbf{\Lambda}}_{a}\end{bmatrix}
\begin{bmatrix}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\\ \hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\end{bmatrix}
-\alpha\begin{bmatrix}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{\Sigma}_{1}^{t}\\ \frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{\Sigma}_{1}^{t}+\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{\Sigma}_{2}^{t}\end{bmatrix}
+\begin{bmatrix}-\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\\ \hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\end{bmatrix}.
\end{aligned}
$$

Let

$$
\mathbf{H}=\begin{bmatrix}\hat{\mathbf{\Lambda}}_{a} & -\hat{\mathbf{\Lambda}}_{a}\\ \mathbf{I}-\hat{\mathbf{\Lambda}}_{a} & \hat{\mathbf{\Lambda}}_{a}\end{bmatrix}
=\begin{bmatrix}\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\hat{\mathbf{\Lambda}}) & -\big(\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\hat{\mathbf{\Lambda}})\big)\\ \frac{1}{2\chi}(\mathbf{I}-\hat{\mathbf{\Lambda}}) & \mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\hat{\mathbf{\Lambda}})\end{bmatrix},
$$

where $\hat{\mathbf{\Lambda}}=\mathrm{diag}\{\lambda_{2},\ldots,\lambda_{n}\}$ and $\lambda_{i}\in(-1,1)$. Since the blocks of $\mathbf{H}$ are diagonal matrices, there exists a permutation matrix $\mathbf{Q}_{1}$ such that $\mathbf{Q}_{1}\mathbf{H}\mathbf{Q}_{1}^{\mathsf{T}}=\mathrm{blkdiag}\{H_{i}\}_{i=2}^{n}$, where

$$
H_{i}=\begin{bmatrix}1-\frac{1}{2\chi}(1-\lambda_{i}) & -\big(1-\frac{1}{2\chi}(1-\lambda_{i})\big)\\ \frac{1}{2\chi}(1-\lambda_{i}) & 1-\frac{1}{2\chi}(1-\lambda_{i})\end{bmatrix}.
$$
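The permutation argument can be illustrated numerically. The following is a minimal sketch in plain Python; the values of $\chi$ and $\lambda_2,\ldots,\lambda_n$ are hypothetical (not taken from the paper), the helper names are illustrative, and the interleaving order used here is only one possible choice of $\mathbf{Q}_1$.

```python
def block_matrix(a, b, c, d):
    """Assemble the 2m x 2m matrix [[diag(a), diag(b)], [diag(c), diag(d)]]."""
    m = len(a)
    M = [[0.0] * (2 * m) for _ in range(2 * m)]
    for k in range(m):
        M[k][k] = a[k]
        M[k][m + k] = b[k]
        M[m + k][k] = c[k]
        M[m + k][m + k] = d[k]
    return M


def interleave_order(m):
    """Coordinate order 0, m, 1, m+1, ... that pairs coordinate k with m + k."""
    order = []
    for k in range(m):
        order += [k, m + k]
    return order


def permute(M, order):
    """Return Q1 M Q1^T, where Q1 reorders coordinates according to `order`."""
    return [[M[r][c] for c in order] for r in order]


def is_block_diagonal(M, size):
    """True if every entry outside the size x size diagonal blocks is zero."""
    n = len(M)
    return all(M[i][j] == 0.0
               for i in range(n) for j in range(n)
               if i // size != j // size)


chi = 2.0                 # hypothetical value
lams = [0.6, 0.1, -0.5]   # hypothetical lambda_2, ..., lambda_n in (-1, 1)
diag = [1 - (1 - lam) / (2 * chi) for lam in lams]  # diagonal of Lambda_a-hat

H = block_matrix(diag, [-v for v in diag], [1 - v for v in diag], diag)
B = permute(H, interleave_order(len(diag)))

print(is_block_diagonal(H, 2), is_block_diagonal(B, 2))
```

Each $2\times 2$ diagonal block of `B` then has the form of $H_i$ for the corresponding $\lambda_i$.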

Setting $\nu_{i}=1-\frac{1}{2\chi}(1-\lambda_{i})$, we have $\nu_{i}\in(0,1)$ and $H_{i}$ can be rewritten as

$$
H_{i}=\begin{bmatrix}\nu_{i} & -\nu_{i}\\ 1-\nu_{i} & \nu_{i}\end{bmatrix}\in\mathbb{R}^{2\times 2}.
$$

It holds that $\mathrm{Tr}(H_{i})=2\nu_{i}$ and $\mathrm{det}(H_{i})=\nu_{i}$. Thus, the eigenvalues of $H_{i}$ are

$$
\gamma_{(1,2),i}=\frac{1}{2}\Big[\mathrm{Tr}(H_{i})\pm\sqrt{\mathrm{Tr}(H_{i})^{2}-4\,\mathrm{det}(H_{i})}\Big]=\nu_{i}\pm\sqrt{\nu_{i}^{2}-\nu_{i}}.
$$

Notice that $|\gamma_{(1,2),i}|<1$ when $-\nicefrac{1}{3}<\nu_{i}<1$, which holds under Assumption 1 since $\mathbf{W}_{a}\succ 0$, i.e., $0<\nu_{i}<1\ (i=2,\ldots,n)$. For $0<\nu_{i}<1$, the eigenvalues of $H_{i}$ are complex and distinct:

$$
\gamma_{(1,2),i}=\nu_{i}\pm j\sqrt{\nu_{i}-\nu_{i}^{2}},\quad |\gamma_{(1,2),i}|<1,
$$
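The eigenvalue formula and the threshold $-\nicefrac{1}{3}$ can be checked numerically. The sketch below uses only the Python standard library; the function names and the sample values of $\nu_i$ are illustrative assumptions, not part of the paper.

```python
import cmath


def eigenvalues_of_H(nu):
    """Eigenvalues of [[nu, -nu], [1 - nu, nu]] from its trace and determinant."""
    trace = 2.0 * nu
    det = nu * nu + nu * (1.0 - nu)  # simplifies to nu
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def spectral_radius(nu):
    """Largest eigenvalue modulus of the 2 x 2 block."""
    return max(abs(g) for g in eigenvalues_of_H(nu))


# Inside (0, 1) the eigenvalues form a complex pair with squared modulus nu.
for nu in (0.1, 0.5, 0.9):
    g1, g2 = eigenvalues_of_H(nu)
    print(nu, g1, g2, abs(g1) ** 2)

# For negative nu the eigenvalues are real; the radius reaches 1 at nu = -1/3.
for nu in (-0.3, -1.0 / 3.0, -0.4):
    print(nu, spectral_radius(nu))
```

For $0<\nu_i<1$ the squared modulus is $\nu_i^2+(\nu_i-\nu_i^2)=\nu_i$, which is why both eigenvalues lie strictly inside the unit circle.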

where $j^{2}=-1$. Through algebraic multiplication it can be verified that $H_{i}=Q_{2,i}\Gamma_{i}Q_{2,i}^{-1}$, where $\Gamma_{i}=\mathrm{diag}\{\gamma_{1,i},\gamma_{2,i}\}$ and

$$
Q_{2,i}=\begin{bmatrix}\sqrt{\nu_{i}} & \sqrt{\nu_{i}}\\ -j\sqrt{1-\nu_{i}} & j\sqrt{1-\nu_{i}}\end{bmatrix},\quad
Q_{2,i}^{-1}=\begin{bmatrix}\frac{1}{2\sqrt{\nu_{i}}} & \frac{j}{2\sqrt{1-\nu_{i}}}\\ \frac{1}{2\sqrt{\nu_{i}}} & -\frac{j}{2\sqrt{1-\nu_{i}}}\end{bmatrix}.
$$
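The factorization $H_i=Q_{2,i}\Gamma_iQ_{2,i}^{-1}$ can be confirmed by direct multiplication. The following plain-Python sketch does this for a few sample values of $\nu_i$; the helper names and sample values are assumptions made for illustration, and $\gamma_{1,i}$ is taken to be the eigenvalue with positive imaginary part.

```python
import math


def matmul2(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def decomposition(nu):
    """Return (Q, Gamma, Q_inv) for the block [[nu, -nu], [1 - nu, nu]], 0 < nu < 1."""
    s, c = math.sqrt(nu), math.sqrt(1.0 - nu)
    Q = [[s, s], [-1j * c, 1j * c]]
    Q_inv = [[1 / (2 * s), 1j / (2 * c)], [1 / (2 * s), -1j / (2 * c)]]
    g = 1j * math.sqrt(nu - nu * nu)
    Gamma = [[nu + g, 0], [0, nu - g]]
    return Q, Gamma, Q_inv


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two 2 x 2 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


for nu in (0.1, 0.3, 0.5, 0.9):
    Q, Gamma, Q_inv = decomposition(nu)
    H = [[nu, -nu], [1 - nu, nu]]
    err = max_abs_diff(matmul2(matmul2(Q, Gamma), Q_inv), H)
    print(nu, err)  # reconstruction error, expected at rounding level
```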

Note that

$$
Q_{2,i}Q_{2,i}^{*}=\begin{bmatrix}2\nu_{i} & 0\\ 0 & 2(1-\nu_{i})\end{bmatrix},\ \text{ and }\ (Q_{2,i}^{-1})(Q_{2,i}^{-1})^{*}=\frac{1}{4\nu_{i}(1-\nu_{i})}\begin{bmatrix}1 & 1-2\nu_{i}\\ 1-2\nu_{i} & 1\end{bmatrix}.
$$

Since the spectral radius of a matrix is upper bounded by any of its induced norms and $0<\nu_{i}<1$, it holds that

$$
\|Q_{2,i}\|^{2}\leq\|Q_{2,i}Q_{2,i}^{*}\|_{1}\leq 2,\ \text{ and }\ \|Q_{2,i}^{-1}\|^{2}\leq\|(Q_{2,i}^{-1})(Q_{2,i}^{-1})^{*}\|_{1}\leq\frac{2}{4\nu_{i}(1-\nu_{i})}.
$$

Using $\nu_{i}\geq 1-\frac{1}{2\chi}(1-\lambda_{n})$ and $1-\nu_{i}=\frac{1}{2\chi}(1-\lambda_{i})\geq\frac{1}{2\chi}(1-\lambda_{2})$, we have

$$
\|Q_{2,i}^{-1}\|^{2}\leq\frac{\chi}{\big(1-\frac{1}{2\chi}(1-\lambda_{n})\big)(1-\lambda_{2})}\leq\frac{2\chi}{(1+\lambda_{n})(1-\lambda_{2})}.
$$
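The chain of bounds on $\|Q_{2,i}^{-1}\|^{2}$ can be sanity-checked numerically. The sketch below is plain Python with hypothetical values of $\chi$ and $\lambda_2\geq\cdots\geq\lambda_n$; it assumes $\chi\geq 1$, which is what makes the last inequality hold, and it evaluates $\|Q_{2,i}^{-1}\|^{2}$ as the largest eigenvalue $(1+|1-2\nu_i|)/(4\nu_i(1-\nu_i))$ of the Hermitian matrix $(Q_{2,i}^{-1})(Q_{2,i}^{-1})^{*}$. Function names are illustrative.

```python
def nu_of(lam, chi):
    """nu_i = 1 - (1 - lambda_i) / (2 chi)."""
    return 1.0 - (1.0 - lam) / (2.0 * chi)


def q_inv_norm_sq(nu):
    """Largest eigenvalue of (Q^{-1})(Q^{-1})^*, i.e. the squared spectral norm of Q^{-1}."""
    return (1.0 + abs(1.0 - 2.0 * nu)) / (4.0 * nu * (1.0 - nu))


def bound_mid(chi, lam2, lamn):
    """Intermediate bound chi / ((1 - (1 - lambda_n) / (2 chi)) (1 - lambda_2))."""
    return chi / ((1.0 - (1.0 - lamn) / (2.0 * chi)) * (1.0 - lam2))


def bound_final(chi, lam2, lamn):
    """Final bound 2 chi / ((1 + lambda_n) (1 - lambda_2))."""
    return 2.0 * chi / ((1.0 + lamn) * (1.0 - lam2))


chi = 2.0                 # hypothetical, assumed >= 1
lams = [0.6, 0.1, -0.5]   # hypothetical lambda_2 >= ... >= lambda_n

for lam in lams:
    nu = nu_of(lam, chi)
    print(lam, q_inv_norm_sq(nu),
          bound_mid(chi, lams[0], lams[-1]),
          bound_final(chi, lams[0], lams[-1]))
```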

Let $\mathbf{Q}=\mathbf{Q}_{1}^{\mathsf{T}}\mathbf{Q}_{2}$ with $\mathbf{Q}_{2}=\mathrm{blkdiag}\{Q_{2,i}\}_{i=2}^{n}$. We have $\mathbf{Q}^{-1}\mathbf{H}\mathbf{Q}=\mathbf{\Gamma}$, where $\mathbf{\Gamma}=\mathrm{blkdiag}\{\Gamma_{i}\}_{i=2}^{n}$, i.e., there exists an invertible matrix $\mathbf{Q}$ such that $\mathbf{H}=\mathbf{Q}\mathbf{\Gamma}\mathbf{Q}^{-1}$, and

$$
\|\mathbf{\Gamma}\|=\sqrt{1-\frac{1}{2\chi}(1-\lambda_{2})}<1.
$$
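As a brief check of this value, the following worked step uses that $\mathbf{\Gamma}$ is diagonal, so its spectral norm is the largest modulus among its diagonal entries. The ordering $\lambda_{2}\geq\cdots\geq\lambda_{n}$ is stated here as an assumption; it is the ordering the bounds in this proof appear to rely on.

```latex
\|\mathbf{\Gamma}\|
  = \max_{2\le i\le n}\,|\gamma_{(1,2),i}|
  = \max_{2\le i\le n}\sqrt{\nu_i^{2}+(\nu_i-\nu_i^{2})}
  = \max_{2\le i\le n}\sqrt{\nu_i}
  = \sqrt{\nu_2}
  = \sqrt{1-\tfrac{1}{2\chi}(1-\lambda_2)} .
```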

Therefore, we finally obtain (30). Moreover, we have

$$
\|\mathbf{Q}\|^{2}\leq 2\ \text{ and }\ \|\mathbf{Q}^{-1}\|^{2}\leq\frac{2\chi}{(1+\lambda_{n})(1-\lambda_{2})}.
$$

Then, we prove $\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$. Since

$$
\mathcal{E}^{t}=\mathbf{Q}^{-1}\begin{bmatrix}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\\ \hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\end{bmatrix}\ \text{ and }\ 
\mathbf{Q}^{-1}=\mathbf{Q}_{1}\begin{bmatrix}\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}} & \frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\\ \frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}} & -\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\end{bmatrix},
$$

taking the squared norm, we have

$$
\begin{aligned}
\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}
&=\left\|\mathbf{Q}_{1}\begin{bmatrix}\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}} & \frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\\ \frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}} & -\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\end{bmatrix}\begin{bmatrix}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\\ \hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\end{bmatrix}\right\|_{\mathrm{F}}^{2}
\leq\frac{1}{4}\left\|\begin{bmatrix}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}+j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\\ \hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}-j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\end{bmatrix}\right\|_{\mathrm{F}}^{2}\\
&\leq\|\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\|_{\mathrm{F}}^{2}+\|(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}
$$
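The last inequality follows from the elementary bound $\|U+V\|_{\mathrm{F}}^{2}\leq 2\|U\|_{\mathrm{F}}^{2}+2\|V\|_{\mathrm{F}}^{2}$ applied to each block row. A short worked version is given below, where the shorthand $A$ and $B$ is introduced here only for readability and is not notation from the paper.

```latex
% Shorthand (introduced for this step only):
%   A = \hat{\mathbf{\Lambda}}_a^{-1/2}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^t,
%   B = (\mathbf{I}-\hat{\mathbf{\Lambda}}_a)^{-1/2}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^t.
\frac{1}{4}\left\|\begin{bmatrix} A + jB \\ A - jB \end{bmatrix}\right\|_{\mathrm{F}}^{2}
  = \frac{1}{4}\Big(\|A + jB\|_{\mathrm{F}}^{2} + \|A - jB\|_{\mathrm{F}}^{2}\Big)
  \le \frac{1}{4}\Big(2\|A\|_{\mathrm{F}}^{2} + 2\|B\|_{\mathrm{F}}^{2}
                     + 2\|A\|_{\mathrm{F}}^{2} + 2\|B\|_{\mathrm{F}}^{2}\Big)
  = \|A\|_{\mathrm{F}}^{2} + \|B\|_{\mathrm{F}}^{2}.
```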

On the other hand, noting that

$$\mathbf{Q}=\left[\begin{array}{cc}\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}&\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\\ -j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}&j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\end{array}\right]\mathbf{Q}_{1}^{\mathsf{T}},$$

it holds that

$$\left[\begin{array}{c}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\\ \hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\end{array}\right]=\left[\begin{array}{cc}\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}&\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\\ -j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}&j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\end{array}\right]\mathbf{Q}_{1}^{\mathsf{T}}\mathcal{E}^{t}=\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}(\mathbf{Q}_{1,u}^{\mathsf{T}}+\mathbf{Q}_{1,l}^{\mathsf{T}})\mathcal{E}^{t}\\ -j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}(\mathbf{Q}_{1,u}^{\mathsf{T}}-\mathbf{Q}_{1,l}^{\mathsf{T}})\mathcal{E}^{t}\end{array}\right],$$

where $\mathbf{Q}_{1,u}^{\mathsf{T}}$ and $\mathbf{Q}_{1,l}^{\mathsf{T}}$ are the upper and lower blocks of $\mathbf{Q}_{1}^{\mathsf{T}}=[\mathbf{Q}_{1,u}^{\mathsf{T}};\mathbf{Q}_{1,l}^{\mathsf{T}}]$. Then, it holds that

$$\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}=\|\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\|_{\mathrm{F}}^{2}=\|\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}(\mathbf{Q}_{1,u}^{\mathsf{T}}+\mathbf{Q}_{1,l}^{\mathsf{T}})\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2},$$

where we used $\|\mathbf{Q}_{1,u}^{\mathsf{T}}+\mathbf{Q}_{1,l}^{\mathsf{T}}\|^{2}\leq 4$, which holds because $\mathbf{Q}_{1}$ is a permutation matrix and hence $\|\mathbf{Q}_{1}\|=1$. ∎
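The norm bound on the block sum can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\mathbf{Q}_1$ to be a random $2m\times 2m$ permutation matrix whose transpose is split into two $m$-row blocks, and the helper name `block_sum_norm_sq` is hypothetical rather than part of the paper.

```python
import numpy as np

def block_sum_norm_sq(m, seed=0):
    """Squared spectral norm of Q_{1,u}^T + Q_{1,l}^T for a random permutation Q_1.

    Assumption (illustrative): Q_1 is 2m x 2m and Q_1^T is split into an upper
    and a lower block of m rows each.
    """
    rng = np.random.default_rng(seed)
    q1 = np.eye(2 * m)[rng.permutation(2 * m)]  # permuted identity rows
    q1_t = q1.T
    upper, lower = q1_t[:m], q1_t[m:]
    # ord=2 returns the largest singular value of the matrix.
    return np.linalg.norm(upper + lower, 2) ** 2

# Largest value observed over several sizes and random permutations.
worst = max(block_sum_norm_sq(m, seed) for m in range(1, 6) for seed in range(20))
print(worst <= 4.0 + 1e-9)
```

The check relies only on the triangle inequality $\|\mathbf{Q}_{1,u}^{\mathsf{T}}+\mathbf{Q}_{1,l}^{\mathsf{T}}\|\leq\|\mathbf{Q}_{1,u}^{\mathsf{T}}\|+\|\mathbf{Q}_{1,l}^{\mathsf{T}}\|\leq 2$, so it does not depend on the particular permutation drawn.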

Appendix E Proof of Lemma 2

Proof.

Proof of the descent inequality (32). Since $f$ is $L$-smooth, setting $y=\bar{\mathbf{x}}^{t+1}$ and $x=\bar{\mathbf{x}}^{t}$ in (18) gives

$$f(\bar{\mathbf{x}}^{t+1})\leq f(\bar{\mathbf{x}}^{t})+\langle\nabla f(\bar{\mathbf{x}}^{t}),\bar{\mathbf{x}}^{t+1}-\bar{\mathbf{x}}^{t}\rangle+\frac{L}{2}\|\bar{\mathbf{x}}^{t+1}-\bar{\mathbf{x}}^{t}\|^{2}.$$
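As a sanity check of this smoothness upper bound, the following sketch evaluates it on a convex quadratic. The test function $f(x)=\frac12 x^{\mathsf{T}}Ax$, the dimension, and all variable names are assumptions chosen for illustration; the paper's $f$ is a general $L$-smooth function.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
m = rng.standard_normal((dim, dim))
a = m @ m.T                                # symmetric PSD Hessian of the test quadratic
smooth_l = np.linalg.eigvalsh(a).max()     # smoothness constant L = lambda_max(A)

def f(x):
    return 0.5 * x @ a @ x

def grad_f(x):
    return a @ x

def smoothness_gap(x, y):
    """Right-hand side minus left-hand side of the L-smoothness upper bound."""
    upper = f(x) + grad_f(x) @ (y - x) + 0.5 * smooth_l * np.sum((y - x) ** 2)
    return upper - f(y)

# The gap should be nonnegative for every pair of points.
gaps = [smoothness_gap(rng.standard_normal(dim), rng.standard_normal(dim))
        for _ in range(1000)]
min_gap = min(gaps)
print(min_gap >= -1e-9)
```

For this quadratic the gap equals $\frac12 (y-x)^{\mathsf{T}}(L\mathbf{I}-A)(y-x)$, which is nonnegative because $L$ is the largest eigenvalue of $A$.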

From (30a), i.e.,

$$\bar{\mathbf{x}}^{t+1}=\bar{\mathbf{x}}^{t}-\alpha\overline{\nabla F}(\mathbf{X}^{t})-\alpha\bar{\mathbf{s}}^{t},$$

where $\overline{\nabla F}(\mathbf{X}^{t})=\big(\frac{1}{n}\sum_{i=1}^{n}\nabla f_{i}(\mathbf{x}^{t}_{i})\big)^{\mathsf{T}}$, we have

$$f(\bar{\mathbf{x}}^{t+1})\leq f(\bar{\mathbf{x}}^{t})-\alpha\big\langle\nabla f(\bar{\mathbf{x}}^{t}),\overline{\nabla F}(\mathbf{X}^{t})+\bar{\mathbf{s}}^{t}\big\rangle+\frac{L\alpha^{2}}{2}\|\overline{\nabla F}(\mathbf{X}^{t})+\bar{\mathbf{s}}^{t}\|^{2}.$$

Taking the conditional expectation with respect to $\mathcal{G}^{t}$ and using

$$\mathbb{E}\left[\bar{\mathbf{s}}^{t}\;|\;\mathcal{G}^{t}\right]=0,\quad\mathbb{E}\left[\|\bar{\mathbf{s}}^{t}\|^{2}\;|\;\mathcal{G}^{t}\right]\leq\frac{\sigma^{2}}{n},$$

it holds that

$$\begin{aligned}
\mathbb{E}\left[f(\bar{\mathbf{x}}^{t+1})\;|\;\mathcal{G}^{t}\right]
&\leq\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})\;|\;\mathcal{G}^{t}\right]-\alpha\big\langle\nabla f(\bar{\mathbf{x}}^{t}),\overline{\nabla F}(\mathbf{X}^{t})\big\rangle+\mathbb{E}\left[\frac{L\alpha^{2}}{2}\|\overline{\nabla F}(\mathbf{X}^{t})+\bar{\mathbf{s}}^{t}\|^{2}\;\Big|\;\mathcal{G}^{t}\right]\\
&=\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})\;|\;\mathcal{G}^{t}\right]-\alpha\big\langle\nabla f(\bar{\mathbf{x}}^{t}),\overline{\nabla F}(\mathbf{X}^{t})\big\rangle+\frac{L\alpha^{2}}{2}\Big(\mathbb{E}\left[\|\overline{\nabla F}(\mathbf{X}^{t})\|^{2}\;|\;\mathcal{G}^{t}\right]+\mathbb{E}\left[\|\bar{\mathbf{s}}^{t}\|^{2}\;|\;\mathcal{G}^{t}\right]\Big)\\
&\leq\mathbb{E}\left[f(\bar{\mathbf{x}}^{t})\;|\;\mathcal{G}^{t}\right]-\alpha\big\langle\nabla f(\bar{\mathbf{x}}^{t}),\overline{\nabla F}(\mathbf{X}^{t})\big\rangle+\frac{L\alpha^{2}}{2}\mathbb{E}\left[\|\overline{\nabla F}(\mathbf{X}^{t})\|^{2}\;|\;\mathcal{G}^{t}\right]+\frac{L\alpha^{2}\sigma^{2}}{2n}.
\end{aligned}$$
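The factor $\frac{1}{n}$ in the noise term is the source of linear speedup, and it can be illustrated by simulation. The sketch below is a Monte Carlo estimate under assumptions that are not taken from the paper: the per-node noises are independent zero-mean Gaussians with $\mathbb{E}\|\mathbf{s}_i\|^2=\sigma^2$, and the sizes and names (`n_nodes`, `dim`, `n_trials`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, sigma_sq, n_trials = 8, 4, 1.0, 50_000

# s[k, i] is the gradient noise of node i in trial k. Each coordinate has
# variance sigma_sq / dim, so that E||s_i||^2 = sigma_sq for every node.
s = rng.normal(scale=np.sqrt(sigma_sq / dim), size=(n_trials, n_nodes, dim))

s_bar = s.mean(axis=1)                          # network-averaged noise per trial
estimate = np.mean(np.sum(s_bar ** 2, axis=1))  # Monte Carlo estimate of E||s_bar||^2
target = sigma_sq / n_nodes                     # variance bound sigma^2 / n

print(abs(estimate - target) < 0.01)
```

Under independence the cross terms $\mathbb{E}\langle\mathbf{s}_i,\mathbf{s}_j\rangle$ vanish for $i\neq j$, so the second moment of the average is the per-node second moment divided by $n$.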

Since $2\langle a,b\rangle=\|a\|^{2}+\|b\|^{2}-\|a-b\|^{2}$, we have

$$-\big\langle\nabla f(\bar{\mathbf{x}}^{t}),\overline{\nabla F}(\mathbf{X}^{t})\big\rangle=-\frac{1}{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}-\frac{1}{2}\|\overline{\nabla F}(\mathbf{X}^{t})\|^{2}+\frac{1}{2}\|\overline{\nabla F}(\mathbf{X}^{t})-\nabla f(\bar{\mathbf{x}}^{t})\|^{2}.$$

Combining the last two relations and using $\alpha L\leq\frac{1}{2}$, we get

$$\begin{aligned}
\mathbb{E}\left[f(\bar{\mathbf{x}}^{t+1})\;|\;\mathcal{G}^{t}\right]
&\leq f(\bar{\mathbf{x}}^{t})-\frac{\alpha}{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}-\frac{\alpha}{2}\|\overline{\nabla F}(\mathbf{X}^{t})\|^{2}+\frac{\alpha}{2}\|\overline{\nabla F}(\mathbf{X}^{t})-\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\\
&\quad+\frac{L\alpha^{2}}{2}\|\overline{\nabla F}(\mathbf{X}^{t})\|^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}\\
&\leq f(\bar{\mathbf{x}}^{t})-\frac{\alpha}{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}-\frac{\alpha}{2}(1-\alpha L)\|\overline{\nabla F}(\mathbf{X}^{t})\|^{2}+\frac{\alpha}{2}\|\overline{\nabla F}(\mathbf{X}^{t})-\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}\\
&\leq f(\bar{\mathbf{x}}^{t})-\frac{\alpha}{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{\alpha}{2}\|\overline{\nabla F}(\mathbf{X}^{t})-\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}.
\end{aligned}$$

By (31), i.e., $\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$, we have

$$\begin{aligned}
\frac{\alpha}{2}\|\overline{\nabla F}(\mathbf{X}^{t})-\nabla f(\bar{\mathbf{x}}^{t})\|^{2}
&=\frac{\alpha}{2}\Big\|\frac{1}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\bar{\mathbf{x}}^{t})\big)\Big\|^{2}\\
&\leq\frac{\alpha}{2n}\sum_{i=1}^{n}\|\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\bar{\mathbf{x}}^{t})\|^{2}\leq\frac{\alpha L^{2}}{2n}\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq\frac{2\alpha L^{2}}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}$$

Thus, the descent inequality (32) holds.
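The step that bounds the gradient mismatch by the consensus error, $\|\overline{\nabla F}(\mathbf{X}^{t})-\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\leq\frac{L^{2}}{n}\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}$, combines Jensen's inequality with $L$-smoothness of each $f_i$. The sketch below checks it numerically, assuming local quadratics $f_i(x)=\frac12 x^{\mathsf{T}}A_i x$ chosen only for illustration; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim = 6, 4

# Local Hessians A_i of the illustrative quadratics f_i(x) = 0.5 * x^T A_i x.
mats = []
for _ in range(n_nodes):
    m = rng.standard_normal((dim, dim))
    mats.append(m @ m.T)
hessians = np.stack(mats)                                  # shape (n, d, d)
smooth_l = max(np.linalg.eigvalsh(h).max() for h in hessians)

def local_grads(points):
    """Row i is the gradient of f_i evaluated at points[i]."""
    return np.einsum("ijk,ik->ij", hessians, points)

def bound_sides(x):
    """Return (lhs, rhs) of the gradient-mismatch bound for local iterates x."""
    x_bar_rows = np.tile(x.mean(axis=0), (n_nodes, 1))     # every row equals x_bar
    avg_grad_local = local_grads(x).mean(axis=0)           # averaged local gradients
    grad_at_mean = local_grads(x_bar_rows).mean(axis=0)    # gradient of f at x_bar
    lhs = np.sum((avg_grad_local - grad_at_mean) ** 2)
    rhs = smooth_l ** 2 / n_nodes * np.sum((x - x_bar_rows) ** 2)
    return lhs, rhs

results = [bound_sides(rng.standard_normal((n_nodes, dim))) for _ in range(200)]
print(all(lhs <= rhs + 1e-9 for lhs, rhs in results))
```

When all rows of `x` coincide, both sides vanish, which matches the intuition that the mismatch term disappears at exact consensus.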

Proof of the inequality (33). Taking the conditional expectation of (30f) with respect to $\mathcal{F}^{t}$, we have

$$\begin{aligned}
\mathbb{E}\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]
&=\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}+\mathbb{E}\left[\|\mathbb{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]\\
&=\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}+\mathbb{E}\left[\|\mathbf{Q}^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]+\mathbb{E}\left[\|\mathbf{Q}^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right].
\end{aligned}$$

Since $\mathbf{E}^{t}=\frac{(\theta_{t}-1)}{2\chi}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}$, $\mathrm{Prob}(\theta_{t}=1)=p$, and $\mathrm{Prob}(\theta_{t}=0)=1-p$, we have

$$
\begin{aligned}
&\mathbb{E}\!\left[\|\mathbf{Q}^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]+\mathbb{E}\!\left[\|\mathbf{Q}^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]\\
&=\frac{1-p}{4\chi^{2}}\Big(\|\mathbf{Q}^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}+\|\mathbf{Q}^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\Big)\leq\frac{2(1-p)}{\chi^{2}}\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}
$$
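To see where the factor $1-p$ comes from, the following short computation may help. It is only a sketch: it assumes that $\theta_{t}$ is independent of $\mathcal{F}^{t}$ and that $\hat{\mathbf{Z}}^{t}$ is $\mathcal{F}^{t}$-measurable, and the shorthand $\mathbf{M}$ for a fixed matrix is introduced here purely for illustration. Since $\theta_{t}\in\{0,1\}$, we have $(\theta_{t}-1)^{2}=1-\theta_{t}$, hence $\mathbb{E}[(\theta_{t}-1)^{2}]=1-p$ and

$$
\mathbb{E}\!\left[\|\mathbf{M}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]
=\frac{\mathbb{E}\!\left[(\theta_{t}-1)^{2}\right]}{4\chi^{2}}\|\mathbf{M}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}
=\frac{1-p}{4\chi^{2}}\|\mathbf{M}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}.
$$

In particular, this term vanishes when $p=1$, that is, when communication takes place at every iteration.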

Hence, we obtain

$$
\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]\leq\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}+\frac{2(1-p)}{\chi^{2}}\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}.
$$

Taking the conditional expectation with respect to $\mathcal{G}^{t}\subset\mathcal{F}^{t}$ and using the unbiasedness of $\mathbf{G}^{t}$, we have

$$
\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\mathbb{E}\!\left[\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]+\frac{2(1-p)}{\chi^{2}}\mathbb{E}\!\left[\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right].\tag{52}
$$

We first bound $\mathbb{E}\!\left[\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]$. Recall the definition of $\mathbb{G}^{t}$:

$$
\begin{aligned}
\mathbb{G}^{t}
&=\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\left[\begin{array}{c}
\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})+\mathbf{S}^{t})\\
\frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})+\mathbf{S}^{t})+\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\bar{\mathbf{X}}^{t})-\nabla F(\bar{\mathbf{X}}^{t+1}))
\end{array}\right]\\
&=\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\underbrace{\left[\begin{array}{c}
\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t}))\\
\frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t}))+\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\bar{\mathbf{X}}^{t})-\nabla F(\bar{\mathbf{X}}^{t+1}))
\end{array}\right]}_{\mathbf{F}^{t}}-\alpha\underbrace{\mathbf{Q}^{-1}\left[\begin{array}{c}
\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}\\
\frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}\hat{\mathbf{P}}^{\mathsf{T}}
\end{array}\right]}_{\mathbf{C}}\mathbf{S}^{t}\\
&=\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\mathbf{F}^{t}-\alpha\mathbf{C}\mathbf{S}^{t}.
\end{aligned}
$$

Note that $\mathbf{Q}^{-1}=\mathbf{Q}_{1}\left[\begin{array}{cc}\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\\ \frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&-\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\end{array}\right]$ and $\mathbf{I}-\hat{\mathbf{\Lambda}}_{a}=\frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}$. It follows that

$$
\begin{aligned}
\mathbf{C}\mathbf{S}^{t}
&=\mathbf{Q}_{1}\left[\begin{array}{cc}
\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\\
\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&-\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}
\end{array}\right]\left[\begin{array}{c}
\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}\\
(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})\hat{\mathbf{P}}^{\mathsf{T}}
\end{array}\right]\mathbf{S}^{t}\\
&=\frac{1}{2}\mathbf{Q}_{1}\left[\begin{array}{c}
\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}+j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}\\
\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}-j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}
\end{array}\right],
\end{aligned}
$$

where $\mathbf{Q}_{1}$ is a permutation matrix, so that $\|\mathbf{Q}_{1}\|=1$. Therefore, we have

$$
\|\mathbf{C}\mathbf{S}^{t}\|^{2}_{\mathrm{F}}\leq\frac{1}{4}\Big(\|\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}+j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}\|^{2}+\|\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}-j(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}\|^{2}\Big)\leq 2\|\mathbf{S}^{t}\|^{2}_{\mathrm{F}}.
$$
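The constant $2$ in this bound can be traced through the following sketch. The shorthands $\mathbf{u}=\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}$ and $\mathbf{v}=(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}$ are introduced here for illustration, and the computation assumes $\|\hat{\mathbf{\Lambda}}_{a}^{\frac{1}{2}}\|\leq 1$, $\|(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{\frac{1}{2}}\|\leq 1$, and $\|\hat{\mathbf{P}}^{\mathsf{T}}\|\leq 1$ (the second condition holds, for instance, when the diagonal entries of $\hat{\mathbf{\Lambda}}_{a}$ lie in $(0,1]$). Using $\|\mathbf{Q}_{1}\|=1$ and $\|\mathbf{x}+\mathbf{y}\|_{\mathrm{F}}^{2}\leq 2\|\mathbf{x}\|_{\mathrm{F}}^{2}+2\|\mathbf{y}\|_{\mathrm{F}}^{2}$ with $\|j\mathbf{v}\|_{\mathrm{F}}=\|\mathbf{v}\|_{\mathrm{F}}$,

$$
\begin{aligned}
\|\mathbf{C}\mathbf{S}^{t}\|_{\mathrm{F}}^{2}
&\leq\frac{1}{4}\left(\|\mathbf{u}+j\mathbf{v}\|_{\mathrm{F}}^{2}+\|\mathbf{u}-j\mathbf{v}\|_{\mathrm{F}}^{2}\right)\\
&\leq\frac{1}{4}\left(4\|\mathbf{u}\|_{\mathrm{F}}^{2}+4\|\mathbf{v}\|_{\mathrm{F}}^{2}\right)
=\|\mathbf{u}\|_{\mathrm{F}}^{2}+\|\mathbf{v}\|_{\mathrm{F}}^{2}
\leq 2\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}
$$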

Then, using the Cauchy-Schwarz inequality, $\|\hat{\mathbf{\Lambda}}_{a}\|\leq 1$, $\|\hat{\mathbf{\Lambda}}_{b}^{2}\|\leq 2$, and $\|\hat{\mathbf{P}}^{\mathsf{T}}\|\leq 1$, we have

$$
\begin{aligned}
\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}
&=\|\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}-2\alpha\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\mathbf{S}^{t}\rangle+2\alpha^{2}\langle\mathbf{Q}^{-1}\mathbf{F}^{t},\mathbf{C}\mathbf{S}^{t}\rangle+\alpha^{2}\|\mathbf{C}\mathbf{S}^{t}\|^{2}\\
&\leq\|\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}-2\alpha\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\mathbf{S}^{t}\rangle+\alpha^{2}\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}+2\alpha^{2}\|\mathbf{C}\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\\
&\leq\|\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}+\alpha^{2}\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}-2\alpha\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\mathbf{S}^{t}\rangle+4\alpha^{2}\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}
$$
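The step from the first to the second line of this chain can be unpacked as follows. This is a sketch that reads $\langle\cdot,\cdot\rangle$ as the real part of the Frobenius inner product when the entries are complex, which is an assumption about the notation. For any matrices $\mathbf{a}$ and $\mathbf{b}$ of the same size, $\|\mathbf{a}-\mathbf{b}\|_{\mathrm{F}}^{2}\geq 0$ gives $2\langle\mathbf{a},\mathbf{b}\rangle\leq\|\mathbf{a}\|_{\mathrm{F}}^{2}+\|\mathbf{b}\|_{\mathrm{F}}^{2}$, so that

$$
2\alpha^{2}\langle\mathbf{Q}^{-1}\mathbf{F}^{t},\mathbf{C}\mathbf{S}^{t}\rangle+\alpha^{2}\|\mathbf{C}\mathbf{S}^{t}\|_{\mathrm{F}}^{2}
\leq\alpha^{2}\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}+2\alpha^{2}\|\mathbf{C}\mathbf{S}^{t}\|_{\mathrm{F}}^{2}
\leq\alpha^{2}\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}+4\alpha^{2}\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2},
$$

where the last inequality uses $\|\mathbf{C}\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\leq 2\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}$.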

For any matrices $\mathbf{a}$ and $\mathbf{b}$, it holds from Jensen's inequality that $\|\mathbf{a}+\mathbf{b}\|_{\mathrm{F}}^{2}\leq\frac{1}{\theta}\|\mathbf{a}\|_{\mathrm{F}}^{2}+\frac{1}{1-\theta}\|\mathbf{b}\|_{\mathrm{F}}^{2}$ for any $\theta\in(0,1)$. Therefore, letting $\theta=\|\mathbf{\Gamma}\|:=\gamma$, it holds that

$$
\|\mathbf{\Gamma}\mathcal{E}^{t}-\alpha\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\leq\frac{1}{\gamma}\|\mathbf{\Gamma}\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{1}{1-\gamma}\|\alpha\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\leq\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}}{1-\gamma}\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}.
$$
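For completeness, both inequalities can be checked in a few lines. The sketch below assumes $\gamma\in(0,1)$, which is needed for the choice $\theta=\gamma$ to be admissible. Writing $\mathbf{a}+\mathbf{b}$ as a convex combination and applying the convexity of $\|\cdot\|_{\mathrm{F}}^{2}$,

$$
\|\mathbf{a}+\mathbf{b}\|_{\mathrm{F}}^{2}
=\Big\|\theta\,\frac{\mathbf{a}}{\theta}+(1-\theta)\,\frac{\mathbf{b}}{1-\theta}\Big\|_{\mathrm{F}}^{2}
\leq\theta\Big\|\frac{\mathbf{a}}{\theta}\Big\|_{\mathrm{F}}^{2}+(1-\theta)\Big\|\frac{\mathbf{b}}{1-\theta}\Big\|_{\mathrm{F}}^{2}
=\frac{1}{\theta}\|\mathbf{a}\|_{\mathrm{F}}^{2}+\frac{1}{1-\theta}\|\mathbf{b}\|_{\mathrm{F}}^{2}.
$$

The factor $\gamma$ in front of $\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$ then follows from $\|\mathbf{\Gamma}\mathcal{E}^{t}\|_{\mathrm{F}}\leq\|\mathbf{\Gamma}\|\,\|\mathcal{E}^{t}\|_{\mathrm{F}}=\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}$:

$$
\frac{1}{\gamma}\|\mathbf{\Gamma}\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\leq\frac{1}{\gamma}\,\gamma^{2}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}=\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}.
$$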

Since $\frac{1}{1-\gamma}>1$, we have

$$
\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}\leq\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{2\alpha^{2}}{1-\gamma}\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}-2\alpha\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\mathbf{S}^{t}\rangle+4\alpha^{2}\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}.
$$
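The coefficient $\frac{2\alpha^{2}}{1-\gamma}$ arises from merging the two $\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}$ terms of the preceding bounds. As a brief check, assuming $\gamma\in(0,1)$,

$$
\frac{\alpha^{2}}{1-\gamma}+\alpha^{2}\leq\frac{\alpha^{2}}{1-\gamma}+\frac{\alpha^{2}}{1-\gamma}=\frac{2\alpha^{2}}{1-\gamma}.
$$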

Note that

$$
\mathbf{S}^{t}=\mathbf{G}^{t}-\nabla F(\mathbf{X}^{t}),\quad \mathbb{E}\left[\mathbf{S}^{t}\;|\;\mathcal{G}^{t}\right]=0,\quad \mathbb{E}\left[\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq n\sigma^{2}.
$$

It follows from the above inequality that

$$
\begin{aligned}
\mathbb{E}\left[\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]
&\leq\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{2\alpha^{2}}{1-\gamma}\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]-2\alpha\mathbb{E}\left[\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\mathbf{S}^{t}\rangle\;|\;\mathcal{G}^{t}\right]+4\alpha^{2}\mathbb{E}\left[\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&\leq\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{2\alpha^{2}}{1-\gamma}\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]+4n\alpha^{2}\sigma^{2}.
\end{aligned}
\tag{53}
$$
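The last step of (53) drops the inner-product term and bounds the noise term. A short sketch of why, under the assumption (not restated in this part of the proof) that $\mathbf{\Gamma}\mathcal{E}^{t}$ and $\mathbf{C}$ are fixed once we condition on $\mathcal{G}^{t}$:

```latex
\begin{aligned}
\mathbb{E}\left[\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\mathbf{S}^{t}\rangle\;|\;\mathcal{G}^{t}\right]
&=\left\langle\mathbf{\Gamma}\mathcal{E}^{t},\mathbf{C}\,\mathbb{E}\left[\mathbf{S}^{t}\;|\;\mathcal{G}^{t}\right]\right\rangle=0,\\
4\alpha^{2}\,\mathbb{E}\left[\|\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]
&\leq 4\alpha^{2}\cdot n\sigma^{2}=4n\alpha^{2}\sigma^{2}.
\end{aligned}
```

Both lines use only the two noise properties stated above: the conditional mean of $\mathbf{S}^{t}$ is zero and its conditional second moment is at most $n\sigma^{2}$.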

The term $\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]$ can be bounded as follows. Note that

$$
\mathbf{Q}^{-1}=\mathbf{Q}_{1}\left[\begin{array}{cc}\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\\ \frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&-\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\end{array}\right]\quad\text{and}\quad\mathbf{I}-\hat{\mathbf{\Lambda}}_{a}=\frac{1}{2\chi}\hat{\mathbf{\Lambda}}_{b}^{2}.
$$

It follows that

$$
\mathbf{Q}^{-1}\mathbf{F}^{t}=\mathbf{Q}_{1}\left[\begin{array}{cc}\frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\\ \frac{1}{2}\hat{\mathbf{\Lambda}}_{a}^{-\frac{1}{2}}&-\frac{j}{2}(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-\frac{1}{2}}\end{array}\right]\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\sf T}(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t}))\\ (\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})\hat{\mathbf{P}}^{\sf T}(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t}))+\hat{\mathbf{P}}^{\sf T}(\nabla F(\bar{\mathbf{X}}^{t})-\nabla F(\bar{\mathbf{X}}^{t+1}))\end{array}\right].
$$

By $\|(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-1}\|=\frac{2\chi}{1-\lambda_{2}}$ and the $L$-smoothness of $f_{i}$, we have

$$
\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq 4L^{2}\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+\frac{2\chi nL^{2}}{1-\lambda_{2}}\mathbb{E}\left[\|\bar{\mathbf{x}}^{t}-\bar{\mathbf{x}}^{t+1}\|^{2}\;|\;\mathcal{G}^{t}\right].
\tag{54}
$$

Since $\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$, it holds that

$$
\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq 16L^{2}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{2\chi nL^{2}}{1-\lambda_{2}}\mathbb{E}\left[\|\bar{\mathbf{x}}^{t}-\bar{\mathbf{x}}^{t+1}\|^{2}\;|\;\mathcal{G}^{t}\right].
\tag{55}
$$

Since $\bar{\mathbf{x}}^{t+1}=\bar{\mathbf{x}}^{t}-\alpha\overline{\nabla F}(\mathbf{X}^{t})-\alpha\bar{\mathbf{s}}^{t}$, $\mathbb{E}\left[\bar{\mathbf{s}}^{t}\;|\;\mathcal{G}^{t}\right]=0$, and $\mathbb{E}\left[\|\bar{\mathbf{s}}^{t}\|^{2}\;|\;\mathcal{G}^{t}\right]\leq\frac{\sigma^{2}}{n}$, it follows that

$$
\begin{aligned}
\mathbb{E}\left[\|\bar{\mathbf{x}}^{t}-\bar{\mathbf{x}}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]
&=\mathbb{E}\left[\|\alpha\overline{\nabla F}(\mathbf{X}^{t})+\alpha\bar{\mathbf{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&=\alpha^{2}\mathbb{E}\left[\|\bar{\mathbf{s}}^{t}+(\overline{\nabla F}(\mathbf{X}^{t})-\overline{\nabla F}(\bar{\mathbf{X}}^{t}))+\overline{\nabla F}(\bar{\mathbf{X}}^{t})\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&\leq\alpha^{2}\mathbb{E}\left[\|\bar{\mathbf{s}}^{t}\|^{2}\;|\;\mathcal{G}^{t}\right]+2\alpha^{2}\|\overline{\nabla F}(\mathbf{X}^{t})-\overline{\nabla F}(\bar{\mathbf{X}}^{t})\|_{\mathrm{F}}^{2}+2\alpha^{2}\|\overline{\nabla F}(\bar{\mathbf{X}}^{t})\|_{\mathrm{F}}^{2}\\
&\leq\frac{\alpha^{2}\sigma^{2}}{n}+\frac{2\alpha^{2}L^{2}}{n}\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+2\alpha^{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\\
&\leq\frac{\alpha^{2}\sigma^{2}}{n}+\frac{8\alpha^{2}L^{2}}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+2\alpha^{2}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}.
\end{aligned}
$$
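The $\frac{\sigma^{2}}{n}$ factor in the noise term is where the linear speedup in $n$ enters. As a hedged illustration, the sketch below computes the second moment of an averaged noise exactly for a toy model: scalar noises that take the values $\pm\sigma$ with equal probability and are independent across nodes. Both the toy distribution and the independence across nodes are assumptions of this example, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean_noise_second_moment(n, sigma):
    """Exact E[(1/n * sum_i s_i)^2] when each s_i equals +sigma or -sigma
    with probability 1/2, independently across the n nodes."""
    total = Fraction(0)
    # Enumerate all 2^n equally likely sign patterns.
    for signs in product((-1, 1), repeat=n):
        avg = Fraction(sum(signs), n) * sigma
        total += avg * avg
    return total / 2 ** n


sigma = Fraction(3, 2)
moments = {n: mean_noise_second_moment(n, sigma) for n in (1, 2, 4, 8)}
for n, value in moments.items():
    # Compare the exact second moment with sigma^2 / n.
    print(n, value, sigma ** 2 / n)
```

In this toy model the bound $\mathbb{E}\|\bar{\mathbf{s}}^{t}\|^{2}\leq\sigma^{2}/n$ holds with equality, since the cross terms between different nodes average to zero.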

Then, substituting it into (55), we have

$$
\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\left(16L^{2}+\frac{16\alpha^{2}L^{4}\chi}{1-\lambda_{2}}\right)\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{4n\alpha^{2}L^{2}\chi}{1-\lambda_{2}}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{2\alpha^{2}L^{2}\sigma^{2}\chi}{1-\lambda_{2}}.
\tag{56}
$$
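The constants in (56) come from multiplying the bound on $\mathbb{E}\left[\|\bar{\mathbf{x}}^{t}-\bar{\mathbf{x}}^{t+1}\|^{2}\;|\;\mathcal{G}^{t}\right]$ by the prefactor $\frac{2\chi nL^{2}}{1-\lambda_{2}}$ from (55). The sketch below checks that arithmetic with exact rational numbers; the function names and the sample parameter values are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def derived_coefficients(alpha, L, chi, lam2, n):
    """Coefficients of (||E^t||_F^2, ||grad f(xbar^t)||^2, sigma^2) obtained by
    substituting the bound on E||xbar^t - xbar^{t+1}||^2 into (55)."""
    k = 2 * chi * n * L**2 / (1 - lam2)  # prefactor in (55)
    c_err = 16 * L**2 + k * 8 * alpha**2 * L**2 / n
    c_grad = k * 2 * alpha**2
    c_noise = k * alpha**2 / n
    return c_err, c_grad, c_noise


def stated_coefficients(alpha, L, chi, lam2, n):
    """The same three coefficients as they are written in (56)."""
    c_err = 16 * L**2 + 16 * alpha**2 * L**4 * chi / (1 - lam2)
    c_grad = 4 * n * alpha**2 * L**2 * chi / (1 - lam2)
    c_noise = 2 * alpha**2 * L**2 * chi / (1 - lam2)
    return c_err, c_grad, c_noise


# Arbitrary sample values with 0 < lam2 < 1, kept rational for exact comparison.
params = dict(alpha=Fraction(1, 10), L=Fraction(3), chi=Fraction(5, 2),
              lam2=Fraction(4, 5), n=7)
derived = derived_coefficients(**params)
stated = stated_coefficients(**params)
print(derived == stated)
```

Exact fractions avoid any floating-point tolerance, so a mismatch in a single constant would show up as a failed comparison.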

Thus, combining (53) and (56), it holds that

$$
\begin{aligned}
\mathbb{E}\left[\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq{}&\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{32\alpha^{2}L^{2}+16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\\
&+\frac{8n\alpha^{4}L^{2}\chi}{(1-\gamma)(1-\lambda_{2})}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{4\alpha^{4}L^{2}\sigma^{2}\chi}{(1-\gamma)(1-\lambda_{2})}+4n\alpha^{2}\sigma^{2}.
\end{aligned}
\tag{57}
$$
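For readers who want to trace the constants in (57), the following worked step multiplies each term of (56) by the factor $\frac{2\alpha^{2}}{1-\gamma}$ that appears in front of $\mathbb{E}\left[\|\mathbf{Q}^{-1}\mathbf{F}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]$ in (53):

```latex
\begin{aligned}
\frac{2\alpha^{2}}{1-\gamma}\left(16L^{2}+\frac{16\alpha^{2}L^{4}\chi}{1-\lambda_{2}}\right)
&=\frac{32\alpha^{2}L^{2}+16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma},\\
\frac{2\alpha^{2}}{1-\gamma}\cdot\frac{4n\alpha^{2}L^{2}\chi}{1-\lambda_{2}}
&=\frac{8n\alpha^{4}L^{2}\chi}{(1-\gamma)(1-\lambda_{2})},\\
\frac{2\alpha^{2}}{1-\gamma}\cdot\frac{2\alpha^{2}L^{2}\sigma^{2}\chi}{1-\lambda_{2}}
&=\frac{4\alpha^{4}L^{2}\sigma^{2}\chi}{(1-\gamma)(1-\lambda_{2})}.
\end{aligned}
```

The terms $\gamma\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$ and $4n\alpha^{2}\sigma^{2}$ carry over from (53) unchanged.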

Then, we bound $\mathbb{E}\left[\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\sf T}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]$. Using $\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$, $\hat{\mathbf{Z}}^{t}=\mathbf{X}^{t}-\mathbf{R}^{t}-\alpha(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})+\mathbf{S}^{t})$, and $\|\mathbf{Q}^{-1}\|^{2}\leq\frac{2\chi}{(1+\lambda_{n})(1-\lambda_{2})}$, we have

$$
\begin{aligned}
&\mathbb{E}\!\left[\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]=\mathbb{E}\!\left[\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}(\mathbf{X}^{t}-\mathbf{R}^{t}-\alpha(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})+\mathbf{S}^{t}))\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&\quad=\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}(\mathbf{X}^{t}-\mathbf{R}^{t}-\alpha(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})))\|_{\mathrm{F}}^{2}+\mathbb{E}\!\left[\|\alpha\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&\quad\leq 3\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t}\|_{\mathrm{F}}^{2}+3\|\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t}\|_{\mathrm{F}}^{2}+3\alpha^{2}L^{2}\|\mathbf{Q}^{-1}\|^{2}\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}\\
&\quad\leq 3\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{24\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}.
\end{aligned}
\tag{58}
$$
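
As a hedged aside, the passage from the second to the third line of (58) appears to rest on the elementary three-term inequality together with the $L$-smoothness of each $f_i$. The shorthand $\mathbf{a},\mathbf{b},\mathbf{c}$ below is introduced here for illustration and is not the paper's notation, and the bound on $\mathbf{c}$ additionally assumes $\|\hat{\mathbf{P}}\|\leq 1$:

$$
\begin{aligned}
&\mathbf{a}=\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{X}^{t},\qquad
\mathbf{b}=\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{R}^{t},\qquad
\mathbf{c}=\alpha\,\mathbf{Q}^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})\big),\\
&\|\mathbf{a}-\mathbf{b}-\mathbf{c}\|_{\mathrm{F}}^{2}\leq 3\|\mathbf{a}\|_{\mathrm{F}}^{2}+3\|\mathbf{b}\|_{\mathrm{F}}^{2}+3\|\mathbf{c}\|_{\mathrm{F}}^{2},\\
&\|\mathbf{c}\|_{\mathrm{F}}^{2}\leq\alpha^{2}\|\mathbf{Q}^{-1}\|^{2}\,\|\nabla F(\mathbf{X}^{t})-\nabla F(\bar{\mathbf{X}}^{t})\|_{\mathrm{F}}^{2}\leq\alpha^{2}L^{2}\|\mathbf{Q}^{-1}\|^{2}\,\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}
$$

The factor $3$ in front of each term of (58) is thus the price of splitting the squared norm of a sum of three terms.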

Combining (52) and (58), we have

$$
\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\mathbb{E}\!\left[\|\mathbb{G}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]+\frac{2(1-p)}{\chi^{2}}\left(3\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{24\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}\right).
$$

Therefore, combining this bound with (E), the inequality (33) follows.

Proof of the inequality (2). Let $\bar{\mathbf{e}}^{t}\triangleq\bar{\mathbf{x}}^{t}-(\mathbf{x}^{\star})^{\mathsf{T}}$. It follows from (30a), i.e., $\bar{\mathbf{x}}^{t+1}=\bar{\mathbf{x}}^{t}-\alpha\overline{\nabla F}(\mathbf{X}^{t})-\alpha\bar{\mathbf{s}}^{t}$, Assumption 4, and $\sum_{i=1}^{n}\nabla f_{i}(\mathbf{x}^{\star})=0$ that

$$
\begin{aligned}
\mathbb{E}\!\left[\big\|\bar{\mathbf{e}}^{t+1}\big\|^{2}\;|\;\mathcal{G}^{t}\right]
&=\Big\|\bar{\mathbf{e}}^{t}-\frac{\alpha}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\mathbf{x}^{\star})\big)\Big\|^{2}+\alpha^{2}\,\mathbb{E}\!\left[\|\bar{\mathbf{s}}^{t}\|^{2}\;|\;\mathcal{G}^{t}\right]\\
&\leq\Big\|\bar{\mathbf{e}}^{t}-\frac{\alpha}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\mathbf{x}^{\star})\big)\Big\|^{2}+\frac{\alpha^{2}\sigma^{2}}{n}\\
&=\|\bar{\mathbf{e}}^{t}\|^{2}+\alpha^{2}\Big\|\frac{1}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\mathbf{x}^{\star})\big)\Big\|^{2}+\frac{\alpha^{2}\sigma^{2}}{n}-\frac{2\alpha}{n}\sum_{i=1}^{n}\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\bar{\mathbf{e}}^{t}\rangle.
\end{aligned}
\tag{59}
$$

It follows from the $L$-smoothness of $f$ and $f_{i}$ and Jensen's inequality that

$$
\begin{aligned}
&\alpha^{2}\Big\|\frac{1}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\mathbf{x}^{\star})\big)\Big\|^{2}
=\alpha^{2}\Big\|\frac{1}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\bar{\mathbf{x}}^{t})+\nabla f_{i}(\bar{\mathbf{x}}^{t})-\nabla f_{i}(\mathbf{x}^{\star})\big)\Big\|^{2}\\
&\quad\leq 2\alpha^{2}\Big\|\frac{1}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\bar{\mathbf{x}}^{t})\big)\Big\|^{2}+2\alpha^{2}\Big\|\frac{1}{n}\sum_{i=1}^{n}\big(\nabla f_{i}(\bar{\mathbf{x}}^{t})-\nabla f_{i}(\mathbf{x}^{\star})\big)\Big\|^{2}\\
&\quad\leq\frac{2\alpha^{2}}{n}\sum_{i=1}^{n}\big\|\nabla f_{i}(\mathbf{x}_{i}^{t})-\nabla f_{i}(\bar{\mathbf{x}}^{t})\big\|^{2}+2\alpha^{2}\big\|\nabla f(\bar{\mathbf{x}}^{t})-\nabla f(\mathbf{x}^{\star})\big\|^{2}\\
&\quad\leq\frac{2\alpha^{2}L^{2}}{n}\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}+4\alpha^{2}L\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})-\langle\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star},\nabla f(\mathbf{x}^{\star})\rangle\big)\\
&\quad=\frac{2\alpha^{2}L^{2}}{n}\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}+4\alpha^{2}L\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big).
\end{aligned}
\tag{60}
$$
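
For reference, the last two steps of (60) can be read as combining two standard facts. They are sketched here under the assumption (the usual convention, not restated in this appendix) that $f=\frac{1}{n}\sum_{i=1}^{n}f_{i}$ and that $f$ is convex and $L$-smooth; the symbols $\mathbf{v}_{i}$, $\mathbf{x}$, and $\mathbf{y}$ are generic placeholders:

$$
\Big\|\frac{1}{n}\sum_{i=1}^{n}\mathbf{v}_{i}\Big\|^{2}\leq\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{v}_{i}\|^{2},
\qquad
\|\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\|^{2}\leq 2L\big(f(\mathbf{x})-f(\mathbf{y})-\langle\mathbf{x}-\mathbf{y},\nabla f(\mathbf{y})\rangle\big).
$$

Taking $\mathbf{x}=\bar{\mathbf{x}}^{t}$ and $\mathbf{y}=\mathbf{x}^{\star}$, the inner product vanishes because $\nabla f(\mathbf{x}^{\star})=\frac{1}{n}\sum_{i=1}^{n}\nabla f_{i}(\mathbf{x}^{\star})=0$, which gives the final equality.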

Then, we bound the term $-\frac{2\alpha}{n}\sum_{i=1}^{n}\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\bar{\mathbf{e}}^{t}\rangle$. Since $f_{i}$ is $L$-smooth and $\mu$-strongly convex, and $-\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{x}_{i}^{t}-\mathbf{x}^{\star}\|^{2}\leq-\big\|\frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_{i}^{t}-\mathbf{x}^{\star})\big\|^{2}$, it follows from (16) that

$$
\begin{aligned}
&-\frac{2\alpha}{n}\sum_{i=1}^{n}\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\bar{\mathbf{e}}^{t}\rangle
=\frac{2\alpha}{n}\sum_{i=1}^{n}\big(-\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\bar{\mathbf{x}}^{t}-\mathbf{x}_{i}^{t}\rangle-\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\mathbf{x}_{i}^{t}-\mathbf{x}^{\star}\rangle\big)\\
&\quad\leq\frac{2\alpha}{n}\sum_{i=1}^{n}\Big(-f_{i}(\bar{\mathbf{x}}^{t})+f_{i}(\mathbf{x}_{i}^{t})+\frac{L}{2}\|\bar{\mathbf{x}}^{t}-\mathbf{x}_{i}^{t}\|^{2}-\frac{\mu}{2}\|\mathbf{x}_{i}^{t}-\mathbf{x}^{\star}\|^{2}-f_{i}(\mathbf{x}_{i}^{t})+f_{i}(\mathbf{x}^{\star})\Big)\\
&\quad\leq-2\alpha\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big)+\frac{\alpha L}{n}\sum_{i=1}^{n}\|\bar{\mathbf{x}}^{t}-\mathbf{x}_{i}^{t}\|^{2}-\mu\alpha\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}\\
&\quad=-2\alpha\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big)+\frac{\alpha L}{n}\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}-\mu\alpha\|\bar{\mathbf{e}}^{t}\|^{2}.
\end{aligned}
\tag{61}
$$
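
As a brief sketch of where the first inequality in (61) comes from, the two inner products can be bounded separately, using only the $L$-smoothness and $\mu$-strong convexity of each $f_{i}$. The split into two displayed bounds is ours, for illustration:

$$
\begin{aligned}
-\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\bar{\mathbf{x}}^{t}-\mathbf{x}_{i}^{t}\rangle&\leq-f_{i}(\bar{\mathbf{x}}^{t})+f_{i}(\mathbf{x}_{i}^{t})+\frac{L}{2}\|\bar{\mathbf{x}}^{t}-\mathbf{x}_{i}^{t}\|^{2}&&\text{($L$-smoothness)},\\
-\langle\nabla f_{i}(\mathbf{x}_{i}^{t}),\mathbf{x}_{i}^{t}-\mathbf{x}^{\star}\rangle&\leq-f_{i}(\mathbf{x}_{i}^{t})+f_{i}(\mathbf{x}^{\star})-\frac{\mu}{2}\|\mathbf{x}_{i}^{t}-\mathbf{x}^{\star}\|^{2}&&\text{($\mu$-strong convexity)}.
\end{aligned}
$$

Adding the two bounds cancels $f_{i}(\mathbf{x}_{i}^{t})$. Averaging over $i$, under the assumption that $f=\frac{1}{n}\sum_{i=1}^{n}f_{i}$, leaves $-(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star}))$ plus the consensus and strong-convexity terms.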

Substituting (60) and (61) into (59), and using $f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\geq 0$, we have

$$
\begin{aligned}
\mathbb{E}\!\left[\big\|\bar{\mathbf{e}}^{t+1}\big\|^{2}\;|\;\mathcal{G}^{t}\right]\leq{}&(1-\mu\alpha)\|\bar{\mathbf{e}}^{t}\|^{2}+\Big(\frac{\alpha L}{n}+\frac{2\alpha^{2}L^{2}}{n}\Big)\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\\
&+\frac{\alpha^{2}\sigma^{2}}{n}-2\alpha(1-2\alpha L)\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big).
\end{aligned}
\tag{62}
$$

Since $\alpha\leq\frac{1}{4L}$, it holds that

$$
\begin{aligned}
\mathbb{E}\!\left[\big\|\bar{\mathbf{x}}^{t+1}-\mathbf{x}^{\star}\big\|^{2}\;|\;\mathcal{G}^{t}\right]\leq{}&(1-\mu\alpha)\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}+\Big(\frac{\alpha L}{n}+\frac{2\alpha^{2}L^{2}}{n}\Big)\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\\
&+\frac{\alpha^{2}\sigma^{2}}{n}-2\alpha(1-2\alpha L)\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big)\\
\leq{}&(1-\mu\alpha)\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}+\frac{3\alpha L}{2n}\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}-\alpha\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big).
\end{aligned}
$$
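
The last inequality uses only $\alpha\leq\frac{1}{4L}$ and $f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\geq 0$. Spelled out as a quick check of the two coefficients:

$$
\begin{aligned}
\alpha L\leq\frac{1}{4}\;&\Longrightarrow\;2\alpha^{2}L^{2}=2\alpha L\cdot\alpha L\leq\frac{\alpha L}{2}\;\Longrightarrow\;\frac{\alpha L}{n}+\frac{2\alpha^{2}L^{2}}{n}\leq\frac{3\alpha L}{2n},\\
\alpha L\leq\frac{1}{4}\;&\Longrightarrow\;1-2\alpha L\geq\frac{1}{2}\;\Longrightarrow\;-2\alpha(1-2\alpha L)\leq-\alpha.
\end{aligned}
$$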

Combining this with $\|\mathbf{X}^{t}-\bar{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}\leq 4\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}$ completes the proof. ∎

Appendix F Proof of Theorem 4

Proof.

From the stepsize condition, we have

$$
\alpha\leq\sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\,\frac{1}{2L}\;\Longrightarrow\;\frac{24\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\leq 3.
$$
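
As a worked step (a sketch that uses only the stepsize bound above and assumes $\lambda_{n}>-1$ and $\lambda_{2}<1$, so that the denominator is positive), squaring the bound makes the implication explicit:

$$
\alpha^{2}\leq\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}\cdot\frac{1}{4L^{2}}
\;\Longrightarrow\;
\frac{\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\leq\frac{1}{8}
\;\Longrightarrow\;
\frac{24\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\leq\frac{24}{8}=3.
$$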

Then, it follows from the definition of $\tilde{\gamma}$ in (34) that

$$
\begin{aligned}
\tilde{\gamma}&=\gamma+\frac{32\alpha^{2}L^{2}+16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2(1-p)\Big(3+\frac{24\chi\alpha^{2}L^{2}}{(1+\lambda_{n})(1-\lambda_{2})}\Big)}{\chi^{2}}\\
&\leq\gamma+\frac{32\alpha^{2}L^{2}+16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{12(1-p)}{\chi^{2}}.
\end{aligned}
$$

To ensure $\tilde{\gamma}\leq\frac{1+\gamma}{2}$, we need to choose $\alpha$ and $\chi$ such that

$$
\frac{32\alpha^{2}L^{2}+16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{12(1-p)}{\chi^{2}}\leq\frac{1-\gamma}{2}.
$$

Solving the inequalities

$$
\frac{32\alpha^{2}L^{2}}{1-\gamma}\leq\frac{1-\gamma}{6},\quad
\frac{16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\leq\frac{1-\gamma}{6},\quad
\frac{12(1-p)}{\chi^{2}}\leq\frac{1-\gamma}{6},
$$

and using $\gamma=\sqrt{1-\frac{1}{2\chi}(1-\lambda_{2})}$, we have

$$
\alpha\leq\min\left\{\frac{1-\lambda_{2}}{32\sqrt{3}\,\chi L},\;\sqrt[4]{\frac{(1-\lambda_{2})^{3}}{12\chi^{3}}}\,\frac{1}{4L}\right\},\qquad
\chi\geq\frac{288(1-p)}{1-\lambda_{2}}.
$$

Thus, if the stated conditions on $\alpha$ and $\chi$ hold, then $\tilde{\gamma}\leq\frac{1+\gamma}{2}<1$.
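
As a numerical sanity check, and not as part of the proof, the following Python sketch evaluates $\tilde{\gamma}$ at the largest stepsize permitted by the three stepsize bounds in this proof and at the smallest admissible $\chi$, then tests $\tilde{\gamma}\leq\frac{1+\gamma}{2}$. The function names, the default $L=1$, and the floor $\chi\geq 1$ are assumptions introduced here for illustration only.

```python
import math


def gamma(chi, lam2):
    """gamma = sqrt(1 - (1 - lambda_2) / (2 chi)), as used in the proof."""
    return math.sqrt(1.0 - (1.0 - lam2) / (2.0 * chi))


def stepsize_bound(chi, lam2, lam_n, L):
    """Minimum of the three stepsize bounds appearing in this proof."""
    a1 = (1.0 - lam2) / (32.0 * math.sqrt(3.0) * chi * L)
    a2 = ((1.0 - lam2) ** 3 / (12.0 * chi ** 3)) ** 0.25 / (4.0 * L)
    a3 = math.sqrt((1.0 + lam_n) * (1.0 - lam2) / (2.0 * chi)) / (2.0 * L)
    return min(a1, a2, a3)


def gamma_tilde(alpha, chi, p, lam2, lam_n, L):
    """Evaluate the expression for gamma-tilde given by its definition (34)."""
    g = gamma(chi, lam2)
    kappa = 2.0 * chi / (1.0 - lam2)
    drift = (32.0 * alpha**2 * L**2 + 16.0 * alpha**4 * L**4 * kappa) / (1.0 - g)
    inner = 3.0 + 24.0 * chi * alpha**2 * L**2 / ((1.0 + lam_n) * (1.0 - lam2))
    skip = 2.0 * (1.0 - p) * inner / chi**2
    return g + drift + skip


def contraction_holds(p, lam2, lam_n, L=1.0):
    """Check gamma-tilde <= (1 + gamma) / 2 at the boundary parameter choices."""
    # Assumption for this sketch: chi is floored at 1 so that gamma is real.
    chi = max(1.0, 288.0 * (1.0 - p) / (1.0 - lam2))
    alpha = stepsize_bound(chi, lam2, lam_n, L)
    g = gamma(chi, lam2)
    return gamma_tilde(alpha, chi, p, lam2, lam_n, L) <= (1.0 + g) / 2.0


if __name__ == "__main__":
    print(contraction_holds(p=0.5, lam2=0.5, lam_n=-0.5))
```

A grid over several values of $p$, $\lambda_{2}$, and $\lambda_{n}$ can be checked the same way; such a check illustrates the bound on sample points and does not replace the argument above.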

Define the Lyapunov function

$$
\mathcal{L}^{t}=f(\bar{\mathbf{x}}^{t})-f^{\star}+\frac{2\alpha L^{2}}{n(1-\tilde{\gamma})}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}.
$$

Since $\frac{1}{1-\gamma}\leq\frac{4\chi}{1-\lambda_{2}}$ and $\frac{1}{1-\tilde{\gamma}}\leq\frac{2}{1-\gamma}$, we have

$$
\frac{\frac{2\chi}{1-\lambda_{2}}}{(1-\gamma)^{2}}\leq\frac{32\chi^{3}}{(1-\lambda_{2})^{3}}
\quad\text{and}\quad
\frac{16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}\leq\frac{1024\alpha^{4}L^{4}\chi^{3}}{(1-\lambda_{2})^{3}}.
$$

Thus,

$$
\alpha\leq\sqrt[4]{\frac{(1-\lambda_{2})^{3}}{8\chi^{3}}}\,\frac{1}{4L}
\;\Longrightarrow\;
\frac{1}{2}<1-\frac{16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}.
$$
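
To spell this step out (a sketch that combines the second bound of the preceding display with the stepsize bound just stated), note that $\alpha^{4}\leq\frac{(1-\lambda_{2})^{3}}{8\chi^{3}}\cdot\frac{1}{256L^{4}}$, so that

$$
\frac{16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}
\leq\frac{1024\alpha^{4}L^{4}\chi^{3}}{(1-\lambda_{2})^{3}}
\leq\frac{1024\chi^{3}}{(1-\lambda_{2})^{3}}\cdot\frac{(1-\lambda_{2})^{3}}{2048\chi^{3}}
=\frac{1}{2}.
$$

The strict inequality then follows because $1-\sqrt{1-x}>\frac{x}{2}$ for $x\in(0,1]$, which makes $\frac{1}{1-\gamma}<\frac{4\chi}{1-\lambda_{2}}$ strict.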

Then, since $\alpha\leq\frac{1}{2L}$, by (32) and (33), it gives

$$
\begin{aligned}
\mathbb{E}\!\left[\mathcal{L}^{t+1}\;\big|\;\mathcal{G}^{t}\right]\leq{}&f(\bar{\mathbf{x}}^{t})-f^{\star}-\frac{\alpha}{2}\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}+\frac{2\alpha L^{2}}{n}\big\|\mathcal{E}^{t}\big\|_{\mathrm{F}}^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}\\
&+\frac{2\alpha L^{2}}{n(1-\tilde{\gamma})}\bigg(\tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{4n\alpha^{4}L^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}+\frac{2\alpha^{4}L^{2}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2n\alpha^{2}\sigma^{2}(2\chi^{2}+(1-p))}{\chi^{2}}\bigg)\\
={}&f(\bar{\mathbf{x}}^{t})-f^{\star}+\frac{2\alpha L^{2}}{n(1-\tilde{\gamma})}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}-\frac{\alpha}{2}\bigg(1-\frac{16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}\bigg)\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\\
&+\frac{L\alpha^{2}\sigma^{2}}{2n}+\frac{4\sigma^{2}L^{4}\alpha^{5}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{4L^{2}\sigma^{2}\alpha^{3}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}\\
\leq{}&\mathcal{L}^{t}-\frac{\alpha}{4}\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}+\frac{L\alpha^{2}\sigma^{2}}{2n}+\frac{4\sigma^{2}L^{4}\alpha^{5}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{4L^{2}\sigma^{2}\alpha^{3}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}},
\end{aligned}
$$

where the last inequality holds because condition (36) implies $\frac{1}{2}<1-\frac{16\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}$. Taking the full expectation, we have

$$
\mathbb{E}\!\left[\mathcal{L}^{t+1}\right]\leq\mathbb{E}\!\left[\mathcal{L}^{t}\right]-\frac{\alpha}{4}\mathbb{E}\!\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]+\frac{L\alpha^{2}\sigma^{2}}{2n}+\frac{4\sigma^{2}L^{4}\alpha^{5}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{4L^{2}\sigma^{2}\alpha^{3}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}.
$$

Summing this inequality over $t=0,1,\cdots,T-1$, we obtain

$$
\frac{\alpha}{4}\sum_{t=0}^{T-1}\mathbb{E}\!\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]\leq\mathcal{L}^{0}+T\bigg(\frac{L\alpha^{2}\sigma^{2}}{2n}+\frac{4\sigma^{2}L^{4}\alpha^{5}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{4L^{2}\sigma^{2}\alpha^{3}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}\bigg),
$$

which implies that

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]\leq\frac{4\mathcal{L}^{0}}{\alpha T}+\frac{2L\alpha\sigma^{2}}{n}+\frac{16\sigma^{2}L^{4}\alpha^{4}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{16L^{2}\sigma^{2}\alpha^{2}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}.
$$
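
The summation and averaging steps can be sketched as follows. Here $C$ is shorthand, introduced only for this illustration, for the sum of the three noise terms in the one-step bound, and we assume that $f^{\star}$ is a lower bound on $f$ and that $\mathcal{L}^{0}$ is deterministic, so that $\mathbb{E}[\mathcal{L}^{T}]\geq 0$ and $\mathbb{E}[\mathcal{L}^{0}]=\mathcal{L}^{0}$:

$$
\begin{aligned}
\frac{\alpha}{4}\sum_{t=0}^{T-1}\mathbb{E}\!\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]
&\leq\sum_{t=0}^{T-1}\Big(\mathbb{E}\!\left[\mathcal{L}^{t}\right]-\mathbb{E}\!\left[\mathcal{L}^{t+1}\right]\Big)+TC
=\mathcal{L}^{0}-\mathbb{E}\!\left[\mathcal{L}^{T}\right]+TC
\leq\mathcal{L}^{0}+TC,\\
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[\big\|\nabla f(\bar{\mathbf{x}}^{t})\big\|^{2}\right]
&\leq\frac{4\mathcal{L}^{0}}{\alpha T}+\frac{4C}{\alpha}.
\end{aligned}
$$

Multiplying each noise term in $C$ by $\frac{4}{\alpha}$ lowers its power of $\alpha$ by one and quadruples its constant, which matches the coefficients in the averaged bound.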

Since $\mathbf{X}^{0}=[\mathbf{x}^{0},\cdots,\mathbf{x}^{0}]^{\sf T}$, by [45, (75)], we have $\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\leq 2\alpha^{2}\|(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-1}\|\,\|\nabla F(\mathbf{X}^{0})-\mathbf{1}_{n}\otimes(\nabla f(\mathbf{x}^{0}))^{\sf T}\|^{2}$.
Notice that $\varsigma^{2}_{0}=\frac{1}{n}\sum_{i=1}^{n}\|\nabla f_{i}(\bar{\mathbf{x}}^{0})-\nabla f(\bar{\mathbf{x}}^{0})\|^{2}$. It holds that

$$
\begin{aligned}
\mathcal{L}^{0}&=f(\bar{\mathbf{x}}^{0})-f^{\star}+\frac{2\alpha L^{2}}{n(1-\tilde{\gamma})}\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\\
&\leq f(\bar{\mathbf{x}}^{0})-f^{\star}+\frac{2\alpha L^{2}}{n(1-\tilde{\gamma})}\Big(2\alpha^{2}\|(\mathbf{I}-\hat{\mathbf{\Lambda}}_{a})^{-1}\|\,\|\nabla F(\mathbf{X}^{0})-\mathbf{1}_{n}\otimes(\nabla f(\mathbf{x}^{0}))^{\sf T}\|^{2}\Big)\\
&\leq f(\bar{\mathbf{x}}^{0})-f^{\star}+\frac{32\chi^{2}\alpha^{3}L^{2}\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}}.
\end{aligned}
\tag{63}
$$

Using (63) and

$$
\frac{1}{1-\gamma}\leq\frac{4\chi}{1-\lambda_{2}},\qquad \frac{1}{1-\tilde{\gamma}}\leq\frac{2}{1-\gamma},
$$

we have

$$
\begin{aligned}
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big]
&\leq \frac{4(f(\bar{\mathbf{x}}^{0})-f^{\star})}{\alpha T} + \frac{128\chi^{2}L^{2}\alpha^{2}\varsigma_{0}^{2}}{(1-\lambda_{2})^{2}T} + \frac{2L\alpha\sigma^{2}}{n} \\
&\quad + \frac{1024\sigma^{2}L^{4}\alpha^{4}\chi^{3}}{n(1-\lambda_{2})^{3}} + \frac{128\chi\alpha^{2}L^{2}\sigma^{2}(2\chi^{2}+(1-p))}{(1-\lambda_{2})\chi^{2}}.
\end{aligned}
$$

Since $\alpha\leq\frac{1-\lambda_{2}}{32\sqrt{3}\chi L}$, we have $\frac{1024\sigma^{2}L^{4}\alpha^{4}\chi^{3}}{n(1-\lambda_{2})^{3}}\leq\frac{\alpha^{2}L^{2}\sigma^{2}\chi}{3n(1-\lambda_{2})}\leq\frac{\alpha^{2}L^{2}\sigma^{2}\chi}{2(1-\lambda_{2})}$, so it holds that

$$
\begin{aligned}
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big]
&\leq \frac{4(f(\bar{\mathbf{x}}^{0})-f^{\star})}{\alpha T} + \frac{128\chi^{2}L^{2}\alpha^{2}\varsigma_{0}^{2}}{(1-\lambda_{2})^{2}T} \\
&\quad + \frac{2L\alpha\sigma^{2}}{n} + \frac{\alpha^{2}L^{2}\sigma^{2}\chi^{3} + 256\chi\alpha^{2}L^{2}\sigma^{2}(2\chi^{2}+(1-p))}{2(1-\lambda_{2})\chi^{2}},
\end{aligned}
$$

i.e., (4) holds. ∎
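The absorption of the $\alpha^{4}$ term used in the last step can be sanity-checked numerically. The Python sketch below is an illustration, not part of the proof: it samples parameters, draws a stepsize no larger than $\frac{1-\lambda_{2}}{32\sqrt{3}\chi L}$, and checks the two-step chain of inequalities. The function names, the sampling ranges, and the relative tolerance are assumptions made for this example.

```python
import math
import random


def second_stepsize_bound(lam2, chi, L):
    """Stepsize bound alpha <= (1 - lambda_2) / (32 * sqrt(3) * chi * L)."""
    return (1.0 - lam2) / (32.0 * math.sqrt(3.0) * chi * L)


def absorption_chain(alpha, L, sigma, chi, lam2, n):
    """Return the three quantities of the chain, in the order they appear."""
    gap = 1.0 - lam2
    first = 1024.0 * sigma**2 * L**4 * alpha**4 * chi**3 / (n * gap**3)
    second = alpha**2 * L**2 * sigma**2 * chi / (3.0 * n * gap)
    third = alpha**2 * L**2 * sigma**2 * chi / (2.0 * gap)
    return first, second, third


def chain_holds(trials=1000, seed=0, rtol=1e-9):
    """Check first <= second <= third on randomly sampled parameters."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Sampling ranges are arbitrary choices for illustration.
        lam2 = rng.uniform(0.0, 0.99)
        chi = rng.uniform(1.0, 50.0)
        L = rng.uniform(0.1, 10.0)
        sigma = rng.uniform(0.1, 5.0)
        n = rng.randint(1, 64)
        # Any stepsize up to the bound is admissible.
        alpha = rng.uniform(0.01, 1.0) * second_stepsize_bound(lam2, chi, L)
        first, second, third = absorption_chain(alpha, L, sigma, chi, lam2, n)
        if first > second * (1.0 + rtol) or second > third * (1.0 + rtol):
            return False
    return True
```

At the largest admissible stepsize the first inequality is tight, which is why the check allows a small relative tolerance for floating-point rounding.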

Appendix G Proof of Corollary 2

Proof.

We derive a tighter rate by carefully selecting the stepsize, following the approach of [35]. We rewrite (4) as

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big] \leq \underbrace{\frac{c_{0}}{\alpha T} + c_{1}\alpha + c_{2}\alpha^{2}}_{:=\Psi_{T}} + \frac{a_{0}\alpha^{2}}{T},
\tag{64}
$$

where

$$
c_{0}=4(f(\bar{\mathbf{x}}^{0})-f^{\star}),\quad c_{1}=\frac{2L\sigma^{2}}{n},\quad c_{2}=\frac{L^{2}\sigma^{2}\big(\chi^{3}+256\chi(2\chi^{2}+(1-p))\big)}{2(1-\lambda_{2})\chi^{2}},\quad a_{0}=\frac{128\chi^{2}L^{2}\varsigma_{0}^{2}}{(1-\lambda_{2})^{2}}.
\tag{65}
$$

From the stepsize condition, we have

$$
\alpha\leq\frac{1}{\underline{\alpha}}=\min\left\{\frac{1}{2L},\ \frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\,\frac{1}{2L},\ \sqrt[4]{\frac{(1-\lambda_{2})^{3}}{12\chi^{3}}}\,\frac{1}{4L}\right\}=\mathcal{O}\left(\frac{1-\lambda_{2}}{\chi L}\right).
$$
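For concreteness, the stepsize cap $1/\underline{\alpha}$ can be evaluated directly from the four bounds above. The following Python sketch is only an illustration; the function name `stepsize_cap` and its argument names are hypothetical, and `lam2`, `lam_n` stand for $\lambda_{2}$ and $\lambda_{n}$ as they appear in the formula.

```python
import math


def stepsize_cap(L, lam2, lam_n, chi):
    """Return 1/underline{alpha}, the minimum of the four stepsize bounds."""
    gap = 1.0 - lam2
    return min(
        1.0 / (2.0 * L),
        gap / (32.0 * math.sqrt(3.0) * chi * L),
        math.sqrt((1.0 + lam_n) * gap / (2.0 * chi)) / (2.0 * L),
        (gap**3 / (12.0 * chi**3)) ** 0.25 / (4.0 * L),
    )


# Example with illustrative values: the second bound is the active one here.
cap = stepsize_cap(L=1.0, lam2=0.5, lam_n=-0.5, chi=1.0)
```

Since the minimum never exceeds its second argument, the cap is always at most $\frac{1-\lambda_{2}}{32\sqrt{3}\chi L}$, which is the source of the $\mathcal{O}\big(\frac{1-\lambda_{2}}{\chi L}\big)$ scaling.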

Setting

$$
\alpha=\min\left\{\left(\frac{c_{0}}{c_{1}T}\right)^{\frac{1}{2}},\ \left(\frac{c_{0}}{c_{2}T}\right)^{\frac{1}{3}},\ \frac{1}{\underline{\alpha}}\right\},
$$

we have the following cases.

- When $\alpha=\frac{1}{\underline{\alpha}}$ is smaller than both $\left(\frac{c_{0}}{c_{1}T}\right)^{\frac{1}{2}}$ and $\left(\frac{c_{0}}{c_{2}T}\right)^{\frac{1}{3}}$, then

$$
\Psi_{T}=\frac{c_{0}}{\alpha T}+c_{1}\alpha+c_{2}\alpha^{2}=\frac{\underline{\alpha}c_{0}}{T}+\frac{c_{1}}{\underline{\alpha}}+\frac{c_{2}}{\underline{\alpha}^{2}}\leq\frac{\underline{\alpha}c_{0}}{T}+c_{1}^{\frac{1}{2}}\left(\frac{c_{0}}{T}\right)^{\frac{1}{2}}+c_{2}^{\frac{1}{3}}\left(\frac{c_{0}}{T}\right)^{\frac{2}{3}}.
$$

- When $\alpha=\left(\frac{c_{0}}{c_{1}T}\right)^{\frac{1}{2}}\leq\left(\frac{c_{0}}{c_{2}T}\right)^{\frac{1}{3}}$, then

$$
\Psi_{T}\leq 2c_{1}^{\frac{1}{2}}\left(\frac{c_{0}}{T}\right)^{\frac{1}{2}}+c_{2}\left(\frac{c_{0}}{c_{1}T}\right)\leq 2c_{1}^{\frac{1}{2}}\left(\frac{c_{0}}{T}\right)^{\frac{1}{2}}+c_{2}^{\frac{1}{3}}\left(\frac{c_{0}}{T}\right)^{\frac{2}{3}}.
$$

- When $\alpha=\left(\frac{c_{0}}{c_{2}T}\right)^{\frac{1}{3}}\leq\left(\frac{c_{0}}{c_{1}T}\right)^{\frac{1}{2}}$, then

$$
\Psi_{T}\leq 2c_{2}^{\frac{1}{3}}\left(\frac{c_{0}}{T}\right)^{\frac{2}{3}}+c_{1}\left(\frac{c_{0}}{c_{2}T}\right)^{\frac{1}{3}}\leq 2c_{2}^{\frac{1}{3}}\left(\frac{c_{0}}{T}\right)^{\frac{2}{3}}+c_{1}^{\frac{1}{2}}\left(\frac{c_{0}}{T}\right)^{\frac{1}{2}}.
$$

Combining the three cases above, it holds that

$$
\Psi_{T}=\frac{c_{0}}{\alpha T}+c_{1}\alpha+c_{2}\alpha^{2}\leq 2c_{1}^{\frac{1}{2}}\left(\frac{c_{0}}{T}\right)^{\frac{1}{2}}+2c_{2}^{\frac{1}{3}}\left(\frac{c_{0}}{T}\right)^{\frac{2}{3}}+\frac{\underline{\alpha}c_{0}}{T}.
$$
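The three-case argument can be illustrated with a short numerical experiment. The Python sketch below is a sanity check under stated assumptions rather than a proof: it picks $\alpha$ by the min rule above, evaluates $\Psi_{T}$, and compares it with the combined bound, taking the last term as $\underline{\alpha}c_{0}/T$ (with `cap` standing for $1/\underline{\alpha}$), the form in which this term enters (66). The function names, the sampling ranges for $c_{0}, c_{1}, c_{2}, T$, and the cap values are hypothetical choices.

```python
import math
import random


def choose_alpha(c0, c1, c2, T, cap):
    """Stepsize rule: the minimum of the two balancing values and the cap."""
    return min(math.sqrt(c0 / (c1 * T)), (c0 / (c2 * T)) ** (1.0 / 3.0), cap)


def psi(alpha, c0, c1, c2, T):
    """Psi_T = c0 / (alpha * T) + c1 * alpha + c2 * alpha^2."""
    return c0 / (alpha * T) + c1 * alpha + c2 * alpha**2


def psi_bound(c0, c1, c2, T, cap):
    """Combined bound; c0 / (cap * T) equals underline{alpha} * c0 / T."""
    return (
        2.0 * math.sqrt(c1 * c0 / T)
        + 2.0 * c2 ** (1.0 / 3.0) * (c0 / T) ** (2.0 / 3.0)
        + c0 / (cap * T)
    )


def bound_holds(trials=2000, seed=0, rtol=1e-9):
    """Check Psi_T <= bound for randomly sampled positive constants."""
    rng = random.Random(seed)
    for _ in range(trials):
        c0 = rng.uniform(1e-3, 1e3)
        c1 = rng.uniform(1e-3, 1e3)
        c2 = rng.uniform(1e-3, 1e3)
        T = rng.randint(1, 10**6)
        cap = rng.uniform(1e-4, 1.0)
        alpha = choose_alpha(c0, c1, c2, T, cap)
        if psi(alpha, c0, c1, c2, T) > psi_bound(c0, c1, c2, T, cap) * (1.0 + rtol):
            return False
    return True
```

Whichever of the three candidates attains the minimum, the corresponding case of the analysis applies, so the single combined bound covers every sampled configuration.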

Substituting the above into (64), we conclude that

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big]\leq 2c_{1}^{\frac{1}{2}}\left(\frac{c_{0}}{T}\right)^{\frac{1}{2}}+2c_{2}^{\frac{1}{3}}\left(\frac{c_{0}}{T}\right)^{\frac{2}{3}}+\frac{\underline{\alpha}c_{0}+a_{0}/\underline{\alpha}^{2}}{T}.
\tag{66}
$$

Therefore, plugging the parameters (65) into (66) gives

$$
\begin{aligned}
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\big]
&\leq \mathcal{O}\left(\sqrt{\frac{L(f(\bar{\mathbf{x}}^{0})-f^{\star})\sigma^{2}}{nT}}\right) \\
&\quad + \mathcal{O}\left(\sqrt[3]{\frac{\chi^{3}+\chi(1-p)}{(1-\lambda_{2})\chi^{2}}}\left(\frac{L(f(\bar{\mathbf{x}}^{0})-f^{\star})\sigma}{T}\right)^{\frac{2}{3}}\right) + \mathcal{O}\left(\frac{\frac{\chi L(f(\bar{\mathbf{x}}^{0})-f^{\star})}{1-\lambda_{2}}+\varsigma_{0}^{2}}{T}\right),
\end{aligned}
$$

i.e., the rate (38) holds. ∎
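To connect this rate with the leading communication complexity $\mathcal{O}(p\sigma^{2}/(n\epsilon^{2}))$ stated in the abstract, one can solve the dominant term $\sqrt{L(f(\bar{\mathbf{x}}^{0})-f^{\star})\sigma^{2}/(nT)}\leq\epsilon$ for $T$ and multiply by the communication probability $p$. The Python sketch below does exactly this with all constants hidden in $\mathcal{O}(\cdot)$ set to 1, which is an assumption made for illustration; the function names and the symbol `delta0` for $f(\bar{\mathbf{x}}^{0})-f^{\star}$ are hypothetical.

```python
import math


def leading_iterations(L, delta0, sigma, n, eps):
    """Smallest integer T with sqrt(L * delta0 * sigma^2 / (n * T)) <= eps."""
    return math.ceil(L * delta0 * sigma**2 / (n * eps**2))


def expected_communications(p, T):
    """Expected number of communication rounds when each iteration communicates w.p. p."""
    return p * T


# Illustrative values: doubling the number of nodes halves the leading iteration count.
T_4 = leading_iterations(L=2.0, delta0=3.0, sigma=2.0, n=4, eps=0.5)
T_8 = leading_iterations(L=2.0, delta0=3.0, sigma=2.0, n=8, eps=0.5)
```

Only the first term of the rate is used here; the higher-order terms in $T^{-2/3}$ and $T^{-1}$ are ignored, so the numbers describe the asymptotic regime only.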

Appendix H Proof of Theorem 5

Proof.

Plugging $\|\nabla f(\bar{\mathbf{x}}^{t})\|^{2}\leq 2L(f(\bar{\mathbf{x}}^{t})-f^{\star})$ into (33) gives

$$
\mathbb{E}\big[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;\big|\;\mathcal{G}^{t}\big]\leq\tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{8n\alpha^{4}L^{3}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\big(f(\bar{\mathbf{x}}^{t})-f^{\star}\big)+\frac{2\alpha^{4}L^{2}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2n\alpha^{2}\sigma^{2}(2\chi^{2}+(1-p))}{\chi^{2}}.
\tag{67}
$$
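The inequality $\|\nabla f(\mathbf{x})\|^{2}\leq 2L(f(\mathbf{x})-f^{\star})$ used to obtain (67) is a standard consequence of $L$-smoothness for functions bounded below by $f^{\star}$. The Python sketch below illustrates it on a diagonal quadratic $f(\mathbf{x})=\frac{1}{2}\sum_{i}a_{i}x_{i}^{2}$, for which $f^{\star}=0$ and $L=\max_{i}a_{i}$. This test function is an assumption chosen for simplicity and does not come from the paper; the function names are hypothetical.

```python
import random


def f(x, a):
    """Diagonal quadratic f(x) = 0.5 * sum_i a_i * x_i^2, with minimum value 0."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad_norm_sq(x, a):
    """Squared Euclidean norm of the gradient, whose i-th entry is a_i * x_i."""
    return sum((ai * xi) ** 2 for ai, xi in zip(a, x))


def inequality_holds(dim=5, trials=1000, seed=0, rtol=1e-9):
    """Check ||grad f(x)||^2 <= 2 * L * (f(x) - f_star) at random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(0.1, 10.0) for _ in range(dim)]
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        L = max(a)  # smoothness constant of this quadratic
        f_star = 0.0  # attained at x = 0
        if grad_norm_sq(x, a) > 2.0 * L * (f(x, a) - f_star) * (1.0 + rtol):
            return False
    return True
```

For this quadratic the inequality is tight only when $\mathbf{x}$ is supported on the coordinates with $a_{i}=L$, which is why a small relative tolerance is used in the comparison.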

Similarly to Lemma 4, we know that

$$
\alpha\leq\min\left\{\frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\,\frac{1}{2L},\ \sqrt[4]{\frac{(1-\lambda_{2})^{3}}{12\chi^{3}}}\,\frac{1}{4L}\right\},\quad \chi\geq\frac{288(1-p)}{1-\lambda_{2}}\ \Longrightarrow\ \tilde{\gamma}\leq\frac{1+\gamma}{2}<1.
$$

Define the Lyapunov function

$$
\mathcal{L}_{\mathrm{c}}^{t}=\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}+\frac{6\alpha L}{n(1-\tilde{\gamma})}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}.
$$

Since $\frac{1}{1-\gamma}\leq\frac{4\chi}{1-\lambda_{2}}$ and $\frac{1}{1-\tilde{\gamma}}\leq\frac{2}{1-\gamma}$, we have

\[
\frac{\frac{2\chi}{1-\lambda_{2}}}{(1-\gamma)^{2}}\leq\frac{32\chi^{3}}{(1-\lambda_{2})^{3}}\quad\text{and}\quad\frac{24\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}\leq\frac{96\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\gamma)^{2}}.
\]

It follows that

\[
\alpha\leq\sqrt[4]{\frac{(1-\lambda_{2})^{3}}{24\chi^{3}}}\,\frac{1}{4L}\ \Rightarrow\ \frac{1}{2}<1-\frac{48\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}.
\]
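The constants in this implication can be checked numerically. The sketch below is an illustration only and not part of the proof: the function names and the sample values of $L$, $\chi$, and $\lambda_2$ are assumptions, and the unknown quantities $\gamma$ and $\tilde{\gamma}$ are replaced by the worst-case bounds $\frac{1}{1-\gamma}\leq\frac{4\chi}{1-\lambda_{2}}$ and $\frac{1}{1-\tilde{\gamma}}\leq\frac{2}{1-\gamma}$ stated above.

```python
def alpha_fourth_root_cap(L, chi, lam2):
    """Stepsize cap ((1 - lam2)^3 / (24 chi^3))^(1/4) / (4 L)."""
    return ((1.0 - lam2) ** 3 / (24.0 * chi ** 3)) ** 0.25 / (4.0 * L)


def worst_case_term(alpha, L, chi, lam2):
    """Upper bound on 48 a^4 L^4 (2 chi / (1 - lam2)) / ((1 - gt)(1 - g)).

    Uses 1/(1 - g) <= 4 chi / (1 - lam2) and 1/(1 - gt) <= 2/(1 - g),
    so the result equals 3072 a^4 L^4 chi^3 / (1 - lam2)^3.
    """
    gap = 1.0 - lam2
    inv_one_minus_gamma = 4.0 * chi / gap                     # bound on 1/(1 - gamma)
    inv_one_minus_gamma_tilde = 2.0 * inv_one_minus_gamma     # bound on 1/(1 - gamma~)
    return (48.0 * alpha ** 4 * L ** 4 * (2.0 * chi / gap)
            * inv_one_minus_gamma_tilde * inv_one_minus_gamma)


# Hypothetical sample values, chosen only for illustration.
L_s, chi_s, lam2_s = 2.0, 300.0, 0.9
alpha_cap = alpha_fourth_root_cap(L_s, chi_s, lam2_s)
term_at_cap = worst_case_term(alpha_cap, L_s, chi_s, lam2_s)
term_below_cap = worst_case_term(0.5 * alpha_cap, L_s, chi_s, lam2_s)
```

Under these worst-case bounds the subtracted term is at most $\frac{1}{2}$ at the largest admissible stepsize and shrinks like $\alpha^{4}$ below it, so the coefficient of $f(\bar{\mathbf{x}}^{t})-f^{\star}$ is at least $\frac{\alpha}{2}$.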

Thus, according to (2), (67), and $\mu=0$, we have

\[
\begin{aligned}
\mathbb{E}\!\left[\mathcal{L}_{\mathrm{c}}^{t+1}\;|\;\mathcal{G}^{t}\right]\leq{}&\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}+\frac{6\alpha L}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}-\alpha(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star}))\\
&+\frac{6\alpha L}{n(1-\tilde{\gamma})}\Big(\tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{8n\alpha^{4}L^{3}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}(f(\bar{\mathbf{x}}^{t})-f^{\star})+\frac{2\alpha^{4}L^{2}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2n\alpha^{2}\sigma^{2}(2\chi^{2}+(1-p))}{\chi^{2}}\Big)\\
={}&\|\bar{\mathbf{x}}^{t}-\mathbf{x}^{\star}\|^{2}+\frac{6\alpha L}{n(1-\tilde{\gamma})}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}-\alpha\Big(1-\frac{48\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{(1-\tilde{\gamma})(1-\gamma)}\Big)(f(\bar{\mathbf{x}}^{t})-f^{\star})\\
&+\frac{\alpha^{2}\sigma^{2}}{n}+\frac{12\alpha^{5}L^{3}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{12\alpha^{3}L\sigma^{2}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}\\
\leq{}&\mathcal{L}_{\mathrm{c}}^{t}-\frac{\alpha}{2}(f(\bar{\mathbf{x}}^{t})-f^{\star})+\frac{\alpha^{2}\sigma^{2}}{n}+\frac{12\alpha^{5}L^{3}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{12\alpha^{3}L\sigma^{2}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}.
\end{aligned}
\]

Taking the full expectation, we have

\[
\mathbb{E}\!\left[\mathcal{L}_{\mathrm{c}}^{t+1}\right]\leq\mathbb{E}\!\left[\mathcal{L}_{\mathrm{c}}^{t}\right]-\frac{\alpha}{2}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]+\frac{\alpha^{2}\sigma^{2}}{n}+\frac{12\alpha^{5}L^{3}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{12\alpha^{3}L\sigma^{2}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}.
\tag{68}
\]

Summing inequality (68) over $t=0,1,\ldots,T-1$ and using $\mathbb{E}\!\left[\mathcal{L}_{\mathrm{c}}^{T}\right]\geq 0$, we obtain

\[
\frac{\alpha}{2}\sum_{t=0}^{T-1}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\mathcal{L}_{\mathrm{c}}^{0}+T\Big(\frac{\alpha^{2}\sigma^{2}}{n}+\frac{12\alpha^{5}L^{3}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{12\alpha^{3}L\sigma^{2}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}\Big),
\]

which implies that

\[
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\frac{2\mathcal{L}_{\mathrm{c}}^{0}}{\alpha T}+\frac{2\alpha\sigma^{2}}{n}+\frac{24\alpha^{4}L^{3}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{n(1-\tilde{\gamma})(1-\gamma)}+\frac{24\alpha^{2}L\sigma^{2}(2\chi^{2}+(1-p))}{(1-\tilde{\gamma})\chi^{2}}.
\tag{69}
\]

Since $\mathbf{X}^{0}=[\mathbf{x}^{0},\cdots,\mathbf{x}^{0}]^{\sf T}$, similarly to (63), we have

\[
\mathcal{L}_{\mathrm{c}}^{0}=\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}+\frac{6\alpha L}{n(1-\tilde{\gamma})}\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\leq\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}+\frac{96\chi^{2}\alpha^{3}L\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}}.
\tag{70}
\]

Substituting (70) into (69) and using

\[
\tilde{\gamma}\leq\frac{1+\gamma}{2}<1,\quad\frac{1}{1-\gamma}\leq\frac{4\chi}{1-\lambda_{2}},
\]

we can derive that

\[
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\frac{2\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}}{\alpha T}+\frac{192\chi^{2}\alpha^{2}L\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}T}+\frac{2\alpha\sigma^{2}}{n}+\frac{1536\chi^{3}\alpha^{4}L^{3}\sigma^{2}}{n(1-\lambda_{2})^{3}}+\frac{192\alpha^{2}L\sigma^{2}\chi(2\chi^{2}+(1-p))}{(1-\lambda_{2})\chi^{2}}.
\]

Since $\alpha\leq\frac{1-\lambda_{2}}{32\sqrt{3}\chi L}$, we have $\frac{1536\sigma^{2}L^{3}\alpha^{4}\chi^{3}}{n(1-\lambda_{2})^{3}}\leq\frac{\alpha^{2}L\sigma^{2}\chi}{2n(1-\lambda_{2})}\leq\frac{\alpha^{2}L\sigma^{2}\chi}{2(1-\lambda_{2})}$. Hence, it holds that

\[
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\frac{2\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}}{\alpha T}+\frac{192\chi^{2}\alpha^{2}L\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}T}+\frac{2\alpha\sigma^{2}}{n}+\frac{\alpha^{2}L\sigma^{2}\chi^{3}+384\alpha^{2}L\sigma^{2}\chi(2\chi^{2}+(1-p))}{2(1-\lambda_{2})\chi^{2}},
\]

i.e., (5) holds. ∎
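The absorption of the $\alpha^{4}$ term into the $\alpha^{2}$ term, used in the last step of the proof, can be checked numerically. The following is a minimal sketch under assumed sample values; the function names and the numbers chosen for $L$, $\chi$, $\lambda_2$, $\sigma^2$, and $n$ are hypothetical and serve only as an illustration.

```python
import math


def stepsize_cap(L, chi, lam2):
    """First entry of the stepsize condition: (1 - lam2) / (32 sqrt(3) chi L)."""
    return (1.0 - lam2) / (32.0 * math.sqrt(3.0) * chi * L)


def fourth_order_term(alpha, L, chi, lam2, sigma2, n):
    """1536 sigma^2 L^3 alpha^4 chi^3 / (n (1 - lam2)^3)."""
    return 1536.0 * sigma2 * L ** 3 * alpha ** 4 * chi ** 3 / (n * (1.0 - lam2) ** 3)


def second_order_term(alpha, L, chi, lam2, sigma2, n):
    """alpha^2 L sigma^2 chi / (2 n (1 - lam2))."""
    return alpha ** 2 * L * sigma2 * chi / (2.0 * n * (1.0 - lam2))


# Hypothetical sample values, chosen only for illustration.
params = dict(L=2.0, chi=300.0, lam2=0.9, sigma2=1.5, n=16)
cap = stepsize_cap(params["L"], params["chi"], params["lam2"])
ratio_at_cap = fourth_order_term(cap, **params) / second_order_term(cap, **params)
ratio_at_half_cap = (fourth_order_term(0.5 * cap, **params)
                     / second_order_term(0.5 * cap, **params))
```

The ratio of the two terms scales like $\alpha^{2}$: it equals one at the cap and decreases for smaller stepsizes, which is why the $\alpha^{4}$ term does not appear in the final bound.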

Appendix I Proof of Corollary 3

Proof.

We derive a tighter rate by carefully selecting the stepsize, as in Corollary 2. From the stepsize condition, we have

\[
\alpha\leq\frac{1}{\underline{\alpha}}=\min\left\{\frac{1}{4L},\ \frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\,\frac{1}{2L},\ \sqrt[4]{\frac{(1-\lambda_{2})^{3}}{24\chi^{3}}}\,\frac{1}{4L}\right\}=\mathcal{O}\left(\frac{1-\lambda_{2}}{\chi L}\right).
\]

As in the proof of Theorem 4, it follows that

\[
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\underbrace{\frac{c_{0}}{\alpha T}+c_{1}\alpha+c_{2}\alpha^{2}}_{:=\Psi_{T}}+\frac{a_{0}\alpha^{2}}{T},
\]

where

\[
c_{0}=2\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2},\quad c_{1}=\frac{2\sigma^{2}}{n},\quad c_{2}=\frac{L\sigma^{2}\big(\chi^{3}+384\chi(2\chi^{2}+(1-p))\big)}{2(1-\lambda_{2})\chi^{2}},\quad a_{0}=\frac{192\chi^{2}L\varsigma^{2}_{0}}{(1-\lambda_{2})^{2}}.
\]

Then, the following rate can be obtained by following the same arguments used for the nonconvex case:

\[
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\!\left[f(\bar{\mathbf{x}}^{t})-f^{\star}\right]\leq\mathcal{O}\left(\sqrt{\frac{\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}\sigma^{2}}{nT}}+\sqrt[3]{\frac{\chi^{3}+\chi(1-p)}{(1-\lambda_{2})\chi^{2}}}\,L^{\frac{1}{3}}\Big(\frac{\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}\sigma}{T}\Big)^{\frac{2}{3}}+\frac{\frac{\chi L\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\|^{2}}{1-\lambda_{2}}+\varsigma_{0}^{2}}{T}\right),
\]

i.e., (41) holds. ∎

Appendix J Proof of Theorem 6

Proof.

Recall (33)

\[
\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t+1}-{\bf{x}}^{\star}\big\|^{2}\;|\;\mathcal{G}^{t}\right]\leq(1-\mu\alpha)\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\|^{2}+\frac{6\alpha L}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}-\alpha\big(f(\bar{{\bf{x}}}^{t})-f({\bf{x}}^{\star})\big).
\]

From (33) and (2), we have

\[
\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t+1}-{\bf{x}}^{\star}\big\|^{2}\;|\;\mathcal{G}^{t}\right]\leq(1-\mu\alpha)\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\|^{2}+\frac{6\alpha L}{n}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n},
\]

and

\[
\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{4n\alpha^{4}L^{4}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\|^{2}+\frac{2\alpha^{4}L^{2}\sigma^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}+\frac{2n\alpha^{2}\sigma^{2}(2\chi^{2}+(1-p))}{\chi^{2}},
\]

where the last inequality follows from $\|\nabla f(\bar{{\bf{x}}}^{t})\|^{2}\leq L^{2}\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\|^{2}$. Similarly to Lemma 4, we know that

\[
\alpha\leq\min\left\{\frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\frac{1}{2L},\ \sqrt[4]{\frac{(1-\lambda_{2})^{3}}{12\chi^{3}}}\frac{1}{4L}\right\},\quad\chi\geq\frac{288(1-p)}{1-\lambda_{2}}\ \Longrightarrow\ \tilde{\gamma}\leq\frac{1+\gamma}{2}<1.
\]

Since $\alpha\leq\frac{1-\lambda_{2}}{32\sqrt{3}\chi L}$ and $\frac{\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\leq\frac{8\chi^{2}}{(1-\lambda_{2})^{2}}$, we have $\frac{\alpha^{2}\frac{2\chi}{1-\lambda_{2}}}{1-\gamma}\leq\frac{1}{384L^{2}}$. Thus, it holds that

\[
\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\tilde{\gamma}\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}+\frac{n\alpha^{2}L^{2}}{96}\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\|^{2}+\frac{n\alpha^{2}\sigma^{2}(192\chi^{2}+(4\chi^{2}+2(1-p)))}{192\chi^{2}}.
\]
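The constant $\frac{1}{96}$ in the bound above can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the function names and the sample values of $\lambda_2$, $\chi$, and $L$ are assumptions chosen only for illustration. It takes the largest stepsize allowed by the first condition and confirms that $\alpha^{2}\cdot\frac{8\chi^{2}}{(1-\lambda_{2})^{2}}=\frac{1}{384L^{2}}$, so that $4L^{4}$ times this factor equals $\frac{L^{2}}{96}$.

```python
import math


def max_stepsize(lam2, chi, L):
    """Largest alpha allowed by alpha <= (1 - lam2) / (32 * sqrt(3) * chi * L)."""
    return (1.0 - lam2) / (32.0 * math.sqrt(3.0) * chi * L)


def scaled_factor_bound(alpha, lam2, chi):
    """Upper bound on alpha^2 * (2*chi/(1-lam2)) / (1-gamma), obtained from
    (2*chi/(1-lam2)) / (1-gamma) <= 8*chi^2 / (1-lam2)^2 as stated in the text."""
    return alpha ** 2 * 8.0 * chi ** 2 / (1.0 - lam2) ** 2


# Hypothetical values, used only to make the check concrete.
lam2, chi, L = 0.9, 5.0, 2.0
alpha = max_stepsize(lam2, chi, L)
bound = scaled_factor_bound(alpha, lam2, chi)

# At the largest admissible stepsize the factor equals 1 / (384 L^2),
# which turns the coefficient 4 n alpha^4 L^4 (...) into n alpha^2 L^2 / 96.
print(math.isclose(bound, 1.0 / (384.0 * L ** 2)))
print(math.isclose(4.0 * L ** 4 * bound, L ** 2 / 96.0))
```

Any smaller stepsize only decreases the factor, so the inequality continues to hold below the cap.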

Then, it follows that

\[
\left[\begin{array}{c}\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t+1}-{\bf{x}}^{\star}\big\|^{2}\right]\\ \frac{1}{n}\mathbb{E}\!\left[\|\mathcal{E}^{t+1}\|_{\mathrm{F}}^{2}\right]\end{array}\right]\leq\underbrace{\left[\begin{array}{cc}1-\mu\alpha&6\alpha L\\ \frac{\alpha^{2}L^{2}}{96}&\frac{1+\gamma}{2}\end{array}\right]}_{:=A}\left[\begin{array}{c}\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\big\|^{2}\right]\\ \frac{1}{n}\mathbb{E}\!\left[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\right]\end{array}\right]+\underbrace{\left[\begin{array}{c}\frac{\alpha^{2}\sigma^{2}}{n}\\ \frac{\alpha^{2}\sigma^{2}(192\chi^{2}+(4\chi^{2}+2(1-p)))}{192\chi^{2}}\end{array}\right]}_{:=b}.
\tag{79}
\]

Note that

\[
\alpha\leq\min\left\{\frac{72\mu}{L^{2}},\ \frac{1-\gamma}{12L+\mu/2}\right\}\ \Longrightarrow\ \|A\|\leq\|A\|_{1}=\max\left\{1-\mu\alpha+\frac{\alpha^{2}L^{2}}{96},\ 6\alpha L+\frac{1+\gamma}{2}\right\}\leq 1-\frac{\mu\alpha}{4}<1.
\]
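The column-sum bound above can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: the helper names (`matrix_A`, `one_norm`, `stepsize_cap`) and the sample values of $\mu$, $L$, and $\gamma$ are hypothetical. It builds the $2\times 2$ matrix $A$, computes its induced 1-norm as the maximum absolute column sum, and compares it with $1-\frac{\mu\alpha}{4}$.

```python
def matrix_A(alpha, mu, L, gamma):
    """The 2x2 matrix A of (79), stored as a list of rows."""
    return [
        [1.0 - mu * alpha, 6.0 * alpha * L],
        [alpha ** 2 * L ** 2 / 96.0, (1.0 + gamma) / 2.0],
    ]


def one_norm(M):
    """Induced 1-norm of a matrix: the maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


def stepsize_cap(mu, L, gamma):
    """Stepsize cap min{72*mu/L^2, (1-gamma)/(12*L + mu/2)} from the text."""
    return min(72.0 * mu / L ** 2, (1.0 - gamma) / (12.0 * L + mu / 2.0))


# Hypothetical problem constants, chosen only for illustration.
mu, L, gamma = 0.5, 2.0, 0.8
alpha = 0.5 * stepsize_cap(mu, L, gamma)  # strictly inside the admissible range

A = matrix_A(alpha, mu, L, gamma)
print(one_norm(A) <= 1.0 - mu * alpha / 4.0)
```

The first condition controls the first column sum, since $\frac{\alpha^{2}L^{2}}{96}\leq\frac{3\mu\alpha}{4}$ when $\alpha\leq\frac{72\mu}{L^{2}}$; the second condition controls the second column sum.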

Since $\|A\|<1$, we can iterate inequality (79) to get

\[
\left[\begin{array}{c}\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\big\|^{2}\right]\\ \frac{1}{n}\mathbb{E}\!\left[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\right]\end{array}\right]\leq A^{t}\left[\begin{array}{c}\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{0}-{\bf{x}}^{\star}\big\|^{2}\right]\\ \frac{1}{n}\mathbb{E}\!\left[\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\right]\end{array}\right]+\sum_{\ell=0}^{t-1}A^{\ell}b\leq A^{t}\left[\begin{array}{c}\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{0}-{\bf{x}}^{\star}\big\|^{2}\right]\\ \frac{1}{n}\mathbb{E}\!\left[\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\right]\end{array}\right]+(I-A)^{-1}b.
\]
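The unrolling step uses the fact that $A$ and $b$ are entrywise nonnegative and $\|A\|<1$, so the partial sums $\sum_{\ell<t}A^{\ell}b$ are dominated entrywise by the Neumann series $(I-A)^{-1}b$. The sketch below illustrates this on the worst case in which the recursion holds with equality, $z^{t+1}=Az^{t}+b$. All names and numerical values are assumptions chosen only for illustration.

```python
def matvec(M, v):
    """Product of a 2x2 matrix (list of rows) with a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def solve_2x2(M, v):
    """Solve M x = v for a 2x2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [
        (M[1][1] * v[0] - M[0][1] * v[1]) / det,
        (M[0][0] * v[1] - M[1][0] * v[0]) / det,
    ]


def unrolled_bound_holds(A, b, z0, steps, tol=1e-12):
    """Check z^t <= A^t z^0 + (I - A)^{-1} b entrywise for t = 0, ..., steps - 1,
    where z^{t+1} = A z^t + b (the recursion taken with equality)."""
    i_minus_a = [[1.0 - A[0][0], -A[0][1]], [-A[1][0], 1.0 - A[1][1]]]
    limit = solve_2x2(i_minus_a, b)  # (I - A)^{-1} b
    z = list(z0)                     # z^t
    at_z0 = list(z0)                 # A^t z^0
    for _ in range(steps):
        if any(z[i] > at_z0[i] + limit[i] + tol for i in range(2)):
            return False
        z = [x + y for x, y in zip(matvec(A, z), b)]
        at_z0 = matvec(A, at_z0)
    return True


# Hypothetical nonnegative A and b with ||A||_1 < 1, chosen only for illustration.
A = [[0.998, 0.048], [6.67e-7, 0.9]]
b = [1e-4, 2e-4]
z0 = [1.0, 0.5]
print(unrolled_bound_holds(A, b, z0, steps=500))
```

Nonnegativity matters here: for a matrix with negative entries the partial sums need not be bounded entrywise by $(I-A)^{-1}b$.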

Taking the 1-induced-norm and using properties of the (induced) norms, it holds that

\[
\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\big\|^{2}\right]+\frac{1}{n}\mathbb{E}\!\left[\|\mathcal{E}^{t}\|_{\mathrm{F}}^{2}\right]\leq\|A^{t}\|_{1}a_{0}+\|(I-A)^{-1}b\|_{1}\leq\|A\|_{1}^{t}a_{0}+\|(I-A)^{-1}b\|_{1},
\tag{80}
\]

where $a_{0}=\big\|\bar{{\bf{x}}}^{0}-{\bf{x}}^{\star}\big\|^{2}+\frac{1}{n}\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}$. We now bound the last term by noting that

\[
\begin{aligned}
(I-A)^{-1}b&=\left[\begin{array}{cc}\mu\alpha&-6\alpha L\\ -\frac{\alpha^{2}L^{2}}{96}&\frac{1-\gamma}{2}\end{array}\right]^{-1}b=\frac{1}{\mathrm{det}(I-A)}\left[\begin{array}{cc}\frac{1-\gamma}{2}&6\alpha L\\ \frac{\alpha^{2}L^{2}}{96}&\mu\alpha\end{array}\right]b\\
&=\frac{1}{\mu\alpha(1-\gamma)\big(\frac{1}{2}-\frac{\alpha^{3}L^{3}}{16\mu(1-\gamma)}\big)}\left[\begin{array}{cc}\frac{1-\gamma}{2}&6\alpha L\\ \frac{\alpha^{2}L^{2}}{96}&\mu\alpha\end{array}\right]\left[\begin{array}{c}\frac{\alpha^{2}\sigma^{2}}{n}\\ \frac{\alpha^{2}\sigma^{2}(192\chi^{2}+(4\chi^{2}+2(1-p)))}{192\chi^{2}}\end{array}\right]\\
&\leq\frac{4}{\alpha\mu(1-\gamma)}\left[\begin{array}{c}\frac{(1-\gamma)\alpha^{2}\sigma^{2}}{2n}+\frac{6L\alpha^{3}\sigma^{2}(192\chi^{2}+(4\chi^{2}+2(1-p)))}{192\chi^{2}}\\ \frac{\alpha^{4}L^{2}\sigma^{2}}{96n}+\frac{\mu\alpha^{3}\sigma^{2}(192\chi^{2}+(4\chi^{2}+2(1-p)))}{192\chi^{2}}\end{array}\right],
\end{aligned}
\]

where the last step holds for $\alpha\leq\sqrt[3]{4\mu(1-\gamma)}\,\frac{1}{L}$. Therefore,

\[
\|(I-A)^{-1}b\|_{1}\leq\frac{2\alpha\sigma^{2}}{\mu n}+\frac{(6L\alpha^{2}\sigma^{2}+\mu\alpha^{2}\sigma^{2})(192\chi^{2}+(4\chi^{2}+2(1-p)))}{48\mu(1-\gamma)\chi^{2}}.
\]

Substituting the above into (80) and using $\|A\|_{1}^{t}\leq(1-\frac{\alpha\mu}{4})^{t}$ and $\mu\leq L$, we obtain

\[
\mathbb{E}\!\left[\big\|\bar{{\bf{x}}}^{t}-{\bf{x}}^{\star}\big\|^{2}\right]\leq\Big(1-\frac{\alpha\mu}{4}\Big)^{t}a_{0}+\frac{2\alpha\sigma^{2}}{\mu n}+\frac{7L\alpha^{2}\sigma^{2}(192\chi^{2}+(4\chi^{2}+2(1-p)))}{48\mu(1-\gamma)\chi^{2}}.
\]
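To see how the three terms of this bound behave, the sketch below evaluates them separately. It is only an illustration: the function name `rate_terms` and all parameter values are hypothetical and not taken from the paper. The first term decays geometrically in $t$, the second scales as $\frac{1}{n}$ (the source of the linear speedup), and the third is of order $\alpha^{2}$ and does not depend on $n$.

```python
def rate_terms(t, alpha, mu, L, sigma2, n, chi, p, gamma, a0):
    """Return the three terms of the displayed bound on E||xbar^t - x*||^2:
    (transient, noise term scaling as 1/n, higher-order network-dependent term)."""
    transient = (1.0 - alpha * mu / 4.0) ** t * a0
    averaged_noise = 2.0 * alpha * sigma2 / (mu * n)
    numerator = 7.0 * L * alpha ** 2 * sigma2 * (
        192.0 * chi ** 2 + (4.0 * chi ** 2 + 2.0 * (1.0 - p))
    )
    network_noise = numerator / (48.0 * mu * (1.0 - gamma) * chi ** 2)
    return transient, averaged_noise, network_noise


# Hypothetical parameter values, chosen only to make the sketch concrete.
params = dict(alpha=0.004, mu=0.5, L=2.0, sigma2=1.0, chi=3.0, p=0.2, gamma=0.8, a0=1.0)
small = rate_terms(t=1000, n=8, **params)
large = rate_terms(t=1000, n=16, **params)

# Doubling the number of nodes halves only the second term.
print(small)
print(large)
```

Because the $n$-independent term is $\mathcal{O}(\alpha^{2})$ while the $\frac{1}{n}$ term is $\mathcal{O}(\alpha)$, the latter dominates for sufficiently small stepsizes.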

Since ${\bf{X}}^{0}=[{\bf{x}}^{0},\cdots,{\bf{x}}^{0}]^{\sf T}$, by [45, (75)], we have $\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\leq 2\alpha^{2}\|({\bf{I}}-\hat{{\bf{\Lambda}}}_{a})^{-1}\|\,\|\nabla F({\bf{X}}^{0})-{\bf{1}}_{n}\otimes(\nabla f({\bf{x}}^{0}))^{\sf T}\|^{2}=\frac{2n\alpha^{2}\varsigma_{0}^{2}}{1-\gamma}$. Note that $\frac{1}{1-\gamma}\leq\frac{4\chi}{1-\lambda_{2}}$. It holds that

$$a_{0}=\big\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\big\|^{2}+\frac{1}{n}\|\mathcal{E}^{0}\|_{\mathrm{F}}^{2}\leq\big\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\big\|^{2}+\frac{8\chi\alpha^{2}\varsigma_{0}^{2}}{1-\lambda_{2}}.$$

Thus, we finally obtain (6). ∎

Appendix K Proof of Corollary 4

Proof.

Recall (6)

$$\begin{aligned}
\mathbb{E}\left[\big\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\big\|^{2}\right]
&\leq\Big(1-\frac{\alpha\mu}{4}\Big)^{T}\Big(c_{0}+b_{0}\alpha^{2}\Big)+c_{1}\alpha+c_{2}\alpha^{2}\\
&\leq\exp\Big(-\frac{\alpha\mu}{2}T\Big)\Big(c_{0}+b_{0}\alpha^{2}\Big)+c_{1}\alpha+c_{2}\alpha^{2},
\end{aligned}\tag{81}$$

where

$$c_{0}=\big\|\bar{\mathbf{x}}^{0}-\mathbf{x}^{\star}\big\|^{2},\quad b_{0}=\frac{8\chi\varsigma_{0}^{2}}{1-\lambda_{2}},\quad c_{1}=\frac{2\sigma^{2}}{\mu n},\quad c_{2}=\frac{7L\sigma^{2}\left(192\chi^{2}+(4\chi^{2}+2(1-p))\right)}{12\mu(1-\lambda_{2})\chi}.$$

From the stepsize condition, we have

$$\alpha\leq\frac{1}{\underline{\alpha}}\triangleq\min\left\{\frac{1}{2L},\ \frac{1-\lambda_{2}}{32\sqrt{3}\chi L},\ \sqrt{\frac{(1+\lambda_{n})(1-\lambda_{2})}{2\chi}}\frac{1}{2L},\ \frac{72\mu}{L^{2}},\ \frac{1-\gamma}{12L+\mu/2},\ \sqrt[3]{4\mu(1-\gamma)}\frac{1}{L}\right\}=\mathcal{O}\left(\frac{\mu(1-\lambda_{2})}{\chi L^{2}}\right).$$

Now we select $\alpha=\min\left\{\frac{\ln\left(\max\left\{1,\mu\left(c_{0}+b_{0}/\underline{\alpha}^{2}\right)T/c_{1}\right\}\right)}{\mu T},\frac{1}{\underline{\alpha}}\right\}\leq\frac{1}{\underline{\alpha}}$ to get the following cases.

- If $\alpha=\frac{\ln\left(\max\left\{1,\mu\left(c_{0}+b_{0}/\underline{\alpha}^{2}\right)T/c_{1}\right\}\right)}{\mu T}\leq\frac{1}{\underline{\alpha}}$, then

$$\exp\left(-\frac{\alpha\mu}{2}T\right)\left(c_{0}+\alpha^{2}b_{0}\right)\leq\tilde{\mathcal{O}}\left(\left(c_{0}+\frac{b_{0}}{\underline{\alpha}^{2}}\right)\exp\left[-\ln\left(\max\left\{1,\mu\left(c_{0}+\frac{b_{0}}{\underline{\alpha}^{2}}\right)T/c_{1}\right\}\right)\right]\right)=\mathcal{O}\left(\frac{c_{1}}{\mu T}\right).$$

- Otherwise, $\alpha=\frac{1}{\underline{\alpha}}\leq\frac{\ln\left(\max\left\{1,\mu\left(c_{0}+b_{0}/\underline{\alpha}^{2}\right)/c_{1}\right\}\right)}{\mu T}$ and

$$\exp\left(-\frac{\alpha\mu}{2}T\right)\left(c_{0}+\alpha^{2}b_{0}\right)=\tilde{\mathcal{O}}\left(\exp\left[-\frac{\mu T}{2\underline{\alpha}}\right]\left(c_{0}+\frac{b_{0}}{\underline{\alpha}^{2}}\right)\right).$$

Substituting these two cases into (81), we have

$$\begin{aligned}
\mathbb{E}\left[\big\|\bar{\mathbf{x}}^{T}-\mathbf{x}^{\star}\big\|^{2}\right]
&\leq\exp\left(-\frac{\alpha\mu}{2}T\right)\left(c_{0}+\alpha^{2}b_{0}\right)+c_{1}\alpha+c_{2}\alpha^{2}\\
&\leq\tilde{\mathcal{O}}\left(\frac{c_{1}}{\mu T}\right)+\tilde{\mathcal{O}}\left(\frac{c_{2}}{\mu^{2}T^{2}}\right)+\tilde{\mathcal{O}}\left(\exp\left[-\frac{\mu T}{2\underline{\alpha}}\right]\left(c_{0}+\frac{b_{0}}{\underline{\alpha}^{2}}\right)\right).
\end{aligned}$$

Therefore, (44) holds. ∎
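The stepsize rule used in this proof can be evaluated numerically. The following is a minimal sketch, not part of the original analysis: the function names and all numeric values are hypothetical, and `alpha_max` stands for the upper bound $1/\underline{\alpha}$, so that $b_{0}/\underline{\alpha}^{2}$ becomes `b0 * alpha_max**2`. It computes the selected $\alpha$ and the right-hand side of (81) for a given horizon $T$.

```python
import math


def select_stepsize(T, mu, c0, b0, c1, alpha_max):
    """Stepsize rule from the proof: the minimum of the logarithmic
    candidate ln(max{1, mu*(c0 + b0*alpha_max^2)*T/c1}) / (mu*T)
    and the cap alpha_max (which plays the role of 1/underline{alpha})."""
    log_arg = max(1.0, mu * (c0 + b0 * alpha_max ** 2) * T / c1)
    return min(math.log(log_arg) / (mu * T), alpha_max)


def bound_rhs(alpha, T, mu, c0, b0, c1, c2):
    """Right-hand side of (81) for a given stepsize alpha."""
    decay = math.exp(-alpha * mu * T / 2.0) * (c0 + b0 * alpha ** 2)
    return decay + c1 * alpha + c2 * alpha ** 2


if __name__ == "__main__":
    # Hypothetical constants, chosen only to exercise both branches of the rule.
    mu, c0, b0, c1, c2, alpha_max = 1.0, 1.0, 1.0, 0.1, 3.0, 0.05
    for T in (10, 1_000, 1_000_000):
        alpha = select_stepsize(T, mu, c0, b0, c1, alpha_max)
        branch = "cap" if alpha == alpha_max else "log"
        print(T, branch, alpha, bound_rhs(alpha, T, mu, c0, b0, c1, c2))
```

For small $T$ the cap $1/\underline{\alpha}$ is active (second case of the proof); for large $T$ the logarithmic candidate takes over (first case).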

Appendix L Proof of Lemma 4

Proof.

Note that ProxSkip (45) has the following equivalent updates:

$$\begin{aligned}
\widetilde{\mathbf{Z}}^{t}&=\widetilde{\mathbf{X}}^{t}-\mathbf{W}_{b}\widetilde{\mathbf{U}}^{t}-\alpha\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big), &&\text{(82a)}\\
\widetilde{\mathbf{X}}^{t+1}&=\mathbf{W}_{a}\widetilde{\mathbf{Z}}^{t}-\mathbf{W}_{b}\mathbf{E}^{t}, &&\text{(82b)}\\
\widetilde{\mathbf{U}}^{t+1}&=\widetilde{\mathbf{U}}^{t}+\frac{p}{2\chi}\mathbf{W}_{b}\widetilde{\mathbf{Z}}^{t}+p\mathbf{E}^{t}. &&\text{(82c)}
\end{aligned}$$

We rewrite the recursion (82) into the following matrix representation:

$$\begin{bmatrix}\widetilde{\mathbf{X}}^{t+1}\\ \widetilde{\mathbf{U}}^{t+1}\end{bmatrix}
=\begin{bmatrix}\mathbf{W}_{a} & -\mathbf{W}_{a}\mathbf{W}_{b}\\ \frac{p}{2\chi}\mathbf{W}_{b} & \mathbf{I}-\frac{p}{2\chi}\mathbf{W}_{b}^{2}\end{bmatrix}
\begin{bmatrix}\widetilde{\mathbf{X}}^{t}\\ \widetilde{\mathbf{U}}^{t}\end{bmatrix}
-\alpha\begin{bmatrix}\mathbf{W}_{a}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big)\\ \frac{p}{2\chi}\mathbf{W}_{b}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big)\end{bmatrix}
+\begin{bmatrix}-\mathbf{W}_{b}\mathbf{E}^{t}\\ p\mathbf{E}^{t}\end{bmatrix}.$$
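The equivalence between the three-line recursion (82) and its stacked matrix form can be checked numerically. The sketch below is only an illustration under stated assumptions: the ring mixing matrix, the dimensions, and the values of $\chi$, $p$, $\alpha$ are hypothetical, random matrices stand in for the gradient term $\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}$ and for $\mathbf{E}^{t}$, and the choices $\mathbf{W}_{a}=\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\mathbf{W})$, $\mathbf{W}_{b}=(\mathbf{I}-\mathbf{W})^{1/2}$ are inferred from the eigenvalue expressions appearing later in this proof rather than taken from a definition shown here. The algebraic identity itself does not depend on these choices.

```python
import numpy as np


def stacked_form_gap(n=5, d=3, chi=2.0, p=0.4, alpha=0.1, seed=0):
    """Largest entrywise gap between one step of (82a)-(82c) and one
    step of the stacked matrix recursion, on random test data."""
    rng = np.random.default_rng(seed)
    I = np.eye(n)

    # Hypothetical symmetric, doubly stochastic mixing matrix (ring with self-loops).
    W = 0.5 * I + 0.25 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))

    # Assumed forms of W_a and W_b (see the lead-in above).
    W_a = I - (I - W) / (2.0 * chi)
    evals, evecs = np.linalg.eigh(I - W)
    W_b = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T

    X = rng.standard_normal((n, d))
    U = rng.standard_normal((n, d))
    G = rng.standard_normal((n, d))  # stand-in for grad F(X^t) - grad F(X*) + S^t
    E = rng.standard_normal((n, d))  # stand-in for E^t
    c = p / (2.0 * chi)

    # Three-line form (82a)-(82c).
    Z = X - W_b @ U - alpha * G
    X_next = W_a @ Z - W_b @ E
    U_next = U + c * (W_b @ Z) + p * E

    # Stacked matrix form.
    M = np.block([[W_a, -W_a @ W_b], [c * W_b, I - c * (W_b @ W_b)]])
    stacked = (
        M @ np.vstack([X, U])
        - alpha * np.vstack([W_a @ G, c * (W_b @ G)])
        + np.vstack([-W_b @ E, p * E])
    )
    return float(np.max(np.abs(stacked - np.vstack([X_next, U_next]))))


if __name__ == "__main__":
    print("max discrepancy:", stacked_form_gap())
```

The returned gap should be at the level of floating-point round-off, since the stacked form is obtained by substituting (82a) into (82b) and (82c).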

Multiplying both sides of the above by $\mathrm{diag}\{\mathbf{P}^{-1},\mathbf{P}^{-1}\}$ on the left and using (29), we have

$$\begin{bmatrix}\mathbf{P}^{-1}\widetilde{\mathbf{X}}^{t+1}\\ \mathbf{P}^{-1}\widetilde{\mathbf{U}}^{t+1}\end{bmatrix}
=\begin{bmatrix}\hat{\boldsymbol{\Lambda}}_{a} & -\hat{\boldsymbol{\Lambda}}_{a}\hat{\boldsymbol{\Lambda}}_{b}\\ \frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b} & \mathbf{I}-\frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b}^{2}\end{bmatrix}
\begin{bmatrix}\mathbf{P}^{-1}\widetilde{\mathbf{X}}^{t}\\ \mathbf{P}^{-1}\widetilde{\mathbf{U}}^{t}\end{bmatrix}
-\alpha\begin{bmatrix}\hat{\boldsymbol{\Lambda}}_{a}\mathbf{P}^{-1}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big)\\ \frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b}\mathbf{P}^{-1}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big)\end{bmatrix}
+\begin{bmatrix}-\hat{\boldsymbol{\Lambda}}_{b}\mathbf{P}^{-1}\mathbf{E}^{t}\\ p\mathbf{P}^{-1}\mathbf{E}^{t}\end{bmatrix}.$$

Since $\widetilde{\mathbf{U}}^{t}$ lies in the range space of $\mathbf{W}_{b}$, we have $\mathbf{1}^{\mathsf{T}}\widetilde{\mathbf{U}}^{t}=0$ for all $t\geq 0$. By the structure of $\mathbf{P}$, we have

$$\mathbf{P}^{-1}\widetilde{\mathbf{X}}^{t}=\begin{bmatrix}\bar{\mathbf{e}}^{t}\\ \hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{X}}^{t}\end{bmatrix},\quad
\mathbf{P}^{-1}\widetilde{\mathbf{U}}^{t}=\begin{bmatrix}0\\ \hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{U}}^{t}\end{bmatrix},\quad
\mathbf{P}^{-1}\nabla F(\mathbf{X}^{t})=\begin{bmatrix}\overline{\nabla F}(\mathbf{X}^{t})\\ \hat{\mathbf{P}}^{\mathsf{T}}\nabla F(\mathbf{X}^{t})\end{bmatrix},\quad
\mathbf{P}^{-1}\mathbf{E}^{t}=\begin{bmatrix}0\\ \hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\end{bmatrix}.$$

Therefore, it holds that

$$\begin{aligned}
\bar{\mathbf{e}}^{t+1}&=\bar{\mathbf{e}}^{t}-\alpha\overline{\nabla F}(\mathbf{X}^{t})-\alpha\bar{\mathbf{s}}^{t},\\
\begin{bmatrix}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{X}}^{t+1}\\ \hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{U}}^{t+1}\end{bmatrix}
&=\begin{bmatrix}\hat{\boldsymbol{\Lambda}}_{a} & -\hat{\boldsymbol{\Lambda}}_{a}\hat{\boldsymbol{\Lambda}}_{b}\\ \frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b} & \mathbf{I}-\frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b}^{2}\end{bmatrix}
\begin{bmatrix}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{X}}^{t}\\ \hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{U}}^{t}\end{bmatrix}
-\alpha\begin{bmatrix}\hat{\boldsymbol{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big)\\ \frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\big(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t}\big)\end{bmatrix}
+\begin{bmatrix}-\hat{\boldsymbol{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\\ p\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\end{bmatrix}.
\end{aligned}$$

Let

$$\mathbf{H}^{\mathrm{s}}=\begin{bmatrix}\hat{\boldsymbol{\Lambda}}_{a} & -\hat{\boldsymbol{\Lambda}}_{a}\hat{\boldsymbol{\Lambda}}_{b}\\ \frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b} & \mathbf{I}-\frac{p}{2\chi}\hat{\boldsymbol{\Lambda}}_{b}^{2}\end{bmatrix}
=\begin{bmatrix}\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\hat{\boldsymbol{\Lambda}}) & -\big(\mathbf{I}-\frac{1}{2\chi}(\mathbf{I}-\hat{\boldsymbol{\Lambda}})\big)\sqrt{\mathbf{I}-\hat{\boldsymbol{\Lambda}}}\\ \frac{p}{2\chi}\sqrt{\mathbf{I}-\hat{\boldsymbol{\Lambda}}} & \mathbf{I}-\frac{p}{2\chi}(\mathbf{I}-\hat{\boldsymbol{\Lambda}})\end{bmatrix},$$

where $\hat{\mathbf{\Lambda}}=\mathrm{diag}\{\lambda_{2},\cdots,\lambda_{n}\}$, and $\lambda_{i}\in(-1,1)$. Since the blocks of $\mathbf{H}^{\mathrm{s}}$ are diagonal matrices, there exists a permutation matrix $\mathbf{Q}^{\mathrm{s}}_{1}$ such that $\mathbf{Q}^{\mathrm{s}}_{1}\mathbf{H}^{\mathrm{s}}(\mathbf{Q}^{\mathrm{s}}_{1})^{\sf T}=\mathrm{blkdiag}\{H^{\mathrm{s}}_{i}\}_{i=2}^{n}$, where

$$
H^{\mathrm{s}}_{i}=\left[\begin{array}{cc}1-\frac{1}{2\chi}(1-\lambda_{i}) & -\big(1-\frac{1}{2\chi}(1-\lambda_{i})\big)\sqrt{1-\lambda_{i}}\\ \frac{p}{2\chi}\sqrt{1-\lambda_{i}} & 1-\frac{p}{2\chi}(1-\lambda_{i})\end{array}\right].
$$
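The permutation step can be illustrated numerically. The following is a minimal sketch, assuming illustrative values for $\lambda_2,\ldots,\lambda_n$, $p$, and $\chi$; the helper names `build_Hs` and `shuffle_permutation` are hypothetical and not part of the paper. It interleaves the two index blocks so that the $2\times 2$ matrices $H^{\mathrm{s}}_i$ appear on the diagonal.

```python
import numpy as np

def build_Hs(lams, chi, p):
    """Assemble H^s from its four diagonal blocks (lams = lambda_2..lambda_n)."""
    lam = np.asarray(lams, dtype=float)
    a = 1.0 - (1.0 - lam) / (2.0 * chi)   # diagonal of Lambda_a
    b = np.sqrt(1.0 - lam)                # diagonal of Lambda_b
    m = lam.size
    top = np.hstack([np.diag(a), -np.diag(a * b)])
    bot = np.hstack([p / (2.0 * chi) * np.diag(b),
                     np.eye(m) - p / (2.0 * chi) * np.diag(b**2)])
    return np.vstack([top, bot])

def shuffle_permutation(m):
    """Permutation matrix reordering indices (0..m-1, m..2m-1) into (0, m, 1, m+1, ...)."""
    order = np.ravel(np.column_stack([np.arange(m), np.arange(m) + m]))
    return np.eye(2 * m)[order]

lams = [0.6, 0.1, -0.4]   # illustrative eigenvalues in (-1, 1)
p, chi = 0.5, 2.0         # illustrative values with chi >= 1/p
Hs = build_Hs(lams, chi, p)
Q1 = shuffle_permutation(len(lams))
B = Q1 @ Hs @ Q1.T        # block diagonal with 2x2 blocks H_i^s
print(np.round(B, 3))
```

Each diagonal $2\times 2$ block of `B` pairs the $i$-th entries of the four diagonal blocks, which is why the analysis can proceed one eigenvalue $\lambda_i$ at a time.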

Setting $\nu_{i}=1-\frac{1}{2\chi}(1-\lambda_{i})$, we have $\nu_{i}\in(0,1)$ and $H^{\mathrm{s}}_{i}$ can be rewritten as

$$
H^{\mathrm{s}}_{i}=\left[\begin{array}{cc}\nu_{i} & -\nu_{i}\sqrt{2\chi(1-\nu_{i})}\\ \frac{p}{2\chi}\sqrt{2\chi(1-\nu_{i})} & 1-p(1-\nu_{i})\end{array}\right].
$$

Since

$$
\mathrm{Tr}(H^{\mathrm{s}}_{i})=(1+p)\nu_{i}+(1-p),\quad \mathrm{det}(H^{\mathrm{s}}_{i})=\nu_{i},
$$

the eigenvalues of $H^{\mathrm{s}}_{i}$ are

$$
\begin{aligned}
\gamma_{(1,2),i} &=\frac{1}{2}\Big[\mathrm{Tr}(H^{\mathrm{s}}_{i})\pm\sqrt{\mathrm{Tr}(H^{\mathrm{s}}_{i})^{2}-4\,\mathrm{det}(H^{\mathrm{s}}_{i})}\Big]\\
&=\frac{1}{2}\Big[(1+p)\nu_{i}+(1-p)\Big]\pm\frac{1}{2}\sqrt{\underbrace{(1+p)^{2}\nu_{i}^{2}+(2(1+p)(1-p)-4)\nu_{i}+(1-p)^{2}}_{:=\Delta_{i}(\nu_{i},p)}}.
\end{aligned}
$$
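As a sanity check on the trace, determinant, and closed-form eigenvalues, here is a minimal numerical sketch. The values of $\nu_i$, $p$, and $\chi$ are illustrative assumptions, and the helper names `H_block` and `eig_closed_form` are hypothetical.

```python
import numpy as np

def H_block(nu, p, chi):
    """The 2x2 block H_i^s written in terms of nu_i."""
    s = np.sqrt(2.0 * chi * (1.0 - nu))
    return np.array([[nu, -nu * s],
                     [p / (2.0 * chi) * s, 1.0 - p * (1.0 - nu)]])

def eig_closed_form(nu, p):
    """Eigenvalues from the trace/determinant formula; complex sqrt handles Delta < 0."""
    tr = (1.0 + p) * nu + (1.0 - p)
    delta = tr**2 - 4.0 * nu
    root = np.sqrt(complex(delta))
    return 0.5 * (tr + root), 0.5 * (tr - root)

nu, p, chi = 0.8, 0.5, 2.0   # illustrative values
H = H_block(nu, p, chi)
g1, g2 = eig_closed_form(nu, p)
numeric = np.linalg.eigvals(H)
print(np.trace(H), np.linalg.det(H))
print(g1, g2, numeric)
```

Note that $\chi$ cancels in both the trace and the determinant, so the eigenvalues depend on $\chi$ only through $\nu_i$.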

Consider the sign of $\Delta_{i}(\nu_{i},p)$. Note that $\Delta_{i}(\nu_{i},p)$ is a quadratic function of $\nu_{i}$, and

$$
(1+p)^{2}>0,\ \Delta_{i}(0,p)=(1-p)^{2},\ \Delta_{i}(1,p)=0,\ \Delta_{i}(c_{i},p)=0,\text{ where }c_{i}=\frac{(1-p)^{2}}{(1+p)^{2}}<1.
$$

We have

$$
\left\{\begin{array}{cc}\Delta_{i}(\nu_{i},p)>0, & \nu_{i}\in(0,c_{i})\\ \Delta_{i}(\nu_{i},p)<0, & \nu_{i}\in(c_{i},1)\end{array}\right..
$$
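The sign pattern follows from the factorization $\Delta_i(\nu_i,p)=(1+p)^2(\nu_i-c_i)(\nu_i-1)$, which has the two stated roots and a positive leading coefficient. A minimal numerical sketch, assuming an illustrative value of $p$ and the hypothetical helper names `discriminant` and `smaller_root`:

```python
import numpy as np

def discriminant(nu, p):
    """Delta_i(nu, p) as defined under the square root."""
    return (1 + p)**2 * nu**2 + (2 * (1 + p) * (1 - p) - 4) * nu + (1 - p)**2

def smaller_root(p):
    """c_i = (1 - p)^2 / (1 + p)^2, the root of Delta_i below 1."""
    return (1 - p)**2 / (1 + p)**2

p = 0.3                      # illustrative communication probability
c = smaller_root(p)
left = np.linspace(0.0, c, 50, endpoint=False)[1:]    # interior points of (0, c)
right = np.linspace(c, 1.0, 50, endpoint=False)[1:]   # interior points of (c, 1)
print(discriminant(c, p), discriminant(1.0, p))
print(np.all(discriminant(left, p) > 0), np.all(discriminant(right, p) < 0))
```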

Since $\nu_{i}=1-\frac{1}{2\chi}(1-\lambda_{i})\geq 1-\frac{1}{2\chi}(1-\lambda_{n})$, $i=2,\ldots,n$, and $\lambda_{n}\in(-1,1)$, it holds that

$$
\chi\geq\frac{1}{p}\geq\frac{(1+p)^{2}}{4p}>\frac{(1-\lambda_{n})(1+p)^{2}}{8p}\Longrightarrow\nu_{i}\geq 1-\frac{1}{2\chi}(1-\lambda_{n})>\frac{(1-p)^{2}}{(1+p)^{2}}.
$$
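The implication above can be spot-checked on a grid. This is a minimal sketch under the tightest admissible choice $\chi=1/p$; the grid ranges and the helper names `nu_of` and `c_of` are assumptions made for illustration only.

```python
import numpy as np

def nu_of(lam, chi):
    """nu = 1 - (1 - lambda) / (2 chi)."""
    return 1.0 - (1.0 - lam) / (2.0 * chi)

def c_of(p):
    """c = (1 - p)^2 / (1 + p)^2."""
    return (1.0 - p)**2 / (1.0 + p)**2

ps = np.linspace(0.05, 0.95, 19)      # communication probabilities in (0, 1)
lams = np.linspace(-0.99, 0.99, 41)   # eigenvalues in (-1, 1)
# With chi = 1/p, every grid point should satisfy c < nu < 1.
ok = all(c_of(p) < nu_of(lam, 1.0 / p) < 1.0 for p in ps for lam in lams)
print(ok)
```

Larger values of $\chi$ only move $\nu_i$ closer to 1, so $\chi=1/p$ is the case that matters for the lower bound.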

As a result, when $\chi\geq\frac{1}{p}$, we have $\nu_{i}\in(c_{i},1)$, i.e., $\Delta_{i}(\nu_{i},p)<0$. It implies that

$$
\gamma_{(1,2),i}=\frac{1}{2}\big[(1+p)\nu_{i}+(1-p)\big]\pm j\frac{1}{2}\sqrt{4\nu_{i}-\big[(1+p)\nu_{i}+(1-p)\big]^{2}},\text{ and }|\gamma_{(1,2),i}|=\sqrt{\nu_{i}}<1,
$$

where $j^{2}=-1$. Since $\gamma_{1,i}\neq\gamma_{2,i}$, there exists an invertible $Q^{\mathrm{s}}_{2,i}$ such that $H^{\mathrm{s}}_{i}=Q^{\mathrm{s}}_{2,i}\Gamma_{i}(Q^{\mathrm{s}}_{2,i})^{-1}$, where $\Gamma_{i}=\mathrm{diag}\{\gamma_{1,i},\gamma_{2,i}\}$. Using [5, Appendix B.2] and letting $r=\sqrt{1-\nu_{i}}$, we have

$$
Q^{\mathrm{s}}_{2,i}=\left[\begin{array}{cc}\frac{1}{2}(p-1)\sqrt{1-\nu_{i}}+\frac{1}{2}j\sqrt{(1+p)^{2}(\nu_{i}-c_{i})} & \frac{1}{2}(p-1)\sqrt{1-\nu_{i}}-\frac{1}{2}j\sqrt{(1+p)^{2}(\nu_{i}-c_{i})}\\ p\sqrt{\tfrac{1}{2\chi}} & p\sqrt{\tfrac{1}{2\chi}}\end{array}\right]
$$

$$
(Q^{\mathrm{s}}_{2,i})^{-1}=\frac{\sqrt{2\chi}}{p\sqrt{(1+p)^{2}(\nu_{i}-c_{i})}}\left[\begin{array}{cc}-jp\sqrt{\tfrac{1}{2\chi}} & \frac{1}{2}\sqrt{(1+p)^{2}(\nu_{i}-c_{i})}+\frac{1}{2}j(p-1)\sqrt{1-\nu_{i}}\\ jp\sqrt{\tfrac{1}{2\chi}} & \frac{1}{2}\sqrt{(1+p)^{2}(\nu_{i}-c_{i})}-\frac{1}{2}j(p-1)\sqrt{1-\nu_{i}}\end{array}\right]
$$
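The explicit diagonalization can be checked numerically. The sketch below assumes illustrative values of $\nu_i$, $p$, and $\chi$ (with $\nu_i>c_i$); the helper names `H_block` and `Q_and_inverse` are hypothetical. It builds $Q^{\mathrm{s}}_{2,i}$ and its inverse from the formulas above and confirms that they diagonalize $H^{\mathrm{s}}_i$ with eigenvalues of modulus $\sqrt{\nu_i}$.

```python
import numpy as np

def H_block(nu, p, chi):
    """The 2x2 block H_i^s written in terms of nu_i."""
    s = np.sqrt(2.0 * chi * (1.0 - nu))
    return np.array([[nu, -nu * s],
                     [p / (2.0 * chi) * s, 1.0 - p * (1.0 - nu)]])

def Q_and_inverse(nu, p, chi):
    """Eigenvector matrix Q_{2,i}^s and its inverse, following the displayed formulas."""
    c = (1 - p)**2 / (1 + p)**2
    r = np.sqrt(1 - nu)
    w = np.sqrt((1 + p)**2 * (nu - c))   # real because nu > c
    d = p * np.sqrt(1 / (2 * chi))
    Q = np.array([[0.5 * (p - 1) * r + 0.5j * w, 0.5 * (p - 1) * r - 0.5j * w],
                  [d, d]], dtype=complex)
    Qinv = (np.sqrt(2 * chi) / (p * w)) * np.array(
        [[-1j * d, 0.5 * w + 0.5j * (p - 1) * r],
         [1j * d, 0.5 * w - 0.5j * (p - 1) * r]])
    return Q, Qinv

nu, p, chi = 0.8, 0.5, 2.0   # illustrative values with chi >= 1/p
H = H_block(nu, p, chi)
Q, Qinv = Q_and_inverse(nu, p, chi)
Gamma = Qinv @ H @ Q         # should be diag(gamma_1, gamma_2)
print(np.round(Gamma, 6))
print(np.abs(np.diag(Gamma)), np.sqrt(nu))
```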

Since the spectral radius of a matrix is upper bounded by any of its norms, $0<p_{0}\leq p<1$, and $0<\nu_{i}<1$, it holds that

$$
\|Q^{\mathrm{s}}_{2,i}\|^{2}\leq\|Q^{\mathrm{s}}_{2,i}(Q^{\mathrm{s}}_{2,i})^{*}\|_{1}\leq 4.
$$

Following a similar argument for $(Q^{\mathrm{s}}_{2,i})^{-1}$, and using $p^{2}(1+p)^{2}(\nu_{i}-c_{i})=p^{2}(1+p)^{2}\big(1-\frac{1}{2\chi}(1-\lambda_{i})\big)-p^{2}(1-p)^{2}\geq 4p^{3}-\frac{4p^{2}(1-\lambda_{n})}{2\chi}\geq\frac{2p^{2}(1+\lambda_{n})}{\chi}$, where the last inequality uses $p\geq\frac{1}{\chi}$, we have

$$
\|(Q_{2,i}^{\mathrm{s}})^{-1}\|^{2}\leq\frac{2\chi}{p^{2}(1+p)^{2}(\nu_{i}-c_{i})}\leq\frac{\chi^{2}}{p^{2}(1+\lambda_{n})}.
$$

Let $\mathbf{Q}^{\mathrm{s}}=(\mathbf{Q}^{\mathrm{s}}_{1})^{\sf T}\mathbf{Q}^{\mathrm{s}}_{2}$ with $\mathbf{Q}^{\mathrm{s}}_{2}=\mathrm{blkdiag}\{Q^{\mathrm{s}}_{2,i}\}_{i=2}^{n}$. We have $(\mathbf{Q}^{\mathrm{s}})^{-1}\mathbf{H}^{\mathrm{s}}\mathbf{Q}^{\mathrm{s}}=\mathbf{\Gamma}$, where $\mathbf{\Gamma}=\mathrm{blkdiag}\{\Gamma_{i}\}_{i=2}^{n}$, i.e., there exists an invertible matrix $\mathbf{Q}^{\mathrm{s}}$ such that $\mathbf{H}^{\mathrm{s}}=\mathbf{Q}^{\mathrm{s}}\mathbf{\Gamma}(\mathbf{Q}^{\mathrm{s}})^{-1}$, and

$$
\|\mathbf{\Gamma}\|=\sqrt{1-\frac{1}{2\chi}(1-\lambda_{2})}<1.
$$

Moreover, we have $\|\mathbf{Q}^{\mathrm{s}}\|^{2}\|(\mathbf{Q}^{\mathrm{s}})^{-1}\|^{2}\leq\frac{8\chi^{2}}{p^{2}(1+\lambda_{n})}$. We thus complete the proof. ∎

Appendix M Proof of Lemma 5

Proof.

Proof of (48). It follows from (E) and $0<\alpha L\leq\frac{1}{2}$ that

$$
\begin{aligned}
\mathbb{E}\!\left[\big\|\bar{\mathbf{e}}^{t+1}\big\|^{2}\;|\;\mathcal{G}^{t}\right] &\leq(1-\mu\alpha)\|\bar{\mathbf{e}}^{t}\|^{2}+\Big(\frac{\alpha L}{n}+\frac{2\alpha^{2}L^{2}}{n}\Big)\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}-2\alpha(1-2\alpha L)\big(f(\bar{\mathbf{x}}^{t})-f(\mathbf{x}^{\star})\big)\\
&\leq(1-\mu\alpha)\|\bar{\mathbf{e}}^{t}\|^{2}+\frac{2\alpha L}{n}\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}.
\end{aligned}
$$

Note that $\hat{\mathbf{P}}^{\sf T}\hat{\mathbf{P}}=\mathbf{I},\ \mathbf{1}^{\sf T}\hat{\mathbf{P}}=0,\ \hat{\mathbf{P}}\hat{\mathbf{P}}^{\sf T}=\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}^{\sf T}$. We obtain

$$
\|\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}=\|\hat{\mathbf{P}}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}=\|(\mathbf{I}-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\sf T})\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}=\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}.
$$
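The norm identities above rely only on the three stated properties of $\hat{\mathbf{P}}$. The following minimal sketch checks them for one particular orthonormal basis of the subspace orthogonal to $\mathbf{1}$, built by a QR factorization. The construction in `complement_basis` (a hypothetical helper) and the generic random matrix `X` standing in for the iterate are assumptions made for illustration.

```python
import numpy as np

def complement_basis(n):
    """Orthonormal basis P_hat (n x (n-1)) of the subspace orthogonal to the all-ones vector."""
    rng = np.random.default_rng(0)
    M = np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))])
    Qfull, _ = np.linalg.qr(M)   # first column is proportional to the all-ones vector
    return Qfull[:, 1:]

n, d = 5, 3                      # illustrative number of nodes and dimension
P_hat = complement_basis(n)
X = np.random.default_rng(1).standard_normal((n, d))
x_bar = X.mean(axis=0, keepdims=True)              # row vector of column averages
lhs = np.linalg.norm(P_hat.T @ X, "fro")**2
mid = np.linalg.norm(P_hat @ P_hat.T @ X, "fro")**2
rhs = np.linalg.norm(X - np.ones((n, 1)) @ x_bar, "fro")**2
print(lhs, mid, rhs)
```

The first equality holds because $\hat{\mathbf{P}}$ has orthonormal columns, so left-multiplying by it preserves the Frobenius norm.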

On the other hand, $\|\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}=\|\upsilon^{-1}\mathbf{Q}^{\mathrm{s}}\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}-\|\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{U}}^{t}\|_{\mathrm{F}}^{2}$. It holds that

$$\|\mathbf{X}^{t}-\mathbf{1}\bar{\mathbf{x}}^{t}\|_{\mathrm{F}}^{2}\leq\|\upsilon^{-1}\mathbf{Q}^{\mathrm{s}}\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\leq\upsilon^{-2}\|\mathbf{Q}^{\mathrm{s}}\|^{2}\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}.$$

Therefore, (48) follows.
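As a quick numerical sanity check of the last equality in the chain used for (48), the sketch below compares $\|(\mathbf{I}-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}})\widetilde{\mathbf{X}}\|_{\mathrm{F}}^{2}$ with $\|\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}\|_{\mathrm{F}}^{2}$. It assumes that $\widetilde{\mathbf{X}}=\mathbf{X}-\mathbf{X}^{\star}$ and that $\mathbf{X}^{\star}$ has identical rows (a consensual optimum); the data, the row `x_star`, and all function names are made up for illustration.

```python
def centered_sq_norm(X):
    """Return ||(I - 11^T/n) X||_F^2 using the explicit centering projector."""
    n, d = len(X), len(X[0])
    # Build the n-by-n projector J = I - (1/n) 1 1^T.
    J = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    JX = [[sum(J[i][k] * X[k][c] for k in range(n)) for c in range(d)]
          for i in range(n)]
    return sum(v * v for row in JX for v in row)


def consensus_error(X):
    """Return ||X - 1 xbar||_F^2, where xbar is the average of the rows of X."""
    n, d = len(X), len(X[0])
    xbar = [sum(X[i][c] for i in range(n)) / n for c in range(d)]
    return sum((X[i][c] - xbar[c]) ** 2 for i in range(n) for c in range(d))


def shift_rows(X, x_star):
    """Return X - 1 x_star, i.e. subtract the same row from every row of X."""
    return [[X[i][c] - x_star[c] for c in range(len(x_star))]
            for i in range(len(X))]


X = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]  # 3 nodes, 2 coordinates (made-up data)
x_star = [10.0, -7.0]                      # hypothetical common optimum row
X_tilde = shift_rows(X, x_star)

# The projector annihilates the common shift, so both quantities coincide.
print(abs(centered_sq_norm(X_tilde) - consensus_error(X)) < 1e-9)
```

The shift by a common row is removed by the projector because $(\mathbf{I}-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}})\mathbf{1}=\mathbf{0}$, which is why the consensus error does not depend on $\mathbf{X}^{\star}$ under this assumption.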

Proof of (49). Taking the conditional expectation with respect to $\mathcal{F}^{t}$, it follows from (47f) that

$$\begin{aligned}
\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]
&=\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\mathbb{E}\!\left[\|\mathbb{F}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]+2\,\mathbb{E}\!\left[\langle\mathbb{G}_{\mathrm{s}}^{t},\mathbb{F}_{\mathrm{s}}^{t}\rangle\;|\;\mathcal{F}^{t}\right]\\
&=\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\mathbb{E}\!\left[\|\mathbb{F}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]\\
&=\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\mathbb{E}\!\left[\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]+\mathbb{E}\!\left[\|\upsilon p(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right].
\end{aligned}$$

Since $\mathbf{E}^{t}=\frac{(\theta_{t}-1)}{2\chi}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}$, $\mathop{\rm Prob}(\theta_{t}=1)=p$, and $\mathop{\rm Prob}(\theta_{t}=0)=1-p$, we have

$$\begin{aligned}
&\mathbb{E}\!\left[\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]+\mathbb{E}\!\left[\|\upsilon p(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{E}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]\\
&=\frac{1-p}{4\chi^{2}}\Big(\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}+\|\upsilon p(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}\hat{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\Big)\\
&=\frac{1-p}{4\chi^{2}}\Big(\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}(\hat{\mathbf{Z}}^{t}-\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}+\|\upsilon p(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}(\hat{\mathbf{Z}}^{t}-\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}\Big)\\
&\leq\frac{(1-p)(2+p^{2})}{2\chi^{2}}\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}.
\end{aligned}$$

Hence, we obtain

$$\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}^{t}\right]\leq\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\frac{(1-p)(2+p^{2})}{2\chi^{2}}\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}.$$
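The factor $1-p$ in the bound above comes from the second moment of $\theta_{t}-1$, since $\mathbf{E}^{t}$ is proportional to $\theta_{t}-1$ and $\theta_{t}$ is a Bernoulli variable with $\mathop{\rm Prob}(\theta_{t}=1)=p$. The following minimal sketch evaluates $\mathbb{E}[(\theta_{t}-1)^{2}]$ exactly and by sampling; the function names, the seed, and the trial count are illustrative choices, not part of the paper.

```python
import random


def second_moment_exact(p):
    """E[(theta - 1)^2] for theta ~ Bernoulli(p).

    theta = 1 (probability p) contributes 0; theta = 0 (probability 1 - p)
    contributes 1, so the result is 1 - p.
    """
    return p * (1 - 1) ** 2 + (1 - p) * (0 - 1) ** 2


def second_moment_sampled(p, trials=200_000, seed=0):
    """Monte Carlo estimate of E[(theta - 1)^2] with a fixed seed."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        theta = 1 if rng.random() < p else 0
        total += (theta - 1) ** 2
    return total / trials


p = 0.3
print(second_moment_exact(p), second_moment_sampled(p))
```

With a communication probability close to one, this second moment is close to zero, which is consistent with the perturbation term vanishing when the nodes communicate at every iteration.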

Taking the conditional expectation with respect to $\mathcal{G}^{t}\subset\mathcal{F}^{t}$, and using the unbiasedness of $\mathbf{G}^{t}$, we have

$$\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\mathbb{E}\!\left[\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]+\frac{(1-p)(2+p^{2})}{2\chi^{2}}\mathbb{E}\!\left[\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right].\tag{83}$$

Let $\upsilon=1/\|(\mathbf{Q}^{\mathrm{s}})^{-1}\|$. $\mathbb{E}\!\left[\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]$ can be bounded as follows:

$$\begin{aligned}
\mathbb{E}\!\left[\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]
&=\mathbb{E}\!\left[\left\|\mathbf{\Gamma}\mathcal{E}_{\mathrm{s}}^{t}-\upsilon\alpha(\mathbf{Q}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t})\\ \frac{p}{2\chi}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t})\end{array}\right]\right\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&=\left\|\mathbf{\Gamma}\mathcal{E}_{\mathrm{s}}^{t}-\upsilon\alpha(\mathbf{Q}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\\ \frac{p}{2\chi}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\end{array}\right]\right\|_{\mathrm{F}}^{2}+\upsilon^{2}\alpha^{2}\,\mathbb{E}\!\left[\left\|(\mathbf{Q}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}\\ \frac{p}{2\chi}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}\end{array}\right]\right\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&\leq\left\|\mathbf{\Gamma}\mathcal{E}_{\mathrm{s}}^{t}-\upsilon\alpha(\mathbf{Q}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\\ \frac{p}{2\chi}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\end{array}\right]\right\|_{\mathrm{F}}^{2}+\frac{(p^{2}+2\chi^{2})n\alpha^{2}\sigma^{2}}{2\chi^{2}}.
\end{aligned}$$

The last inequality holds due to $\|\hat{\mathbf{\Lambda}}_{a}\|\leq 1$, $\|\hat{\mathbf{\Lambda}}_{b}\|^{2}\leq 2$, and $\upsilon=1/\|(\mathbf{Q}^{\mathrm{s}})^{-1}\|$. For any vectors $\mathbf{a}$ and $\mathbf{b}$, it holds from Jensen's inequality that $\|\mathbf{a}+\mathbf{b}\|^{2}\leq\frac{1}{\theta}\|\mathbf{a}\|^{2}+\frac{1}{1-\theta}\|\mathbf{b}\|^{2}$ for any $\theta\in(0,1)$. Therefore, letting $\theta=\|\mathbf{\Gamma}\|:=\gamma$, it holds that

$$\begin{aligned}
&\left\|\mathbf{\Gamma}\mathcal{E}_{\mathrm{s}}^{t}-\upsilon\alpha(\mathbf{Q}^{\mathrm{s}})^{-1}\left[\begin{array}{c}\hat{\mathbf{\Lambda}}_{a}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\\ \frac{p}{2\chi}\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\mathsf{T}}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\end{array}\right]\right\|_{\mathrm{F}}^{2}\\
&\leq\frac{1}{\gamma}\|\mathbf{\Gamma}\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}(2\chi^{2}+p^{2})}{2\chi^{2}(1-\gamma)}\|\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}\\
&\leq\gamma\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}L^{2}(2\chi^{2}+p^{2})}{2\chi^{2}(1-\gamma)}\|\mathbf{X}^{t}-\mathbf{X}^{\star}\|_{\mathrm{F}}^{2}.
\end{aligned}$$
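The splitting inequality $\|\mathbf{a}+\mathbf{b}\|^{2}\leq\frac{1}{\theta}\|\mathbf{a}\|^{2}+\frac{1}{1-\theta}\|\mathbf{b}\|^{2}$ used in this step can be checked numerically. The sketch below is only an illustration on random vectors (the helper names, dimension, and seed are arbitrary assumptions); it relies on the fact that the gap between the two sides equals $\big\|\sqrt{(1-\theta)/\theta}\,\mathbf{a}-\sqrt{\theta/(1-\theta)}\,\mathbf{b}\big\|^{2}\geq 0$.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


def gap(a, b, theta):
    """Right-hand side minus left-hand side of
    ||a + b||^2 <= ||a||^2 / theta + ||b||^2 / (1 - theta)."""
    s = [x + y for x, y in zip(a, b)]
    return sq_norm(a) / theta + sq_norm(b) / (1 - theta) - sq_norm(s)


def min_gap(trials=1000, dim=5, seed=0):
    """Smallest gap observed over random vectors and random theta in (0, 1)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a = [rng.uniform(-1, 1) for _ in range(dim)]
        b = [rng.uniform(-1, 1) for _ in range(dim)]
        theta = rng.uniform(0.01, 0.99)
        worst = min(worst, gap(a, b, theta))
    return worst


# The gap is nonnegative in every trial (up to rounding error).
print(min_gap() >= -1e-9)
# Equality case: b = ((1 - theta) / theta) * a, here theta = 0.5 and b = a.
print(gap([1.0, 2.0], [1.0, 2.0], 0.5))
```

In the proof, the choice $\theta=\gamma$ is what turns $\frac{1}{\gamma}\|\mathbf{\Gamma}\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\leq\frac{\gamma^{2}}{\gamma}\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}$ into the contraction factor $\gamma$, at the price of the factor $\frac{1}{1-\gamma}$ on the gradient term.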

Then, we have

$$\mathbb{E}\!\left[\|\mathbb{G}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\gamma\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}L^{2}(2\chi^{2}+p^{2})}{2\chi^{2}(1-\gamma)}\|\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+\frac{(p^{2}+2\chi^{2})n\alpha^{2}\sigma^{2}}{2\chi^{2}}.\tag{84}$$

In addition, we bound $\mathbb{E}\!\left[\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]$ as follows:

$$
\begin{aligned}
\mathbb{E}\!\left[\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{Z}}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]
&=\mathbb{E}\!\left[\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}(\widetilde{\mathbf{X}}^{t}-\alpha(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star})+\mathbf{S}^{t})-\mathbf{W}_{b}\widetilde{\mathbf{U}}^{t})\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&=\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}(\widetilde{\mathbf{X}}^{t}-\alpha(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))-\mathbf{W}_{b}\widetilde{\mathbf{U}}^{t})\|_{\mathrm{F}}^{2}+\mathbb{E}\!\left[\alpha^{2}\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{S}^{t}\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\\
&\leq 3\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+3\alpha^{2}\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}(\nabla F(\mathbf{X}^{t})-\nabla F(\mathbf{X}^{\star}))\|_{\mathrm{F}}^{2}+3\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\mathbf{W}_{b}\widetilde{\mathbf{U}}^{t}\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}\\
&\leq 3\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+6\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\mathsf{T}}\widetilde{\mathbf{U}}^{t}\|_{\mathrm{F}}^{2}+3\alpha^{2}L^{2}\|\mathbf{X}^{t}-\mathbf{X}^{\star}\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}\\
&\leq 6\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+3\alpha^{2}L^{2}\|\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}.
\end{aligned}
\tag{85}
$$

Therefore, substituting (84) and (85) into (83), we conclude (49). ∎

Appendix N Proofs of Theorem 7

Proof.

From [51, eq. (27)], we have

$$
\begin{aligned}
&\mathbb{E}\!\left[\big\|\mathbf{X}^{t+1}-\mathbf{X}^{\star}\big\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}_{t}\right]+\frac{2\chi\alpha^{2}}{p^{2}}\mathbb{E}\!\left[\big\|\mathbf{U}^{t+1}-\mathbf{U}^{\star}\big\|_{\mathrm{F}}^{2}\;|\;\mathcal{F}_{t}\right]\\
&\leq\big\|\tilde{\mathbf{V}}^{t}-\mathbf{V}^{\star}\big\|_{\mathrm{F}}^{2}+\alpha^{2}\Big(\frac{2\chi}{p^{2}}-(1-\lambda_{2})\Big)\big\|\mathbf{U}^{t}-\mathbf{U}^{\star}\big\|_{\mathrm{F}}^{2}.
\end{aligned}
\tag{86}
$$

Then, recalling the definitions of $\tilde{\mathbf{V}}^{t}$ and $\mathbf{V}^{\star}$, we obtain

$$
\begin{aligned}
\big\|\tilde{\mathbf{V}}^{t}-\mathbf{V}^{\star}\big\|_{\mathrm{F}}^{2}
&=\big\|(\mathbf{X}^{t}-\alpha\nabla F(\mathbf{X}^{t}))-(\mathbf{X}^{\star}-\alpha\nabla F(\mathbf{X}^{\star}))+(\alpha\nabla F(\mathbf{X}^{t})-\alpha\mathbf{G}^{t})\big\|_{\mathrm{F}}^{2}\\
&=\big\|(\mathbf{X}^{t}-\alpha\nabla F(\mathbf{X}^{t}))-(\mathbf{X}^{\star}-\alpha\nabla F(\mathbf{X}^{\star}))\big\|_{\mathrm{F}}^{2}+\big\|\alpha\nabla F(\mathbf{X}^{t})-\alpha\mathbf{G}^{t}\big\|_{\mathrm{F}}^{2}\\
&\quad+2\big\langle(\mathbf{X}^{t}-\alpha\nabla F(\mathbf{X}^{t}))-(\mathbf{X}^{\star}-\alpha\nabla F(\mathbf{X}^{\star})),\,\alpha\nabla F(\mathbf{X}^{t})-\alpha\mathbf{G}^{t}\big\rangle.
\end{aligned}
$$

Taking the conditional expectation with respect to $\mathcal{G}^{t}\subset\mathcal{F}^{t}$ and using the unbiasedness of $\mathbf{G}^{t}$, we have

$$
\mathbb{E}\!\left[\big\|\tilde{\mathbf{V}}^{t}-\mathbf{V}^{\star}\big\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\big\|(\mathbf{X}^{t}-\alpha\nabla F(\mathbf{X}^{t}))-(\mathbf{X}^{\star}-\alpha\nabla F(\mathbf{X}^{\star}))\big\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}. \tag{87}
$$
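The step from the expansion of $\|\tilde{\mathbf{V}}^{t}-\mathbf{V}^{\star}\|_{\mathrm{F}}^{2}$ to (87) relies on the cross term vanishing in expectation when the gradient noise has zero mean. The following minimal Python sketch illustrates this on a toy example. It assumes, purely for illustration, that the noise is a vector of independent random signs with hypothetical per-coordinate magnitudes `noise_scales`, so the expectation can be computed exactly by enumeration; the names `expected_sq_norm`, `a`, and `noise_scales` are not from the paper.

```python
from itertools import product


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


def expected_sq_norm(a, alpha, noise_scales):
    """Exact E||a + alpha * eps||^2, where eps_i = +/- noise_scales[i].

    Signs are independent and equally likely, so E[eps] = 0 (a toy stand-in
    for the zero-mean gradient noise). All sign patterns are enumerated.
    """
    d = len(a)
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=d):
        shifted = [a[i] + alpha * signs[i] * noise_scales[i] for i in range(d)]
        total += sq_norm(shifted)
    return total / (2 ** d)


# Hypothetical values: 'a' plays the role of the deterministic part
# (X^t - alpha*grad F(X^t)) - (X^* - alpha*grad F(X^*)).
a = [0.5, -1.25, 2.0]
noise_scales = [0.25, 0.5, 1.0]
alpha = 0.5

lhs = expected_sq_norm(a, alpha, noise_scales)
rhs = sq_norm(a) + alpha ** 2 * sq_norm(noise_scales)  # no cross term survives
print(lhs, rhs)
```

In this toy setting the two printed values coincide up to rounding. In (87) the noise term is then bounded by $n\alpha^{2}\sigma^{2}$, which turns the equality into an inequality.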

By [51, Lemma 1], when $0<\alpha<2/L$ and $\mu>0$,

$$
\big\|(\mathbf{X}^{t}-\alpha\nabla F(\mathbf{X}^{t}))-(\mathbf{X}^{\star}-\alpha\nabla F(\mathbf{X}^{\star}))\big\|_{\mathrm{F}}^{2}\leq\max\{(1-\alpha\mu)^{2},(\alpha L-1)^{2}\}\big\|\mathbf{X}^{t}-\mathbf{X}^{\star}\big\|_{\mathrm{F}}^{2}, \tag{88}
$$

and $\max\{(1-\alpha\mu)^{2},(\alpha L-1)^{2}\}\in(0,1)$. Combining this with (87) gives

$$
\mathbb{E}\!\left[\big\|\tilde{\mathbf{V}}^{t}-\mathbf{V}^{\star}\big\|_{\mathrm{F}}^{2}\;|\;\mathcal{G}^{t}\right]\leq\max\{(1-\alpha\mu)^{2},(\alpha L-1)^{2}\}\big\|\mathbf{X}^{t}-\mathbf{X}^{\star}\big\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}. \tag{89}
$$
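The contraction property (88) can be checked numerically on a simple test problem. The sketch below is an illustration only. It assumes a separable quadratic $f(x)=\tfrac{1}{2}\sum_i h_i x_i^2$ with curvatures $h_i\in[\mu,L]$ (a hypothetical choice that is $\mu$-strongly convex and $L$-smooth), and the helper names are not from the paper. It reports the largest observed value of the left-hand side of (88) minus its right-hand side over random pairs of points, where the second point stands in for $\mathbf{X}^{\star}$.

```python
import random


def contraction_factor_sq(alpha, mu, L):
    """Squared factor max{(1 - alpha*mu)^2, (alpha*L - 1)^2} appearing in (88)."""
    return max((1.0 - alpha * mu) ** 2, (alpha * L - 1.0) ** 2)


def gradient_step(x, alpha, curvatures):
    """One gradient step on f(x) = 0.5 * sum_i h_i * x_i^2 (gradient is h_i * x_i)."""
    return [xi - alpha * h * xi for xi, h in zip(x, curvatures)]


def sq_dist(x, y):
    """Squared Euclidean distance between two lists of floats."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def worst_violation(mu, L, alphas, dim=8, trials=200, seed=0):
    """Largest observed (LHS - RHS) of (88); a non-positive value means no violation."""
    rng = random.Random(seed)
    # Curvatures in [mu, L], with both extremes included.
    curvatures = [mu, L] + [rng.uniform(mu, L) for _ in range(dim - 2)]
    worst = float("-inf")
    for alpha in alphas:
        rho = contraction_factor_sq(alpha, mu, L)
        for _ in range(trials):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            lhs = sq_dist(gradient_step(x, alpha, curvatures),
                          gradient_step(y, alpha, curvatures))
            worst = max(worst, lhs - rho * sq_dist(x, y))
    return worst


# Stepsizes chosen inside (0, 2/L) = (0, 0.5) for the hypothetical L = 4.
print(worst_violation(mu=0.5, L=4.0, alphas=[0.05, 0.25, 0.4, 0.49]))
```

For this quadratic the bound holds coordinate by coordinate, because $(1-\alpha h)^2$ is convex in $h$ and therefore attains its maximum over $[\mu,L]$ at an endpoint.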

Then, it follows from (86) and (89) that

$$
\begin{aligned}
&\mathbb{E}\!\left[\big\|\mathbf{X}^{t+1}-\mathbf{X}^{\star}\big\|_{\mathrm{F}}^{2}\right]+\frac{2\chi\alpha^{2}}{p^{2}}\mathbb{E}\!\left[\big\|\mathbf{U}^{t+1}-\mathbf{U}^{\star}\big\|_{\mathrm{F}}^{2}\right]\\
&\leq\max\{(1-\alpha\mu)^{2},(\alpha L-1)^{2}\}\big\|\mathbf{X}^{t}-\mathbf{X}^{\star}\big\|_{\mathrm{F}}^{2}+n\alpha^{2}\sigma^{2}+\Big(\frac{2\chi\alpha^{2}}{p^{2}}-(1-\lambda_{2})\alpha^{2}\Big)\big\|\mathbf{U}^{t}-\mathbf{U}^{\star}\big\|_{\mathrm{F}}^{2}\\
&\leq\max\Big\{(1-\mu\alpha)^{2},(\alpha L-1)^{2},1-\frac{(1-\lambda_{2})p^{2}}{2\chi}\Big\}\Big(\|\mathbf{X}^{t}-\mathbf{X}^{\star}\|_{\mathrm{F}}^{2}+\frac{2\chi\alpha^{2}}{p^{2}}\|\mathbf{U}^{t}-\mathbf{U}^{\star}\|_{\mathrm{F}}^{2}\Big)+n\alpha^{2}\sigma^{2}\\
&=\underbrace{\max\Big\{1-(2\mu\alpha-\mu^{2}\alpha^{2}),1-(2\alpha L-\alpha^{2}L^{2}),1-\frac{(1-\lambda_{2})p^{2}}{2\chi}\Big\}}_{:=\zeta}\Big(\|\mathbf{X}^{t}-\mathbf{X}^{\star}\|_{\mathrm{F}}^{2}+\frac{2\chi\alpha^{2}}{p^{2}}\|\mathbf{U}^{t}-\mathbf{U}^{\star}\|_{\mathrm{F}}^{2}\Big)+n\alpha^{2}\sigma^{2}.
\end{aligned}
$$

Since $0<\alpha<\frac{2}{L}$, $0<\frac{1-\lambda_{2}}{2}<1$, and $0<p^{2}\leq 1$, we have $0<\zeta<1$. With $\Psi^{t}=\|\mathbf{X}^{t}-\mathbf{X}^{\star}\|_{\mathrm{F}}^{2}+\frac{2\chi\alpha^{2}}{p^{2}}\|\mathbf{U}^{t}-\mathbf{U}^{\star}\|_{\mathrm{F}}^{2}$, it follows that

$$
\mathbb{E}\!\left[\Psi^{t+1}\right]\leq\zeta\Psi^{t}+n\alpha^{2}\sigma^{2}.
$$
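To make the role of each term in $\zeta$ concrete, the following small Python sketch evaluates the three candidates in the maximum for one set of parameter values. All numerical values are hypothetical and chosen only for illustration, and the function name `zeta` is not from the paper.

```python
def zeta(alpha, mu, L, lam2, p, chi):
    """Evaluate max{(1-mu*alpha)^2, (alpha*L-1)^2, 1-(1-lam2)*p^2/(2*chi)}."""
    optimization_mu = (1.0 - mu * alpha) ** 2          # strong-convexity term
    optimization_L = (alpha * L - 1.0) ** 2            # smoothness term
    network = 1.0 - (1.0 - lam2) * p ** 2 / (2.0 * chi)  # network/communication term
    return max(optimization_mu, optimization_L, network)


# Hypothetical values: stepsize inside (0, 2/L), sparse communication (p = 0.2).
z = zeta(alpha=0.1, mu=0.5, L=4.0, lam2=0.8, p=0.2, chi=2.0)
print(z)
```

With these particular values the third term is the largest of the three, so the contraction factor is set by the network and communication parameters $\lambda_{2}$, $p$, and $\chi$ rather than by the stepsize. Other parameter choices may of course behave differently.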

Taking the full expectation and unrolling the recurrence, we have

$$
\mathbb{E}\!\left[\Psi^{T}\right]\leq\zeta^{T}\Psi^{0}+\frac{n\alpha^{2}\sigma^{2}}{1-\zeta}. \tag{90}
$$
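The unrolling step behind (90) uses the geometric sum $\sum_{k=0}^{T-1}\zeta^{k}\leq\frac{1}{1-\zeta}$. As a sanity check, the sketch below iterates the recurrence with equality (the worst case) and compares the result with the right-hand side of (90). The constant `c` stands for $n\alpha^{2}\sigma^{2}$, and all numerical values and function names are hypothetical.

```python
def unroll(psi0, zeta, c, T):
    """Iterate psi <- zeta * psi + c for T steps (recurrence taken with equality)."""
    psi = psi0
    for _ in range(T):
        psi = zeta * psi + c
    return psi


def bound_90(psi0, zeta, c, T):
    """Right-hand side of (90): zeta^T * psi0 + c / (1 - zeta)."""
    return zeta ** T * psi0 + c / (1.0 - zeta)


# Hypothetical values: n = 8 nodes, alpha = 0.1, sigma^2 = 0.25.
n, alpha, sigma_sq = 8, 0.1, 0.25
c = n * alpha ** 2 * sigma_sq
for T in (0, 1, 10, 500, 5000):
    print(T, unroll(10.0, 0.998, c, T), bound_90(10.0, 0.998, c, T))
```

In this sketch the iterated value stays below the bound for every horizon, and for large $T$ it approaches the noise floor $c/(1-\zeta)$.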

Note that

$$
1-\frac{p^{2}}{2\chi\kappa_{w}}=1-\frac{p^{2}(1-\lambda_{2})}{2\chi}\leq\sqrt{1-\frac{p^{2}(1-\lambda_{2})}{2\chi}}<1,\quad\text{and}\quad\gamma=\sqrt{1-\frac{1-\lambda_{2}}{2\chi}}.
$$

Since $\tilde{\gamma}_{\mathrm{s}}=\gamma+\frac{3(1-p)(2+p^{2})}{\chi^{2}}=\sqrt{1-\frac{1-\lambda_{2}}{2\chi}}+\frac{3(1-p)(2+p^{2})}{\chi^{2}}$, we have

$$
\chi\geq\frac{36}{1-\lambda_{2}}\Longrightarrow\tilde{\gamma}_{\mathrm{s}}\leq\sqrt{1-\frac{p^{2}}{2\chi\kappa_{w}}}=\sqrt{1-\frac{p^{2}(1-\lambda_{2})}{2\chi}}<1.
$$
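The implication above can be spot-checked numerically. The sketch below evaluates $\tilde{\gamma}_{\mathrm{s}}$ and the target $\sqrt{1-p^{2}(1-\lambda_{2})/(2\chi)}$ on a small grid with $\chi\geq 36/(1-\lambda_{2})$ and reports the smallest gap. The grid values and function names are hypothetical, and a grid check of this kind illustrates the claim but does not prove it.

```python
import math


def gamma_tilde_s(lam2, p, chi):
    """gamma + 3(1-p)(2+p^2)/chi^2, with gamma = sqrt(1 - (1-lam2)/(2*chi))."""
    gamma = math.sqrt(1.0 - (1.0 - lam2) / (2.0 * chi))
    return gamma + 3.0 * (1.0 - p) * (2.0 + p ** 2) / chi ** 2


def target(lam2, p, chi):
    """sqrt(1 - p^2 (1-lam2) / (2*chi)), the claimed upper bound."""
    return math.sqrt(1.0 - p ** 2 * (1.0 - lam2) / (2.0 * chi))


def smallest_gap():
    """Minimum of target - gamma_tilde_s over a hypothetical parameter grid."""
    worst = float("inf")
    for lam2 in (0.0, 0.5, 0.9, 0.99):
        for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
            for scale in (1.0, 1.5, 4.0, 50.0):
                chi = scale * 36.0 / (1.0 - lam2)  # satisfies chi >= 36/(1-lam2)
                worst = min(worst, target(lam2, p, chi) - gamma_tilde_s(lam2, p, chi))
    return worst


print(smallest_gap())
```

A non-negative result indicates that the bound holds at every grid point. At $p=1$ the two sides coincide, since the additive term in $\tilde{\gamma}_{\mathrm{s}}$ vanishes.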

From (90), we have $\mathbb{E}[\|\widetilde{\mathbf{X}}^{t}\|_{\mathrm{F}}^{2}]\leq\mathbb{E}[\Psi^{t}]\leq\zeta^{t}\Psi^{0}+\frac{n\alpha^{2}\sigma^{2}}{1-\zeta}$. Substituting this bound into (49), we get

$$\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\right]\leq\tilde{\gamma}_{\mathrm{s}}\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\right]+F_{1}\zeta^{t}+F_{2},\tag{91}$$

where $F_{1}=D_{1}\Psi^{0}$ and $F_{2}=\frac{D_{1}n\alpha^{2}\sigma^{2}}{1-\zeta}+D_{2}n\alpha^{2}\sigma^{2}$. Unrolling the recurrence (91), we have

$$\begin{aligned}
\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\right]
&\leq\tilde{\gamma}_{\mathrm{s}}\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}\right]+F_{1}\zeta^{t}+F_{2}\\
&\leq\tilde{\gamma}_{\mathrm{s}}^{t+1}\|\mathcal{E}_{\mathrm{s}}^{0}\|_{\mathrm{F}}^{2}+F_{1}\sum_{j=0}^{t}\tilde{\gamma}_{\mathrm{s}}^{j}\zeta^{t-j}+F_{2}\sum_{j=0}^{t}\tilde{\gamma}_{\mathrm{s}}^{j}\\
&=\tilde{\gamma}_{\mathrm{s}}^{t+1}\|\mathcal{E}_{\mathrm{s}}^{0}\|_{\mathrm{F}}^{2}+F_{1}\frac{\zeta^{t+1}-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}+F_{2}\frac{1-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{1-\tilde{\gamma}_{\mathrm{s}}}\\
&=\tilde{\gamma}_{\mathrm{s}}^{t+1}\Big(\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{X}}^{0}\|_{\mathrm{F}}^{2}+\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{U}}^{0}\|_{\mathrm{F}}^{2}\Big)+F_{1}\frac{\zeta^{t+1}-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}+F_{2}\frac{1-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{1-\tilde{\gamma}_{\mathrm{s}}}.
\end{aligned}\tag{92}$$
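As a sanity check on the unrolling step, the sketch below iterates the recurrence (91) with equality and compares the result against the closed form on the last lines of (92). The function names and numerical values are hypothetical, chosen only for illustration, and the closed form assumes $\zeta\neq\tilde{\gamma}_{\mathrm{s}}$ and $\tilde{\gamma}_{\mathrm{s}}\neq 1$.

```python
def iterate(gamma, zeta, F1, F2, e0, steps):
    """Run e_{t+1} = gamma*e_t + F1*zeta**t + F2 (equality case of (91))."""
    e = e0
    for t in range(steps):
        e = gamma * e + F1 * zeta**t + F2
    return e

def closed_form(gamma, zeta, F1, F2, e0, steps):
    """Closed form after `steps` = t + 1 iterations; needs zeta != gamma, gamma != 1."""
    return (gamma**steps * e0
            + F1 * (zeta**steps - gamma**steps) / (zeta - gamma)
            + F2 * (1.0 - gamma**steps) / (1.0 - gamma))

# Hypothetical values: gamma plays the role of gamma_tilde_s.
gamma, zeta, F1, F2, e0 = 0.9, 0.8, 2.0, 0.5, 3.0
for steps in (1, 5, 25):
    lhs = iterate(gamma, zeta, F1, F2, e0, steps)
    rhs = closed_form(gamma, zeta, F1, F2, e0, steps)
    print(steps, lhs, rhs)
```

The two columns agree up to floating-point rounding, reflecting that both geometric sums in the unrolled bound run over powers of $\tilde{\gamma}_{\mathrm{s}}$.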

Since $\mathbf{X}^{0}=[\mathbf{x}^{0},\cdots,\mathbf{x}^{0}]^{\sf T}$ and $\mathbf{U}^{0}=0$, we have

$$\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{X}}^{0}\|_{\mathrm{F}}^{2}+\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{U}}^{0}\|_{\mathrm{F}}^{2}\leq\alpha^{2}\|\hat{\mathbf{P}}^{\sf T}\mathbf{U}^{\star}\|_{\mathrm{F}}^{2}.$$

Multiplying (46a) by $\hat{\mathbf{P}}^{\sf T}$ and using (29), we have

$$0=\alpha\hat{\mathbf{P}}^{\sf T}\nabla F(\mathbf{X}^{\star})+\alpha\hat{\mathbf{\Lambda}}_{b}\hat{\mathbf{P}}^{\sf T}\mathbf{U}^{\star}.$$

Then, it holds that

$$\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{X}}^{0}\|_{\mathrm{F}}^{2}+\|\upsilon(\mathbf{Q}^{\mathrm{s}})^{-1}\hat{\mathbf{P}}^{\sf T}\widetilde{\mathbf{U}}^{0}\|_{\mathrm{F}}^{2}\leq\alpha^{2}\|\hat{\mathbf{P}}^{\sf T}\mathbf{U}^{\star}\|_{\mathrm{F}}^{2}\leq\frac{\alpha^{2}}{1-\lambda_{2}}\|\nabla F(\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}.\tag{93}$$

Combining (92) and (93), and using $1-\tilde{\gamma}_{\mathrm{s}}^{t+1}<1$, we obtain

$$\mathbb{E}\!\left[\|\mathcal{E}_{\mathrm{s}}^{t+1}\|_{\mathrm{F}}^{2}\right]\leq\tilde{\gamma}_{\mathrm{s}}^{t+1}\frac{\alpha^{2}}{1-\lambda_{2}}\|\nabla F(\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}+F_{1}\frac{\zeta^{t+1}-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}+\frac{F_{2}}{1-\tilde{\gamma}_{\mathrm{s}}}.\tag{94}$$

Note that

$$\left\{\begin{array}{ll}
\dfrac{\zeta^{t+1}-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}\leq\dfrac{\zeta^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}, & \zeta>\tilde{\gamma}_{\mathrm{s}};\\[2ex]
\dfrac{\zeta^{t+1}-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}\leq\dfrac{\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\tilde{\gamma}_{\mathrm{s}}-\zeta}, & \zeta<\tilde{\gamma}_{\mathrm{s}}.
\end{array}\right.$$
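Both cases say the same thing: the difference quotient is at most the larger of the two rates raised to the power $t+1$, divided by the gap between the rates. A minimal numerical sketch, with hypothetical function names and an illustrative grid of rates in $(0,1)$:

```python
def diff_quotient(zeta, gamma, k):
    """(zeta^k - gamma^k) / (zeta - gamma), defined for zeta != gamma."""
    return (zeta**k - gamma**k) / (zeta - gamma)

def case_bound(zeta, gamma, k):
    """max(zeta, gamma)^k / |zeta - gamma|, covering both cases at once."""
    return max(zeta, gamma) ** k / abs(zeta - gamma)

def check_cases():
    """Return True if the bound holds on a small, hypothetical grid."""
    rates = (0.1, 0.3, 0.5, 0.7, 0.9, 0.99)
    ok = True
    for zeta in rates:
        for gamma in rates:
            if zeta == gamma:
                continue  # the quotient is undefined when the rates coincide
            for k in range(1, 51):
                lhs = diff_quotient(zeta, gamma, k)
                # Relative tolerance guards against floating-point rounding.
                ok = ok and lhs <= case_bound(zeta, gamma, k) * (1.0 + 1e-12)
    return ok

print(check_cases())
```

Here `gamma` stands in for $\tilde{\gamma}_{\mathrm{s}}$ and `k` for $t+1$.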

We have $\frac{\zeta^{t+1}-\tilde{\gamma}_{\mathrm{s}}^{t+1}}{\zeta-\tilde{\gamma}_{\mathrm{s}}}\leq\frac{\zeta_{0}^{t+1}}{|\zeta-\tilde{\gamma}_{\mathrm{s}}|}$, where $\zeta_{0}=\max\{\zeta,\tilde{\gamma}_{\mathrm{s}},1-\mu\alpha\}=\max\Big\{1-\alpha\mu,\sqrt{1-\frac{(1-\lambda_{2})p^{2}}{2\chi}}\Big\}$. Substituting (94) into (48), taking full expectation, and unrolling the recurrence, we have

$$\begin{aligned}
\mathbb{E}\!\left[\big\|\bar{\mathbf{e}}^{t+1}\big\|^{2}\right]
&\leq(1-\mu\alpha)\|\bar{\mathbf{e}}^{t}\|^{2}+\frac{2\alpha L\vartheta_{\mathrm{s}}}{n}\|\mathcal{E}_{\mathrm{s}}^{t}\|_{\mathrm{F}}^{2}+\frac{\alpha^{2}\sigma^{2}}{n}\\
&\leq(1-\mu\alpha)\|\bar{\mathbf{e}}^{t}\|^{2}+\frac{2\alpha L\vartheta_{\mathrm{s}}}{n}\Big(\tilde{\gamma}_{\mathrm{s}}^{t}\frac{\alpha^{2}}{1-\lambda_{2}}\|\nabla F(\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}+F_{1}\frac{\zeta_{0}^{t}}{|\zeta-\tilde{\gamma}_{\mathrm{s}}|}+\frac{F_{2}}{1-\tilde{\gamma}_{\mathrm{s}}}\Big)+\frac{\alpha^{2}\sigma^{2}}{n}\\
&\leq(1-\mu\alpha)\|\bar{\mathbf{e}}^{t}\|^{2}+\frac{2\alpha L\vartheta_{\mathrm{s}}\big(\frac{\alpha^{2}}{1-\lambda_{2}}\|\nabla F(\mathbf{X}^{\star})\|_{\mathrm{F}}^{2}+\nicefrac{F_{1}}{|\zeta-\tilde{\gamma}_{\mathrm{s}}|}\big)}{n}\zeta_{0}^{t}+\frac{2\alpha L\vartheta_{\mathrm{s}}F_{2}}{n(1-\tilde{\gamma}_{\mathrm{s}})}+\frac{\alpha^{2}\sigma^{2}}{n}\\
&\leq\zeta_{0}^{t}a_{0}+\frac{2LF_{2}\vartheta_{\mathrm{s}}}{n\mu(1-\tilde{\gamma}_{\mathrm{s}})}+\frac{\alpha\sigma^{2}}{n\mu}.
\end{aligned}$$
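The constant terms in the last line arise because a recurrence $e_{t+1}=(1-\mu\alpha)e_{t}+b$ accumulates at most $b\sum_{j\geq 0}(1-\mu\alpha)^{j}=b/(\mu\alpha)$, so each constant per-step term is divided by $\mu\alpha$. The sketch below illustrates only this steady-state part; the transient $\zeta_{0}^{t}$ term is left out, and every numerical value, as well as the name `c`, is a hypothetical placeholder.

```python
def run_recursion(mu, alpha, b, e0=1.0, steps=5000):
    """Iterate e_{t+1} = (1 - mu*alpha)*e_t + b and return the final value."""
    rate = 1.0 - mu * alpha
    e = e0
    for _ in range(steps):
        e = rate * e + b
    return e

# Hypothetical values, chosen only for illustration.
mu, alpha, sigma2, n = 0.5, 0.1, 1.0, 8
c = 0.3  # placeholder for 2*L*theta_s*F_2 / (n*(1 - gamma_tilde_s))

# Constant per-step terms of the recurrence: alpha*c + alpha^2*sigma^2/n.
b = alpha * c + alpha**2 * sigma2 / n

# Predicted floor after dividing by mu*alpha: c/mu + alpha*sigma^2/(n*mu).
floor = c / mu + alpha * sigma2 / (n * mu)

final = run_recursion(mu, alpha, b)
print(final, floor)
```

After enough iterations the initial error is forgotten and the iterate settles at the predicted floor, which is where the factors $1/\mu$ in the last line come from.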

Note that $\chi\geq\frac{72(1-p)}{1-\lambda_{2}}\Longrightarrow\tilde{\gamma}_{\mathrm{s}}\leq\frac{1+\gamma}{2}<1$. We have $\frac{1}{1-\tilde{\gamma}_{\mathrm{s}}}\leq\frac{8\chi}{1-\lambda_{2}}$. Since $\vartheta_{\mathrm{s}}=\|\mathbf{Q}^{\mathrm{s}}\|^{2}\|(\mathbf{Q}^{\mathrm{s}})^{-1}\|^{2}\leq\frac{8\chi^{2}}{p^{2}(1+\lambda_{n})}$ and $F_{2}=\frac{D_{1}n\alpha^{2}\sigma^{2}}{1-\zeta}+D_{2}n\alpha^{2}\sigma^{2}$, where

$$D_{1}=\frac{\alpha^{2}L^{2}(2+p^{2})}{2(1-\gamma)}+\frac{3\alpha^{2}L^{2}(1-p)(2+p^{2})}{2},\quad D_{2}=\frac{(2-p)(2+p^{2})}{2},$$

we have

$$\frac{2L\vartheta_{\mathrm{s}}F_{2}}{n\mu(1-\tilde{\gamma}_{\mathrm{s}})}\leq\mathcal{O}\left(\frac{\alpha^{4}\sigma^{2}L^{3}\chi^{4}}{\mu p^{2}(1-\lambda_{2})^{2}(1-\zeta)}+\frac{\alpha^{2}\sigma^{2}L\chi^{3}}{\mu p^{2}(1-\lambda_{2})}\right).$$
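The two facts about $\tilde{\gamma}_{\mathrm{s}}$ used in this estimate, namely $\tilde{\gamma}_{\mathrm{s}}\leq\frac{1+\gamma}{2}$ and $\frac{1}{1-\tilde{\gamma}_{\mathrm{s}}}\leq\frac{8\chi}{1-\lambda_{2}}$ whenever $\chi\geq\frac{72(1-p)}{1-\lambda_{2}}$, can be spot-checked numerically. This is a minimal sketch under stated assumptions: the function names and grid are hypothetical, and the extra floor $\chi\geq 1$ is an assumption added here so that the square root stays real when $p$ is close to 1.

```python
import math

def gamma_pair(chi, lam2, p):
    """Return (gamma, gamma_tilde_s) from their definitions."""
    gamma = math.sqrt(1.0 - (1.0 - lam2) / (2.0 * chi))
    gamma_tilde = gamma + 3.0 * (1.0 - p) * (2.0 + p**2) / chi**2
    return gamma, gamma_tilde

def check_gap_bound():
    """Return True if both bounds hold on a small, hypothetical grid."""
    ok = True
    for lam2 in (0.0, 0.5, 0.9, 0.99):
        for p in (0.05, 0.25, 0.5, 0.75, 1.0):
            # chi >= 1 is an added assumption (see lead-in), not from the source.
            chi_min = max(1.0, 72.0 * (1.0 - p) / (1.0 - lam2))
            for scale in (1.0, 3.0, 10.0):
                chi = scale * chi_min
                gamma, gamma_tilde = gamma_pair(chi, lam2, p)
                ok = ok and gamma_tilde <= (1.0 + gamma) / 2.0 + 1e-12
                inv_gap = 1.0 / (1.0 - gamma_tilde)
                ok = ok and inv_gap <= 8.0 * chi / (1.0 - lam2) * (1.0 + 1e-9)
    return ok

print(check_gap_bound())
```

The second bound follows from the first because $1-\gamma\geq\frac{1-\lambda_{2}}{4\chi}$, so halving the gap costs at most a factor of two.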

The linear speedup result (51) is thus proved. ∎