Differential error feedback for communication-efficient decentralized learning

Roula Nassif, Stefan Vlaski, Marco Carpentiero,
Vincenzo Matta, Ali H. Sayed
A short conference version of this work appears in [1]. This extended version includes proofs, derivations, and new results. R. Nassif is with Université Côte d’Azur, I3S Laboratory, CNRS, France. S. Vlaski is with Imperial College London, UK. M. Carpentiero and V. Matta are with the University of Salerno, Italy (e-mail: {mcarpentiero,vmatta}@unisa.it). A. H. Sayed is with the Institute of Electrical and Micro Engineering, EPFL, Switzerland.
Abstract

Communication-constrained algorithms for decentralized learning and optimization rely on local updates coupled with the exchange of compressed signals. In this context, differential quantization is an effective technique to mitigate the negative impact of compression by leveraging correlations between successive iterates. In addition, the use of error feedback, which consists of incorporating the compression error into subsequent steps, is a powerful mechanism to compensate for the bias caused by the compression. Under error feedback, performance guarantees in the literature have so far focused on algorithms employing a fusion center or a special class of contractive compressors that cannot be implemented with a finite number of bits. In this work, we propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback. The approach is specifically tailored for decentralized learning problems where agents have individual risk functions to minimize subject to subspace constraints that require the minimizers across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus or single-task optimization as special cases, and allows for more general task relatedness models such as multitask smoothness and coupled optimization. We show that, under some general conditions on the compression noise, and for sufficiently small step-sizes $\mu$, the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate: by reducing $\mu$, it is possible to keep the estimation errors small (on the order of $\mu$) without the bit rate growing unboundedly as $\mu\rightarrow 0$. The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.

Index Terms:
Error feedback, differential quantization, compression operator, decentralized subspace projection, single-task learning, multitask learning, mean-square-error analysis, bit rate analysis.

I Introduction

Data is increasingly being collected in a distributed and streaming manner, in an environment where communication and data privacy are becoming major concerns. In this context, centralized learning schemes with fusion centers are increasingly being replaced by new paradigms, such as federated and decentralized learning [2, 3, 4, 5, 6, 7, 8, 9]. In these approaches, each participating device (referred to as an agent or node) holds a local training dataset that is never uploaded to a server. The training data remains on the users’ devices, which act as agents performing local computations to learn global models of interest. In applications where communication with a server becomes a bottleneck, decentralized topologies (where agents only communicate with their neighbors) become attractive alternatives to federated topologies (where a server connects with all remote devices). These decentralized implementations reduce the communication burden since model updates are exchanged locally between agents without relying on a central coordinator [6, 7, 8, 9, 10, 11]. Studies have shown that decentralized approaches can be as efficient as centralized schemes when considering, for instance, steady-state mean-square-error performance [6, 12].

In traditional decentralized implementations, agents need to exchange (possibly high-dimensional and dense) parameter vectors at every iteration of the learning algorithm, leading to high communication costs. In modern distributed networks comprising a massive number of devices (e.g., thousands of participating smartphones), communication can be slower than local computation by many orders of magnitude (due to limited resources such as energy and bandwidth). Designers are typically limited by an upload bandwidth of 1 MB/s or less [2]. Therefore, if not addressed adequately, the scarcity of communication resources may limit the practical application of decentralized learning [13]. A variety of methods have been proposed to reduce the communication overhead of decentralized learning. These methods can be divided into two main categories. In the first, communication is reduced by skipping communication rounds while performing a certain number of local updates in between [2, 14, 15], thus trading off communication overhead, computation, and learning performance. In the second, information is compressed before being exchanged, by employing quantization (e.g., dithered quantization [16]), sparsification (e.g., top-$k$ or rand-$k$ sparsifiers [10]), or both (e.g., top-$k$ combined with dithering [17]). Compression operators and learning algorithms are then jointly designed to prevent the compression error from accumulating during the learning process and significantly deteriorating the performance of the decentralized approach [18, 19, 20, 21, 22, 10, 11, 23, 24, 25, 26, 27, 28]. Other works combine the two categories to further reduce the communication overhead [29, 30].
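To make the two sparsifiers mentioned above concrete, the following NumPy sketch implements top-$k$ (keep the $k$ largest-magnitude entries) and one common unbiased variant of rand-$k$ (keep $k$ uniformly chosen entries, rescaled by $d/k$). The function names and the $d/k$ rescaling are our illustrative choices; the exact operators used in [10] and [17] may differ.

```python
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k entries of x with the largest magnitude and zero out the rest.

    Deterministic and biased: the output is a fixed function of x that
    generally differs from x.
    """
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]  # indices of the k largest |x_j|
    out[idx] = x[idx]
    return out


def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Keep k uniformly chosen entries, scaled by d/k so that E[rand_k(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * x[idx]
    return out


x = np.array([0.3, -2.0, 0.1, 1.5, -0.4])
print(top_k(x, 2))  # keeps only -2.0 and 1.5
print(rand_k(x, 2, np.random.default_rng(0)))  # 2 random entries, scaled by 5/2
```

In practice only the $k$ retained values and their indices need to be transmitted.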

In this work, we introduce a new communication-efficient approach for decentralized learning. The approach exploits differential quantization and error feedback to mitigate the negative impact of compressed communications on the learning performance. Differential quantization is a common technique for mitigating the impact of compression by leveraging correlations between successive iterates. In this case, instead of communicating compressed versions of the iterates, agents communicate compressed versions of the differences between current estimates and their predictions based on previous iterations. Several recent works have focused on studying the benefits of differential quantization in the context of decentralized learning. For instance, the work [11] shows that, in a diminishing step-size regime, differential quantization can reduce communication overhead without significantly degrading the learning rate. In [31], it is shown that decentralized learning can achieve the same convergence rate as centralized learning in non-convex settings, under very high accuracy constraints on the compression operators. The constraints are relaxed in the study [27], which also assumes decentralized non-convex optimization. The work [29] studies the benefits of differential quantization and event-triggered communications. The analysis shows that compression only slightly affects the convergence rate when gradients are bounded. Similar results are established in [30] under a weaker bounded gradient dissimilarity assumption, and in [24] in the context of decentralized learning over directed graphs. The works [21, 22, 26] study the benefits of differential quantization without imposing any assumptions on the gradients, and by allowing for the use of combination matrices¹ that are not symmetric [21, 22] or that have matrix-valued entries [26]. While the works [11, 31, 21, 22, 26, 27, 29, 30, 24] focus on primal stochastic optimization techniques (which propagate and estimate primal variables), the works [10, 28] consider primal-dual techniques and the work [25] considers deterministic optimization.

¹As we will see, combination matrices in decentralized learning are used to control the exchange of information between neighboring agents.
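The following NumPy sketch illustrates the generic differential-quantization idea described above for a single link: sender and receiver maintain the same prediction $\hat{x}$ of the sender's iterate, only the quantized innovation $Q(x_i-\hat{x}_{i-1})$ is transmitted, and both sides update $\hat{x}_i=\hat{x}_{i-1}+Q(x_i-\hat{x}_{i-1})$. The rounding quantizer, its step `delta`, the class names, and the synthetic converging sequence are hypothetical choices for illustration; this is not the scheme proposed in this paper, which additionally uses error feedback.

```python
import numpy as np


def round_quantizer(v: np.ndarray, delta: float) -> np.ndarray:
    """Deterministic uniform quantizer with step delta (hypothetical choice)."""
    return delta * np.round(v / delta)


class DifferentialEncoder:
    """Sender side: transmits only the quantized innovation x_i - x_hat_{i-1}."""

    def __init__(self, dim: int, delta: float):
        self.x_hat = np.zeros(dim)  # prediction shared with the receiver
        self.delta = delta

    def encode(self, x: np.ndarray) -> np.ndarray:
        q = round_quantizer(x - self.x_hat, self.delta)
        self.x_hat = self.x_hat + q  # same update the receiver performs
        return q


class DifferentialDecoder:
    """Receiver side: rebuilds the prediction from the received innovations."""

    def __init__(self, dim: int):
        self.x_hat = np.zeros(dim)

    def decode(self, q: np.ndarray) -> np.ndarray:
        self.x_hat = self.x_hat + q
        return self.x_hat.copy()


delta = 0.01
enc, dec = DifferentialEncoder(3, delta), DifferentialDecoder(3)
target = np.array([1.0, -0.5, 2.0])
errors, innovation_sizes = [], []
for i in range(50):
    x_i = target * (1.0 - 0.9 ** (i + 1))  # iterates converging to target
    q = enc.encode(x_i)
    x_rec = dec.decode(q)
    errors.append(np.max(np.abs(x_i - x_rec)))
    innovation_sizes.append(np.max(np.abs(q)))
print(max(errors), innovation_sizes[0], innovation_sizes[-1])
```

Because the iterates converge, the transmitted innovations shrink over time, which is what makes it possible for a variable-rate code to spend fewer bits on them, while the reconstruction error stays within half a quantization step.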

Error feedback, on the other hand, consists of locally storing the compression error (i.e., the difference between the input and output of the compression operator) and incorporating it into the next iteration. This technique has been previously employed for stochastic gradient descent (SGD) algorithms. Specifically, it has been applied to the SignSGD algorithm in the single-agent context under 1-bit quantization [32], and to distributed SGD to handle biased compression operators [17]. In the context of decentralized learning, the DeepSqueeze approach in [33] uses error feedback without differential quantization.
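The error-feedback mechanism can be sketched for a single agent running compressed gradient descent, in the spirit of [17, 32] but not reproducing either exactly: the stored error $e$ is added to the scaled gradient before compression, and whatever the compressor drops is kept in $e$ for the next iteration. The top-$k$ compressor, the quadratic objective, the function names, the step size, and the iteration count are illustrative assumptions.

```python
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x (deterministic, biased compressor)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out


def ef_compressed_gd(grad, x0: np.ndarray, step: float, k: int, num_iters: int):
    """Compressed gradient descent with error feedback (single agent).

    The stored error e is added to the scaled gradient before compression;
    whatever top_k drops is kept in e and re-injected at the next iteration.
    """
    x = x0.astype(float)
    e = np.zeros_like(x)             # locally stored compression error
    for _ in range(num_iters):
        p = step * grad(x) + e       # error-corrected update
        c = top_k(p, k)              # compressed update actually applied
        e = p - c                    # compression error, fed back next time
        x = x - c
    return x, e


x_star = np.array([1.0, -2.0, 3.0, 0.5, -1.0])  # hypothetical minimizer


def grad(x: np.ndarray) -> np.ndarray:
    """Gradient of f(x) = 0.5 * ||x - x_star||^2."""
    return x - x_star


x_final, e_final = ef_compressed_gd(grad, np.zeros(5), step=0.05, k=1, num_iters=4000)
print(np.linalg.norm(x_final - x_star), np.linalg.norm(e_final))
```

One way to read the loop is that the virtual iterate $x-e$ follows uncompressed gradient descent exactly, since $(x-c)-(p-c)=(x-e)-\text{step}\cdot\nabla f(x)$; compression only delays part of each update rather than discarding it.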

In the current work, we show how to blend differential quantization and error feedback in order to obtain a communication-efficient decentralized learning algorithm. First, we describe in Sec. II the decentralized learning framework and the class of compression operators considered in the study. While most existing works on decentralized learning with compressed communications are focused on single-task or consensus algorithm design, the design in the current work goes beyond this traditional focus by allowing for both single-task and multitask implementations. In single-task learning, nodes collaborate to reach an agreement on a single parameter vector (also referred to as task or objective) despite having different local data distributions. Multitask learning, on the other hand, involves training multiple tasks simultaneously and exploiting their intrinsic relationship. This approach offers several advantages, including improved network performance, especially when the tasks share commonalities in their underlying features [9]. Compared with previous works, another contribution in the current study is the consideration of a general class of compression operators. Specifically, rather than being confined to the set of probabilistic unbiased operators as in [26, 21, 22, 10, 31], we allow for the use of biased (possibly deterministic) compression operators. Moreover, while most existing works assume that some quantities (e.g., the norm or some components of the vector to be quantized) are represented with very high precision (e.g., machine precision) and neglect the associated quantization error [10, 11, 28, 31, 21, 22, 27, 29, 30, 24, 33], the current work incorporates realistic quantization models into the compression process and shows how to effectively manage the errors and minimize their impact on the learning performance. In Secs. III and IV, we present and analyze the proposed learning strategy. 
While there exist several theoretical works investigating communication-efficient learning, the analysis in the current work is more general in the following sense. First, it considers a general class of compression schemes. Moreover, unlike the studies in [10, 11, 28, 25, 31, 27, 29, 30, 33], it does not require the combination matrices to be symmetric. We further allow the entries of the combination matrix to be matrix-valued (as opposed to scalar-valued as in traditional implementations) in order to solve general multitask optimization problems. Finally, we do not assume bounded gradients as in [11, 29, 24, 28, 33]. For ease of reference, the modeling conditions from this and related works are summarized in Table I. We establish in Sec. IV the mean-square-error stability of the proposed decentralized communication-efficient approach. In addition to investigating the mean-square-error stability, we characterize the steady-state average bit rate of the proposed approach when variable-rate quantizers are used. The analysis shows that, by properly designing the quantization operators, the iterates generated by the proposed approach lead to small estimation errors on the order of the step-size $\mu$ (as happens in the ideal case without compression), while concurrently guaranteeing a bounded average bit rate as $\mu\rightarrow 0$. Our theoretical findings show that, in the small step-size regime, the proposed strategy attains the performance achievable in the absence of compression, despite the use of a finite number of bits. This demonstrates the effectiveness of the approach in maintaining performance while reducing communication overhead. Finally, we present in Sec. V experimental results illustrating the theoretical findings and showing that blending differential compression and error feedback can achieve superior performance compared to state-of-the-art baselines.

TABLE I: Comparison of modeling assumptions for decentralized stochastic optimization studies. All works employ differential quantization without error feedback, except [33] and our work: [33] uses error feedback without differential quantization, while our work employs error feedback together with differential quantization. All works assume that some quantities are exchanged with very high precision, except [26] and this work. We use the symbol – in the last column for works that do not impose bounded gradient assumptions.
Reference | Stochastic optimization context | Combination matrix | Compression operator | Step-size | Gradient assumption
[11] | Primal, consensus-type | Symmetric | Can be deterministic & biased | Diminishing | Bounded
[10] | Primal-dual, consensus-type | Symmetric | Probabilistic & unbiased | Constant | –
[29] | Primal, consensus-type | Symmetric | Can be deterministic & biased | Diminishing | Bounded
[30] | Primal, consensus-type | Symmetric | Can be deterministic & biased | Constant | Bounded dissimilarities
[28] | Primal-dual, multitask | Symmetric | Can be deterministic & biased | Constant | Bounded
[31] | Primal, consensus-type | Symmetric | Probabilistic & unbiased | Constant | –
[27] | Primal, consensus-type | Symmetric | Can be deterministic & biased | Constant | –
[24] | Primal, consensus-type | Can be non-symmetric | Probabilistic & unbiased | Constant | Bounded
[33] | Primal, consensus-type | Symmetric | Can be deterministic & biased | Constant | Bounded dissimilarities
[21] | Primal, consensus-type | Can be non-symmetric | Probabilistic & unbiased | Constant | –
[26] | Primal, consensus-type & multitask | Can be non-symmetric, with matrix-valued block entries | Probabilistic & unbiased, no high-precision representations | Constant | –
This work | Primal, consensus-type & multitask | Can be non-symmetric, with matrix-valued block entries | Can be deterministic & biased, no high-precision representations | Constant | –

Notation: All vectors are column vectors. Random quantities are denoted in boldface. Matrices are denoted in uppercase letters while vectors and scalars are denoted in lowercase letters. The symbol $(\cdot)^{\top}$ denotes matrix transposition. The operator $\text{col}\{\cdot\}$ stacks the column vector entries on top of each other. The operator $\text{diag}\{\cdot\}$ forms a matrix from block arguments by placing each block immediately below and to the right of its predecessor. The symbol $\otimes$ denotes the Kronecker product. The ceiling and floor functions are denoted by $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$, respectively. The $M\times M$ identity matrix is denoted by $I_M$. The abbreviation “w.p.” is used for “with probability”. The notation $\alpha=O(\mu)$ signifies that there exist two positive constants $c$ and $\mu_0$ such that $|\alpha|\leq c\mu$ for all $\mu\leq\mu_0$. Vectors or matrices of all zeros are denoted by $0$.

II Problem setup

In this section, we formally state the decentralized optimization problem and introduce strategies, quantities, and assumptions that will be used in subsequent sections.

II-A Decentralized optimization under subspace constraints

We consider a connected graph (or network) $\mathcal{G}(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ denote the set of $K$ agents or nodes (labeled $k=1,\ldots,K$) and the set of possible communication links or edges, respectively. Let $w_k\in\mathbb{R}^{M_k}$ denote some parameter vector at agent $k$ and let $\mathcal{W}=\text{col}\{w_1,\ldots,w_K\}$ denote the $M$-dimensional vector (where $M=\sum_{k=1}^{K}M_k$) collecting the parameter vectors from across the network. We associate with each agent $k$ a differentiable convex risk $J_k(w_k):\mathbb{R}^{M_k}\rightarrow\mathbb{R}$, expressed as the expectation of some loss function $L_k(\cdot)$ and written as $J_k(w_k)=\mathbb{E}L_k(w_k;\boldsymbol{y}_k)$, where $\boldsymbol{y}_k$ denotes the random data at agent $k$. The expectation is computed relative to the distribution of the local data. In the stochastic setting, when the data distribution is unknown, the risks $J_k(\cdot)$ and their gradients $\nabla_{w_k}J_k(\cdot)$ are also unknown. In this case, instead of using the true gradient, it is common to use approximate gradient vectors based on the loss functions, such as $\widehat{\nabla_{w_k}J_k}(w_k)=\nabla_{w_k}L_k(w_k;\boldsymbol{y}_{k,i})$, where $\boldsymbol{y}_{k,i}$ represents the data observed at iteration $i$ [6, 34].
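As a concrete instance of the approximate gradient above, suppose (purely for illustration; the framework allows any differentiable convex loss) that agent $k$ observes data $\boldsymbol{y}_{k,i}=(\boldsymbol{u}_{k,i},\boldsymbol{d}_k(i))$ and uses the squared-error loss $L_k(w;y)=(d-u^\top w)^2$. The NumPy sketch below computes the single-sample gradient and checks that its average over many samples approaches the true gradient $\nabla_{w_k}J_k(w)=2(w-w^\star)$, which holds under the assumed model $d=u^\top w^\star+v$ with standard Gaussian regressors and zero-mean noise $v$ independent of the regressors. The names `w_star`, `loss`, and `stochastic_grad` are hypothetical.

```python
import numpy as np


def loss(w: np.ndarray, u: np.ndarray, d: float) -> float:
    """Hypothetical squared-error loss L_k(w; y) with data y = (u, d)."""
    return float((d - u @ w) ** 2)


def stochastic_grad(w: np.ndarray, u: np.ndarray, d: float) -> np.ndarray:
    """Gradient of the loss above w.r.t. w, evaluated at a single sample y_{k,i}."""
    return -2.0 * u * (d - u @ w)


rng = np.random.default_rng(0)
w_star = np.array([0.5, -1.0, 2.0])   # hypothetical true model
w = np.zeros(3)                        # current iterate w_{k,i-1}

# One sample y_{k,i} = (u, d): standard Gaussian regressor, noisy measurement.
u = rng.standard_normal(3)
d = u @ w_star + 0.1 * rng.standard_normal()
g_single = stochastic_grad(w, u, d)

# Averaging many single-sample gradients approximates the true gradient 2 (w - w_star).
N = 200_000
U = rng.standard_normal((N, 3))
D = U @ w_star + 0.1 * rng.standard_normal(N)
G = -2.0 * U * (D - U @ w)[:, None]   # row n equals stochastic_grad(w, U[n], D[n])
print(G.mean(axis=0), 2.0 * (w - w_star))
```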

In traditional single-task or consensus problems, agents need to agree on a common parameter vector (also called model or task) corresponding to the minimizer of the following weighted sum of individual risks:

$$w^{o}=\arg\min_{w\in\mathbb{R}^{M_c}}\ \frac{1}{K}\sum_{k=1}^{K}J_k(w), \qquad (1)$$

where $M_c$ represents a common vector length, i.e., in this case, the dimensions $M_k$ for all agents are identical and equal to $M_c$. Moreover, $w$ is the global parameter vector, which all agents need to agree upon. Each agent seeks to estimate $w^o$ through local computations and communications among neighboring agents without the need to know any of the risks or losses besides its own. Among many useful strategies that have been proposed in the literature [8, 6, 7, 35, 36, 37], diffusion strategies [8, 6, 7] are particularly attractive since they are scalable, robust, and enable continuous learning and adaptation in response to drifts in the location of the minimizer.

In this work, instead of considering the single-task formulation (1), we consider a generalization that allows the network to solve multitask optimization problems. Multitask learning is suitable for network applications where differences in the data distributions require more complex models and more flexible algorithms than single-task implementations. In multitask networks, agents generally need to estimate and track multiple distinct, though related, models or objectives. For instance, in distributed power system state estimation, the local state vectors to be estimated at neighboring control centers may overlap partially since the areas in a power system are interconnected [38]. Likewise, in weather forecasting applications, regional differences in the collected data distributions require agents to exploit the correlation profile in the data for enhanced decision rules [39]. Existing strategies to address multitask problems generally exploit prior knowledge on how the tasks across the network relate to each other [9]. For example, one way to model relationships among tasks is to formulate convex optimization problems with appropriate co-regularizers between neighboring agents [39, 9]. Another way to leverage the relationships among tasks is to constrain the model parameters to lie within certain subspaces that can represent for instance shared latent patterns that are common across tasks [40, 9, 41]. The choice between these techniques depends in general on the specific characteristics of the tasks and the desired trade-offs between model complexity and performance [9, 41].

In this work, we study decentralized learning under subspace constraints in the presence of compressed communications. Specifically, we consider inference problems of the form:

$$\begin{aligned}\mathcal{W}^{o}=\ &\arg\min_{\mathcal{W}}\ \sum_{k=1}^{K}J_k(w_k),\\ &\text{subject to}\ \ \mathcal{W}\in\text{Range}(\mathcal{U})\end{aligned} \qquad (2)$$

where the matrix $\mathcal{U}$ is an $M\times P$ full-column rank matrix (with $P\ll M$) assumed to be semi-unitary, i.e., its columns are orthonormal ($\mathcal{U}^{\top}\mathcal{U}=I_P$). By using the stochastic gradient of the individual risk $J_k(w_k)$, agent $k$ seeks to estimate the $k$-th $M_k\times 1$ subvector $w^o_k$ of the network vector $\mathcal{W}^o=\text{col}\{w^o_1,\ldots,w^o_K\}$, which is required to lie in a low-dimensional subspace. While the objective in (2) is additively separable, the subspace constraint couples the models across the agents. Constrained formulations of the form (2) have been studied previously in decentralized settings where communication constraints are absent [40, 42]. As explained in [9, 42] and [40, Sec. II], by properly selecting the matrix $\mathcal{U}$, formulation (2) can be tailored to address a wide range of optimization problems encountered in network applications. Examples include i) consensus or single-task optimization (where the agents’ objective is to reach consensus on the minimizer $w^o$ in (1)), ii) decentralized coupled optimization (where the parameter vectors to be estimated at neighboring agents partially overlap) [43, 41, 44, 38], and iii) multitask inference under smoothness (where the network parameter vector $\mathcal{W}$ to be estimated is required to be smooth w.r.t. the underlying network topology) [40, 39]. For instance, setting $\mathcal{U}=\frac{1}{\sqrt{K}}(\mathds{1}_K\otimes I_{M_c})$ in (2), where $\mathds{1}_K$ is the $K\times 1$ vector of all ones, we obtain an optimization problem equivalent to the consensus problem (1), since $\text{Range}(\mathcal{U})$ consists exactly of the vectors $\mathcal{W}$ with $w_1=\cdots=w_K$, and scaling the objective by $1/K$ does not change its minimizer. While projecting onto the space spanned by the vector of all ones enforces consensus across the network, graph smoothness can in general be promoted by projecting onto the space spanned by the eigenvectors of the graph Laplacian corresponding to small eigenvalues [9, 40, 42].
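The two choices of $\mathcal{U}$ mentioned above can be built numerically. The NumPy sketch below forms the consensus basis $\frac{1}{\sqrt{K}}(\mathds{1}_K\otimes I_{M_c})$ and a smoothness basis obtained from the Laplacian eigenvectors with the smallest eigenvalues, then checks semi-unitarity and shows that, in the consensus case, $\mathcal{P}_{\mathcal{U}}=\mathcal{U}\mathcal{U}^\top$ replaces each agent's block by the network average. The path-graph topology, $K$, $M_c$, the function names, and the number of retained eigenvectors are hypothetical choices; the general constructions are discussed in [9, 40, 42].

```python
import numpy as np


def consensus_basis(K: int, Mc: int) -> np.ndarray:
    """U = (1/sqrt(K)) (1_K ⊗ I_Mc), a (K*Mc) x Mc semi-unitary matrix."""
    return np.kron(np.ones((K, 1)), np.eye(Mc)) / np.sqrt(K)


def smoothness_basis(laplacian: np.ndarray, Mc: int, num_eig: int) -> np.ndarray:
    """Laplacian eigenvectors with the num_eig smallest eigenvalues, Kronecker I_Mc."""
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return np.kron(eigvecs[:, :num_eig], np.eye(Mc))


K, Mc = 4, 2
U = consensus_basis(K, Mc)
P_U = U @ U.T                         # orthogonal projector onto Range(U)

W = np.arange(K * Mc, dtype=float)    # stacked col{w_1, ..., w_K}
W_proj = P_U @ W                      # every block becomes the average of the w_k
print(W_proj.reshape(K, Mc))

# Hypothetical path graph 1-2-3-4 and its Laplacian L = D - Adj.
Adj = np.diag(np.ones(K - 1), 1) + np.diag(np.ones(K - 1), -1)
Lap = np.diag(Adj.sum(axis=1)) - Adj
U_smooth = smoothness_basis(Lap, Mc, num_eig=2)  # (K*Mc) x (2*Mc) basis of smooth signals
```

With a single retained eigenvector (eigenvalue $0$ for a connected graph), the smoothness basis spans the same subspace as the consensus basis.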

II-B Decentralized diffusion-based approach

To solve problem (2) in a decentralized manner, we consider the primal approach proposed and studied in [40, 12], namely,

$$\boldsymbol{\psi}_{k,i}=\boldsymbol{w}_{k,i-1}-\mu\,\widehat{\nabla_{w_k}J_k}(\boldsymbol{w}_{k,i-1}) \qquad (3a)$$
$$\boldsymbol{w}_{k,i}=\sum_{\ell\in\mathcal{N}_k}A_{k\ell}\,\boldsymbol{\psi}_{\ell,i} \qquad (3b)$$

where $\mathcal{N}_k$ denotes the set of nodes connected to agent $k$ by a communication link (including node $k$ itself) and $\mu>0$ is a small step-size parameter. Note that the information sharing across agents in (3) is implemented by means of a $K\times K$ block combination matrix $\mathcal{A}=[A_{k\ell}]$ that has a zero $M_k\times M_\ell$ block element $(k,\ell)$ if nodes $k$ and $\ell$ are not neighbors, i.e., $A_{k\ell}=0$ if $\ell\notin\mathcal{N}_k$, and satisfies the following conditions [40]:

$$\mathcal{A}\,\mathcal{U}=\mathcal{U},\quad \mathcal{U}^{\top}\mathcal{A}=\mathcal{U}^{\top},\quad\text{and}\quad \rho(\mathcal{A}-\mathcal{P}_{\mathcal{U}})<1, \qquad (4)$$

where $\rho(\cdot)$ denotes the spectral radius of its matrix argument, and $\mathcal{P}_{\mathcal{U}}=\mathcal{U}\mathcal{U}^{\top}$ is the orthogonal projection matrix onto $\text{Range}(\mathcal{U})$. It is shown in [12, 42] how combination matrices $\mathcal{A}$ satisfying (4) can be constructed. If strategy (3) is employed to solve the single-task problem (1), we can select the combination matrix $\mathcal{A}$ in the form $A\otimes I_{M_c}$, where $A=[a_{k\ell}]$ is a $K\times K$ doubly-stochastic matrix satisfying:

$$a_{k\ell}\geq 0,\qquad A\mathds{1}_K=\mathds{1}_K,\qquad \mathds{1}_K^{\top}A=\mathds{1}_K^{\top},\qquad a_{k\ell}=0\ \text{if}\ \ell\notin\mathcal{N}_k. \tag{5}$$

In this case, conditions (4) are satisfied for $\mathcal{U}=\frac{1}{\sqrt{K}}(\mathds{1}_K\otimes I_{M_c})$ [40], and strategy (3) reduces to the standard diffusion Adapt-Then-Combine (ATC) approach [6, 7, 8]:

$$\begin{aligned}
\boldsymbol{\psi}_{k,i}&=\boldsymbol{w}_{k,i-1}-\mu\,\widehat{\nabla_{w_k}J_k}(\boldsymbol{w}_{k,i-1}) &&\text{(6a)}\\
\boldsymbol{w}_{k,i}&=\sum_{\ell\in\mathcal{N}_k}a_{k\ell}\,\boldsymbol{\psi}_{\ell,i} &&\text{(6b)}
\end{aligned}$$

This fact motivates the use of the terminology “diffusion Adapt-Then-Combine (ATC) approach” in the sequel when referring to the decentralized strategy (3) for solving general constrained optimization problems of the form (2).
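As a numerical sanity check of (4) and (5), the sketch below builds a doubly-stochastic matrix $A$ with the Metropolis rule on a ring of $K=6$ agents and verifies that $\mathcal{A}=A\otimes I_{M_c}$ and $\mathcal{U}=\frac{1}{\sqrt{K}}(\mathds{1}_K\otimes I_{M_c})$ satisfy conditions (4). The Metropolis rule is one standard construction assumed here for illustration (not necessarily the one used in [12, 42]); the ring graph, the values $K=6$ and $M_c=2$, and the helper names `metropolis_weights` and `check_conditions` are hypothetical.

```python
import numpy as np

def metropolis_weights(adj):
    """Hypothetical helper: Metropolis rule a_kl = 1/max(n_k, n_l), n_k = degree + 1."""
    K = adj.shape[0]
    deg = adj.sum(axis=1)  # number of neighbors, excluding the agent itself
    A = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            if adj[k, l]:
                A[k, l] = 1.0 / (1.0 + max(deg[k], deg[l]))
        A[k, k] = 1.0 - A[k].sum()  # self-weight so that each row sums to one
    return A

def check_conditions(A_big, U):
    """Hypothetical helper: return (A U == U, U^T A == U^T, rho(A - P_U)) for (4)."""
    P_U = U @ U.T  # orthogonal projector onto Range(U); U has orthonormal columns
    # A_big - P_U is symmetric here because A is symmetric, so eigvalsh applies
    rho = np.max(np.abs(np.linalg.eigvalsh(A_big - P_U)))
    return np.allclose(A_big @ U, U), np.allclose(U.T @ A_big, U.T), rho

K, Mc = 6, 2
adj = np.zeros((K, K), dtype=int)
for k in range(K):  # ring: agent k is linked to agents k-1 and k+1
    adj[k, (k + 1) % K] = adj[(k + 1) % K, k] = 1

A = metropolis_weights(adj)
A_big = np.kron(A, np.eye(Mc))                          # A ⊗ I_{Mc}
U = np.kron(np.ones((K, 1)), np.eye(Mc)) / np.sqrt(K)   # (1/sqrt(K)) (1_K ⊗ I_{Mc})
c1, c2, rho = check_conditions(A_big, U)
print(c1, c2, round(rho, 6))  # True True 0.666667
```

For this ring, $A$ is circulant with eigenvalues $\{1, 2/3, 2/3, 0, 0, -1/3\}$; subtracting $\mathcal{P}_{\mathcal{U}}$ removes the eigenvalue at $1$, so the spectral radius in (4) evaluates to $2/3$.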

The first step (3a) in the ATC approach (3) is the self-learning step, which corresponds to a stochastic gradient descent step on the individual risk $J_k(\cdot)$. This step is followed by the social learning step (3b), where agent $k$ receives the intermediate estimates $\{\boldsymbol{\psi}_{\ell,i}\}$ from its neighbors $\ell\in\mathcal{N}_k$ and combines them through $\{A_{k\ell}\}$ to form $\boldsymbol{w}_{k,i}$, which corresponds to the estimate of $w^o_k$ at agent $k$ and iteration $i$. To alleviate the communication bottleneck resulting from the exchange of the intermediate estimates among agents over many iterations, compressed communication must be considered. Before presenting the communication-efficient variant of the ATC approach (3), we describe the class of compression operators considered in this study.
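To make the two steps concrete, the following minimal sketch runs the ATC recursion (6a)–(6b) on a hypothetical single-task linear regression problem: every agent streams pairs $(d, u)$ with $d=u^\top w^o+\text{noise}$ and uses the instantaneous gradient of $J_k(w)=\frac{1}{2}\mathbb{E}(d-u^\top w)^2$. The data model, the ring topology with uniform weights $1/3$, the step-size, and all other numerical values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, mu, iters = 6, 2, 0.01, 2000
w_o = np.array([1.0, -0.5])  # hypothetical common minimizer (single-task case)

# Ring graph, weight 1/3 on the agent and on each of its two neighbors (doubly stochastic)
A = np.zeros((K, K))
for k in range(K):
    for l in (k - 1, k, k + 1):
        A[k, l % K] = 1.0 / 3.0

W = np.zeros((K, M))  # row k holds w_{k,i-1}
for i in range(iters):
    # Hypothetical streaming data at every agent: d = u^T w_o + noise
    Ureg = rng.standard_normal((K, M))
    d = Ureg @ w_o + 0.1 * rng.standard_normal(K)
    # (6a) adaptation: stochastic-gradient step on J_k(w) = 0.5 * E(d - u^T w)^2
    err = d - np.sum(Ureg * W, axis=1)   # d_k - u_k^T w_{k,i-1}
    Psi = W + mu * Ureg * err[:, None]   # psi_{k,i}
    # (6b) combination: w_{k,i} = sum_l a_{kl} psi_{l,i}
    W = A @ Psi

msd = np.mean(np.sum((W - w_o) ** 2, axis=1))
print(f"network MSD after {iters} iterations: {msd:.2e}")
```

Only the intermediate estimates `Psi` cross the links in step (6b); these are the quantities whose exchange the compressed variant aims to make cheaper.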

II-C Compression operator

For the sake of clarity, we first introduce the following formal definitions for key concepts relating to data compression.

Definition 1.

(Compression operator). Let $L$ represent a generic vector length. A compression operator $\boldsymbol{\mathcal{C}}:\mathbb{R}^L\rightarrow\mathbb{R}^L$ associates to every input $x\in\mathbb{R}^L$ a random quantity $\boldsymbol{\mathcal{C}}(x)\in\mathbb{R}^L$ that is governed by the conditional probability measure $\mathbb{P}(\cdot|x)$.

Note that the above family of compression operators includes deterministic mappings as a particular case.

Definition 2.

(Bounded-distortion compression operator). A bounded-distortion compression operator is a compression operator that fulfills the property:

$$\mathbb{E}\|x-\boldsymbol{\mathcal{C}}(x)\|^2\leq\beta_c^2\|x\|^2+\sigma_c^2, \tag{7}$$

for some $\beta_c^2\geq 0$ and $\sigma_c^2\geq 0$, where the expectation is taken over the conditional probability measure $\mathbb{P}(\cdot|x)$.

Definition 3.

(Unbiased compression operator). An unbiased compression operator is a compression operator that fulfills the property:

$$\mathbb{E}[\boldsymbol{\mathcal{C}}(x)]=x. \tag{8}$$
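Definitions 2 and 3 can be checked empirically. The sketch below (assuming NumPy; the helper name `rand_c` and the values $L=10$, $c=3$ are illustrative) draws many realizations of the rand-$c$ operator of Table II and estimates $\mathbb{E}\|x-\boldsymbol{\mathcal{C}}(x)\|^2/\|x\|^2$ and $\|\mathbb{E}[\boldsymbol{\mathcal{C}}(x)]-x\|$ for the biased ($\alpha=1$) and unbiased ($\alpha=L/c$) versions. For rand-$c$, (7) holds with $\sigma_c^2=0$ and with equality in expectation, so the estimated ratio should be close to $\beta_c^2=\alpha(1-c/L)$, while only the unbiased version should satisfy (8).

```python
import numpy as np

def rand_c(x, c, rng, unbiased):
    """Hypothetical helper for rand-c (Table II): keep c random coordinates, scale by alpha."""
    L = x.size
    alpha = L / c if unbiased else 1.0
    out = np.zeros_like(x)
    idx = rng.choice(L, size=c, replace=False)  # the random index set Omega_c
    out[idx] = alpha * x[idx]
    return out

rng = np.random.default_rng(1)
L, c, trials = 10, 3, 10000
x = rng.standard_normal(L)
results = {}
for unbiased in (False, True):
    C = np.array([rand_c(x, c, rng, unbiased) for _ in range(trials)])
    ratio = np.mean(np.sum((x - C) ** 2, axis=1)) / np.sum(x**2)  # E||x-C(x)||^2 / ||x||^2
    beta2 = (L / c if unbiased else 1.0) * (1 - c / L)             # beta_c^2 in Table II
    bias = np.linalg.norm(C.mean(axis=0) - x)                      # ||E[C(x)] - x||
    results[unbiased] = (ratio, beta2, bias)
    print(f"unbiased={unbiased}: ratio~{ratio:.3f}, beta_c^2={beta2:.3f}, bias~{bias:.3f}")
```

The biased version has a smaller distortion ($1-c/L$ versus $L/c-1$ here) but a systematic shrinkage $\mathbb{E}[\boldsymbol{\mathcal{C}}(x)]=\frac{c}{L}x$, which illustrates the trade-off between (7) and (8).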

In this study, we consider bounded-distortion compression operators. Table II lists bounded-distortion operators commonly used in decentralized learning, together with the corresponding compression noise parameters $\beta_c^2$ and $\sigma_c^2$ and the bit-budget required to encode an input vector $x\in\mathbb{R}^L$. Comparing the reported schemes, we observe that rand-$c$, randomized gossip, and top-$c$ can be viewed as sparsifiers that map a full vector into a sparse version thereof. For instance, the rand-$c$ scheme randomly selects $c$ components of the input vector and encodes them with very high precision (32 or 64 bits are typical values for encoding a scalar). These bits are then communicated over the links, in addition to the bits encoding the locations of the selected entries. The QSGD scheme, on the other hand, encodes the norm of the input vector with very high precision; in addition to the norm, $L$ bits are used to encode the signs of the input vector components and $L\lceil\log_2(s)\rceil$ bits to encode the levels. In comparison, the probabilistic uniform and probabilistic ANQ quantizers do not require a high-precision representation of any specific variable. In the following, we highlight the key facts regarding the compression operators considered in this study.
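The bit-budget column of Table II can be evaluated directly. The following sketch assumes $B_{\text{HP}}=32$ and an illustrative length $L=1000$, and compares the cost of sending a vector without compression, with rand-$c$ or top-$c$ ($c=50$), and with QSGD ($s=16$ levels); the function names are hypothetical.

```python
import math

B_HP = 32  # assumed number of bits for a high-precision scalar (32 or 64 are typical)

def bits_no_compression(L):
    return L * B_HP

def bits_rand_c(L, c):
    # c high-precision values plus ceil(log2 L) bits for each selected position
    return c * B_HP + c * math.ceil(math.log2(L))

def bits_top_c(L, c):
    return bits_rand_c(L, c)  # same budget as rand-c in Table II

def bits_qsgd(L, s):
    # one high-precision norm, L sign bits, ceil(log2 s) bits per level index
    return B_HP + L + L * math.ceil(math.log2(s))

L = 1000
print(bits_no_compression(L))  # 32000
print(bits_rand_c(L, 50))      # 50*32 + 50*10 = 2100
print(bits_qsgd(L, 16))        # 32 + 1000 + 4000 = 5032
```

Every entry except the quantizers in rows 2 and 3 contains a $B_{\text{HP}}$ term, which is the high-precision requirement discussed in the paragraph above.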

TABLE II: Examples of bounded-distortion compression operators. For each scheme, we report the compression rule, the parameters $\beta_c^2$ and $\sigma_c^2$ in (7), and the bit-budget. $B_{\text{HP}}$ denotes the number of bits required to encode a scalar with high precision (typical values for $B_{\text{HP}}$ are 32 or 64). The top-$c$ sparsifier is deterministic. The probabilistic uniform, probabilistic ANQ, and QSGD quantizers, as well as the unbiased versions of rand-$c$ and randomized gossip, are unbiased.
| Name | Rule | $\beta_c^2$ | $\sigma_c^2$ | Bit-budget |
|---|---|---|---|---|
| No compression [40] | $\boldsymbol{\mathcal{C}}(x)=x$ | $0$ | $0$ | $LB_{\text{HP}}$ |
| Probabilistic uniform or dithered quantizer [16, 23] | $[\boldsymbol{\mathcal{C}}(x)]_j=\Delta\cdot\boldsymbol{n}(x_j)$, where $\boldsymbol{n}(x_j)=m$ w.p. $\frac{(m+1)\Delta-x_j}{\Delta}$ and $\boldsymbol{n}(x_j)=m+1$ w.p. $\frac{x_j-m\Delta}{\Delta}$, with $m=\lfloor x_j/\Delta\rfloor$; $\Delta$ is the quantization step | $0$ | $\frac{L\Delta^2}{4}$ | $\boldsymbol{r}(x)$ defined in (9) |
| Probabilistic ANQ [25] | $[\boldsymbol{\mathcal{C}}(x)]_j=y_{\boldsymbol{n}(x_j)}$, where $\boldsymbol{n}(x_j)=m$ w.p. $\frac{y_{m+1}-x_j}{y_{m+1}-y_m}$ and $\boldsymbol{n}(x_j)=m+1$ w.p. $\frac{x_j-y_m}{y_{m+1}-y_m}$, with $m=\left\lfloor\text{sign}(x_j)\frac{\ln\left(1+\frac{\omega}{\eta}\lvert x_j\rvert\right)}{2\ln\left(\omega+\sqrt{1+\omega^2}\right)}\right\rfloor$ and $y_m=\text{sign}(m)\frac{\eta}{\omega}\left[\left(\omega+\sqrt{1+\omega^2}\right)^{2\lvert m\rvert}-1\right]$; $\omega$ and $\eta$ are two non-negative design parameters | $2\omega^2$ | $2L\eta^2$ | $\boldsymbol{r}(x)$ defined in (9) |
| Rand-$c$ [10, 11] | $[\boldsymbol{\mathcal{C}}(x)]_j=\alpha x_j$ if $j\in\Omega_c$ and $0$ otherwise, where $\alpha=1$ (biased version) or $\alpha=\frac{L}{c}$ (unbiased version); $\Omega_c$ is a set of $c$ randomly selected coordinates, $c\in\{1,\ldots,L\}$ | $\alpha\left(1-\frac{c}{L}\right)$ | $0$ | $cB_{\text{HP}}+c\lceil\log_2(L)\rceil$ |
| Randomized gossip [11] | $\boldsymbol{\mathcal{C}}(x)=\alpha x$ w.p. $q$ and $\boldsymbol{\mathcal{C}}(x)=0$ w.p. $1-q$, where $\alpha=1$ (biased version) or $\alpha=\frac{1}{q}$ (unbiased version); $q\in(0,1]$ is the transmission probability | $\alpha(1-q)$ | $0$ | $LB_{\text{HP}}q$ (on average) |
| QSGD [4] | $[\boldsymbol{\mathcal{C}}(x)]_j=\lVert x\rVert\cdot\text{sign}(x_j)\cdot\frac{\boldsymbol{n}(x_j,x)}{s}$, where $\boldsymbol{n}(x_j,x)=m$ w.p. $(m+1)-\frac{\lvert x_j\rvert}{\lVert x\rVert}s$ and $\boldsymbol{n}(x_j,x)=m+1$ w.p. $\frac{\lvert x_j\rvert}{\lVert x\rVert}s-m$, with $m=\left\lfloor s\frac{\lvert x_j\rvert}{\lVert x\rVert}\right\rfloor$; $s$ is the number of quantization levels | $\min\left(\frac{L}{s^2},\frac{\sqrt{L}}{s}\right)$ | $0$ | $B_{\text{HP}}+L+L\lceil\log_2(s)\rceil$ |
| Top-$c$ sparsifier [11] | $[\mathcal{C}(x)]_j=x_j$ if $j\in\Omega_c$ and $0$ otherwise; $\Omega_c$ is the set of the $c$ coordinates with highest magnitude, $c\in\{1,\ldots,L\}$ | $1-\frac{c}{L}$ | $0$ | $cB_{\text{HP}}+c\lceil\log_2(L)\rceil$ |
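The sketch below implements two of the randomized rules of Table II, the probabilistic uniform quantizer and QSGD, assuming NumPy (the function names and the values $L=8$, $\Delta=0.5$, $s=4$ are illustrative), and estimates by Monte Carlo the distortion $\mathbb{E}\|x-\boldsymbol{\mathcal{C}}(x)\|^2$ and the bias $\|\mathbb{E}[\boldsymbol{\mathcal{C}}(x)]-x\|$. In both rules, the probabilities of selecting the two neighboring levels are chosen so that the output is unbiased, and the distortion should stay below $\beta_c^2\|x\|^2+\sigma_c^2$ with the parameters listed in the table.

```python
import numpy as np

def prob_uniform(x, delta, rng):
    """Probabilistic (dithered) uniform quantizer of Table II, applied entrywise."""
    m = np.floor(x / delta)
    p_up = x / delta - m                   # (x_j - m*delta) / delta, in [0, 1)
    n = m + (rng.random(x.shape) < p_up)   # n(x_j) = m + 1 with probability p_up
    return delta * n

def qsgd(x, s, rng):
    """QSGD quantizer of Table II with s quantization levels."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    a = s * np.abs(x) / norm
    m = np.floor(a)
    n = m + (rng.random(x.shape) < a - m)  # n(x_j, x) = m + 1 with probability a - m
    return norm * np.sign(x) * n / s

rng = np.random.default_rng(2)
L, delta, s, trials = 8, 0.5, 4, 20000
x = rng.standard_normal(L)
Cu = np.array([prob_uniform(x, delta, rng) for _ in range(trials)])
Cq = np.array([qsgd(x, s, rng) for _ in range(trials)])

results = {}
for name, C, bound in (
    ("uniform", Cu, L * delta**2 / 4),                             # beta_c^2 = 0
    ("qsgd", Cq, min(L / s**2, np.sqrt(L) / s) * np.sum(x**2)),  # sigma_c^2 = 0
):
    mse = np.mean(np.sum((x - C) ** 2, axis=1))
    bias = np.linalg.norm(C.mean(axis=0) - x)
    results[name] = (mse, bound, bias)
    print(f"{name}: E||x - C(x)||^2 ~ {mse:.4f} (bound {bound:.4f}), bias ~ {bias:.4f}")
```

The uniform quantizer has a purely absolute distortion (per entry at most $\Delta^2/4$, whatever the magnitude of $x_j$), whereas QSGD has a purely relative one that scales with $\|x\|^2$, which mirrors the $\{\beta_c^2,\sigma_c^2\}$ columns of the table.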

II-C1 Allowing for an absolute compression noise term

Many existing works study decentralized learning in the presence of bounded-distortion compression operators that satisfy condition (7) with the absolute noise term $\sigma_c^2=0$ [10, 11, 28, 31, 21, 22, 27, 29, 30, 24, 33]. In contrast, the analysis in the current work accounts for both the relative (captured through $\beta_c^2$) and the absolute (captured through $\sigma_c^2$) compression noise terms. As explained in [25], neglecting the effect of $\sigma_c^2$ requires that some quantities (e.g., the norm of the vector in QSGD) be represented with no quantization error, i.e., in practice at machine precision. We refer the reader to [26] for a framework for designing randomized compression operators that do not require high-precision quantization of specific variables. In particular, Sec. II in [26] describes the design of the probabilistic uniform and ANQ quantizers endowed with a variable-rate coding scheme from [25] that adapts the bit rate to the quantizer input. When these rules, which are listed in Table II (rows 2 and 3), are applied entrywise to a vector $x\in\mathbb{R}^L$, the overall (random) bit budget is equal to [26]:

$$\boldsymbol{r}(x)=\log_2(3)\sum_{j=1}^{L}\left(1+\left\lceil\log_2\left(|\boldsymbol{n}(x_j)|+1\right)\right\rceil\right). \tag{9}$$
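Expression (9) can be evaluated directly from the integer indices $\boldsymbol{n}(x_j)$ produced by the probabilistic uniform quantizer. The following pure-Python sketch (function names hypothetical, input vector and step values illustrative) computes $\boldsymbol{r}(x)$ exactly as written in (9) and shows how the budget grows as the quantization step $\Delta$ shrinks, since finer steps produce indices of larger magnitude.

```python
import math
import random

def bit_budget(n):
    """Evaluate r(x) in (9) from the integer indices n(x_1), ..., n(x_L)."""
    return math.log2(3) * sum(1 + math.ceil(math.log2(abs(nj) + 1)) for nj in n)

def prob_uniform_indices(x, delta, rng):
    """Hypothetical helper: indices n(x_j) of the probabilistic uniform quantizer."""
    n = []
    for xj in x:
        m = math.floor(xj / delta)
        p_up = xj / delta - m  # probability of rounding up to m + 1
        n.append(m + 1 if rng.random() < p_up else m)
    return n

print(bit_budget([0, 1, -3, 4]))  # 10 * log2(3), about 15.85

rng = random.Random(0)
x = [0.3, -1.2, 2.5, 0.0]  # illustrative input vector
budgets = {}
for delta in (0.5, 0.1, 0.01):
    budgets[delta] = bit_budget(prob_uniform_indices(x, delta, rng))
    print(f"delta={delta}: r(x) = {budgets[delta]:.2f} bits")
```

A zero index still costs $\log_2(3)$ bits, and each doubling of $|\boldsymbol{n}(x_j)|$ adds roughly $\log_2(3)$ more, so the budget depends on how large the quantizer input is relative to $\Delta$.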

II-C2 Allowing for the use of biased compression operators

Although the communication-efficient approach developed in [26] can be used for solving decentralized learning under subspace constraints and makes no assumptions about encoding quantities with high precision, it is not designed to handle biased compression operators, i.e., operators that do not satisfy the unbiasedness condition (8). As we will see, by explicitly incorporating the error feedback mechanism into the differential quantization approach proposed in [26], we can address biased compression by filtering the compression error over time. This is relevant because, in general, biased compression operators tend to outperform their unbiased counterparts [17].
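The basic idea of error feedback can be illustrated in isolation. The sketch below is the classical error-feedback recursion (store the part of the input that was not transmitted and add it back before the next compression), not the DEF-ATC rule proposed in this work, which additionally relies on differential quantization and a damping parameter. A fixed vector is compressed repeatedly with the biased top-1 sparsifier; the input vector, the horizon, and the stable-sort tie-breaking rule are illustrative assumptions.

```python
import numpy as np

def top_c(x, c):
    """Top-c sparsifier: keep the c largest-magnitude entries (stable tie-breaking)."""
    out = np.zeros_like(x)
    idx = np.argsort(-np.abs(x), kind="stable")[:c]
    out[idx] = x[idx]
    return out

g = np.array([3.0, 2.0, 1.0])  # the same input is compressed at every iteration
T = 3000
sent_plain = np.zeros_like(g)  # running sum of transmitted vectors, no feedback
sent_ef = np.zeros_like(g)     # running sum of transmitted vectors, with feedback
e = np.zeros_like(g)           # error-feedback memory: what has not been sent yet
for _ in range(T):
    sent_plain += top_c(g, 1)
    v = g + e                  # add back the error left over from previous rounds
    q = top_c(v, 1)
    e = v - q                  # store the new compression error
    sent_ef += q

print(sent_plain / T)  # [3. 0. 0.]: the last two coordinates are never transmitted
print(sent_ef / T)     # close to [3. 2. 1.]
```

Summing the recursion gives $\sum_{t=1}^{T}q_t = Tg - e_T$, so the time average of the transmitted vectors equals $g - e_T/T$. Because top-$c$ is contractive, the memory $e_T$ stays bounded and the bias of the compressor is averaged out over time.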

While Table II provides several examples of interest, the list is not exhaustive: other meaningful schemes can be obtained by concatenating compression operators, as illustrated next.

Example 1.

(Concatenation of compression schemes). In this example, we investigate a specific type of compression operator obtained by concatenating two distinct bounded-distortion compression operators. The first operation, the top-$c$ sparsifier, retains only the $c$ largest-magnitude components of the input vector. The second operation quantizes the output of the top-$c$ sparsifier. This concatenation is particularly noteworthy because it tends to require the fewest bits for representation: it exploits the sparsity induced by the sparsifier and concentrates the quantization effort on the most significant components of the input. Numerical results illustrating the benefits of the concatenation are provided in Sec. V.

Definition 4.

(Top-$c$ quantizer). Let $\boldsymbol{\mathcal{Q}}(\cdot)$ be a bounded-distortion compression operator with parameters $\beta_q^2$ and $\sigma_q^2$. Let $\mathcal{S}(\cdot)$ be the top-$c$ sparsifier, i.e., the deterministic compression operator listed in Table II (row 7), which can also be defined as [45, 46]:

$$[\mathcal{S}(x)]_j=\begin{cases}x_j, & \text{if } j\in\mathcal{I}_c,\\ 0, & \text{otherwise},\end{cases} \tag{10}$$

where $\mathcal{I}_c$ is the set containing the indices of the $c$ largest-magnitude components of $x$. In case of ties (i.e., when the $c$ largest-magnitude components are not uniquely determined), any tie-breaking rule is permitted. The top-$c$ quantizer operator is defined as:

$$\boldsymbol{\mathcal{C}}(x)=\alpha\cdot\boldsymbol{\mathcal{Q}}(\mathcal{S}(x)),\qquad\text{where }\alpha=\begin{cases}\dfrac{1}{1+\beta_q^2}, & \text{if }\boldsymbol{\mathcal{Q}}(\cdot)\text{ is an unbiased scheme},\\[4pt] 1, & \text{if }\boldsymbol{\mathcal{Q}}(\cdot)\text{ is a biased scheme with }\beta_q^2\leq 1.\end{cases} \tag{11}$$
Lemma 1.

(Property of the top-$c$ quantizer). The compression operator defined by (11) is a bounded-distortion compression operator with parameters:

$$\beta_c^2=1-\frac{c}{L}\left(1-\left((1-\alpha)^2+\alpha^2\beta_q^2\right)\right)\qquad\text{and}\qquad\sigma_c^2=\alpha^2\sigma_q^2, \tag{12}$$

where the expectation in (7) is taken over the conditional probability measure $\mathbb{P}(\cdot|x)$ that governs the random behavior of the compression operator $\boldsymbol{\mathcal{Q}}(\cdot)$. By choosing $\alpha$ according to (11), we find that the parameters $\{\beta_c^2,\sigma_c^2\}$ reduce to $\left\{1-\frac{c}{L(1+\beta_q^2)},\ \frac{\sigma_q^2}{(1+\beta_q^2)^2}\right\}$ for unbiased $\boldsymbol{\mathcal{Q}}(\cdot)$ and to $\left\{1-(1-\beta_q^2)\frac{c}{L},\ \sigma_q^2\right\}$ for biased $\boldsymbol{\mathcal{Q}}(\cdot)$.

Proof.

See Appendix A. ∎
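The reduced expressions stated after (12) follow from a short computation. For unbiased $\boldsymbol{\mathcal{Q}}(\cdot)$, substituting $\alpha=\frac{1}{1+\beta_q^2}$ into (12) gives

$$(1-\alpha)^2+\alpha^2\beta_q^2=\frac{\beta_q^4+\beta_q^2}{(1+\beta_q^2)^2}=\frac{\beta_q^2}{1+\beta_q^2}\quad\Longrightarrow\quad \beta_c^2=1-\frac{c}{L}\cdot\frac{1}{1+\beta_q^2},\qquad \sigma_c^2=\frac{\sigma_q^2}{(1+\beta_q^2)^2},$$

while for biased $\boldsymbol{\mathcal{Q}}(\cdot)$ with $\alpha=1$ the term $(1-\alpha)^2+\alpha^2\beta_q^2$ equals $\beta_q^2$, which yields $\beta_c^2=1-(1-\beta_q^2)\frac{c}{L}$ and $\sigma_c^2=\sigma_q^2$. One can also check that $\alpha=\frac{1}{1+\beta_q^2}$ minimizes $(1-\alpha)^2+\alpha^2\beta_q^2$ over $\alpha$ (setting the derivative $-2(1-\alpha)+2\alpha\beta_q^2$ to zero gives $\alpha(1+\beta_q^2)=1$), which is one possible reading of the choice made in (11).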

■
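Lemma 1 can be checked numerically. The following sketch composes the top-$c$ sparsifier (10) with QSGD as the inner unbiased operator $\boldsymbol{\mathcal{Q}}(\cdot)$, using $\beta_q^2=\min(L/s^2,\sqrt{L}/s)$ and $\sigma_q^2=0$ from Table II and hence $\alpha=1/(1+\beta_q^2)$ in (11), and estimates the distortion by Monte Carlo. NumPy, the choice of QSGD as $\boldsymbol{\mathcal{Q}}(\cdot)$, the stable-sort tie-breaking rule, the helper names, and the values $L=16$, $c=4$, $s=2$ are illustrative assumptions.

```python
import numpy as np

def top_c(x, c):
    """Top-c sparsifier S(x) of (10); ties broken by a stable argsort."""
    out = np.zeros_like(x)
    idx = np.argsort(-np.abs(x), kind="stable")[:c]
    out[idx] = x[idx]
    return out

def qsgd(x, s, rng):
    """Unbiased QSGD quantizer (Table II), used here as the inner operator Q."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    a = s * np.abs(x) / norm
    m = np.floor(a)
    n = m + (rng.random(x.shape) < a - m)
    return norm * np.sign(x) * n / s

def top_c_quantizer(x, c, s, rng):
    """Hypothetical helper: top-c quantizer (11) with unbiased Q, alpha = 1/(1 + beta_q^2)."""
    L = x.size
    beta_q2 = min(L / s**2, np.sqrt(L) / s)
    alpha = 1.0 / (1.0 + beta_q2)
    return alpha * qsgd(top_c(x, c), s, rng), beta_q2

rng = np.random.default_rng(3)
L, c, s, trials = 16, 4, 2, 5000
x = rng.standard_normal(L)
errs = []
for _ in range(trials):
    Cx, beta_q2 = top_c_quantizer(x, c, s, rng)
    errs.append(np.sum((x - Cx) ** 2))
mse = np.mean(errs)
beta_c2 = 1 - c / (L * (1 + beta_q2))  # Lemma 1 with unbiased Q and sigma_q^2 = 0
print(f"E||x - C(x)||^2 / ||x||^2 ~ {mse / np.sum(x**2):.3f} <= beta_c^2 = {beta_c2:.3f}")
```

With these values $\beta_q^2=2$, so $\alpha=1/3$ and $\beta_c^2=1-\frac{4}{16\cdot 3}=\frac{11}{12}$; the output of the composed operator has at most $c$ nonzero entries, which is what allows the bit savings discussed in Example 1.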

II-D Contributions: Significant reduction in communication, with almost no effect on steady-state performance

In summary, we provide the following main contributions.

  • We propose a communication-efficient variant of the ATC diffusion approach (3) for solving decentralized learning under subspace constraints. The strategy blends differential quantization and error feedback.

  • We provide a detailed characterization of the proposed approach for a general class of bounded-distortion compression operators satisfying (7), both in terms of mean-square stability and communication resources.

  • In terms of steady-state performance: we show that, in the small step-size regime, i.e., when $\mu\rightarrow 0$ (so that higher-order terms of the step-size can be neglected), the iterates $\boldsymbol{w}_{k,i}$ generated by the communication-efficient approach resulting from incorporating differential quantization and error feedback into the ATC diffusion approach (3) satisfy:

    $$\limsup_{i\rightarrow\infty}\mathbb{E}\|w^o_k-\boldsymbol{w}_{k,i}\|^2\approx\kappa\mu, \tag{13}$$

    where $\kappa$ is a constant that depends mainly on the variance of the gradient noise (i.e., the difference between the true gradient and its approximation) and does not depend on the compression noise terms $\{\beta_c^2,\sigma_c^2\}$. The same result holds for the ATC diffusion approach (3) in the absence of compression [40, Theorem 1]. As we will show later, this asymptotic equivalence is achievable because the compression error is contained in higher-order terms that vanish faster than $\mu$ as $\mu\rightarrow 0$.

  • In terms of bit rate: While result (13) provides important reassurance about the accuracy of the compressed approach, it does not directly address communication efficiency, which is often quantified in terms of bit rate. In the absence of the absolute quantization noise term ($\sigma^{2}_{c}=0$), result (13) is achieved at the expense of communicating some quantities with high precision (as previously explained). In the presence of the absolute noise term, the analysis reveals that, to guarantee (13), the parameters of the compression schemes (which are chosen by the designer) should be set so that the absolute noise term converges to zero as $\mu\rightarrow 0$. We prove that this can be achieved with a bit rate that remains bounded as $\mu\rightarrow 0$, even though increasing precision is required as the step-size decreases.

Thus, our theoretical findings reveal that, in the small step-size regime, the proposed strategy attains the performance achievable in the absence of compression, despite the use of a finite number of bits. This demonstrates the effectiveness of the approach in maintaining performance while reducing communication overheads. While the theoretical findings show the optimality of the strategy in the small step-size regime, the experimental results in Sec. V illustrate its practical effectiveness in terms of achieving superior or competitive performance against state-of-the-art baselines in various scenarios, including those beyond the small step-size regime.

III Decentralized algorithmic framework: compressed communications

In this work, we propose the DEF-ATC (differential error feedback - adapt then combine) diffusion strategy listed in Algorithm 1 and in (18a)–(18c) for solving problem (2) in a decentralized and communication-efficient manner. At each iteration $i$, each agent $k$ in the network performs three steps. The first step (14) is the adaptation step; it is identical to the adaptation step (3a), except that the step-size $\mu$ is replaced by $\mu/\zeta$, where $\zeta\in(0,1]$ is a damping parameter appearing in the compression step (15). This parameter is used to counteract the instability induced by the compression errors. The second step is the compression step. To update $\{\boldsymbol{\phi}_{\ell,i}\}_{\ell\in\mathcal{N}_{k}}$, each agent $k$ first encodes the error-compensated difference $\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$ using a bounded-distortion compression operator $\boldsymbol{\mathcal{C}}_{k}(\cdot)$ (the subscript $k$ is added to $\boldsymbol{\mathcal{C}}$ because the characteristics of the compression scheme can vary across agents), and broadcasts the result to its neighbors. Then, agent $k$ updates the local compression error vector $\boldsymbol{z}_{k,i}$ according to (16) and reconstructs each received vector by first decoding it to obtain $\boldsymbol{\delta}_{\ell,i}$ and then computing the predictor $\boldsymbol{\phi}_{\ell,i}$ according to (15), where $\boldsymbol{\delta}_{\ell,i}$ is scaled by the aforementioned damping parameter $\zeta$. Observe that implementing the compression step in Algorithm 1 requires agent $k$ to store the previous compression error $\boldsymbol{z}_{k,i-1}$ and the previous predictors $\{\boldsymbol{\phi}_{\ell,i-1}\}_{\ell\in\mathcal{N}_{k}}$. The compression step is followed by the combination step (17), where agent $k$ combines the reconstructed vectors $\{\boldsymbol{\phi}_{\ell,i}\}$ using the combination coefficients $\{A_{k\ell}\}$ and a mixing parameter $\gamma\in(0,1]$.
The resulting vector $\boldsymbol{w}_{k,i}$ is the estimate of $w^{o}_{k}$, the $k$-th subvector of ${\scriptstyle\mathcal{W}}^{o}$ in (2), at agent $k$ and iteration $i$. As we will see in Sec. IV, like the damping coefficient $\zeta$, the mixing parameter $\gamma$ in the combination step (17) can also be used to control the stability of the algorithm. A block diagram illustrating the implementation of the DEF-ATC diffusion approach at agent $k$ is provided in Fig. 1.
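The bookkeeping described above can be sketched for a single agent as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the compressor is a hypothetical deterministic round-to-nearest quantizer on the grid $\Delta\mathbb{Z}$ (entrywise error at most $\Delta/2$), the bit-level encoding and decoding of $\boldsymbol{p}_{k,i}$ is omitted (the decoded vector $\boldsymbol{\delta}_{k,i}$ is passed around directly), and the names `Agent`, `compress`, `reconstruct`, and `nearest_quantizer` are ours. The point it shows is that, because the sender and every receiver apply (15) to the same decoded $\boldsymbol{\delta}_{\ell,i}$, all stored copies of $\boldsymbol{\phi}_{\ell,i}$ stay identical across agents.

```python
import random


def nearest_quantizer(x, delta):
    """Hypothetical deterministic compressor: round each entry to the nearest point of delta*Z."""
    return [delta * round(v / delta) for v in x]


class Agent:
    """Compression-step state at agent k (hypothetical names, for illustration only)."""

    def __init__(self, k, neighbors, dim):
        self.k = k
        self.z = [0.0] * dim                              # z_{k,i-1}: stored compression error
        self.phi = {l: [0.0] * dim for l in neighbors}    # phi_{l,i-1} for l in N_k (k included)

    def compress(self, psi, delta):
        """Encode chi = psi - phi_k + z_k, update z_k as in (16), return the decoded delta_k."""
        chi = [p - f + e for p, f, e in zip(psi, self.phi[self.k], self.z)]
        d = nearest_quantizer(chi, delta)
        self.z = [c - q for c, q in zip(chi, d)]
        return d

    def reconstruct(self, l, d, zeta):
        """Predictor update (15) for the message decoded from agent l (also used for l = k)."""
        self.phi[l] = [f + zeta * q for f, q in zip(self.phi[l], d)]


if __name__ == "__main__":
    rng = random.Random(0)
    a1, a2 = Agent(1, [1, 2], 3), Agent(2, [1, 2], 3)
    for _ in range(20):
        d1 = a1.compress([rng.gauss(0, 1) for _ in range(3)], delta=0.1)
        d2 = a2.compress([rng.gauss(0, 1) for _ in range(3)], delta=0.1)
        for a in (a1, a2):          # each agent applies (15) to both decoded messages
            a.reconstruct(1, d1, zeta=0.5)
            a.reconstruct(2, d2, zeta=0.5)
    print(a1.phi == a2.phi)         # prints True: both agents hold identical predictor copies
```

In the script, the inputs `psi` are random placeholders rather than outputs of (14); only the storage and synchronization logic is being illustrated.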

Figure 1: (Left) An illustration of a multitask network [9, 26]. The objective at agent $k$ is to estimate $w^{o}_{k}$ (of dimension $M_{k}\times 1$), the $k$-th subvector of ${\scriptstyle\mathcal{W}}^{o}$ in (2). In this example, the neighborhood set of agent $k$ is given by $\mathcal{N}_{k}=\{1,k,3,\ell,7\}$. (Right) The implementation of the DEF-ATC diffusion approach listed in Alg. 1 at agent $k$. The set $\mathcal{N}_{k}^{-}$ is the neighborhood set of agent $k$, excluding $k$ itself.
The compression step consists of three sub-steps: $i)$ the forward compression step, where agent $k$ encodes the error-compensated difference $\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$ and sends the resulting vector $\boldsymbol{p}_{k,i}$ (sequence of symbols or bits) to its neighbors; $ii)$ the local error computation and reconstruction step, where agent $k$ decodes the local vector $\boldsymbol{p}_{k,i}$ to obtain $\boldsymbol{\delta}_{k,i}=\boldsymbol{\mathcal{C}}_{k}(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1})$, updates the local compression error vector $\boldsymbol{z}_{k,i}$ according to (16), and updates the local predictor $\boldsymbol{\phi}_{k,i}$ according to (18b); and $iii)$ the (social) reconstruction step, where agent $k$ receives the encoded vectors $\{\boldsymbol{p}_{\ell,i}\}_{\ell\in\mathcal{N}_{k}^{-}}$ from its neighbors, decodes them to obtain $\{\boldsymbol{\delta}_{\ell,i}\}_{\ell\in\mathcal{N}_{k}^{-}}$, and then updates the predictors $\{\boldsymbol{\phi}_{\ell,i}\}_{\ell\in\mathcal{N}_{k}^{-}}$ according to (15). The resulting vectors $\boldsymbol{\phi}_{k,i}$ and $\{\boldsymbol{\phi}_{\ell,i}\}_{\ell\in\mathcal{N}_{k}^{-}}$ are then used in the social learning step (17).
Observe that implementing the compression step requires storing the previous compression error $\boldsymbol{z}_{k,i-1}$ and the previous estimates $\{\boldsymbol{\phi}_{\ell,i-1}\}_{\ell\in\mathcal{N}_{k}}$ by agent $k$.
Input: initializations $\boldsymbol{w}_{k,0}=0$, $\boldsymbol{\phi}_{k,0}=0$, and $\boldsymbol{z}_{k,0}=0$; small step-size $\mu$; damping coefficient $\zeta\in(0,1]$; mixing parameter $\gamma\in(0,1]$; combination matrix $\mathcal{A}$ satisfying (4).
for $i=1,2,\ldots$, on the $k$-th node do
       Adapt: update $\boldsymbol{w}_{k,i-1}$ according to:
$$\boldsymbol{\psi}_{k,i}=\boldsymbol{w}_{k,i-1}-\frac{\mu}{\zeta}\widehat{\nabla_{w_{k}}J_{k}}(\boldsymbol{w}_{k,i-1}) \tag{14}$$

      Compress and broadcast:
       • encode the error-compensated difference $\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$ using a bounded-distortion compression operator $\boldsymbol{\mathcal{C}}_{k}(\cdot)$ and broadcast the result $\boldsymbol{p}_{k,i}$ to the neighbors $\mathcal{N}_{k}$
       • upon receiving the compressed messages $\{\boldsymbol{p}_{\ell,i}\}$ from neighbors $\ell\in\mathcal{N}_{k}$, first decode them to obtain $\{\boldsymbol{\delta}_{\ell,i}=\boldsymbol{\mathcal{C}}_{\ell}(\boldsymbol{\psi}_{\ell,i}-\boldsymbol{\phi}_{\ell,i-1}+\boldsymbol{z}_{\ell,i-1})\}_{\ell\in\mathcal{N}_{k}}$, and then compute $\{\boldsymbol{\phi}_{\ell,i}\}_{\ell\in\mathcal{N}_{k}}$ according to:
$$\boldsymbol{\phi}_{\ell,i}=\boldsymbol{\phi}_{\ell,i-1}+\zeta\boldsymbol{\delta}_{\ell,i},\qquad\ell\in\mathcal{N}_{k} \tag{15}$$
       • update the local compression error:
$$\boldsymbol{z}_{k,i}=(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1})-\boldsymbol{\delta}_{k,i} \tag{16}$$

      Combine: update the local model according to:
$$\boldsymbol{w}_{k,i}=(1-\gamma)\boldsymbol{\phi}_{k,i}+\gamma\sum_{\ell\in\mathcal{N}_{k}}A_{k\ell}\boldsymbol{\phi}_{\ell,i} \tag{17}$$
Algorithm 1 DEF-ATC (differential error feedback - adapt then combine) diffusion strategy for solving (2)

For the sake of convenience, we rewrite Algorithm 1 in the following compact form:

$$\boldsymbol{\psi}_{k,i}=\boldsymbol{w}_{k,i-1}-\frac{\mu}{\zeta}\widehat{\nabla_{w_{k}}J_{k}}(\boldsymbol{w}_{k,i-1}) \tag{18a}$$
$$\boldsymbol{\phi}_{k,i}=\boldsymbol{\phi}_{k,i-1}+\zeta\boldsymbol{\mathcal{C}}_{k}(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}) \tag{18b}$$
$$\boldsymbol{w}_{k,i}=(1-\gamma)\boldsymbol{\phi}_{k,i}+\gamma\sum_{\ell\in\mathcal{N}_{k}}A_{k\ell}\boldsymbol{\phi}_{\ell,i} \tag{18c}$$

where the compression error $\boldsymbol{z}_{k,i}$ is updated according to:

$$\boldsymbol{z}_{k,i}=(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1})-\boldsymbol{\mathcal{C}}_{k}(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}). \tag{19}$$
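The compact recursions (18a)–(18c) and (19) can be exercised end to end on a toy problem. The Python sketch below is an illustration under assumptions of our own, not the experimental setup of Sec. V: a consensus least-squares problem with streaming data $d=u^{\top}w^{o}+v$ at $K=4$ agents, a ring topology with self-loops and uniform weights $1/3$ (a doubly stochastic choice that we assume satisfies (4) in the consensus case), and unbiased stochastic rounding to the grid $\Delta\mathbb{Z}$ as the bounded-distortion compressor. All function names and parameter values are hypothetical. Since every agent applies (15) to the same decoded message, all copies of $\boldsymbol{\phi}_{\ell,i}$ coincide, so the sketch stores a single predictor per agent.

```python
import math
import random


def stochastic_round(x, delta, rng):
    """Unbiased randomized rounding of each entry to the grid delta*Z (illustrative compressor)."""
    out = []
    for v in x:
        low = math.floor(v / delta)
        out.append(delta * (low + (1 if rng.random() < v / delta - low else 0)))
    return out


def def_atc_toy(mu=0.05, zeta=1.0, gamma=1.0, delta=0.01, iters=2000, seed=0):
    """Run (18a)-(18c) with (19) on a toy consensus least-squares problem.

    Returns the network MSD (1/K) sum_k ||w_o - w_{k,i}||^2 at the last iteration.
    """
    rng = random.Random(seed)
    K, M, sigma_v = 4, 2, 0.1
    w_o = [1.0, -0.5]                                             # common minimizer (consensus)
    nbrs = {k: [(k - 1) % K, k, (k + 1) % K] for k in range(K)}   # ring with self-loops
    a = 1.0 / 3.0                                                 # uniform weights A_kl
    w = [[0.0] * M for _ in range(K)]
    phi = [[0.0] * M for _ in range(K)]                           # one copy per agent suffices
    z = [[0.0] * M for _ in range(K)]
    for _ in range(iters):
        psi = []
        for k in range(K):                                        # (18a): streaming LMS step
            u = [rng.gauss(0.0, 1.0) for _ in range(M)]
            d = sum(ui * wi for ui, wi in zip(u, w_o)) + rng.gauss(0.0, sigma_v)
            e = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([w[k][m] + (mu / zeta) * u[m] * e for m in range(M)])
        for k in range(K):                                        # (18b) and (19)
            chi = [psi[k][m] - phi[k][m] + z[k][m] for m in range(M)]
            q = stochastic_round(chi, delta, rng)
            z[k] = [chi[m] - q[m] for m in range(M)]
            phi[k] = [phi[k][m] + zeta * q[m] for m in range(M)]
        w = [[(1 - gamma) * phi[k][m] + gamma * sum(a * phi[l][m] for l in nbrs[k])
              for m in range(M)] for k in range(K)]               # (18c)
    return sum(sum((w_o[m] - w[k][m]) ** 2 for m in range(M)) for k in range(K)) / K
```

Calling `def_atc_toy()` returns the final network MSD for one seeded run. One could sweep `mu` or `delta` to probe the behavior described in (13), but no numerical results are claimed here.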

Observe that, in the absence of compression (i.e., when the operator $\boldsymbol{\mathcal{C}}_{k}(\cdot)$ in (18b) and (19) is replaced by the identity operator, and the parameters $\zeta$ and $\gamma$ in (18b) and (18c), respectively, are set to 1), we recover the ATC diffusion approach (3). Therefore, Algorithm 1 can be seen as a communication-efficient variant of the adapt-then-combine (ATC) approach. To mitigate the negative impact of compression, the DEF-ATC approach uses differential quantization and error feedback in step (18b). Differential quantization consists of compressing and transmitting differences of the form $\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}$, instead of communicating compressed versions of the estimates $\boldsymbol{\psi}_{k,i}$ themselves. Error feedback, on the other hand, consists of locally storing the compression error $\boldsymbol{z}_{k,i}$ (i.e., the difference between the input and output of the compression operator) and incorporating it back into the next iteration. In Remark 1 further ahead, we explain the role of the compression error $\boldsymbol{z}_{k,i}$ and how its introduction helps mitigate the accumulation of errors over time.
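A short rearrangement of (18b) and (19) makes the effect of the stored error visible. It is offered only as an aid to intuition, not as the argument of Remark 1. Writing $\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$, relation (19) gives $\boldsymbol{\mathcal{C}}_{k}(\boldsymbol{\chi}_{k,i})=\boldsymbol{\chi}_{k,i}-\boldsymbol{z}_{k,i}$, and substituting into (18b) yields:

$$
\begin{aligned}
\boldsymbol{\phi}_{k,i}&=\boldsymbol{\phi}_{k,i-1}+\zeta\big(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}-\boldsymbol{z}_{k,i}\big)\\
&=(1-\zeta)\boldsymbol{\phi}_{k,i-1}+\zeta\boldsymbol{\psi}_{k,i}+\zeta\big(\boldsymbol{z}_{k,i-1}-\boldsymbol{z}_{k,i}\big).
\end{aligned}
$$

For $\zeta=1$, this reads $\boldsymbol{\phi}_{k,i}=\boldsymbol{\psi}_{k,i}+\boldsymbol{z}_{k,i-1}-\boldsymbol{z}_{k,i}$: the compression error enters the predictor only through the difference of two successive errors, and the running sum of these differences telescopes, $\sum_{j=1}^{i}(\boldsymbol{z}_{k,j-1}-\boldsymbol{z}_{k,j})=-\boldsymbol{z}_{k,i}$ (using $\boldsymbol{z}_{k,0}=0$).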

IV Mean-square-error and bit rate stability analysis

IV-A Modeling assumptions

In this section, we analyze strategy (18) with a matrix $\mathcal{A}$ satisfying (4) by examining the average squared distance between $\boldsymbol{w}_{k,i}$ and $w^{o}_{k}$, namely, $\mathbb{E}\|w^{o}_{k}-\boldsymbol{w}_{k,i}\|^{2}$, under the following assumptions on the risks $\{J_{k}(\cdot)\}$, on the gradient noise processes $\{\boldsymbol{s}_{k,i}(\cdot)\}$ defined by [6]:

$$\boldsymbol{s}_{k,i}(w)\triangleq\nabla_{w_{k}}J_{k}(w)-\widehat{\nabla_{w_{k}}J_{k}}(w), \tag{20}$$

and on the compression operators $\{\boldsymbol{\mathcal{C}}_{k}(\cdot)\}$.

Assumption 1.

(Conditions on individual and aggregate risks). The individual risks $J_{k}(w_{k})$ are assumed to be twice differentiable and convex such that:

$$\lambda_{k,\min}I_{M_{k}}\leq\nabla^{2}_{w_{k}}J_{k}(w_{k})\leq\lambda_{k,\max}I_{M_{k}}, \tag{21}$$

where $\lambda_{k,\min}\geq 0$ for $k=1,\ldots,K$. It is further assumed that, for any $\{w_{k}\in\mathbb{R}^{M_{k}}\}$, the individual risks satisfy:

$$0<\lambda_{\min}I_{P}\leq\mathcal{U}^{\top}\text{diag}\left\{\nabla^{2}_{w_{k}}J_{k}(w_{k})\right\}_{k=1}^{K}\mathcal{U}\leq\lambda_{\max}I_{P}, \tag{22}$$

for some positive parameters $\lambda_{\min}\leq\lambda_{\max}$. ∎

As explained in [40], condition (22) ensures that problem (2) has a unique minimizer ${\scriptstyle\mathcal{W}}^{o}$.
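To see why (22) is an aggregate condition, consider the consensus special case with $M_{k}=M$. Assuming, as in the subspace-constrained framework of [40], that $\mathcal{U}$ is semi-unitary, one may take $\mathcal{U}=\frac{1}{\sqrt{K}}(\mathbb{1}_{K}\otimes I_{M})$ with $P=M$, and the middle term of (22) reduces to the average Hessian:

$$\mathcal{U}^{\top}\text{diag}\left\{\nabla^{2}_{w_{k}}J_{k}(w_{k})\right\}_{k=1}^{K}\mathcal{U}=\frac{1}{K}\sum_{k=1}^{K}\nabla^{2}_{w_{k}}J_{k}(w_{k}).$$

In this case, (22) can hold with $\lambda_{\min}>0$ even if some individual risks are merely convex ($\lambda_{k,\min}=0$ in (21)), provided the average Hessian is uniformly positive definite.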

Assumption 2.

(Conditions on gradient noise). The gradient noise process defined in (20) satisfies, for $k=1,\ldots,K$:

$$\mathbb{E}\left[\boldsymbol{s}_{k,i}(\boldsymbol{w}_{k,i-1})\,\middle|\,\{\boldsymbol{\phi}_{\ell,i-1},\boldsymbol{z}_{\ell,i-1}\}_{\ell=1}^{K}\right]=0, \tag{23}$$
$$\mathbb{E}\left[\|\boldsymbol{s}_{k,i}(\boldsymbol{w}_{k,i-1})\|^{2}\,\middle|\,\{\boldsymbol{\phi}_{\ell,i-1},\boldsymbol{z}_{\ell,i-1}\}_{\ell=1}^{K}\right]\leq\beta^{2}_{s,k}\|w^{o}_{k}-\boldsymbol{w}_{k,i-1}\|^{2}+\sigma^{2}_{s,k}, \tag{24}$$

for some $\beta^{2}_{s,k}\geq 0$ and $\sigma^{2}_{s,k}\geq 0$. ∎

As explained in [6, 7, 8], these conditions are satisfied by many risk functions of interest in learning and adaptation, such as quadratic and regularized logistic costs. Condition (23) states that the gradient vector approximation should be unbiased conditioned on the iterates generated at the previous time instant. Condition (24) states that the second-order moment of the gradient noise should get smaller for better estimates, since it is bounded by an affine function of the squared estimation error $\|w^{o}_{k}-\boldsymbol{w}_{k,i-1}\|^{2}$.
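To make Assumption 2 concrete, the Python sketch below estimates the mean and second moment of the gradient noise (20) for a hypothetical streaming least-squares risk $J(w)=\frac{1}{2}\mathbb{E}(d-u^{\top}w)^{2}$ with $d=u^{\top}w^{o}+v$, $u\sim\mathcal{N}(0,I_{M})$, and $v\sim\mathcal{N}(0,\sigma_{v}^{2})$ independent of $u$. This model and the function names are our assumptions. For this particular model, a short calculation using $\mathbb{E}[uu^{\top}uu^{\top}]=(M+2)I_{M}$ gives $\mathbb{E}[\boldsymbol{s}]=0$ and $\mathbb{E}\|\boldsymbol{s}\|^{2}=(M+1)\|w-w^{o}\|^{2}+M\sigma_{v}^{2}$, so (23) holds and (24) holds with $\beta^{2}_{s}=M+1$ and $\sigma^{2}_{s}=M\sigma_{v}^{2}$; the Monte Carlo estimate should agree with this closed form.

```python
import random


def gradient_noise_moments(e, sigma_v=0.1, n=100_000, seed=1):
    """Monte Carlo estimate of E[s] and E||s||^2, with s defined as in (20).

    Hypothetical model: d = u^T w_o + v, u ~ N(0, I_M), v ~ N(0, sigma_v^2),
    J(w) = 0.5 E(d - u^T w)^2. With e = w - w_o, the true gradient is e and the
    instantaneous gradient is -u (d - u^T w) = u (u^T e - v).
    """
    rng = random.Random(seed)
    M = len(e)
    mean = [0.0] * M
    second = 0.0
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in range(M)]
        v = rng.gauss(0.0, sigma_v)
        ue = sum(ui * ei for ui, ei in zip(u, e))               # u^T e
        s = [ei - ui * (ue - v) for ei, ui in zip(e, u)]         # true minus approximate gradient
        for m in range(M):
            mean[m] += s[m] / n
        second += sum(x * x for x in s) / n
    return mean, second


def predicted_second_moment(e, sigma_v=0.1):
    """Closed form for this model: (M + 1) ||e||^2 + M sigma_v^2."""
    M = len(e)
    return (M + 1) * sum(x * x for x in e) + M * sigma_v ** 2
```

Halving the error vector `e` should reduce the estimated second moment roughly fourfold in its first term, which is the "smaller noise for better estimates" behavior described by (24).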

Assumption 3.

(Conditions on compression operators). In step (18b) of the DEF-ATC strategy, each agent k𝑘kitalic_k at time i𝑖iitalic_i applies to the error compensated difference 𝛘k,i=𝛙k,iϕk,i1+𝐳k,i1subscript𝛘𝑘𝑖subscript𝛙𝑘𝑖subscriptbold-ϕ𝑘𝑖1subscript𝐳𝑘𝑖1\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+% \boldsymbol{z}_{k,i-1}bold_italic_χ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT = bold_italic_ψ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT - bold_italic_ϕ start_POSTSUBSCRIPT italic_k , italic_i - 1 end_POSTSUBSCRIPT + bold_italic_z start_POSTSUBSCRIPT italic_k , italic_i - 1 end_POSTSUBSCRIPT a bounded-distortion compression operator 𝓒k()subscript𝓒𝑘\boldsymbol{\cal{C}}_{k}(\cdot)bold_caligraphic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( ⋅ ) (see Definition 2) with compression noise parameters βc,k2subscriptsuperscript𝛽2𝑐𝑘\beta^{2}_{c,k}italic_β start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_k end_POSTSUBSCRIPT and σc,k2subscriptsuperscript𝜎2𝑐𝑘\sigma^{2}_{c,k}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_k end_POSTSUBSCRIPT. It is assumed that given the past history, the randomized compression mechanism depends only on the quantizer input 𝛘k,isubscript𝛘𝑘𝑖\boldsymbol{\chi}_{k,i}bold_italic_χ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT. Consequently, from (7), we get:

$$\mathbb{E}\left[\|\boldsymbol{\chi}_{k,i}-\boldsymbol{\mathcal{C}}_k(\boldsymbol{\chi}_{k,i})\|^2\,\big|\,\boldsymbol{h}_i\right]=\mathbb{E}\left[\|\boldsymbol{\chi}_{k,i}-\boldsymbol{\mathcal{C}}_k(\boldsymbol{\chi}_{k,i})\|^2\,\big|\,\boldsymbol{\chi}_{k,i}\right]\leq\beta^2_{c,k}\|\boldsymbol{\chi}_{k,i}\|^2+\sigma^2_{c,k},\qquad(25)$$

where $\boldsymbol{h}_i$ is the vector collecting all iterates generated by (18) before the quantizer is applied to $\boldsymbol{\chi}_{k,i}$, namely, $\Big\{\{\boldsymbol{\phi}_{\ell,j}\}_{j=1}^{i-1},\{\boldsymbol{\psi}_{\ell,j}\}_{j=1}^{i},\{\boldsymbol{z}_{\ell,j}\}_{j=1}^{i-1}\Big\}_{\ell=1}^{K}$. ∎
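One standard randomized compressor that satisfies a bound of the form (25) is the probabilistic uniform quantizer with step $\Delta$: each entry is rounded to one of its two neighboring grid points, rounding up with probability equal to its normalized distance from the lower point, so that the output is unbiased. The per-entry error variance is $p(1-p)\Delta^2\leq\Delta^2/4$, so (25) holds with $\beta^2_{c,k}=0$ and $\sigma^2_{c,k}=M_k\Delta^2/4$. The sketch below checks this numerically for a hypothetical step and input. It illustrates Definition 2 and is not necessarily the compressor used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_quantize(x, delta, rng):
    """Randomized uniform quantizer: round each entry of x to one of its two
    neighbouring points on the grid delta*Z, rounding up with probability equal
    to the normalized distance from the lower point (so E[output] = x)."""
    low = delta * np.floor(x / delta)
    p_up = (x - low) / delta                      # in [0, 1)
    return low + delta * (rng.random(x.shape) < p_up)

Mk, delta, trials = 5, 0.2, 100_000               # hypothetical dimension, step, Monte Carlo size
chi = 3.0 * rng.standard_normal(Mk)               # a fixed quantizer input chi_{k,i}
out = prob_quantize(np.tile(chi, (trials, 1)), delta, rng)   # one compressed copy per row
err = chi - out                                   # compression errors chi - C(chi)

mse = np.mean(np.sum(err ** 2, axis=1))           # estimate of E||chi - C(chi)||^2 given chi
p = (chi - delta * np.floor(chi / delta)) / delta
exact_mse = delta ** 2 * np.sum(p * (1 - p))      # closed form: sum of Bernoulli variances
bound = Mk * delta ** 2 / 4                       # sigma_c^2 in (25), with beta_c^2 = 0
bias = out.mean(axis=0) - chi                     # close to zero (unbiasedness)
print(mse, exact_mse, bound)
```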

IV-B Network error vector recursion

In the following, we derive a useful recursion that allows us to examine the time evolution of the error dynamics across the network relative to the reference vector $\mathcal{W}^o=\text{col}\{w^o_k\}_{k=1}^K$ defined in (2). Let $\widetilde{\boldsymbol{w}}_{k,i}=w^o_k-\boldsymbol{w}_{k,i}$, $\widetilde{\boldsymbol{\psi}}_{k,i}=w^o_k-\boldsymbol{\psi}_{k,i}$, and $\widetilde{\boldsymbol{\phi}}_{k,i}=w^o_k-\boldsymbol{\phi}_{k,i}$. Using (20) and the mean-value theorem [47, p. 24], [6, Appendix D], we can express the stochastic gradient vector appearing in (18a) as follows:

$$\widehat{\nabla_{w_k}J_k}(\boldsymbol{w}_{k,i-1})=-\boldsymbol{H}_{k,i-1}\widetilde{\boldsymbol{w}}_{k,i-1}+b_k-\boldsymbol{s}_{k,i}(\boldsymbol{w}_{k,i-1}),\qquad(26)$$

where $\boldsymbol{H}_{k,i-1}\triangleq\int_0^1\nabla^2_{w_k}J_k(w^o_k-t\widetilde{\boldsymbol{w}}_{k,i-1})\,dt$ and $b_k\triangleq\nabla_{w_k}J_k(w^o_k)$. By subtracting $w^o_k$ from both sides of (18a), using (26), and introducing the following network quantities:

$$b\triangleq\text{col}\{b_1,\ldots,b_K\},\qquad(27)$$
$$\boldsymbol{s}_i\triangleq\text{col}\{\boldsymbol{s}_{1,i}(\boldsymbol{w}_{1,i-1}),\ldots,\boldsymbol{s}_{K,i}(\boldsymbol{w}_{K,i-1})\},\qquad(28)$$
$$\boldsymbol{\mathcal{H}}_{i-1}\triangleq\text{diag}\{\boldsymbol{H}_{1,i-1},\ldots,\boldsymbol{H}_{K,i-1}\},\qquad(29)$$
$$\widetilde{\boldsymbol{\mathcal{W}}}_{i-1}\triangleq\text{col}\{\widetilde{\boldsymbol{w}}_{1,i-1},\ldots,\widetilde{\boldsymbol{w}}_{K,i-1}\},\qquad(30)$$

we can show that the network error vector $\widetilde{\boldsymbol{\psi}}_i=\text{col}\{\widetilde{\boldsymbol{\psi}}_{k,i}\}_{k=1}^K$ evolves according to:

$$\widetilde{\boldsymbol{\psi}}_i=\left(I_M-\frac{\mu}{\zeta}\boldsymbol{\mathcal{H}}_{i-1}\right)\widetilde{\boldsymbol{\mathcal{W}}}_{i-1}-\frac{\mu}{\zeta}\boldsymbol{s}_i+\frac{\mu}{\zeta}b.\qquad(31)$$
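The integral-form matrix $\boldsymbol{H}_{k,i-1}$ in (26), and hence in (31), can be checked numerically. The sketch below uses a hypothetical regularized logistic risk, written as an empirical average over synthetic data (`U`, `y`, `rho` are illustrative names). It approximates $\int_0^1\nabla^2 J(w^o-t\widetilde{w})\,dt$ by Gauss-Legendre quadrature and verifies the deterministic part of (26), $\nabla J(w)=-H\widetilde{w}+b$ with $\widetilde{w}=w^o-w$ and $b=\nabla J(w^o)$. The reference point need not be a stationary point of the individual risk, which is why $b$ is generally nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, rho = 200, 3, 0.1                           # samples, dimension, regularization (hypothetical)
U = rng.standard_normal((N, M))                   # feature vectors (rows)
y = np.where(rng.random(N) < 0.5, -1.0, 1.0)      # labels in {-1, +1}

def grad(w):
    # gradient of J(w) = rho/2 ||w||^2 + mean_n log(1 + exp(-y_n u_n^T w))
    m = y * (U @ w)
    return rho * w - U.T @ (y / (1.0 + np.exp(m))) / N

def hess(w):
    p = 1.0 / (1.0 + np.exp(-y * (U @ w)))        # sigmoid of the margins
    return rho * np.eye(M) + (U.T * (p * (1.0 - p))) @ U / N

w_o = 0.5 * rng.standard_normal(M)                # reference point (need not be stationary)
w = 0.5 * rng.standard_normal(M)                  # current iterate
w_tilde = w_o - w

# H = int_0^1 hess(w_o - t * w_tilde) dt via Gauss-Legendre nodes mapped from [-1, 1] to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(40)
t, weights = (nodes + 1.0) / 2.0, weights / 2.0
H = sum(c * hess(w_o - tk * w_tilde) for tk, c in zip(t, weights))

b = grad(w_o)
residual = grad(w) - (-H @ w_tilde + b)           # vanishes up to quadrature and rounding error
print(np.max(np.abs(residual)))
```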

By subtracting $w^o_k$ from both sides of (18c), replacing $w^o_k$ by $(1-\gamma)w^o_k+\gamma w^o_k$, and using $w^o_k=\sum_{\ell\in\mathcal{N}_k}A_{k\ell}w^o_\ell$ [40, Sec. III-B], we obtain:

$$\widetilde{\boldsymbol{w}}_{k,i}=(1-\gamma)\widetilde{\boldsymbol{\phi}}_{k,i}+\gamma\sum_{\ell\in\mathcal{N}_k}A_{k\ell}\widetilde{\boldsymbol{\phi}}_{\ell,i}.\qquad(32)$$

From (32), we find that the network error vector $\widetilde{\boldsymbol{\mathcal{W}}}_{i-1}$ in (30) can be expressed as:

$$\widetilde{\boldsymbol{\mathcal{W}}}_{i-1}=(1-\gamma)\widetilde{\boldsymbol{\phi}}_{i-1}+\gamma\mathcal{A}\widetilde{\boldsymbol{\phi}}_{i-1}=\mathcal{A}'\widetilde{\boldsymbol{\phi}}_{i-1},\qquad(33)$$

where

$$\widetilde{\boldsymbol{\phi}}_i\triangleq\text{col}\{\widetilde{\boldsymbol{\phi}}_{1,i},\ldots,\widetilde{\boldsymbol{\phi}}_{K,i}\},\qquad(34)$$
$$\mathcal{A}'\triangleq(1-\gamma)I_M+\gamma\mathcal{A}.\qquad(35)$$

By subtracting $w^o_k$ from both sides of (18b), and by adding and subtracting $w^o_k$ inside the difference $\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}$, we can write:

$$\widetilde{\boldsymbol{\phi}}_{k,i}=\widetilde{\boldsymbol{\phi}}_{k,i-1}-\zeta\boldsymbol{\mathcal{C}}_k\big(\widetilde{\boldsymbol{\phi}}_{k,i-1}-\widetilde{\boldsymbol{\psi}}_{k,i}+\boldsymbol{z}_{k,i-1}\big).\qquad(36)$$

Note that the argument of $\boldsymbol{\mathcal{C}}_k$ in (36) is precisely $\boldsymbol{\chi}_{k,i}$. Now, by adding and subtracting $\zeta(\widetilde{\boldsymbol{\phi}}_{k,i-1}-\widetilde{\boldsymbol{\psi}}_{k,i}+\boldsymbol{z}_{k,i-1})$ on the right-hand side of (36), we obtain:

$$\widetilde{\boldsymbol{\phi}}_{k,i}=(1-\zeta)\widetilde{\boldsymbol{\phi}}_{k,i-1}+\zeta\widetilde{\boldsymbol{\psi}}_{k,i}+\zeta(\boldsymbol{z}_{k,i}-\boldsymbol{z}_{k,i-1}),\qquad(37)$$

in terms of the compression error vector $\boldsymbol{z}_{k,i}$ defined in (19). By combining (31), (33), and (37), we conclude that the network error vector $\widetilde{\boldsymbol{\phi}}_i$ in (34) evolves according to the following dynamics:

$$\widetilde{\boldsymbol{\phi}}_i=\boldsymbol{\mathcal{B}}_{i-1}\widetilde{\boldsymbol{\phi}}_{i-1}-\mu\boldsymbol{s}_i+\mu b+(\boldsymbol{z}_i-\boldsymbol{z}_{i-1}),\qquad(38)$$

where

$$\boldsymbol{\mathcal{B}}_{i-1}\triangleq(1-\zeta)I_M+\zeta\left(I_M-\frac{\mu}{\zeta}\boldsymbol{\mathcal{H}}_{i-1}\right)\mathcal{A}',\qquad(39)$$
$$\boldsymbol{z}_i\triangleq\zeta\,\text{col}\{\boldsymbol{z}_{k,i}\}_{k=1}^K.\qquad(40)$$
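Because (38) is an exact algebraic identity, it can be checked numerically. The sketch below simulates a few iterations of a small consensus instance and compares both sides of (38), and of the shifted recursion (42) used in Sec. IV-C. Since (18) and (19) are not reproduced in this section, their forms are reconstructed here from (31), (32), (36), and (37), and should be read as assumptions: $\boldsymbol{\psi}_{k,i}=\boldsymbol{w}_{k,i-1}-\frac{\mu}{\zeta}\widehat{\nabla_{w_k}J_k}(\boldsymbol{w}_{k,i-1})$, $\boldsymbol{\phi}_{k,i}=\boldsymbol{\phi}_{k,i-1}+\zeta\boldsymbol{\mathcal{C}}_k(\boldsymbol{\chi}_{k,i})$, $\boldsymbol{z}_{k,i}=\boldsymbol{\chi}_{k,i}-\boldsymbol{\mathcal{C}}_k(\boldsymbol{\chi}_{k,i})$, and $\boldsymbol{w}_{k,i}=(1-\gamma)\boldsymbol{\phi}_{k,i}+\gamma\sum_{\ell}A_{k\ell}\boldsymbol{\phi}_{\ell,i}$. The following are all hypothetical choices:
- quadratic risks, for which $\boldsymbol{\mathcal{H}}_{i-1}$ is constant;
- the ring combination matrix;
- the additive Gaussian gradient noise;
- the randomized quantizer.

```python
import numpy as np

rng = np.random.default_rng(1)
K, Mk = 4, 2                                      # agents and per-agent dimension (hypothetical)
M = K * Mk
mu, zeta, gamma, delta = 0.05, 0.5, 0.7, 0.1      # step-size, damping, mixing, quantizer step

# Consensus case: symmetric, doubly-stochastic ring matrix (assumed), A = A0 kron I
A0 = np.array([[0.5, 0.25, 0.0, 0.25],
               [0.25, 0.5, 0.25, 0.0],
               [0.0, 0.25, 0.5, 0.25],
               [0.25, 0.0, 0.25, 0.5]])
A = np.kron(A0, np.eye(Mk))
Ap = (1 - gamma) * np.eye(M) + gamma * A          # A' in (35)

# Quadratic risks J_k(w) = 0.5 (w - c_k)^T R_k (w - c_k), so H_{k,i-1} = R_k for all i
R = [np.diag(rng.uniform(1.0, 2.0, Mk)) for _ in range(K)]
c = [rng.standard_normal(Mk) for _ in range(K)]
w_o = np.linalg.solve(sum(R), sum(Rk @ ck for Rk, ck in zip(R, c)))  # consensus minimizer
Wo = np.tile(w_o, K)                              # col{w_k^o}; satisfies A @ Wo = Wo
H = np.zeros((M, M))
for k in range(K):
    H[k*Mk:(k+1)*Mk, k*Mk:(k+1)*Mk] = R[k]       # block-diagonal matrix in (29)
cvec = np.concatenate(c)
b = H @ (Wo - cvec)                               # b = col{grad J_k(w_o)} in (27)
B = (1 - zeta) * np.eye(M) + (zeta * np.eye(M) - mu * H) @ Ap   # B in (39)

def prob_quantize(x):
    # randomized uniform quantizer (one bounded-distortion compressor, assumed)
    low = delta * np.floor(x / delta)
    return low + delta * (rng.random(x.shape) < (x - low) / delta)

phi = rng.standard_normal(M)                      # phi_0 (arbitrary)
z = np.zeros(M)                                   # stacked z_{k,0}, not scaled by zeta
w = Ap @ phi                                      # (18c) applied to phi_0
err38, err42 = [], []
for i in range(1, 6):
    s = 0.1 * rng.standard_normal(M)              # gradient noise s_i (additive, assumed)
    grad_hat = H @ (w - cvec) - s                 # stochastic gradient, consistent with (26)
    psi = w - (mu / zeta) * grad_hat              # (18a) as implied by (31)
    chi = psi - phi + z                           # error-compensated difference
    q = prob_quantize(chi)
    phi_new = phi + zeta * q                      # (18b)
    z_new = chi - q                               # (19) as implied by (37)
    w = Ap @ phi_new                              # (18c)

    phi_t, phi_t_new = Wo - phi, Wo - phi_new     # network errors at i-1 and i
    zi, zi_prev = zeta * z_new, zeta * z          # z_i and z_{i-1} as in (40)
    rhs38 = B @ phi_t - mu * s + mu * b + (zi - zi_prev)
    err38.append(np.max(np.abs(phi_t_new - rhs38)))
    rhs42 = B @ (phi_t - zi_prev) - mu * s + mu * b - (np.eye(M) - B) @ zi_prev
    err42.append(np.max(np.abs((phi_t_new - zi) - rhs42)))
    phi, z = phi_new, z_new

print(max(err38), max(err42))
```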

Remark 1 (Temporal filtering of the compression error): A direct consequence of feeding back the error in the compression step (18b) is that the compression error from the previous instant is subtracted in recursion (38), which provides a correction mechanism.³ This correction helps mitigate the accumulation of compression errors over time, leading to improved network performance. ■

³To see this, remove the error feedback mechanism from approach (18) by replacing the compression step (18b) with $\boldsymbol{\phi}_{k,i}=\boldsymbol{\phi}_{k,i-1}+\zeta\boldsymbol{\mathcal{C}}_k(\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1})$. Then derive the dynamics of the network error vector $\widetilde{\boldsymbol{\phi}}_i$ in (34) by following arguments similar to those in (26)–(40). Instead of (38), we would arrive at:
$$\widetilde{\boldsymbol{\phi}}_i=\boldsymbol{\mathcal{B}}_{i-1}\widetilde{\boldsymbol{\phi}}_{i-1}-\mu\boldsymbol{s}_i+\mu b+\boldsymbol{z}_i,\qquad(41)$$
where the instantaneous noise vector $\boldsymbol{z}_i$ appears in (41) instead of the difference vector $\boldsymbol{z}_i-\boldsymbol{z}_{i-1}$ in (38).
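A stylized way to see the effect described in Remark 1 is to drive a slow scalar recursion $x_i=a\,x_{i-1}+e_i$ with either $e_i=z_i$, as in (41), or $e_i=z_i-z_{i-1}$, as in (38). Here $a$ is close to one, loosely mimicking slow modes of $\boldsymbol{\mathcal{B}}_{i-1}$ for small $\mu$. The sketch below treats the compression errors as i.i.d. randomized-rounding errors of i.i.d. inputs. This is a simplifying assumption: in DEF-ATC the errors depend on the iterates. For white $z_i$ with variance $v$, the steady-state variances are $v/(1-a^2)$ without feedback and $2v/(1+a)$ with feedback. So without feedback the accumulated error grows as $a\to1$, while with feedback it stays of the order of $v$.

```python
import numpy as np

rng = np.random.default_rng(3)
delta, a, n, burn = 0.1, 0.99, 100_000, 2_000    # quantizer step, slow pole, length, burn-in (assumed)

def prob_quantize(x):
    # randomized uniform quantizer: unbiased rounding to the grid delta*Z
    low = delta * np.floor(x / delta)
    return low + delta * (rng.random(x.shape) < (x - low) / delta)

x_in = rng.standard_normal(n)
z = x_in - prob_quantize(x_in)                    # compression errors, i.i.d. here by construction

def slow_filter(e):
    # x_i = a x_{i-1} + e_i, started at x_0 = 0
    out, x = np.empty_like(e), 0.0
    for i, e_i in enumerate(e):
        x = a * x + e_i
        out[i] = x
    return out

no_feedback = slow_filter(z)                          # noise input as in (41)
with_feedback = slow_filter(np.diff(z, prepend=0.0))  # noise input z_i - z_{i-1}, as in (38)

v = z.var()
var_no, var_ef = no_feedback[burn:].var(), with_feedback[burn:].var()
print(var_no, v / (1 - a ** 2))                   # empirical vs. theoretical, no feedback
print(var_ef, 2 * v / (1 + a))                    # empirical vs. theoretical, with feedback
```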

As the presentation will reveal, studying the network behavior in the presence of error feedback is challenging. Beyond analyzing the dynamics of the network error vector $\widetilde{\boldsymbol{\phi}}_i$, we must examine how the compression error (40), which is fed back into the network through the compression step (18b), affects its behavior. When all is said and done, the results will clarify how the following affect the network mean-square-error stability and performance: the network topology, the step-size $\mu$, the damping coefficient $\zeta$, the mixing parameter $\gamma$, and the gradient noise (through $\{\beta^2_{s,k},\sigma^2_{s,k}\}$) and compression noise (through $\{\beta^2_{c,k},\sigma^2_{c,k}\}$) processes. They will also provide insights into the design of effective compression operators for decentralized learning.

IV-C Mean-square-error stability

The mean-square-error analysis is carried out by first establishing the boundedness of $\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\phi}}_i-\boldsymbol{z}_i\|^2$. We then use relation (33) together with Hölder's and Jensen's inequalities to deduce the boundedness of $\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_i\|^2$. Therefore, we first study the stability of the network error vector $\widetilde{\boldsymbol{\phi}}_i^z\triangleq\widetilde{\boldsymbol{\phi}}_i-\boldsymbol{z}_i$, which evolves according to:

$$\boxed{\widetilde{\boldsymbol{\phi}}_i^z=\boldsymbol{\mathcal{B}}_{i-1}\widetilde{\boldsymbol{\phi}}^z_{i-1}-\mu\boldsymbol{s}_i+\mu b-(I_M-\boldsymbol{\mathcal{B}}_{i-1})\boldsymbol{z}_{i-1}}\qquad(42)$$

The above identity follows by subtracting $\boldsymbol{z}_i$ from both sides of (38) and by adding and subtracting the term $\boldsymbol{\mathcal{B}}_{i-1}\boldsymbol{z}_{i-1}$ on its right-hand side. We analyze the stability of recursion (42) by first transforming it into a more convenient form (shown later in (65) and (66)) using the Jordan canonical decomposition [48] of the matrix $\mathcal{A}'$ defined in (35). To exploit the eigenstructure of $\mathcal{A}'$, we first recall that a matrix $\mathcal{A}$ satisfying the conditions in (4) (for a full-column-rank semi-unitary matrix $\mathcal{U}$) has a Jordan decomposition of the form $\mathcal{A}=\mathcal{V}_\epsilon\Lambda_\epsilon\mathcal{V}_\epsilon^{-1}$ with [40, Lemma 2]:

$$\mathcal{V}_\epsilon=\left[\begin{array}{c|c}\mathcal{U}&\mathcal{V}_{R,\epsilon}\end{array}\right],\quad\Lambda_\epsilon=\left[\begin{array}{c|c}I_P&0\\\hline 0&\mathcal{J}_\epsilon\end{array}\right],\quad\mathcal{V}_\epsilon^{-1}=\left[\begin{array}{c}\mathcal{U}^\top\\\hline\mathcal{V}_{L,\epsilon}^\top\end{array}\right],\qquad(43)$$

where $\mathcal{J}_\epsilon$ is a Jordan matrix with the remaining eigenvalues of $\mathcal{A}$ (which may be complex but have magnitude less than one) on the diagonal and $\epsilon>0$ on the super-diagonal [40, Lemma 2], [6, p. 510]. The parameter $\epsilon$ is chosen small enough to ensure $\rho(\mathcal{J}_\epsilon)+\epsilon\in(0,1)$ [40]. Consequently, since $\mathcal{A}'=\mathcal{V}_\epsilon\left[(1-\gamma)I_M+\gamma\Lambda_\epsilon\right]\mathcal{V}_\epsilon^{-1}$, the matrix $\mathcal{A}'$ in (35) has a Jordan decomposition of the form $\mathcal{A}'=\mathcal{V}_\epsilon\Lambda'_\epsilon\mathcal{V}_\epsilon^{-1}$, where:

$$\Lambda'_\epsilon=\left[\begin{array}{c|c}I_P&0\\\hline 0&\mathcal{J}'_\epsilon\end{array}\right],\quad\text{with }\mathcal{J}'_\epsilon\triangleq(1-\gamma)I_{M-P}+\gamma\mathcal{J}_\epsilon.\qquad(44)$$
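For a symmetric combination matrix, the Jordan form in (43) reduces to an orthogonal eigendecomposition: $\mathcal{V}_{L,\epsilon}=\mathcal{V}_{R,\epsilon}$, $\mathcal{J}_\epsilon$ is diagonal, and $\epsilon$ plays no role. The sketch below assumes the consensus case with a hypothetical ring matrix $A_0$, $\mathcal{A}=A_0\otimes I_{M_k}$, and $\mathcal{U}=\frac{1}{\sqrt{K}}\mathbb{1}_K\otimes I_{M_k}$. It checks that the same $\mathcal{V}_\epsilon$ block-diagonalizes $\mathcal{A}'$ as in (44), with each eigenvalue $\lambda$ of $\mathcal{J}_\epsilon$ mapped to $1-\gamma+\gamma\lambda$. These mapped eigenvalues stay inside the unit disc for $\gamma\in(0,1]$, since $|1-\gamma+\gamma\lambda|\leq 1-\gamma+\gamma|\lambda|<1$.

```python
import numpy as np

K, Mk, gamma = 4, 2, 0.7                          # hypothetical sizes and mixing parameter
A0 = np.array([[0.5, 0.25, 0.0, 0.25],
               [0.25, 0.5, 0.25, 0.0],
               [0.0, 0.25, 0.5, 0.25],
               [0.25, 0.0, 0.25, 0.5]])          # symmetric, doubly-stochastic ring
A = np.kron(A0, np.eye(Mk))
M, P = K * Mk, Mk
Ap = (1 - gamma) * np.eye(M) + gamma * A          # A' in (35)

lam, Q = np.linalg.eigh(A0)                       # ascending eigenvalues; the last one equals 1
U = np.kron(np.ones((K, 1)) / np.sqrt(K), np.eye(Mk))   # basis of the consensus subspace
V_R = np.kron(Q[:, :-1], np.eye(Mk))              # remaining eigenvectors (here V_L = V_R)
V = np.hstack([U, V_R])                           # V_eps in (43); orthogonal, so V^{-1} = V^T
J = np.kron(np.diag(lam[:-1]), np.eye(Mk))        # diagonal block J_eps
Jp = (1 - gamma) * np.eye(M - P) + gamma * J      # J'_eps in (44)

Lam_p = np.zeros((M, M))                          # Lambda'_eps in (44)
Lam_p[:P, :P] = np.eye(P)
Lam_p[P:, P:] = Jp
print(np.allclose(V.T @ Ap @ V, Lam_p), np.max(np.abs(np.diag(Jp))))
```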

By multiplying both sides of (42) from the left by the matrix $\mathcal{V}_{\epsilon}^{-1}$ given in (43), we obtain the transformed iterates and variables:

\begin{align}
\mathcal{V}_{\epsilon}^{-1}\widetilde{\boldsymbol{\phi}}_{i}^{z} &= \begin{bmatrix}\mathcal{U}^{\top}\widetilde{\boldsymbol{\phi}}^{z}_{i}\\ \mathcal{V}_{L,\epsilon}^{\top}\widetilde{\boldsymbol{\phi}}^{z}_{i}\end{bmatrix} \triangleq \begin{bmatrix}\overline{\boldsymbol{\phi}}^{z}_{i}\\ \widecheck{\boldsymbol{\phi}}^{z}_{i}\end{bmatrix}, \tag{49}\\
\mathcal{V}_{\epsilon}^{-1}\boldsymbol{s}_{i} &= \begin{bmatrix}\mathcal{U}^{\top}\boldsymbol{s}_{i}\\ \mathcal{V}_{L,\epsilon}^{\top}\boldsymbol{s}_{i}\end{bmatrix} \triangleq \begin{bmatrix}\overline{\boldsymbol{s}}_{i}\\ \widecheck{\boldsymbol{s}}_{i}\end{bmatrix}, \tag{54}\\
\mathcal{V}_{\epsilon}^{-1}b &= \begin{bmatrix}\mathcal{U}^{\top}b\\ \mathcal{V}_{L,\epsilon}^{\top}b\end{bmatrix} \triangleq \begin{bmatrix}0\\ \widecheck{b}\end{bmatrix}, \tag{59}\\
\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1} &= \begin{bmatrix}\mathcal{U}^{\top}\boldsymbol{z}_{i-1}\\ \mathcal{V}_{L,\epsilon}^{\top}\boldsymbol{z}_{i-1}\end{bmatrix} \triangleq \begin{bmatrix}\overline{\boldsymbol{z}}_{i-1}\\ \widecheck{\boldsymbol{z}}_{i-1}\end{bmatrix}, \tag{64}
\end{align}

where in (59) we used the fact that $\mathcal{U}^{\top}b=0$, as shown in [40, Sec. III-B]. In particular, the transformed components $\overline{\boldsymbol{\phi}}^{z}_{i}$ and $\widecheck{\boldsymbol{\phi}}^{z}_{i}$ evolve according to the recursions:

\begin{align}
\overline{\boldsymbol{\phi}}^{z}_{i} &= (I_{P}-\mu\boldsymbol{\mathcal{D}}_{11,i-1})\overline{\boldsymbol{\phi}}^{z}_{i-1}-\mu\boldsymbol{\mathcal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^{z}_{i-1}-\mu\overline{\boldsymbol{s}}_{i}-\mu\boldsymbol{\mathcal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}-\mu\boldsymbol{\mathcal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}, \tag{65}\\
\widecheck{\boldsymbol{\phi}}^{z}_{i} &= (\mathcal{J}^{\prime\prime}_{\epsilon}-\mu\boldsymbol{\mathcal{D}}_{22,i-1})\widecheck{\boldsymbol{\phi}}^{z}_{i-1}-\mu\boldsymbol{\mathcal{D}}_{21,i-1}\overline{\boldsymbol{\phi}}^{z}_{i-1}+\mu\widecheck{b}-\mu\widecheck{\boldsymbol{s}}_{i} \nonumber\\
&\quad -\mu\boldsymbol{\mathcal{D}}_{21,i-1}\overline{\boldsymbol{z}}_{i-1}-\big(\zeta(I-\mathcal{J}^{\prime}_{\epsilon})+\mu\boldsymbol{\mathcal{D}}_{22,i-1}\big)\widecheck{\boldsymbol{z}}_{i-1}, \tag{66}
\end{align}

where

\begin{align}
\boldsymbol{\mathcal{D}}_{11,i-1} &\triangleq \mathcal{U}^{\top}\boldsymbol{\mathcal{H}}_{i-1}\mathcal{U}, \tag{67}\\
\boldsymbol{\mathcal{D}}_{12,i-1} &\triangleq \mathcal{U}^{\top}\boldsymbol{\mathcal{H}}_{i-1}\mathcal{V}_{R,\epsilon}\mathcal{J}^{\prime}_{\epsilon}, \tag{68}\\
\boldsymbol{\mathcal{D}}_{21,i-1} &\triangleq \mathcal{V}_{L,\epsilon}^{\top}\boldsymbol{\mathcal{H}}_{i-1}\mathcal{U}, \tag{69}\\
\boldsymbol{\mathcal{D}}_{22,i-1} &\triangleq \mathcal{V}_{L,\epsilon}^{\top}\boldsymbol{\mathcal{H}}_{i-1}\mathcal{V}_{R,\epsilon}\mathcal{J}^{\prime}_{\epsilon}, \tag{70}\\
\mathcal{J}^{\prime\prime}_{\epsilon} &\triangleq (1-\zeta)I_{M-P}+\zeta\mathcal{J}^{\prime}_{\epsilon}. \tag{71}
\end{align}
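One way to see why the product $\gamma\zeta$ appears throughout the conditions of the theorem below is to substitute the definition of $\mathcal{J}^{\prime}_{\epsilon}$ from (44) into (71). The short derivation below is offered as a reading aid; it is not claimed to be a step of the proof in Appendix B.

```latex
\mathcal{J}^{\prime\prime}_{\epsilon}
  = (1-\zeta)I_{M-P} + \zeta\big[(1-\gamma)I_{M-P} + \gamma\mathcal{J}_{\epsilon}\big]
  = (1-\gamma\zeta)\,I_{M-P} + \gamma\zeta\,\mathcal{J}_{\epsilon}.
% Every eigenvalue of J''_eps has the form 1 - \gamma\zeta + \gamma\zeta\lambda,
% with \lambda an eigenvalue of J_eps (|\lambda| < 1); since \gamma\zeta \in (0,1]:
\rho(\mathcal{J}^{\prime\prime}_{\epsilon})
  \le 1 - \gamma\zeta\big(1 - \rho(\mathcal{J}_{\epsilon})\big) < 1 .
```

Reducing $\gamma\zeta$ therefore slows the contraction of the $\widecheck{\boldsymbol{\phi}}^{z}_{i}$ block in (66) without making it unstable.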
Theorem 1.

(Mean-square-error stability). Consider a network of $K$ agents running the DEF-ATC diffusion approach (listed in Algorithm 1) to solve problem (2) under Assumptions 1, 2, and 3, with a matrix $\mathcal{A}$ satisfying (4). In the absence of the relative compression noise term (i.e., $\beta^{2}_{c,k}=0$, $\forall k$), let the damping and mixing parameters be set to $\zeta=\gamma=1$. In the presence of the relative compression noise (i.e., $\beta^{2}_{c,k}>0$ for at least one agent $k$), let $\zeta\in(0,1]$ and $\gamma\in(0,1]$ be chosen such that the following two conditions are satisfied:

$$0<\gamma\zeta<\frac{1-(\rho(\mathcal{J}_{\epsilon})+\epsilon)}{4v_{1}^{2}v_{2}^{2}\beta^{2}_{c,\max}(\rho(I-\mathcal{J}_{\epsilon})+\epsilon)^{2}}, \tag{72}$$

and

$$\gamma\zeta\frac{(\rho(I-\mathcal{J}_{\epsilon})+\epsilon)^{2}}{1-(\rho(\mathcal{J}_{\epsilon})+\epsilon)}+\zeta^{2}\beta_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\left(1+\big((1+\gamma)-\gamma(\rho(\mathcal{J}_{\epsilon})+\epsilon)\big)^{2}\right)<\frac{1}{2}, \tag{73}$$

where $v_{1}=\|\mathcal{V}_{\epsilon}^{-1}\|$, $v_{2}=\|\mathcal{V}_{\epsilon}\|$, and $\beta_{c,\max}^{2}\triangleq\max_{1\leq k\leq K}\{\beta^{2}_{c,k}\}$. Then, for sufficiently small step-sizes $\mu$, the network is mean-square-error stable, and it holds that:

$$\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}=\kappa\mu+\overline{\sigma}^{2}_{c}\,O(1), \tag{74}$$

where $\overline{\sigma}^{2}_{c}=\sum_{k=1}^{K}\sigma^{2}_{c,k}$. The constant $\kappa$ is positive, independent of the step-size $\mu$ and of the compression noise terms $\{\beta^{2}_{c,k},\sigma^{2}_{c,k}\}$, and is given by $\kappa=v_{1}^{2}v_{2}^{2}\frac{\overline{\sigma}^{2}_{s}}{\sigma_{11}}$ with $\overline{\sigma}^{2}_{s}=\sum_{k=1}^{K}\sigma_{s,k}^{2}$, where $\sigma_{11}$ is some positive constant resulting from the derivation of inequality (90) in Appendix B. Moreover, by choosing compression schemes with $\sigma^{2}_{c,k}\propto\mu^{1+\varepsilon}$ (where the symbol $\propto$ hides a proportionality constant independent of $\mu$) and $\varepsilon\in(0,1]$, we obtain:

$$\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}=\kappa\mu+O(\mu^{1+\varepsilon}). \tag{75}$$

It then holds that:

$$\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}\|^{2}=\kappa\mu+O(\mu^{1+\frac{\varepsilon}{2}}), \tag{76}$$

from which we conclude that:

$$\lim_{\mu\rightarrow 0}\limsup_{i\rightarrow\infty}\frac{1}{\mu}\mathbb{E}\|w^{o}_{k}-\boldsymbol{w}_{k,i}\|^{2}=\kappa, \tag{77}$$

for $k=1,\ldots,K$.

Proof.

See Appendix B. ∎
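Conditions (72) and (73) can be checked mechanically once the network-dependent constants are known. The Python sketch below is a hypothetical helper that simply evaluates both inequalities; it assumes the caller has already computed $\rho(\mathcal{J}_{\epsilon})$, $\rho(I-\mathcal{J}_{\epsilon})$, $\epsilon$, $v_1$, $v_2$, and $\beta^2_{c,\max}>0$, and the numerical values in the example are made up for illustration.

```python
def theorem1_conditions_hold(gamma, zeta, rho_J, rho_I_minus_J, eps, v1, v2, beta2_max):
    """Evaluate conditions (72)-(73) of Theorem 1 (hypothetical helper).

    rho_J         : rho(J_eps)
    rho_I_minus_J : rho(I - J_eps)
    v1, v2        : ||V_eps^{-1}|| and ||V_eps||
    beta2_max     : max_k beta^2_{c,k}, assumed > 0 (relative noise present)
    """
    if beta2_max <= 0:
        raise ValueError("(72)-(73) are stated for beta2_max > 0; "
                         "otherwise Theorem 1 sets zeta = gamma = 1")
    if not (0 < gamma <= 1 and 0 < zeta <= 1):
        return False
    r = rho_J + eps              # rho(J_eps) + eps, assumed to lie in (0, 1)
    s = rho_I_minus_J + eps      # rho(I - J_eps) + eps
    gz = gamma * zeta
    v = (v1 * v2) ** 2
    cond72 = gz < (1 - r) / (4 * v * beta2_max * s ** 2)
    lhs73 = gz * s ** 2 / (1 - r) + zeta ** 2 * beta2_max * v * (1 + ((1 + gamma) - gamma * r) ** 2)
    return cond72 and lhs73 < 0.5

# Hypothetical network constants, used only to exercise the function.
consts = dict(rho_J=0.5, rho_I_minus_J=1.5, eps=0.01, v1=1.0, v2=1.0, beta2_max=0.1)
for gamma, zeta in [(0.1, 0.5), (0.5, 0.5), (1.0, 1.0)]:
    print(gamma, zeta, theorem1_conditions_hold(gamma, zeta, **consts))
```

With these made-up constants, the pair $(\gamma,\zeta)=(0.5,0.5)$ satisfies (72) but violates (73), which shows that both conditions can be active.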

While expressions (74)–(77) in Theorem 1 reveal the influence of the step-size $\mu$, the compression noise (captured by $\{\overline{\sigma}^{2}_{c},\beta^{2}_{c,\max}\}$), and the gradient noise (captured by $\overline{\sigma}^{2}_{s}$) on the steady-state mean-square error, expressions (72) and (73) reveal the influence of the relative compression noise term (captured by $\beta^{2}_{c,\max}$) on the network stability, and how this influence can be mitigated by properly choosing the damping coefficient $\zeta$ and the mixing parameter $\gamma$.

One main conclusion stemming from Theorem 1 (expression (74)) is that the mean-square error consists of two terms. The first term is $\kappa\mu$, where $\kappa$ is a constant that is independent of the compression noise $\{\beta^{2}_{c,k},\sigma^{2}_{c,k}\}$ but depends on the gradient noise $\{\sigma^{2}_{s,k}\}$. This term, which we refer to as the gradient noise term, is classically encountered in the uncompressed case [40]. The second term, $\overline{\sigma}^{2}_{c}\,O(1)$, is proportional to the quantizers' absolute noise components $\{\sigma^{2}_{c,k}\}$ and, unlike the first term, is not scaled by $\mu$. Interestingly, by choosing compression schemes with $\sigma^{2}_{c,k}\propto\mu^{1+\varepsilon}$ and $\varepsilon\in(0,1]$, for sufficiently small step-sizes $\mu$ we obtain $\limsup_{i\rightarrow\infty}\mathbb{E}\|w^{o}_{k}-\boldsymbol{w}_{k,i}\|^{2}\approx\kappa\mu$. This result is reassuring since it implies that the impact of the compression noise can be reduced to the point where it only affects higher-order terms in the step-size. Consequently, the dominant noise source in the learning process is the gradient noise, which is consistent with the classical results for the uncompressed case studied in [40].
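To make the two-term structure of (74) concrete, the toy Python computation below evaluates the normalized error $(\kappa\mu+\overline{\sigma}^{2}_{c}C)/\mu$, where $\kappa$ and $C$ are hypothetical constants and $C$ stands in for the unspecified $O(1)$ factor, treated here as independent of $\mu$ (an assumption). It only illustrates the scaling argument and does not simulate the algorithm: with $\overline{\sigma}^{2}_{c}$ held fixed the ratio grows as $\mu$ decreases, whereas with $\overline{\sigma}^{2}_{c}\propto\mu^{1+\varepsilon}$ it approaches $\kappa$, mirroring (77).

```python
def normalized_msd(mu, kappa, c_const, sigma2_c):
    """(kappa*mu + sigma2_c*c_const) / mu: a toy stand-in for (74) divided by mu.
    c_const is a hypothetical value for the unspecified O(1) factor."""
    return (kappa * mu + sigma2_c * c_const) / mu

kappa, c_const, eps = 2.0, 5.0, 0.5   # hypothetical constants
for mu in (1e-2, 1e-3, 1e-4):
    fixed = normalized_msd(mu, kappa, c_const, sigma2_c=1e-3)             # sigma2_c independent of mu
    scaled = normalized_msd(mu, kappa, c_const, sigma2_c=mu ** (1 + eps))  # sigma2_c proportional to mu^(1+eps)
    print(f"mu={mu:.0e}  fixed: {fixed:8.3f}  scaled: {scaled:6.3f}")
```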

While result (77) is appealing, it is not sufficient to characterize the DEF-ATC diffusion approach. To fully characterize a decentralized strategy endowed with a compression mechanism, it is essential to consider the learning-communication tradeoff. In other words, we also need to assess how the design choice $\sigma^{2}_{c,k}\propto\mu^{1+\varepsilon}$ impacts the amount of communication resources (e.g., quantization bits). For instance, consider the probabilistic uniform quantizer from Table II. For this scheme, setting $\sigma^{2}_{c,k}\propto\mu^{1+\varepsilon}$ is equivalent to requiring the quantization step $\Delta$ to be proportional to $\mu^{\frac{1+\varepsilon}{2}}$. Thus, while small values of $\sigma^{2}_{c,k}$ imply small compression errors in view of (7), they might in principle require large bit rates. Moreover, as $\mu\rightarrow 0$, the quantization step $\Delta$ becomes very small, potentially leading to an unbounded increase in the bit rate. It is therefore important to find a quantization scheme that achieves the same performance as in the uncompressed case, i.e., $\limsup_{i\rightarrow\infty}\mathbb{E}\|w^{o}_{k}-\boldsymbol{w}_{k,i}\|^{2}\approx\kappa\mu$, while guaranteeing a finite bit rate as $\mu\rightarrow 0$. In the next theorem, we show that the DEF-ATC diffusion approach equipped with the variable-rate coding scheme from [25], [26, Sec. II] achieves both objectives.
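The scaling issue above can be illustrated with a minimal Python sketch of probabilistic (stochastic-rounding) uniform quantization, assuming NumPy. We assume this matches the probabilistic uniform rule of Table II, which is not reproduced here, so the function name and signature are hypothetical. Stochastic rounding is unbiased and its per-entry variance is at most $\Delta^2/4$, so choosing $\Delta\propto\mu^{\frac{1+\varepsilon}{2}}$ makes the absolute noise scale as $\mu^{1+\varepsilon}$; at the same time, for a fixed input the integer index $n$ that must be transmitted grows like $|x|/\Delta$.

```python
import numpy as np

def prob_uniform_quantize(x, delta, rng):
    """Stochastic rounding of x onto the grid {n * delta : n integer}.

    Each entry is rounded up with probability equal to its fractional position
    between the two neighbouring grid points, so E[Q(x)] = x and the per-entry
    variance is delta**2 * p * (1 - p) <= delta**2 / 4.
    Returns the quantized values and the integer indices n.
    """
    x = np.asarray(x, dtype=float)
    lo = np.floor(x / delta)
    p_up = x / delta - lo
    n = (lo + (rng.random(x.shape) < p_up)).astype(np.int64)
    return delta * n, n

rng = np.random.default_rng(0)
eps = 0.5
for mu in (1e-2, 1e-3, 1e-4):
    delta = mu ** ((1 + eps) / 2)   # Delta proportional to mu^((1+eps)/2)
    q, n = prob_uniform_quantize(np.full(100_000, 0.3), delta, rng)
    print(f"mu={mu:.0e} delta={delta:.2e} var={q.var():.2e} "
          f"bound={delta**2 / 4:.2e} max|n|={np.abs(n).max()}")
```

The growth of $\max|n|$ as $\mu$ shrinks is what a naive encoding of the indices would pay for in bits.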

IV-D Bit rate stability

We first assume that the top-$c_{k}$ quantizer in Definition 4 (with a subscript $k$ added to $c$ to highlight the fact that the compression characteristics can vary across agents) is used at each iteration $i$ and agent $k$. We then recall that the quantizer input is given by the error-compensated difference $\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$, and assume that the probabilistic ANQ scheme (Table II, row 3) is employed at the output of the top-$c_{k}$ sparsifier. [Footnote 4: The probabilistic uniform rule (Table II, row 2) can be obtained from the ANQ rule by letting $\omega\rightarrow 0$ [26].] Consequently, from (9), the bit rate at agent $k$ and iteration $i$ is given by:

$$r_{k,i}=\log_{2}(3)\sum_{j\in\mathcal{I}_{c_{k,i}}}\Big(1+\mathbb{E}\Big[\left\lceil\log_{2}\big(|\boldsymbol{n}([\boldsymbol{\chi}_{k,i}]_{j})|+1\big)\right\rceil\Big]\Big)+(M_{k}-c_{k})\log_{2}(3), \tag{78}$$

where [𝝌k,i]jsubscriptdelimited-[]subscript𝝌𝑘𝑖𝑗[\boldsymbol{\chi}_{k,i}]_{j}[ bold_italic_χ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT denotes the j𝑗jitalic_j-th entry of 𝝌k,isubscript𝝌𝑘𝑖\boldsymbol{\chi}_{k,i}bold_italic_χ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT, ck,isubscriptsubscript𝑐𝑘𝑖\mathcal{I}_{c_{k,i}}caligraphic_I start_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT is the set containing the indices of the cksubscript𝑐𝑘c_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT largest-magnitude components of 𝝌k,isubscript𝝌𝑘𝑖\boldsymbol{\chi}_{k,i}bold_italic_χ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT, and (Mkck)log2(3)subscript𝑀𝑘subscript𝑐𝑘subscript23(M_{k}-c_{k})\log_{2}(3)( italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 3 ) is the number of bits required to encode the Mkcksubscript𝑀𝑘subscript𝑐𝑘M_{k}-c_{k}italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT zero components at the output of the top-cksubscript𝑐𝑘c_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT sparsifier555Encoding the 00 input using the variable-rate coding scheme from [25][26, Sec. II] requires only a parsing symbol, i.e., log2(3)subscript23\log_{2}(3)roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 3 ) bits. Instead of encoding the zero components, agent k𝑘kitalic_k can send the location of the cksubscript𝑐𝑘c_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT largest-magnitude components. 
In this case, the term (Mkck)log2(3)subscript𝑀𝑘subscript𝑐𝑘subscript23(M_{k}-c_{k})\log_{2}(3)( italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 3 ) is replaced by cklog2(Mk)subscript𝑐𝑘subscript2subscript𝑀𝑘c_{k}\lceil\log_{2}(M_{k})\rceilitalic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⌈ roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ⌉. This alternative solution does not affect the main conclusions of Sec. IV-D..
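As an illustration of how (78) is evaluated, the Python sketch below computes the expected bit count for a given error-compensated difference. It is a sketch under stated assumptions, not the authors' implementation: it uses the probabilistic uniform rule (the $\omega\rightarrow 0$ limit of ANQ) and models it as unbiased stochastic rounding of each retained entry to one of the two nearest multiples of a step `delta`, with $\boldsymbol{n}(\cdot)$ taken to be the resulting integer index. Because the randomness only selects between two neighboring indices, the expectation in (78) can be computed exactly. The function names and the input vector are hypothetical.

```python
import math
from typing import Sequence


def index_bits(n: int) -> int:
    """ceil(log2(|n| + 1)), which equals the number of binary digits of |n| (0 for n = 0)."""
    return abs(n).bit_length()


def expected_index_bits(x: float, delta: float) -> float:
    """Exact E[ceil(log2(|n(x)| + 1))] for an assumed stochastic-rounding rule.

    Assumption (illustrative): n(x) = floor(x/delta) + 1 with probability
    p = x/delta - floor(x/delta), and floor(x/delta) otherwise, so E[n(x)] * delta = x.
    """
    v = x / delta
    lo = math.floor(v)
    p = v - lo
    return (1.0 - p) * index_bits(lo) + p * index_bits(lo + 1)


def bit_rate_top_c(chi: Sequence[float], c: int, delta: float) -> float:
    """Expected bit count r_{k,i} of (78) for one agent at one iteration."""
    m = len(chi)
    log2_3 = math.log2(3)  # bits per ternary symbol
    # I_{c_{k,i}}: indices of the c largest-magnitude entries of chi.
    top = sorted(range(m), key=lambda j: abs(chi[j]), reverse=True)[:c]
    # Each kept entry: one parsing symbol plus ceil(log2(|n| + 1)) symbols for the index.
    kept = sum(1.0 + expected_index_bits(chi[j], delta) for j in top)
    # Each of the m - c zeroed entries costs only a parsing symbol.
    return log2_3 * kept + (m - c) * log2_3


# Hypothetical 10-dimensional error-compensated difference, top-4, step 0.01.
chi = [0.031, -0.002, 0.0, 0.018, -0.045, 0.004, 0.0, 0.011, -0.007, 0.001]
r = bit_rate_top_c(chi, c=4, delta=0.01)
```

The identity $\lceil\log_2(|n|+1)\rceil$ = `abs(n).bit_length()` avoids floating-point logarithms; an all-zero input returns the floor value $M_k\log_2(3)$ of (78).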

Theorem 2.

(Bit rate stability). Assume that each agent $k$ employs the top-$c_k$ quantizer (see Definition 4) with the probabilistic ANQ scheme. Assume further that the design parameters of the compression operators are chosen such that:

$$\omega_k=t,\qquad \eta_k\propto\mu^{\frac{1+\varepsilon}{2}}, \tag{79}$$

where $t$ is a constant independent of $\mu$ and $0<\varepsilon\leq 1$. First, under conditions (79), we have:

$$\sigma^2_{c,k}\propto\mu^{1+\varepsilon}. \tag{80}$$

Second, in steady state, the average number of bits at agent $k$ stays bounded as $\mu\rightarrow 0$, namely,

$$\limsup_{i\rightarrow\infty}\, r_{k,i}=O(1). \tag{81}$$
Proof.

See Appendix C. ∎

The bit rate stability result (81) can be explained by considering again the uniform quantization rule (Table II, row 2), which, as explained previously, requires setting $\Delta\propto\mu^{\frac{1+\varepsilon}{2}}$ in order to guarantee that $\sigma^2_{c,k}\propto\mu^{1+\varepsilon}$. The result (159) reveals that, under (80), the input to the compression operators of the DEF-ATC strategy in (18b), namely, the error-compensated difference $\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$, is on the order of $\mu^{\frac{1+\varepsilon}{2}}$ in steady state. This means that, as $\mu\rightarrow 0$, the quantization step $\Delta\propto\mu^{\frac{1+\varepsilon}{2}}$ decreases, but in proportion to the effective range of the quantizers' inputs, so the number of quantization levels needed to cover that range, and hence the number of bits, does not grow.
Theorem 2 thus reveals the adaptability of the variable-rate scheme: even as the quantization becomes increasingly precise (as $\mu\rightarrow 0$), the DEF-ATC strategy maintains a finite expected bit rate, which is crucial for efficient data transmission.
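The proportionality argument can be checked numerically. In the Python sketch below, which does not reproduce the analysis of Appendix C, the steady-state entries of $\boldsymbol{\chi}_{k,i}$ are modeled (hypothetically) as Gaussian with standard deviation $\mu^{\frac{1+\varepsilon}{2}}$, the step is set to $\Delta=\mu^{\frac{1+\varepsilon}{2}}$, and the probabilistic uniform rule is assumed to be unbiased stochastic rounding to the nearest multiples of $\Delta$. The bits of (78) are averaged per component for $c=4$ and $M=10$; all names are illustrative.

```python
import math
import random
from typing import Sequence


def expected_rate_per_component(chi: Sequence[float], c: int, delta: float) -> float:
    """Expected bits of (78) divided by M, under assumed unbiased stochastic rounding."""
    log2_3 = math.log2(3)
    top = sorted(range(len(chi)), key=lambda j: abs(chi[j]), reverse=True)[:c]
    kept = 0.0
    for j in top:
        v = chi[j] / delta
        lo = math.floor(v)
        p = v - lo  # probability of rounding up to lo + 1
        kept += 1.0 + (1.0 - p) * abs(lo).bit_length() + p * abs(lo + 1).bit_length()
    return log2_3 * (kept + len(chi) - c) / len(chi)


def average_rate(mu: float, eps: float, c: int = 4, m: int = 10,
                 trials: int = 2000, seed: int = 0) -> float:
    """Average rate when both the input spread and the step scale like mu^((1+eps)/2)."""
    rng = random.Random(seed)      # same draws for every mu: only the scale changes
    scale = mu ** ((1 + eps) / 2)  # hypothetical steady-state spread of chi
    delta = scale                  # step chosen proportional to the same power of mu
    total = 0.0
    for _ in range(trials):
        chi = [scale * rng.gauss(0.0, 1.0) for _ in range(m)]
        total += expected_rate_per_component(chi, c, delta)
    return total / trials


rates = [average_rate(mu, eps=1.0) for mu in (1e-2, 1e-3, 1e-4)]
```

Since only the ratio $\boldsymbol{\chi}_{k,i}/\Delta$ enters the bit count, the three averages agree up to floating-point error, in line with (81).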

V Simulation results

In this section, we first illustrate the theoretical results of Theorems 1 and 2. Then, we illustrate the benefit of the top-$c$ quantizer over other quantizers, particularly those that quantize a vector element-wise without prioritizing the $c$ most important components. In the third part, we compare DEF-ATC to state-of-the-art baselines in various scenarios, including those beyond the small step-size regime. The first three parts focus on solving single-task optimization problems of the form (1). The last part illustrates the performance of the DEF-ATC approach when used to solve multitask estimation problems with overlapping parameter vectors.

Figure 2: Experimental setup. (Left) Communication link matrix. (Right) Regression and noise variances.

We consider a network of $K=30$ nodes with the communication link matrix shown in Fig. 2 (left), where the $(k,\ell)$-th entry is equal to 1 if there is a link between $k$ and $\ell$ and 0 otherwise. Each agent is subjected to streaming data $\{\boldsymbol{d}_k(i),\boldsymbol{u}_{k,i}\}$ assumed to satisfy a linear regression model of the form $\boldsymbol{d}_k(i)=\boldsymbol{u}_{k,i}^\top w^\star_k+\boldsymbol{v}_k(i)$ for some $M_c\times 1$ vector $w^\star_k$, with $\boldsymbol{v}_k(i)$ denoting a zero-mean measurement noise and $M_c=10$.
A mean-square-error risk of the form $J_k(w_k)=\mathbb{E}|\boldsymbol{d}_k(i)-\boldsymbol{u}_{k,i}^\top w_k|^2$ is associated with each agent $k$. The processes $\{\boldsymbol{u}_{k,i},\boldsymbol{v}_k(i)\}$ are assumed to be zero-mean Gaussian with: i) $\mathbb{E}\boldsymbol{u}_{k,i}\boldsymbol{u}_{\ell,i}^\top=R_{u,k}=\sigma^2_{u,k}I_{M_c}$ if $k=\ell$ and 0 otherwise; ii) $\mathbb{E}\boldsymbol{v}_k(i)\boldsymbol{v}_\ell(i)=\sigma^2_{v,k}$ if $k=\ell$ and 0 otherwise; and iii) $\boldsymbol{u}_{k,i}$ and $\boldsymbol{v}_k(i)$ are independent of each other. The variances $\sigma^2_{u,k}$ and $\sigma^2_{v,k}$ are shown in Fig. 2 (right). Throughout Sec. V, we assume that all agents employ the same compression rule, i.e., $\boldsymbol{\mathcal{C}}_k=\boldsymbol{\mathcal{C}}$ for all $k$. We use the terminology “top-$c$ quantizer-name” to refer to the top-$c$ quantizer of Definition 4 in which the compression scheme $\boldsymbol{\mathcal{Q}}$ is quantizer-name. For instance, “top-4 probabilistic ANQ” is the quantizer obtained by applying the probabilistic ANQ scheme at the output of the top-4 sparsifier.
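To make the naming convention concrete, the following Python sketch composes a top-$c$ sparsifier with a probabilistic uniform quantizer, giving a “top-$c$ probabilistic uniform” operator. The form of the quantizer (unbiased stochastic rounding to the two nearest multiples of `delta`) and all function names are assumptions for illustration; Definition 4 and Table II remain the reference for the exact operators.

```python
import math
import random
from typing import List, Sequence


def top_c_sparsify(x: Sequence[float], c: int) -> List[float]:
    """Keep the c largest-magnitude entries of x and set the others to zero."""
    keep = set(sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:c])
    return [x[j] if j in keep else 0.0 for j in range(len(x))]


def probabilistic_uniform(x: float, delta: float, rng: random.Random) -> float:
    """Unbiased stochastic rounding of x to a multiple of delta (assumed form of the rule)."""
    v = x / delta
    lo = math.floor(v)
    n = lo + 1 if rng.random() < v - lo else lo  # round up with probability v - lo
    return n * delta


def top_c_probabilistic_uniform(x: Sequence[float], c: int, delta: float,
                                rng: random.Random) -> List[float]:
    """'top-c probabilistic uniform': apply the quantizer to the top-c sparsifier output."""
    # Zeros are multiples of delta, so they stay exactly zero after quantization.
    return [probabilistic_uniform(v, delta, rng) for v in top_c_sparsify(x, c)]


rng = random.Random(1)
x = [0.9, -0.05, 0.3, -1.2, 0.01, 0.4, 0.0, -0.7, 0.2, 0.05]
y = top_c_probabilistic_uniform(x, c=4, delta=0.25, rng=rng)
```

Only the $c$ retained entries can be nonzero, and each of them is a multiple of `delta` whose expected value equals the corresponding input entry.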

V-A Illustrating the theoretical findings

In this subsection and in Secs. V-B and V-C, we assume that agents share a common model parameter, $w^\star_k=w^o$ for all $k$. The model $w^o$ is generated by drawing a Gaussian vector with zero mean and unit variance and normalizing it to unit norm. To promote consensus (i.e., to solve problem (1) or, equivalently, (2) with $\mathcal{U}=\frac{1}{\sqrt{K}}(\mathds{1}_K\otimes I_{M_c})$), we run Alg. 1 using a combination matrix of the form $\mathcal{A}=A\otimes I_{M_c}$, where $A$ is generated according to the Metropolis rule [6, Chap. 8].
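The Python sketch below builds a combination matrix with one common form of the Metropolis rule, in which the degree $n_k$ counts agent $k$ itself and $a_{\ell k}=1/\max(n_k,n_\ell)$ for neighbors $\ell\neq k$; we assume, without having checked the exact statement in [6, Chap. 8], that this is the variant used. The 4-agent link matrix is hypothetical, not the network of Fig. 2. The resulting $A$ is symmetric and doubly stochastic, and $\mathcal{A}=A\otimes I_{M_c}$ simply applies the same weights to every entry of the agents' vectors.

```python
from typing import List, Sequence


def metropolis_weights(adj: Sequence[Sequence[int]]) -> List[List[float]]:
    """Metropolis combination matrix from a symmetric 0/1 link matrix.

    Assumed convention: n_k counts agent k itself plus its neighbors,
    a_{lk} = 1 / max(n_k, n_l) for neighbors l != k, and a_{kk} = 1 - sum of the others.
    """
    K = len(adj)
    deg = [1 + sum(adj[k][l] for l in range(K) if l != k) for k in range(K)]
    A = [[0.0] * K for _ in range(K)]
    for k in range(K):
        for l in range(K):
            if l != k and adj[k][l]:
                A[l][k] = 1.0 / max(deg[k], deg[l])
        # Self-weight makes column k sum to one.
        A[k][k] = 1.0 - sum(A[l][k] for l in range(K) if l != k)
    return A


# Hypothetical 4-agent link matrix (1 = link), in the format of Fig. 2 (left).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 1],
       [0, 1, 0, 1],
       [0, 1, 1, 0]]
A = metropolis_weights(adj)
```

Because the off-diagonal weights depend only on the pair $\{k,\ell\}$, the matrix is symmetric, so column-stochasticity also gives row-stochasticity.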

In Fig. 3 (left), we report the network mean-square-deviation (MSD) learning curves:

$$\text{MSD}(i)=\frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\|w^o_k-\boldsymbol{w}_{k,i}\|^2, \tag{82}$$

for 3 different values of the step-size $\mu$. The results are averaged over 100 Monte-Carlo runs. For each value of the step-size, we run Alg. 1 for 4 different choices of the compression operator $\boldsymbol{\mathcal{C}}$: i) top-4 sparsifier, ii) top-4 QSGD (Table II, row 6, $s=2$), iii) top-4 probabilistic uniform (Table II, row 2, $\Delta=\mu$), and iv) top-4 probabilistic ANQ (Table II, row 3, $\omega=0.5$, $\eta=\mu$). We set $\gamma=\zeta=0.9$. As can be observed, despite compression, DEF-ATC achieves a performance almost identical to that of the uncompressed ATC approach (which can be obtained from Alg. 1 by setting $\gamma=\zeta=1$ and replacing the compression operator by the identity). We further observe that, in steady state, the network MSD increases by approximately 3 dB when $\mu$ goes from $\mu_0$ to $2\mu_0$. This means that the performance is on the order of $\mu$, as expected from Theorem 1, since in the simulations the absolute noise component satisfies $\sigma^2_{c,k}\propto\mu^2$. For the top-4 probabilistic uniform and ANQ quantizers, we report in Fig. 3 (right) the average number of bits per node, per component, computed according to:

$$R(i)=\frac{1}{K}\sum_{k=1}^{K}\frac{1}{M_k}\,r_{k,i}, \tag{83}$$

where $r_{k,i}$ is the bit rate given by (78), associated with the encoding of the error-compensated difference vector $\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$ transmitted by agent $k$ at iteration $i$. As can be observed, for the three values of the step-size, we obtain approximately the same finite average number of bits in steady state (approximately 2.4 bits/component/iteration are required on average in steady state when the top-4 probabilistic ANQ quantizer is used). From Table II (row 7), the top-4 sparsifier would require an average of $\frac{4(32+4)}{10}=14.4$ bits/node/component/iteration⁶, which is about six times higher than the value obtained in steady state with the probabilistic ANQ. This is expected since the top-4 sparsifier requires encoding the 4 largest-magnitude components of the input with very high precision. On the other hand, the top-4 QSGD (Table II, row 6, $s=2$), which requires encoding the norm of the input with high precision, would need an average of $\frac{32+10+10}{10}=5.2$ bits/node/component/iteration, which is more than twice the value obtained with the probabilistic ANQ.

⁶ We replaced $B_{\text{HP}}$ by 32 since the experiments are performed in MATLAB 2022a, which uses 32 bits to represent a single-precision floating-point number.
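The figures quoted in this subsection can be re-derived with a few lines of arithmetic. The Python sketch below assumes the steady-state MSD scales linearly with $\mu$ (MSD $\approx\kappa\mu$ for some constant $\kappa$), so that doubling $\mu$ changes it by $10\log_{10}2$ dB, and it re-evaluates the bit expressions given in the text with $B_{\text{HP}}=32$, $M_c=10$, and the approximate 2.4 bits/component measured for top-4 probabilistic ANQ. It does not derive the Table II formulas themselves.

```python
import math

# If MSD ~ kappa * mu, doubling mu adds 10*log10(2) dB.
doubling_db = 10 * math.log10(2)

# Bits/node/component/iteration, re-evaluating the expressions quoted in the text.
M, B_HP = 10, 32                        # vector length M_c, single precision (footnote 6)
top4_sparsifier = 4 * (B_HP + 4) / M    # Table II, row 7: 4(32 + 4)/10
top4_qsgd = (B_HP + 10 + 10) / M        # Table II, row 6 (s = 2): (32 + 10 + 10)/10
anq_steady_state = 2.4                  # approximate value read from Fig. 3 (right)

print(f"MSD increase when doubling mu: {doubling_db:.2f} dB")
print(f"top-4 sparsifier: {top4_sparsifier:.1f} bits ({top4_sparsifier / anq_steady_state:.1f}x ANQ)")
print(f"top-4 QSGD: {top4_qsgd:.1f} bits ({top4_qsgd / anq_steady_state:.1f}x ANQ)")
```

It prints 3.01 dB, 14.4 bits (6.0 times the ANQ value), and 5.2 bits (2.2 times).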

Figure 3: Network performance w.r.t. $w^o$ for three different values of the step-size ($\mu_0=0.001$). (Left) Evolution of the MSD learning curves. (Right) Evolution of the average number of bits per node, per component, when the variable-rate probabilistic uniform and ANQ schemes are used at the output of the top-4 sparsifier to encode the error-compensated difference $\boldsymbol{\chi}_{k,i}=\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}+\boldsymbol{z}_{k,i-1}$ in (18b).

Evaluating the performance of a learning approach requires considering both the attained learning error (MSD) and the associated bit expense. Therefore, in the following, we focus on reporting rate-distortion (RD) curves, where the bit budget quantifies the rate and the MSD quantifies the distortion.

V-B Top-$c$ quantization outperforms other compression rules

We report in Fig. 4 the RD curves of the DEF-ATC approach with probabilistic uniform and top-4 probabilistic uniform quantization. We set $\mu=0.001$ and $\gamma=\zeta=0.9$. Each point of the RD curve corresponds to one value of the parameter $\varepsilon$, which determines the quantization step $\Delta=\mu^{\frac{1+\varepsilon}{2}}$. In the example, we selected 25 values of $\varepsilon$ linearly spaced in the interval $[10^{-3},1]$. For each value of $\Delta$ (i.e., each point of the curve), the resulting MSD (distortion) and average number of bits/node/component (rate) were obtained by averaging the instantaneous mean-square-deviation $\text{MSD}(i)$ in (82) and the number of bits $R(i)$ in (83) over 100 samples after convergence of the algorithm (the expectations in (82) and (83) are estimated empirically over 100 Monte Carlo runs). The trade-off between rate and distortion can be observed in Fig. 4: as the rate decreases, the distortion increases, and vice versa. For comparison purposes, Fig. 4 also shows the distortion (horizontal dashed line) of the uncompressed ATC approach, obtained by averaging $\text{MSD}(i)$ in (82) over 100 samples after convergence. We also show the $\log_2(3)$ bit rate (vertical dashed line), which corresponds to the minimum number of bits possible for the considered scheme, namely, the variable-rate coding scheme from [25], [26, Sec. II].
Under this scheme, each component is appended with a parsing symbol, and the value 0 is encoded as an empty element. Thus, the minimum number of bits/component corresponds to sending only one symbol per component. As can be observed, top-4 probabilistic uniform is more efficient than probabilistic uniform: it approaches the uncompressed performance (low distortion) at a lower bit rate.
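The sweep that produces each RD point can be written out explicitly. The Python sketch below, for $\mu=0.001$, builds the 25 values of $\varepsilon$ linearly spaced in $[10^{-3},1]$, the corresponding steps $\Delta=\mu^{\frac{1+\varepsilon}{2}}$, and the $\log_2(3)\approx 1.585$ bits/component floor; running the DEF-ATC simulations for each step is outside its scope, and the variable names are hypothetical.

```python
import math

mu = 1e-3
n_points = 25
eps_min, eps_max = 1e-3, 1.0

# 25 values of epsilon, linearly spaced in [1e-3, 1] with both endpoints included.
eps_grid = [eps_min + (eps_max - eps_min) * t / (n_points - 1) for t in range(n_points)]
# One quantization step per RD point: Delta = mu^((1 + eps) / 2).
delta_grid = [mu ** ((1 + e) / 2) for e in eps_grid]
# Rate floor of the variable-rate scheme: one ternary parsing symbol per component.
rate_floor = math.log2(3)

print(f"Delta ranges from {delta_grid[0]:.4g} down to {delta_grid[-1]:.4g}")
print(f"rate floor: {rate_floor:.3f} bits/component")
```

The steps range from roughly $\mu^{1/2}\approx 0.03$ (coarse quantization, low rate) down to $\mu=0.001$ (fine quantization, distortion closest to that of uncompressed ATC).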

Figure 4: Rate-distortion curves of the DEF-ATC approach with probabilistic uniform and top-4 probabilistic uniform quantization.

V-C Performance w.r.t. state-of-the-art baselines

Figure 5: DEF-ATC performance w.r.t. state-of-the-art baselines. Evolution of the MSD and of the average number of bits/node/component (when top-4 probabilistic uniform with the variable-rate encoding scheme is used) for different values of the step-size. (Left) $\mu=0.01$. (Right) $\mu=0.001$.

In this part, we compare the DEF-ATC approach to the following approaches: i) ChocoSGD [11], ii) DeepSqueeze [33], iii) diffusion ACTC [21], and iv) the compressed diffusion ATC approach [26] (which we refer to as “compressed ATC 2”). We assume that all agents employ the top-4 probabilistic uniform quantizer with the variable-rate encoding scheme from [25], [26, Sec. II]. In Fig. 5, we report the network MSD learning curves with the corresponding bit rates for two values of the step-size, $\mu=0.01$ (left) and $\mu=0.001$ (right). The results are averaged over 100 Monte-Carlo runs. We set the quantization parameter $\Delta=\mu$. For the ACTC [21] and compressed ATC 2 [26] approaches (which were originally designed to handle unbiased probabilistic compression), we used the step-sizes $\mu=0.0125$ and $\mu=0.00125$ in place of $\mu=0.01$ and $\mu=0.001$, respectively, so that they have the same learning rate as the uncompressed ATC approach; this ensures that all approaches are compared at the same learning rate. The other parameters of the baseline approaches and of DEF-ATC are set as follows: i) ChocoSGD: $\gamma=0.9$; ii) DeepSqueeze: (consensus parameter) $\eta=0.05$ when $\mu=0.01$ and $\eta=0.2$ when $\mu=0.001$; iii) ACTC: $\zeta=0.9$; iv) compressed ATC 2: $\gamma=0.9$; and v) DEF-ATC: $\gamma=\zeta=0.9$.
While the space of the algorithms' hyperparameters ($\gamma$, $\zeta$, etc.) is explored in the next experiment, it is worth noting that the values chosen in the current experiment ensure a stable compressed strategy with the lowest MSD level. As can be observed from the MSD learning curves, DEF-ATC tends to outperform the state-of-the-art baselines in both step-size regimes. To identify which method provides better compression efficiency for a given level of distortion, we report in Fig. 6 the RD curves of the different approaches. Before analyzing these results, we note that the bit rate curves in Fig. 5 indicate that DeepSqueeze tends to require a larger number of bits as the step-size decreases. Thus, for very fine quantization (i.e., small step-sizes, since $\Delta=\mu$), the number of bits required tends to grow, making DeepSqueeze impractical in such scenarios. This increase is expected because DeepSqueeze does not employ differential quantization: the input values at the compressor remain large (in particular, their effective range does not scale with the step-size $\mu$), so a smaller quantization step translates into a higher bit rate.
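The effect of a compressor input whose range does not shrink can be illustrated with a toy computation; it is not a model of DeepSqueeze itself. The Python sketch below evaluates the expected bits of (78) per component for a fixed, hypothetical length-10 input (Gaussian draws), top-4 selection, and $\Delta=\mu$, assuming the probabilistic uniform rule is unbiased stochastic rounding to the nearest multiples of $\Delta$. Because the ratio between the input and $\Delta$ grows as $\mu$ decreases, so does the number of index digits.

```python
import math
import random
from typing import Sequence


def expected_rate_per_component(chi: Sequence[float], c: int, delta: float) -> float:
    """Expected bits of (78) divided by M, under assumed unbiased stochastic rounding."""
    log2_3 = math.log2(3)
    top = sorted(range(len(chi)), key=lambda j: abs(chi[j]), reverse=True)[:c]
    kept = 0.0
    for j in top:
        v = chi[j] / delta
        lo = math.floor(v)
        p = v - lo  # probability of rounding up
        kept += 1.0 + (1.0 - p) * abs(lo).bit_length() + p * abs(lo + 1).bit_length()
    return log2_3 * (kept + len(chi) - c) / len(chi)


rng = random.Random(0)
# Hypothetical non-differential compressor input: its range does not shrink with mu.
z = [rng.gauss(0.0, 1.0) for _ in range(10)]
mus = (1e-2, 1e-3, 1e-4)
rates = [expected_rate_per_component(z, 4, delta=mu) for mu in mus]  # Delta = mu

for mu, r in zip(mus, rates):
    print(f"mu = {mu:g}: {r:.2f} bits/component")
```

Each tenfold reduction of $\Delta$ adds roughly $\log_2 10\approx 3.3$ binary digits to every retained index, so the rate grows roughly logarithmically in $1/\mu$; scaling the input together with $\Delta$, as differential quantization does in steady state, removes this growth.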

Figure 6: Rate-distortion curves of DEF-ATC and the state-of-the-art baseline approaches for different values of the step-size. The top-4 probabilistic uniform quantizer with the variable-rate encoding scheme is used. (Left) $\mu=0.01$. (Right) $\mu=0.001$.

For each approach, the rate-distortion curves in Fig. 6 are generated in three main steps. First, we select a set of algorithm hyperparameters ($\gamma$, $\zeta$, etc.). In particular, for ACTC [21], we select 10 values of the damping coefficient $\zeta$ uniformly spaced in the interval $[0.1,0.9]$. For compressed ATC 2 [26] (and ChocoSGD [11]), we select 10 values of the mixing parameter $\gamma$ uniformly spaced in $[0.1,0.9]$. For the DEF-ATC approach, we create two sets of 5 uniformly spaced values in $[0.1,0.9]$ for the coefficients $\zeta$ and $\gamma$, and then consider all possible pairs from these sets. In the second step, for each hyperparameter setting (i.e., each element of the sets of step 1), we generate the RD curve following the same method as in Sec. V-B: we vary the quantization step according to $\Delta=\mu^{\frac{1+\varepsilon}{2}}$, where 25 values of $\varepsilon$ linearly spaced in the interval $[10^{-3},1]$ are chosen. For each value of $\Delta$, distortion and rate are evaluated by averaging over 100 samples after convergence (expectations are computed empirically over 50 Monte Carlo runs). Each RD curve then represents the performance of the algorithm under a specific choice of hyperparameters. In the last step, we generate and report in Fig. 6 the optimal RD curve given by the convex hull of the empirical curves collected in step 2.
This process identifies the best possible performance trade-offs obtained by varying both the algorithms' hyperparameters ($\gamma$, $\zeta$, etc.) and the compression parameter $\varepsilon$. Two step-size regimes are considered, namely, $\mu=0.01$ (left plot) and $\mu=0.001$ (right plot). Across the explored hyperparameter space and both step-size regimes, the results show that DEF-ATC achieves the performance closest to that of the uncompressed approach with a relatively small number of bits (approximately 2.2 to 2.4 bits/component/iteration on average in steady state), outperforming the state-of-the-art baselines.
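The last step can be sketched as follows: pool the (rate, distortion) points of all empirical curves, compute their lower convex hull, and keep it up to the point of lowest distortion. The Python sketch below uses Andrew's monotone-chain construction; reading “convex hull” as this lower, decreasing boundary, as well as the function names and sample points, are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (rate in bits/component, distortion)


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def optimal_rd_curve(points: Sequence[Point]) -> List[Point]:
    """Lower convex hull of all (rate, distortion) points, kept up to its minimum distortion."""
    hull: List[Point] = []
    for p in sorted(set(points)):  # sweep by increasing rate (Andrew's monotone chain)
        # Drop the last hull point while it lies on or above the segment to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    # Beyond the lowest distortion, extra rate brings no benefit: truncate there.
    best = min(range(len(hull)), key=lambda j: hull[j][1])
    return hull[: best + 1]


# Hypothetical points from two hyperparameter settings (distortion as MSD in dB).
curve_a = [(1.6, -20.0), (2.0, -30.0), (3.0, -34.0)]
curve_b = [(1.8, -28.0), (2.2, -29.0), (4.0, -34.5), (5.0, -34.0)]
envelope = optimal_rd_curve(curve_a + curve_b)
```

Here the dominated point (2.2, -29.0) and the tail point (5.0, -34.0), which costs more rate for higher distortion, are discarded.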

V-D Beyond single-task estimation

To illustrate the effectiveness of the DEF-ATC approach in solving general optimization problems of the form (2), we conduct an experiment in which agents seek consensus on certain components of their estimates while seeking partial consensus on others. In particular, we assume that we have 5555 connected777A group of nodes is said to be connected if there is a path between every pair of nodes. groups of agents, namely, 𝒢1={1,,15}subscript𝒢1115\mathcal{G}_{1}=\{1,\ldots,15\}caligraphic_G start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = { 1 , … , 15 }, 𝒢2={16,,30}subscript𝒢21630\mathcal{G}_{2}=\{16,\ldots,30\}caligraphic_G start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = { 16 , … , 30 }, 𝒢3={1,,10}subscript𝒢3110\mathcal{G}_{3}=\{1,\ldots,10\}caligraphic_G start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = { 1 , … , 10 }, 𝒢4={11,,20}subscript𝒢41120\mathcal{G}_{4}=\{11,\ldots,20\}caligraphic_G start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT = { 11 , … , 20 }, and 𝒢5={21,,30}subscript𝒢52130\mathcal{G}_{5}=\{21,\ldots,30\}caligraphic_G start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT = { 21 , … , 30 }, and that the model parameter vector wksubscriptsuperscript𝑤𝑘w^{\star}_{k}italic_w start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT at agent k𝑘kitalic_k is of the form wk=wk+Δk,isubscriptsuperscript𝑤𝑘subscriptsuperscript𝑤𝑘subscriptΔ𝑘𝑖w^{\star}_{k}=w^{\bullet}_{k}+\Delta_{k,i}italic_w start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_w start_POSTSUPERSCRIPT ∙ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + roman_Δ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT, where Δk,isubscriptΔ𝑘𝑖\Delta_{k,i}roman_Δ start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT is a 10×110110\times 110 × 1 vector with each component randomly generated from the Gaussian distribution, with zero mean and variance 0.10.10.10.1. 
The vectors $\{w^{\bullet}_k\}$ are generated such that the first 5 components are common across the whole network, components 6, 7, and 8 are common within $\mathcal{G}_1$ and, separately, within $\mathcal{G}_2$, and the last two components are common within each of the groups $\mathcal{G}_3$, $\mathcal{G}_4$, and $\mathcal{G}_5$. We then choose the constraints in (2) (i.e., the matrix $\mathcal{U}$) to enforce global consensus on the first 5 components of the estimates and partial consensus on the remaining ones. The partial consensus is as follows. Agents in $\mathcal{G}_1$ should reach consensus on components 6 to 8, and agents in $\mathcal{G}_2$ should also reach consensus on components 6 to 8, independently of the first group. For the remaining two components, 9 and 10, consensus is enforced within each of the groups $\mathcal{G}_3$, $\mathcal{G}_4$, and $\mathcal{G}_5$. The matrix $\mathcal{A}$ satisfying the conditions in (4) and having the same sparsity structure as the link matrix in Fig. 2 (left) is found by following the same approach as in [40]. We assume that all agents employ the top-4 probabilistic uniform quantizer. In Fig. 7 (left) and (middle), we report the network MSD (82) and the average number of bits/node/component (83) for 3 different values of the step-size $\mu$. The results are averaged over 100 Monte-Carlo runs. We set $\gamma=\zeta=0.9$ and the quantizer parameter $\Delta=\mu$. As in Sec. V-A, we observe that, in the small step-size regime, DEF-ATC achieves the same performance (on the order of $\mu$) as the uncompressed ATC approach, and is able to maintain a finite bit rate as the step-size approaches zero. This illustrates the effectiveness of DEF-ATC in handling problem settings beyond traditional single-task estimation. In Fig. 7 (right), we report the RD curves of DEF-ATC with probabilistic uniform and top-4 probabilistic uniform quantization. These curves have been generated in the same way as those in Fig. 4. The results show that quantizing only the highest-magnitude components of a vector, rather than the entire vector, can reduce the number of bits required while maintaining a low level of distortion.
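The constraint structure described above can be made concrete with a small numerical sketch. It assumes a common construction in which $\mathcal{U}$ has orthonormal columns of the form $(\mathbb{1}_{\mathcal{G}}/\sqrt{|\mathcal{G}|})\otimes e_m$, one per group $\mathcal{G}$ and component $m$ on which consensus is enforced, acting on the stacked vector $\text{col}\{w_1,\ldots,w_K\}$. The helper `build_U` and the standard-normal draws for the shared entries of $w^{\bullet}_k$ are illustrative assumptions, not necessarily the exact procedure used to produce Fig. 7. The sketch checks that vectors generated like $\{w^{\bullet}_k\}$ lie in the range of $\mathcal{U}$, whereas the perturbed $\{w^{\star}_k\}$ do not.

```python
import numpy as np

K, M = 30, 10          # number of agents and per-agent parameter dimension
rng = np.random.default_rng(0)

# Groups from the experiment (agent labels 1..30 mapped to 0-based rows).
groups = {1: range(0, 15), 2: range(15, 30),
          3: range(0, 10), 4: range(10, 20), 5: range(20, 30)}

# (agents, components) pairs on which consensus is enforced (0-based components).
constraints = [(range(K), range(0, 5))]                        # components 1-5: global
constraints += [(groups[g], range(5, 8)) for g in (1, 2)]      # components 6-8
constraints += [(groups[g], range(8, 10)) for g in (3, 4, 5)]  # components 9-10

def build_U(constraints, K, M):
    """Hypothetical construction: one column (1_G / sqrt|G|) kron e_m per
    (group G, component m) pair, acting on the stacked vector col{w_k}."""
    cols = []
    for agents, comps in constraints:
        ind = np.zeros(K)
        ind[list(agents)] = 1.0 / np.sqrt(len(agents))
        for m in comps:
            e_m = np.zeros(M)
            e_m[m] = 1.0
            cols.append(np.kron(ind, e_m))
    return np.column_stack(cols)

U = build_U(constraints, K, M)       # shape (K*M, 17): 5 + 2*3 + 3*2 columns
P_U = U @ U.T                        # orthogonal projector onto range(U)

# Assumed generation of w^bullet: shared standard-normal values per block.
w_bullet = np.zeros((K, M))
w_bullet[:, 0:5] = rng.standard_normal(5)
for g in (1, 2):
    w_bullet[list(groups[g]), 5:8] = rng.standard_normal(3)
for g in (3, 4, 5):
    w_bullet[list(groups[g]), 8:10] = rng.standard_normal(2)

# w^star_k = w^bullet_k + Delta_k, with N(0, 0.1) entries (std = sqrt(0.1)).
w_star = w_bullet + rng.normal(0.0, np.sqrt(0.1), size=(K, M))

wb_vec = w_bullet.reshape(-1)        # col{w_1, ..., w_K}, agent-major ordering
ws_vec = w_star.reshape(-1)
print(U.shape)                                    # (300, 17)
print(np.allclose(P_U @ wb_vec, wb_vec))          # True: w^bullet satisfies the constraints
print(np.linalg.norm(ws_vec - P_U @ ws_vec) > 0)  # True: the perturbed w^star does not
```

Because the groups used for a given component are disjoint, the columns are orthonormal and $\mathcal{U}\mathcal{U}^\top$ is the orthogonal projector onto the constraint subspace.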

Figure 7: Beyond single-task estimation. (Left) Network MSD learning curves w.r.t. $w^{o}_k$ in (2) for three different values of the step-size ($\mu_0=0.001$). (Middle) Evolution of the average number of bits/node/component when the variable-rate scheme is used. (Right) Rate-distortion curves of DEF-ATC (when $\mu=\mu_0$) with probabilistic uniform and top-4 probabilistic uniform quantization.

VI Conclusion

In this work, we presented an approach for solving decentralized learning problems where agents have individual risks to minimize subject to subspace constraints that require the minimizers across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus or single-task optimization as a special case, and allows for more general task relatedness models such as multitask smoothness and coupled optimization. To reduce the communication cost among agents, we incorporated compression into the decentralized approach by employing differential quantization at the agent level to compress the iterates before communicating them to neighbors. In addition, we equipped the learning approach with an error-feedback mechanism, which feeds the compression error back into subsequent steps. We then showed that, under some general conditions on the compression noise, and for sufficiently small step-sizes $\mu$, the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate. The results showed that, in the small step-size regime, the iterates generated by the decentralized communication-efficient approach achieve the same performance as the full-precision decentralized baseline, in which no communication compression is performed. Simulations illustrated the theoretical findings and the effectiveness of the approach.

Appendix A Proof of Lemma 1

Let $\mathcal{I}'_c$ denote the complement of the set $\mathcal{I}_c$. Since all the components $\{x_j, j\in\mathcal{I}_c\}$ are greater than or equal in magnitude to the components $\{x_j, j\in\mathcal{I}'_c\}$, the average of the $c$ largest squared entries is at least the average of all $L$ squared entries, and we can write:

$$\frac{1}{c}\sum_{j\in\mathcal{I}_c}x_j^2\geq\frac{1}{L}\sum_{j=1}^{L}x_j^2, \tag{84}$$

from which we obtain the following useful inequality:

$$\sum_{j\in\mathcal{I}_c}x_j^2\geq\frac{c}{L}\|x\|^2. \tag{85}$$
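Inequality (85) states that the $c$ largest-magnitude entries of $x$ carry at least a fraction $c/L$ of its energy, with equality when all magnitudes coincide. A quick numerical sanity check, with arbitrarily chosen $L=10$, $c=4$ and a hypothetical helper `top_c_energy_fraction`:

```python
import numpy as np

def top_c_energy_fraction(x, c):
    """Fraction of ||x||^2 carried by the c largest-magnitude entries (the set I_c)."""
    idx = np.argsort(np.abs(x))[::-1][:c]
    return np.sum(x[idx] ** 2) / np.sum(x ** 2)

rng = np.random.default_rng(1)
L, c = 10, 4
# Random vectors with heterogeneous magnitudes.
fractions = [top_c_energy_fraction(rng.standard_normal(L) * rng.exponential(1.0, L), c)
             for _ in range(10_000)]
min_fraction = min(fractions)
equal_case = top_c_energy_fraction(np.ones(L), c)   # all magnitudes equal

print(min_fraction >= c / L)   # True, as guaranteed by (85)
print(equal_case)              # 0.4 = c/L: the bound is attained
```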

Now, to show that the top-$c$ quantizer is a bounded-distortion compression operator with parameters given by (12), we manipulate the compression error as follows:

$$
\begin{aligned}
\mathbb{E}\|x-\boldsymbol{\cal{C}}(x)\|^2=\mathbb{E}\|x-\alpha\boldsymbol{\cal{Q}}(\mathcal{S}(x))\|^2
&=\mathbb{E}\|x-\alpha\mathcal{S}(x)+\alpha(\mathcal{S}(x)-\boldsymbol{\cal{Q}}(\mathcal{S}(x)))\|^2\\
&\overset{\text{(a)}}{=}\|x-\alpha\mathcal{S}(x)\|^2+\alpha^2\mathbb{E}\|\mathcal{S}(x)-\boldsymbol{\cal{Q}}(\mathcal{S}(x))\|^2\\
&\leq\|x-\alpha\mathcal{S}(x)\|^2+\alpha^2\beta_q^2\|\mathcal{S}(x)\|^2+\alpha^2\sigma_q^2\\
&=(1-\alpha)^2\sum_{j\in\mathcal{I}_c}x_j^2+\sum_{j\in\mathcal{I}'_c}x_j^2+\alpha^2\beta_q^2\|\mathcal{S}(x)\|^2+\alpha^2\sigma_q^2\\
&=\left((1-\alpha)^2+\alpha^2\beta_q^2\right)\sum_{j\in\mathcal{I}_c}x_j^2+\sum_{j\in\mathcal{I}'_c}x_j^2+\alpha^2\sigma_q^2\\
&=\|x\|^2-\left(1-\left((1-\alpha)^2+\alpha^2\beta_q^2\right)\right)\sum_{j\in\mathcal{I}_c}x_j^2+\alpha^2\sigma_q^2\\
&\overset{(85)}{\leq}\left(1-\left(1-\left((1-\alpha)^2+\alpha^2\beta_q^2\right)\right)\frac{c}{L}\right)\|x\|^2+\alpha^2\sigma_q^2,
\end{aligned}
\tag{86}
$$

where in (a) we used the fact that $\alpha\langle x-\alpha\mathcal{S}(x),\mathbb{E}[\mathcal{S}(x)-\boldsymbol{\cal{Q}}(\mathcal{S}(x))]\rangle=0$ in both cases, i.e., for the biased ($\alpha=1$) and the unbiased ($\alpha=\frac{1}{1+\beta_q^2}$) quantizer $\boldsymbol{\cal{Q}}(\cdot)$. To obtain the fifth line, we also used $\|\mathcal{S}(x)\|^2=\sum_{j\in\mathcal{I}_c}x_j^2$, since $\mathcal{S}(x)$ retains only the entries of $x$ indexed by $\mathcal{I}_c$.
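To make the bound (86) concrete, the sketch below instantiates the top-$c$ quantizer $\boldsymbol{\cal{C}}(x)=\alpha\boldsymbol{\cal{Q}}(\mathcal{S}(x))$ with a probabilistic (stochastic-rounding) uniform quantizer of step $\Delta$, as in the top-4 experiments. This quantizer is unbiased with per-entry variance at most $\Delta^2/4$, so the sketch assumes $\beta_q^2=0$ and $\sigma_q^2=L\Delta^2/4$; this is a valid choice of parameters, but not necessarily the values given in (12). With $\beta_q^2=0$, both settings give $\alpha=1$. The function names and numerical values are illustrative. The code computes the exact expected distortion, compares it with a Monte-Carlo estimate, and checks it against the right-hand side of (86).

```python
import numpy as np

def top_c(x, c):
    """Sparsifier S(x): keep the c largest-magnitude entries of x, zero the others."""
    s = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[::-1][:c]
    s[idx] = x[idx]
    return s

def prob_uniform(y, delta, rng):
    """Probabilistic uniform quantizer: round each entry to k*delta or (k+1)*delta,
    going up with probability equal to the fractional offset p. Unbiased, with
    per-entry variance delta^2 * p * (1 - p) <= delta^2 / 4; zero maps to zero."""
    k = np.floor(y / delta)
    p = y / delta - k
    up = rng.random(y.shape) < p
    return (k + up) * delta

def compress(x, c, delta, alpha, rng):
    """Top-c quantizer C(x) = alpha * Q(S(x))."""
    return alpha * prob_uniform(top_c(x, c), delta, rng)

def expected_distortion(x, c, delta):
    """Exact E||x - Q(S(x))||^2 for alpha = 1: discarded energy + rounding variance."""
    s = top_c(x, c)
    p = s / delta - np.floor(s / delta)
    return np.sum((x - s) ** 2) + np.sum(delta**2 * p * (1 - p))

rng = np.random.default_rng(2)
L, c, delta, alpha = 10, 4, 0.05, 1.0
beta2, sigma2 = 0.0, L * delta**2 / 4          # assumed (beta_q^2, sigma_q^2)
x = rng.standard_normal(L)

exact = expected_distortion(x, c, delta)
mc = np.mean([np.sum((x - compress(x, c, delta, alpha, rng)) ** 2)
              for _ in range(20_000)])
bound = (1 - (1 - ((1 - alpha)**2 + alpha**2 * beta2)) * c / L) * np.dot(x, x) \
        + alpha**2 * sigma2                      # right-hand side of (86)
print(exact <= bound, abs(mc - exact) < 1e-3)    # True True
```

The closed form in `expected_distortion` mirrors steps (a) and the following equality in (86): the discarded energy $\sum_{j\in\mathcal{I}'_c}x_j^2$ plus the quantization variance on the retained entries.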

Appendix B Mean-square-error analysis

We consider the transformed iterates $\overline{\boldsymbol{\phi}}^z_i$ and $\widecheck{\boldsymbol{\phi}}^z_i$ in (65) and (66), respectively. Computing the second-order moment of both sides of (65), we get:

$$\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_i\|^2=\mathbb{E}\|(I_P-\mu\boldsymbol{\cal{D}}_{11,i-1})\overline{\boldsymbol{\phi}}^z_{i-1}-\mu\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}-\mu\boldsymbol{\cal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}-\mu\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}\|^2+\mu^2\mathbb{E}\|\overline{\boldsymbol{s}}_i\|^2, \tag{87}$$

where, from Assumption 2 on the gradient noise processes, we used the fact that:

$$\mathbb{E}[\boldsymbol{x}_{i-1}^\top\overline{\boldsymbol{s}}_i]=\mathbb{E}\left[\mathbb{E}\left[\boldsymbol{x}_{i-1}^\top\overline{\boldsymbol{s}}_i\,\Big|\,\{\boldsymbol{\phi}_{\ell,i-1},\boldsymbol{z}_{\ell,i-1}\}_{\ell=1}^K\right]\right]=\mathbb{E}\left[\boldsymbol{x}_{i-1}^\top\,\mathbb{E}\left[\overline{\boldsymbol{s}}_i\,\Big|\,\{\boldsymbol{\phi}_{\ell,i-1},\boldsymbol{z}_{\ell,i-1}\}_{\ell=1}^K\right]\right]=0 \tag{88}$$

with $\boldsymbol{x}_{i-1}=(I_P-\mu\boldsymbol{\cal{D}}_{11,i-1})\overline{\boldsymbol{\phi}}^z_{i-1}-\mu\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}-\mu\boldsymbol{\cal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}-\mu\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}$. Using similar arguments, we can also show that:

$$
\begin{aligned}
\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_i\|^2=\;&\mathbb{E}\|(\mathcal{J}''_\epsilon-\mu\boldsymbol{\cal{D}}_{22,i-1})\widecheck{\boldsymbol{\phi}}^z_{i-1}-\mu\boldsymbol{\cal{D}}_{21,i-1}\overline{\boldsymbol{\phi}}^z_{i-1}+\mu\widecheck{b}-\mu\boldsymbol{\cal{D}}_{21,i-1}\overline{\boldsymbol{z}}_{i-1}-\left(\zeta(I-\mathcal{J}'_\epsilon)+\mu\boldsymbol{\cal{D}}_{22,i-1}\right)\widecheck{\boldsymbol{z}}_{i-1}\|^2\\
&+\mu^2\mathbb{E}\|\widecheck{\boldsymbol{s}}_i\|^2.
\end{aligned}
\tag{89}
$$
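The bound (90) and the companion bound on $\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_i\|^2$, both adapted from [40, Appendix D], start from an elementary consequence of the convexity of $\|\cdot\|^2$ (Jensen's inequality). The derivation below sketches how the first line of (90) follows from (87). It assumes, as the form of the coefficients suggests but this section does not state explicitly, that $\|I_P-\mu\boldsymbol{\cal{D}}_{11,i-1}\|\leq 1-\mu\sigma_{11}$ with $0<\mu\sigma_{11}<1$.

```latex
\begin{aligned}
\|a+b\|^2 &= \Big\|t\cdot\tfrac{a}{t}+(1-t)\cdot\tfrac{b}{1-t}\Big\|^2
  \le \tfrac{1}{t}\|a\|^2+\tfrac{1}{1-t}\|b\|^2, \qquad t\in(0,1),\\[4pt]
a &= (I_P-\mu\boldsymbol{\cal{D}}_{11,i-1})\overline{\boldsymbol{\phi}}^z_{i-1},\quad
b = -\mu\big(\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}+\boldsymbol{\cal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}+\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}\big),\quad t=1-\mu\sigma_{11},\\[4pt]
\|a+b\|^2 &\le \frac{(1-\mu\sigma_{11})^2}{1-\mu\sigma_{11}}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2
  +\frac{\mu^2}{\mu\sigma_{11}}\big\|\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}+\boldsymbol{\cal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}+\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}\big\|^2\\
&= (1-\mu\sigma_{11})\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2
  +\frac{\mu}{\sigma_{11}}\big\|\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}+\boldsymbol{\cal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}+\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}\big\|^2 .
\end{aligned}
```

Here $a+b=\boldsymbol{x}_{i-1}$, so taking expectations and keeping the noise term $\mu^2\mathbb{E}\|\overline{\boldsymbol{s}}_i\|^2$ from (87) gives the first line of (90). The second line then uses $\|u_1+u_2+u_3\|^2\leq 3(\|u_1\|^2+\|u_2\|^2+\|u_3\|^2)$ together with bounds on the norms of the blocks $\boldsymbol{\cal{D}}$. The same splitting with $t=\|\mathcal{J}''_\epsilon\|$, followed by $\|u+v\|^2\leq 2\|u\|^2+2\|v\|^2$ and the analogous bound for a sum of five terms, accounts for the factors 2 and $10=2\cdot 5$ in the bound on $\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_i\|^2$.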

Using arguments similar to those used to establish inequalities (119) and (124) in [40, Appendix D], we can show that:

$$
\begin{aligned}
\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_i\|^2&\leq(1-\mu\sigma_{11})\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2+\frac{\mu}{\sigma_{11}}\mathbb{E}\|\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}+\boldsymbol{\cal{D}}_{11,i-1}\overline{\boldsymbol{z}}_{i-1}+\boldsymbol{\cal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}\|^2+\mu^2\mathbb{E}\|\overline{\boldsymbol{s}}_i\|^2\\
&\leq(1-\mu\sigma_{11})\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2+\frac{3\mu\sigma_{12}^2}{\sigma_{11}}\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2+3\mu\sigma_{11}\mathbb{E}\|\overline{\boldsymbol{z}}_{i-1}\|^2+\frac{3\mu\sigma_{12}^2}{\sigma_{11}}\mathbb{E}\|\widecheck{\boldsymbol{z}}_{i-1}\|^2+\mu^2\mathbb{E}\|\overline{\boldsymbol{s}}_i\|^2\\
&\leq(1-\mu\sigma_{11})\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2+\frac{3\mu\sigma_{12}^2}{\sigma_{11}}\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2+\left(3\mu\sigma_{11}+\frac{3\mu\sigma_{12}^2}{\sigma_{11}}\right)\mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_{i-1}\|^2+\mu^2\mathbb{E}\|\overline{\boldsymbol{s}}_i\|^2,
\end{aligned}
\tag{90}
$$

and

$$
\begin{aligned}
\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_i\|^2&\leq\|\mathcal{J}''_\epsilon\|\,\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2+\frac{2\mu^2}{1-\|\mathcal{J}''_\epsilon\|}\mathbb{E}\|\boldsymbol{\cal{D}}_{22,i-1}\widecheck{\boldsymbol{\phi}}^z_{i-1}+\boldsymbol{\cal{D}}_{21,i-1}\overline{\boldsymbol{\phi}}^z_{i-1}-\widecheck{b}+\boldsymbol{\cal{D}}_{21,i-1}\overline{\boldsymbol{z}}_{i-1}+\boldsymbol{\cal{D}}_{22,i-1}\widecheck{\boldsymbol{z}}_{i-1}\|^2\\
&\qquad+\frac{2\zeta^2\|I-\mathcal{J}'_\epsilon\|^2}{1-\|\mathcal{J}''_\epsilon\|}\mathbb{E}\|\widecheck{\boldsymbol{z}}_{i-1}\|^2+\mu^2\mathbb{E}\|\widecheck{\boldsymbol{s}}_i\|^2\\
&\leq\left(\|\mathcal{J}''_\epsilon\|+\frac{10\mu^2\sigma_{22}^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2+\left(\frac{10\mu^2\sigma_{21}^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2+\left(\frac{10\mu^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\|\widecheck{b}\|^2\\
&\qquad+\left(\frac{10\mu^2\sigma_{21}^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\overline{\boldsymbol{z}}_{i-1}\|^2+\left(\frac{2\zeta^2\|I-\mathcal{J}'_\epsilon\|^2}{1-\|\mathcal{J}''_\epsilon\|}+\frac{10\mu^2\sigma_{22}^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\widecheck{\boldsymbol{z}}_{i-1}\|^2+\mu^2\mathbb{E}\|\widecheck{\boldsymbol{s}}_i\|^2\\
&\leq\left(\|\mathcal{J}''_\epsilon\|+\frac{10\mu^2\sigma_{22}^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2+\left(\frac{10\mu^2\sigma_{21}^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2+\left(\frac{10\mu^2}{1-\|\mathcal{J}''_\epsilon\|}\right)\|\widecheck{b}\|^2\\
&\qquad+\left(\frac{2\zeta^2\|I-\mathcal{J}'_\epsilon\|^2}{1-\|\mathcal{J}''_\epsilon\|}+\frac{10\mu^2(\sigma_{22}^2+\sigma_{21}^2)}{1-\|\mathcal{J}''_\epsilon\|}\right)\mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_{i-1}\|^2+\mu^2\mathbb{E}\|\widecheck{\boldsymbol{s}}_i\|^2,
\end{aligned}
$$
start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT + bold_caligraphic_D start_POSTSUBSCRIPT 21 , italic_i - 1 end_POSTSUBSCRIPT over¯ start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT - overwidecheck start_ARG italic_b end_ARG + bold_caligraphic_D start_POSTSUBSCRIPT 21 , italic_i - 1 end_POSTSUBSCRIPT over¯ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT + bold_caligraphic_D start_POSTSUBSCRIPT 22 , italic_i - 1 end_POSTSUBSCRIPT overwidecheck start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL divide start_ARG 2 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG blackboard_E ∥ overwidecheck start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E ∥ overwidecheck start_ARG bold_italic_s end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ≤ ( ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ + divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ 
end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ overwidecheck start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ over¯ start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) ∥ overwidecheck start_ARG italic_b end_ARG ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ( divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ over¯ start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG 2 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ 
end_POSTSUBSCRIPT ∥ end_ARG + divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ overwidecheck start_ARG bold_italic_z end_ARG start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E ∥ overwidecheck start_ARG bold_italic_s end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ≤ ( ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ + divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ overwidecheck start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ over¯ start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 
end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) ∥ overwidecheck start_ARG italic_b end_ARG ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ( divide start_ARG 2 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG + divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_σ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_σ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG ) blackboard_E ∥ caligraphic_V start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_z start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E ∥ overwidecheck start_ARG bold_italic_s end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , end_CELL end_ROW (91)

for some positive constant $\sigma_{11}$ and non-negative constants $\sigma_{12}$, $\sigma_{21}$, and $\sigma_{22}$ independent of $\mu$, and where we used the fact that the $2$-induced matrix norm of the block diagonal matrix $\mathcal{J}''_{\epsilon}$ in (71) satisfies $\|\mathcal{J}''_{\epsilon}\|\in(0,1)$. Indeed, from (44), $\mathcal{J}''_{\epsilon}$ can be rewritten in the following form:

$$
\mathcal{J}''_{\epsilon}=(1-\gamma\zeta)I_{M-P}+\gamma\zeta\,\mathcal{J}_{\epsilon}.
\tag{92}
$$

Following arguments similar to those in [6, pp. 516–517], we can first show that $\mathcal{J}''_{\epsilon}$ in (92) satisfies:

$$
\|\mathcal{J}''_{\epsilon}\|^{2}\leq\big(\rho(\mathcal{J}''_{\epsilon})+\gamma\zeta\epsilon\big)^{2}.
\tag{93}
$$

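From (92), the eigenvalues of $\mathcal{J}''_{\epsilon}$ are $(1-\gamma\zeta)+\gamma\zeta\lambda$, where $\lambda$ ranges over the eigenvalues of $\mathcal{J}_{\epsilon}$. Since $1-\gamma\zeta\geq 0$, the triangle inequality gives: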

$$
\rho(\mathcal{J}''_{\epsilon})\leq(1-\gamma\zeta)+\gamma\zeta\,\rho(\mathcal{J}_{\epsilon}).
\tag{94}
$$

Using the fact that $\rho(\mathcal{J}_{\epsilon})\in(0,1)$ [40, Lemma 2] and that $\gamma\zeta\in(0,1]$, we obtain $\rho(\mathcal{J}''_{\epsilon})\in(0,1)$. Moreover, since $\rho(\mathcal{J}''_{\epsilon})+\gamma\zeta\epsilon$ is non-negative, we can take square roots in (93) and substitute (94) to obtain:

$$
\begin{split}
\|\mathcal{J}''_{\epsilon}\|&\leq(1-\gamma\zeta)+\gamma\zeta\,\rho(\mathcal{J}_{\epsilon})+\gamma\zeta\epsilon\\
&=1-\gamma\zeta\big(1-\rho(\mathcal{J}_{\epsilon})-\epsilon\big).
\end{split}
\tag{95}
$$

This bound, which is strictly smaller than $1$ whenever $\epsilon<1-\rho(\mathcal{J}_{\epsilon})$, will be used in the subsequent analysis. Returning to (91), as can be seen from (59), $\widecheck{b}=\mathcal{V}_{L,\epsilon}^{\top}b$ depends on the vector $b$ in (27), which is defined in terms of the gradients $\{\nabla_{w_k}J_k(w^o_k)\}$. Since the costs $J_k(w_k)$ are twice differentiable, these gradients are finite, so $\|b\|^{2}$ is bounded and $\|\widecheck{b}\|^{2}=O(1)$.
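As a numerical sanity check of (95) (not a proof), the sketch below builds a toy $\mathcal{J}_{\epsilon}$ made of a single real Jordan-type block with eigenvalue $\lambda$ and $\epsilon$ on the superdiagonal, forms $\mathcal{J}''_{\epsilon}$ as in (92), and compares its spectral norm with the right-hand side of (95). The single real block, the helper names `eps_jordan_block` and `check_bound_95`, and the parameter values are illustrative assumptions; the actual $\mathcal{J}_{\epsilon}$ arising from (44) may contain several blocks and complex eigenvalues.

```python
import numpy as np


def eps_jordan_block(lam: float, n: int, eps: float) -> np.ndarray:
    """Hypothetical toy J_eps: n x n block with eigenvalue `lam` on the
    diagonal and `eps` (instead of 1) on the superdiagonal."""
    return lam * np.eye(n) + eps * np.eye(n, k=1)


def check_bound_95(lam: float, n: int, eps: float, gamma_zeta: float):
    """Return (||J''_eps||_2, right-hand side of (95)) for the toy block."""
    J = eps_jordan_block(lam, n, eps)
    # Form (92): J'' = (1 - gamma*zeta) I + gamma*zeta * J_eps.
    J2 = (1.0 - gamma_zeta) * np.eye(n) + gamma_zeta * J
    # J is upper triangular, so its spectral radius is max |diagonal entry|.
    rho_J = np.max(np.abs(np.diag(J)))
    lhs = np.linalg.norm(J2, 2)  # 2-induced (spectral) norm
    rhs = 1.0 - gamma_zeta * (1.0 - rho_J - eps)
    return lhs, rhs


# Parameter triples (lam, eps, gamma*zeta) chosen so that rho + eps < 1.
for lam, eps, gz in [(0.9, 0.05, 0.5), (-0.8, 0.1, 1.0), (0.3, 0.2, 0.1)]:
    lhs, rhs = check_bound_95(lam, 4, eps, gz)
    print(f"lam={lam:+.2f} eps={eps:.2f} gz={gz:.1f}: "
          f"||J''|| = {lhs:.4f} <= {rhs:.4f}")
```

For this toy block the bound can also be read off directly: $\mathcal{J}''_{\epsilon}=(1-\gamma\zeta+\gamma\zeta\lambda)I+\gamma\zeta\epsilon N$, where $N$ is the shift matrix with $\|N\|=1$, so the triangle inequality yields (95) whenever $\gamma\zeta\in(0,1]$.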

For the gradient noise terms, following arguments similar to those in [6, Chapter 9] and [40, Appendix D] and using Assumption 2, we can show that:

$$
\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}=\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{s}_{i}\|^{2}\leq v_{1}^{2}\beta_{s,\max}^{2}\,\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i-1}\|^{2}+v_{1}^{2}\overline{\sigma}^{2}_{s},
\tag{96}
$$

where $v_{1}=\|\mathcal{V}_{\epsilon}^{-1}\|$, $\beta_{s,\max}^{2}=\max_{1\leq k\leq K}\beta_{s,k}^{2}$, and $\overline{\sigma}^{2}_{s}=\sum_{k=1}^{K}\sigma^{2}_{s,k}$. Using expression (33), the fact that $\widetilde{\boldsymbol{\phi}}_{i-1}=\widetilde{\boldsymbol{\phi}}^{z}_{i-1}+\boldsymbol{z}_{i-1}$, and the Jordan decomposition of the matrix $\mathcal{A}'$ in (44), we obtain:

$$
\begin{aligned}
\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}
&\leq v_{1}^{2}\beta_{s,\max}^{2}\,\mathbb{E}\|\mathcal{A}'(\widetilde{\boldsymbol{\phi}}^{z}_{i-1}+\boldsymbol{z}_{i-1})\|^{2}+v_{1}^{2}\overline{\sigma}^{2}_{s}\\
&\leq v_{1}^{2}\beta_{s,\max}^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}\Lambda'(\mathcal{V}_{\epsilon}^{-1}\widetilde{\boldsymbol{\phi}}^{z}_{i-1}+\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1})\|^{2}+v_{1}^{2}\overline{\sigma}^{2}_{s}\\
&\overset{\text{(a)}}{\leq} v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\widetilde{\boldsymbol{\phi}}^{z}_{i-1}+\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}+v_{1}^{2}\overline{\sigma}^{2}_{s}\\
&\leq 2v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\big(\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\big)+2v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}+v_{1}^{2}\overline{\sigma}^{2}_{s},
\end{aligned}
\tag{97}
$$

where $v_{2}=\|\mathcal{V}_{\epsilon}\|$. In step (a) we used the sub-multiplicative property of norms and the fact that the $2$-induced matrix norm of the block diagonal matrix $\Lambda'$ in (44) is equal to $1$; in the last step we used $\|x+y\|^{2}\leq 2\|x\|^{2}+2\|y\|^{2}$ together with $\|\mathcal{V}_{\epsilon}^{-1}\widetilde{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}=\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}$. Substituting the bound (97) into (90) and (91), we obtain:

$$
\begin{split}
\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i}\|^{2}\leq&\left(1-\mu\sigma_{11}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\left(\frac{3\mu\sigma^{2}_{12}}{\sigma_{11}}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\
&+\left(3\mu\sigma_{11}+\frac{3\mu\sigma^{2}_{12}}{\sigma_{11}}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}+\mu^{2}v_{1}^{2}\overline{\sigma}^{2}_{s},
\end{split}
\tag{98}
$$

and

$$
\begin{split}
\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i}\|^{2}\leq{}&\left(\|\mathcal{J}^{\prime\prime}_{\epsilon}\|+\frac{10\mu^{2}\sigma_{22}^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\
&+\left(\frac{10\mu^{2}\sigma_{21}^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\
&+\left(\frac{2\zeta^{2}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+\frac{10\mu^{2}(\sigma_{22}^{2}+\sigma_{21}^{2})}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}\\
&+\left(\frac{10\mu^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}\right)\|\widecheck{b}\|^{2}+\mu^{2}v_{1}^{2}\overline{\sigma}^{2}_{s}.
\end{split}
\tag{99}
$$
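The coefficients in (99) share the structure $\|\mathcal{J}^{\prime\prime}_{\epsilon}\|+c/(1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|)$. This is consistent with the convexity (Jensen) bound $\|a+b\|^2\le t^{-1}\|a\|^2+(1-t)^{-1}\|b\|^2$ for $t\in(0,1)$. Applying it with $a=\mathcal{J}^{\prime\prime}_{\epsilon}x$ and $t=\|\mathcal{J}^{\prime\prime}_{\epsilon}\|$ gives $\|\mathcal{J}^{\prime\prime}_{\epsilon}x+b\|^2\le\|\mathcal{J}^{\prime\prime}_{\epsilon}\|\,\|x\|^2+\|b\|^2/(1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|)$. The minimal NumPy sketch below checks this inequality on random data. The matrix `J2` is a hypothetical stand-in for $\mathcal{J}^{\prime\prime}_{\epsilon}$ (any matrix with spectral norm in $(0,1)$), not the matrix defined in the paper.

```python
import numpy as np

def jensen_rhs(J2, x, b):
    """Right-hand side t*||x||^2 + ||b||^2/(1-t), with t the spectral norm of J2."""
    t = np.linalg.norm(J2, 2)
    assert 0.0 < t < 1.0, "the bound requires 0 < ||J2|| < 1"
    return t * np.dot(x, x) + np.dot(b, b) / (1.0 - t)

def check_bound(dim=6, trials=1000, seed=0):
    """Return the smallest gap rhs - lhs over random trials (>= 0 if the bound holds)."""
    rng = np.random.default_rng(seed)
    worst_gap = np.inf
    for _ in range(trials):
        # Hypothetical stand-in for J''_eps: random matrix rescaled to spectral norm 0.8.
        J2 = rng.standard_normal((dim, dim))
        J2 *= 0.8 / np.linalg.norm(J2, 2)
        x = rng.standard_normal(dim)
        b = rng.standard_normal(dim)
        lhs = np.sum((J2 @ x + b) ** 2)
        worst_gap = min(worst_gap, jensen_rhs(J2, x, b) - lhs)
    return worst_gap

print(check_bound() >= -1e-12)
```

The factor $1/(1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|)$ grows as the spectral norm approaches one. This is why such bounds are informative only when the transformed block is strictly stable.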

Now, for the quantization noise vector $\boldsymbol{z}_{i}$ in (40), we have:

$$
\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}\leq\zeta^{2}v_{1}^{2}\left(\sum_{k=1}^{K}\mathbb{E}\|\boldsymbol{z}_{k,i}\|^{2}\right).
\tag{100}
$$

From (19) and Assumption 3, and since $\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}=\widetilde{\boldsymbol{\phi}}_{k,i-1}-\widetilde{\boldsymbol{\psi}}_{k,i}$, we can write:

$$
\mathbb{E}\|\boldsymbol{z}_{k,i}\|^{2}\leq\beta^{2}_{c,k}\,\mathbb{E}\|\widetilde{\boldsymbol{\phi}}_{k,i-1}-\widetilde{\boldsymbol{\psi}}_{k,i}+\boldsymbol{z}_{k,i-1}\|^{2}+\sigma^{2}_{c,k},
\tag{101}
$$

and, therefore,

$$
\sum_{k=1}^{K}\mathbb{E}\|\boldsymbol{z}_{k,i}\|^{2}\leq\beta_{c,\max}^{2}\,\mathbb{E}\|\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1}\|^{2}+\overline{\sigma}^{2}_{c},
\tag{102}
$$

where $\beta_{c,\max}^{2}=\max_{1\leq k\leq K}\{\beta^{2}_{c,k}\}$ and $\overline{\sigma}^{2}_{c}=\sum_{k=1}^{K}\sigma^{2}_{c,k}$. Since the analysis is facilitated by expressing the network vectors in the basis given by the Jordan decomposition of the matrix $\mathcal{A}^{\prime}$, we bound the term $\sum_{k=1}^{K}\mathbb{E}\|\boldsymbol{z}_{k,i}\|^{2}$ as follows:

$$
\begin{split}
\sum_{k=1}^{K}\mathbb{E}\|\boldsymbol{z}_{k,i}\|^{2}&\overset{(102)}{\leq}\beta_{c,\max}^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}\mathcal{V}_{\epsilon}^{-1}(\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1})\|^{2}+\overline{\sigma}^{2}_{c}\\
&\leq\beta_{c,\max}^{2}\|\mathcal{V}_{\epsilon}\|^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}(\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1})\|^{2}+\overline{\sigma}^{2}_{c}\\
&=v_{2}^{2}\beta_{c,\max}^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}(\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1})\|^{2}+\overline{\sigma}^{2}_{c}\\
&=v_{2}^{2}\beta_{c,\max}^{2}\left(\mathbb{E}\|\overline{\boldsymbol{\chi}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{\chi}}_{i}\|^{2}\right)+\overline{\sigma}^{2}_{c},
\end{split}
\tag{103}
$$

where

$$
\overline{\boldsymbol{\chi}}_{i}\triangleq\mathcal{U}^{\top}(\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1}),
\tag{104}
$$

$$
\widecheck{\boldsymbol{\chi}}_{i}\triangleq\mathcal{V}_{L,\epsilon}^{\top}(\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1}).
\tag{105}
$$
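The last two steps of (103) rely on $\|\mathcal{V}_{\epsilon}\|=v_2$. They also rely on the splitting $\|\mathcal{V}_{\epsilon}^{-1}x\|^2=\|\mathcal{U}^{\top}x\|^2+\|\mathcal{V}_{L,\epsilon}^{\top}x\|^2$ implied by (104)–(105). That splitting holds when the first $P$ rows of $\mathcal{V}_{\epsilon}^{-1}$ are $\mathcal{U}^{\top}$ and the remaining rows are $\mathcal{V}_{L,\epsilon}^{\top}$, which is the partition assumed in this sketch. The NumPy code checks both steps numerically. The matrix `V` is a random invertible stand-in for $\mathcal{V}_{\epsilon}$, and the sizes `M` and `P` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M, P = 8, 3  # arbitrary sizes chosen for the illustration

# Hypothetical stand-in for V_eps: a random (almost surely invertible) matrix.
V = rng.standard_normal((M, M))
V_inv = np.linalg.inv(V)
Ut = V_inv[:P, :]     # plays the role of U^T          (first P rows of V^{-1})
VLt = V_inv[P:, :]    # plays the role of V_{L,eps}^T  (remaining rows)
v2 = np.linalg.norm(V, 2)  # spectral norm, i.e., v_2 = ||V_eps||

x = rng.standard_normal(M)  # stands for phi~_{i-1} - psi~_i + z_{i-1}
chi_bar = Ut @ x
chi_check = VLt @ x

# Step 1 of (103): ||x||^2 = ||V V^{-1} x||^2 <= ||V||^2 ||V^{-1} x||^2
lhs = x @ x
rhs = v2**2 * np.sum((V_inv @ x) ** 2)

# Last step of (103): ||V^{-1} x||^2 splits into the two transformed blocks
full = np.sum((V_inv @ x) ** 2)
split = chi_bar @ chi_bar + chi_check @ chi_check

print(lhs <= rhs, np.isclose(full, split))
```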

Therefore, by combining (100) and (103), we obtain:

$$
\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}\leq\zeta^{2}v_{1}^{2}v_{2}^{2}\beta_{c,\max}^{2}\left[\mathbb{E}\|\overline{\boldsymbol{\chi}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{\chi}}_{i}\|^{2}\right]+\zeta^{2}v_{1}^{2}\overline{\sigma}^{2}_{c}.
\tag{106}
$$
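The bound (106) ultimately rests on the compressor condition used in (101). That condition is a mean-square compression error of the form $\beta_{c,k}^2\|x\|^2+\sigma_{c,k}^2$, where, with error feedback, the compressed input $x$ is the innovation $\boldsymbol{\psi}_{k,i}-\boldsymbol{\phi}_{k,i-1}$ plus the previous error $\boldsymbol{z}_{k,i-1}$. The sketch below illustrates this with a probabilistic uniform quantizer of step `delta`, which is our choice for the illustration and not necessarily a compressor used in the paper. This quantizer is unbiased and satisfies such a condition with $\beta_{c,k}=0$ and $\sigma_{c,k}^2=d\,\Delta^2/4$ for $d$-dimensional inputs. The sign convention $z=x-Q(x)$ for the stored error and the toy innovation sequence are assumptions made only for this example.

```python
import numpy as np

def prob_quantize(x, delta, rng):
    """Probabilistic uniform quantizer: round each entry to one of its two nearest
    multiples of delta, picking the upper one with probability equal to the
    fractional part. Unbiased, with per-entry error variance <= delta^2 / 4."""
    scaled = x / delta
    low = np.floor(scaled)
    p_up = scaled - low                      # fractional part in [0, 1)
    up = rng.random(x.shape) < p_up
    return (low + up) * delta

def run_error_feedback(num_iter=2000, d=5, delta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    z = np.zeros(d)                          # stored compression error (assumed z = x - Q(x))
    sq_err = []
    for i in range(num_iter):
        # Hypothetical innovation psi_{k,i} - phi_{k,i-1}: decaying drift plus small noise.
        innovation = 0.5 ** (i / 100) * np.ones(d) + 0.01 * rng.standard_normal(d)
        x = innovation + z                   # error feedback: add back the previous error
        q = prob_quantize(x, delta, rng)     # quantity that would be transmitted
        z = x - q                            # new compression error
        sq_err.append(z @ z)
    return np.mean(sq_err), d * delta**2 / 4

mean_sq_err, sigma2_bound = run_error_feedback()
print(mean_sq_err <= sigma2_bound)
```

Because $\beta_{c,k}=0$ for this quantizer, the error does not scale with the size of the compressed input. Each stored error entry also stays within one quantization step, so feeding it back cannot make the compressed input drift.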

We now derive recursions that describe the time evolution of the transformed vectors $\overline{\boldsymbol{\chi}}_{i}$ and $\widecheck{\boldsymbol{\chi}}_{i}$. Using (31) to express $\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}$, adding $\boldsymbol{z}_{i-1}$, and using (33), we can write:

$$
\begin{split}
\widetilde{\boldsymbol{\phi}}_{i-1}-\widetilde{\boldsymbol{\psi}}_{i}+\boldsymbol{z}_{i-1}&=\left(I_{M}-\left(I_{M}-\frac{\mu}{\zeta}\boldsymbol{\mathcal{H}}_{i-1}\right)\mathcal{A}^{\prime}\right)\widetilde{\boldsymbol{\phi}}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{s}_{i}-\frac{\mu}{\zeta}b+\boldsymbol{z}_{i-1}\\
&\overset{\text{(a)}}{=}\left(I_{M}-\left(I_{M}-\frac{\mu}{\zeta}\boldsymbol{\mathcal{H}}_{i-1}\right)\mathcal{A}^{\prime}\right)\widetilde{\boldsymbol{\phi}}^{z}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{s}_{i}-\frac{\mu}{\zeta}b+\left(2I_{M}-\left(I_{M}-\frac{\mu}{\zeta}\boldsymbol{\mathcal{H}}_{i-1}\right)\mathcal{A}^{\prime}\right)\boldsymbol{z}_{i-1},
\end{split}
\tag{107}
$$

where in step (a) we used $\widetilde{\boldsymbol{\phi}}_{i-1}=\widetilde{\boldsymbol{\phi}}^{z}_{i-1}+\boldsymbol{z}_{i-1}$. Multiplying both sides of (107) by $\mathcal{V}_{\epsilon}^{-1}$ and using (49)–(64), (67)–(70), and the Jordan decomposition of the matrix $\mathcal{A}^{\prime}$, we obtain:

$$
\begin{split}
\begin{bmatrix}\overline{\boldsymbol{\chi}}_{i}\\ \widecheck{\boldsymbol{\chi}}_{i}\end{bmatrix}={}&\begin{bmatrix}\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{11,i-1}&\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{12,i-1}\\ \frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{21,i-1}&I_{M-P}-\mathcal{J}^{\prime}_{\epsilon}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{22,i-1}\end{bmatrix}\begin{bmatrix}\overline{\boldsymbol{\phi}}^{z}_{i-1}\\ \widecheck{\boldsymbol{\phi}}^{z}_{i-1}\end{bmatrix}+\frac{\mu}{\zeta}\begin{bmatrix}\overline{\boldsymbol{s}}_{i}\\ \widecheck{\boldsymbol{s}}_{i}\end{bmatrix}-\frac{\mu}{\zeta}\begin{bmatrix}0\\ \widecheck{b}\end{bmatrix}\\
&+\begin{bmatrix}I_{P}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{11,i-1}&\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{12,i-1}\\ \frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{21,i-1}&2I_{M-P}-\mathcal{J}^{\prime}_{\epsilon}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{22,i-1}\end{bmatrix}\begin{bmatrix}\overline{\boldsymbol{z}}_{i-1}\\ \widecheck{\boldsymbol{z}}_{i-1}\end{bmatrix}.
\end{split}
\tag{108}
$$
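Step (a) of (107) and the block structure of (108) can be checked numerically. The sketch makes two assumptions. First, it uses a Jordan-type factorization $\mathcal{A}^{\prime}=\mathcal{V}_{\epsilon}\,\mathrm{diag}(I_P,\mathcal{J}^{\prime}_{\epsilon})\,\mathcal{V}_{\epsilon}^{-1}$. Second, it takes $\boldsymbol{\mathcal{D}}_{i-1}=\mathcal{V}_{\epsilon}^{-1}\boldsymbol{\mathcal{H}}_{i-1}\mathcal{A}^{\prime}\mathcal{V}_{\epsilon}$, which is what multiplying (107) by $\mathcal{V}_{\epsilon}^{-1}$ produces. The exact definitions are those in (49)–(70), so all matrices below (`V`, `Jp`, `H`) and the ratio `mu_over_zeta` are random or hypothetical stand-ins. All vectors are transformed by $\mathcal{V}_{\epsilon}^{-1}$. The zero top block of the transformed $b$ in (108) follows from properties of $b$ established earlier in the paper and is not imposed here.

```python
import numpy as np

rng = np.random.default_rng(2)
M, P = 7, 2
mu_over_zeta = 0.05                        # hypothetical value of mu / zeta

# Hypothetical A' = V diag(I_P, Jp) V^{-1}, with Jp a small random block.
V = rng.standard_normal((M, M))
V_inv = np.linalg.inv(V)
Jp = 0.5 * rng.standard_normal((M - P, M - P)) / np.sqrt(M)
J = np.block([[np.eye(P), np.zeros((P, M - P))],
              [np.zeros((M - P, P)), Jp]])
A = V @ J @ V_inv
H = rng.standard_normal((M, M))
H = H + H.T                                # symmetric stand-in for H_{i-1}
phi_z, s, b, z = (rng.standard_normal(M) for _ in range(4))

I = np.eye(M)
B = (I - mu_over_zeta * H) @ A
# First line of (107), with phi~_{i-1} = phi~^z_{i-1} + z_{i-1}
lhs = (I - B) @ (phi_z + z) + mu_over_zeta * (s - b) + z
# Second line of (107), i.e., after step (a)
rhs_a = (I - B) @ phi_z + mu_over_zeta * (s - b) + (2 * I - B) @ z

# Block form (108): transform everything with V^{-1}, using D = V^{-1} H A' V
D = V_inv @ H @ A @ V
G_phi = I - J + mu_over_zeta * D           # multiplies [phi_bar^z; phi_check^z]
G_z = 2 * I - J + mu_over_zeta * D         # multiplies [z_bar; z_check]
chi = G_phi @ (V_inv @ phi_z) + mu_over_zeta * V_inv @ (s - b) + G_z @ (V_inv @ z)

print(np.allclose(lhs, rhs_a), np.allclose(V_inv @ lhs, chi))
# Top-left blocks: mu/zeta D11 for phi^z and I_P + mu/zeta D11 for z, as in (108)
print(np.allclose(G_phi[:P, :P], mu_over_zeta * D[:P, :P]),
      np.allclose(G_z[:P, :P], np.eye(P) + mu_over_zeta * D[:P, :P]))
```

The identity block $I_P$ in $\mathcal{J}_{\epsilon}$ is what removes the $O(1)$ term from the top-left block acting on $\overline{\boldsymbol{\phi}}^{z}_{i-1}$. That leaves only $\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{11,i-1}$.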

Again, using arguments similar to those used to establish inequalities (119) and (124) in [40, Appendix D], we can verify that:

$$
\begin{split}
\mathbb{E}\|\overline{\boldsymbol{\chi}}_{i}\|^{2}&=\mathbb{E}\left\|\left(I_{P}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{11,i-1}\right)\overline{\boldsymbol{z}}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{11,i-1}\overline{\boldsymbol{\phi}}^{z}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\right\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2}\\
&\leq 2\left(1+\frac{\mu}{\zeta}\sigma_{11}\right)^{2}\mathbb{E}\|\overline{\boldsymbol{z}}_{i-1}\|^{2}+\frac{2\mu^{2}}{\zeta^{2}}\mathbb{E}\|\boldsymbol{\mathcal{D}}_{12,i-1}\widecheck{\boldsymbol{z}}_{i-1}+\boldsymbol{\mathcal{D}}_{11,i-1}\overline{\boldsymbol{\phi}}^{z}_{i-1}+\boldsymbol{\mathcal{D}}_{12,i-1}\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2}\\
&\leq 2\left(1+\frac{\mu}{\zeta}\sigma_{11}\right)^{2}\mathbb{E}\|\overline{\boldsymbol{z}}_{i-1}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{z}}_{i-1}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{11}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2}\\
&\overset{\text{(a)}}{\leq}\left(2\left(1+\frac{\mu}{\zeta}\sigma_{11}\right)^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}\right)\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{11}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2},
\end{split}
$$
over¯ start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 6 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT end_ARG start_ARG italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E ∥ overwidecheck start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E ∥ over¯ start_ARG bold_italic_s end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL over(a) start_ARG ≤ end_ARG ( 2 ( 1 + divide start_ARG italic_μ end_ARG start_ARG italic_ζ end_ARG italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 6 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT end_ARG start_ARG italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) blackboard_E ∥ caligraphic_V start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_z start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 6 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_ARG start_ARG italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E ∥ over¯ start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT 
italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 6 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT end_ARG start_ARG italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E ∥ overwidecheck start_ARG bold_italic_ϕ end_ARG start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E ∥ over¯ start_ARG bold_italic_s end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , end_CELL end_ROW (109)

and

$$
\begin{aligned}
\mathbb{E}\|\widecheck{\boldsymbol{\chi}}_{i}\|^{2}
&=\mathbb{E}\Big\|(2I-\mathcal{J}^{\prime}_{\epsilon})\widecheck{\boldsymbol{z}}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{22,i-1}\widecheck{\boldsymbol{z}}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{21,i-1}\overline{\boldsymbol{z}}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{21,i-1}\overline{\boldsymbol{\phi}}^{z}_{i-1}\\
&\qquad+(I-\mathcal{J}^{\prime}_{\epsilon})\widecheck{\boldsymbol{\phi}}^{z}_{i-1}+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{22,i-1}\widecheck{\boldsymbol{\phi}}^{z}_{i-1}-\frac{\mu}{\zeta}\widecheck{b}\Big\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}\\
&\leq 2\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}\mathbb{E}\|\widecheck{\boldsymbol{z}}_{i-1}\|^{2}+\frac{4\mu^{2}}{\zeta^{2}}\mathbb{E}\|\boldsymbol{\mathcal{D}}_{22,i-1}\widecheck{\boldsymbol{z}}_{i-1}+\boldsymbol{\mathcal{D}}_{21,i-1}\overline{\boldsymbol{z}}_{i-1}+\boldsymbol{\mathcal{D}}_{21,i-1}\overline{\boldsymbol{\phi}}^{z}_{i-1}+\boldsymbol{\mathcal{D}}_{22,i-1}\widecheck{\boldsymbol{\phi}}^{z}_{i-1}-\widecheck{b}\|^{2}\\
&\qquad+4\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}\\
&\leq\left(2\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{20\mu^{2}\sigma^{2}_{22}}{\zeta^{2}}\right)\mathbb{E}\|\widecheck{\boldsymbol{z}}_{i-1}\|^{2}+\frac{20\mu^{2}\sigma^{2}_{21}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{z}}_{i-1}\|^{2}+\frac{20\mu^{2}\sigma^{2}_{21}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{20\mu^{2}}{\zeta^{2}}\|\widecheck{b}\|^{2}\\
&\qquad+\left(4\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{20\mu^{2}\sigma^{2}_{22}}{\zeta^{2}}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}\\
&\overset{\text{(b)}}{\leq}\left(2\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{20\mu^{2}(\sigma^{2}_{22}+\sigma^{2}_{21})}{\zeta^{2}}\right)\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}+\frac{20\mu^{2}\sigma^{2}_{21}}{\zeta^{2}}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{20\mu^{2}}{\zeta^{2}}\|\widecheck{b}\|^{2}\\
&\qquad+\left(4\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{20\mu^{2}\sigma^{2}_{22}}{\zeta^{2}}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}.
\end{aligned}
\tag{110}
$$
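The bounds in (109) and (110) rest on a few elementary inequalities: $\|\sum_{k=1}^{K}x_k\|^2\leq K\sum_{k=1}^{K}\|x_k\|^2$ (with $K=2$, then $K=3$ giving the factor $6=2\cdot 3$ in (109), and $K=5$ giving $20=4\cdot 5$ in (110)), the three-way split $\|a+b+c\|^2\leq 2\|a\|^2+4\|b\|^2+4\|c\|^2$ used in (110), the operator-norm bounds $\|\boldsymbol{\mathcal{D}}x\|\leq\sigma\|x\|$ and $\|I+t\boldsymbol{\mathcal{D}}\|\leq 1+t\sigma$, and, for steps (a) and (b), the fact that a block of a stacked vector is never longer than the whole vector. The Python sketch below is a hypothetical helper (not part of the paper) that checks these inequalities on random data. It assumes, as the passage from $I_P+\frac{\mu}{\zeta}\boldsymbol{\mathcal{D}}_{11,i-1}$ to $1+\frac{\mu}{\zeta}\sigma_{11}$ suggests, that $\sigma_{kl}$ upper-bounds the spectral norm of $\boldsymbol{\mathcal{D}}_{kl,i-1}$; the parameter `t` stands in for $\mu/\zeta$. A randomized check is only a sanity test, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)


def sq(x):
    """Squared Euclidean norm ||x||^2."""
    return float(np.dot(x, x))


def leq(lhs, rhs, tol=1e-9):
    """lhs <= rhs up to a small floating-point tolerance."""
    return lhs <= rhs + tol * (1.0 + abs(rhs))


def check_bounds(dim=8, trials=500, t=0.1):
    """Randomized check of the elementary bounds behind (109)-(110).

    dim must exceed 3 so that both blocks of the stacked vector are nonempty;
    t plays the role of mu/zeta (an assumption for illustration).
    Returns True if every random sample satisfies every bound.
    """
    for _ in range(trials):
        # ||x_1 + ... + x_K||^2 <= K * sum_k ||x_k||^2 for the K values in the text
        for K in (2, 3, 5):
            xs = rng.standard_normal((K, dim))
            if not leq(sq(xs.sum(axis=0)), K * sum(sq(x) for x in xs)):
                return False
        # Three-way split used in (110), with weights 2, 4, 4
        a, b, c = rng.standard_normal((3, dim))
        if not leq(sq(a + b + c), 2 * sq(a) + 4 * sq(b) + 4 * sq(c)):
            return False
        # ||D x|| <= sigma ||x|| and ||I + t D|| <= 1 + t sigma, sigma = spectral norm of D
        D = rng.standard_normal((dim, dim))
        x = rng.standard_normal(dim)
        sigma = np.linalg.norm(D, 2)
        if not leq(sq(D @ x), sigma**2 * sq(x)):
            return False
        if not leq(np.linalg.norm(np.eye(dim) + t * D, 2), 1 + t * sigma):
            return False
        # Steps (a)-(b): each block of a stacked vector is no longer than the whole vector
        zbar, zcheck = rng.standard_normal(3), rng.standard_normal(dim - 3)
        full = np.concatenate([zbar, zcheck])
        if not (leq(sq(zbar), sq(full)) and leq(sq(zcheck), sq(full))):
            return False
    return True


print(check_bounds())  # prints True
```

The check uses a relative tolerance only to absorb floating-point rounding; every inequality it tests holds exactly in real arithmetic.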

In steps (a) and (b), we used the fact that the squared norm of each of the components $\overline{\boldsymbol{z}}_{i-1}$ and $\widecheck{\boldsymbol{z}}_{i-1}$ of the transformed vector $\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}$ does not exceed the squared norm of the whole vector, i.e., $\|\overline{\boldsymbol{z}}_{i-1}\|^{2}\leq\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}$ and $\|\widecheck{\boldsymbol{z}}_{i-1}\|^{2}\leq\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}$. Adding (109) and (110), we obtain:

$$
\begin{aligned}
\mathbb{E}\|\overline{\boldsymbol{\chi}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{\chi}}_{i}\|^{2}
\leq&\left(2\Big(1+\frac{\mu}{\zeta}\sigma_{11}\Big)^{2}+2\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}+\frac{20\mu^{2}(\sigma^{2}_{22}+\sigma^{2}_{21})}{\zeta^{2}}\right)\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}\\
&+\left(\frac{6\mu^{2}\sigma^{2}_{11}}{\zeta^{2}}+\frac{20\mu^{2}\sigma^{2}_{21}}{\zeta^{2}}\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}+\left(4\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}+\frac{20\mu^{2}\sigma^{2}_{22}}{\zeta^{2}}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\
&+\frac{20\mu^{2}}{\zeta^{2}}\|\widecheck{b}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}\left(\mathbb{E}\|\overline{\boldsymbol{s}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{s}}_{i}\|^{2}\right),
\end{aligned}
\tag{111}
$$

and by using the bound (97) in (111), we get:

$$
\begin{aligned}
\mathbb{E}\|\overline{\boldsymbol{\chi}}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{\chi}}_{i}\|^{2}
\leq&\left(2\Big(1+\frac{\mu}{\zeta}\sigma_{11}\Big)^{2}+2\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}+\frac{20\mu^{2}(\sigma^{2}_{22}+\sigma^{2}_{21})}{\zeta^{2}}+\frac{2\mu^{2}}{\zeta^{2}}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i-1}\|^{2}\\
&+\left(\frac{6\mu^{2}\sigma^{2}_{11}}{\zeta^{2}}+\frac{20\mu^{2}\sigma^{2}_{21}}{\zeta^{2}}+\frac{2\mu^{2}}{\zeta^{2}}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\
&+\left(4\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+\frac{6\mu^{2}\sigma^{2}_{12}}{\zeta^{2}}+\frac{20\mu^{2}\sigma^{2}_{22}}{\zeta^{2}}+\frac{2\mu^{2}}{\zeta^{2}}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\
&+\frac{20\mu^{2}}{\zeta^{2}}\|\widecheck{b}\|^{2}+\frac{\mu^{2}}{\zeta^{2}}v_{1}^{2}\overline{\sigma}^{2}_{s}.
\end{aligned}
\tag{112}
$$

Finally, by using (112) in (106), we find the following inequality that describes the evolution of the compression error vector $\boldsymbol{z}_i$:

$$
\begin{aligned}
\mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_i\|^2 \leq\;& \beta_{c,\max}^2v_1^2v_2^2\left(2\zeta^2\Big(1+\frac{\mu}{\zeta}\sigma_{11}\Big)^2+2\zeta^2\|2I-\mathcal{J}'_\epsilon\|^2+6\mu^2\sigma_{12}^2+20\mu^2(\sigma_{22}^2+\sigma_{21}^2)+2\mu^2v_1^2\beta_{s,\max}^2v_2^2\right)\mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_{i-1}\|^2\\
&+\beta_{c,\max}^2v_1^2v_2^2\left(6\mu^2\sigma_{11}^2+20\mu^2\sigma_{21}^2+2\mu^2v_1^2\beta_{s,\max}^2v_2^2\right)\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2\\
&+\beta_{c,\max}^2v_1^2v_2^2\left(4\zeta^2\|I-\mathcal{J}'_\epsilon\|^2+6\mu^2\sigma_{12}^2+20\mu^2\sigma_{22}^2+2\mu^2v_1^2\beta_{s,\max}^2v_2^2\right)\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2\\
&+20\beta_{c,\max}^2v_1^2v_2^2\mu^2\|\widecheck{b}\|^2+\mu^2\beta_{c,\max}^2v_2^2v_1^4\overline{\sigma}_s^2+\zeta^2v_1^2\overline{\sigma}_c^2.
\end{aligned}
\tag{113}
$$
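The passage from (112) to (113) is coefficient bookkeeping. Comparing the two bounds suggests that (106), which is not reproduced in this excerpt, has the form $\mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_i\|^2\leq\zeta^2\beta_{c,\max}^2v_1^2v_2^2(\mathbb{E}\|\overline{\boldsymbol{\chi}}_i\|^2+\mathbb{E}\|\widecheck{\boldsymbol{\chi}}_i\|^2)+\zeta^2v_1^2\overline{\sigma}_c^2$; this form is an inference from (112) and (113), not a quotation of (106). Under that assumption, the Python sketch below checks at random positive parameter values that each coefficient of (113) equals the corresponding coefficient of (112) scaled by $\zeta^2\beta_{c,\max}^2v_1^2v_2^2$, with the extra constant $\zeta^2v_1^2\overline{\sigma}_c^2$. The parameter names (`s11` for $\sigma_{11}$, `nIJ1` for $\|I-\mathcal{J}'_\epsilon\|$, `n2IJ1` for $\|2I-\mathcal{J}'_\epsilon\|$, `b2` for $\|\widecheck{b}\|^2$, and so on) are hypothetical labels, and the matrix norms are treated as free positive scalars.

```python
import math
import random


def coeffs_112(p):
    """Coefficients of (112), ordered as: E||V^{-1} z_{i-1}||^2,
    E||phi_bar^z_{i-1}||^2, E||phi_check^z_{i-1}||^2, constant term."""
    mu, zeta = p["mu"], p["zeta"]
    q = 2 * mu**2 / zeta**2 * p["v1"]**2 * p["beta_s"]**2 * p["v2"]**2
    cz = (2 * (1 + mu / zeta * p["s11"])**2 + 2 * p["n2IJ1"]**2
          + 6 * mu**2 * p["s12"]**2 / zeta**2
          + 20 * mu**2 * (p["s22"]**2 + p["s21"]**2) / zeta**2 + q)
    cbar = 6 * mu**2 * p["s11"]**2 / zeta**2 + 20 * mu**2 * p["s21"]**2 / zeta**2 + q
    ccheck = (4 * p["nIJ1"]**2 + 6 * mu**2 * p["s12"]**2 / zeta**2
              + 20 * mu**2 * p["s22"]**2 / zeta**2 + q)
    const = 20 * mu**2 / zeta**2 * p["b2"] + mu**2 / zeta**2 * p["v1"]**2 * p["sig_s2"]
    return [cz, cbar, ccheck, const]


def coeffs_113(p):
    """Coefficients of (113), in the same order as coeffs_112."""
    mu, zeta = p["mu"], p["zeta"]
    kc = p["beta_c"]**2 * p["v1"]**2 * p["v2"]**2
    q = 2 * mu**2 * p["v1"]**2 * p["beta_s"]**2 * p["v2"]**2
    cz = kc * (2 * zeta**2 * (1 + mu / zeta * p["s11"])**2 + 2 * zeta**2 * p["n2IJ1"]**2
               + 6 * mu**2 * p["s12"]**2 + 20 * mu**2 * (p["s22"]**2 + p["s21"]**2) + q)
    cbar = kc * (6 * mu**2 * p["s11"]**2 + 20 * mu**2 * p["s21"]**2 + q)
    ccheck = kc * (4 * zeta**2 * p["nIJ1"]**2 + 6 * mu**2 * p["s12"]**2
                   + 20 * mu**2 * p["s22"]**2 + q)
    const = (20 * kc * mu**2 * p["b2"]
             + mu**2 * p["beta_c"]**2 * p["v2"]**2 * p["v1"]**4 * p["sig_s2"]
             + zeta**2 * p["v1"]**2 * p["sig_c2"])
    return [cz, cbar, ccheck, const]


def matches_assumed_106(p):
    """True if (113) = scale * (112) + zeta^2 v1^2 sigma_c^2, where
    scale = zeta^2 beta_c^2 v1^2 v2^2 (the ASSUMED form of (106))."""
    scale = p["zeta"]**2 * p["beta_c"]**2 * p["v1"]**2 * p["v2"]**2
    lhs = coeffs_113(p)
    rhs = [scale * c for c in coeffs_112(p)]
    rhs[3] += p["zeta"]**2 * p["v1"]**2 * p["sig_c2"]
    return all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(lhs, rhs))


random.seed(0)
keys = ["mu", "zeta", "s11", "s12", "s21", "s22", "v1", "v2",
        "beta_s", "beta_c", "nIJ1", "n2IJ1", "b2", "sig_s2", "sig_c2"]
trials = [{k: random.uniform(0.1, 2.0) for k in keys} for _ in range(100)]
all_match = all(matches_assumed_106(p) for p in trials)
print("(113) consistent with (112) under the assumed (106):", all_match)
```

Random positive samples are used instead of symbolic algebra so that the sketch needs only the Python standard library; a mismatch at any sample would point to the coefficient of (113) that does not follow from (112) under the assumed form of (106).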

From (98), (99), and (113), we find that $\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_i\|^2$, $\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_i\|^2$, and $\mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_i\|^2$ are coupled and jointly satisfy the recursive bound:

$$
\begin{bmatrix}\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_i\|^2\\ \mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_i\|^2\\ \mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_i\|^2\end{bmatrix}
\preceq\Gamma
\begin{bmatrix}\mathbb{E}\|\overline{\boldsymbol{\phi}}^z_{i-1}\|^2\\ \mathbb{E}\|\widecheck{\boldsymbol{\phi}}^z_{i-1}\|^2\\ \mathbb{E}\|\mathcal{V}_\epsilon^{-1}\boldsymbol{z}_{i-1}\|^2\end{bmatrix}
+\begin{bmatrix}l\\ m\\ n\end{bmatrix},
\tag{114}
$$

where $\Gamma$ is the $3\times 3$ matrix given by:

$$
\Gamma=\begin{bmatrix}a & b & c\\ d & e & f\\ g & h & j\end{bmatrix},
\tag{115}
$$

with

a𝑎\displaystyle aitalic_a 1μσ11+2μ2v12βs,max2v22=1μσ11+O(μ2),absent1𝜇subscript𝜎112superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣221𝜇subscript𝜎11𝑂superscript𝜇2\displaystyle\triangleq 1-\mu\sigma_{11}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_% {2}^{2}=1-\mu\sigma_{11}+O(\mu^{2}),≜ 1 - italic_μ italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1 - italic_μ italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT + italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (116)
b𝑏\displaystyle bitalic_b 3μσ122σ11+2μ2v12βs,max2v22=O(μ),absent3𝜇superscriptsubscript𝜎122subscript𝜎112superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣22𝑂𝜇\displaystyle\triangleq\frac{3\mu\sigma_{12}^{2}}{\sigma_{11}}+2\mu^{2}v_{1}^{% 2}\beta_{s,\max}^{2}v_{2}^{2}=O(\mu),≜ divide start_ARG 3 italic_μ italic_σ start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_ARG + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_O ( italic_μ ) , (117)
c𝑐\displaystyle citalic_c 3μσ11+3μσ122σ11+2μ2v12βs,max2v22=O(μ),absent3𝜇subscript𝜎113𝜇superscriptsubscript𝜎122subscript𝜎112superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣22𝑂𝜇\displaystyle\triangleq 3\mu\sigma_{11}+\frac{3\mu\sigma_{12}^{2}}{\sigma_{11}% }+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}=O(\mu),≜ 3 italic_μ italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT + divide start_ARG 3 italic_μ italic_σ start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_ARG + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_O ( italic_μ ) , (118)
d𝑑\displaystyle ditalic_d 10μ2σ2121𝒥ϵ′′+2μ2v12βs,max2v22=O(μ2),absent10superscript𝜇2superscriptsubscript𝜎2121normsubscriptsuperscript𝒥′′italic-ϵ2superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣22𝑂superscript𝜇2\displaystyle\triangleq\frac{10\mu^{2}\sigma_{21}^{2}}{1-\|\mathcal{J}^{\prime% \prime}_{\epsilon}\|}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}=O(\mu^{2}),≜ divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (119)
e𝑒\displaystyle eitalic_e 𝒥ϵ′′+10μ2σ2221𝒥ϵ′′+2μ2v12βs,max2v22=𝒥ϵ′′+O(μ2),absentnormsubscriptsuperscript𝒥′′italic-ϵ10superscript𝜇2superscriptsubscript𝜎2221normsubscriptsuperscript𝒥′′italic-ϵ2superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣22normsubscriptsuperscript𝒥′′italic-ϵ𝑂superscript𝜇2\displaystyle\triangleq\|\mathcal{J}^{\prime\prime}_{\epsilon}\|+\frac{10\mu^{% 2}\sigma_{22}^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\mu^{2}v_{1}^% {2}\beta_{s,\max}^{2}v_{2}^{2}=\|\mathcal{J}^{\prime\prime}_{\epsilon}\|+O(\mu% ^{2}),≜ ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ + divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ + italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (120)
f𝑓\displaystyle fitalic_f 2ζ2I𝒥ϵ21𝒥ϵ′′+10μ2(σ222+σ212)1𝒥ϵ′′+2μ2v12βs,max2v22=2ζ2I𝒥ϵ21𝒥ϵ′′+O(μ2),absent2superscript𝜁2superscriptnorm𝐼subscriptsuperscript𝒥italic-ϵ21normsubscriptsuperscript𝒥′′italic-ϵ10superscript𝜇2superscriptsubscript𝜎222superscriptsubscript𝜎2121normsubscriptsuperscript𝒥′′italic-ϵ2superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣222superscript𝜁2superscriptnorm𝐼subscriptsuperscript𝒥italic-ϵ21normsubscriptsuperscript𝒥′′italic-ϵ𝑂superscript𝜇2\displaystyle\triangleq\frac{2\zeta^{2}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{% 2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+\frac{10\mu^{2}(\sigma_{22}^{% 2}+\sigma_{21}^{2})}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\mu^{2}v_{1% }^{2}\beta_{s,\max}^{2}v_{2}^{2}=\frac{2\zeta^{2}\|I-\mathcal{J}^{\prime}_{% \epsilon}\|^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+O(\mu^{2}),≜ divide start_ARG 2 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG + divide start_ARG 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_σ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_σ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 
start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 2 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 1 - ∥ caligraphic_J start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ end_ARG + italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (121)
g𝑔\displaystyle gitalic_g βc,max2v12v22(6μ2σ112+20μ2σ212+2μ2v12βs,max2v22)=O(μ2),absentsuperscriptsubscript𝛽𝑐2superscriptsubscript𝑣12superscriptsubscript𝑣226superscript𝜇2subscriptsuperscript𝜎21120superscript𝜇2subscriptsuperscript𝜎2212superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣22𝑂superscript𝜇2\displaystyle\triangleq{\beta}_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\left(6\mu^{2}% \sigma^{2}_{11}+20\mu^{2}\sigma^{2}_{21}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_% {2}^{2}\right)=O(\mu^{2}),≜ italic_β start_POSTSUBSCRIPT italic_c , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( 6 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT + 20 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) = italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (122)
h\displaystyle hitalic_h βc,max2v12v22(4ζ2I𝒥ϵ2+6μ2σ122+20μ2σ222+2μ2v12βs,max2v22)=4ζ2βc,max2v12v22I𝒥ϵ2+O(μ2),absentsuperscriptsubscript𝛽𝑐2superscriptsubscript𝑣12superscriptsubscript𝑣224superscript𝜁2superscriptnorm𝐼subscriptsuperscript𝒥italic-ϵ26superscript𝜇2subscriptsuperscript𝜎21220superscript𝜇2subscriptsuperscript𝜎2222superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣224superscript𝜁2superscriptsubscript𝛽𝑐2superscriptsubscript𝑣12superscriptsubscript𝑣22superscriptnorm𝐼subscriptsuperscript𝒥italic-ϵ2𝑂superscript𝜇2\displaystyle\triangleq{\beta}_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\left(4\zeta^{2}% \|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}+6\mu^{2}\sigma^{2}_{12}+20\mu^{2}% \sigma^{2}_{22}+2\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)=4\zeta^{2}% {\beta}_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2% }+O(\mu^{2}),≜ italic_β start_POSTSUBSCRIPT italic_c , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( 4 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 6 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT + 20 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT + 2 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 
end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) = 4 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_c , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ italic_I - caligraphic_J start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (123)
j𝑗\displaystyle jitalic_j 2βc,max2v12v22(ζ2(1+μζσ11)2+ζ22I𝒥ϵ2+3μ2σ122+10μ2(σ222+σ212)+μ2v12βs,max2v22)absent2superscriptsubscript𝛽𝑐2superscriptsubscript𝑣12superscriptsubscript𝑣22superscript𝜁2superscript1𝜇𝜁subscript𝜎112superscript𝜁2superscriptnorm2𝐼subscriptsuperscript𝒥italic-ϵ23superscript𝜇2subscriptsuperscript𝜎21210superscript𝜇2subscriptsuperscript𝜎222subscriptsuperscript𝜎221superscript𝜇2superscriptsubscript𝑣12superscriptsubscript𝛽𝑠2superscriptsubscript𝑣22\displaystyle\triangleq 2{\beta}_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\left(\zeta^{2}% (1+\frac{\mu}{\zeta}\sigma_{11})^{2}+\zeta^{2}\|2I-\mathcal{J}^{\prime}_{% \epsilon}\|^{2}+3\mu^{2}\sigma^{2}_{12}+10\mu^{2}(\sigma^{2}_{22}+\sigma^{2}_{% 21})+\mu^{2}v_{1}^{2}\beta_{s,\max}^{2}v_{2}^{2}\right)≜ 2 italic_β start_POSTSUBSCRIPT italic_c , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( 1 + divide start_ARG italic_μ end_ARG start_ARG italic_ζ end_ARG italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ 2 italic_I - caligraphic_J start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 3 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT + 10 italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT + italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT ) + italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 
start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_s , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )
=2ζ2βc,max2v12v22((1+μζσ11)2+2I𝒥ϵ2)+O(μ2),absent2superscript𝜁2superscriptsubscript𝛽𝑐2superscriptsubscript𝑣12superscriptsubscript𝑣22superscript1𝜇𝜁subscript𝜎112superscriptnorm2𝐼subscriptsuperscript𝒥italic-ϵ2𝑂superscript𝜇2\displaystyle=2\zeta^{2}{\beta}_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\left((1+\frac{% \mu}{\zeta}\sigma_{11})^{2}+\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}\right)+% O(\mu^{2}),= 2 italic_ζ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_c , roman_max end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( ( 1 + divide start_ARG italic_μ end_ARG start_ARG italic_ζ end_ARG italic_σ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ 2 italic_I - caligraphic_J start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) + italic_O ( italic_μ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (124)
$$
l \triangleq \mu^{2}v_{1}^{2}\overline{\sigma}^{2}_{s}=O(\mu^{2}),
\tag{125}
$$
$$
m \triangleq \mu^{2}v_{1}^{2}\overline{\sigma}^{2}_{s}+\frac{10\mu^{2}\|\widecheck{b}\|^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}=O(\mu^{2}),
\tag{126}
$$
$$
n \triangleq \zeta^{2}v_{1}^{2}\overline{\sigma}^{2}_{c}+\mu^{2}v_{1}^{4}v_{2}^{2}\beta_{c,\max}^{2}\overline{\sigma}^{2}_{s}+20\beta_{c,\max}^{2}v_{1}^{2}v_{2}^{2}\mu^{2}\|\widecheck{b}\|^{2}=\zeta^{2}v_{1}^{2}\overline{\sigma}^{2}_{c}+O(\mu^{2}).
\tag{127}
$$

If the matrix $\Gamma$ is stable, i.e., $\rho(\Gamma)<1$, then by iterating (114), we arrive at:

$$
\limsup_{i\rightarrow\infty}\begin{bmatrix}\mathbb{E}\|\overline{\boldsymbol{\phi}}_{i}^{z}\|^{2}\\ \mathbb{E}\|\widecheck{\boldsymbol{\phi}}_{i}^{z}\|^{2}\\ \mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}\end{bmatrix}\preceq(I_{3}-\Gamma)^{-1}\begin{bmatrix}l\\ m\\ n\end{bmatrix}.
\tag{134}
$$
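As a sanity check of the step from the entrywise recursion to (134), the sketch below iterates a 3×3 vector recursion driven by a hypothetical nonnegative matrix `Gamma` and a hypothetical vector `b` standing in for $[l,m,n]^\top$. The numerical values are illustrative only and are not computed from (124)–(127). Because $\Gamma$ has nonnegative entries, any sequence satisfying the inequality version of the recursion stays entrywise below the sequence obtained with equality, and the latter converges to $(I_3-\Gamma)^{-1}[l,m,n]^\top$ when $\rho(\Gamma)<1$.

```python
import numpy as np

# Hypothetical nonnegative Gamma with rho(Gamma) < 1, and hypothetical b ~ [l, m, n].
Gamma = np.array([[0.99, 0.01, 0.01],
                  [1e-4, 0.50, 0.20],
                  [1e-4, 0.10, 0.60]])
b = np.array([1e-4, 1e-4, 1e-3])

def run_recursion(Gamma, b, x0, num_iter, rng=None):
    """Iterate x_{i+1} = Gamma @ x_i + b - s_i.

    With rng=None the slack s_i is zero, so the recursion holds with equality.
    Otherwise s_i >= 0 is drawn at random, so x_{i+1} <= Gamma @ x_i + b entrywise.
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_iter):
        slack = np.zeros_like(x) if rng is None else rng.uniform(0.0, 1e-5, size=x.shape)
        x = Gamma @ x + b - slack
    return x

# Closed-form limit (I_3 - Gamma)^{-1} b appearing on the right-hand side of (134).
x_limit = np.linalg.solve(np.eye(3) - Gamma, b)

x_equal = run_recursion(Gamma, b, np.ones(3), num_iter=5000)
x_ineq = run_recursion(Gamma, b, np.ones(3), num_iter=5000, rng=np.random.default_rng(0))
print(x_limit, x_equal, x_ineq)
```

The randomized slack only mimics the "$\preceq$" in the recursion; it says nothing about the actual error quantities of the algorithm.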

As shown next, for given learning-problem constants (captured by $\{\sigma_{11}^{2},\sigma_{12}^{2},\sigma_{21}^{2},\sigma_{22}^{2},\beta^{2}_{s,\max},\overline{\sigma}^{2}_{s}\}$), a small step-size parameter $\mu$, a given network topology (captured by the matrix $\mathcal{A}$ and by the quantities $\{v_{1}^{2},v_{2}^{2},\mathcal{J}_{\epsilon}\}$ resulting from its eigendecomposition), and given quantizer settings (captured by $\{\beta_{c,\max}^{2},\overline{\sigma}^{2}_{c}\}$), the stability of $\Gamma$ can be controlled through the damping coefficient $\zeta$ and the mixing parameter $\gamma$ used in steps (18b) and (18c), respectively. Since the spectral radius of a matrix is upper bounded by its $1$-norm (its largest absolute column sum), the matrix $\Gamma$ is stable if:

$$
\rho(\Gamma)\leq\max\{|a|+|d|+|g|,\;|b|+|e|+|h|,\;|c|+|f|+|j|\}<1.
\tag{135}
$$
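To illustrate how the column-sum test (135) behaves as $\mu$ shrinks, the sketch below builds a hypothetical $\Gamma(\mu)$ laid out as $[a\ b\ c;\ d\ e\ f;\ g\ h\ j]$ (the layout implied by the column sums in (135)), with the orders of magnitude suggested by (143): $a=1-\mu\sigma_{11}$, $d,g=O(\mu^2)$, $b,c=O(\mu)$, and $e,f,h,j$ equal to $\mu$-independent values plus $O(\mu^2)$ corrections. The constants `sigma11`, `C`, `e0`, `f0`, `h0`, `j0` are illustrative assumptions, not values computed from the paper's expressions; `e0 + h0 < 1` and `f0 + j0 < 1` loosely play the role of conditions (136) and (137).

```python
import numpy as np

def toy_gamma(mu, sigma11=1.0, C=10.0, e0=0.5, f0=0.2, h0=0.1, j0=0.6):
    """Hypothetical Gamma(mu) laid out as [[a, b, c], [d, e, f], [g, h, j]]."""
    a = 1.0 - mu * sigma11              # leading diagonal entry 1 - mu*sigma11
    b = c = C * mu                      # O(mu) entries in the first row
    d = g = C * mu**2                   # O(mu^2) entries in the first column
    e, f = e0 + C * mu**2, f0 + C * mu**2
    h, j = h0 + C * mu**2, j0 + C * mu**2
    return np.array([[a, b, c], [d, e, f], [g, h, j]])

def column_sum_bound(G):
    """Induced 1-norm (largest absolute column sum), an upper bound on rho(G)."""
    return np.abs(G).sum(axis=0).max()

def spectral_radius(G):
    return np.abs(np.linalg.eigvals(G)).max()

for mu in (0.1, 0.01, 0.001):
    G = toy_gamma(mu)
    print(f"mu={mu:g}  rho={spectral_radius(G):.4f}  bound={column_sum_bound(G):.4f}")
```

For $\mu=0.1$ the first column sums to $1-\mu\sigma_{11}+2C\mu^2>1$, so the sufficient test fails; for $\mu=0.01$ every column sum is below $1$, matching the "sufficiently small $\mu$" argument.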

Since $\sigma_{11}>0$, a sufficiently small $\mu$ can make $|a|+|d|+|g|$ strictly smaller than $1$. For $|b|+|e|+|h|$, observe that if the damping coefficient $\zeta$ and the mixing parameter $\gamma$ are chosen such that:

$$
\|\mathcal{J}^{\prime\prime}_{\epsilon}\|+4\zeta^{2}v_{1}^{2}v_{2}^{2}\beta^{2}_{c,\max}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}<1,
\tag{136}
$$

then $|b|+|e|+|h|$ can be made strictly smaller than $1$ for sufficiently small $\mu$. Finally, for $|c|+|f|+|j|$, observe that if the parameters $\gamma$ and $\zeta$ are chosen such that:

$$
\frac{2\zeta^{2}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\beta_{c,\max}^{2}\zeta^{2}v_{1}^{2}v_{2}^{2}\left(1+\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}\right)<1,
\tag{137}
$$

then $|c|+|f|+|j|$ can be made strictly smaller than $1$ for sufficiently small $\mu$. It follows that the RHS of (135) is strictly smaller than $1$ for sufficiently small $\mu$ whenever the damping coefficient $\zeta\in(0,1]$ and the mixing parameter $\gamma\in(0,1]$ satisfy conditions (136) and (137). We now analyze conditions (136) and (137) in detail. Following arguments similar to those in [6, pp. 516–517], we can establish the following bounds on the block diagonal matrices $I-\mathcal{J}_{\epsilon}^{\prime}$ and $2I-\mathcal{J}_{\epsilon}^{\prime}$ appearing in conditions (136) and (137):

$$
\|I-\mathcal{J}_{\epsilon}^{\prime}\|^{2}\overset{(44)}{=}\|\gamma(I-\mathcal{J}_{\epsilon})\|^{2}=\gamma^{2}\|I-\mathcal{J}_{\epsilon}\|^{2}\leq\gamma^{2}\big(\rho(I-\mathcal{J}_{\epsilon})+\epsilon\big)^{2},
\tag{138}
$$

$$
\|2I-\mathcal{J}_{\epsilon}^{\prime}\|\overset{(44)}{=}\|(1+\gamma)I-\gamma\mathcal{J}_{\epsilon}\|\leq(1+\gamma)-\gamma\big(\rho(\mathcal{J}_{\epsilon})+\epsilon\big)\in(1,1+\gamma),
\tag{139}
$$

where $\rho(I-\mathcal{J}_{\epsilon})\in(0,2)$ since $\rho(\mathcal{J}_{\epsilon})\in(0,1)$. Substituting the bounds (95) and (138) into (136), we can upper bound the LHS of (136) by:

$$
\|\mathcal{J}^{\prime\prime}_{\epsilon}\|+4v_{1}^{2}v_{2}^{2}\beta^{2}_{c,\max}\zeta^{2}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}\leq 1-\gamma\zeta\big(1-\rho(\mathcal{J}_{\epsilon})-\epsilon\big)+4v_{1}^{2}v_{2}^{2}\beta^{2}_{c,\max}(\gamma\zeta)^{2}\big(\rho(I-\mathcal{J}_{\epsilon})+\epsilon\big)^{2}.
\tag{140}
$$

The upper bound in the above inequality is guaranteed to be strictly smaller than $1$ if:

$$
4v_{1}^{2}v_{2}^{2}\beta^{2}_{c,\max}(\gamma\zeta)^{2}\big(\rho(I-\mathcal{J}_{\epsilon})+\epsilon\big)^{2}-\gamma\zeta\big(1-\rho(\mathcal{J}_{\epsilon})-\epsilon\big)<0.
\tag{141}
$$

Now, using the above condition together with the fact that $\gamma\zeta$ must lie in $(0,1]$, we obtain condition (72) on $\gamma\zeta$. For the second condition (137), we start by noting that its LHS can be upper bounded by:
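The step from (141) to a range for $\gamma\zeta$ amounts to solving a quadratic inequality in $x=\gamma\zeta>0$: dividing (141) by $x$ gives $x<\delta/(4K)$ with $\delta=1-\rho(\mathcal{J}_\epsilon)-\epsilon$ and $K=v_1^2v_2^2\beta_{c,\max}^2(\rho(I-\mathcal{J}_\epsilon)+\epsilon)^2$. The sketch below computes this limit and caps it at $1$; we assume this is the content of condition (72), which is not reproduced in this section. The function names and the sample constants in `params` are hypothetical.

```python
def gamma_zeta_limit(rho_J, rho_I_minus_J, eps, v1, v2, beta_c_max):
    """Supremum of admissible gamma*zeta from (141), capped at 1.

    The bound is strict (gamma*zeta < limit) unless the cap at 1 is active.
    """
    delta = 1.0 - rho_J - eps
    if delta <= 0.0:
        raise ValueError("requires rho(J_eps) + eps < 1")
    K = (v1 * v2 * beta_c_max * (rho_I_minus_J + eps)) ** 2
    if K == 0.0:
        return 1.0  # quadratic term vanishes: any gamma*zeta in (0, 1] satisfies (141)
    return min(1.0, delta / (4.0 * K))

def lhs_141(gz, rho_J, rho_I_minus_J, eps, v1, v2, beta_c_max):
    """Left-hand side of (141) evaluated at gamma*zeta = gz."""
    K = (v1 * v2 * beta_c_max * (rho_I_minus_J + eps)) ** 2
    return 4.0 * K * gz**2 - gz * (1.0 - rho_J - eps)

# Hypothetical network/quantizer constants, for illustration only.
params = dict(rho_J=0.8, rho_I_minus_J=1.2, eps=0.01, v1=1.5, v2=1.5, beta_c_max=0.5)
limit = gamma_zeta_limit(**params)
print(limit)
```

With these constants the admissible range is roughly $\gamma\zeta<0.026$; a coarser quantizer (larger `beta_c_max`) shrinks it further, since $K$ grows with $\beta_{c,\max}^2$.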

$$
\begin{aligned}
&\frac{2\zeta^{2}\|I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}}{1-\|\mathcal{J}^{\prime\prime}_{\epsilon}\|}+2\beta_{c,\max}^{2}\zeta^{2}v_{1}^{2}v_{2}^{2}\left(1+\|2I-\mathcal{J}^{\prime}_{\epsilon}\|^{2}\right)\\
&\quad\overset{(95),(138),(139)}{\leq}2\gamma\zeta\frac{\big(\rho(I-\mathcal{J}_{\epsilon})+\epsilon\big)^{2}}{1-\big(\rho(\mathcal{J}_{\epsilon})+\epsilon\big)}+2\beta_{c,\max}^{2}\zeta^{2}v_{1}^{2}v_{2}^{2}\Big(1+\big((1+\gamma)-\gamma(\rho(\mathcal{J}_{\epsilon})+\epsilon)\big)^{2}\Big).
\end{aligned}
\tag{142}
$$

Thus, (137) is guaranteed to be satisfied under condition (73).
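Whether a given pair $(\gamma,\zeta)$ satisfies (137) can be checked numerically through the upper bound (142). The sketch below evaluates the right-hand side of (142); a value below $1$ is sufficient (not necessary) for (137). The function name and the sample constants are hypothetical and chosen only for illustration.

```python
def rhs_142(gamma, zeta, rho_J, rho_I_minus_J, eps, v1, v2, beta_c_max):
    """Right-hand side of (142); a value < 1 guarantees condition (137)."""
    r = rho_J + eps
    if r >= 1.0:
        raise ValueError("requires rho(J_eps) + eps < 1")
    # First term: bound on 2 zeta^2 ||I - J'||^2 / (1 - ||J''||), linear in gamma*zeta.
    first_term = 2.0 * gamma * zeta * (rho_I_minus_J + eps) ** 2 / (1.0 - r)
    # Second term: scales with the quantizer constant beta_c_max^2 and with zeta^2.
    second_term = (2.0 * beta_c_max**2 * zeta**2 * v1**2 * v2**2
                   * (1.0 + ((1.0 + gamma) - gamma * r) ** 2))
    return first_term + second_term

# Hypothetical constants, for illustration only.
params = dict(rho_J=0.8, rho_I_minus_J=1.2, eps=0.01, v1=1.5, v2=1.5, beta_c_max=0.5)
for zeta in (0.5, 0.05, 0.01):
    print(zeta, rhs_142(gamma=1.0, zeta=zeta, **params))
```

In this toy setting, decreasing the damping coefficient $\zeta$ drives the bound below $1$, consistent with the role of $\zeta$ described after (134).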

Under conditions (72) and (73), $\rho(\Gamma)<1$ and, consequently, the matrix $\Gamma$ is stable. Moreover, it holds that (the constants that are crucial to understanding the algorithm's behavior are written explicitly in (143) and (144), while the less significant ones are encapsulated in the Big $O$ notation):

$$
(I-\Gamma)=\begin{bmatrix}\mu\sigma_{11}&O(\mu)&O(\mu)\\ O(\mu^{2})&1-e&-f\\ O(\mu^{2})&-h&1-j\end{bmatrix},
\tag{143}
$$

and:

$$
(I-\Gamma)^{-1}=\begin{bmatrix}\dfrac{1}{\mu\sigma_{11}}&O(1)&O(1)\\ O(\mu)&O(1)&O(1)\\ O(\mu)&O(1)&O(1)\end{bmatrix}.
\tag{144}
$$
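The entrywise orders in (144) can be checked on a hypothetical $(I-\Gamma)$ with the structure of (143): $(1,1)$ entry $\mu\sigma_{11}$, $O(\mu)$ entries in the rest of the first row, $O(\mu^2)$ entries in the rest of the first column, and an invertible $O(1)$ lower-right block. A block (Schur-complement) inversion shows that the $(1,1)$ entry of the inverse equals $1/(\mu\sigma_{11}-O(\mu^3))$, which is why it behaves like $1/(\mu\sigma_{11})$. The constants below are illustrative assumptions only.

```python
import numpy as np

def toy_I_minus_gamma(mu, sigma11=1.0, C=10.0, e0=0.5, f0=0.2, h0=0.1, j0=0.6):
    """Hypothetical I - Gamma(mu) with the entrywise orders shown in (143)."""
    return np.array([
        [mu * sigma11, -C * mu,                -C * mu],
        [-C * mu**2,   1.0 - e0 - C * mu**2,   -(f0 + C * mu**2)],
        [-C * mu**2,   -(h0 + C * mu**2),      1.0 - j0 - C * mu**2],
    ])

for mu in (1e-2, 1e-3, 1e-4):
    inv = np.linalg.inv(toy_I_minus_gamma(mu))
    print(f"mu={mu:g}",
          "mu*sigma11*inv[0,0] =", round(mu * 1.0 * inv[0, 0], 4),        # tends to 1
          "max|inv[1:,0]|/mu =", round(np.abs(inv[1:, 0]).max() / mu, 2),  # O(mu) column
          "max|inv[:,1:]| =", round(np.abs(inv[:, 1:]).max(), 2))          # O(1) entries
```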

Now, substituting (125), (126), (127), and (144) into (134), we arrive at:

$$
\limsup_{i\rightarrow\infty}\begin{bmatrix}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i}\|^{2}\\ \mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i}\|^{2}\\ \mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}\end{bmatrix}\preceq\begin{bmatrix}\mu v_{1}^{2}\dfrac{\overline{\sigma}^{2}_{s}}{\sigma_{11}}+O(\mu^{2})+\overline{\sigma}^{2}_{c}\,O(1)\\ O(\mu^{2})+\overline{\sigma}^{2}_{c}\,O(1)\\ O(\mu^{2})+\overline{\sigma}^{2}_{c}\,O(1)\end{bmatrix}.
\tag{151}
$$

By noting that:

$$
\begin{aligned}
\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}&=\limsup_{i\rightarrow\infty}\mathbb{E}\|\mathcal{V}_{\epsilon}(\mathcal{V}_{\epsilon}^{-1}\widetilde{\boldsymbol{\phi}}^{z}_{i})\|^{2}\\
&\leq\limsup_{i\rightarrow\infty}\|\mathcal{V}_{\epsilon}\|^{2}\left[\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i}\|^{2}+\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i}\|^{2}\right]\\
&\overset{(151)}{\leq}\kappa\mu+O(\mu^{2})+\overline{\sigma}^{2}_{c}\,O(1),
\end{aligned}
\tag{152}
$$

we can finally conclude that (74) holds. Result (75) follows from (152) by replacing $\overline{\sigma}^{2}_{c}\,O(1)$ with $O(\mu^{1+\varepsilon})$.
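Spelled out, the replacement used for (75) reads as follows, assuming $\overline{\sigma}^{2}_{c}\,O(1)=O(\mu^{1+\varepsilon})$ for some $\varepsilon>0$ (for $\varepsilon\geq 1$ the last term is simply absorbed into $O(\mu^{2})$):

$$
\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}\leq\kappa\mu+O(\mu^{2})+O(\mu^{1+\varepsilon})=\kappa\mu+O\big(\mu^{1+\min\{1,\varepsilon\}}\big).
$$

Hence, as $\mu\to0$, the bound is dominated by the $O(\mu)$ term $\kappa\mu$, which originates from the gradient-noise entry $\mu v_1^2\overline{\sigma}^2_s/\sigma_{11}$ in the first row of (151).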

To establish (76), we first show that the mean-square difference between the trajectories $\widetilde{\boldsymbol{\phi}}^{z}_{i}$ and $\widetilde{\boldsymbol{\mathcal{W}}}_{i}$ is asymptotically bounded by $O(\mu^{1+\varepsilon})$. Subtracting $\widetilde{\boldsymbol{\phi}}^{z}_{i}$ from $\widetilde{\boldsymbol{\mathcal{W}}}_{i}$, we can write:

$$
\begin{aligned}
\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}
&\overset{(33)}{=}\limsup_{i\rightarrow\infty}\mathbb{E}\|\mathcal{A}^{\prime}\widetilde{\boldsymbol{\phi}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}\\
&=\limsup_{i\rightarrow\infty}\mathbb{E}\|\mathcal{A}^{\prime}(\widetilde{\boldsymbol{\phi}}^{z}_{i}+\boldsymbol{z}_{i})-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}\\
&=\limsup_{i\rightarrow\infty}\mathbb{E}\left\|\mathcal{V}_{\epsilon}(\Lambda_{\epsilon}^{\prime}-I_{M})\mathcal{V}_{\epsilon}^{-1}\widetilde{\boldsymbol{\phi}}^{z}_{i}+\mathcal{V}_{\epsilon}\Lambda_{\epsilon}^{\prime}\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\right\|^{2}\\
&\overset{(43),(44)}{=}\limsup_{i\rightarrow\infty}\mathbb{E}\left\|\mathcal{V}_{R,\epsilon}(\mathcal{J}^{\prime}_{\epsilon}-I)\widecheck{\boldsymbol{\phi}}^{z}_{i}+\mathcal{V}_{\epsilon}\Lambda_{\epsilon}^{\prime}\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\right\|^{2}\\
&\leq\limsup_{i\rightarrow\infty}\left[2\|\mathcal{V}_{R,\epsilon}(\mathcal{J}^{\prime}_{\epsilon}-I)\|^{2}\,\mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i}\|^{2}+2\|\mathcal{V}_{\epsilon}\Lambda^{\prime}_{\epsilon}\|^{2}\,\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}\right].
\end{aligned}
\tag{153}
$$

Now, using (151) with $\overline{\sigma}^{2}_{c}O(1)$ replaced by $O(\mu^{1+\varepsilon})$, we can conclude that:

$$
\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}=O(\mu^{1+\varepsilon}).\tag{154}
$$
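The two algebraic facts behind (153) can be checked on a toy example: the identity $\mathcal{A}'(\widetilde{\boldsymbol{\phi}}^{z}_{i}+\boldsymbol{z}_{i})-\widetilde{\boldsymbol{\phi}}^{z}_{i}=\mathcal{V}_{R,\epsilon}(\mathcal{J}'_{\epsilon}-I)\widecheck{\boldsymbol{\phi}}^{z}_{i}+\mathcal{V}_{\epsilon}\Lambda'_{\epsilon}\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}$, and the bound $\|\boldsymbol{a}+\boldsymbol{b}\|^2\leq 2\|\boldsymbol{a}\|^2+2\|\boldsymbol{b}\|^2$ used in the last step (before the norms are split by submultiplicativity). The sketch below is a minimal illustration under loudly stated assumptions: the $2\times 2$ matrix `V`, the scalar `LAM` standing in for the Jordan block $\mathcal{J}'_{\epsilon}$ (so the diagonalizable case only), and the helper names are all hypothetical and not taken from the paper.

```python
import random

# Hypothetical 2x2 stand-ins (not from the paper): column 0 of V spans the
# direction with eigenvalue 1, column 1 plays the role of V_{R,eps}, and the
# Jordan block J'_eps is replaced by a single scalar LAM < 1.
V = [[1.0, 1.0],
     [0.5, -1.0]]
LAM = 0.7
EIGS = [1.0, LAM]  # diagonal of Lambda'_eps


def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]


def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def sqnorm(x):
    return sum(t * t for t in x)


V_INV = inv2(V)
# A' = V Lambda' V^{-1}
A = [[sum(V[r][k] * EIGS[k] * V_INV[k][c] for k in range(2)) for c in range(2)]
     for r in range(2)]


def split_error(phi_z, z):
    """Return A'(phi_z + z) - phi_z, its split form from (153), and the 2a^2 + 2b^2 bound."""
    direct = [u - p for u, p in zip(matvec(A, [p + q for p, q in zip(phi_z, z)]), phi_z)]
    phi_check = matvec(V_INV, phi_z)[1]                             # component outside the subspace
    first = [V[r][1] * (LAM - 1.0) * phi_check for r in range(2)]   # V_R (J' - I) phi_check
    second = matvec(A, z)                                           # V Lambda' V^{-1} z
    split = [f + s for f, s in zip(first, second)]
    return direct, split, 2.0 * sqnorm(first) + 2.0 * sqnorm(second)


if __name__ == "__main__":
    random.seed(0)
    phi_z = [random.gauss(0, 1) for _ in range(2)]
    z = [random.gauss(0, 0.1) for _ in range(2)]
    direct, split, bound = split_error(phi_z, z)
    print(direct, split, sqnorm(direct) <= bound)
```

The subspace component $\overline{\boldsymbol{\phi}}^{z}_{i}$ drops out because the corresponding eigenvalue of $\mathcal{A}'$ equals one, which is why only $\widecheck{\boldsymbol{\phi}}^{z}_{i}$ survives in the fourth line of (153).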

Finally, note that:

$$
\begin{aligned}
\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}\|^{2}
&=\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}+\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}\\
&\leq\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}+\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}+2\left|\mathbb{E}(\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i})^{\top}\widetilde{\boldsymbol{\phi}}^{z}_{i}\right|\\
&\overset{\text{(a)}}{\leq}\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}+\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}+2\sqrt{\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}\,\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}},
\end{aligned}
\tag{155}
$$

and, hence, from the sub-additivity property of the limit superior, and from (75) and (154), we get:

$$
\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}\|^{2}\leq\limsup_{i\rightarrow\infty}\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}+O(\mu^{1+\frac{\varepsilon}{2}}),\tag{156}
$$

which establishes (76). In step (a), we used $|\mathbb{E}\,\boldsymbol{x}^{\top}\boldsymbol{y}|\leq\mathbb{E}|\boldsymbol{x}^{\top}\boldsymbol{y}|$ (Jensen's inequality), the Cauchy-Schwarz inequality $|\boldsymbol{x}^{\top}\boldsymbol{y}|\leq\|\boldsymbol{x}\|\|\boldsymbol{y}\|$, and Hölder's inequality, namely, $\mathbb{E}(\|\boldsymbol{x}\|\|\boldsymbol{y}\|)\leq(\mathbb{E}\|\boldsymbol{x}\|^{p})^{\frac{1}{p}}(\mathbb{E}\|\boldsymbol{y}\|^{q})^{\frac{1}{q}}$ for $1/p+1/q=1$, with $p=q=2$.
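The passage from (155) to (156) rests on two ingredients: step (a), whose sample-average analogue holds exactly by Cauchy-Schwarz, and the order count $\sqrt{O(\mu^{1+\varepsilon})\cdot O(\mu)}=O(\mu^{1+\varepsilon/2})$. The sketch below checks both numerically. It assumes, for illustration only, that $\mathbb{E}\|\widetilde{\boldsymbol{\mathcal{W}}}_{i}-\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}=c_1\mu^{1+\varepsilon}$ and $\mathbb{E}\|\widetilde{\boldsymbol{\phi}}^{z}_{i}\|^{2}=c_2\mu$ with hypothetical constants $c_1,c_2$; the function names are likewise hypothetical.

```python
import math
import random


def cross_term_holds(xs, ys):
    """Sample-average version of step (a) in (155):
    |mean(x^T y)| <= sqrt(mean ||x||^2 * mean ||y||^2)."""
    n = len(xs)
    inner = sum(sum(a * b for a, b in zip(x, y)) for x, y in zip(xs, ys)) / n
    ex2 = sum(sum(a * a for a in x) for x in xs) / n
    ey2 = sum(sum(b * b for b in y) for y in ys) / n
    # tiny relative slack only to absorb floating-point rounding
    return abs(inner) <= math.sqrt(ex2 * ey2) * (1.0 + 1e-12)


def excess(mu, eps, c1=1.0, c2=1.0):
    """Terms added to E||phi^z||^2 in (155), under the hypothetical model
    E||W - phi^z||^2 = c1 * mu**(1 + eps) and E||phi^z||^2 = c2 * mu."""
    diff = c1 * mu ** (1.0 + eps)
    base = c2 * mu
    return diff + 2.0 * math.sqrt(diff * base)


def loglog_slope(f, mu_a, mu_b):
    """Empirical exponent p in f(mu) ~ mu**p between two small step-sizes."""
    return math.log(f(mu_a) / f(mu_b)) / math.log(mu_a / mu_b)


if __name__ == "__main__":
    random.seed(1)
    xs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(500)]
    ys = [[v + random.gauss(0, 1) for v in x] for x in xs]
    print(cross_term_holds(xs, ys))
    print(loglog_slope(lambda m: excess(m, 0.5), 1e-10, 1e-12))  # close to 1 + 0.5/2
```

For small $\mu$ the cross term $2\sqrt{c_1c_2}\,\mu^{1+\varepsilon/2}$ dominates $c_1\mu^{1+\varepsilon}$, so the fitted exponent approaches $1+\varepsilon/2$, matching the order in (156).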

The analysis can be simplified in settings where the compression operators $\{\boldsymbol{\mathcal{C}}_{k}\}$ are such that their relative compression noise terms satisfy $\beta^{2}_{c,k}=0$, $\forall k$. In such settings, we can replace $\beta^{2}_{c,\max}$ in (113) by $0$, and substitute the resulting inequality $\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}\leq\zeta^{2}v_{1}^{2}\overline{\sigma}^{2}_{c}$ directly into (98) and (99), without the need to study the evolution of $\mathbb{E}\|\mathcal{V}_{\epsilon}^{-1}\boldsymbol{z}_{i}\|^{2}$ as in (114).
By doing so, we find that the variances of $\overline{\boldsymbol{\phi}}^{z}_{i}$ and $\widecheck{\boldsymbol{\phi}}^{z}_{i}$ are coupled and recursively bounded as:

$$
\left[\begin{array}{c}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i}\|^{2}\\ \mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i}\|^{2}\end{array}\right]\preceq\left[\begin{array}{cc}a&b\\ d&e\end{array}\right]\left[\begin{array}{c}\mathbb{E}\|\overline{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\\ \mathbb{E}\|\widecheck{\boldsymbol{\phi}}^{z}_{i-1}\|^{2}\end{array}\right]+\left[\begin{array}{c}l^{\prime}\\ m^{\prime}\end{array}\right],\tag{157}
$$

where $l^{\prime}=l+\overline{\sigma}^{2}_{c}O(\mu)$ and $m^{\prime}=m+\overline{\sigma}^{2}_{c}O(1)$. By setting $\gamma=\zeta=1$ (so that $\mathcal{J}^{\prime\prime}_{\epsilon}=\mathcal{J}_{\epsilon}$), and by using arguments similar to those in (134)–(152), we can establish the mean-square-error stability for a sufficiently small step-size $\mu$ and obtain the performance result (74).
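A coupled recursion of the form (157) can be explored numerically. Because the coefficient matrix has nonnegative entries, iterating (157) with equality from the same initial condition produces an upper bound on any sequence satisfying the inequality, and when its spectral radius is below one the iterates converge to $(I-\Gamma)^{-1}[l',m']^{\top}$, where $\Gamma$ denotes the $2\times 2$ coefficient matrix. The sketch below illustrates this; the numerical values of $a,b,d,e,l',m'$ are hypothetical, chosen only so that the spectral radius is below one, and are not the paper's expressions.

```python
import math


def spectral_radius_2x2(a, b, d, e):
    """Spectral radius of [[a, b], [d, e]] for nonnegative entries.
    The discriminant (a - e)^2 + 4bd is then nonnegative, so both eigenvalues are real."""
    tr = a + e
    root = math.sqrt((a - e) ** 2 + 4.0 * b * d)
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))


def iterate_bound(a, b, d, e, lp, mp, x0, iters):
    """Run (157) with equality. Because the matrix is entrywise nonnegative,
    this sequence upper bounds any sequence satisfying (157) from the same start."""
    x1, x2 = x0
    for _ in range(iters):
        x1, x2 = a * x1 + b * x2 + lp, d * x1 + e * x2 + mp
    return x1, x2


def steady_state(a, b, d, e, lp, mp):
    """Fixed point (I - Gamma)^{-1} [l', m'] of the recursion (valid when rho < 1)."""
    det = (1.0 - a) * (1.0 - e) - b * d
    return ((1.0 - e) * lp + b * mp) / det, (d * lp + (1.0 - a) * mp) / det


if __name__ == "__main__":
    # Hypothetical coefficients, not the paper's expressions.
    coeffs = (0.995, 0.002, 0.001, 0.6)
    drive = (1e-4, 1e-3)
    print(spectral_radius_2x2(*coeffs))
    print(iterate_bound(*coeffs, *drive, (1.0, 1.0), 20000))
    print(steady_state(*coeffs, *drive))
```

The closed-form fixed point avoids iterating altogether; the iteration is kept to show that the bound sequence settles regardless of the initial condition.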

Appendix C Bit Rate stability

Equation (80) follows from Lemma 1 and Table II (row 3, columns 3 and 4). Invoking arguments similar to those used to establish Theorem 2 in [26], the individual summands in (78) can be upper bounded by:

$$
2+\log_{2}\left(\frac{\ln\left(1+\dfrac{\omega}{\eta}\sqrt{\mathbb{E}\|\boldsymbol{\chi}_{k,i}\|^{2}}\right)}{2\ln\left(\omega+\sqrt{1+\omega^{2}}\right)}+2\right).\tag{158}
$$

By taking the limit superior of (112) as $i\rightarrow\infty$ and by using (151) and (80), we obtain:

$$
\limsup_{i\rightarrow\infty}\mathbb{E}\|\boldsymbol{\chi}_{k,i}\|^{2}\leq\kappa_{1}\mu^{1+\varepsilon}+\kappa_{2}\mu^{2},\tag{159}
$$

for sufficiently small $\mu$, where $\kappa_{1}$ and $\kappa_{2}$ are positive constants independent of $\mu$. Applying the limit superior to (158), using the fact that (158) is a continuous and increasing function of $\mathbb{E}\|\boldsymbol{\chi}_{k,i}\|^{2}$, and using (159) in view of (80), we find that each summand in (78) is $O(1)$, which in turn implies (81).
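To make the continuity and monotonicity argument concrete, the sketch below evaluates the right-hand side of (158) at the bound (159) for a decreasing sequence of step-sizes. It is an illustration under hypothetical assumptions only: the values of $\omega$, $\eta$, $\varepsilon$, $\kappa_1$, $\kappa_2$ are made up, and $\omega$ and $\eta$ are held fixed as $\mu$ shrinks, which need not match how (80) specifies them. Under these assumptions the per-summand bound decreases monotonically with $\mu$ and tends to $2+\log_2 2=3$, so it remains $O(1)$.

```python
import math


def summand_bound(msd_chi, omega, eta):
    """Right-hand side of (158) as a function of msd_chi = E||chi_{k,i}||^2.
    Note that ln(omega + sqrt(1 + omega^2)) = asinh(omega)."""
    num = math.log(1.0 + (omega / eta) * math.sqrt(msd_chi))
    den = 2.0 * math.log(omega + math.sqrt(1.0 + omega ** 2))
    return 2.0 + math.log2(num / den + 2.0)


def chi_bound(mu, eps, kappa1, kappa2):
    """Right-hand side of (159): kappa1 * mu^(1+eps) + kappa2 * mu^2."""
    return kappa1 * mu ** (1.0 + eps) + kappa2 * mu ** 2


if __name__ == "__main__":
    # Hypothetical values; omega and eta are held fixed as mu shrinks.
    omega, eta, eps, k1, k2 = 1.0, 0.1, 0.5, 1.0, 1.0
    for mu in (1e-1, 1e-2, 1e-4, 1e-6):
        print(mu, summand_bound(chi_bound(mu, eps, k1, k2), omega, eta))
    print(summand_bound(0.0, omega, eta))  # evaluates to 3.0, the mu -> 0 limit here
```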

References

  • [1] R. Nassif, S. Vlaski, M. Carpentiero, V. Matta, and A. H. Sayed, “Differential error feedback for communication-efficient decentralized optimization,” in Proc. IEEE Sens. Array Multichannel Signal Process. Workshop, Corvallis, OR, USA, Jul. 2024.
  • [2] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. Int. Conf. Artif. Intell. Stat., Ft. Lauderdale, FL, USA, 2017, vol. 54, pp. 1273–1282.
  • [3] T. Li, A. K. Sahu, A. S. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Mag., vol. 37, pp. 50–60, May 2020.
  • [4] D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, “QSGD: Communication-efficient SGD via gradient quantization and encoding,” in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 1709–1720.
  • [5] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,” in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, Dec. 2017, vol. 30.
  • [6] A. H. Sayed, “Adaptation, learning, and optimization over networks,” Found. Trends Mach. Learn., vol. 7, no. 4-5, pp. 311–801, 2014.
  • [7] A. H. Sayed, “Adaptive networks,” Proc. IEEE, vol. 102, no. 4, pp. 460–497, 2014.
  • [8] A. H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, and Z. Towfic, “Diffusion strategies for adaptation and learning over networks,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 155–171, May 2013.
  • [9] R. Nassif, S. Vlaski, C. Richard, J. Chen, and A. H. Sayed, “Multitask learning over graphs: An approach for distributed, streaming machine learning,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 14–25, 2020.
  • [10] D. Kovalev, A. Koloskova, M. Jaggi, P. Richtarik, and S. Stich, “A linearly convergent algorithm for decentralized optimization: Sending less bits for free!,” in Proc. Int. Conf. Artif. Intell. Stat., Virtual, 2021, pp. 4087–4095.
  • [11] A. Koloskova, S. Stich, and M. Jaggi, “Decentralized stochastic optimization and gossip algorithms with compressed communication,” in Proc. Int. Conf. Mach. Learn., 2019, vol. 97, pp. 3478–3487.
  • [12] R. Nassif, S. Vlaski, and A. H. Sayed, “Adaptation and learning over networks under subspace constraints–Part II: Performance analysis,” IEEE Trans. Signal Process., vol. 68, pp. 2948–2962, 2020.
  • [13] X. Cao, T. Başar, S. Diggavi, Y. C. Eldar, K. B. Letaief, H. V. Poor, and J. Zhang, “Communication-efficient distributed learning: An overview,” IEEE J. Sel. Areas Commun., vol. 41, no. 4, pp. 851–873, 2023.
  • [14] A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. Stich, “A unified theory of decentralized SGD with changing topology and local updates,” in Proc. Int. Conf. Mach. Learn., Virtual, Jul. 2020, vol. 119, pp. 5381–5393.
  • [15] Y. Liu, T. Lin, A. Koloskova, and S. U. Stich, “Decentralized gradient tracking with local steps,” Available as arXiv:2301.01313v1, 2023.
  • [16] T. C. Aysal, M. J. Coates, and M. G. Rabbat, “Distributed average consensus with dithered quantization,” IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4905–4918, 2008.
  • [17] A. Beznosikov, S. Horvath, P. Richtárik, and M. H. Safaryan, “On biased compression for distributed learning,” J. Mach. Learn. Res., vol. 24, no. 276, pp. 1–50, 2023.
  • [18] A. Nedic, A. Olshevsky, A. Ozdaglar, and J. N. Tsitsiklis, “Distributed subgradient methods and quantization effects,” in Proc. IEEE Conf. Decis. Control, Cancun, Mexico, Dec. 2008, pp. 4177–4184.
  • [19] X. Zhao, S.-Y. Tu, and A. H. Sayed, “Diffusion adaptation over networks under imperfect information exchange and non-stationary data,” IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3460–3475, 2012.
  • [20] D. Thanou, E. Kokiopoulou, Y. Pu, and P. Frossard, “Distributed average consensus with quantization refinement,” IEEE Trans. Signal Process., vol. 61, no. 1, pp. 194–205, 2013.
  • [21] M. Carpentiero, V. Matta, and A. H. Sayed, “Distributed adaptive learning under communication constraints,” IEEE Open J. Signal Process., vol. 5, pp. 321–358, 2024.
  • [22] M. Carpentiero, V. Matta, and A. H. Sayed, “Compressed regression over adaptive networks,” To appear in IEEE Trans. Signal Inf. Process. Netw. Available as arXiv:2304.03638v1, 2024.
  • [23] A. Reisizadeh, A. Mokhtari, H. Hassani, and R. Pedarsani, “An exact quantized decentralized gradient descent algorithm,” IEEE Trans. Signal Process., vol. 67, no. 19, pp. 4934–4947, 2019.
  • [24] H. Taheri, A. Mokhtari, H. Hassani, and R. Pedarsani, “Quantized decentralized stochastic learning over directed graphs,” in Proc. Int. Conf. Mach. Learn., Jul. 2020, vol. 119, pp. 9324–9333.
  • [25] N. Michelusi, G. Scutari, and C.-S. Lee, “Finite-bit quantization for distributed algorithms with linear convergence,” IEEE Trans. Inf. Theory, vol. 68, no. 11, pp. 7254–7280, 2022.
  • [26] R. Nassif, S. Vlaski, M. Carpentiero, V. Matta, M. Antonini, and A. H. Sayed, “Quantization for decentralized learning under subspace constraints,” IEEE Trans. Signal Process., vol. 71, pp. 2320–2335, 2023.
  • [27] H. Zhao, B. Li, Z. Li, P. Richtarik, and Y. Chi, “BEER: Fast O(1/T) rate for decentralized nonconvex optimization with communication compression,” in Proc. Adv. Neural Inf. Process. Syst., New Orleans, Louisiana, USA, 2022, vol. 35, pp. 31653–31667.
  • [28] N. Singh, X. Cao, S. Diggavi, and T. Başar, “Decentralized multi-task stochastic optimization with compressed communications,” Automatica, vol. 159, pp. 111363, 2024.
  • [29] N. Singh, D. Data, J. George, and S. Diggavi, “SPARQ-SGD: Event-triggered and compressed communication in decentralized optimization,” IEEE Trans. Automat. Contr., vol. 68, no. 2, pp. 721–736, 2023.
  • [30] N. Singh, D. Data, J. George, and S. Diggavi, “SQuARM-SGD: Communication-efficient momentum SGD for decentralized optimization,” in Proc. IEEE Int. Symp. Inf. Theory, Melbourne, Victoria, Australia, 2021, pp. 1212–1217.
  • [31] H. Tang, S. Gan, C. Zhang, T. Zhang, and J. Liu, “Communication compression for decentralized training,” in Proc. Adv. Neural Inf. Process. Syst., Montreal, Canada, Dec. 2018, vol. 31, pp. 7663–7673.
  • [32] S. P. Karimireddy, Q. Rebjock, S. Stich, and M. Jaggi, “Error feedback fixes SignSGD and other gradient compression schemes,” in Proc. Int. Conf. Mach. Learn., Long Beach, CA, USA, Jun. 2019, vol. 97, pp. 3252–3261.
  • [33] H. Tang, X. Lian, S. Qiu, L. Yuan, C. Zhang, T. Zhang, and J. Liu, “DeepSqueeze: Decentralization meets error-compensated compression,” Available as arXiv:1907.07346, 2019.
  • [34] A. H. Sayed, Inference and Learning from Data, 3 vols., Cambridge University Press, 2022.
  • [35] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. Automat. Contr., vol. 54, no. 1, pp. 48–61, Jan. 2009.
  • [36] D. P. Bertsekas, “A new class of incremental gradient methods for least squares problems,” SIAM J. Optim., vol. 7, no. 4, pp. 913–926, 1997.
  • [37] A. G. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione, “Gossip algorithms for distributed signal processing,” Proc. IEEE, vol. 98, no. 11, pp. 1847–1864, 2010.
  • [38] V. Kekatos and G. B. Giannakis, “Distributed robust power system state estimation,” IEEE Trans. Power Syst., vol. 28, no. 2, pp. 1617–1626, 2013.
  • [39] R. Nassif, S. Vlaski, C. Richard, and A. H. Sayed, “Learning over multitask graphs–Part I: Stability analysis,” IEEE Open J. Signal Process., vol. 1, pp. 28–45, 2020.
  • [40] R. Nassif, S. Vlaski, and A. H. Sayed, “Adaptation and learning over networks under subspace constraints–Part I: Stability analysis,” IEEE Trans. Signal Process., vol. 68, pp. 1346–1360, 2020.
  • [41] J. Plata-Chaves, A. Bertrand, M. Moonen, S. Theodoridis, and A. M. Zoubir, “Heterogeneous and multitask wireless sensor networks – Algorithms, applications, and challenges,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 3, pp. 450–465, 2017.
  • [42] P. Di Lorenzo, S. Barbarossa, and S. Sardellitti, “Distributed signal processing and optimization based on in-network subspace projections,” IEEE Trans. Signal Process., vol. 68, pp. 2061–2076, 2020.
  • [43] J. F. C. Mota, J. M. F. Xavier, P. M. Q. Aguiar, and M. Püschel, “Distributed optimization with local domains: Applications in MPC and network flows,” IEEE Trans. Automat. Contr., vol. 60, no. 7, pp. 2004–2009, 2015.
  • [44] S. A. Alghunaim and A. H. Sayed, “Distributed coupled multiagent stochastic optimization,” IEEE Trans. Automat. Contr., vol. 65, no. 1, pp. 175–190, 2020.
  • [45] S. U. Stich, J.-B. Cordonnier, and M. Jaggi, “Sparsified SGD with memory,” in Proc. Adv. Neural Inf. Process. Syst., Montréal, Canada, 2018, pp. 4452–4463.
  • [46] D. Basu, D. Data, C. Karakus, and S. Diggavi, “Qsparse-local-SGD: Distributed SGD with quantization, sparsification, and local computations,” in Proc. Adv. Neural Inf. Process. Syst., Vancouver, Canada, 2019, pp. 14695–14706.
  • [47] B. T. Polyak, Introduction to Optimization, Optimization Software, New York, 1987.
  • [48] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 2012.