License: CC BY-NC-ND 4.0
arXiv:2312.16505v1 [math.NA] 27 Dec 2023

Asynchronous iterations of HSS method for non-Hermitian linear systems

Guillaume Gbikpi-Benissan, Université Paris-Saclay, CentraleSupélec, Gif-sur-Yvette, France ([email protected])
Qinmeng Zou, Université Paris-Saclay, CentraleSupélec, Gif-sur-Yvette, France ([email protected])
Frédéric Magoulès, Université Paris-Saclay, CentraleSupélec, Gif-sur-Yvette, France (correspondence, [email protected])
Abstract

A general asynchronous alternating iterative model is designed, for which convergence is theoretically ensured both under a classical spectral radius bound and for a classical class of matrix splittings for $\mathsf{H}$-matrices. The computational model can be thought of as a two-stage alternating iterative method, which is well suited to the well-known Hermitian and skew-Hermitian splitting (HSS) approach, with the particularity here of considering only one inner iteration. An experimental parallel performance comparison is conducted between the generalized minimal residual (GMRES) algorithm, the standard HSS method and our asynchronous variant, on both real and complex non-Hermitian linear systems arising, respectively, from convection-diffusion and structural dynamics problems. A significant gain in execution time is observed in both cases.

Keywords: Asynchronous iterations; alternating iterations; Hermitian and skew-Hermitian splitting; non-Hermitian problems; parallel computing

1 Introduction

Many applications in scientific computing and engineering lead to the following system of linear equations,

$$Ax=b,\quad A\in\mathbb{C}^{n\times n},\quad b\in\mathbb{C}^{n}. \tag{1}$$

Let $A=M-N$ and $A=F-G$ be two splittings of $A$ with $M$ and $F$ being nonsingular. The alternating iterative scheme for solving (1) is defined as follows,

$$\left\{\begin{array}{lcl}Mx^{k+\frac{1}{2}}&=&Nx^{k}+b,\\ Fx^{k+1}&=&Gx^{k+\frac{1}{2}}+b,\end{array}\right. \tag{2}$$

which can be viewed as a stationary iterative scheme with iteration matrix $F^{-1}GM^{-1}N$. Well-known early examples include the symmetric successive over-relaxation (SSOR) method [43, 17] and the alternating direction implicit (ADI) methods [40, 19, 38]. In [12] the convergence of some alternating iterations was analyzed by eliminating the intermediate solution term $x^{k+1/2}$ from (2); see also [1]. Recently, there has been growing interest in studies of the Hermitian and skew-Hermitian splitting (HSS) method [5] for solving (1) when $A$ is non-Hermitian. Let $\alpha>0$ be a given constant. The HSS method can be written in the form

$$\left\{\begin{array}{lcl}(\alpha I+H)x^{k+\frac{1}{2}}&=&(\alpha I-S)x^{k}+b,\\ (\alpha I+S)x^{k+1}&=&(\alpha I-H)x^{k+\frac{1}{2}}+b,\end{array}\right. \tag{3}$$

where $H=(A+A^{\mathsf{H}})/2$ and $S=(A-A^{\mathsf{H}})/2$ are the Hermitian and skew-Hermitian parts of $A$, respectively, and $I$ is the identity matrix. Here, $A^{\mathsf{H}}$ denotes the conjugate transpose of $A$. This method can be obtained from (2) by defining

$$\begin{array}{lcl}M&:=&\alpha I+H,\\ F&:=&\alpha I+S.\end{array} \tag{4}$$

It was proved in [5] that when $H$ is positive definite, namely, $A$ is non-Hermitian positive definite, HSS converges unconditionally to the unique solution $x^{*}$ for any initial guess $x^{0}$. The linear subsystems, however, especially the one involving $\alpha I+S$, may still be difficult to solve; therefore, much attention has been devoted to inexact implementations. More precisely, the tolerances for the inner iterative solvers may be relatively relaxed, while good convergence properties can still be retained according to numerical experiments; see [5, 11, 9, 6]. The HSS iterative scheme has been generalized to other splitting methods, as well as their preconditioned variants, for handling various problems in scientific computing; see, e.g., [13, 30, 9, 3, 44, 29, 2]. There are also a number of studies on the optimal selection of $\alpha$; see [5, 4, 28, 46]. The iterative scheme (3) can be equivalently written in a residual-updating form, which achieves a higher accuracy at the cost of more computational effort; see [6] for a detailed discussion.
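As a concrete illustration of scheme (3), the following minimal sketch runs the HSS iteration with direct inner solves on a small dense problem. The function name `hss_solve`, the 3x3 test matrix, the stopping rule and the value of `alpha` (here the geometric mean of the extreme eigenvalues of $H$) are assumptions made for the example; they are not taken from the paper's experiments.

```python
import numpy as np

def hss_solve(A, b, alpha, tol=1e-10, max_iter=500):
    """Run scheme (3) with direct inner solves; returns (x, iterations)."""
    n = A.shape[0]
    I = np.eye(n)
    H = (A + A.conj().T) / 2  # Hermitian part of A
    S = (A - A.conj().T) / 2  # skew-Hermitian part of A
    x = np.zeros(n, dtype=complex)
    for k in range(max_iter):
        # First half-step: (alpha I + H) x^{k+1/2} = (alpha I - S) x^k + b
        x_half = np.linalg.solve(alpha * I + H, (alpha * I - S) @ x + b)
        # Second half-step: (alpha I + S) x^{k+1} = (alpha I - H) x^{k+1/2} + b
        x = np.linalg.solve(alpha * I + S, (alpha * I - H) @ x_half + b)
        # Illustrative stopping rule based on the relative residual
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k + 1
    return x, max_iter

# Hypothetical test problem: H = diag(4, 3, 2) is positive definite.
A = np.array([[4.0, 1.0, 0.0],
              [-1.0, 3.0, 1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = hss_solve(A, b, alpha=np.sqrt(8.0))
print(iters, np.linalg.norm(b - A @ x))
```

In a realistic setting the two inner systems would be solved inexactly by iterative solvers, as discussed above; the direct solves here only keep the sketch short.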

Parallel computing can be extremely useful when $A$ has a large dimension. In practice, the high cost of synchronization relative to that of computation is currently the major bottleneck in high-performance distributed computing systems, which motivates the redesign of parallel iterative algorithms. One of the most interesting approaches, arising from basic relaxation methods, is that of so-called asynchronous iterations [16, 15]. An asynchronous iterative scheme allows a full overlapping of communication and computation. Every process has the flexibility to work at its own pace without waiting for data acquisition. A major difference between synchronous and asynchronous iterations lies in their predictability properties. The former produce a deterministic sequence of iterations, while the latter allow nondeterministic behavior. In [16] the first convergence result was established for the solution of linear systems, which was followed by the investigation of general fixed-point iterative models; see [39, 7, 21, 14]. In recent years, with the advent of very high-performance computing environments, asynchronous iterative schemes have gained much popularity. The study of asynchronous domain decomposition methods, in both time and space domains, has become an increasingly active area of research; see, e.g., [36, 35, 37, 32, 45, 20]. Another area that has seen growth in the last decades is asynchronous convergence detection; see [33, 26] and the references therein.

In this paper we focus on the asynchronous formulation of alternating iterations. In Section 2, we recall some general tools and the asynchronous iterations theory used for the formulation and the convergence analysis of our asynchronous alternating scheme. Section 3 presents the main contribution, where we formulate our asynchronous alternating scheme and give sufficient conditions for its convergence. Section 5 is devoted to numerical experiments on a parallel computing platform, featuring both a real three-dimensional convection-diffusion problem and a complex two-dimensional structural dynamics problem. Finally, Section 6 gives our conclusions.

2 Generalities

2.1 $\mathsf{H}$-matrix and $\mathsf{H}$-splitting

In a general manner, let $\mathcal{A}_{i,j}$ denote the entry of a matrix $\mathcal{A}$ on its $i$-th row and $j$-th column, and let $x_{i}$ denote the $i$-th entry of a vector $x$. Comparisons $<$, $\leq$, $>$, $\geq$ and $=$ between two matrices or vectors (of the same shape) are entrywise. The absolute value (or modulus) $|\mathcal{A}|$ of a matrix or a vector $\mathcal{A}$ is entrywise. The spectral radius of a matrix $\mathcal{A}$ is denoted by $\rho(\mathcal{A})$. In expressions like $\mathcal{A}<0$ and $x<0$, with $\mathcal{A}$ and $x$ being a matrix and a vector, respectively, $0$ indicates a matrix and a vector, respectively, with all entries equal to $0$. $I$ stands for the identity matrix.

We now recall a few general tools used later for the convergence analysis of the proposed asynchronous iterative method.

Definition 1.

A square matrix $\mathcal{A}$ is an $\mathsf{M}$-matrix if and only if

$$\exists\ \alpha\in\mathbb{R}:\quad\alpha I-\mathcal{A}\geq 0,\quad\alpha>\rho(\alpha I-\mathcal{A}).$$
Definition 2.

The comparison matrix $\langle\mathcal{A}\rangle$ of a matrix $\mathcal{A}$ is defined as

$$\langle\mathcal{A}\rangle_{i,i}:=|\mathcal{A}_{i,i}|,\qquad\langle\mathcal{A}\rangle_{i,j}:=-|\mathcal{A}_{i,j}|,\quad i\neq j.$$
Definition 3.

A square matrix $\mathcal{A}$ is an $\mathsf{H}$-matrix if and only if its comparison matrix $\langle\mathcal{A}\rangle$ is an $\mathsf{M}$-matrix.

Lemma 1.

A square matrix $\mathcal{A}$ is an $\mathsf{H}$-matrix if and only if

$$\exists\ u>0:\quad\forall i,\ |\mathcal{A}_{i,i}|u_{i}>\sum_{j\neq i}|\mathcal{A}_{i,j}|u_{j}.$$
Proof.

This is directly implied by Theorem 5’ in [22]. ∎
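Definitions 1 to 3 and Lemma 1 can be checked numerically on small dense matrices. The sketch below is only illustrative: the helper names (`comparison_matrix`, `is_m_matrix`, `is_h_matrix`) and the test matrices are assumptions of this example, and the $\mathsf{M}$-matrix test fixes $\alpha$ to the largest diagonal entry, which is the smallest value for which $\alpha I-\mathcal{A}\geq 0$ can hold when the off-diagonal entries are nonpositive.

```python
import numpy as np

def comparison_matrix(A):
    """<A>: |A_ii| on the diagonal, -|A_ij| off the diagonal (Definition 2)."""
    A = np.asarray(A)
    C = -np.abs(A)
    np.fill_diagonal(C, np.abs(np.diag(A)))
    return C

def spectral_radius(B):
    return np.max(np.abs(np.linalg.eigvals(B)))

def is_m_matrix(A):
    """Definition 1, tested with alpha = max_i A_ii (illustrative choice)."""
    A = np.asarray(A, dtype=float)
    off_diagonal = A - np.diag(np.diag(A))
    if np.any(off_diagonal > 0):
        return False  # alpha*I - A >= 0 is impossible for any alpha
    alpha = np.max(np.diag(A))
    B = alpha * np.eye(A.shape[0]) - A  # entrywise nonnegative
    return bool(alpha > spectral_radius(B))

def is_h_matrix(A):
    """Definition 3: A is an H-matrix iff <A> is an M-matrix."""
    return is_m_matrix(comparison_matrix(A))

# Hypothetical examples: A1 is strictly diagonally dominant, A2 is not an H-matrix.
A1 = np.array([[4, -1, 1j],
               [2, 5, -1],
               [0, 1, 3]])
A2 = np.array([[1.0, 2.0],
               [3.0, 1.0]])
print(is_h_matrix(A1), is_h_matrix(A2))

# Lemma 1 with u = (1, 1, 1): strict diagonal dominance of A1.
u = np.ones(3)
diag_part = np.abs(np.diag(A1)) * u
lemma1_holds = bool(np.all(diag_part > np.abs(A1) @ u - diag_part))
```

For `A1`, the vector $u$ of all ones already satisfies the inequality of Lemma 1; for a general $\mathsf{H}$-matrix a non-uniform $u$ may be needed.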

A splitting $\mathcal{A}=\mathcal{M}-\mathcal{N}$ of a matrix $\mathcal{A}$ consists of identifying a nonsingular matrix $\mathcal{M}$ and the resulting matrix $\mathcal{N}=\mathcal{M}-\mathcal{A}$, so as to define a relaxation operator $\mathcal{M}^{-1}\mathcal{N}=I-\mathcal{M}^{-1}\mathcal{A}.$

Definition 4.

A splitting $\mathcal{A}=\mathcal{M}-\mathcal{N}$ is an $\mathsf{H}$-splitting if and only if $\langle\mathcal{M}\rangle-|\mathcal{N}|$ is an $\mathsf{M}$-matrix.

Lemma 2.

Let $\mathcal{A}=\mathcal{M}-\mathcal{N}$ be an $\mathsf{H}$-splitting. Then, we have $\rho(|I-\mathcal{M}^{-1}\mathcal{A}|)<1.$

Proof.

This directly follows from Proof of Theorem 3.4 (c) in [23]. ∎
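To make Definition 4 and Lemma 2 concrete, the following sketch builds a Gauss-Seidel-type splitting (an illustrative choice, not one prescribed by the paper) of a small hypothetical matrix, tests whether $\langle\mathcal{M}\rangle-|\mathcal{N}|$ satisfies Definition 1, and evaluates $\rho(|I-\mathcal{M}^{-1}\mathcal{A}|)$. All variable names are assumptions of this example.

```python
import numpy as np

def spectral_radius(B):
    return np.max(np.abs(np.linalg.eigvals(B)))

# Hypothetical strictly diagonally dominant matrix.
A = np.array([[4.0, -1.0, 1.0],
              [2.0, 5.0, -1.0],
              [0.0, 1.0, 3.0]])
M = np.tril(A)  # lower triangle including the diagonal
N = M - A

# <M> - |N|: its off-diagonal entries are nonpositive by construction,
# so Definition 1 can be tested with alpha = largest diagonal entry.
comp_M = -np.abs(M)
np.fill_diagonal(comp_M, np.abs(np.diag(M)))
Z = comp_M - np.abs(N)
alpha = np.max(np.diag(Z))
is_h_splitting = bool(alpha > spectral_radius(alpha * np.eye(3) - Z))

# Lemma 2: an H-splitting gives rho(|I - M^{-1} A|) < 1.
T = np.eye(3) - np.linalg.solve(M, A)
rho_abs = spectral_radius(np.abs(T))
print(is_h_splitting, rho_abs)
```

For this particular splitting, $\langle\mathcal{M}\rangle-|\mathcal{N}|$ coincides with $\langle\mathcal{A}\rangle$, so the splitting is an $\mathsf{H}$-splitting exactly when $\mathcal{A}$ is an $\mathsf{H}$-matrix.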

Lemma 3 (refer to, e.g., Corollary 6.1 in [15]).

Let $\mathcal{A}$ be a square matrix. Then, we have

$$\rho(\left|\mathcal{A}\right|)<1\quad\iff\quad\exists\ w>0:\ \left\|\mathcal{A}\right\|_{\infty}^{w}<1,\qquad\quad\|\mathcal{A}\|_{\infty}^{w}:=\max_{i}\frac{1}{w_{i}}\sum_{j}|\mathcal{A}_{i,j}|w_{j}.$$
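One way to exhibit a weight vector as in Lemma 3 is sketched below. It relies on a standard construction that is not taken from the paper: when $\rho(|\mathcal{A}|)<1$, the vector $w=(I-|\mathcal{A}|)^{-1}e$, with $e$ the vector of all ones, is positive and satisfies $|\mathcal{A}|w=w-e<w$. The matrix `T` and the function name `weighted_max_norm` are hypothetical.

```python
import numpy as np

def weighted_max_norm(T, w):
    """||T||_inf^w = max_i (1 / w_i) * sum_j |T_ij| * w_j."""
    return np.max((np.abs(T) @ w) / w)

# Hypothetical matrix whose plain max norm exceeds 1 although rho(|T|) < 1.
T = np.array([[0.5, -2.0],
              [0.05j, 0.5]])
abs_T = np.abs(T)
rho_abs = np.max(np.abs(np.linalg.eigvals(abs_T)))

# Assumed construction: w = (I - |T|)^{-1} e, so that |T| w = w - e < w.
w = np.linalg.solve(np.eye(2) - abs_T, np.ones(2))

print(rho_abs)                           # spectral radius of |T|
print(weighted_max_norm(T, np.ones(2)))  # unweighted max norm
print(weighted_max_norm(T, w))           # weighted max norm
```

The example shows that the unweighted max norm can be larger than 1 while a suitably weighted one is below 1, which is the situation Lemma 3 covers.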

2.2 Asynchronous iterations

Consider, again, the linear system (1), a splitting $A=M-N$ of the matrix $A$ and the resulting iterative scheme

$$x^{k+1}=\left(I-M^{-1}A\right)x^{k}+M^{-1}b=x^{k}+M^{-1}\left(b-Ax^{k}\right).$$

Assume a distribution

$$A=\begin{bmatrix}A^{(1)}\\ A^{(2)}\\ \vdots\\ A^{(m)}\end{bmatrix},\ \ b=\begin{bmatrix}b^{(1)}\\ b^{(2)}\\ \vdots\\ b^{(m)}\end{bmatrix},\ \ M=\begin{bmatrix}M^{(1)}&0&\cdots&0\\ 0&M^{(2)}&\ddots&\vdots\\ \vdots&\ddots&\ddots&0\\ 0&\cdots&0&M^{(m)}\end{bmatrix}$$

of both the system and the splitting of $A$. Note that the problem (1) can also correspond to an augmented system resulting from a domain decomposition with overlapping subdomains, i.e., some rows in a submatrix $A^{(s_{1})}$ are possibly replicated in another submatrix $A^{(s_{2})}$, $s_{1},s_{2}\in\{1,\ldots,m\}$. A classical parallel relaxation is then given by

$$\begin{aligned}x^{(s),k+1}&=x^{(s),k}+{M^{(s)}}^{-1}\left(b^{(s)}-A^{(s)}\begin{bmatrix}x^{(1),k}&\cdots&x^{(m),k}\end{bmatrix}^{\mathsf{T}}\right)&&\forall s\in\{1,\ldots,m\},\\ &=x^{(s),k}+{M^{(s)}}^{-1}\left(b^{(s)}-\sum_{q=1}^{m}A^{(s,q)}x^{(q),k}\right)&&\forall s\in\{1,\ldots,m\}\end{aligned}$$

with $A^{(s)}=\begin{bmatrix}A^{(s,1)}&\cdots&A^{(s,m)}\end{bmatrix}.$ The first feature of asynchronous iterations is free steering (see, e.g., [42]), where, at each iteration $k$, a random subset $\Omega_{k}\subset\{1,\ldots,m\}$ of block-components can be updated. It is convenient to state a natural assumption,

$$\operatorname{card}\left\{k\in\mathbb{N}:s\in\Omega_{k}\right\}=\infty\qquad\forall s\in\{1,\ldots,m\},$$

which is ensured in practice by the fact that no block-component stops being updated until convergence is globally reached. The second feature consists of modeling communication delays, implying that at an iteration $k+1$, a block-component $s_{1}\in\Omega_{k}$ is possibly updated using a block-component $s_{2}\in\{1,\ldots,m\}$ computed at a random previous iteration $\delta_{s_{1}}(s_{2},k)\leq k$. It yields the parallel iterative scheme

$$x^{(s),k+1}=\left\{\begin{array}{ll}x^{(s),\delta_{s}(s,k)}+{M^{(s)}}^{-1}\left(b^{(s)}-\displaystyle\sum_{q=1}^{m}A^{(s,q)}x^{(q),\delta_{s}(q,k)}\right)&\forall s\in\Omega_{k},\\ x^{(s),k}&\forall s\notin\Omega_{k},\end{array}\right. \tag{5}$$

where, as well, another natural assumption is made, stating that

$$\lim_{k\to\infty}\delta_{s_{1}}(s_{2},k)=\infty\qquad\forall s_{1},s_{2}\in\{1,\ldots,m\}.$$
Theorem 5 (Chazan and Miranker (1969) [16]).

An asynchronous iterative method (5) converges from any initial guess $x^{0}$, with any sequence $\{\Omega_{k}\}_{k\in\mathbb{N}}$ and any functions $\delta_{1}$ to $\delta_{m}$ if and only if $\rho(|I-M^{-1}A|)<1.$
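The model (5) can be simulated sequentially to observe the behavior predicted by Theorem 5. The sketch below makes several assumptions that are not in the paper: scalar block-components ($m=n$), the Jacobi choice $M=\operatorname{diag}(A)$, a subset $\Omega_{k}$ drawn by updating each component with probability 0.5, and delays drawn uniformly and bounded by `max_delay`, so that both natural assumptions above hold. The function name `async_jacobi` and the test matrix are hypothetical.

```python
import numpy as np

def async_jacobi(A, b, n_steps=2000, max_delay=5, seed=0):
    """Sequential simulation of model (5) with scalar blocks and M = diag(A)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    d = np.diag(A)
    history = [np.zeros(n)]  # history[k] stores the iterate x^k
    for k in range(n_steps):
        x_new = history[-1].copy()  # components outside Omega_k are kept
        omega = np.flatnonzero(rng.random(n) < 0.5)  # random subset Omega_k
        for s in omega:
            # delta_s(q, k) <= k: possibly outdated version of each component q
            delays = rng.integers(0, max_delay + 1, size=n)
            delta = np.maximum(k - delays, 0)
            x_read = np.array([history[delta[q]][q] for q in range(n)])
            x_new[s] = x_read[s] + (b[s] - A[s] @ x_read) / d[s]
        history.append(x_new)
    return history[-1]

# Hypothetical strictly diagonally dominant system.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

T = np.eye(3) - A / np.diag(A)[:, None]  # I - M^{-1} A with M = diag(A)
rho_abs = np.max(np.abs(np.linalg.eigvals(np.abs(T))))
x = async_jacobi(A, b)
print(rho_abs, np.linalg.norm(b - A @ x))
```

Changing `seed` alters the update pattern and the delays, hence the sequence of iterates, but not the limit, as long as the spectral radius condition holds.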

The model (5) was later generalized by Baudet [7] to arbitrary fixed-point iterations

$$x^{(s),k+1}=\left\{\begin{array}{ll}f^{(s)}\left(x^{(1),\delta_{s,1}(1,k)},\ldots,x^{(m),\delta_{s,1}(m,k)},\right.&\\ \left.\qquad\quad\ldots,x^{(1),\delta_{s,p}(1,k)},\ldots,x^{(m),\delta_{s,p}(m,k)}\right)&\forall s\in\Omega_{k},\\ x^{(s),k}&\forall s\notin\Omega_{k},\end{array}\right. \tag{6}$$

where the update of a block-component $s\in\Omega_{k}$ at an iteration $k$ depends on $p\in\mathbb{N}$ versions, $\delta_{s,1}(q,k)$ to $\delta_{s,p}(q,k)$, of each block-component $q\in\{1,\ldots,m\}$. Let us denote by $\max(x,y)$ the vector given by

$$(\max(x,y))_{i}:=\max\{x_{i},y_{i}\}$$

with $x$ and $y$ being two vectors of the same size. Let $X:=(X_{1},\ldots,X_{p})$ and $Y:=(Y_{1},\ldots,Y_{p})$ denote collections of $p$ vectors, i.e.,

$$X_{t}=\begin{bmatrix}X_{t}^{(1)}&\cdots&X_{t}^{(m)}\end{bmatrix}^{\mathsf{T}},\quad Y_{t}=\begin{bmatrix}Y_{t}^{(1)}&\cdots&Y_{t}^{(m)}\end{bmatrix}^{\mathsf{T}},\qquad t\in\{1,\ldots,p\}.$$
Theorem 6 (Baudet (1978) [7]).

An asynchronous iterative method (6) converges from any initial guess $x^{0}$, with any sequence $\{\Omega_{k}\}_{k\in\mathbb{N}}$ and any functions $\delta_{1,1}$ to $\delta_{m,p}$ if there exists a square matrix $\mathcal{P}$ such that $\mathcal{P}\geq 0$, $\rho(\mathcal{P})<1$ and

$$\forall X,Y,\quad\left|f(X)-f(Y)\right|\leq\mathcal{P}\max\left(\left|X_{1}-Y_{1}\right|,\ldots,\left|X_{p}-Y_{p}\right|\right).$$

3 Asynchronous alternating iterations

3.1 Computational scheme

Consider, now, the alternating scheme (2) which results in

$$\begin{aligned}x^{k+1}&=\left(I-F^{-1}A\right)x^{k+\frac{1}{2}}+F^{-1}b\\ &=\left(I-F^{-1}A\right)\left(I-M^{-1}A\right)x^{k}+\left(I-F^{-1}A\right)M^{-1}b+F^{-1}b\\ &=\left(I-F^{-1}\left(M+F-A\right)M^{-1}A\right)x^{k}+F^{-1}\left(M+F-A\right)M^{-1}b.\end{aligned}$$

Then, according to Theorem 5, such an induced parallel scheme is asynchronously convergent if $\rho\left(\left|I-F^{-1}\left(M+F-A\right)M^{-1}A\right|\right)<1$, which is shown, in the next section, to be achieved under usual convergence conditions on the splittings $A=M-N$ and $A=F-G$. Nevertheless, asynchronous relaxation based on such an operator cannot be implemented using the alternating form (2), since the said operator is induced by strictly synchronizing $x^{k+\frac{1}{2}}$ and $x^{k+1}$.
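The composed operator above can be checked numerically. The following is a minimal sketch, not the authors' code: the test matrix, the Jacobi-type choice $M=\operatorname{diag}(A)$ and the Gauss-Seidel-type choice $F=\operatorname{tril}(A)$ are assumptions made purely for illustration, as are all variable names. It verifies that the product form and the compact form of the iteration matrix coincide, that two half-steps of (2) equal one step of the composed map, and it evaluates $\rho(|\cdot|)$ for this particular example.

```python
import numpy as np

n = 6
# Hypothetical non-Hermitian, strictly diagonally dominant test matrix.
A = (4 + 1j) * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
b = np.arange(1, n + 1, dtype=complex)
I = np.eye(n)

# Illustrative splittings A = M - N and A = F - G (assumed, not from the paper).
M = np.diag(np.diag(A))  # Jacobi-type
F = np.tril(A)           # Gauss-Seidel-type

# Iteration matrix in product form and in compact form.
T_product = (I - np.linalg.solve(F, A)) @ (I - np.linalg.solve(M, A))
T_compact = I - np.linalg.solve(F, (M + F - A) @ np.linalg.solve(M, A))
# Constant term F^{-1} (M + F - A) M^{-1} b of the composed map.
c = np.linalg.solve(F, (M + F - A) @ np.linalg.solve(M, b))

# One alternating sweep (two half-steps) from an arbitrary x.
x = np.ones(n, dtype=complex)
x_half = x + np.linalg.solve(M, b - A @ x)
x_next = x_half + np.linalg.solve(F, b - A @ x_half)

# Spectral radius of the entrywise absolute value of the iteration matrix.
rho_abs = np.abs(np.linalg.eigvals(np.abs(T_compact))).max()
print("rho(|T|) =", rho_abs)
```

For this example the printed value is below one, so the sufficient condition of Theorem 5 holds for the composed operator; other matrices or splittings may behave differently.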

Consider, then, an equivalent formulation of the alternating scheme (2),

\[
\left\{\begin{array}{lcl}
y^{k} & := & x^{k} + M^{-1}\left(b-Ax^{k}\right), \\
x^{k+1} & = & y^{k} + F^{-1}\left(b-Ay^{k}\right),
\end{array}\right.
\]

and assume that $F$ is distributed as $M$, i.e.,

\[
F=\begin{bmatrix}
F^{(1)} & 0 & \cdots & 0 \\
0 & F^{(2)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & F^{(m)}
\end{bmatrix}.
\]

Parallel asynchronous alternating methods are thus given by the computational scheme

\[
\left\{\begin{array}{lcl}
y^{(s),k} & := & x^{(s),\delta_s(s,k)} \\
 & & \quad +\ {M^{(s)}}^{-1}\left(b^{(s)}-\displaystyle\sum_{q=1}^{m}A^{(s,q)}x^{(q),\delta_s(q,k)}\right)\quad \forall s\in\{1,\ldots,m\}, \\
x^{(s),k+1} & = & \left\{\begin{array}{ll}
y^{(s),\delta_s(s,k)} & \\
\quad +\ {F^{(s)}}^{-1}\left(b^{(s)}-\displaystyle\sum_{q=1}^{m}A^{(s,q)}y^{(q),\delta_s(q,k)}\right) & \forall s\in\Omega_k, \\
x^{(s),k} & \forall s\notin\Omega_k.
\end{array}\right.
\end{array}\right.
\tag{7}
\]
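Scheme (7) can be simulated sequentially to see how delayed block-components enter each update. The sketch below is only an illustration under stated assumptions: the function name `async_alternating` and the parameters `max_delay` and `p_update` are hypothetical, delays are drawn uniformly at random within a bounded window, each block belongs to $\Omega_k$ with a fixed probability, and the splittings are chosen as $M$ = diagonal blocks of $A$ and $F$ = pointwise diagonal of $A$ (both block-diagonal, as required above). The test matrix is a small real, nonsymmetric, strictly diagonally dominant matrix.

```python
import numpy as np

def async_alternating(A, b, blocks, n_iter=400, max_delay=2, p_update=0.6, seed=0):
    """Sequential simulation of scheme (7) with random bounded delays."""
    rng = np.random.default_rng(seed)
    m = len(blocks)
    # Assumed block-diagonal splittings: M^(s) = A^(s,s), F^(s) = diag(A^(s,s)).
    M_blocks = [A[rows, rows] for rows in blocks]
    F_blocks = [np.diag(np.diag(A[rows, rows])) for rows in blocks]
    x_hist = [np.zeros(A.shape[0])]  # x^0
    y_hist = []
    for k in range(n_iter):
        # delta[s, q] plays the role of delta_s(q, k), with k - max_delay <= delta <= k.
        delta = np.maximum(0, k - rng.integers(0, max_delay + 1, size=(m, m)))
        # y^(s),k is computed for every block s from delayed x-components.
        y = np.empty_like(x_hist[0])
        for s, rows in enumerate(blocks):
            x_del = np.concatenate([x_hist[delta[s, q]][blocks[q]] for q in range(m)])
            y[rows] = x_del[rows] + np.linalg.solve(M_blocks[s], b[rows] - A[rows] @ x_del)
        y_hist.append(y)
        # Only blocks in Omega_k update x; the others keep their previous value.
        omega = rng.random(m) < p_update
        x_new = x_hist[-1].copy()
        for s, rows in enumerate(blocks):
            if omega[s]:
                y_del = np.concatenate([y_hist[delta[s, q]][blocks[q]] for q in range(m)])
                x_new[rows] = y_del[rows] + np.linalg.solve(F_blocks[s], b[rows] - A[rows] @ y_del)
        x_hist.append(x_new)
    return x_hist[-1]

n = 12
A = 4 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
b = np.ones(n)
blocks = [slice(0, 4), slice(4, 8), slice(8, 12)]  # m = 3 contiguous blocks

x_async = async_alternating(A, b, blocks)
error = np.abs(x_async - np.linalg.solve(A, b)).max()
print("max error:", error)
```

A real asynchronous implementation would let each process run its own loop and exchange block-components through non-blocking communication; the shared history lists here only mimic the delay functions $\delta_s$.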

Assuming that the identity matrix $I$ is distributed as $A$, i.e.,

\[
I=\begin{bmatrix}
I^{(1,1)} & \cdots & I^{(1,m)} \\
\vdots & \ddots & \vdots \\
I^{(m,1)} & \cdots & I^{(m,m)}
\end{bmatrix},
\]

we obtain

\[
\begin{aligned}
x^{(s),k+1} &= \sum_{q=1}^{m}\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)}\right)y^{(q),\delta_s(q,k)} + {F^{(s)}}^{-1}b^{(s)} \\
&= \sum_{q=1}^{m}\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)}\right)\left(\sum_{r=1}^{m}\left(I^{(q,r)}-{M^{(q)}}^{-1}A^{(q,r)}\right)x^{(r),\delta_q(r,\delta_s(q,k))}\right. \\
&\left.\qquad\qquad\qquad\qquad\qquad\qquad\qquad +\ {M^{(q)}}^{-1}b^{(q)}\right) + {F^{(s)}}^{-1}b^{(s)},
\end{aligned}
\]

which actually lies in the framework of the generalized model (6) with, here, $p=m$, since each update of a block-component depends on $m$ versions of the other block-components. Considering, then, a collection $X=\left(X_1,\ldots,X_m\right)$ of $m$ vectors, the corresponding mapping $f$ is given by

\[
\begin{aligned}
f^{(s)}(X) &:= \sum_{q=1}^{m}\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)}\right)\left(\sum_{r=1}^{m}\left(I^{(q,r)}-{M^{(q)}}^{-1}A^{(q,r)}\right)X_q^{(r)}\right. \\
&\left.\qquad\qquad\qquad\qquad\qquad\qquad\qquad +\ {M^{(q)}}^{-1}b^{(q)}\right) + {F^{(s)}}^{-1}b^{(s)} \\
&= \sum_{q=1}^{m}P_q^{(s)}X_q + \left(I^{(s)}-{F^{(s)}}^{-1}A^{(s)}\right)M^{-1}b + {F^{(s)}}^{-1}b^{(s)}, \\
f(X) &:= \sum_{q=1}^{m}P_qX_q + \left(I-F^{-1}A\right)M^{-1}b + F^{-1}b
\end{aligned}
\]

with $P_q^{(s)}:=\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)}\right)\left(I^{(q)}-{M^{(q)}}^{-1}A^{(q)}\right),\ q,s\in\{1,\ldots,m\},$ and $P_q:=\begin{bmatrix}P_q^{(1)} & \cdots & P_q^{(m)}\end{bmatrix}^{\mathsf{T}},\ q\in\{1,\ldots,m\}.$

3.2 Convergence conditions

We analyze, now, sufficient conditions for the convergence of our asynchronous alternating iterative scheme (7). To the best of our knowledge, Lemma 4, Proposition 1 and Corollary 1 are new. Proposition 1 and Corollary 1 highlight how combining properties of the operators $I-F^{-1}A$ and $I-M^{-1}A$ implies a resulting contracting operator $\left(I-F^{-1}A\right)\left(I-M^{-1}A\right)$. Our main results consist of Theorem 7 and Corollary 2, where the same combined conditions are shown to be sufficient for the convergence of asynchronous alternating methods (7), despite the induced, slightly different, iteration operator.

Let, first, $\mathcal{A}$ be a matrix with arbitrary shape, let $w$ be a vector with as many entries as the number of columns in $\mathcal{A}$, and let $v$ be a vector with as many entries as the number of rows in $\mathcal{A}$, and with no $0$ entry. Let $\tau(\mathcal{A},w,v)$ denote the vector given by the row-sums

\[
\tau_i(\mathcal{A},w,v):=\left(\tau(\mathcal{A},w,v)\right)_i:=\frac{1}{v_i}\sum_{j}\left|\mathcal{A}_{i,j}\right|w_j\qquad\forall i.
\]

Note, then, that, for a square matrix $\mathcal{A}$,

\[
\|\mathcal{A}\|_{\infty}^{w}=\max_{i}\tau_i(\mathcal{A},w,w),\qquad w>0.
\]
Lemma 4.

Let $\mathcal{A}$ and $\mathcal{B}$ be matrices with shapes such that the product $\mathcal{A}\mathcal{B}$ is defined. Let $u>0$, $v>0$ and $w$ be vectors with dimensions such that $\tau(\mathcal{A},u,v)$ and $\tau(\mathcal{B},w,u)$ are defined. Then, we have

\[
\tau(\mathcal{B},w,u)<\begin{bmatrix}1 & 1 & \cdots & 1\end{bmatrix}^{\mathsf{T}}\quad\implies\quad\tau(\mathcal{A}\mathcal{B},w,v)<\tau(\mathcal{A},u,v).
\]
Proof.

Let us index rows and columns of $\mathcal{A}$ by $i$ and $j$, respectively, and columns of $\mathcal{B}$ by $l$. We have

\[
\begin{aligned}
\tau_i(\mathcal{A}\mathcal{B},w,v):=\frac{1}{v_i}\sum_{l}\left|(\mathcal{A}\mathcal{B})_{i,l}\right|w_l &= \frac{1}{v_i}\sum_{l}\left|\sum_{j}\mathcal{A}_{i,j}\mathcal{B}_{j,l}\right|w_l \\
&\leq \frac{1}{v_i}\sum_{l}\sum_{j}\left|\mathcal{A}_{i,j}\mathcal{B}_{j,l}\right|w_l \\
&= \frac{1}{v_i}\sum_{l}\sum_{j}\frac{1}{u_j}\left|\mathcal{A}_{i,j}\right|\left|\mathcal{B}_{j,l}\right|u_jw_l \\
&= \frac{1}{v_i}\sum_{j}\left(\frac{1}{u_j}\sum_{l}\left|\mathcal{B}_{j,l}\right|w_l\right)\left|\mathcal{A}_{i,j}\right|u_j \\
&= \frac{1}{v_i}\sum_{j}\tau_j(\mathcal{B},w,u)\left|\mathcal{A}_{i,j}\right|u_j.
\end{aligned}
\]

It follows that if $\tau_j(\mathcal{B},w,u)<1$ for all $j$, then

\[
\begin{aligned}
\tau_j(\mathcal{B},w,u)\left|\mathcal{A}_{i,j}\right|u_j &< \left|\mathcal{A}_{i,j}\right|u_j\quad\forall j\ \forall i, \\
\frac{1}{v_i}\sum_{j}\tau_j(\mathcal{B},w,u)\left|\mathcal{A}_{i,j}\right|u_j &< \frac{1}{v_i}\sum_{j}\left|\mathcal{A}_{i,j}\right|u_j\quad\forall i, \\
\frac{1}{v_i}\sum_{l}\left|(\mathcal{A}\mathcal{B})_{i,l}\right|w_l\ \leq\ \frac{1}{v_i}\sum_{j}\tau_j(\mathcal{B},w,u)\left|\mathcal{A}_{i,j}\right|u_j &< \frac{1}{v_i}\sum_{j}\left|\mathcal{A}_{i,j}\right|u_j\quad\forall i, \\
\tau_i(\mathcal{A}\mathcal{B},w,v) &< \tau_i(\mathcal{A},u,v)\quad\forall i,
\end{aligned}
\]

which concludes the proof. ∎
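The row-sum vector $\tau$ and the inequality of Lemma 4 are easy to evaluate numerically. The following sketch is an illustration only: the helper names `tau` and `weighted_inf_norm`, the random rectangular matrices and the positive weight vectors are all assumptions, and the random $\mathcal{A}$ used here has no zero row. It rescales $\mathcal{B}$ so that $\tau(\mathcal{B},w,u)<1$ and then compares $\tau(\mathcal{A}\mathcal{B},w,v)$ with $\tau(\mathcal{A},u,v)$.

```python
import numpy as np

def tau(A, w, v):
    """Row sums tau_i(A, w, v) = (1 / v_i) * sum_j |A_ij| * w_j."""
    return (np.abs(A) @ w) / v

def weighted_inf_norm(A, w):
    """Weighted maximum norm of a square matrix: max_i tau_i(A, w, w), w > 0."""
    return tau(A, w, w).max()

rng = np.random.default_rng(0)
# Hypothetical rectangular matrices: A is 3 x 4 (complex), B is 4 x 5 (real).
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
u = rng.uniform(1.0, 2.0, size=4)  # as many entries as columns of A / rows of B
v = rng.uniform(1.0, 2.0, size=3)  # as many entries as rows of A
w = rng.uniform(1.0, 2.0, size=5)  # as many entries as columns of B

# Rescale B so that the hypothesis tau(B, w, u) < 1 holds componentwise.
B *= 0.9 / tau(B, w, u).max()

lhs = tau(A @ B, w, v)
rhs = tau(A, u, v)
print(np.column_stack([lhs, rhs]))
```

Each row of the printed array shows $\tau_i(\mathcal{A}\mathcal{B},w,v)$ next to $\tau_i(\mathcal{A},u,v)$ for this particular draw.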

Proposition 1.

Let

\[
Q:=\begin{bmatrix}
0 & I-M^{-1}A \\
I-F^{-1}A & 0
\end{bmatrix}.
\]

We have

\[
\rho(|Q|)<1\quad\implies\quad\rho\left(\left|I-F^{-1}\left(M+F-A\right)M^{-1}A\right|\right)<1.
\]
Proof.

According to Lemma 3,

\[
\rho(|Q|)<1\quad\iff\quad\exists\ W>0:\ \|Q\|_{\infty}^{W}<1.
\]

According to the two blocks of $Q$, take $W=\begin{bmatrix}W_1 & W_2\end{bmatrix}^{\mathsf{T}}.$ Then, we have both

\[
\left\{\begin{array}{lcl}
\tau\left(I-M^{-1}A,W_2,W_1\right) & < & \begin{bmatrix}1 & 1 & \cdots & 1\end{bmatrix}^{\mathsf{T}}, \\
\tau\left(I-F^{-1}A,W_1,W_2\right) & < & \begin{bmatrix}1 & 1 & \cdots & 1\end{bmatrix}^{\mathsf{T}}.
\end{array}\right.
\]

Lemma 4 therefore ensures

τ((IF1A)(IM1A),W2,W2)𝜏𝐼superscript𝐹1𝐴𝐼superscript𝑀1𝐴subscript𝑊2subscript𝑊2\displaystyle\tau\left(\left(I-F^{-1}A\right)\left(I-M^{-1}A\right),W_{2},W_{2% }\right)italic_τ ( ( italic_I - italic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) ( italic_I - italic_M start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) , italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) <τ(IF1A,W1,W2)absent𝜏𝐼superscript𝐹1𝐴subscript𝑊1subscript𝑊2\displaystyle<\tau\left(I-F^{-1}A,W_{1},W_{2}\right)< italic_τ ( italic_I - italic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A , italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )
<[111]𝖳,absentsuperscriptmatrix111𝖳\displaystyle<\begin{bmatrix}1&1&\cdots&1\end{bmatrix}^{\mathsf{T}},< [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL 1 end_CELL start_CELL ⋯ end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT ,

which leads to $\left\|\left(I-F^{-1}A\right)\left(I-M^{-1}A\right)\right\|_{\infty}^{W_{2}}<1.$ Recall that

$$\left(I-F^{-1}A\right)\left(I-M^{-1}A\right)=I-F^{-1}\left(M+F-A\right)M^{-1}A.$$

Lemma 3 finally ensures $\rho\left(\left|I-F^{-1}\left(M+F-A\right)M^{-1}A\right|\right)<1,$ which concludes the proof. ∎
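The factorization recalled in the proof above can be checked by direct expansion. The following derivation is only a verification aid and uses no assumption beyond the nonsingularity of $M$ and $F$:

```latex
\begin{aligned}
I-F^{-1}\left(M+F-A\right)M^{-1}A
  &= I-F^{-1}MM^{-1}A-F^{-1}FM^{-1}A+F^{-1}AM^{-1}A\\
  &= I-F^{-1}A-M^{-1}A+F^{-1}AM^{-1}A\\
  &= \left(I-F^{-1}A\right)\left(I-M^{-1}A\right).
\end{aligned}
```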

Corollary 1.

If $A$ is an $\mathsf{H}$-matrix, then

$$\left\{\begin{array}{lcl}\langle M\rangle-|M-A|&=&\langle A\rangle,\\ \langle F\rangle-|F-A|&=&\langle A\rangle\end{array}\right.\quad\implies\quad\rho\left(\left|I-F^{-1}\left(M+F-A\right)M^{-1}A\right|\right)<1.$$
Proof.

Considering that $A$ is an $\mathsf{H}$-matrix, take $u>0$ as in Lemma 1, so as to have

$$|A_{i,i}|u_{i}>\sum_{j\neq i}|A_{i,j}|u_{j}\quad\forall i.$$

We also have

$$\langle M\rangle-|M-A|=\langle A\rangle\quad\implies\quad\forall i,\ \left\{\begin{array}{lcl}|M_{i,i}|-|M_{i,i}-A_{i,i}|&=&|A_{i,i}|,\\ -|M_{i,j}|-|M_{i,j}-A_{i,j}|&=&-|A_{i,j}|\quad\forall j\neq i,\end{array}\right.$$

and, then,

$$\left\{\begin{array}{lcl}|M_{i,i}|u_{i}-|M_{i,i}-A_{i,i}|u_{i}&=&|A_{i,i}|u_{i},\\ -|M_{i,j}|u_{j}-|M_{i,j}-A_{i,j}|u_{j}&=&-|A_{i,j}|u_{j}\quad\forall j\neq i.\end{array}\right.$$

It follows that, $\forall i$,

$$\begin{aligned}|M_{i,i}|u_{i}-\sum_{j\neq i}|M_{i,j}|u_{j}-|M_{i,i}-A_{i,i}|u_{i}-\sum_{j\neq i}|M_{i,j}-A_{i,j}|u_{j}&=|A_{i,i}|u_{i}-\sum_{j\neq i}|A_{i,j}|u_{j}\\ &>0,\end{aligned}$$

which implies, with $F$ also satisfying $\langle F\rangle-|F-A|=\langle A\rangle$, that the matrix

$$\widehat{A}:=\begin{bmatrix}M&A-M\\ A-F&F\end{bmatrix}$$

is an $\mathsf{H}$-matrix, according to Lemma 1. Define, then,

$$\widehat{M}:=\begin{bmatrix}M&0\\ 0&F\end{bmatrix},$$

and note that $\left\langle\widehat{M}\right\rangle-\left|\widehat{M}-\widehat{A}\right|=\left\langle\widehat{A}\right\rangle$, which implies, by Definition 3, that $\left\langle\widehat{M}\right\rangle-\left|\widehat{M}-\widehat{A}\right|$ is an $\mathsf{M}$-matrix, hence, by Definition 4, $\widehat{A}=\widehat{M}-\left(\widehat{M}-\widehat{A}\right)$ is an $\mathsf{H}$-splitting. Lemma 2 therefore ensures that $\rho\left(\left|\widehat{M}^{-1}\left(\widehat{M}-\widehat{A}\right)\right|\right)<1,$ and one can verify that

$$\widehat{M}^{-1}\left(\widehat{M}-\widehat{A}\right)=\begin{bmatrix}0&I-M^{-1}A\\ I-F^{-1}A&0\end{bmatrix}.$$

Proposition 1 therefore finally applies, which concludes the proof. ∎

Theorem 7.

Let

$$Q:=\begin{bmatrix}0&I-M^{-1}A\\ I-F^{-1}A&0\end{bmatrix}.$$

An asynchronous alternating method (7) converges from any initial guess $x^{0}$, with any sequence $\{\Omega_{k}\}_{k\in\mathbb{N}}$ and any functions $\delta_{1}$ to $\delta_{m}$, if $\rho(|Q|)<1$.

Proof.

Consider two collections, $X=\left(X_{1},\ldots,X_{m}\right)$ and $Y=\left(Y_{1},\ldots,Y_{m}\right)$, of $m$ vectors. We have

$$\begin{aligned}|f(X)-f(Y)|&=\left|\sum_{q=1}^{m}P_{q}\left(X_{q}-Y_{q}\right)\right|\\ &\leq\sum_{q=1}^{m}\left|P_{q}\right|\max\left(\left|X_{1}-Y_{1}\right|,\ldots,\left|X_{m}-Y_{m}\right|\right).\end{aligned}$$

Consequently, according to Theorem 6, an asynchronous alternating method (7) is convergent if $\rho\left(\sum_{q=1}^{m}\left|P_{q}\right|\right)<1.$ Recall, then, that according to Lemma 3,

$$\rho(|Q|)<1\quad\iff\quad\exists\ W>0:\ \|Q\|_{\infty}^{W}<1.$$

According to the two blocks of $Q$, take $W=\begin{bmatrix}W_{1}&W_{2}\end{bmatrix}^{\mathsf{T}}.$ Then, we have both

$$\left\{\begin{array}{lcl}\tau\left(I-M^{-1}A,W_{2},W_{1}\right)&<&\begin{bmatrix}1&1&\cdots&1\end{bmatrix}^{\mathsf{T}},\\ \tau\left(I-F^{-1}A,W_{1},W_{2}\right)&<&\begin{bmatrix}1&1&\cdots&1\end{bmatrix}^{\mathsf{T}},\end{array}\right.$$

implying, as well,

$$\tau\left(I^{(q)}-{M^{(q)}}^{-1}A^{(q)},W_{2},W_{1}^{(q)}\right)<\begin{bmatrix}1&1&\cdots&1\end{bmatrix}^{\mathsf{T}}\quad\forall q\in\{1,\ldots,m\}.$$

Lemma 4 therefore ensures, with $s\in\{1,\ldots,m\}$,

$$\tau\left(\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)}\right)\left(I^{(q)}-{M^{(q)}}^{-1}A^{(q)}\right),W_{2},W_{2}^{(s)}\right)<\tau\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)},W_{1}^{(q)},W_{2}^{(s)}\right).$$

Recall that $P_{q}^{(s)}:=\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)}\right)\left(I^{(q)}-{M^{(q)}}^{-1}A^{(q)}\right),\ q,s\in\{1,\ldots,m\}.$ Then, we have

$$\begin{aligned}
\tau\left(P^{(s)}_{q},W_{2},W_{2}^{(s)}\right)&<\tau\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)},W_{1}^{(q)},W_{2}^{(s)}\right),\\
\tau\left(\left|P^{(s)}_{q}\right|,W_{2},W_{2}^{(s)}\right)&<\tau\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)},W_{1}^{(q)},W_{2}^{(s)}\right),\\
\sum_{q=1}^{m}\tau\left(\left|P^{(s)}_{q}\right|,W_{2},W_{2}^{(s)}\right)&<\sum_{q=1}^{m}\tau\left(I^{(s,q)}-{F^{(s)}}^{-1}A^{(s,q)},W_{1}^{(q)},W_{2}^{(s)}\right),\\
\tau\left(\sum_{q=1}^{m}\left|P^{(s)}_{q}\right|,W_{2},W_{2}^{(s)}\right)&<\tau\left(I^{(s)}-{F^{(s)}}^{-1}A^{(s)},W_{1},W_{2}^{(s)}\right),\\
\tau\left(\sum_{q=1}^{m}\left|P_{q}\right|,W_{2},W_{2}\right)&<\tau\left(I-F^{-1}A,W_{1},W_{2}\right),\\
&<\begin{bmatrix}1&1&\cdots&1\end{bmatrix}^{\mathsf{T}},
\end{aligned}$$

which leads to $\left\|\sum_{q=1}^{m}\left|P_{q}\right|\right\|_{\infty}^{W_{2}}<1.$ By Lemma 3, we therefore have $\rho\left(\sum_{q=1}^{m}\left|P_{q}\right|\right)<1,$ which concludes the proof. ∎

Corollary 2.

An asynchronous alternating method (7) converges from any initial guess $x^{0}$, with any sequence $\{\Omega_{k}\}_{k\in\mathbb{N}}$ and any functions $\delta_{1}$ to $\delta_{m}$, if $A$ is an $\mathsf{H}$-matrix and

$$\left\{\begin{array}{lcl}\langle M\rangle-|M-A|&=&\langle A\rangle,\\ \langle F\rangle-|F-A|&=&\langle A\rangle.\end{array}\right.$$
Proof.

This follows in the same way as Corollary 1. ∎

Let $\mathcal{D}(\mathcal{A})$ denote the diagonal matrix obtained from the diagonal of a matrix $\mathcal{A}$.

Remark.

For practical applications of Corollary 2, let $\Lambda$ be a diagonal real matrix such that $\Lambda_{i,i}\geq 1\ \forall i.$ We straightforwardly have

$$\mathcal{M}=\Lambda\mathcal{D}(\mathcal{A})\quad\implies\quad\langle\mathcal{M}\rangle-|\mathcal{M}-\mathcal{A}|=\langle\mathcal{A}\rangle.$$
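The implication in this remark can be checked entry by entry on a small example. The sketch below is only an illustration: the helper names (`comparison_matrix`, `scaled_diagonal`, `splitting_lhs`) and the test matrix are assumptions made here, not part of the original text, and integer entries are used so that the comparison is exact.

```python
def comparison_matrix(a):
    """Return <a>: |a_ii| on the diagonal, -|a_ij| off the diagonal."""
    n = len(a)
    return [[abs(a[i][j]) if i == j else -abs(a[i][j]) for j in range(n)]
            for i in range(n)]


def scaled_diagonal(a, lam):
    """Return M = Lambda * D(a); lam holds the diagonal entries of Lambda."""
    n = len(a)
    return [[lam[i] * a[i][i] if i == j else 0 for j in range(n)]
            for i in range(n)]


def splitting_lhs(m, a):
    """Return <m> - |m - a|, computed entry by entry."""
    n = len(a)
    cm = comparison_matrix(m)
    return [[cm[i][j] - abs(m[i][j] - a[i][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical test matrix with a nonzero diagonal (one negative entry on purpose).
A = [[4, -1, 0], [2, -5, -1], [0, -3, 6]]
lam = [1, 2, 3]  # diagonal of Lambda, all entries >= 1
M = scaled_diagonal(A, lam)
identity_holds = splitting_lhs(M, A) == comparison_matrix(A)
```

With an entry of $\Lambda$ below 1, the diagonal term $|\mathcal{M}_{i,i}|-|\mathcal{M}_{i,i}-\mathcal{A}_{i,i}|$ no longer equals $|\mathcal{A}_{i,i}|$, which is why the condition $\Lambda_{i,i}\geq 1$ is needed.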
Remark.

With regard to the HSS splitting, if $A$ is a real matrix with $\mathcal{D}(A)\geq 0$, and the splitting matrices $M$ and $F$ are given by

$$M:=\mathcal{D}(\alpha I+H),\qquad F:=\mathcal{D}(\alpha I+S),\qquad\alpha\geq\max_{i}A_{i,i},$$

then we have both

$$M=\alpha I+\mathcal{D}(A)\geq\mathcal{D}(A),\qquad F=\alpha I\geq\mathcal{D}(A),$$

which satisfy $M=\Lambda_{M}\mathcal{D}(A),\ F=\Lambda_{F}\mathcal{D}(A),$ where $\Lambda_{M}$ and $\Lambda_{F}$ are two diagonal real matrices with entries greater than or equal to $1$.
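As a small numerical illustration of this remark, the sketch below builds the diagonals of $M$ and $F$ for a real test matrix, recovers $\Lambda_{M}$ and $\Lambda_{F}$, and evaluates the unweighted maximum norms of the two blocks of $Q$. The function names and the matrix are assumptions introduced here. The example further assumes a strictly positive diagonal (so that the division by $\mathcal{D}(A)$ is defined) and strict diagonal dominance, which is only one convenient sufficient case in which unit weights already give norms below 1.

```python
from fractions import Fraction


def hss_diagonal_splitting(a, alpha):
    """Diagonals of M = D(alpha*I + H) and F = D(alpha*I + S) for a real matrix a."""
    n = len(a)
    # H = (A + A^T)/2 keeps the diagonal of A; S = (A - A^T)/2 has a zero diagonal.
    m_diag = [alpha + Fraction(a[i][i] + a[i][i], 2) for i in range(n)]
    f_diag = [alpha + Fraction(a[i][i] - a[i][i], 2) for i in range(n)]
    return m_diag, f_diag


def abs_row_sums(a, p_diag):
    """Row sums of |I - P^{-1} a| for a diagonal matrix P given by p_diag."""
    n = len(a)
    sums = []
    for i in range(n):
        row = [abs((1 if i == j else 0) - Fraction(a[i][j]) / p_diag[i])
               for j in range(n)]
        sums.append(sum(row))
    return sums


# Hypothetical strictly diagonally dominant real matrix with positive diagonal.
A = [[4, -1, 0], [1, 4, -1], [0, 1, 4]]
n = len(A)
alpha = max(A[i][i] for i in range(n))

m_diag, f_diag = hss_diagonal_splitting(A, alpha)
lambda_m = [m_diag[i] / A[i][i] for i in range(n)]  # diagonal of Lambda_M
lambda_f = [f_diag[i] / A[i][i] for i in range(n)]  # diagonal of Lambda_F

# Maximum norms (unit weights) of the two nonzero blocks of Q.
norm_m = max(abs_row_sums(A, m_diag))
norm_f = max(abs_row_sums(A, f_diag))
```

Here both norms are below 1, so $\|Q\|_{\infty}<1$ with unit weights and, by Lemma 3, $\rho(|Q|)<1$ for this particular matrix.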

4 Implementation aspects

The two alternating iterations of the HSS method require the solution of two secondary problems involving the coefficient matrices $\alpha I+H$ and $\alpha I+S$, respectively. In practice, as pointed out in, e.g., [5, 44], these problems are inexactly solved by means of iterative algorithms. A general description for both HSS and inexact HSS (IHSS) can be given by Algorithm 1.

Algorithm 1 HSS(solverH, solverS)
  $x := x^{0}$
  $r := b - Ax$
  $k := 0$
  while $\|r\| > \varepsilon\|b\|$ and $k < k_{\text{max}}$ do
     $y :=$ solverH.solve($\alpha I+H$, $r$)
     $x := x + y$
     $r := b - Ax$
     $y :=$ solverS.solve($\alpha I+S$, $r$)
     $x := x + y$
     $r := b - Ax$
     $k := k + 1$
  end while
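The loop of Algorithm 1 can be sketched in a few lines of sequential Python. This is a minimal illustration, not the parallel implementation used in the experiments: the function names, the callable interface `solve_h(r)` / `solve_s(r)` and the test data are assumptions made here, and the demonstration plugs in deliberately crude inner solvers (a single diagonal scaling each) rather than CG or GMRES.

```python
def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def residual(a, b, x):
    """Return b - a x."""
    return [bi - axi for bi, axi in zip(b, matvec(a, x))]


def norm2(v):
    """Euclidean norm."""
    return sum(vi * vi for vi in v) ** 0.5


def hss(a, b, solve_h, solve_s, x0, eps=1e-10, k_max=1000):
    """Alternating correction loop of Algorithm 1.

    solve_h(r) and solve_s(r) are caller-supplied, possibly inexact, solvers
    for (alpha*I + H) y = r and (alpha*I + S) y = r, respectively.
    """
    x = list(x0)
    r = residual(a, b, x)
    k = 0
    while norm2(r) > eps * norm2(b) and k < k_max:
        y = solve_h(r)
        x = [xi + yi for xi, yi in zip(x, y)]
        r = residual(a, b, x)
        y = solve_s(r)
        x = [xi + yi for xi, yi in zip(x, y)]
        r = residual(a, b, x)
        k += 1
    return x, k


# Hypothetical test problem whose exact solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0], [1.0, 4.0, -1.0], [0.0, 1.0, 4.0]]
b = [3.0, 4.0, 5.0]
alpha = 4.0


def solve_h(r):
    """Inexact inner solve: apply the inverse of D(alpha*I + H) only."""
    return [ri / (alpha + A[i][i]) for i, ri in enumerate(r)]


def solve_s(r):
    """Inexact inner solve: apply the inverse of D(alpha*I + S) = alpha*I only."""
    return [ri / alpha for ri in r]


x, iterations = hss(A, b, solve_h, solve_s, [0.0, 0.0, 0.0])
```

Passing the inner solvers as callables keeps the outer loop identical for HSS and IHSS; only the accuracy of `solve_h` and `solve_s` changes.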

We can then designate by, e.g., HSS(CG, GMRES) an IHSS algorithm with the conjugate gradient (CG) method [27] for solving the shifted Hermitian problem and the generalized minimal residual (GMRES) method [41] for solving the shifted skew-Hermitian one.

Asynchronous HSS iterations necessarily belong to the class of IHSS algorithms since they obviously require the inner solvers to be asynchronous too, which further reduces such an approach to the subclass of IHSS with inner splittings. Taking, then, e.g., a splitting $\alpha I+H=M-N,$ the solution, at each outer iteration $k$, of

$$(\alpha I+H)y^{k}=b-Ax^{k}$$

can be given by several inner iterations

$$y^{k,l+1}=y^{k,l}+M^{-1}(b-Ax^{k}-(\alpha I+H)y^{k,l}),\qquad(8)$$

where $l$ is the inner iteration index. Furthermore, when dealing with two-stage asynchronous iterations, one should take particular advantage of the possibility of using the inner solution vector $y^{k,l+1}$ for any value of $l$, given that asynchronous relaxation is very likely to benefit from each newly updated piece of data. We refer the reader to, e.g., [8, 25] for more insight into the so-called “asynchronous iterations with flexible communication”. Moreover, analysis of matrix splittings for two-stage asynchronous iterations reveals that convergence of such methods can be guaranteed for any number of inner iterations (see, e.g., [24]). In view of the efficiency gains expected from flexible communication, it is therefore of interest to simply consider a single iteration of (8). If, in particular, we also take the initial guess $y^{k,0} := 0$, then we can define

$$y^{k} := y^{k,1} = M^{-1}(b - A x^{k}),$$

so as to finally have

$$x^{k+\frac{1}{2}} = x^{k} + M^{-1}(b - A x^{k}),$$

which falls under the general alternating scheme (2) that has been considered in our theoretical analysis. Such a specialization of Algorithm 1 is given by Algorithm 2, where $M^{-1}$ and $F^{-1}$ are preconditioners of $\alpha I + H$ and $\alpha I + S$, respectively.
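
The reduction of (8) to a single preconditioner application can be checked numerically. The following is only a minimal sketch in plain Python: the function name `inner_iterations`, the dense list-of-rows storage and the small $3 \times 3$ data are assumptions made for illustration, and $M$ is taken as the diagonal of $\alpha I + H$.

```python
def inner_iterations(B, m_diag, rhs, n_inner):
    """Run n_inner steps of iteration (8) from y = 0, with M = diag(m_diag).

    B stands for alpha*I + H and rhs for b - A x^k (both illustrative).
    """
    n = len(rhs)
    y = [0.0] * n
    for _ in range(n_inner):
        By = [sum(B[i][j] * y[j] for j in range(n)) for i in range(n)]
        # y^{k,l+1} = y^{k,l} + M^{-1} (rhs - B y^{k,l})
        y = [y[i] + (rhs[i] - By[i]) / m_diag[i] for i in range(n)]
    return y


B = [[8.0, -1.0, 0.0], [-1.0, 8.0, -1.0], [0.0, -1.0, 8.0]]
m_diag = [B[i][i] for i in range(3)]
rhs = [1.0, 2.0, 3.0]

y_one = inner_iterations(B, m_diag, rhs, 1)    # equals M^{-1} rhs
y_many = inner_iterations(B, m_diag, rhs, 60)  # approaches B^{-1} rhs
```

With one inner iteration the result is exactly $M^{-1}(b - Ax^{k})$, while many inner iterations recover the solution of the shifted system, which is the trade-off discussed above.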

Algorithm 2 HSS($M^{-1}$, $F^{-1}$)
1:  $x$ := $x^{0}$
2:  $r$ := $b - Ax$
3:  $k$ := $0$
4:  while $\|r\| > \varepsilon \|b\|$ and $k < k_{\text{max}}$ do
5:     $x$ := $x + M^{-1} r$
6:     $r$ := $b - Ax$
7:     $x$ := $x + F^{-1} r$
8:     $r$ := $b - Ax$
9:     $k$ := $k + 1$
10:  end while
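
As an illustration, Algorithm 2 can be sketched in a few lines of plain Python. This is not the implementation used for the experiments: the helper names (`matvec`, `norm2`, `hss_diag`), the dense list-of-rows storage and the small nonsymmetric test matrix are assumptions chosen for readability, and the preconditioners are assumed to be the pointwise diagonals of $\alpha I + H$ and $\alpha I + S$.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm2(v):
    """Euclidean norm, valid for real or complex entries."""
    return math.sqrt(sum(abs(vi) ** 2 for vi in v))


def hss_diag(A, b, alpha, x0, eps=1e-8, kmax=1000):
    """Algorithm 2 with M = diag(alpha*I + H) and F = diag(alpha*I + S)."""
    n = len(b)
    # diag(H) holds the real parts of diag(A), diag(S) the imaginary parts.
    m_inv = [1.0 / (alpha + complex(A[i][i]).real) for i in range(n)]
    f_inv = [1.0 / (alpha + 1j * complex(A[i][i]).imag) for i in range(n)]
    x = list(x0)
    r = [bi - ti for bi, ti in zip(b, matvec(A, x))]
    b_norm = norm2(b)
    k = 0
    while norm2(r) > eps * b_norm and k < kmax:
        x = [xi + mi * ri for xi, mi, ri in zip(x, m_inv, r)]   # M half-step
        r = [bi - ti for bi, ti in zip(b, matvec(A, x))]
        x = [xi + fi * ri for xi, fi, ri in zip(x, f_inv, r)]   # F half-step
        r = [bi - ti for bi, ti in zip(b, matvec(A, x))]
        k += 1
    return x, k, norm2(r) / b_norm


# Illustrative nonsymmetric, strictly diagonally dominant tridiagonal matrix.
n = 5
A = [[4.0 if i == j else -1.5 if i == j + 1 else -0.5 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n)]
b = matvec(A, [1.0] * n)          # exact solution is the vector of ones
x, k, rel = hss_diag(A, b, alpha=4.0, x0=[0.0] * n)
```

For this small matrix both half-steps are contractions in the maximum norm when $\alpha = 4$, so the loop terminates on the residual test well before `kmax`; smaller values of $\alpha$ may fail to converge, in line with the divergence reported in Section 5.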

Note that Algorithm 2 needs to be specifically implemented instead of just using Algorithm 1 with calls to relaxation-based inner solvers whose maximum number of iterations is set to $1$. Indeed, from a purely implementation-level point of view, avoiding inner function calls and loops can save a very significant amount of execution time, which even makes HSS($M^{-1}$, $F^{-1}$) possibly competitive, in practice, with, e.g., HSS(CG, GMRES), as we shall see in Section 5.

From Algorithm 2, iterative scheme (7), programming models [31, 34] and convergence detection approach [26], asynchronous parallel implementation of HSS iterations is obtained as described by Algorithm 3, where the communication routines start with “Com” and are blocking by default. Their non-blocking counterparts are designated by “ICom” with the letter “I” standing for “immediate”, similarly to the Message Passing Interface (MPI) standard.

Algorithm 3 Asynchronous parallel HSS(${M^{(s)}}^{-1}$, ${F^{(s)}}^{-1}$) on process $s \in \{1, \ldots, m\}$
1:  $x^{(s)}$ := $x^{(s),0}$
2:  $x$ := IComSendRecvInit($x^{(s)}$)
3:  $r^{(s)}$ := $b^{(s)} - A^{(s)} x$
4:  ${rr}^{(s)}$ := ${r^{(s)}}^{\mathsf{H}} r^{(s)}$
5:  $rr$ := ComSum(${rr}^{(s)}$)
6:  $\|r\|$ := $\sqrt{rr}$
7:  $\tau$ := False
8:  $k$ := $0$
9:  while $\|r\| > \varepsilon \|b\|$ and $k < k_{\text{max}}$ do
10:     $x^{(s)}$ := $x^{(s)} + {M^{(s)}}^{-1} r^{(s)}$
11:     $x$ := IComSendRecv($x^{(s)}$)
12:     $r^{(s)}$ := $b^{(s)} - A^{(s)} x$
13:     $x^{(s)}$ := $x^{(s)} + {F^{(s)}}^{-1} r^{(s)}$
14:     $x$ := IComSendRecv($x^{(s)}$)
15:     $r^{(s)}$ := $b^{(s)} - A^{(s)} x$
16:     if not $\tau$ then
17:        ${rr}^{(s)}$ := ${r^{(s)}}^{\mathsf{H}} r^{(s)}$
18:        ComRequest := IComSum(${rr}^{(s)}$, $rr$)
19:        $\tau$ := True
20:     end if
21:     $\sigma$ := ComTest(ComRequest)
22:     if $\sigma$ then
23:        $\|r\|$ := $\sqrt{rr}$
24:        $\tau$ := False
25:        $k$ := $k + 1$
26:     end if
27:  end while

The routines ComSum and IComSum are used to compute the dot product $r^{\mathsf{H}} r$, with $r = b - Ax$, by the global reduction operation

$$\sum_{q=1}^{m} {r^{(q)}}^{\mathsf{H}} r^{(q)}, \qquad r^{(q)} = b^{(q)} - A^{(q)} x.$$

They can readily be replaced by the MPI routines MPI_Allreduce and MPI_Iallreduce, respectively. The object ComRequest and the routine ComTest are therefore analogous to MPI_Request and MPI_Test. Such a simple way to reliably use the classical loop stopping criterion $\|r\| > \varepsilon \|b\|$ in the case of asynchronous iterations is due to [26]. It also allows for considering a counter, $k$, of the number of global convergence tests. On the other hand, the data exchange routine IComSendRecv has to be built by hand using, e.g., the MPI routines MPI_Isend and MPI_Irecv. Briefly, the routine IComSendRecvInit triggers non-blocking requests for message sending ($x^{(s)}$) and reception ($x^{(q)}$, $q \neq s$), and fills the components $x^{(q)}$, $q \neq s$, of the vector $x$ with arbitrary values. Note that both storage and communication of the components $x^{(q)}$, $q \neq s$, should actually be limited to the values that are necessary for computing the product $A^{(s)} x$, according to the nonzero entries in $A^{(s)}$. The subsequent calls to the routine IComSendRecv then check completion of previous requests, update $x$ with received data and trigger new instances of the completed requests. Further details can be found in, e.g., [34].
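
The effect of working with outdated remote data can be explored without MPI. The sketch below is a single-threaded model of the update pattern of Algorithm 3, not the parallel code used in Section 5: the function `simulate_async_hss`, the tick-based scheduling (`periods`), the fixed message delays (`delays`) and the small real test matrix are all assumptions made for illustration, and the non-blocking convergence detection is replaced by a residual evaluation at the end.

```python
import math


def simulate_async_hss(A, b, alpha, blocks, periods, delays, ticks):
    """Each process updates its own block of x from a local view whose
    remote entries may be outdated (real matrix A assumed)."""
    n, m = len(b), len(blocks)
    m_inv = [1.0 / (alpha + A[i][i]) for i in range(n)]  # diag(alpha*I + H)
    f_inv = [1.0 / alpha] * n                            # diag(alpha*I + S)
    views = [[0.0] * n for _ in range(m)]   # local copy of x on each process
    inboxes = [[] for _ in range(m)]        # pending (arrival, indices, values)
    counts = [0] * m                        # local iteration counters

    def exchange(s, tick):
        # Stand-in for IComSendRecv: post own block, apply arrived messages.
        own = blocks[s]
        for q in range(m):
            if q != s:
                msg = (tick + delays[s][q], own, [views[s][i] for i in own])
                inboxes[q].append(msg)
        arrived = [msg for msg in inboxes[s] if msg[0] <= tick]
        inboxes[s] = [msg for msg in inboxes[s] if msg[0] > tick]
        for _, idx, vals in arrived:
            for i, v in zip(idx, vals):
                views[s][i] = v

    def half_step(s, inv):
        x = views[s]
        res = {i: b[i] - sum(A[i][j] * x[j] for j in range(n))
               for i in blocks[s]}
        for i in blocks[s]:
            x[i] += inv[i] * res[i]

    for tick in range(ticks):
        for s in range(m):
            if tick % periods[s] == 0:   # slower processes iterate less often
                half_step(s, m_inv)
                exchange(s, tick)
                half_step(s, f_inv)
                exchange(s, tick)
                counts[s] += 1

    # Assemble the global iterate from each owner's block.
    x = [views[s][i] for s in range(m) for i in blocks[s]]
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    rel = math.sqrt(sum(v * v for v in r)) / math.sqrt(sum(v * v for v in b))
    return x, rel, counts


n = 6
A = [[4.0 if i == j else -1.5 if i == j + 1 else -0.5 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]              # exact solution is the vector of ones
x, rel, counts = simulate_async_hss(
    A, b, alpha=4.0,
    blocks=[[0, 1], [2, 3], [4, 5]],
    periods=[1, 2, 3],
    delays=[[0, 1, 2], [1, 0, 3], [2, 1, 0]],
    ticks=3000,
)
```

The three simulated processes end with different local iteration counts, which mirrors why $k_{\text{min}}$ and $k_{\text{max}}$ are reported for asynchronous runs. Convergence here relies on the test matrix being strictly diagonally dominant and on the chosen $\alpha$.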

5 Numerical experiments

5.1 Problems and overall settings

Numerical experiments have been conducted on two kinds of problem. The first one consists of a three-dimensional (3D) convection-diffusion equation,

$$-\Delta u + c \cdot \nabla u = f \quad \text{in } \Omega \tag{9}$$

with $\Omega = [0,1] \times [0,1] \times [0,1]$ and Dirichlet boundary conditions. Discretization has been achieved using seven-point centered differences for both convection and diffusion terms. A fixed value, $20$, has been used for all elements of the three-dimensional vector $c$ as convection parameter. The entries of the exact discrete solution, $x^{*}$, have been taken randomly in $[0,1)$ and the right-hand side has then been constructed as $b = A x^{*}$.
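
For concreteness, a seven-point centered-difference matrix for (9) can be assembled as follows. This is a hedged sketch only: the function name `convection_diffusion_3d`, the lexicographic ordering, the grid spacing $h = 1/(N+1)$ with $N$ interior points per direction, the absence of any $h^2$ scaling and the dense storage are assumptions, since the paper does not spell them out.

```python
def convection_diffusion_3d(N, c=20.0):
    """Dense matrix of -Laplace(u) + c.grad(u) on an N^3 interior grid of the
    unit cube, centered differences, homogeneous Dirichlet boundary."""
    h = 1.0 / (N + 1)
    n = N ** 3
    A = [[0.0] * n for _ in range(n)]
    for ix in range(N):
        for iy in range(N):
            for iz in range(N):
                row = (ix * N + iy) * N + iz
                A[row][row] = 6.0 / h**2
                for axis, pos in enumerate((ix, iy, iz)):
                    stride = N ** (2 - axis)
                    if pos > 0:        # backward neighbour
                        A[row][row - stride] = -1.0 / h**2 - c / (2.0 * h)
                    if pos < N - 1:    # forward neighbour
                        A[row][row + stride] = -1.0 / h**2 + c / (2.0 * h)
    return A


A = convection_diffusion_3d(3)   # 27 unknowns, h = 1/4
```

With this discretization the diffusion term contributes only to the Hermitian part $H$ and the convection term only to the skew-Hermitian part $S$, whose diagonal is zero.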

The second kind of problem consists of a 2D structural dynamics equation (see, e.g., [10, 3]),

$$\left[\left(-\omega^{2} L + K\right) + \mathrm{i}\left(\omega C_{v} + C_{h}\right)\right] x = b, \tag{10}$$

where $L$ and $K$ denote the mass and stiffness matrices, respectively; $C_v$ and $C_h$ denote the viscous and hysteretic damping matrices, respectively; $\omega$ denotes the circular frequency. The values of the matrices and the parameters have been taken from [3]. The matrix $K$ is the five-point finite difference discretization of a diffusion term on the unit square $[0,1] \times [0,1]$ with Dirichlet boundary conditions. The other matrices have been set as $L = I$, $C_v = 10 I$, $C_h = \mu K$, where $\mu = 0.02$, and $I$ denotes the $n \times n$ identity matrix. The circular frequency $\omega$ has been set to $\pi$. The right-hand side has been taken as $b = (1 + \mathrm{i}) A q$ with $q$ being the vector of all ones, to ensure that all entries of $x^{*}$ equal $1 + \mathrm{i}$.
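
With these choices, the matrix of (10) simplifies to $A = (1 + \mathrm{i}\mu) K + (-\omega^{2} + 10\,\mathrm{i}\,\omega) I$. A small assembly sketch is given below; the function name `structural_dynamics_2d`, the grid spacing $h = 1/(N+1)$, the unscaled stencil $K = h^{-2}\,[-1, -1, 4, -1, -1]$, the lexicographic ordering and the dense storage are assumptions made for illustration and may differ from the exact setting of [3].

```python
import math


def structural_dynamics_2d(N, omega=math.pi, mu=0.02):
    """Dense complex matrix (1 + i*mu) K + (-omega^2 + 10 i omega) I, with K
    the five-point Laplacian on an N x N interior grid of the unit square."""
    h = 1.0 / (N + 1)
    n = N * N
    shift = -omega**2 + 10j * omega     # -omega^2 L + i omega C_v, L = I, C_v = 10 I
    factor = (1 + 1j * mu) / h**2       # K + i C_h = (1 + i mu) K
    A = [[0j] * n for _ in range(n)]
    for ix in range(N):
        for iy in range(N):
            row = ix * N + iy
            A[row][row] = 4.0 * factor + shift
            for pos, stride in ((ix, N), (iy, 1)):
                if pos > 0:
                    A[row][row - stride] = -factor
                if pos < N - 1:
                    A[row][row + stride] = -factor
    return A


A = structural_dynamics_2d(3)
b = [(1 + 1j) * sum(row) for row in A]   # b = (1 + i) A q with q all ones
```

The resulting matrix is complex symmetric but not Hermitian, so its Hermitian part is its real part and the diagonal of its skew-Hermitian part is nonzero.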

In the following, parallel execution times (wall-clock), numbers of iterations, $k$, and final residual errors, $r$, are reported for the GMRES [41], the IHSS [5] (Algorithms 1 and 2) and the asynchronous IHSS (Algorithm 3) methods, with a stopping criterion set so as to have

$$r = \frac{\|b - A x^{*}\|}{\|b\|} < 10^{-6}.$$

In the case of asynchronous execution, minimum and maximum numbers of local iterations, $k_{\text{min}}$ and $k_{\text{max}}$, respectively, are reported since there is no global iteration count $k$. Both for synchronous and asynchronous HSS($M^{-1}$, $F^{-1}$) (respectively, Algorithms 2 and 3), we took

$$M := \mathcal{D}(\alpha I + H), \qquad F := \mathcal{D}(\alpha I + S).$$
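
As a side remark, and assuming here that $\mathcal{D}(\cdot)$ extracts the pointwise diagonal of its argument (an assumption made for this illustration), both preconditioners can be written entrywise from the diagonal of $A$ alone. Since $H = \frac{1}{2}(A + A^{\mathsf{H}})$ and $S = \frac{1}{2}(A - A^{\mathsf{H}})$,

$$M_{ii} = \alpha + \operatorname{Re}(a_{ii}), \qquad F_{ii} = \alpha + \mathrm{i} \operatorname{Im}(a_{ii}), \qquad i = 1, \ldots, n.$$

Under that assumption, a real matrix such as the one arising from (9) would give $F = \alpha I$, so that the second half-step would amount to a Richardson update with step $1/\alpha$, whereas the complex matrix of (10) would yield a genuinely complex diagonal $F$.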

All of the tests have been entirely implemented in the Python language, using NumPy, SciPy Sparse and MPI4Py [18] modules.

A comparison with some results in [3] about the problem (10) (Example 4.2 in [3]) is reported in Table 1 for single-process execution of full GMRES, GMRES(restart), and HSS(CG, GMRES(restart)) with inner residual threshold set to $10^{-10}$ in order to compare with an “exact” HSS.

Table 1: Comparison with Ref. [3] for the test case (10), number of processes $p = 1$.

| Experiment | Method | Clock (sec), $n = 64 \times 64$ | $k$, $n = 64 \times 64$ | Clock (sec), $n = 128 \times 128$ | $k$, $n = 128 \times 128$ |
|---|---|---|---|---|---|
| Ref. [3] (MATLAB, 2.66 GHz CPU, 1.97 GB RAM) | HSS | 4.81 | 284 | 60 | 540 |
| Ref. [3] (MATLAB, 2.66 GHz CPU, 1.97 GB RAM) | GMRES(10) | 1.08 | 973 | 20 | 3096 |
| Ref. [3] (MATLAB, 2.66 GHz CPU, 1.97 GB RAM) | GMRES(20) | 1.50 | 632 | 22 | 1704 |
| Ref. [3] (MATLAB, 2.66 GHz CPU, 1.97 GB RAM) | GMRES | 2.98 | 161 | 45 | 308 |
| Python, 2.40 GHz CPU, 174 GB RAM | HSS(CG,GMRES(10)) | 4.80 | 284 | 44 | 540 |
| Python, 2.40 GHz CPU, 174 GB RAM | GMRES(10) | 0.36 | 1072 | 3.56 | 3346 |
| Python, 2.40 GHz CPU, 174 GB RAM | GMRES(20) | 0.33 | 672 | 2.70 | 1790 |
| Python, 2.40 GHz CPU, 174 GB RAM | GMRES | 0.44 | 161 | 5.19 | 308 |

The experimentally optimal value of $\alpha$, according to [3], was considered for each problem size $n$ ($\alpha = 0.12$ for $n = 64^{2}$, and $\alpha = 0.07$ for $n = 128^{2}$). We recall that the experiments in [3] were run in MATLAB on a personal computer with a 2.66 GHz Intel Core Duo central processing unit (CPU) and 1.97 GB of random access memory (RAM). Our single-process tests, here, have been performed on a computational cluster node with a 2.40 GHz Intel Xeon Skylake CPU and 174 GB of RAM. The same numbers of iterations are obtained for our implementation of HSS(CG, GMRES(10)), where the tolerances of both CG and GMRES were set to $10^{-10}$, and for the HSS tested in [3] with direct inner solvers. The same result is observed for full GMRES too, while very slight differences appear for the restarted GMRES.

The remaining tests, which involve multi-process execution, have been performed on cluster nodes consisting of 2 $\times$ 12-core 2.30 GHz Intel Xeon Haswell CPUs (24 cores per node) and 48 GB of RAM (2 GB per core). The nodes are interconnected through a 56 Gb/s fourteen data rate (FDR) InfiniBand network, on which the SGI MPT library is used as the implementation of the MPI standard.

5.2 Results on the 3D convection-diffusion problem

5.2.1 Optimal parameters

The 3D convection-diffusion test case (9) was run on a discrete problem with $n = 100^{3}$ unknowns, using from $p = 48$ to $p = 192$ processor cores (one MPI process per core).

Table 2 shows execution times for various values of the restart parameter of GMRES.

Table 2: Varying the restart parameter of GMRES for the 3D convection-diffusion test case (9), problem size $n = 100^{3}$.

| Restart | Clock (sec), $p = 48$ | $k$, $p = 48$ | $r$, $p = 48$ | Clock (sec), $p = 192$ | $k$, $p = 192$ | $r$, $p = 192$ |
|---|---|---|---|---|---|---|
| 5 | 344 | 917 | 9.98E-07 | 187 | 917 | 9.98E-07 |
| 10 | 251 | 489 | 9.70E-07 | 149 | 489 | 9.70E-07 |
| 20 | 274 | 318 | 9.44E-07 | 161 | 318 | 9.44E-07 |
| 30 | 427 | 349 | 9.77E-07 | 247 | 349 | 9.77E-07 |
| 40 | 614 | 385 | 9.65E-07 | 349 | 385 | 9.65E-07 |
| 50 | 748 | 393 | 9.59E-07 | 440 | 393 | 9.59E-07 |
| 100 | 1765 | 457 | 9.80E-07 | 969 | 457 | 9.80E-07 |
| (Full) | 2695 | 281 | 8.56E-07 | 1677 | 281 | 8.56E-07 |

This leads us to choose the value 10 as the experimentally optimal one, although performance for a restart value of 20 was quite similar.

We then examined how the performance of HSS(CG, GMRES(10)) varies with its parameter $\alpha$ and with the inner residual threshold $\varepsilon_{\text{in}}$ set for both CG and GMRES(10). Convergence was obtained from $\varepsilon_{\text{in}} = 10^{-2}$, which also proved more efficient than lower thresholds, as shown in Table 3.

Table 3: Varying the parameter $\alpha$ and the inner residual threshold $\varepsilon_{\text{in}}$ of HSS(CG,GMRES(10)) for the 3D convection-diffusion test case (9), problem size $n = 100^{3}$, number of processes $p = 192$.

| $\varepsilon_{\text{in}}$ | $\alpha$ | Clock (sec) | $k$ | $k_{\text{in}}$ | $r$ |
|---|---|---|---|---|---|
| 1.00E-02 | 0.7 | 718 | 213 | 2182 | 9.84E-07 |
| 1.00E-02 | 0.6 | 712 | 186 | 2124 | 9.57E-07 |
| 1.00E-02 | 0.5 | 665 | 162 | 1949 | 9.94E-07 |
| 1.00E-02 | 0.4 | 844 | 164 | 2148 | 9.76E-07 |
| 1.00E-06 | 0.9 | 2431 | 270 | 7331 | 9.85E-07 |
| 1.00E-06 | 0.8 | 2395 | 240 | 7129 | 9.85E-07 |
| 1.00E-06 | 0.7 | 2398 | 210 | 6986 | 9.84E-07 |
| 1.00E-06 | 0.6 | 2450 | 180 | 6916 | 9.84E-07 |

Quite surprisingly, the number of outer iterations even slightly increased when switching from $10^{-2}$ to $10^{-6}$.

While a restart value of 10 resulted in the most efficient executions of the GMRES solver, it does not necessarily prove to be the best choice for HSS(CG, GMRES(restart)) as well. Handling a combination of three parameters, $\alpha$, $\varepsilon_{\text{in}}$ and the GMRES restart, is clearly a major drawback of HSS(CG, GMRES(restart)), especially if, additionally, the number of processes (and so, possibly, the load per process) might have an impact too. Our two-stage-splitting-based HSS($M^{-1}$, $F^{-1}$) with a single inner iteration reduces the set of parameters back to $\alpha$ alone, as in the case of exact HSS. Moreover, as mentioned in Section 4, avoiding inner solver function calls and loops might constitute an attractive feature from an implementation point of view. This is shown here by comparing Tables 3 and 4.

Table 4: Varying the parameter $\alpha$ of HSS($M^{-1}$, $F^{-1}$) for the 3D convection-diffusion test case (9), problem size $n = 100^{3}$.

| $\alpha$ | Clock (sec), $p = 48$ | $k$, $p = 48$ | $r$, $p = 48$ | Clock (sec), $p = 192$ | $k$, $p = 192$ | $r$, $p = 192$ |
|---|---|---|---|---|---|---|
| 6.0 | 566 | 2348 | 9.98E-07 | 252 | 2307 | 9.98E-07 |
| 5.0 | 485 | 2008 | 9.99E-07 | 214 | 1965 | 9.94E-07 |
| 4.0 | 399 | 1657 | 9.94E-07 | 177 | 1611 | 9.98E-07 |
| 3.0 | 311 | 1288 | 9.90E-07 | 136 | 1239 | 9.70E-07 |

For $p = 192$ processes, the best execution times of HSS(CG, GMRES(10)) and HSS($M^{-1}$, $F^{-1}$) are, respectively, 665 and 136 seconds. Note that the former performed 1949 inner iterations while the latter converged in 2478 inner iterations (two for each of the 1239 outer iterations reported in Table 4, since there is one inner iteration using $M^{-1}$ and another one using $F^{-1}$). Such a surprisingly small gap in convergence speed confirms the possibility of achieving a faster solver in execution time by avoiding inner function calls and loops. Still, an important drawback of HSS($M^{-1}$, $F^{-1}$) is that it diverged for $\alpha \leq 2.0$.

Finally, Table 5 shows that $\alpha = 3.0$ was experimentally optimal for the asynchronous HSS($M^{-1}$, $F^{-1}$) too. Here as well, divergence was observed for $\alpha \leq 2.0$.

Table 5: Varying the parameter $\alpha$ of asynchronous HSS($M^{-1}$, $F^{-1}$) for the 3D convection-diffusion test case (9), problem size $n = 100^{3}$.

| $\alpha$ | Clock (sec), $p = 48$ | $k_{\text{min}}$, $p = 48$ | $k_{\text{max}}$, $p = 48$ | $r$, $p = 48$ | Clock (sec), $p = 192$ | $k_{\text{min}}$, $p = 192$ | $k_{\text{max}}$, $p = 192$ | $r$, $p = 192$ |
|---|---|---|---|---|---|---|---|---|
| 6.0 | 24 | 3134 | 4609 | 4.32E-07 | 7.46 | 7299 | 9491 | 4.83E-07 |
| 5.0 | 22 | 2812 | 3969 | 4.31E-07 | 7.04 | 6832 | 9175 | 6.57E-07 |
| 4.0 | 20 | 2573 | 3695 | 4.21E-07 | 6.82 | 6668 | 8846 | 5.12E-07 |
| 3.0 | 17 | 2278 | 3080 | 5.49E-07 | 6.24 | 5950 | 7996 | 9.78E-07 |

5.2.2 Performance comparison

Using experimentally obtained optimal parameters, a performance comparison on $p = 48$ to $p = 192$ cores is summarized here in Table 6, where we dropped HSS(CG, GMRES(10)) because memory limits were exceeded for $p \leq 120$.

Table 6: Performances from the 3D convection-diffusion test case (9), problem size $n = 100^{3}$. The columns labelled "Sync." and "Async." refer to HSS($M^{-1}$, $F^{-1}$, 3.0) and Async. HSS($M^{-1}$, $F^{-1}$, 3.0), respectively.

| $p$ | GMRES(10): Clock (sec) | GMRES(10): $k$ | GMRES(10): $r$ | Sync.: Clock (sec) | Sync.: $k$ | Sync.: $r$ | Async.: Clock (sec) | Async.: $k_{\text{min}}$ | Async.: $k_{\text{max}}$ | Async.: $r$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 251 | 489 | 9.70E-07 | 311 | 1288 | 9.90E-07 | 17 | 2278 | 3080 | 5.49E-07 |
| 72 | 197 | 489 | 9.70E-07 | 222 | 1222 | 9.92E-07 | 12 | 3401 | 3912 | 8.44E-07 |
| 96 | 239 | 489 | 9.70E-07 | 203 | 1177 | 9.92E-07 | 14 | 5682 | 6678 | 9.21E-07 |
| 120 | 151 | 489 | 9.70E-07 | 193 | 1228 | 9.97E-07 | 12 | 6541 | 8233 | 8.79E-07 |
| 144 | 169 | 489 | 9.70E-07 | 179 | 1229 | 9.93E-07 | 10 | 7176 | 9394 | 9.50E-07 |
| 168 | 150 | 489 | 9.70E-07 | 133 | 1240 | 9.89E-07 | 6.20 | 5526 | 7562 | 8.59E-07 |
| 192 | 149 | 489 | 9.70E-07 | 136 | 1239 | 9.70E-07 | 6.24 | 5950 | 7996 | 9.78E-07 |

One can see a significant gain from asynchronous HSS($M^{-1}$, $F^{-1}$, 3.0), which was, e.g., at $p = 192$ processor cores, about 20 times faster (in execution time) than both GMRES(10) and synchronous HSS($M^{-1}$, $F^{-1}$, 3.0). While the second-stage splittings using preconditioners $M^{-1}$ and $F^{-1}$ were introduced here to achieve a fully asynchronous version of HSS, such a gap between the performances of synchronous and asynchronous HSS($M^{-1}$, $F^{-1}$, 3.0) in a homogeneous high-speed computational environment shows that there is a true advantage in resorting to asynchronous iterations, which is not due to possible programming biases introduced by this particular implementation of HSS.
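
The speedup figures quoted above can be recomputed directly from the wall-clock columns of Table 6. The short script below is only a convenience sketch; the names `table6` and `speedups` are introduced here for illustration, and the numbers are copied from the table.

```python
# (p, GMRES(10), synchronous HSS, asynchronous HSS) wall-clock seconds, Table 6.
table6 = [
    (48, 251, 311, 17.0),
    (72, 197, 222, 12.0),
    (96, 239, 203, 14.0),
    (120, 151, 193, 12.0),
    (144, 169, 179, 10.0),
    (168, 150, 133, 6.20),
    (192, 149, 136, 6.24),
]


def speedups(rows):
    """Ratio of each synchronous solver's time to the asynchronous time."""
    return {p: (gmres / asyn, sync / asyn) for p, gmres, sync, asyn in rows}


ratios = speedups(table6)
for p, (vs_gmres, vs_sync) in ratios.items():
    print(f"p={p:3d}  vs GMRES(10): {vs_gmres:5.1f}x  vs sync HSS: {vs_sync:5.1f}x")
```

At $p = 192$ the ratios are close to 24 and 22, respectively, which is consistent with the "about 20 times" statement; the ratio varies with $p$ across the table.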

5.3 Results on the 2D structural dynamics problem

5.3.1 Optimal parameters

The complex 2D structural dynamics test case (10) was run on a discrete problem with $n = 350^{2}$ unknowns, using from $p = 24$ to $p = 54$ processor cores (one MPI process per core).

Table 7 shows execution times for various values of the restart parameter of GMRES.

Table 7: Varying the restart parameter of GMRES for the 2D structural dynamics test case (10), problem size $n = 350^{2}$, number of processes $p = 48$.

| Restart | Clock (sec) | $k$ | $r$ |
|---|---|---|---|
| 5 | 5405 | 36594 | 1.00E-06 |
| 10 | 3960 | 19679 | 1.00E-06 |
| 20 | 3068 | 9072 | 1.01E-06 |
| 30 | 3053 | 6386 | 1.02E-06 |
| 40 | 3158 | 5125 | 1.04E-06 |
| 50 | 3084 | 4080 | 9.84E-07 |
| 100 | 3433 | 2727 | 7.89E-07 |
| (Full) | 7898 | 789 | 9.63E-07 |

This leads us to choose the value 30 as the experimentally optimal one, although performance for restart values of 20 to 50 was quite similar.

Both HSS(CG, GMRES(30)) and HSS($M^{-1}$, $F^{-1}$) failed to converge within two hours of execution on $p = 48$ cores for various values of their parameters, which made them impractical for the current test case.

Nevertheless, asynchronous HSS($M^{-1}$, $F^{-1}$) took reasonable times to converge, and Table 8 shows an experimentally optimal $\alpha = 2.0$. Divergence was observed for $\alpha \leq 1.0$.

Table 8: Varying the parameter $\alpha$ of asynchronous HSS($M^{-1}$, $F^{-1}$) for the 2D structural dynamics test case (10), problem size $n = 350^{2}$, number of processes $p = 48$.

| $\alpha$ | Clock (sec) | $k_{\text{min}}$ | $k_{\text{max}}$ | $r$ |
|---|---|---|---|---|
| 5.0 | 273 | 398754 | 493820 | 7.19E-07 |
| 4.0 | 235 | 349111 | 425328 | 8.71E-07 |
| 3.0 | 198 | 293439 | 357005 | 1.04E-06 |
| 2.0 | 156 | 231787 | 281838 | 9.50E-07 |

5.3.2 Performance comparison

Using experimentally obtained optimal parameters, a performance comparison on $p = 24$ to $p = 54$ cores is summarized in Table 9.

Table 9: Performances from the complex 2D structural dynamics test case (10), problem size $n = 350^{2}$. The columns labelled "Async." refer to Async. HSS($M^{-1}$, $F^{-1}$, 2.0).

| $p$ | GMRES(30): Clock (sec) | GMRES(30): $k$ | GMRES(30): $r$ | Async.: Clock (sec) | Async.: $k_{\text{min}}$ | Async.: $k_{\text{max}}$ | Async.: $r$ |
|---|---|---|---|---|---|---|---|
| 24 | 2941 | 6486 | 9.99E-07 | 308 | 183861 | 203002 | 8.50E-07 |
| 30 | 2722 | 6419 | 9.99E-07 | 253 | 212597 | 249716 | 8.81E-07 |
| 36 | 2967 | 6510 | 1.02E-06 | 241 | 236977 | 277301 | 9.86E-07 |
| 42 | 2656 | 6479 | 1.02E-06 | 154 | 211052 | 257389 | 1.01E-06 |
| 48 | 3053 | 6386 | 1.02E-06 | 156 | 231787 | 281838 | 9.50E-07 |
| 54 | 2829 | 6479 | 1.01E-06 | 159 | 251221 | 310456 | 9.13E-07 |

Again, a significant gain is obtained by asynchronous HSS($M^{-1}$, $F^{-1}$, 2.0), which was, e.g., at $p = 48$ processor cores, about 20 times faster than GMRES(30), similarly to the real 3D convection-diffusion test case. Here as well, an even larger performance gap is observed between asynchronous and synchronous HSS($M^{-1}$, $F^{-1}$, 2.0), the latter of which did not terminate within 7200 seconds. This confirms, for the complex test case as well, the benefit that comes purely from asynchronous iterations.
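The ratios behind these statements can be recomputed directly from the clock times of Table 9. The short script below is only an illustrative sketch: the variable and function names are assumptions, and the 7200-second value is the time limit reported for synchronous HSS, so the last figure is a lower bound rather than a measured speedup.

```python
# Wall-clock times in seconds, copied from Table 9.
cores = [24, 30, 36, 42, 48, 54]
gmres_clock = [2941, 2722, 2967, 2656, 3053, 2829]
async_hss_clock = [308, 253, 241, 154, 156, 159]

# Time limit after which synchronous HSS was stopped (no convergence).
SYNC_HSS_TIME_LIMIT = 7200


def speedups(baseline, candidate):
    """Element-wise ratio baseline / candidate (hypothetical helper)."""
    return [t_base / t_cand for t_base, t_cand in zip(baseline, candidate)]


ratios = speedups(gmres_clock, async_hss_clock)
for p, ratio in zip(cores, ratios):
    print(f"p = {p}: GMRES(30) time / async. HSS time = {ratio:.1f}")

# Lower bound on the gain over synchronous HSS at p = 48.
sync_lower_bound = SYNC_HSS_TIME_LIMIT / async_hss_clock[cores.index(48)]
print(f"p = 48: sync. HSS time / async. HSS time > {sync_lower_bound:.1f}")
```

The ratio over GMRES(30) grows from roughly 9.5 at $p = 24$ to roughly 19.6 at $p = 48$, and the time limit implies a factor of more than 46 over synchronous HSS at $p = 48$.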

6 Conclusion

Asynchronous alternating iterations are shown here to be a practical breakthrough in reducing the computational time of the parallel solution of non-Hermitian problems, compared to the well-known GMRES and HSS methods. Classical asynchronous convergence conditions are investigated for a general practical parallel scheme of alternating iterations. In particular, this scheme can yield a two-stage variant of the HSS method with one inner iteration for each of the outer alternating ones. Performance experiments have been conducted for such an asynchronous variant, which significantly outperformed both the GMRES and the classical HSS methods, on both a real convection-diffusion problem and a complex structural dynamics problem.

Acknowledgement

The paper has been prepared with the support of the “RUDN University Program 5-100”, the French national program LEFE/INSU, the project ADOM (Méthodes de décomposition de domaine asynchrones) of the French National Research Agency (ANR), and using HPC resources from the “Mésocentre” computing center of CentraleSupélec and École Normale Supérieure Paris-Saclay supported by CNRS and Région Île-de-France.

References

  • [1] Z.-Z. Bai. On the convergence of additive and multiplicative splitting iterations for systems of linear equations. J. Comput. Appl. Math., 154(1):195–214, 2003.
  • [2] Z.-Z. Bai. Regularized HSS iteration methods for stabilized saddle-point problems. IMA J. Numer. Anal., 39(4):1888–1923, 2019.
  • [3] Z.-Z. Bai, M. Benzi, and F. Chen. Modified HSS iteration methods for a class of complex symmetric linear systems. Computing, 87(3):93–111, 2010.
  • [4] Z.-Z. Bai, G. H. Golub, and C.-K. Li. Optimal parameter in Hermitian and skew-Hermitian splitting method for certain two-by-two block matrices. SIAM J. Sci. Comput., 28(2):583–603, 2006.
  • [5] Z.-Z. Bai, G. H. Golub, and M. K. Ng. Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. SIAM J. Matrix Anal. Appl., 24(3):603–626, 2003.
  • [6] Z.-Z. Bai and M. Rozložník. On the numerical behavior of matrix splitting iteration methods for solving linear systems. SIAM J. Numer. Anal., 53(4):1716–1737, 2015.
  • [7] G. M. Baudet. Asynchronous iterative methods for multiprocessors. J. ACM, 25(2):226–244, 1978.
  • [8] D. E. Baz, P. Spiteri, J. C. Miellou, and D. Gazen. Asynchronous iterative algorithms with flexible communication for nonlinear network flow problems. J. Parallel Distrib. Comput., 38(1):1–15, 1996.
  • [9] M. Benzi. A generalization of the Hermitian and skew-Hermitian splitting iteration. SIAM J. Matrix Anal. Appl., 31(2):360–374, 2009.
  • [10] M. Benzi and D. Bertaccini. Block preconditioning of real-valued iterative algorithms for complex linear systems. IMA J. Numer. Anal., 28(3):598–618, 2008.
  • [11] M. Benzi and J. Liu. An efficient solver for the incompressible Navier-Stokes equations in rotation form. SIAM J. Sci. Comput., 29(5):1959–1981, 2007.
  • [12] M. Benzi and D. B. Szyld. Existence and uniqueness of splittings for stationary iterative methods with applications to alternating methods. Numer. Math., 76(3):309–321, 1997.
  • [13] D. Bertaccini, G. H. Golub, S. S. Capizzano, and C. T. Possio. Preconditioned HSS methods for the solution of non-Hermitian positive definite linear systems and applications to the discrete convection-diffusion equation. Numer. Math., 99(3):441–484, 2005.
  • [14] D. P. Bertsekas. Distributed asynchronous computation of fixed points. Math. Program., 27(1):107–120, 1983.
  • [15] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1989.
  • [16] D. Chazan and W. Miranker. Chaotic relaxation. Linear Algebra Appl., 2(2):199–222, 1969.
  • [17] V. Conrad and Y. Wallach. Alternating methods for sets of linear equations. Numer. Math., 32(1):105–108, 1979.
  • [18] L. D. Dalcín, R. R. Paz, and M. A. Storti. MPI for Python. J. Parallel Distrib. Comput., 65(9):1108–1115, 2005.
  • [19] J. Douglas. On the numerical integration of $\frac{\partial^{2}u}{\partial x^{2}}+\frac{\partial^{2}u}{\partial y^{2}}=\frac{\partial u}{\partial t}$ by implicit methods. J. Soc. Ind. Appl. Math., 3(1):42–65, 1955.
  • [20] M. El Haddad, J. C. Garay, F. Magoulès, and D. B. Szyld. Synchronous and asynchronous optimized Schwarz methods for one-way subdivision of bounded domains. Numer. Linear Algebra Appl., 27(2):e2227, 2020.
  • [21] M. N. El Tarazi. Some convergence results for asynchronous algorithms. Numer. Math., 39(3):325–340, 1982. (in French).
  • [22] K. Fan. Topological proofs for certain theorems on matrices with non-negative elements. Monatshefte für Mathematik, 62:219–237, 1958.
  • [23] A. Frommer and D. B. Szyld. H-splittings and two-stage iterative methods. Numer. Math., 63(1):345–356, 1992.
  • [24] A. Frommer and D. B. Szyld. Asynchronous two-stage iterative methods. Numer. Math., 69(2):141–153, 1994.
  • [25] A. Frommer and D. B. Szyld. Asynchronous iterations with flexible communication for linear systems. Calculateurs Parallèles, 10:421–429, 1998.
  • [26] G. Gbikpi-Benissan and F. Magoulès. Protocol-free asynchronous iterations termination. Adv. Eng. Softw., 146:102827, 2020.
  • [27] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, 1952.
  • [28] Y.-M. Huang. A practical formula for computing optimal parameters in the HSS iteration methods. J. Comput. Appl. Math., 255:142–149, 2014.
  • [29] C.-X. Li and S.-L. Wu. A single-step HSS method for non-Hermitian positive definite linear systems. Appl. Math. Lett., 44:26–29, 2015.
  • [30] L. Li, T.-Z. Huang, and X.-P. Liu. Modified Hermitian and skew-Hermitian splitting methods for non-Hermitian positive-definite linear systems. Numer. Linear Algebra Appl., 14(3):217–235, 2007.
  • [31] F. Magoulès and G. Gbikpi-Benissan. JACK: An asynchronous communication kernel library for iterative algorithms. J. Supercomput., 73(8):3468–3487, 2017.
  • [32] F. Magoulès and G. Gbikpi-Benissan. Asynchronous Parareal time discretization for partial differential equations. SIAM J. Sci. Comput., 40(6):C704–C725, 2018.
  • [33] F. Magoulès and G. Gbikpi-Benissan. Distributed convergence detection based on global residual error under asynchronous iterations. IEEE Trans. Parallel Distrib. Syst., 29(4):819–829, 2018.
  • [34] F. Magoulès and G. Gbikpi-Benissan. JACK2: An MPI-based communication library with non-blocking synchronization for asynchronous iterations. Adv. Eng. Softw., 119:116–133, 2018.
  • [35] F. Magoulès, G. Gbikpi-Benissan, and Q. Zou. Asynchronous iterations of Parareal algorithm for option pricing models. Mathematics, 6(4):1–18, 2018.
  • [36] F. Magoulès, D. B. Szyld, and C. Venet. Asynchronous optimized Schwarz methods with and without overlap. Numer. Math., 137(1):199–227, 2017.
  • [37] F. Magoulès and C. Venet. Asynchronous iterative sub-structuring methods. Math. Comput. Simul., 145:34–49, 2018.
  • [38] G. I. Marchuk. Splitting and alternating direction methods. In Handbook of Numerical Analysis, volume 1, pages 197–462. Elsevier, 1990.
  • [39] J.-C. Miellou. Algorithmes de relaxation chaotique à retards. ESAIM: M2AN, 9(R1):55–82, 1975. (in French).
  • [40] D. W. Peaceman and H. H. Rachford. The numerical solution of parabolic and elliptic differential equations. J. Soc. Indust. Appl. Math., 3(1):28–41, 1955.
  • [41] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7(3):856–869, 1986.
  • [42] S. Schechter. Relaxation methods for linear equations. Comm. Pure Appl. Math., 12(2):313–335, 1959.
  • [43] J. W. Sheldon. On the numerical solution of elliptic difference equations. MTAC, 9(51):101–112, 1955.
  • [44] S.-L. Wu. Several variants of the Hermitian and skew-Hermitian splitting method for a class of complex symmetric linear systems. Numer. Linear Algebra Appl., 22(2):338–356, 2015.
  • [45] I. Yamazaki, E. Chow, A. Bouteiller, and J. J. Dongarra. Performance of asynchronous optimized Schwarz with one-sided communication. Parallel Comput., 86:66–81, 2019.
  • [46] Q. Zou and F. Magoulès. Parameter estimation in the Hermitian and skew-Hermitian splitting method using gradient iterations. Numer. Linear Algebra Appl., 27:e2304, 2020.