
Efficient Estimation for Longitudinal Networks via Adaptive Merging

Haoran Zhang and Junhui Wang
Department of Statistics and Data Science
Southern University of Science and Technology
   Department of Statistics
The Chinese University of Hong Kong
Abstract

A longitudinal network consists of a sequence of temporal edges among multiple nodes, where the temporal edges are observed in real time. Such networks have become ubiquitous with the rise of online social platforms and e-commerce, but remain largely under-investigated in the literature. In this paper, we propose an efficient estimation framework for longitudinal networks, leveraging the strengths of adaptive network merging, tensor decomposition and point processes. It merges neighboring sparse networks so as to enlarge the number of observed edges and reduce estimation variance, whereas the estimation bias introduced by network merging is controlled by exploiting local temporal structures for adaptive network neighborhood. A projected gradient descent algorithm is proposed to facilitate estimation, where the upper bound of the estimation error in each iteration is established. A thorough analysis is conducted to quantify the asymptotic behavior of the proposed method, which shows that it can significantly reduce the estimation error and also provides guidelines for network merging under various scenarios. We further demonstrate the advantage of the proposed method through extensive numerical experiments on synthetic datasets and a militarized interstate dispute dataset.

KEY WORDS: Dynamic network, embedding, multi-layer network, point process, tensor decomposition

1 Introduction

A longitudinal network, also known as a temporal network or continuous-time dynamic network, consists of a sequence of temporal edges among multiple nodes, where the temporal edges may be observed between each node pair in real time (Holme and Saramäki, 2012). It provides a flexible framework for modeling dynamic interactions between multiple objects and how network structure evolves over time (Aggarwal and Subbian, 2014). For instance, on online social platforms such as Facebook, users send likes to the posts of their friends recurrently at different times (Perry-Smith and Shalley, 2003; Snijders et al., 2010); in international politics, countries may have conflicts with others at one time but become allies at others (Cranmer and Desmarais, 2011; Kinne, 2013). Similar longitudinal networks have also been frequently encountered in biological science (Voytek and Knight, 2015; Avena-Koenigsberger et al., 2018) and ecological science (Ulanowicz, 2004; De Ruiter et al., 2005).

One of the key challenges in estimating a longitudinal network resides in its scarce temporal edges, as the interactions between node pairs are instantaneous and come in a streaming fashion (Holme and Saramäki, 2012), and thus the observed network at each given time point can be extremely sparse. This makes a longitudinal network substantially different from a discrete-time dynamic network (Kim et al., 2018), where multiple snapshots of networks are collected, each with many more observed edges. In the literature, various methods have been proposed for discrete-time dynamic networks, such as Markov chain based methods (Hanneke et al., 2010; Sewell and Chen, 2015, 2016; Matias and Miele, 2017), Markov process based methods (Snijders et al., 2010; Snijders, 2017) and tensor factorization methods (Lyu et al., 2023; Han et al., 2022). Whereas the former two assume that the discrete-time dynamic network is generated from some Markov chain or Markov process, tensor factorization methods treat the discrete-time dynamic network as an order-3 tensor and often require relatively dense network snapshots.

To circumvent the difficulty of severe under-sampling in longitudinal networks, a common but rather ad-hoc approach is to merge a longitudinal network into a multi-layer network based on equally spaced time intervals (Huang et al., 2023). Such an overly simplified network merging scheme completely ignores the fact that network structure may change differently during different time periods. Thus, it may introduce unnecessary estimation bias when network structure changes rapidly, or incur large estimation variance when network structure stays unchanged for a long period. These negative impacts have so far been neglected in the literature, even though this ad-hoc network merging scheme has been widely employed to pre-process longitudinal networks in practice. Furthermore, some recent attempts were made from the perspective of survival and event history analysis (Vu et al., 2011a; Vu et al., 2011b; Perry and Wolfe, 2013; Sit et al., 2021), with a keen focus on inference of the dependence of the temporal edges on some additional covariates. Some other recent works (Matias et al., 2018; Soliman et al., 2022) extend the stochastic block model to detect time-invariant communities in longitudinal networks.

In this paper, we propose an efficient estimation method for longitudinal networks, leveraging the strengths of adaptive network merging, tensor decomposition and point processes. Specifically, we introduce a two-step procedure based on regularized maximum likelihood estimation to estimate the underlying tensor for the longitudinal network. The initial step merges the longitudinal network over some small intervals, leading to an initial estimate of the embeddings of the underlying tensor. We then adaptively merge adjacent small intervals with similar estimated temporal embedding vectors, and re-estimate the underlying tensor based on the adaptively merged intervals. A projected gradient descent algorithm is provided to facilitate estimation, as well as an information criterion for choosing the number of intervals. A thorough theoretical analysis is conducted for the proposed estimation procedure. We first establish a general tensor estimation error bound based on a generic partition in each iteration of the projected gradient descent algorithm. The established error bound is tighter than most of the existing results in the literature (Han et al., 2022), where the related empirical process is associated with a smaller parameter space with additional incoherence conditions. This tighter bound enables us to derive the error bound for the tensor estimate based on equally spaced intervals, which exhibits an interesting bias-variance tradeoff governed by the number of small intervals and leads to a faster convergence rate than that in Han et al. (2022) and Cai et al. (2023). More importantly, the derived error bound does not require the strong intensity condition as required in Han et al. (2022) and Cai et al. (2023), which, to the best of our knowledge, is the first Poisson tensor estimation error bound in both medium and weak intensity regimes.
Furthermore, it is shown that the tensor estimation error, including the estimation bias and variance, can be further reduced by adaptively merging intervals, which also provides guidelines for network merging under various scenarios. The advantage of the proposed method over other existing competitors is demonstrated in extensive numerical experiments on synthetic longitudinal networks. The proposed method is also applied to analyze a militarized interstate dispute dataset, where not only the prediction accuracy increases substantially, but the adaptively merged intervals also lead to clear and meaningful interpretation.

The main contributions of this paper are three-fold. First, we establish an upper bound for the tensor estimation error for longitudinal networks under a generic partition. By imposing additional incoherence conditions on the index set of the empirical process, we obtain a bound that is tighter than the existing theoretical results. Second, we establish an upper bound for the Poisson tensor estimation error under a more complete range of intensity regimes, especially the weak and medium intensity regimes. This is a new theoretical result which has not been established in the existing literature on tensor estimation, and it is of great importance in determining the best partition scheme for longitudinal network merging. Third, we propose an adaptive merging scheme for estimating the longitudinal network, and establish the upper bound for its tensor estimation error. Further, we give a theoretical guideline for optimal network merging under different scenarios. It is shown that the error rates under the adaptive merging scheme are smaller than those of the equally spaced merging scheme in most scenarios.

The rest of the paper is organized as follows. Section 2 first presents the two-step estimation procedure for longitudinal networks, and then proposes a regularized maximum likelihood estimator based on the Poisson process. Section 3 provides the details of the computation algorithm. Section 4 establishes the error bound for the proposed method. Numerical experiments on synthetic and real-life networks are contained in Section 5. Section 6 concludes the paper with a brief discussion, and technical proofs, necessary lemmas and more numerical results are provided in the Appendix and a separate Supplementary File.

Notations. Before moving to Section 2, we introduce some notations and preliminaries for tensor decomposition. For any $n\geq r$, let $\mathbb{O}_{n,r}=\{\mathbf{U}\in\mathbb{R}^{n\times r}:\mathbf{U}^{\top}\mathbf{U}=\mathbf{I}_{r}\}$ and denote $\mathbb{O}_{r}=\mathbb{O}_{r,r}$. For a matrix $\mathbf{U}$, let $\mathbf{U}_{[i,]},\mathbf{U}_{[,r]}$ and $(\mathbf{U})_{ir}$ denote the $i$-th row, $r$-th column and element $(i,r)$ of $\mathbf{U}$, respectively. Let $\|\mathbf{U}\|_{2},\|\mathbf{U}\|_{F}$ denote its spectral and Frobenius norm, and $\|\mathbf{U}\|_{2\to\infty}=\max_{i}\|\mathbf{U}_{[i,]}\|$.
For any order-3 tensor $\mathcal{M}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$, let $\mathcal{M}_{[i,,]},\mathcal{M}_{[,j,]},\mathcal{M}_{[,,k]}$ and $(\mathcal{M})_{ijk}$ denote the $i$-th horizontal slice, $j$-th lateral slice, $k$-th frontal slice and element $(i,j,k)$ of $\mathcal{M}$, respectively. Let $\Psi_{k}(\mathcal{M})\in\mathbb{R}^{n_{k}\times n_{-k}}$ be the mode-$k$ unfolding of $\mathcal{M}$, where $n_{-k}=n_{1}n_{2}n_{3}/n_{k}$ for $k=1,2,3$. Specifically,

$$\Psi_{k}(\mathcal{M})\in\mathbb{R}^{n_{k}\times n_{-k}},~\text{where}~[\Psi_{k}(\mathcal{M})]_{i_{k},\,i_{k+1}+n_{k+1}(i_{k+2}-1)}=\mathcal{M}_{i_{1}i_{2}i_{3}},$$

where $k+1$ and $k+2$ are obtained modulo 3. We denote $\text{rank}(\mathcal{M})\leq(r_{1},r_{2},r_{3})$ if $\mathcal{M}$ admits the decomposition $\mathcal{M}=\mathcal{S}\times_{1}\mathbf{U}\times_{2}\mathbf{V}\times_{3}\mathbf{W}=:[\mathcal{S};\mathbf{U},\mathbf{V},\mathbf{W}]$ for some $\mathcal{S}\in\mathbb{R}^{r_{1}\times r_{2}\times r_{3}}$, $\mathbf{U}\in\mathbb{R}^{n_{1}\times r_{1}}$, $\mathbf{V}\in\mathbb{R}^{n_{2}\times r_{2}}$ and $\mathbf{W}\in\mathbb{R}^{n_{3}\times r_{3}}$.
For any order-3 tensor $\mathcal{M}$ with $\text{rank}(\mathcal{M})\leq(r_{1},r_{2},r_{3})$, define

$$\begin{aligned}
\overline{\lambda}(\mathcal{M}) &=\max\left\{\|\Psi_{1}(\mathcal{M})\|_{2},\|\Psi_{2}(\mathcal{M})\|_{2},\|\Psi_{3}(\mathcal{M})\|_{2}\right\},\\
\underline{\lambda}(\mathcal{M}) &=\min\left\{\sigma_{r_{1}}(\Psi_{1}(\mathcal{M})),\sigma_{r_{2}}(\Psi_{2}(\mathcal{M})),\sigma_{r_{3}}(\Psi_{3}(\mathcal{M}))\right\},
\end{aligned}$$

where $\sigma_{r}(\mathbf{M})$ denotes the $r$-th largest singular value of matrix $\mathbf{M}$. Let $\|\mathcal{M}\|_{F}=\sqrt{\sum_{i,j,k}m_{ijk}^{2}}$ be the Frobenius norm of $\mathcal{M}$. Throughout the paper, we use $c,C,\epsilon$ and $\kappa$ to denote positive constants whose values may vary according to context. For an integer $m$, let $[m]$ denote the set $\{1,...,m\}$. For two numbers $a$ and $b$, let $a\wedge b=\min(a,b)$. For two nonnegative sequences $a_{n}$ and $b_{n}$, let $a_{n}\preceq b_{n}$ and $a_{n}\prec b_{n}$ denote $a_{n}=O(b_{n})$ and $a_{n}=o(b_{n})$, respectively.
Denote $a_{n}\asymp b_{n}$ if $a_{n}\preceq b_{n}$ and $b_{n}\preceq a_{n}$. Further, $a_{n}\preceq_{P}b_{n}$ means that there exists a positive constant $c$ such that $\Pr(a_{n}\geq cb_{n})\to 0$ as $n$ diverges.
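
To make the unfolding convention concrete, the following NumPy sketch implements the mode-$k$ unfolding with the column ordering stated above, together with $\overline{\lambda}(\mathcal{M})$, $\underline{\lambda}(\mathcal{M})$ and the tensor Frobenius norm. The function names (`unfold`, `lambda_bar`, `lambda_underbar`, `frobenius_norm`) and the shift to 0-based array indices are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def unfold(M, k):
    """Mode-k unfolding Psi_k(M) of an order-3 array, k in {1, 2, 3}.

    With 0-based indices, entry M[i1, i2, i3] is placed in row i_k and
    column i_{k+1} + n_{k+1} * i_{k+2}, where k+1 and k+2 wrap around modulo 3.
    """
    k0 = k - 1
    a, b = (k0 + 1) % 3, (k0 + 2) % 3
    # C-order reshape makes the last axis vary fastest, so axis a goes last.
    return np.transpose(M, (k0, b, a)).reshape(M.shape[k0], -1)

def lambda_bar(M):
    """Largest spectral norm among the three unfoldings."""
    return max(np.linalg.norm(unfold(M, k), 2) for k in (1, 2, 3))

def lambda_underbar(M, ranks):
    """Smallest of the r_k-th largest singular values of the unfoldings."""
    values = []
    for k, r in zip((1, 2, 3), ranks):
        sing = np.linalg.svd(unfold(M, k), compute_uv=False)  # sorted descending
        values.append(sing[r - 1])
    return min(values)

def frobenius_norm(M):
    """Frobenius norm of a tensor: square root of the sum of squared entries."""
    return float(np.sqrt(np.sum(M ** 2)))
```

For a rank-$(1,1,1)$ tensor given by the outer product of three vectors, all three quantities coincide with the product of the vector norms, which offers a quick sanity check of the sketch.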

2 Proposed method

2.1 Poisson point process and tensor factorization

Consider a bipartite longitudinal network with $n_{1}$ out-nodes and $n_{2}$ in-nodes on a given time interval $[0,T)$, where $n_{1}$ and $n_{2}$ are not necessarily equal. Let ${\cal E}=\{(i_{m},j_{m},t_{m}):m=1,...,M\}$ denote the set of all observed directed edges, where the triplet $(i,j,t)$ denotes the occurrence of a temporal edge at time $t$ pointing from out-node $i$ to in-node $j$. Note that a temporal edge is instantaneous and appears at only a single time point. Let $y_{ij}(\cdot)$ be the point process that counts the number of directed edges out-node $i$ sends to in-node $j$ during $[0,T)$. Particularly, out-node $i$ sends a directed edge to in-node $j$ at time $t$ if and only if $dy_{ij}(t)=1$. For each node pair $(i,j)$, suppose the intensity of $y_{ij}(t)$ is governed by some underlying propensity $\theta_{ij}(t)$.
The larger $\theta_{ij}(t)$ is, the more likely out-node $i$ will send a directed edge to in-node $j$ during $[t,t+dt)$. More specifically, given $\bm{\Theta}=\{\bm{\Theta}(t)=(\theta_{ij}(t))_{n_{1}\times n_{2}}\}_{t\in[0,T)}$, we assume that $y_{ij}(\cdot)$'s are mutually independent Poisson processes such that

$$\mathbb{E}(dy_{ij}(t)\mid\theta_{ij}(t))=\lambda_{0}e^{\theta_{ij}(t)}dt, \tag{1}$$

where $\lambda_{0}>0$ is the baseline intensity. The log-likelihood function of $\{y_{ij}(t)\}_{1\leq i\leq n_{1},1\leq j\leq n_{2}}$ then becomes

$$l(\bm{\Theta})=\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{2}}\left\{\sum_{t\in\mathcal{T}_{ij}}\log\lambda_{ij}(t)-\int_{0}^{T}\lambda_{ij}(s)ds\right\}, \tag{2}$$

where $\lambda_{ij}(t)=\lambda_{0}\exp(\theta_{ij}(t))$ and $\mathcal{T}_{ij}$ is the set of time stamps of the directed edges from out-node $i$ to in-node $j$. Note that $\lambda_{0}$ is fixed throughout the paper, but it could also vary with $t$, which would require a more involved treatment.
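
As a minimal sketch of model (1) and the log-likelihood (2), consider the special case where the propensity does not change over time, $\theta_{ij}(t)\equiv\theta_{ij}$. This simplification is an assumption made here for illustration only. In that case each $y_{ij}(\cdot)$ is a homogeneous Poisson process, which can be simulated by drawing the total count from a Poisson distribution with mean $\lambda_{0}e^{\theta_{ij}}T$ and placing the event times uniformly on $[0,T)$, and (2) reduces to $\sum_{i,j}\{N_{ij}\log(\lambda_{0}e^{\theta_{ij}})-T\lambda_{0}e^{\theta_{ij}}\}$ with $N_{ij}=|\mathcal{T}_{ij}|$. The function names and the 0-based node labels below are hypothetical.

```python
import numpy as np

def simulate_edges(theta, lam0, T, rng):
    """Simulate temporal edges under model (1) with time-constant theta[i, j].

    Returns the time-sorted list of (i, j, t) triplets and the count matrix.
    """
    rate = lam0 * np.exp(theta)          # intensity lambda_ij, constant in t
    counts = rng.poisson(rate * T)       # number of edges per node pair on [0, T)
    edges = []
    for i, j in np.ndindex(*theta.shape):
        # Given the count, event times of a homogeneous process are i.i.d. uniform.
        for t in rng.uniform(0.0, T, size=counts[i, j]):
            edges.append((i, j, float(t)))
    edges.sort(key=lambda e: e[2])
    return edges, counts

def loglik_constant(counts, theta, lam0, T):
    """Log-likelihood (2) in the special case theta_ij(t) = theta[i, j] for all t."""
    rate = lam0 * np.exp(theta)
    return float(np.sum(counts * np.log(rate) - T * rate))
```

A time-varying $\theta_{ij}(t)$ would require evaluating the integral in (2) explicitly, or simulating by thinning, neither of which is covered by this sketch.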

Suppose $\bm{\Theta}(t)$ admits a low rank structure so that

$$\theta_{ij}(t)=\mathcal{S}\times_{1}\mathbf{u}_{i}^{\top}\times_{2}\mathbf{v}_{j}^{\top}\times_{3}\mathbf{w}(t)^{\top}, \tag{3}$$

where $\times_{s}$ denotes the mode-$s$ product for $s\in[3]$, $\mathcal{S}\in\mathbb{R}^{r_{1}\times r_{2}\times r_{3}}$ is an order-3 core tensor, and each out-node $i$, in-node $j$ and time $t$ are embedded as low-dimensional vectors $\mathbf{u}_{i}\in\mathbb{R}^{r_{1}},\mathbf{v}_{j}\in\mathbb{R}^{r_{2}}$ and $\mathbf{w}(t)\in\mathbb{R}^{r_{3}}$, respectively. It is clear that the time-invariant network structure is captured by the network embedding vectors $\mathbf{u}$ and $\mathbf{v}$, while the temporal structure is captured by the temporal embedding vector $\mathbf{w}(t)$. Such a network embedding model has been widely employed for network data analysis (Hoff et al., 2002; Lyu et al., 2023; Zhang et al., 2022; Zhen and Wang, 2023), which embeds the unstructured network in a low-dimensional Euclidean space to facilitate the subsequent analysis. It is also related to the random dot product graph model (Athreya et al., 2017; Rubin-Delanchy et al., 2022).
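
The multilinear form in (3) can be written as the triple sum $\theta_{ij}(t)=\sum_{a,b,c}\mathcal{S}_{abc}(\mathbf{u}_{i})_{a}(\mathbf{v}_{j})_{b}(\mathbf{w}(t))_{c}$. The sketch below evaluates it with `numpy.einsum`, both for a single triplet and for all node pairs at a finite grid of time points. Storing the embeddings as rows of arrays `U`, `V`, `W`, and evaluating $\mathbf{w}(t)$ only on a grid, are assumptions made for illustration, as are the function names.

```python
import numpy as np

def theta_entry(S, u_i, v_j, w_t):
    """Propensity theta_ij(t) in (3) for one out-node, one in-node, one time."""
    return float(np.einsum("abc,a,b,c->", S, u_i, v_j, w_t))

def tucker(S, U, V, W):
    """All propensities at once: entry (i, j, l) uses rows U[i], V[j], W[l]."""
    return np.einsum("abc,ia,jb,lc->ijl", S, U, V, W)

# Usage with arbitrary (hypothetical) dimensions: r = (2, 3, 2), n1 = 4, n2 = 5,
# and 6 time points.
rng = np.random.default_rng(1)
S = rng.normal(size=(2, 3, 2))
U = rng.normal(size=(4, 2))
V = rng.normal(size=(5, 3))
W = rng.normal(size=(6, 2))
Theta = tucker(S, U, V, W)   # shape (4, 5, 6)
```

Each mode-$k$ unfolding of the resulting array has rank at most $r_{k}$, which is what the low rank structure amounts to in matrix terms.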

2.2 Adaptive merging

Let 𝒢t={(i,j):(i,j,t)}subscript𝒢𝑡conditional-set𝑖𝑗𝑖𝑗𝑡{\cal G}_{t}=\{(i,j):(i,j,t)\in{\cal E}\}caligraphic_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { ( italic_i , italic_j ) : ( italic_i , italic_j , italic_t ) ∈ caligraphic_E } as the observed network at time t𝑡titalic_t, 𝒯ij={t[0,T):(i,j,t)}subscript𝒯𝑖𝑗conditional-set𝑡0𝑇𝑖𝑗𝑡\mathcal{T}_{ij}=\{t\in[0,T):(i,j,t)\in{\cal E}\}caligraphic_T start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = { italic_t ∈ [ 0 , italic_T ) : ( italic_i , italic_j , italic_t ) ∈ caligraphic_E } as the time stamps for directed edges (i,j)𝑖𝑗(i,j)( italic_i , italic_j ). Since the directed edges in {\cal E}caligraphic_E are observed in real time, 𝒢tsubscript𝒢𝑡{\cal G}_{t}caligraphic_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT can be extremely sparse and may consist of even only one observed edge, which casts great challenge for estimating the longitudinal network. To circumvent the difficulty of severe under-sampling, we propose to embed the longitudinal network by adaptively merging 𝒢tsubscript𝒢𝑡{\cal G}_{t}caligraphic_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT into relatively dense networks based on their temporal structures, which leads to a substantially improved estimation of the longitudinal network.

We first split the time window $[0,T)$ into $L$ equally spaced small intervals with endpoints $\{\delta_{l}\}_{l=1}^{L}$, where $\delta_{l}=l\Delta_{\bm{\delta}}$, $\delta_{0}=0$, and each interval $[\delta_{l-1},\delta_{l})$ is of width $\Delta_{\bm{\delta}}=T/L$. When $\Delta_{\bm{\delta}}$ is sufficiently small, it is expected that $\bm{\Theta}(t)$ shall be roughly constant within each time interval. As a direct consequence, $\bm{\Theta}(t)$ can be estimated by a low rank order-3 tensor $\mathcal{M}\in\mathbb{R}^{n_{1}\times n_{2}\times L}$, which admits a Tucker decomposition with rank $(r_{1},r_{2},r_{3})$,

$$\mathcal{M}=\mathcal{S}\times_{1}\mathbf{U}\times_{2}\mathbf{V}\times_{3}\mathbf{W},$$

with $\mathbf{u}_{i},\mathbf{v}_{j}$ and $\mathbf{w}_{l}$ being the corresponding rows of $\mathbf{U},\mathbf{V}$ and $\mathbf{W}$, respectively. Let $\bm{\delta}=(\delta_{1},...,\delta_{L})^{\top}$ and $\mathcal{Y}_{\bm{\delta}}\in\mathbb{R}^{n_{1}\times n_{2}\times L}$ with $(\mathcal{Y}_{\bm{\delta}})_{ijl}=|\mathcal{T}_{ij}\cap[\delta_{l-1},\delta_{l})|$ representing the number of temporal edges in each small interval.
An initial estimate $\widehat{\mathcal{M}}_{\bm{\delta}}=[\widehat{\mathcal{S}}_{\bm{\delta}};\widehat{\mathbf{U}}_{\bm{\delta}},\widehat{\mathbf{V}}_{\bm{\delta}},\widehat{\mathbf{W}}_{\bm{\delta}}]$ can be obtained by minimizing a certain distance measure between $\mathcal{M}$ and $\mathcal{Y}_{\bm{\delta}}$, to be specified in Section 2.3.
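
To make the construction of the count tensor concrete, the following is a minimal Python sketch, not the authors' implementation. The function names `equal_grid` and `count_tensor`, the 0-based node indices, and the representation of $\mathcal{E}$ as a list of `(i, j, t)` triples are assumptions made for illustration only.

```python
import numpy as np

def equal_grid(T, L):
    """Endpoints delta_1, ..., delta_L of L equally spaced intervals on [0, T)."""
    return np.arange(1, L + 1) * (T / L)

def count_tensor(edges, n1, n2, endpoints):
    """Count the temporal edges (i, j, t) falling in each [tau_{l-1}, tau_l).

    `endpoints` holds tau_1 < ... < tau_{n3}; tau_0 = 0 is implicit.
    Node indices i, j are 0-based (an assumption of this sketch).
    """
    endpoints = np.asarray(endpoints, dtype=float)
    Y = np.zeros((n1, n2, len(endpoints)), dtype=int)
    for i, j, t in edges:
        # Number of endpoints <= t, i.e. the 0-based index of the
        # half-open interval [tau_{l-1}, tau_l) that contains t.
        l = int(np.searchsorted(endpoints, t, side="right"))
        if t >= 0 and l < len(endpoints):  # drop time stamps outside [0, T)
            Y[i, j, l] += 1
    return Y
```

The same function serves for any partition of $[0,T)$, so it can be reused once the equally spaced grid is replaced by a coarser set of endpoints.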

Once the initial estimate $\widehat{\mathbf{W}}_{\bm{\delta}}=(\widehat{\mathbf{w}}_{1,\bm{\delta}},...,\widehat{\mathbf{w}}_{L,\bm{\delta}})^{\top}$ is obtained, define

$$\widetilde{\mathbf{W}}_{\bm{\delta}}=(\widetilde{\mathbf{w}}_{1,\bm{\delta}},...,\widetilde{\mathbf{w}}_{L,\bm{\delta}})^{\top}=\sqrt{L}\,\widehat{\mathbf{W}}_{\bm{\delta}}\big((\widehat{\mathbf{W}}_{\bm{\delta}})^{\top}\widehat{\mathbf{W}}_{\bm{\delta}}\big)^{-\frac{1}{2}}, \tag{4}$$

where $(\widehat{\mathbf{W}}_{\bm{\delta}})^{\top}\widehat{\mathbf{W}}_{\bm{\delta}}$ is invertible with high probability, as will be shown in the proof of Theorem 3. This is a normalization step to facilitate technical analysis. Though $\widetilde{\mathbf{W}}_{\bm{\delta}}$ is consistent, its estimation variance can be exceedingly large when $\Delta_{\bm{\delta}}$ is too small. We then propose to merge adjacent small intervals with similar temporal embedding vectors $\widetilde{\mathbf{w}}_{l,\bm{\delta}}$, so as to shrink the estimation variance without compromising the estimation bias.
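
The normalization in (4) can be sketched numerically as follows. This is an illustrative Python snippet under the assumption that the Gram matrix $(\widehat{\mathbf{W}}_{\bm{\delta}})^{\top}\widehat{\mathbf{W}}_{\bm{\delta}}$ is positive definite; the function name `normalize_temporal_factor` is hypothetical, and the inverse square root is computed through a symmetric eigendecomposition, which is one of several possible choices.

```python
import numpy as np

def normalize_temporal_factor(W_hat):
    """Return sqrt(L) * W_hat @ (W_hat^T W_hat)^{-1/2} for an L x r matrix W_hat.

    Assumes W_hat^T W_hat is positive definite (full column rank).
    """
    L = W_hat.shape[0]
    gram = W_hat.T @ W_hat
    # Symmetric eigendecomposition: gram = Q diag(evals) Q^T.
    evals, Q = np.linalg.eigh(gram)
    # Inverse square root: Q diag(evals^{-1/2}) Q^T (columns of Q are rescaled).
    inv_sqrt = (Q / np.sqrt(evals)) @ Q.T
    return np.sqrt(L) * W_hat @ inv_sqrt
```

By construction the output satisfies $\widetilde{\mathbf{W}}_{\bm{\delta}}^{\top}\widetilde{\mathbf{W}}_{\bm{\delta}}=L\,\mathbf{I}_{r_{3}}$, which is a convenient sanity check.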

Let $\mathcal{P}=\{\mathcal{P}_{1},...,\mathcal{P}_{K}\}$ denote the adaptively merged intervals, where for any $l_{1}\in\mathcal{P}_{k_{1}}$ and $l_{2}\in\mathcal{P}_{k_{2}}$, it holds that $l_{1}<l_{2}$ if $k_{1}<k_{2}$. Then, $\mathcal{P}$ can be estimated as

$$\widehat{\mathcal{P}}=\operatorname*{arg\,min}_{\mathcal{P}}\sum_{k=1}^{K}\sum_{l\in\mathcal{P}_{k}}\|\widetilde{\mathbf{w}}_{l,\bm{\delta}}-\bm{\mu}_{k}\|^{2}, \tag{5}$$

where $\bm{\mu}_{k}=|\mathcal{P}_{k}|^{-1}\sum_{l\in\mathcal{P}_{k}}\widetilde{\mathbf{w}}_{l,\bm{\delta}}$. Note that (5) is equivalent to seeking change points in the sequence $(\widetilde{\mathbf{w}}_{1,\bm{\delta}},...,\widetilde{\mathbf{w}}_{L,\bm{\delta}})$, and thus can be efficiently solved by multiple change point detection algorithms (Hao et al., 2013; Niu et al., 2016). Further, define $\widehat{\eta}_{k}=\Delta_{\bm{\delta}}\max\widehat{\mathcal{P}}_{k}$, so that $\widehat{\bm{\eta}}=(\widehat{\eta}_{1},...,\widehat{\eta}_{K})^{\top}$ consists of the estimated endpoints of the $K$ adaptively merged intervals.
Denote $\mathcal{Y}_{\widehat{\bm{\eta}}}\in\mathbb{R}^{n_{1}\times n_{2}\times K}$ with $(\mathcal{Y}_{\widehat{\bm{\eta}}})_{ijk}=|\mathcal{T}_{ij}\cap[\widehat{\eta}_{k-1},\widehat{\eta}_{k})|$ and $\widehat{\eta}_{0}=0$. The final estimate $\widehat{\mathcal{M}}_{\widehat{\bm{\eta}}}$ is then obtained by minimizing the distance measure between $\mathcal{M}$ and $\mathcal{Y}_{\widehat{\bm{\eta}}}$.
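
For a fixed $K$, the contiguous partition minimizing (5) can be found exactly by a standard dynamic program over segment boundaries. The Python sketch below illustrates this generic approach; it is not claimed to be the algorithm of the cited change point references, and the names `segment_cost_table`, `merge_intervals` and `merged_endpoints` are hypothetical.

```python
import numpy as np

def segment_cost_table(W):
    """cost[a, b] = within-segment sum of squares of rows a, ..., b-1 of W."""
    L, r = W.shape
    s1 = np.vstack([np.zeros(r), np.cumsum(W, axis=0)])           # prefix sums of rows
    s2 = np.concatenate([[0.0], np.cumsum((W ** 2).sum(axis=1))])  # prefix sums of ||w_l||^2
    cost = np.full((L + 1, L + 1), np.inf)
    for a in range(L):
        for b in range(a + 1, L + 1):
            seg_sum = s1[b] - s1[a]
            # sum ||w_l - mean||^2 = sum ||w_l||^2 - ||sum w_l||^2 / (segment length)
            cost[a, b] = (s2[b] - s2[a]) - seg_sum @ seg_sum / (b - a)
    return cost

def merge_intervals(W, K):
    """Return max(P_1), ..., max(P_K) (1-based) for the K-segment minimizer of (5)."""
    L = W.shape[0]
    cost = segment_cost_table(W)
    best = np.full((K + 1, L + 1), np.inf)  # best[k, b]: optimal cost of rows 0..b-1 in k segments
    back = np.zeros((K + 1, L + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for b in range(k, L + 1):
            # The last segment is rows a..b-1, with a ranging over k-1, ..., b-1.
            cands = best[k - 1, k - 1:b] + cost[k - 1:b, b]
            idx = int(np.argmin(cands))
            best[k, b] = cands[idx]
            back[k, b] = idx + (k - 1)
    ends, b = [], L
    for k in range(K, 0, -1):  # backtrack from the last segment to the first
        ends.append(b)
        b = int(back[k, b])
    return ends[::-1]

def merged_endpoints(ends, width):
    """eta_k = width * max(P_k), with `width` playing the role of Delta_delta."""
    return width * np.asarray(ends, dtype=float)
```

This sketch costs $O(KL^{2})$ operations after an $O(L^{2}r_{3})$ cost table, which is acceptable for moderate $L$; the choice of $K$ itself is not addressed here.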

2.3 Regularized likelihood estimation

Let $\bm{\tau}=(\tau_{1},\ldots,\tau_{n_{3}})^{\top}$ denote a generic partition of $[0,T)$ with $0=\tau_{0}<\tau_{1}<...<\tau_{n_{3}}=T$. Particularly, $\bm{\tau}$ could be the equally spaced intervals $\bm{\delta}$ for the initial estimate or the adaptively merged intervals $\widehat{\bm{\eta}}$ for the final estimate, and $n_{3}$ could be $L$ or $K$, correspondingly. For any $\mathcal{M}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$, we define

$$l(\mathcal{M};\bm{\tau})=\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{2}}\sum_{l=1}^{n_{3}}\big\{m_{ijl}\left|\mathcal{T}_{ij}\cap[\tau_{l-1},\tau_{l})\right|-e^{m_{ijl}}\lambda_{0}(\tau_{l}-\tau_{l-1})\big\}. \tag{6}$$
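
Expression (6) has the form of a Poisson-type log-likelihood in which cell $(i,j,l)$ has rate $\lambda_{0}e^{m_{ijl}}$ per unit time, up to terms not involving $\mathcal{M}$. A minimal Python sketch of its evaluation and of its entrywise derivative is given below; the function names are hypothetical, `Y` denotes the count tensor on the partition $\bm{\tau}$, and $\lambda_{0}$ is treated as a known scalar, which is an assumption of this sketch.

```python
import numpy as np

def interval_widths(tau):
    """Widths tau_l - tau_{l-1}, l = 1, ..., n3, with tau_0 = 0."""
    return np.diff(np.concatenate([[0.0], np.asarray(tau, dtype=float)]))

def log_likelihood(M, Y, tau, lambda0=1.0):
    """Evaluate l(M; tau) in (6); M and Y are n1 x n2 x n3 arrays."""
    widths = interval_widths(tau)  # broadcasts along the third mode
    return float(np.sum(M * Y - np.exp(M) * lambda0 * widths))

def grad_wrt_M(M, Y, tau, lambda0=1.0):
    """Entrywise derivative of l(M; tau) with respect to m_ijl."""
    return Y - np.exp(M) * lambda0 * interval_widths(tau)
```

Setting the derivative to zero entrywise gives $e^{m_{ijl}}=(\mathcal{Y}_{\bm{\tau}})_{ijl}/\{\lambda_{0}(\tau_{l}-\tau_{l-1})\}$ whenever the count is positive, that is, the unconstrained maximizer matches the empirical rate in each cell.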

When $\bm{\Theta}(t)$ is roughly constant in each interval, we consider the regularized formulation,

$$(\widehat{\mathcal{S}}_{\bm{\tau}},\widehat{\mathbf{U}}_{\bm{\tau}},\widehat{\mathbf{V}}_{\bm{\tau}},\widehat{\mathbf{W}}_{\bm{\tau}})=\operatorname*{arg\,min}_{\mathcal{S},\mathbf{U},\mathbf{V},\mathbf{W}}\ \left\{-l(\mathcal{M};\bm{\tau})+\gamma\mathcal{J}_{\bm{\tau}}(\mathbf{U},\mathbf{V},\mathbf{W})\right\}, \tag{7}$$

where $\gamma$ is the tuning parameter and $\mathcal{J}_{\bm{\tau}}(\mathbf{U},\mathbf{V},\mathbf{W})$ is the regularization term, which takes the form

$$\mathcal{J}_{\bm{\tau}}(\mathbf{U},\mathbf{V},\mathbf{W})=\frac{1}{4}\left\{\Big\|\frac{1}{n_{1}}\mathbf{U}^{\top}\mathbf{U}-\mathbf{I}_{r_{1}}\Big\|_{F}^{2}+\Big\|\frac{1}{n_{2}}\mathbf{V}^{\top}\mathbf{V}-\mathbf{I}_{r_{2}}\Big\|_{F}^{2}+\Big\|\frac{1}{n_{3}}\mathbf{W}^{\top}\mathbf{W}-\mathbf{I}_{r_{3}}\Big\|_{F}^{2}\right\},$$

encouraging orthogonality among the columns of $\mathbf{U},\mathbf{V}$ and $\mathbf{W}$. A similar regularization term has also been employed in Han et al. (2022), which involves some additional tuning parameters and thus requires more computational effort.
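
As an illustration of the regularization term, the following Python sketch evaluates $\mathcal{J}_{\bm{\tau}}$ and the gradient of one of its three summands. The function names are hypothetical; the gradient formula $\frac{1}{n}\mathbf{A}(\frac{1}{n}\mathbf{A}^{\top}\mathbf{A}-\mathbf{I}_{r})$ follows from differentiating $\frac{1}{4}\|\frac{1}{n}\mathbf{A}^{\top}\mathbf{A}-\mathbf{I}_{r}\|_{F}^{2}$ with respect to $\mathbf{A}$.

```python
import numpy as np

def orth_penalty(A):
    """|| A^T A / n - I_r ||_F^2 for an n x r factor matrix A."""
    n, r = A.shape
    return np.linalg.norm(A.T @ A / n - np.eye(r), "fro") ** 2

def regularizer(U, V, W):
    """J(U, V, W): one quarter of the sum of the three orthogonality penalties."""
    return 0.25 * (orth_penalty(U) + orth_penalty(V) + orth_penalty(W))

def quarter_penalty_grad(A):
    """Gradient of 0.25 * orth_penalty(A) with respect to A."""
    n, r = A.shape
    return A @ (A.T @ A / n - np.eye(r)) / n
```

The penalty vanishes exactly when the columns of each factor are orthogonal with squared norm equal to the number of rows, for instance $\mathbf{U}^{\top}\mathbf{U}=n_{1}\mathbf{I}_{r_{1}}$.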

3 Computation

Define $\mathcal{C}_{\mathcal{S}}=\{\mathcal{S}\in\mathbb{R}^{r_{1}\times r_{2}\times r_{3}}:\|\mathcal{S}\|_{F}\leq c_{\mathcal{S}}\}$, $\mathcal{C}_{\mathbf{U}}=\{\mathbf{U}\in\mathbb{R}^{n_{1}\times r_{1}}:\|\mathbf{U}\|_{2\to\infty}\leq c_{1}\}$, $\mathcal{C}_{\mathbf{V}}=\{\mathbf{V}\in\mathbb{R}^{n_{2}\times r_{2}}:\|\mathbf{V}\|_{2\to\infty}\leq c_{2}\}$, and $\mathcal{C}_{\mathbf{W}}=\{\mathbf{W}\in\mathbb{R}^{n_{3}\times r_{3}}:\|\mathbf{W}\|_{2\to\infty}\leq c_{3}\}$, where $c_{\mathcal{S}},c_{1},c_{2}$ and $c_{3}$ are constants. Here $n_{3}$ could be $L$ or $K$, and with a slight abuse of notation, we use a generic $\mathcal{C}_{\mathbf{W}}$. For any convex set $\mathcal{C}$, denote $\mathcal{P}_{\mathcal{C}}$ to be the projection operator onto $\mathcal{C}$.
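
Both types of constraint sets admit closed-form Euclidean projections: a Frobenius-norm ball is handled by a global rescaling, and a $2\to\infty$ norm ball (a bound on the largest row norm) by rescaling each offending row separately. The Python sketch below illustrates this; the function names and the small constant guarding against division by zero are assumptions of the sketch.

```python
import numpy as np

def project_frobenius_ball(S, c):
    """Project a tensor S onto {S : ||S||_F <= c}."""
    norm = np.linalg.norm(S)  # 2-norm of the flattened array, i.e. the Frobenius norm
    return S if norm <= c else S * (c / norm)

def project_row_norm_ball(A, c):
    """Project a matrix A onto {A : max_i ||A[i, :]||_2 <= c}."""
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    # Rows with norm <= c are left untouched; longer rows are shrunk to norm c.
    scale = np.minimum(1.0, c / np.maximum(row_norms, 1e-12))
    return A * scale
```

Since the row-norm constraint is a product of balls, one per row, projecting row by row gives the exact projection onto the whole set.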

We develop an efficient projected gradient descent (PGD) updating algorithm to solve the optimization task in (7). Choose an initializer (𝒮𝝉(0),𝐔𝝉(0),𝐕𝝉(0),𝐖𝝉(0))subscriptsuperscript𝒮0𝝉subscriptsuperscript𝐔0𝝉subscriptsuperscript𝐕0𝝉subscriptsuperscript𝐖0𝝉(\mathcal{S}^{(0)}_{\bm{\tau}},\mathbf{U}^{(0)}_{\bm{\tau}},\mathbf{V}^{(0)}_{% \bm{\tau}},\mathbf{W}^{(0)}_{\bm{\tau}})( caligraphic_S start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_U start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_V start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_W start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ) such that 𝒮𝝉(0)𝒞𝒮,𝐔𝝉(0)𝒞𝐔,𝐕𝝉(0)𝒞𝐕formulae-sequencesubscriptsuperscript𝒮0𝝉subscript𝒞𝒮formulae-sequencesubscriptsuperscript𝐔0𝝉subscript𝒞𝐔subscriptsuperscript𝐕0𝝉subscript𝒞𝐕\mathcal{S}^{(0)}_{\bm{\tau}}\in\mathcal{C}_{\mathcal{S}},~{}\mathbf{U}^{(0)}_% {\bm{\tau}}\in\mathcal{C}_{\mathbf{U}},~{}\mathbf{V}^{(0)}_{\bm{\tau}}\in% \mathcal{C}_{\mathbf{V}}caligraphic_S start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ∈ caligraphic_C start_POSTSUBSCRIPT caligraphic_S end_POSTSUBSCRIPT , bold_U start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ∈ caligraphic_C start_POSTSUBSCRIPT bold_U end_POSTSUBSCRIPT , bold_V start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ∈ caligraphic_C start_POSTSUBSCRIPT bold_V end_POSTSUBSCRIPT and 𝐖𝝉(0)𝒞𝐖subscriptsuperscript𝐖0𝝉subscript𝒞𝐖\mathbf{W}^{(0)}_{\bm{\tau}}\in\mathcal{C}_{\mathbf{W}}bold_W start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ∈ caligraphic_C start_POSTSUBSCRIPT bold_W end_POSTSUBSCRIPT, with 
𝐔𝝉(0)𝐔𝝉(0)=n1𝐈r1,𝐕𝝉(0)𝐕𝝉(0)=n2𝐈r2formulae-sequencesuperscriptsubscriptsuperscript𝐔0𝝉topsubscriptsuperscript𝐔0𝝉subscript𝑛1subscript𝐈subscript𝑟1superscriptsubscriptsuperscript𝐕0𝝉topsubscriptsuperscript𝐕0𝝉subscript𝑛2subscript𝐈subscript𝑟2{\mathbf{U}^{(0)}_{\bm{\tau}}}^{\top}{\mathbf{U}^{(0)}_{\bm{\tau}}}=n_{1}% \mathbf{I}_{r_{1}},~{}{\mathbf{V}^{(0)}_{\bm{\tau}}}^{\top}{\mathbf{V}^{(0)}_{% \bm{\tau}}}=n_{2}\mathbf{I}_{r_{2}}bold_U start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_U start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT = italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , bold_V start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_V start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT = italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT and 𝐖𝝉(0)𝐖𝝉(0)=n3𝐈r3superscriptsubscriptsuperscript𝐖0𝝉topsubscriptsuperscript𝐖0𝝉subscript𝑛3subscript𝐈subscript𝑟3{\mathbf{W}^{(0)}_{\bm{\tau}}}^{\top}{\mathbf{W}^{(0)}_{\bm{\tau}}}=n_{3}% \mathbf{I}_{r_{3}}bold_W start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_W start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT = italic_n start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT. 
Given (𝒮𝝉(r),𝐔𝝉(r),𝐕𝝉(r),𝐖𝝉(r))subscriptsuperscript𝒮𝑟𝝉subscriptsuperscript𝐔𝑟𝝉subscriptsuperscript𝐕𝑟𝝉subscriptsuperscript𝐖𝑟𝝉(\mathcal{S}^{(r)}_{\bm{\tau}},\mathbf{U}^{(r)}_{\bm{\tau}},\mathbf{V}^{(r)}_{% \bm{\tau}},\mathbf{W}^{(r)}_{\bm{\tau}})( caligraphic_S start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_U start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_V start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_W start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ) and 𝝉(r)=[𝒮𝝉(r);𝐔𝝉(r),𝐕𝝉(r),𝐖𝝉(r)]subscriptsuperscript𝑟𝝉subscriptsuperscript𝒮𝑟𝝉subscriptsuperscript𝐔𝑟𝝉subscriptsuperscript𝐕𝑟𝝉subscriptsuperscript𝐖𝑟𝝉\mathcal{M}^{(r)}_{\bm{\tau}}=[\mathcal{S}^{(r)}_{\bm{\tau}};\mathbf{U}^{(r)}_% {\bm{\tau}},\mathbf{V}^{(r)}_{\bm{\tau}},\mathbf{W}^{(r)}_{\bm{\tau}}]caligraphic_M start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT = [ caligraphic_S start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ; bold_U start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_V start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT , bold_W start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ], we implement the following updating scheme with step size ζ𝜁\zetaitalic_ζ:

𝐔𝝉(r+1)subscriptsuperscript𝐔𝑟1𝝉\displaystyle\mathbf{U}^{(r+1)}_{\bm{\tau}}bold_U start_POSTSUPERSCRIPT ( italic_r + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT =𝒫𝒞𝐔{𝐔𝝉(r)+ζ[n1l(𝝉(r);𝝉)𝐔γ𝐔𝝉(r)(1n1𝐔𝝉(r)𝐔𝝉(r)𝐈r1)]};absentsubscript𝒫subscript𝒞𝐔subscriptsuperscript𝐔𝑟𝝉𝜁delimited-[]subscript𝑛1𝑙subscriptsuperscript𝑟𝝉𝝉𝐔𝛾subscriptsuperscript𝐔𝑟𝝉1subscript𝑛1superscriptsubscriptsuperscript𝐔𝑟𝝉topsubscriptsuperscript𝐔𝑟𝝉subscript𝐈subscript𝑟1\displaystyle=\mathcal{P}_{\mathcal{C}_{\mathbf{U}}}\left\{\mathbf{U}^{(r)}_{% \bm{\tau}}+\zeta\left[n_{1}\frac{\partial l(\mathcal{M}^{(r)}_{\bm{\tau}};\bm{% \tau})}{\partial\mathbf{U}}-\gamma\mathbf{U}^{(r)}_{\bm{\tau}}\left(\frac{1}{n% _{1}}{\mathbf{U}^{(r)}_{\bm{\tau}}}^{\top}\mathbf{U}^{(r)}_{\bm{\tau}}-\mathbf% {I}_{r_{1}}\right)\right]\right\};= caligraphic_P start_POSTSUBSCRIPT caligraphic_C start_POSTSUBSCRIPT bold_U end_POSTSUBSCRIPT end_POSTSUBSCRIPT { bold_U start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT + italic_ζ [ italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT divide start_ARG ∂ italic_l ( caligraphic_M start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ; bold_italic_τ ) end_ARG start_ARG ∂ bold_U end_ARG - italic_γ bold_U start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG bold_U start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_U start_POSTSUPERSCRIPT ( italic_r ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT - bold_I start_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ] } ; (8)
$$\begin{aligned}
\mathbf{V}^{(r+1)}_{\bm{\tau}} &= \mathcal{P}_{\mathcal{C}_{\mathbf{V}}}\left\{\mathbf{V}^{(r)}_{\bm{\tau}}+\zeta\left[n_{2}\frac{\partial l(\mathcal{M}^{(r)}_{\bm{\tau}};\bm{\tau})}{\partial\mathbf{V}}-\gamma\mathbf{V}^{(r)}_{\bm{\tau}}\left(\frac{1}{n_{2}}{\mathbf{V}^{(r)}_{\bm{\tau}}}^{\top}\mathbf{V}^{(r)}_{\bm{\tau}}-\mathbf{I}_{r_{2}}\right)\right]\right\};\\
\mathbf{W}^{(r+1)}_{\bm{\tau}} &= \mathcal{P}_{\mathcal{C}_{\mathbf{W}}}\left\{\mathbf{W}^{(r)}_{\bm{\tau}}+\zeta\left[n_{3}\frac{\partial l(\mathcal{M}^{(r)}_{\bm{\tau}};\bm{\tau})}{\partial\mathbf{W}}-\gamma\mathbf{W}^{(r)}_{\bm{\tau}}\left(\frac{1}{n_{3}}{\mathbf{W}^{(r)}_{\bm{\tau}}}^{\top}\mathbf{W}^{(r)}_{\bm{\tau}}-\mathbf{I}_{r_{3}}\right)\right]\right\};\\
\mathcal{S}^{(r+1)}_{\bm{\tau}} &= \mathcal{P}_{\mathcal{C}_{\mathcal{S}}}\left\{\mathcal{S}^{(r)}_{\bm{\tau}}+\zeta\frac{\partial l(\mathcal{M}^{(r)}_{\bm{\tau}};\bm{\tau})}{\partial\mathcal{S}}\right\},
\end{aligned}$$

and let $\mathcal{M}^{(r+1)}_{\bm{\tau}}=[\mathcal{S}^{(r+1)}_{\bm{\tau}};\mathbf{U}^{(r+1)}_{\bm{\tau}},\mathbf{V}^{(r+1)}_{\bm{\tau}},\mathbf{W}^{(r+1)}_{\bm{\tau}}]$. We repeat the above updating scheme for a relatively large number of iterations, say $R$, and let $(\widehat{\mathcal{S}}_{\bm{\tau}},\widehat{\mathbf{U}}_{\bm{\tau}},\widehat{\mathbf{V}}_{\bm{\tau}},\widehat{\mathbf{W}}_{\bm{\tau}})=(\mathcal{S}_{\bm{\tau}}^{(R)},\mathbf{U}_{\bm{\tau}}^{(R)},\mathbf{V}_{\bm{\tau}}^{(R)},\mathbf{W}_{\bm{\tau}}^{(R)})$ be the initial estimate.

Remark 1.

We point out that the updating scheme in (8) differs from the standard projected gradient descent update, as different step sizes are used for different variables. Specifically, the step sizes for updating $\mathbf{U}^{(r)}_{\bm{\tau}},\mathbf{V}^{(r)}_{\bm{\tau}},\mathbf{W}^{(r)}_{\bm{\tau}},\mathcal{S}^{(r)}_{\bm{\tau}}$ are $n_{1}\zeta$, $n_{2}\zeta$, $n_{3}\zeta$ and $\zeta$, respectively. This is the key difference from the algorithm in Han et al. (2022), and it is also the reason that we do not require an additional tuning parameter in $\mathcal{J}_{\bm{\tau}}(\mathbf{U},\mathbf{V},\mathbf{W})$ and $\mathcal{J}_{\bm{\eta}}(\mathbf{U},\mathbf{V},\mathbf{W})$. As will be shown in Theorems 2 and 4, $\zeta$ is chosen as $\frac{c}{n_{1}n_{2}T}$.
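To make the role of the variable-specific step sizes concrete, the following NumPy sketch carries out one simultaneous update of the form (8). It is only an illustration under stated assumptions: the log-likelihood is taken to be the Poisson-type $l(\mathcal{M};\bm{\tau})$ recalled in Section 4.1, and the projections are assumed to be row-norm clipping for the factor matrices and Frobenius-norm clipping for the core, which may differ from the constraint sets $\mathcal{C}_{\mathbf{U}},\mathcal{C}_{\mathbf{V}},\mathcal{C}_{\mathbf{W}},\mathcal{C}_{\mathcal{S}}$ actually used. All function names are hypothetical.

```python
import numpy as np

def tucker(S, U, V, W):
    """Tucker product [S; U, V, W], an n1 x n2 x n3 tensor."""
    return np.einsum("abc,ia,jb,kc->ijk", S, U, V, W)

def loglik(S, U, V, W, Y, widths, lam0):
    """l(M; tau) with Y[i, j, k] the event count in interval k of width widths[k]."""
    M = tucker(S, U, V, W)
    return float(np.sum(M * Y - np.exp(M) * lam0 * widths))

def factor_grads(S, U, V, W, Y, widths, lam0):
    """Gradients of l with respect to U, V, W and S via the chain rule."""
    G = Y - np.exp(tucker(S, U, V, W)) * lam0 * widths  # dl/dM, entrywise
    gU = np.einsum("ijk,abc,jb,kc->ia", G, S, V, W)
    gV = np.einsum("ijk,abc,ia,kc->jb", G, S, U, W)
    gW = np.einsum("ijk,abc,ia,jb->kc", G, S, U, V)
    gS = np.einsum("ijk,ia,jb,kc->abc", G, U, V, W)
    return gU, gV, gW, gS

def project_rows(A, c):
    """ASSUMED projection: shrink every row with Euclidean norm above c."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A * np.minimum(1.0, c / np.maximum(norms, 1e-12))

def project_ball(S, c):
    """ASSUMED projection: shrink S if its Frobenius norm exceeds c."""
    nrm = np.linalg.norm(S)
    return S if nrm <= c else S * (c / nrm)

def pgd_step(S, U, V, W, Y, widths, lam0, zeta, gamma, radii):
    """One update; all gradients are evaluated at the current iterate."""
    cS, c1, c2, c3 = radii
    n1, n2, n3 = Y.shape
    gU, gV, gW, gS = factor_grads(S, U, V, W, Y, widths, lam0)

    def penalty(A, n):
        # gamma * A (A^T A / n - I), the regularization term in (8)
        return gamma * A @ (A.T @ A / n - np.eye(A.shape[1]))

    # Likelihood gradients are scaled by n1, n2, n3 for the factors and by 1 for the core.
    U_new = project_rows(U + zeta * (n1 * gU - penalty(U, n1)), c1)
    V_new = project_rows(V + zeta * (n2 * gV - penalty(V, n2)), c2)
    W_new = project_rows(W + zeta * (n3 * gW - penalty(W, n3)), c3)
    S_new = project_ball(S + zeta * gS, cS)
    return S_new, U_new, V_new, W_new
```

The scaling by $n_1$, $n_2$, $n_3$ applies only to the likelihood gradient of each factor matrix, while the regularization term and the core update use $\zeta$ directly, matching the bracketed expressions in (8).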

It remains to determine the number of merged intervals $K$ in (5). In particular, we set

$$\widehat{K}=\operatornamewithlimits{arg\,min}_{S}\big\{\min_{\mathcal{P}}\mathcal{L}(\mathcal{P};S)+\nu_{nT}S\big\}, \tag{9}$$

where $\mathcal{P}=\{\mathcal{P}_{1},\ldots,\mathcal{P}_{S}\}$ is an ordered partition of $[L]$, $\nu_{nT}$ is a quantity to be specified in Theorem 3, and

$$\mathcal{L}(\mathcal{P};S)=\frac{1}{L}\sum_{s=1}^{S}\sum_{l\in\mathcal{P}_{s}}\|\widetilde{\mathbf{w}}_{l,\bm{\delta}}-\bm{\mu}_{s}\|^{2}, \tag{10}$$

with $\bm{\mu}_{s}=|\mathcal{P}_{s}|^{-1}\sum_{l\in\mathcal{P}_{s}}\widetilde{\mathbf{w}}_{l,\bm{\delta}}$. More importantly, $\widehat{K}$ is a consistent estimator of $K$, as will be shown in Theorem 3. Establishing this can be technically more involved than estimating the number of change points in $(\widetilde{\mathbf{w}}_{1,\bm{\delta}},\ldots,\widetilde{\mathbf{w}}_{L,\bm{\delta}})$, due to the mutual dependence among $\widetilde{\mathbf{w}}_{1,\bm{\delta}},\ldots,\widetilde{\mathbf{w}}_{L,\bm{\delta}}$.
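The criterion in (9) and (10) can be evaluated exactly by dynamic programming when an ordered partition is read as a split of $1,\ldots,L$ into contiguous blocks. The sketch below rests on that reading, which is an assumption here, and the function names, the cap `S_max` on the number of blocks, and the returned block boundaries are illustrative choices rather than the procedure defined in (5).

```python
import numpy as np

def make_block_cost(w):
    """Return cost(a, b): within-block sum of squares of rows a, ..., b-1 of w."""
    csum = np.vstack([np.zeros(w.shape[1]), np.cumsum(w, axis=0)])
    csq = np.concatenate([[0.0], np.cumsum((w ** 2).sum(axis=1))])

    def cost(a, b):
        s = csum[b] - csum[a]
        # sum ||w_l - mean||^2 = sum ||w_l||^2 - ||sum w_l||^2 / (block size)
        return (csq[b] - csq[a]) - s @ s / (b - a)

    return cost

def select_num_blocks(w, nu, S_max):
    """Minimize min_P L(P; S) + nu * S over S = 1, ..., S_max, as in (9)."""
    L = w.shape[0]
    cost = make_block_cost(w)
    # best[s, b]: minimal total cost of splitting rows 0, ..., b-1 into s blocks
    best = np.full((S_max + 1, L + 1), np.inf)
    arg = np.zeros((S_max + 1, L + 1), dtype=int)
    best[0, 0] = 0.0
    for s in range(1, S_max + 1):
        for b in range(s, L + 1):
            for a in range(s - 1, b):
                val = best[s - 1, a] + cost(a, b)
                if val < best[s, b]:
                    best[s, b] = val
                    arg[s, b] = a
    objective = [best[s, L] / L + nu * s for s in range(1, S_max + 1)]
    K_hat = int(np.argmin(objective)) + 1
    # Backtrack the block boundaries of the minimizing partition.
    bounds = [L]
    b = L
    for s in range(K_hat, 0, -1):
        b = int(arg[s, b])
        bounds.append(b)
    return K_hat, bounds[::-1]
```

For instance, with `w` holding the rows $\widetilde{\mathbf{w}}_{l,\bm{\delta}}$, `select_num_blocks(w, nu, S_max)` returns the selected number of blocks together with boundaries `[0, b_1, ..., L]`. The triple loop uses on the order of `S_max * L**2` block-cost evaluations, each of which is cheap thanks to the prefix sums.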

Figure 1 gives a visual illustration of the proposed procedure, and Algorithm 1 gives the detailed implementation. The Tucker ranks $(r_{1},r_{2},r_{3})$ required as input to Algorithm 1 could be selected based on $\mathcal{Y}_{\bm{\delta}}$ in the same way as Han et al. (2022).

Figure 1: Flowchart for the estimation procedure, where $n=\max\{n_{1},n_{2}\}$ and the logarithmic factors are suppressed.
[Flowchart: $\mathcal{E}$ → (equal spaced merging, $L\asymp n\sqrt{T}$) → $(\bm{\delta},\mathcal{Y}_{\bm{\delta}})$ → (PGD) → $\widehat{\mathcal{M}}_{\bm{\delta}}$ → decision $T\succ n^{\frac{2}{3}}$; if yes → (adaptive merging) → $(\widehat{\bm{\eta}},\mathcal{Y}_{\widehat{\bm{\eta}}})$ → (PGD) → $\widehat{\mathcal{M}}_{\widehat{\bm{\eta}}}$.]
Algorithm 1 Estimating longitudinal networks via adaptive merging
1: Require: Temporal edges $\mathcal{E}=\{(i_{m},j_{m},t_{m})\}_{m=1}^{M}$ and $(n_{1},n_{2},T)$, Tucker ranks $(r_{1},r_{2},r_{3})$, baseline intensity $\lambda_{0}>0$, step size constant $c>0$, constraint parameters $(c_{\mathcal{S}},c_{1},c_{2},c_{3})$
2: Determine $L$ and the equal spaced partition $\bm{\delta}$ according to Table 1;
3: Formulate response tensor $\mathcal{Y}_{\bm{\delta}}$ based on $\mathcal{E}$;
4: Perform (8) based on $(\bm{\delta},\mathcal{Y}_{\bm{\delta}})$ to obtain $\widehat{\mathcal{M}}_{\bm{\delta}}=[\widehat{\mathcal{S}}_{\bm{\delta}};\widehat{\mathbf{U}}_{\bm{\delta}},\widehat{\mathbf{V}}_{\bm{\delta}},\widehat{\mathbf{W}}_{\bm{\delta}}]$;
5: if $T\preceq n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)$ then  $\triangleright$ See Section 4 for more details.
6:     return $\widehat{\mathcal{M}}_{\bm{\delta}}=[\widehat{\mathcal{S}}_{\bm{\delta}};\widehat{\mathbf{U}}_{\bm{\delta}},\widehat{\mathbf{V}}_{\bm{\delta}},\widehat{\mathbf{W}}_{\bm{\delta}}]$.
7: else
8:     Determine $K$ based on (9), and obtain $\widehat{\bm{\eta}}$ based on (5);
9:     Formulate response tensor $\mathcal{Y}_{\widehat{\bm{\eta}}}$;
10:     Perform (8) based on $(\widehat{\bm{\eta}},\mathcal{Y}_{\widehat{\bm{\eta}}})$ to obtain $\widehat{\mathcal{M}}_{\widehat{\bm{\eta}}}=[\widehat{\mathcal{S}}_{\widehat{\bm{\eta}}};\widehat{\mathbf{U}}_{\widehat{\bm{\eta}}},\widehat{\mathbf{V}}_{\widehat{\bm{\eta}}},\widehat{\mathbf{W}}_{\widehat{\bm{\eta}}}]$;
11:     return $\widehat{\mathcal{M}}_{\widehat{\bm{\eta}}}=[\widehat{\mathcal{S}}_{\widehat{\bm{\eta}}};\widehat{\mathbf{U}}_{\widehat{\bm{\eta}}},\widehat{\mathbf{V}}_{\widehat{\bm{\eta}}},\widehat{\mathbf{W}}_{\widehat{\bm{\eta}}}]$.
12: end if
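Steps 3 and 9 of Algorithm 1 both turn the temporal edges into a response tensor for a given partition. A minimal sketch of that step is given below, assuming the response tensor simply stores the event counts $|\mathcal{T}_{ij}\cap[\tau_{k-1},\tau_{k})|$ that appear in the log-likelihood of Section 4.1, that nodes are indexed from 0, and that all event times lie in $[0,T)$. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def response_tensor(edges, n1, n2, tau):
    """Count events per (sender i, receiver j, interval k).

    edges: iterable of (i, j, t) with 0-based node indices and 0 <= t < T.
    tau:   increasing boundaries (tau_0 = 0, ..., tau_{n3} = T).
    """
    i, j, t = (np.asarray(col) for col in zip(*edges))
    tau = np.asarray(tau, dtype=float)
    # Interval index k such that tau[k] <= t < tau[k + 1].
    k = np.searchsorted(tau, t, side="right") - 1
    Y = np.zeros((n1, n2, len(tau) - 1))
    # np.add.at accumulates correctly when the same (i, j, k) occurs repeatedly.
    np.add.at(Y, (i, j, k), 1.0)
    return Y

# Equal spaced partition of [0, T) into L intervals (L chosen arbitrarily here).
T, L = 2.0, 2
delta = np.linspace(0.0, T, L + 1)
edges = [(0, 1, 0.2), (0, 1, 0.3), (1, 0, 1.5), (0, 1, 1.0)]
Y_delta = response_tensor(edges, n1=2, n2=2, tau=delta)
```

The same function can be reused with the adaptively merged boundaries $\widehat{\bm{\eta}}$ in place of `delta`, since only the interval boundaries change between the two passes.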

4 Theory

Suppose the longitudinal network $\mathcal{G}_{t}$ is generated with $\bm{\Theta}^{*}(t)=\mathcal{S}^{*}\times_{1}\mathbf{U}^{*}\times_{2}\mathbf{V}^{*}\times_{3}\mathbf{w}^{*}(t)$, where $\text{rank}(\Psi_{s}(\mathcal{S}^{*}))=r_{s}$ for $s=1,2,3$, ${\mathbf{U}^{*}}^{\top}\mathbf{U}^{*}=n_{1}\mathbf{I}_{r_{1}}$, ${\mathbf{V}^{*}}^{\top}\mathbf{V}^{*}=n_{2}\mathbf{I}_{r_{2}}$ and $\int_{0}^{T}\mathbf{w}^{*}(t)\mathbf{w}^{*}(t)^{\top}dt=T\mathbf{I}_{r_{3}}$. Further, suppose $\mathbf{w}^{*}(t)$ is a piecewise constant function of $t$ in that $\mathbf{w}^{*}(t)=\mathbf{w}^{*}_{k,\bm{\eta}}$ for $t\in[\eta_{k-1},\eta_{k})$, where $\bm{\eta}=(\eta_{1},\ldots,\eta_{K_{0}})^{\top}$ with $0=\eta_{0}<\eta_{1}<\ldots<\eta_{K_{0}}=T$.
Let $\mathbf{W}^{*}_{\bm{\eta}}\in\mathbb{R}^{K_{0}\times r_{3}}$ with $(\mathbf{W}^{*}_{\bm{\eta}})_{[k,]}=\mathbf{w}^{*}_{k,\bm{\eta}}$, and $\mathcal{M}^{*}_{\bm{\eta}}=[\mathcal{S}^{*};\mathbf{U}^{*},\mathbf{V}^{*},\mathbf{W}^{*}_{\bm{\eta}}]$.

4.1 A new error bound for the PGD algorithm

We first derive the upper bound for the tensor estimation error in each iteration of the PGD algorithm (8). Let $\bm{\tau}=(\tau_{1},\ldots,\tau_{n_{3}})^{\top}\in\mathbb{R}^{n_{3}}$ with $0=\tau_{0}<\tau_{1}<\ldots<\tau_{n_{3}}=T$ be a generic partition of $[0,T)$, which could be $\bm{\delta}$ or $\widehat{\bm{\eta}}$. Recall that

$$l(\mathcal{M};\bm{\tau})=\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{2}}\sum_{k=1}^{n_{3}}\left\{m_{ijk}\big|\mathcal{T}_{ij}\cap[\tau_{k-1},\tau_{k})\big|-e^{m_{ijk}}\lambda_{0}(\tau_{k}-\tau_{k-1})\right\},$$

and define

$$\begin{aligned}
\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}=\Big\{&\mathcal{M}=[\mathcal{S};\mathbf{U},\mathbf{V},\mathbf{W}]:~\mathcal{S}\in\mathbb{R}^{r_{1}\times r_{2}\times r_{3}},\mathbf{U}\in\mathbb{R}^{n_{1}\times r_{1}},\mathbf{V}\in\mathbb{R}^{n_{2}\times r_{2}},\mathbf{W}\in\mathbb{R}^{n_{3}\times r_{3}},\\
&\|\mathcal{S}\|_{F}=\|\mathbf{U}\|_{F}=\|\mathbf{V}\|_{F}=\|\mathbf{W}\|_{F}=1,~\text{and at least two of the following hold:}\\
&\|\mathbf{U}\|_{2\to\infty}\leq 2c_{1}n_{1}^{-1/2},~~\|\mathbf{V}\|_{2\to\infty}\leq 2c_{2}n_{2}^{-1/2},~~\|\mathbf{W}\|_{2\to\infty}\leq 2c_{3}n_{3}^{-1/2}\Big\}.
\end{aligned} \tag{11}$$

Given $\bm{\tau}$, for any tensor $\overline{\mathcal{M}}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$, define $\overline{\mathcal{M}}_{\bm{\tau}}(t)=(\overline{\mathcal{M}})_{[,,l]}$ for any $t\in[\tau_{l-1},\tau_{l})$, and

$$\xi_{\bm{\tau}}(\overline{\mathcal{M}})=\sup_{\mathcal{M}\in\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}}\big|\langle\nabla l(\overline{\mathcal{M}};\bm{\tau}),\mathcal{M}\rangle\big|,$$

which essentially quantifies the difference between ¯¯\overline{\mathcal{M}}over¯ start_ARG caligraphic_M end_ARG and a stationary point of l(;𝝉)𝑙𝝉l(\cdot;\bm{\tau})italic_l ( ⋅ ; bold_italic_τ ) (Han et al.,, 2022). Actually, ξ𝝉(¯)subscript𝜉𝝉¯\xi_{\bm{\tau}}(\overline{\mathcal{M}})italic_ξ start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ( over¯ start_ARG caligraphic_M end_ARG ) also measures the amplitude of l(¯;𝝉)𝑙¯𝝉\nabla l(\overline{\mathcal{M}};\bm{\tau})∇ italic_l ( over¯ start_ARG caligraphic_M end_ARG ; bold_italic_τ ) projected onto the manifold of tensors with ranks (r1,r2,r3)subscript𝑟1subscript𝑟2subscript𝑟3(r_{1},r_{2},r_{3})( italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) under the incoherence conditions. It is important to remark that the incoherence conditions in 𝒞~,𝝉subscript~𝒞𝝉\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}over~ start_ARG caligraphic_C end_ARG start_POSTSUBSCRIPT caligraphic_M , bold_italic_τ end_POSTSUBSCRIPT are the key factors to relax the strong intensity condition as required in Han et al., (2022). Furthermore, note that

ξ𝝉(¯)subscript𝜉𝝉¯absent\displaystyle\xi_{\bm{\tau}}(\overline{\mathcal{M}})\leqitalic_ξ start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ( over¯ start_ARG caligraphic_M end_ARG ) ≤ sup𝒞~,𝝉|l(¯;𝝉)𝔼l(¯;𝝉),|+sup𝒞~,𝝉|𝔼l(¯;𝝉),|subscriptsupremumsubscript~𝒞𝝉𝑙¯𝝉𝔼𝑙¯𝝉subscriptsupremumsubscript~𝒞𝝉𝔼𝑙¯𝝉\displaystyle\sup_{\mathcal{M}\in\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau% }}}\big{|}\langle\nabla l(\overline{\mathcal{M}};\bm{\tau})-\mathbb{E}\nabla l% (\overline{\mathcal{M}};\bm{\tau}),\mathcal{M}\rangle\big{|}+\sup_{\mathcal{M}% \in\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}}\big{|}\langle\mathbb{E}% \nabla l(\overline{\mathcal{M}};\bm{\tau}),\mathcal{M}\rangle\big{|}roman_sup start_POSTSUBSCRIPT caligraphic_M ∈ over~ start_ARG caligraphic_C end_ARG start_POSTSUBSCRIPT caligraphic_M , bold_italic_τ end_POSTSUBSCRIPT end_POSTSUBSCRIPT | ⟨ ∇ italic_l ( over¯ start_ARG caligraphic_M end_ARG ; bold_italic_τ ) - blackboard_E ∇ italic_l ( over¯ start_ARG caligraphic_M end_ARG ; bold_italic_τ ) , caligraphic_M ⟩ | + roman_sup start_POSTSUBSCRIPT caligraphic_M ∈ over~ start_ARG caligraphic_C end_ARG start_POSTSUBSCRIPT caligraphic_M , bold_italic_τ end_POSTSUBSCRIPT end_POSTSUBSCRIPT | ⟨ blackboard_E ∇ italic_l ( over¯ start_ARG caligraphic_M end_ARG ; bold_italic_τ ) , caligraphic_M ⟩ | (12)
=\displaystyle== I1+I2,subscript𝐼1subscript𝐼2\displaystyle I_{1}+I_{2},italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ,

where I1subscript𝐼1I_{1}italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT characterize the amplitude of the statistical noise, and I2subscript𝐼2I_{2}italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT quantifies the bias between ¯𝝉(t)subscript¯𝝉𝑡\overline{\mathcal{M}}_{\bm{\tau}}(t)over¯ start_ARG caligraphic_M end_ARG start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ( italic_t ) and 𝚯(t)superscript𝚯𝑡\bm{\Theta}^{*}(t)bold_Θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t ). To see this, if l(;𝝉)l(;\bm{\tau})italic_l ( ; bold_italic_τ ) is deterministic and ¯¯\overline{\mathcal{M}}over¯ start_ARG caligraphic_M end_ARG is a stationary point, I1=0subscript𝐼10I_{1}=0italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0; and if ¯𝝉(t)=𝚯(t)subscript¯𝝉𝑡superscript𝚯𝑡\overline{\mathcal{M}}_{\bm{\tau}}(t)=\bm{\Theta}^{*}(t)over¯ start_ARG caligraphic_M end_ARG start_POSTSUBSCRIPT bold_italic_τ end_POSTSUBSCRIPT ( italic_t ) = bold_Θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t ),

\[
\begin{aligned}
\mathbb{E}\nabla l(\overline{\mathcal{M}};\bm{\tau})_{[i,j,k]} &= \mathbb{E}\left|\mathcal{T}_{ij}\cap[\tau_{k-1},\tau_k)\right| - e^{\overline{m}_{ijk}}\lambda_0(\tau_k-\tau_{k-1}) \\
&= \lambda_0\big(e^{\theta_{ij}^*(\tau_{k-1})}-e^{\overline{m}_{ijk}}\big)(\tau_k-\tau_{k-1}) = 0 \quad \text{for any } (i,j,k),
\end{aligned}
\]

and thus $I_2=0$, whereas $I_2$ would be substantially larger than 0 if $\overline{\mathcal{M}}_{\bm{\tau}}(t)$ differs from $\bm{\Theta}^*(t)$.
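The role of $I_2$ as a bias term can be checked numerically on a single entry. The sketch below evaluates the expected gradient entry from the display above, assuming (as in that display) that $\theta_{ij}^*$ is constant on $[\tau_{k-1},\tau_k)$; the function name and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def expected_gradient_entry(theta_star, m_bar, lam0, tau_prev, tau_next):
    """Expected (i, j, k) gradient entry, assuming theta* is constant
    on [tau_prev, tau_next) so the event count is Poisson."""
    width = tau_next - tau_prev
    # Expected number of events falling in the interval.
    expected_count = lam0 * math.exp(theta_star) * width
    # Subtract the model-implied count under the candidate value m_bar.
    return expected_count - math.exp(m_bar) * lam0 * width

# Illustrative values only (not taken from the paper).
matched = expected_gradient_entry(0.3, 0.3, 2.0, 1.0, 1.5)
mismatched = expected_gradient_entry(0.3, 0.0, 2.0, 1.0, 1.5)
print(matched, mismatched)
```

When `m_bar` equals `theta_star` the entry vanishes, which is the entrywise version of $I_2=0$; any mismatch leaves a nonzero expected gradient that feeds into $I_2$.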

Note that if $\bm{\tau}$ is not a superset of $\bm{\eta}$, there exists no $\overline{\mathcal{M}}$ such that $\overline{\mathcal{M}}_{\bm{\tau}}(t)=\bm{\Theta}^*(t)$ for all $t$. Therefore, for any pre-specified tensor $\overline{\mathcal{M}}$ with bounded $\xi_{\bm{\tau}}(\overline{\mathcal{M}})$, its estimation error by $\mathcal{M}_{\bm{\tau}}^{(R)}$ is established in Theorem 1.

Theorem 1.

Let $\overline{\mathcal{M}}=[\overline{\mathcal{S}};\overline{\mathbf{U}},\overline{\mathbf{V}},\overline{\mathbf{W}}]\in\mathbb{R}^{n_1\times n_2\times n_3}$ be a pre-specified order-3 tensor with $\overline{\mathcal{S}}\in\mathcal{C}_{\mathcal{S}}$, $\overline{\mathbf{U}}\in\mathcal{C}_{\mathbf{U}}$, $\overline{\mathbf{V}}\in\mathcal{C}_{\mathbf{V}}$, $\overline{\mathbf{W}}\in\mathcal{C}_{\mathbf{W}}$, $\overline{\mathbf{U}}^{\top}\overline{\mathbf{U}}=n_1\mathbf{I}_{r_1}$, $\overline{\mathbf{V}}^{\top}\overline{\mathbf{V}}=n_2\mathbf{I}_{r_2}$, $\overline{\mathbf{W}}^{\top}\overline{\mathbf{W}}=n_3\mathbf{I}_{r_3}$, and $\underline{\lambda}(\overline{\mathcal{S}})\asymp\overline{\lambda}(\overline{\mathcal{S}})\asymp 1$.

Suppose there exists a quantity $H\in(0,T]$ such that $\min_{1\leq l\leq n_3}(\tau_l-\tau_{l-1})\asymp\max_{1\leq l\leq n_3}(\tau_l-\tau_{l-1})\asymp H$. Further suppose $\gamma\asymp n_1n_2n_3H$ and $\xi_{\bm{\tau}}(\overline{\mathcal{M}})\preceq_P\sqrt{n_1n_2n_3}H$.
Then there exists $c_0>0$ such that for any step size $\zeta=\frac{c}{n_1n_2n_3H}$ with $0<c<c_0$, we have

\[
\frac{1}{n_1n_2n_3}\|\mathcal{M}^{(R)}_{\bm{\tau}}-\overline{\mathcal{M}}\|_F^2 \preceq_P \frac{\xi_{\bm{\tau}}(\overline{\mathcal{M}})^2}{n_1n_2n_3H^2}+c_1(1-\kappa)^{R},
\tag{13}
\]

for some constants $0<\kappa<1$ and $c_1>0$.

The term $c_1(1-\kappa)^{R}$ in (13) is the optimization error, which decays at a linear (geometric) rate in the number of iterations $R$. Thanks to the regularizer in (7) and the restricted correlated gradient condition (Han et al., 2022) of the log-likelihood function, $\mathcal{M}^{(R)}_{\bm{\tau}}$ converges to a stationary point of the log-likelihood function at a linear rate as $R$ grows. For sufficiently large $R$, the upper bound on the right-hand side of (13) is thus dominated by the first term. In what follows, we choose $\overline{\mathcal{M}}$ to be $\mathcal{M}^*_{\bm{\delta}}$, which is specified in Theorem 2, for the initial estimate, and $\mathcal{M}^*_{\bm{\eta}}$ for the final estimate. The asymptotic orders of the corresponding $\xi_{\bm{\tau}}(\overline{\mathcal{M}})$ are established in Theorems 2 and 4.
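To make "dominated by the first term" concrete, one can ask how many iterations are needed before the optimization error in (13) drops below the statistical term. The sketch below solves $c_1(1-\kappa)^R\leq s$ for the smallest integer $R$, where $s$ stands for the first term of (13). The constants $c_1$ and $\kappa$ are not specified by Theorem 1, so the values used here are purely hypothetical, as is the function name.

```python
import math

def iterations_to_match(stat_term, c1, kappa):
    """Smallest R such that c1 * (1 - kappa)**R <= stat_term."""
    if stat_term <= 0 or c1 <= 0 or not (0 < kappa < 1):
        raise ValueError("need stat_term > 0, c1 > 0 and 0 < kappa < 1")
    if c1 <= stat_term:
        return 0  # optimization error is already below the statistical term
    # Solve (1 - kappa)**R <= stat_term / c1 by taking logarithms.
    return math.ceil(math.log(c1 / stat_term) / -math.log(1.0 - kappa))

# Hypothetical constants: c1 = 1, kappa = 0.5, statistical term 1e-3.
R = iterations_to_match(1e-3, 1.0, 0.5)
print(R, 1.0 * (1 - 0.5) ** R)
```

Because the decay is geometric, the required $R$ grows only logarithmically in $1/s$, which is why the optimization error is usually negligible after a moderate number of iterations.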

It is interesting to remark that a similar upper bound for the tensor estimation error is established in Han et al. (2022). Yet, Theorem 1 differs from Han et al. (2022) in that the space $\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}$ associated with the empirical process $\xi_{\bm{\tau}}$ is reduced by requiring the additional incoherence conditions $\|\mathbf{U}\|_{2\to\infty}\leq\mu_1$, $\|\mathbf{V}\|_{2\to\infty}\leq\mu_2$ and $\|\mathbf{W}\|_{2\to\infty}\leq\mu_3$ with $\mu_k=\sqrt{\log(n_k)/n_k}$. The incoherence conditions for $\mathbf{U},\mathbf{V},\mathbf{W}$ in $\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}$ are the key ingredients for deriving the convergence rate of the tensor estimation error across all intensity regimes, in contrast to the results in Han et al. (2022) and Cai et al. (2023), which require the strong intensity condition.

The following proposition quantifies the Poisson tensor estimation error in the strong, medium and weak intensity regimes. The error bound in the strong intensity regime has been established in Han et al. (2022), while the error bounds in the other two regimes are new additions to the literature. It will be shown in Sections 4.2 and 4.3 that the derived error bounds in the medium and weak intensity regimes are of great importance.

Proposition 1.

Let $\mathcal{Y}\in\mathbb{R}^{n_1\times n_2\times n_3}$ be a random tensor whose entries follow the Poisson distribution with mean $I\exp(\overline{\mathcal{M}})$, with $I>0$ and $\overline{\mathcal{M}}=[\overline{\mathcal{S}};\overline{\mathbf{U}},\overline{\mathbf{V}},\overline{\mathbf{W}}]\in\mathbb{R}^{n_1\times n_2\times n_3}$ satisfying the same conditions as in Theorem 1.
For any $\mathcal{M}\in\mathbb{R}^{n_1\times n_2\times n_3}$, let $l(\mathcal{M})=\sum_{i,j,l}(m_{ijl}y_{ijl}-Ie^{m_{ijl}})$, and let $\widehat{\mathcal{M}}^{(R)}$ be the estimate after $R$ iterations of (8), where $l(\mathcal{M};\bm{\tau})$ is replaced by $l(\mathcal{M})$.
Suppose $\gamma\asymp n_1n_2n_3I$ and $\xi(\overline{\mathcal{M}}):=\sup_{\mathcal{M}\in\widetilde{\mathcal{C}}_{\mathcal{M}}}\big|\langle\nabla l(\overline{\mathcal{M}}),\mathcal{M}\rangle\big|\preceq_P\sqrt{n_1n_2n_3}I$, where $\widetilde{\mathcal{C}}_{\mathcal{M}}$ is defined the same as $\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}$ in (4.1). Then there exists $c_0>0$ such that for any step size $\zeta=\frac{c}{n_1n_2n_3I}$ with $0<c<c_0$, we have

\[
\frac{1}{n_1n_2n_3}\|\widehat{\mathcal{M}}^{(R)}-\overline{\mathcal{M}}\|_F^2 \preceq C(1-\kappa)^{R}+\left\{\begin{aligned}
&\frac{n_1+n_2+n_3}{n_1n_2n_3I}, && \mbox{if } I\succ\log(n_1n_2n_3),\\
&\frac{(n_1+n_2+n_3)\log^{1+\epsilon}(n_1n_2n_3)}{n_1n_2n_3I}, && \mbox{if } 1\preceq I\preceq\log(n_1n_2n_3),\\
&\frac{(n_1+n_2+n_3)\log(n_1n_2n_3)}{n_1n_2n_3I}, && \mbox{if } \frac{\log(n_1n_2n_3)}{n_1\wedge n_2\wedge n_3}\prec I\prec 1,
\end{aligned}\right.
\]

for some constants $0<\kappa<1$ and $C>0$.

Remark 2.

Thanks to the new error bound for the PGD algorithm in Theorem 1, the strong intensity condition $I\succ\log(n_1n_2n_3)$ required in Han et al. (2022) and Cai et al. (2023) can be relaxed, and a similar upper bound can be obtained even when $I$ decays to $0$. To the best of our knowledge, Proposition 1 gives the first Poisson tensor estimation error bound in both the weak intensity regime with $I\prec 1$ and the medium intensity regime with $1\preceq I\preceq\log(n_1n_2n_3)$. If we suppress the logarithmic factor, the error bound is essentially $\frac{n_1+n_2+n_3}{n_1n_2n_3I}$ in all regimes.
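The three cases of Proposition 1 can be tabulated with a small helper. The sketch below returns the regime label and the order of the statistical term with all constants dropped. It replaces the asymptotic relations $\prec$ and $\preceq$ by plain inequalities with constant 1, which is an assumption made only so that the example is computable; the function name and the default `eps` are likewise hypothetical.

```python
import math

def poisson_tensor_rate(n1, n2, n3, intensity, eps=0.1):
    """Regime label and order of the statistical error term (constants dropped).

    Asymptotic orders are replaced by plain inequalities for illustration.
    """
    size = n1 * n2 * n3
    log_size = math.log(size)
    base = (n1 + n2 + n3) / (size * intensity)
    if intensity > log_size:
        return "strong", base
    if intensity >= 1:
        return "medium", base * log_size ** (1 + eps)
    if intensity > log_size / min(n1, n2, n3):
        return "weak", base * log_size
    raise ValueError("intensity is below the weak regime covered by the bound")

# Illustrative dimensions: a 100 x 100 x 100 tensor.
for intensity in (20.0, 5.0, 0.5):
    print(intensity, poisson_tensor_rate(100, 100, 100, intensity))
```

Across the three calls the returned order differs from $\frac{n_1+n_2+n_3}{n_1n_2n_3I}$ only by a logarithmic factor, in line with the last sentence of the remark.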

4.2 Error analysis based on equally spaced intervals

Let $n=\max\{n_1,n_2\}$ for simplicity and suppose $n_1\asymp n_2\asymp n$. Define $d_{\min}=\min_{1\leq k\leq K_0}(\eta_k-\eta_{k-1})/T$, $d_{\max}=\max_{1\leq k\leq K_0}(\eta_k-\eta_{k-1})/T$ and $\Delta_{\bm{\eta}}=d_{\min}T$. Suppose $d_{\min}\asymp d_{\max}\asymp 1/K_0$, which requires that the lengths of all intervals based on $\bm{\eta}$ are of the same order.
Further suppose $\|\mathcal{S}^*\|_F\leq c_{\mathcal{S}}/\max\{2,(K_0d_{\min})^{-1/2}\}$, $\|\mathbf{U}^*\|_{2\to\infty}\leq c_1$, $\|\mathbf{V}^*\|_{2\to\infty}\leq c_2$ and $\sup_{t\in[0,T)}\|\mathbf{w}^*(t)\|\leq c_3/\max\{2,\sqrt{K_0d_{\max}}\}$, where $(c_{\mathcal{S}},c_1,c_2,c_3)$ are defined in $(\mathcal{C}_{\mathcal{S}},\mathcal{C}_{\mathbf{U}},\mathcal{C}_{\mathbf{V}},\mathcal{C}_{\mathbf{W}})$ at the beginning of Section 3, and the different requirements for $\|\mathcal{S}^*\|_F$ and $\|\mathbf{w}^*(t)\|$ are due to the normalization step (4). Recall that $\Delta_{\bm{\delta}}=T/L$. Theorem 2 establishes the tensor error bound for the initial estimate based on the equally spaced intervals $\bm{\delta}$.

Theorem 2.

(Initial estimate) Choose $\gamma\asymp n^2T$ and $\zeta=\frac{c}{n^2T}$ for some small constant $c>0$. Then, with probability approaching 1, it holds true that

\[
\frac{1}{n_1n_2L}\|\mathcal{M}^{(R)}_{\bm{\delta}}-\mathcal{M}^*_{\bm{\delta}}\|_F^2 \preceq I_{1,\bm{\delta}}+I_{2,\bm{\delta}}+I_{3,\bm{\delta}},
\]

where $\mathcal{M}^*_{\bm{\delta}}=[\mathcal{S}^*;\mathbf{U}^*,\mathbf{V}^*,\mathbf{W}^*_{\bm{\delta}}]$ with $\mathbf{W}^*_{\bm{\delta}}\in\mathbb{R}^{L\times r_3}$ such that $(\mathbf{W}^*_{\bm{\delta}})_{[l,]}=\mathbf{w}^*(\delta_{l-1})$. Here

\[
I_{1,\bm{\delta}}=\left\{\begin{aligned}
&\frac{1}{nT}+\frac{L}{n^2T}, && \mbox{if } \log(nT)\prec\Delta_{\bm{\delta}}\prec\frac{T}{K_0},\\
&\frac{\log^{1+\epsilon}(nT)}{nT}+\frac{L\log^{1+\epsilon}(nT)}{n^2T}, && \mbox{if } 1\preceq\Delta_{\bm{\delta}}\preceq\log(nT),\\
&\frac{\log(nT)}{nT}+\frac{L\log(nT)}{n^2T}, && \mbox{if } \frac{(n+L)^2\log(nT)}{n^2L}\prec\Delta_{\bm{\delta}}\prec 1,
\end{aligned}\right.
\]

for any $\epsilon>0$, $I_{2,\bm{\delta}}=K_{0}/L$ and $I_{3,\bm{\delta}}=C(1-\kappa)^{R}$ for some constants $C$ and $0<\kappa<1$.

The terms $I_{1,\bm{\delta}}$, $I_{2,\bm{\delta}}$ and $I_{3,\bm{\delta}}$ correspond, respectively, to the estimation variance, the bias induced by network merging, and the optimization error of (8) after $R$ iterations. If we suppress the logarithmic factor, the estimation variance satisfies $I_{1,\bm{\delta}}\asymp\frac{1}{nT}+\frac{L}{n^{2}T}$, which matches the minimax lower bound for the tensor estimation error in Poisson PCA (Han et al., 2022).

Remark 3.

Given the partition $\bm{\delta}$, the problem becomes estimating the low-rank $\mathcal{M}_{\bm{\delta}}$ based on $\mathcal{Y}_{\bm{\delta}}$ with $(\mathcal{Y}_{\bm{\delta}})_{ijl}=|\mathcal{T}_{ij}\cap[\delta_{l-1},\delta_{l})|$, where $(\mathcal{Y}_{\bm{\delta}})_{ijl}$ follows the Poisson distribution with intensity $\int_{(l-1)\Delta_{\bm{\delta}}}^{l\Delta_{\bm{\delta}}}\bm{\theta}_{ij}^{*}(t)dt\propto\Delta_{\bm{\delta}}$. The results in Han et al. (2022) and Cai et al. (2023) require that $\Delta_{\bm{\delta}}\succ\log(nT)$, that is, the intensity needs to be "strong", whereas Theorem 2 still holds when $\Delta_{\bm{\delta}}\preceq\log(nT)$ or even $\Delta_{\bm{\delta}}\preceq 1$. As will be shown in Corollary 1 and Remark 4, allowing $\Delta_{\bm{\delta}}\prec 1$ leads to a faster convergence rate in certain scenarios. We establish the upper bound in the weak and medium intensity regimes by exploiting a more delicate concentration inequality. Thanks to the additional incoherence conditions in Theorem 1, we use the Chernoff bound coupled with Bernstein's inequality (Proposition 2.10, Wainwright, 2019) to show that for any $\mathcal{M}\in\widetilde{\mathcal{C}}_{\mathcal{M},\bm{\tau}}$, $\big|\langle\nabla l(\overline{\mathcal{M}};\bm{\tau}),\mathcal{M}\rangle\big|$ still has a sub-Gaussian tail bound within the required scope, as is the case under the strong intensity condition.
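The construction of the count tensor $\mathcal{Y}_{\bm{\delta}}$ in Remark 3 can be illustrated with a minimal Python sketch. It assumes equally spaced endpoints $\delta_l = lT/L$ and events stored as triples `(i, j, t)`; the function name `count_tensor` and this input format are illustrative choices, not part of the paper.

```python
import numpy as np

def count_tensor(events, n1, n2, T, L):
    """Bin timestamped edges (i, j, t) into an n1 x n2 x L count tensor.

    Entry (i, j, l) counts the events of pair (i, j) that fall in the
    half-open interval [delta_l, delta_{l+1}) with delta_l = l * T / L
    (zero-based l, an assumption of this sketch).
    """
    edges = np.linspace(0.0, T, L + 1)  # delta_0, ..., delta_L
    Y = np.zeros((n1, n2, L), dtype=np.int64)
    for i, j, t in events:
        # side="right" assigns t == delta_l to bin l, matching [delta_l, delta_{l+1})
        l = int(np.searchsorted(edges, t, side="right")) - 1
        if 0 <= l < L:  # events outside [0, T) are dropped
            Y[i, j, l] += 1
    return Y

# Hypothetical toy data: three events on pair (0, 1), one on pair (1, 0).
events = [(0, 1, 0.2), (0, 1, 0.9), (0, 1, 1.0), (1, 0, 3.7)]
Y = count_tensor(events, n1=2, n2=2, T=4.0, L=4)
print(Y[0, 1])  # [2 1 0 0]
```

Increasing `L` shortens each interval, so the expected count per entry shrinks in proportion to $\Delta_{\bm{\delta}} = T/L$, which is the sense in which small intervals correspond to weak intensity.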

Furthermore, with a relatively large value of $R$, the optimization error $I_{3,\bm{\delta}}$ is dominated by $I_{1,\bm{\delta}}+I_{2,\bm{\delta}}$. Then, the convergence rate of $\|\mathcal{M}^{(R)}_{\bm{\delta}}-\mathcal{M}^{*}_{\bm{\delta}}\|_{F}^{2}$ is largely determined by the trade-off between $I_{1,\bm{\delta}}$ and $I_{2,\bm{\delta}}$. Corollary 1 specifies the convergence rate for the estimation error in the weak intensity regime.
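To make this trade-off concrete, the following Python sketch evaluates log-suppressed versions of the three terms with all unspecified constants set to 1, which is an assumption made purely for illustration. The helper names `error_terms`, `best_L` and `iterations_needed` are hypothetical. Ignoring logarithmic factors, $L/(n^{2}T)+K_{0}/L$ is minimized at $L=n\sqrt{K_{0}T}$, and the grid search below recovers that value.

```python
import math

def error_terms(n, T, L, K0, C=1.0, kappa=0.5, R=50):
    """Log-suppressed I1, I2, I3 with constants set to 1 (illustrative only)."""
    I1 = 1.0 / (n * T) + L / (n**2 * T)  # estimation variance
    I2 = K0 / L                          # bias induced by merging
    I3 = C * (1.0 - kappa) ** R          # optimization error after R iterations
    return I1, I2, I3

def best_L(n, T, K0, grid):
    """Grid value of L minimizing I1 + I2."""
    return min(grid, key=lambda L: sum(error_terms(n, T, L, K0)[:2]))

def iterations_needed(tol, C=1.0, kappa=0.5):
    """Smallest R such that C * (1 - kappa)**R <= tol."""
    return max(0, math.ceil(math.log(tol / C) / math.log(1.0 - kappa)))

n, T, K0 = 100, 400, 4
L_star = best_L(n, T, K0, range(100, 20001, 100))
I1, I2, _ = error_terms(n, T, L_star, K0)
print(L_star, n * math.sqrt(K0 * T))   # grid minimizer vs. closed form
print(iterations_needed(I1 + I2))      # R making I3 no larger than I1 + I2
```

Because $I_{3,\bm{\delta}}$ decays geometrically in $R$, the number of iterations needed to push it below $I_{1,\bm{\delta}}+I_{2,\bm{\delta}}$ grows only logarithmically in the target accuracy.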

Corollary 1.

Suppose all the conditions in Theorem 2 are satisfied and $\log(nT)\prec T\prec\frac{n^{2}}{\log(nT)}$. Then, choosing $\frac{\sqrt{T}\log^{1/2}(nT)}{n}\prec\Delta_{\bm{\delta}}\prec 1$, we have

\[
\frac{1}{n_{1}n_{2}L}\|\mathcal{M}^{(R)}_{\bm{\delta}}-\mathcal{M}^{*}_{\bm{\delta}}\|_{F}^{2}\preceq_{P}\frac{K_{0}}{L}.
\]

Remark 4.

Corollary 1 assures the validity of employing small intervals with $\Delta_{\bm{\delta}}\prec 1$ in estimating the underlying tensor, to which the existing results (Han et al., 2022; Cai et al., 2023) requiring the strong intensity assumption may not apply. It is also worth noting that the derived error bound in the weak and medium intensity regimes provides a practical guideline for network merging. By Corollary 1, we obtain a faster convergence rate of $\frac{K_{0}\log^{1/2+\epsilon}(nT)}{n\sqrt{T}}$ with $\Delta_{\bm{\delta}}\asymp\frac{\sqrt{T}\log^{1/2+\epsilon}(nT)}{n}$ or $L\asymp\frac{n\sqrt{T}}{\log^{1/2+\epsilon}(nT)}$, in contrast to the rate $\frac{K_{0}\log^{1+\epsilon}(nT)}{T}$ obtained in the strong intensity regime with $\Delta_{\bm{\delta}}\asymp\log^{1+\epsilon}(nT)$ or $L\asymp\frac{T}{\log^{1+\epsilon}(nT)}$ (Han et al., 2022; Cai et al., 2023). The intuition is that if $T$ diverges very slowly, then one prefers to choose a relatively small $\Delta_{\bm{\delta}}$ or large $L$ to reduce the bias $I_{2,\bm{\delta}}=K_{0}\Delta_{\bm{\delta}}/T$.

4.3 Error analysis based on adaptively merged intervals

Define $\rho=\min_{k\in[K_{0}]}\|\mathbf{w}^{*}_{k,\bm{\eta}}-\mathbf{w}^{*}_{k-1,\bm{\eta}}\|$ and suppose $\rho\succeq 1$. Denote $r_{nT}=I_{1,\bm{\delta}}+I_{2,\bm{\delta}}$ as the upper bound in Theorem 2. Theorem 3 shows that (9) gives a consistent estimate of $K_{0}$, and (5) further results in a precise recovery of the true partition $\bm{\eta}$ with overwhelming probability.

Theorem 3.

(Consistency of partition) Suppose all the conditions of Theorem 2 are satisfied, and $r_{nT}\prec\nu_{nT}\prec 1/K_{0}$. Then as $n$ and $T$ grow to infinity, we have $\Pr(\widehat{K}=K_{0})\to 1$ and $\|\widehat{\bm{\eta}}-\bm{\eta}\|_{\infty}\preceq_{P}Tr_{nT}$.

Remark 5.

It is clear that the consistency of $\widehat{K}$ is guaranteed with a wide range of $\nu_{nT}$. Specifically, the condition $\nu_{nT}\prec 1/K_{0}$ implies that $\widehat{K}\geq K_{0}$, whereas $\nu_{nT}\succ r_{nT}$ guarantees $\widehat{K}\leq K_{0}$. More importantly, Theorem 3 provides valuable guidelines for choosing $L$ and $\nu_{nT}$. For fixed $K_{0}$, if $\log(nT)\prec T\prec\frac{n^{2}}{\log(nT)}$, we can choose $L=\frac{n\sqrt{T}}{\log^{1/2+\epsilon}(nT)}$ and $\nu_{nT}=\frac{\log^{1/4+\epsilon/2}(nT)}{n^{1/2}T^{1/4}}$; if $T\succeq\frac{n^{2}}{\log(nT)}$, we can choose $L=\frac{n\sqrt{T}}{\log^{3/2+\epsilon}(nT)}$ and $\nu_{nT}=\frac{\log^{3/4+\epsilon/2}(nT)}{n^{1/2}T^{1/4}}$.
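The choices in Remark 5 can be turned into a small helper, sketched below in Python. It treats the asymptotic orders as finite-sample formulas with all constants equal to 1 and replaces the asymptotic comparison of $T$ with $n^{2}/\log(nT)$ by a plain numerical inequality; both are assumptions of this sketch, and the function name `suggest_L_and_nu` is hypothetical. In both regimes the suggested threshold works out to $\nu_{nT}=L^{-1/2}$, which the last line checks numerically.

```python
import math

def suggest_L_and_nu(n, T, eps=0.1):
    """Return (L, nu) following the orders in Remark 5 with constants set to 1.

    The regime is chosen by comparing T with n^2 / log(nT) numerically,
    a finite-sample stand-in for the asymptotic conditions.
    """
    log_nT = math.log(n * T)
    large_T = T >= n**2 / log_nT
    a = (1.5 if large_T else 0.5) + eps          # exponent of log(nT) in L
    L = n * math.sqrt(T) / log_nT**a
    nu = log_nT ** (a / 2) / (math.sqrt(n) * T**0.25)
    return L, nu

L, nu = suggest_L_and_nu(n=200, T=100)
print(round(L), nu)
print(math.isclose(nu, L ** -0.5))  # nu equals 1 / sqrt(L) under both choices
```

In practice `L` would be rounded to an integer number of intervals, and the constants would need tuning, for example by a data-driven criterion.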

Given that the true partition $\bm{\eta}$ is accurately estimated by $\widehat{\bm{\eta}}$, Theorem 4 further shows that the estimate $\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}$ based on the adaptively merged intervals $\widehat{\bm{\eta}}$ can attain a faster rate of convergence than that in Theorem 2.

Theorem 4.

(Improved estimate via adaptive merging) Suppose all the conditions of Theorem 3 are satisfied and $\Delta_{\bm{\eta}}\succeq\log^{2+\epsilon}(nK_{0})$. Then, with probability approaching 1, we have

\[
\frac{1}{n_{1}n_{2}K_{0}}\|\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}-\mathcal{M}^{*}_{\bm{\eta}}\|_{F}^{2}\preceq I_{1,\bm{\eta}}+I_{2,\bm{\eta}}+I_{3,\bm{\eta}},
\]

where $I_{1,\bm{\eta}}=\frac{1}{nT}+\frac{K_{0}}{n^{2}T}$,

\[
I_{2,\bm{\eta}}=\left\{\begin{aligned}
&K_{0}^{2}r_{nT}^{2}, &&\mbox{if }~Tr_{nT}\succ\log(nK_{0}),\\
&K_{0}^{2}r_{nT}^{2}\log^{2(1+\epsilon)}(nK_{0}), &&\mbox{if }~1\preceq Tr_{nT}\preceq\log(nK_{0}),\\
&\frac{K_{0}^{2}\log^{2(1+\epsilon)}(nK_{0})}{T^{2}}, &&\mbox{if }~Tr_{nT}\prec 1,
\end{aligned}\right.
\]

and $I_{3,\bm{\eta}}=C(1-\kappa)^{R}$ for some constants $C$ and $0<\kappa<1$.

Similarly, $I_{1,\bm{\eta}}$, $I_{2,\bm{\eta}}$ and $I_{3,\bm{\eta}}$ correspond to the estimation variance, the bias induced by adaptive merging, and the optimization error of (8) after $R$ iterations, respectively. It is clear that $I_{1,\bm{\eta}}$ is much smaller than $I_{1,\bm{\delta}}$ in Theorem 2, since the term $\frac{L}{n^{2}T}$ is reduced to $\frac{K_{0}}{n^{2}T}$. The convergence rate for the bias term, $I_{2,\bm{\eta}}$, takes different forms depending on the term $Tr_{nT}$. Specifically, Corollary 2 gives the convergence rate for the estimation error of $\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}$.

Corollary 2.

Suppose all the conditions in Theorem 4 are satisfied. If $T\succeq\frac{n^{2}}{\log(nT)}$, then choosing $\Delta_{\bm{\delta}}\succ\log(nT)$ leads to

\[
\frac{1}{n_{1}n_{2}K_{0}}\|\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}-\mathcal{M}^{*}_{\bm{\eta}}\|_{F}^{2}\preceq_{P}\frac{1}{nT}+\frac{K_{0}}{n^{2}T}+\frac{K_{0}^{2}L^{2}}{n^{4}T^{2}}+\frac{K_{0}^{4}}{L^{2}};
\]

if $\log(nT)\prec T\prec\frac{n^{2}}{\log(nT)}$, choosing $\frac{\sqrt{T}\log^{1/2}(nT)}{n}\prec\Delta_{\bm{\delta}}\prec 1$ leads to

\[
\frac{1}{n_{1}n_{2}K_{0}}\|\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}-\mathcal{M}^{*}_{\bm{\eta}}\|_{F}^{2}\preceq_{P}\frac{1}{nT}+\frac{K_{0}}{n^{2}T}+\frac{K_{0}^{2}\log^{2(1+\epsilon)}(nK_{0})}{T^{2}}.
\]

Remark 6.

Let $K_{0}$ be a fixed constant, and we compare the estimates $\mathcal{M}^{(R)}_{\bm{\delta}}$ based on the equally spaced intervals and $\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}$ based on the adaptively merged intervals. If $T\succeq\frac{n^{2}}{\log(nT)}$, then the estimation error of $\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}$ converges to 0 at a faster rate of $\frac{1}{nT}+\frac{\log^{3+2\epsilon}(nT)}{n^{2}T}$, whereas the convergence rate of $\mathcal{M}^{(R)}_{\bm{\delta}}$ with $L=\frac{n\sqrt{T}}{\log^{3/2+\epsilon}(nT)}$ is of order $\frac{\log^{3/2+\epsilon}(nT)}{n\sqrt{T}}$. If $\log(nT)\prec T\prec\frac{n^{2}}{\log(nT)}$, the convergence rates of $\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}$ and $\mathcal{M}^{(R)}_{\bm{\delta}}$ are of order $\frac{1}{nT}+\frac{\log^{2(1+\epsilon)}n}{T^{2}}$ and $\frac{\log^{1/2+\epsilon}(nT)}{n\sqrt{T}}$, respectively, with $L=\frac{n\sqrt{T}}{\log^{1/2+\epsilon}(nT)}$, where $\mathcal{M}^{(R)}_{\widehat{\bm{\eta}}}$ is still advantageous as long as $T\succ n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)$.
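As a rough numerical illustration of the comparison in Remark 6 for $\log(nT)\prec T\prec\frac{n^{2}}{\log(nT)}$, the Python sketch below evaluates the two stated orders with all constants set to 1 and $K_{0}$ fixed. These are assumptions for illustration only, since the asymptotic orders say nothing about finite-sample constants, and the function names are hypothetical.

```python
import math

def rates_moderate_T(n, T, eps=0.1):
    """Orders from Remark 6 (constants set to 1, K0 fixed): (adaptive, equal)."""
    adaptive = 1.0 / (n * T) + math.log(n) ** (2 * (1 + eps)) / T**2
    equal = math.log(n * T) ** (0.5 + eps) / (n * math.sqrt(T))
    return adaptive, equal

def merging_threshold(n, T, eps=0.1):
    """n^(2/3) * log^(1 + 2*eps/3)(nT), the order T must exceed for merging to help."""
    return n ** (2 / 3) * math.log(n * T) ** (1 + 2 * eps / 3)

n = 10**4
for T in (10**2, 10**6):
    adaptive, equal = rates_moderate_T(n, T)
    print(T, T > merging_threshold(n, T), adaptive < equal)
```

For these two illustrative values of $T$, the adaptively merged estimate has the smaller order exactly when $T$ exceeds the threshold, which follows from comparing $\log^{2(1+\epsilon)}/T^{2}$ with $\log^{1/2+\epsilon}/(n\sqrt{T})$.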

Table 1 summarizes the convergence rates of the proposed method. In all scenarios of $n$ and $T$, if we suppress the logarithmic terms, the optimal $L$ is always of order $n\sqrt{T}$. When $T\succeq\frac{n^{2}}{\log(nT)}$, the optimal choice of $\Delta_{\bm{\delta}}\succ\log(nT)$ makes the initial estimate fall into the strong intensity regime. When $T\prec\frac{n^{2}}{\log(nT)}$, the optimal choice of $L$ makes the initial estimate fall into the weak intensity regime, and adaptive merging further improves the convergence rate as long as $T\succ n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)$.
This advantage vanishes, however, when $T\preceq n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)$, in which case the initial estimate $\widehat{\mathcal{M}}_{\bm{\delta}}$ would be a better choice. Note that the error rates for the proposed method are always smaller than the rates obtained in the strong intensity regime based on equally spaced intervals, which are $\frac{\log^{3/2+\epsilon}(nT)}{n\sqrt{T}}$ in the first scenario and $\frac{\log^{1+\epsilon}(nT)}{T}$ in the second and third scenarios.
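
The threshold on $T$ that separates the second and third scenarios can be recovered by comparing the two weak-regime error rates directly. The short derivation below is only a sketch at the level of orders, using the rates as listed in Table 1 and assuming $nT\to\infty$:

```latex
\begin{align*}
\frac{\log^{2(1+\epsilon)}(nT)}{T^{2}} \prec \frac{\log^{1/2+\epsilon}(nT)}{n\sqrt{T}}
  &\iff T^{3/2} \succ n\,\log^{3/2+\epsilon}(nT) \\
  &\iff T \succ n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT),
\qquad\text{and}\qquad
\frac{1}{nT} \Big/ \frac{\log^{1/2+\epsilon}(nT)}{n\sqrt{T}}
  = \frac{1}{\sqrt{T}\,\log^{1/2+\epsilon}(nT)} \to 0 .
\end{align*}
```

Hence the merged rate $\frac{1}{nT}+\frac{\log^{2(1+\epsilon)}(nT)}{T^{2}}$ is of smaller order than $\frac{\log^{1/2+\epsilon}(nT)}{n\sqrt{T}}$ exactly when $T$ exceeds the stated threshold.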

| Scenarios for $(n,T)$ | Optimal $L$ | Intensity | Merging | Error rates |
|---|---|---|---|---|
| $T\succeq\frac{n^{2}}{\log(nT)}$ | $\frac{n\sqrt{T}}{\log^{3/2+\epsilon}(nT)}$ | Strong | Yes | $\frac{1}{nT}+\frac{\log^{3+2\epsilon}(nT)}{n^{2}T}$ |
| $n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)\prec T\prec\frac{n^{2}}{\log(nT)}$ | $\frac{n\sqrt{T}}{\log^{1/2+\epsilon}(nT)}$ | Weak | Yes | $\frac{1}{nT}+\frac{\log^{2(1+\epsilon)}(nT)}{T^{2}}$ |
| $\log(nT)\prec T\preceq n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)$ | $\frac{n\sqrt{T}}{\log^{1/2+\epsilon}(nT)}$ | Weak | No | $\frac{\log^{1/2+\epsilon}(nT)}{n\sqrt{T}}$ |
Table 1: Convergence rates for the proposed method in different regimes.

5 Numerical experiments

5.1 Simulation examples

We let $n_{1}=n_{2}=n\in\{50,100\}$ and $T\in\{n^{2}/\log n,n,n^{1/3}\}$, corresponding to the three scenarios in Table 1. We set $r_{1}=r_{2}=r_{3}=3$, $K_{0}\in\{3,5\}$ and the partition $\bm{\eta}\in\mathbb{R}^{K_{0}}$ is constructed in such a way that each $\eta_{k}$ is randomly generated from $[0,T)$, where the length ratio for the largest and smallest intervals is no larger than 3, and $\mathbf{w}^{*}(t)$ is a piecewise constant function of $t$ with $\mathbf{w}^{*}(t)=(\mathbf{W}_{\bm{\eta}}^{*})_{[k,]}$ for $t\in[\eta_{k-1},\eta_{k})$.
The columns of $\mathbf{W}_{\bm{\eta}}^{*}$ are randomly generated such that $\int_{0}^{T}\mathbf{w}^{*}(t)\mathbf{w}^{*}(t)^{\top}dt=T\mathbf{I}_{2}$, while the columns of $\mathbf{U}^{*}/\sqrt{n_{1}}$ and $\mathbf{V}^{*}/\sqrt{n_{2}}$ are generated uniformly from $\mathbb{O}_{n,2}$. For $\mathcal{S}^{*}$, the diagonal entries are set to be 0.5 and the remaining entries to 0.
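
The data-generating components above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the partition is stored with both endpoints $0$ and $T$ included, the rank `r` is left as a parameter, and the temporal factors are built by one particular construction (rescaling an orthonormal matrix by the interval lengths) that satisfies the stated normalization.

```python
import numpy as np


def random_partition(T, K0, max_ratio=3.0, rng=None):
    """Sample endpoints 0 = eta_0 < ... < eta_K0 = T by rejection sampling.

    A draw is accepted only if the longest interval is at most `max_ratio`
    times the shortest one (the length-ratio constraint in the text).
    """
    rng = np.random.default_rng(rng)
    while True:
        cuts = np.sort(rng.uniform(0.0, T, size=K0 - 1))
        eta = np.concatenate(([0.0], cuts, [T]))
        lengths = np.diff(eta)
        if lengths.min() > 0 and lengths.max() <= max_ratio * lengths.min():
            return eta


def random_orthonormal(n, r, rng=None):
    """Draw an n x r matrix with orthonormal columns via QR of a Gaussian matrix."""
    rng = np.random.default_rng(rng)
    Q, R = np.linalg.qr(rng.standard_normal((n, r)))
    return Q * np.sign(np.diag(R))  # sign fix so that the draw is uniform


def temporal_factors(eta, r, rng=None):
    """Rows w_k with sum_k (eta_k - eta_{k-1}) w_k w_k^T = T * I_r (needs K0 >= r)."""
    lengths = np.diff(eta)
    Q = random_orthonormal(len(lengths), r, rng)
    return np.sqrt(eta[-1]) * Q / np.sqrt(lengths)[:, None]


# Example with n = 50, T = n and K0 = 3 (assumed values taken from the setup).
n, r = 50, 3
eta = random_partition(T=float(n), K0=3, rng=0)
U_star = np.sqrt(n) * random_orthonormal(n, r, rng=1)  # U*/sqrt(n) is orthonormal
W_star = temporal_factors(eta, r, rng=2)
```

For a piecewise constant $\mathbf{w}^{*}(t)$ the integral constraint reduces to a weighted sum over intervals, which is why rescaling each row by the inverse square root of its interval length is sufficient here.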

We investigate the finite-sample performance of the proposed method, and compare it with existing tensor decomposition methods, including a modified Poisson tensor PCA (Han et al., 2022), higher-order orthogonal iteration (De Lathauwer et al., 2000b) and higher-order SVD (De Lathauwer et al., 2000a). Specifically, we denote

  • AM($\widehat{K}$) as the estimate based on adaptively merged intervals;

  • ES($L_{\text{opt}}$) as the proposed initial estimate built on $L_{\text{opt}}$ equally spaced intervals, where $L_{\text{opt}}$ is based on Table 1;

  • ES($L_{\text{str}}$) as the estimate based on $L_{\text{str}}\asymp\frac{T}{\log^{1+\epsilon}(nT)}$ equally spaced intervals in the strong intensity regime;

  • “HOOI” and “HOSVD” as the estimates of higher-order orthogonal iteration (De Lathauwer et al., 2000b) and higher-order SVD (De Lathauwer et al., 2000a) based on $L_{\text{str}}$ equally spaced intervals, where $(\mathcal{Y}_{\bm{\delta}})_{ijl}=|\mathcal{T}_{ij}\cap[\delta_{l-1},\delta_{l})|$.
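
The count tensor used by the HOOI and HOSVD baselines simply bins the time stamps of each node pair into the equally spaced intervals. A minimal sketch is given below, assuming the temporal edges are available as `(i, j, t)` triples with 0-based node indices; the function name and input layout are hypothetical.

```python
import numpy as np


def count_tensor(events, n1, n2, T, L):
    """Return Y with Y[i, j, l] = number of events of pair (i, j) in [delta_l, delta_{l+1})."""
    delta = np.linspace(0.0, T, L + 1)  # equally spaced interval endpoints
    Y = np.zeros((n1, n2, L), dtype=np.int64)
    for i, j, t in events:
        # side="right" places t into the half-open interval [delta[l], delta[l + 1])
        l = np.searchsorted(delta, t, side="right") - 1
        l = min(l, L - 1)  # a time stamp equal to T is put into the last interval
        Y[i, j, l] += 1
    return Y


# Example: four events on a horizon of 120 time units split into 24 intervals.
events = [(0, 1, 0.0), (0, 1, 4.9), (0, 1, 5.0), (2, 3, 119.9)]
Y = count_tensor(events, n1=4, n2=4, T=120.0, L=24)
```

Any tensor decomposition routine can then be applied to `Y`; the binning step itself does not depend on the decomposition that follows.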

Their numeric performance is assessed by the average tensor estimation error based on the corresponding intervals.

The averaged tensor estimation errors over 50 independent replications and their standard errors for each method are summarized in Tables 2 and 3. AM($\widehat{K}$) delivers superior numerical performance and outperforms the other three competitors in the first two scenarios, $T=n^{2}/\log n$ and $T=n$, in all examples, which is consistent with the theoretical results in Table 1. It is interesting to note that in the third scenario, where $T=n^{1/3}$, ES($L_{\text{opt}}$) outperforms AM($\widehat{K}$), which echoes the results in Theorem 2 and Remark 6. It is worth pointing out that AM($\widehat{K}$) and ES($L_{\text{opt}}$) show great advantage over ES($L_{\text{str}}$) and HOOI in all scenarios, suggesting superiority of the proposed method. Further, Tables 2 and 3 also show that $K$ could be consistently selected by (9).

We now scrutinize how the tensor estimation error is affected by different choices of $L$ in examples with $n=50$, $T\in\{n^{2}/\log n,n,n^{1/3}\}$ and $K_{0}=3$. The three panels of Figure 2 show the average tensor estimation errors of ES($L$) over 50 independent replications with different $L$. Clearly, as $L$ increases, the error decreases at first, and then increases. This is because, when the partition has only a small number of intervals, the induced bias dominates the tensor estimation error in each interval, and this bias is reduced dramatically as $L$ increases. Yet, as $L$ becomes larger, the estimation variance begins to dominate the tensor estimation error, and it increases along with $L$. This phenomenon validates the asymptotic upper bound in Theorem 2. The averaged tensor estimation error of AM($\widehat{K}$) with $\widehat{K}=3$ adaptively merged intervals is represented by the red dotted line, which is smaller than that of all the methods based on equally spaced intervals in the first two scenarios, demonstrating the advantage of the proposed method established in Theorem 4. However, in the third scenario, AM($\widehat{K}$) is outperformed by ES($L_{\text{opt}}$) for certain $L$, which also validates the results in Theorem 2 and Remark 6.
It suggests that the initial estimate $\widehat{\mathcal{M}}_{\bm{\delta}}$ would be a better choice when $T\preceq n^{\frac{2}{3}}\log^{1+\frac{2}{3}\epsilon}(nT)$, as shown in Table 1.

Table 2: The averaged tensor estimation errors and $\widehat{K}$ (standard errors in parentheses) when $K_{0}=3$

| $n$ | Method | $T=n^{2}/\log n$ | $T=n$ | $T=n^{1/3}$ |
|---|---|---|---|---|
| $n=50$ | $\widehat{K}$ | 3 (0) | 3 (0) | 3 (0) |
| | AM($\widehat{K}$) | 0.0014 (0.0001) | 0.0017 (0.0002) | 0.5316 (0.2) |
| | ES($L_{\text{opt}}$) | 0.0075 (0.0003) | 0.0065 (0.0003) | 0.4318 (0.2) |
| | ES($L_{\text{str}}$) | 0.0075 (0.0003) | 0.268 (0.002) | 1.0241 (0.7) |
| | HOOI | 1.9976 (0.001) | 0.2215 (0.003) | 6.6766 (0.01) |
| | HOSVD | 2.0405 (0.004) | 0.2206 (0.003) | 6.6963 (0.01) |
| $n=100$ | $\widehat{K}$ | 3 (0) | 3 (0) | 3 (0) |
| | AM($\widehat{K}$) | 0.0002 (2e-05) | 0.0004 (3e-05) | 0.1371 (0.03) |
| | ES($L_{\text{opt}}$) | 0.0006 (5e-05) | 0.0015 (7e-05) | 0.1175 (0.007) |
| | ES($L_{\text{str}}$) | 0.0006 (5e-05) | 0.1325 (0.0006) | 0.3032 (0.01) |
| | HOOI | 1.6588 (0.0003) | 0.1064 (0.0005) | 5.6621 (0.005) |
| | HOSVD | 1.6656 (0.0007) | 0.1066 (0.0005) | 5.6641 (0.004) |

Table 3: The averaged tensor estimation errors and $\widehat{K}$ (standard errors in parentheses) when $K_{0}=5$

| $n$ | Method | $T=n^{2}/\log n$ | $T=n$ | $T=n^{1/3}$ |
|---|---|---|---|---|
| $n=50$ | $\widehat{K}$ | 5 (0) | 5 (0) | 4.5 (0.7) |
| | AM($\widehat{K}$) | 0.0013 (0.0001) | 0.0016 (0.0002) | 0.7052 (0.4) |
| | ES($L_{\text{opt}}$) | 0.0069 (0.0003) | 0.0075 (0.001) | 0.3551 (0.04) |
| | ES($L_{\text{str}}$) | 0.0069 (0.0003) | 0.0502 (0.0009) | 1.2808 (1) |
| | HOOI | 1.9662 (0.001) | 0.4487 (0.002) | 9.3893 (0.007) |
| | HOSVD | 2.0214 (0.002) | 0.4213 (0.006) | 9.3983 (0.007) |
| $n=100$ | $\widehat{K}$ | 5 (0) | 5 (0) | 4.8 (0.5) |
| | AM($\widehat{K}$) | 0.0002 (2e-05) | 0.0008 (4e-05) | 0.2997 (0.3) |
| | ES($L_{\text{opt}}$) | 0.0039 (6e-05) | 0.0031 (8e-05) | 0.0592 (0.02) |
| | ES($L_{\text{str}}$) | 0.0039 (6e-05) | 0.1721 (0.001) | 0.3666 (0.03) |
| | HOOI | 3.2263 (0.0003) | 0.127 (0.0007) | 9.5284 (0.004) |
| | HOSVD | 3.2381 (0.0005) | 0.1309 (0.0007) | 9.53 (0.004) |

Figure 2: The average tensor estimation errors based on equally spaced intervals under the three scenarios of Table 1 with different values of $L$ over 50 independent replications. The red dotted lines are the average estimation errors of the estimate based on adaptively merged intervals. The large errors in the third panel are due to the much smaller $T$ chosen in the third scenario.

5.2 Real example

We apply the proposed method to analyze a longitudinal network based on the militarized interstate dispute dataset (Palmer et al., 2022). The dataset consists of all the major interstate disputes and the countries involved during 1895-2014. It can be converted into a longitudinal network with nodes representing all countries ever involved in any dispute over the years. In particular, we set $dy_{ij}(t)=1$ if country $i$ cooperated with country $j$ in a militarized interstate dispute that occurred at time $t$. We keep it as $1$ for the following years until a dispute occurs between the two countries themselves, at which point $dy_{ij}(t)$ changes to 0 and remains so until their next cooperation. This pre-processing step leads to a longitudinal network with $n_{1}=n_{2}=195$ nodes and $110066$ temporal edges, and the time stamps range from 0 to $T=120$ years. We apply the proposed method with $\Delta_{\bm{\delta}}=5$ years and thus $L=24$, where the ranks are set to be $(r_{1},r_{2},r_{3})=(2,2,2)$ following a rank selection procedure similar to that in Han et al. (2022).

To assess the numerical performance, we randomly split the node pairs into 5 disjoint subsets $\{\mathcal{P}_{p}\}_{p=1}^{5}$. For each $p$, we obtain the estimated tensor $\widehat{\mathcal{M}}^{(p)}$ on $\mathcal{P}_{-p}=[n_{1}]\times[n_{2}]\backslash\mathcal{P}_{p}$, and validate the estimation accuracy on $\mathcal{P}_{p}$ in each small interval by

$$\text{err}^{(p)}=\frac{\|(\mathcal{T}-\widehat{\mathcal{Y}}^{(p)})\circ\bm{1}_{\mathcal{P}_{p}}\|_{F}}{\|\mathcal{T}\circ\bm{1}_{\mathcal{P}_{p}}\|_{F}},$$

where $\mathcal{T}=(\mathcal{T}_{ij})_{n_{1}\times n_{2}}$ and $\widehat{\mathcal{Y}}^{(p)}\in\mathbb{R}^{n_{1}\times n_{2}}$ contain the true and estimated numbers of temporal edges for each node pair $(i,j)$, $\bm{1}_{\mathcal{P}_{p}}\in\mathbb{R}^{n_{1}\times n_{2}}$ is the indicator matrix for $\mathcal{P}_{p}$, and $\circ$ denotes the matrix Hadamard product. Then, the testing error is calculated as $\text{err}=\sum_{p=1}^{5}\text{err}^{(p)}/5$.
For AM($\widehat{K}$), $\widehat{\mathcal{Y}}^{(p)}_{\widehat{\bm{\eta}}}$ is obtained by $(\widehat{\mathcal{Y}}_{\widehat{\bm{\eta}}}^{(p)})_{ij}=\sum_{k=1}^{\widehat{K}}\lambda_{0}\exp((\widehat{\mathcal{M}}_{\widehat{\bm{\eta}}}^{(p)})_{ijk})(\widehat{\eta}_{k}-\widehat{\eta}_{k-1})$, whereas $(\widehat{\mathcal{Y}}_{\bm{\delta}}^{(p)})_{ij}=\sum_{l=1}^{L}\lambda_{0}\exp((\widehat{\mathcal{M}}_{\bm{\delta}}^{(p)})_{ijl})(\delta_{l}-\delta_{l-1})$ for ES($L_{\text{opt}}$). The estimates by HOSVD and HOOI are obtained in the same way as in Section 5.1. The averaged testing errors and their standard errors for the competing methods over 50 replications are provided in Table 4. It is evident that AM($\widehat{K}$) and ES($L$) significantly outperform the spectral methods, and the difference between AM($\widehat{K}$) and ES($L$) is not significant, which is not surprising as $T=120$ is not large enough compared with $n=195$, corresponding to the third scenario in Table 1.
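
The evaluation protocol above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the baseline intensity `lam0` is treated as a known constant, and the model-fitting step that produces the estimated tensor on the training pairs is deliberately left out.

```python
import numpy as np


def expected_counts(M_hat, endpoints, lam0=1.0):
    """Sum over intervals of lam0 * exp(M_hat[i, j, k]) * (length of interval k)."""
    lengths = np.diff(np.asarray(endpoints, dtype=float))
    return lam0 * np.tensordot(np.exp(M_hat), lengths, axes=([2], [0]))


def masked_relative_error(Y_true, Y_hat, mask):
    """Frobenius norm of the masked residual, relative to the masked truth."""
    return np.linalg.norm((Y_true - Y_hat) * mask) / np.linalg.norm(Y_true * mask)


def fold_masks(n1, n2, folds=5, rng=None):
    """Randomly split all node pairs into `folds` disjoint indicator matrices."""
    rng = np.random.default_rng(rng)
    labels = (rng.permutation(n1 * n2) % folds).reshape(n1, n2)
    return [(labels == p).astype(float) for p in range(folds)]


# Toy check: a zero log-intensity tensor on three merged intervals of total length 120.
M_hat = np.zeros((2, 2, 3))
Y_hat = expected_counts(M_hat, endpoints=[0, 20, 45, 120], lam0=0.5)
masks = fold_masks(4, 5, rng=0)
```

In a full run, the overall testing error would be the mean of `masked_relative_error` across the five held-out masks, each evaluated with an estimate fitted on the complementary node pairs.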

Table 4: The average testing errors and standard errors (in parentheses) for various methods over 50 replications.
| AM($\widehat{K}$) | ES($L$) | HOSVD | HOOI |
|---|---|---|---|
| 0.739 (0.037) | 0.752 (0.087) | 1.160 (0.002) | 1.163 (0.002) |

Furthermore, the output of AM($\widehat{K}$) yields $\widehat{K}=6$ and $\widehat{\bm{\eta}}=(20,45,50,95,105,120)$, and thus the adaptively merged time intervals are 1895-1914, 1915-1939, 1940-1944, 1945-1989, 1990-1999 and 2000-2014. These intervals appear to be closely related to a number of major world-wide events: the period before WWI, the recess between WWI and WWII, WWII, the Cold War, the 90s, and the 21st century. The estimated temporal embedding vectors $\{\widehat{\mathbf{w}}_{l,\bm{\delta}}\}_{l=1}^{L}$ are shown in Figure 3, where the $\widehat{\mathbf{w}}_{l,\bm{\delta}}$ in different merged time intervals, represented by different colors, are well separated.
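
The mapping from the estimated change points to calendar years is a simple offset from the first year of the data. The helper below is a hypothetical illustration, assuming time 0 corresponds to the start of 1895 and each interval is closed on the left and open on the right:

```python
def merged_year_ranges(eta_hat, start_year=1895):
    """Convert change points (in years since start_year) into inclusive year ranges."""
    ranges = []
    left = 0
    for right in eta_hat:
        # the interval [left, right) covers the calendar years left, ..., right - 1
        ranges.append((start_year + left, start_year + right - 1))
        left = right
    return ranges


year_ranges = merged_year_ranges([20, 45, 50, 95, 105, 120])
print(year_ranges)
# [(1895, 1914), (1915, 1939), (1940, 1944), (1945, 1989), (1990, 1999), (2000, 2014)]
```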

Figure 3: The estimated temporal embedding vectors $\{\widehat{\mathbf{w}}_{l,\bm{\delta}}\}_{l=1}^{9}$, where colors represent different merged time intervals.

It is also interesting to examine the averaged estimation error in each small interval $[\delta_{l-1},\delta_{l})$, as displayed in Figure 4. Clearly, the estimation errors of AM($\widehat{K}$) are generally smaller than those of ES($L$) in intervals that do not contain the estimated change points, but more or less comparable in intervals containing the estimated change points. This phenomenon reveals that adaptive merging leads to a smaller tensor estimation error than ES($L$) over most of the time line, while it produces similar errors in the small number of intervals containing the estimated change points, which tend to dominate the tensor estimation errors.

Figure 4: The average estimation errors of AM($\widehat{K}$) and ES($L$) in each of the $L=24$ intervals over 50 replications.

6 Discussion

In this paper, we propose an efficient estimation framework for longitudinal networks, leveraging the strengths of adaptive network merging, tensor decomposition and point processes. A thorough analysis is conducted to quantify the asymptotic behavior of the proposed method, which shows that adaptive network merging leads to substantially improved estimation accuracy compared with existing competitors in the literature. The theoretical analysis also provides a guideline for network merging under various scenarios. The advantage of the proposed method is supported by the numerical experiments on both synthetic and real longitudinal networks. The proposed estimation framework can be further extended to incorporate edge-wise or node-wise covariates or to employ more general counting processes, which is left for future investigation.

Acknowledgment

This research is supported in part by HK RGC Grants GRF-11304520, GRF-11301521, GRF-11311022, and CUHK Startup Grant 4937091.

References

  • Aggarwal and Subbian, (2014) Aggarwal, C. and Subbian, K. (2014). Evolutionary network analysis: A survey. ACM Computing Surveys (CSUR), 47:1–36.
  • Athreya et al., (2017) Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., Levin, K., Lyzinski, V., and Qin, Y. (2017). Statistical inference on random dot product graphs: a survey. The Journal of Machine Learning Research, 18:8393–8484.
  • Avena-Koenigsberger et al., (2018) Avena-Koenigsberger, A., Misic, B., and Sporns, O. (2018). Communication dynamics in complex brain networks. Nature Reviews Neuroscience, 19:17–33.
  • Cai et al., (2023) Cai, J.-F., Li, J., and Xia, D. (2023). Generalized low-rank plus sparse tensor estimation by fast Riemannian optimization. Journal of the American Statistical Association, 118:2588–2604.
  • Cranmer and Desmarais, (2011) Cranmer, S. J. and Desmarais, B. A. (2011). Inferential network analysis with exponential random graph models. Political Analysis, 19:66–86.
  • (6) De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000a). A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21:1253–1278.
  • (7) De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000b). On the best rank-1 and rank-$(r_{1},r_{2},\ldots,r_{n})$ approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21:1324–1342.
  • De Ruiter et al., (2005) De Ruiter, P. C., Wolters, V., and Moore, J. C. (2005). Dynamic food webs: multispecies assemblages, ecosystem development and environmental change. Elsevier.
  • Han et al., (2022) Han, R., Willett, R., and Zhang, A. R. (2022). An optimal statistical and computational framework for generalized tensor estimation. The Annals of Statistics, 50:1–29.
  • Hanneke et al., (2010) Hanneke, S., Fu, W., and Xing, E. P. (2010). Discrete temporal models of social networks. Electronic Journal of Statistics, 4:585–605.
  • Hao et al., (2013) Hao, N., Niu, Y. S., and Zhang, H. (2013). Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica, 23:1553.
  • Hoff et al., (2002) Hoff, P., Raftery, A., and Handcock, M. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97:1090–1098.
  • Holme and Saramäki, (2012) Holme, P. and Saramäki, J. (2012). Temporal networks. Physics Reports, 519:97–125.
  • Huang et al., (2023) Huang, S., Weng, H., and Feng, Y. (2023). Spectral clustering via adaptive layer aggregation for multi-layer networks. Journal of Computational and Graphical Statistics, 32:1170–1184.
  • Kim et al., (2018) Kim, B., Lee, K. H., Xue, L., and Niu, X. (2018). A review of dynamic network models with latent variables. Statistics Surveys, 12:105.
  • Kinne, (2013) Kinne, B. J. (2013). Network dynamics and the evolution of international cooperation. American Political Science Review, 107:766–785.
  • Lyu et al., (2023) Lyu, Z., Xia, D., and Zhang, Y. (2023). Latent space model for higher-order networks and generalized tensor decomposition. Journal of Computational and Graphical Statistics, 32:1320–1336.
  • Matias and Miele, (2017) Matias, C. and Miele, V. (2017). Statistical clustering of temporal networks through a dynamic stochastic block model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79:1119–1141.
  • Matias et al., (2018) Matias, C., Rebafka, T., and Villers, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika, 105:665–680.
  • Niu et al., (2016) Niu, Y. S., Hao, N., and Zhang, H. (2016). Multiple change-point detection: A selective overview. Statistical Science, 31:611–623.
  • Palmer et al., (2022) Palmer, G., McManus, R. W., D’Orazio, V., Kenwick, M. R., Karstens, M., Bloch, C., Dietrich, N., Kahn, K., Ritter, K., and Soules, M. J. (2022). The MID5 dataset, 2011–2014: Procedures, coding rules, and description. Conflict Management and Peace Science, 39:470–482.
  • Perry and Wolfe, (2013) Perry, P. O. and Wolfe, P. J. (2013). Point process modelling for directed interaction networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75:821–849.
  • Perry-Smith and Shalley, (2003) Perry-Smith, J. E. and Shalley, C. E. (2003). The social side of creativity: A static and dynamic social network perspective. Academy of Management Review, 28:89–106.
  • Rubin-Delanchy et al., (2022) Rubin-Delanchy, P., Cape, J., Tang, M., and Priebe, C. E. (2022). A statistical interpretation of spectral embedding: The generalised random dot product graph. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84:1446–1473.
  • Sewell and Chen, (2015) Sewell, D. K. and Chen, Y. (2015). Latent space models for dynamic networks. Journal of the American Statistical Association, 110:1646–1657.
  • Sewell and Chen, (2016) Sewell, D. K. and Chen, Y. (2016). Latent space models for dynamic networks with weighted edges. Social Networks, 44:105–116.
  • Sit et al., (2021) Sit, T., Ying, Z., and Yu, Y. (2021). Event history analysis of dynamic networks. Biometrika, 108:223–230.
  • Snijders, (2017) Snijders, T. A. (2017). Stochastic actor-oriented models for network dynamics. Annual Review of Statistics and Its Application, 4:343–363.
  • Snijders et al., (2010) Snijders, T. A., Koskinen, J., and Schweinberger, M. (2010). Maximum likelihood estimation for social network dynamics. The Annals of Applied Statistics, 4:567.
  • Soliman et al., (2022) Soliman, H., Zhao, L., Huang, Z., Paul, S., and Xu, K. S. (2022). The multivariate community Hawkes model for dependent relational events in continuous-time networks. In International Conference on Machine Learning, pages 20329–20346. PMLR.
  • Ulanowicz, (2004) Ulanowicz, R. E. (2004). Quantitative methods for ecological network analysis. Computational Biology and Chemistry, 28:321–339.
  • Voytek and Knight, (2015) Voytek, B. and Knight, R. T. (2015). Dynamic network communication as a unifying neural basis for cognition, development, aging, and disease. Biological Psychiatry, 77:1089–1097.
  • Vu et al., (2011a) Vu, D., Hunter, D., Smyth, P., and Asuncion, A. (2011a). Continuous-time regression models for longitudinal networks. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc.
  • Vu et al., (2011b) Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. (2011b). Dynamic egocentric models for citation networks. In International Conference on Machine Learning, pages 857–864.
  • Wainwright, (2019) Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press.
  • Zhang et al., (2022) Zhang, J., He, X., and Wang, J. (2022). Directed community detection with network embedding. Journal of the American Statistical Association, 117:1809–1819.
  • Zhen and Wang, (2023) Zhen, Y. and Wang, J. (2023). Community detection in general hypergraph via graph embedding. Journal of the American Statistical Association, 118:1620–1629.