Contrastive independent component analysis

Kexin Wang, Harvard University, 29 Oxford Street, Pierce Hall 212A, Cambridge, MA 02138, USA

Aida Maraj, Harvard University, 29 Oxford Street, Pierce Hall 212A, Cambridge, MA 02138, USA, and University of Michigan, East Hall 1855, Ann Arbor, MI 48109

Anna Seigal, Harvard University, 29 Oxford Street, Pierce Hall 324, Cambridge, MA 02138, USA

Abstract.

Visualizing data and finding patterns in data are ubiquitous problems in the sciences. Increasingly, applications seek signal and structure in a contrastive setting: a foreground dataset relative to a background dataset. For this purpose, we propose contrastive independent component analysis (cICA). This generalizes independent component analysis to independent latent variables across a foreground and background. We propose a hierarchical tensor decomposition algorithm for cICA. We study the identifiability of cICA and demonstrate its performance visualizing data and finding patterns in data, using synthetic and real-world datasets, comparing the approach to existing contrastive methods.

1. Introduction

Finding and understanding patterns in data is fundamental in various scientific fields. Often, data have been collected under two different settings, such as a group of patients receiving a treatment and a control group, or a group of patients with a certain disease and a group without the disease. The goal may be to understand the effect of the treatment, or to understand the genetic changes that describe the disease. While standard data analysis methods can be used, which restrict attention to one of the datasets or combine them together, an alternate view is offered by contrastive methods. Contrastive methods view the two settings as a foreground and a background. They seek to learn patterns in the foreground after accounting for (or, “subtracting off”) the background. The hope is that such patterns encode useful structure and offer a good basis for dimensionality reduction and visualization of the data, to identify fine-grained structure and clusters particular to the foreground.

The contrastive viewpoint is first addressed in [ZHPA13], in contrastive topic modeling and contrastive hidden Markov Models applied to genomic sequence analysis. Principal component analysis (PCA) is generalized to contrastive PCA (cPCA) in [AZBZ17, AZBZ18]. The contrastive patterns are principal components of the foreground covariance matrix minus a scalar multiple of the background covariance matrix. The paper [SGN19] studies a linear contrastive latent variable model. Probabilistic contrastive PCA (PCPCA) is introduced in [LJE20], where foreground patterns are inferred by maximizing a likelihood ratio of linear Gaussian mixtures.

In this paper, we propose contrastive independent component analysis (cICA). Independent component analysis (ICA) is a blind source separation method, which seeks to recover latent sources and unknown mixing from observations of mixtures of signals [CJ10]. ICA assumes that latent sources are independent. In extending ICA to the contrastive setting, the idea is that background data is generated by mixing of independent sources while foreground data is generated by the background mixing together with a foreground mixing of independent sources. The patterns of interest are the foreground mixing.

We show that cICA has strong identifiability properties. These enable the contribution of each background pattern to the foreground to be found uniquely. This avoids the need for a sweep of hyperparameters to find the best multiple of the background to subtract from the foreground and even avoids the assumption that the background contribution to the foreground is via a single scalar multiple, both of which are required in cPCA and PCPCA [AZBZ17, AZ19, LJE20]. We develop tensor decomposition algorithms for cICA and show that they recover accurate patterns for synthetic data. For this, we devise a new hierarchical tensor decomposition based on recursive eigendecompositions. We turn cICA into a dimensionality reduction tool and investigate its performance on real-world data, comparing the plots to those obtained with other contrastive methods to see its competitiveness.

The paper is organized as follows. We define cICA and introduce a tensor decomposition approach to learn it in Section 2. A key ingredient of learning cICA is a new hierarchical tensor decomposition, which we introduce and study in Section 3. We study the identifiability results and present algorithms for cICA in Section 4. Numerical results are in Section 6.

2. From ICA to contrastive ICA

Independent component analysis (ICA) studies observations that are a linear mixture of independent source variables. We write the ICA model as

$$\mathbf{y}=A\mathbf{z}, \tag{1}$$

where $\mathbf{z}$ is a vector of $r$ independent latent random variables, the mixing matrix is $A\in\mathbb{R}^{p\times r}$, and $\mathbf{y}$ is a vector of $p$ observed variables. The $i$-th column of $A$ records a pattern in the data: the contribution of variable $z_i$ to each of the $p$ observed variables. The identifiability of ICA refers to the uniqueness of the mixing matrix $A$ and sometimes also of the variables $\mathbf{z}$; see [EK04, Com94, WS24].

Many algorithms for ICA proceed via tensor decomposition; see e.g. [CJ10, CS93, DLDMV01, DLCC07]. The cumulants of a distribution are symmetric tensors that encode it. The $d$-th cumulant $\kappa_d(\mathbf{y})$ of $\mathbf{y}$ is a symmetric order $d$ tensor of format $p\times\cdots\times p$ with decomposition

$$\kappa_d(\mathbf{y})=\sum_{i=1}^{r}\lambda_i\mathbf{a}_i^{\otimes d}, \tag{2}$$

where the scalar $\lambda_i$ is the $d$-th cumulant of $z_i$ and the vector $\mathbf{a}_i\in\mathbb{R}^p$ is the $i$-th column of $A$. The decomposition (2) follows from the multilinear properties of cumulants and the fact that cumulant tensors of independent variables are diagonal; see [McC18, Chapter 2]. The matrix $A$ can be recovered using tensor decomposition of the cumulant tensor (2). If the tensor decomposition is identifiable, then the columns $\mathbf{a}_i$ with $\lambda_i\neq 0$ can be recovered uniquely up to permutation and scaling of columns. Thus tensor decomposition of higher-order cumulant tensors gives an algorithm for ICA, provided no source variable is Gaussian (this is required for non-zero higher-order cumulants).
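The multilinearity behind (2) can be checked numerically. The following is a minimal sketch, assuming NumPy and taking $d=4$; the helper name `fourth_cumulant`, the uniform source distribution, and the sizes are illustrative choices, not taken from the paper. It verifies that the empirical fourth cumulant of $\mathbf{y}=A\mathbf{z}$ is the empirical fourth cumulant of $\mathbf{z}$ transformed by $A$ along every mode. With independent sources, the source cumulant is diagonal in the population limit, which then gives (2); at finite sample size its off-diagonal entries are small but not exactly zero.

```python
import numpy as np

def fourth_cumulant(samples):
    """Empirical fourth cumulant tensor of samples with shape (n, p)."""
    c = samples - samples.mean(axis=0)            # center the data
    n = c.shape[0]
    m4 = np.einsum("ni,nj,nk,nl->ijkl", c, c, c, c) / n
    cov = c.T @ c / n
    # Subtract the three pairings of second moments.
    return (m4
            - np.einsum("ij,kl->ijkl", cov, cov)
            - np.einsum("ik,jl->ijkl", cov, cov)
            - np.einsum("il,jk->ijkl", cov, cov))

rng = np.random.default_rng(0)
n, p, r = 2000, 3, 2
A = rng.standard_normal((p, r))                   # hypothetical mixing matrix
z = rng.uniform(-1.0, 1.0, size=(n, r))           # independent non-Gaussian sources
y = z @ A.T                                       # rows are samples of y = A z

kappa_z = fourth_cumulant(z)
kappa_y = fourth_cumulant(y)
# Apply A along each of the four modes of the source cumulant.
kappa_y_from_z = np.einsum("ia,jb,kc,ld,abcd->ijkl", A, A, A, A, kappa_z)

print(np.allclose(kappa_y, kappa_y_from_z))
print(np.diag(kappa_z.reshape(r * r, r * r))[[0, -1]])  # estimates of lambda_1, lambda_2
```

The agreement is exact up to floating-point error because empirical moments are themselves multilinear in the data; only the diagonality of the source cumulant depends on independence and sample size.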

In this paper we extend ICA, and tensor decomposition for ICA, to the comparison of two distributions. We call this contrastive ICA (cICA), by analogy with cPCA [AZBZ18]. We have two observed distributions, a foreground and a background. Both are assumed to be linear mixtures of independent source variables. The cICA model expresses the background $\mathbf{y}$ and foreground $\mathbf{x}$ as

$$\mathbf{y}=A\mathbf{z}\qquad\text{and}\qquad\mathbf{x}=A\mathbf{z}'+B\mathbf{s}. \tag{3}$$

The background distribution $\mathbf{y}$ is a linear mixture of a random vector $\mathbf{z}$ of $r$ independent random variables, as in (1). The foreground $\mathbf{x}$ is a mixture of $r+\ell$ independent variables $\mathbf{z}'=(z_1',\ldots,z_r')$ and $\mathbf{s}=(s_1,\ldots,s_\ell)$. The columns of $A$ are the patterns in the background: column $\mathbf{a}_i\in\mathbb{R}^p$ records how source variable $z_i$ appears among the $p$ background variables as well as how source variable $z_i'$ appears among the $p$ foreground variables. The columns of $B$ are patterns that appear only in the foreground. They correspond to the variables $s_i$, referred to as the salient variables in [AZ19].
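As an illustration of model (3), the sketch below draws unpaired background and foreground samples. It assumes NumPy; the dimensions, sample sizes, and the Laplace, uniform, and centered exponential source distributions are arbitrary illustrative choices (any independent non-Gaussian sources would do), and none of them come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, ell = 5, 3, 2              # observed dimension, background and salient source counts
n_bg, n_fg = 400, 300            # the two datasets need not be paired or equal in size

A = rng.standard_normal((p, r))      # background patterns (hypothetical values)
B = rng.standard_normal((p, ell))    # foreground-only patterns (hypothetical values)

# Independent non-Gaussian sources; z and z' may follow different distributions.
z = rng.laplace(size=(n_bg, r))
z_prime = rng.uniform(-1.0, 1.0, size=(n_fg, r))
s = rng.exponential(size=(n_fg, ell)) - 1.0   # centered exponential

y = z @ A.T                      # background: rows are samples of y = A z
x = z_prime @ A.T + s @ B.T      # foreground: rows are samples of x = A z' + B s

print(y.shape, x.shape)
print(np.linalg.matrix_rank(y), np.linalg.matrix_rank(x))
```

The background samples span only the $r$-dimensional column space of $A$, while the foreground samples also pick up the directions in $B$.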

We propose a tensor decomposition algorithm to recover mixing matrices $A$ and $B$ from (3). These matrices record the patterns that encode our background and foreground distributions. We apply the algorithm to empirical cumulant tensors of $\mathbf{x}$ and $\mathbf{y}$ obtained from sample data. We order the columns of matrix $B$ to obtain a dimensionality reduction tool. We work under the assumption that $\mathbf{z},\mathbf{z}',\mathbf{s}$ are non-Gaussian, an assumption that also appears for usual ICA. This can likely be relaxed to requiring that at most one source is Gaussian, cf. [Com94, WS24].

Under the model (3), the $d$-th cumulants of the background and foreground data are, respectively,

$$\kappa_d(\mathbf{y})=\sum_{i=1}^{r}\lambda_i\mathbf{a}_i^{\otimes d},\qquad\quad\kappa_d(\mathbf{x})=\sum_{i=1}^{r}\lambda_i'\mathbf{a}_i^{\otimes d}+\sum_{j=1}^{\ell}\nu_j\mathbf{b}_j^{\otimes d}, \tag{4}$$

where $\lambda_i$ is the $d$-th cumulant of $z_i$, $\lambda_i'$ is the $d$-th cumulant of $z_i'$, and $\nu_j$ is the $d$-th cumulant of $s_j$. This follows from the multilinearity of cumulants and the fact that cumulant tensors of independent sources are diagonal, as for usual ICA. We hence have the following optimization problem to recover $A$ and $B$: find a joint decomposition of cumulant tensors $\kappa_d(\mathbf{y})$ and $\kappa_d(\mathbf{x})$ of the form in (4). Our approach is:

  1. Compute a symmetric tensor decomposition of $\kappa_d(\mathbf{y})$ to learn $A$.

  2. Find the coefficients $\lambda_i'$ of each $\mathbf{a}_i^{\otimes d}$ in $\kappa_d(\mathbf{x})$ to obtain $\sum_{j=1}^{\ell}\nu_j\mathbf{b}_j^{\otimes d}$.

  3. Compute a symmetric tensor decomposition of $\sum_{j=1}^{\ell}\nu_j\mathbf{b}_j^{\otimes d}$ to learn $B$.

We work with the fourth order cumulants, $d=4$. We propose a hierarchical eigendecomposition-based algorithm to decompose an order four symmetric tensor, which we describe in the next section. This is a key ingredient of our cICA algorithm.
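The structure that the three steps exploit can be seen on a small exact example. The sketch below, assuming NumPy, builds population fourth cumulants of the form (4) from hand-picked patterns and cumulant values (all hypothetical), then removes the background terms using the true coefficients $\lambda_i'$. It does not estimate $\lambda_i'$ from data, which is the actual content of step 2; it only illustrates that once these coefficients are known, the remainder is the foreground-only part, whose flattening has rank $\ell$.

```python
import numpy as np

def rank_one_sum(weights, vectors):
    """Return sum_i weights[i] * v_i^{(x)4}, where v_i are the columns of `vectors`."""
    return np.einsum("a,ia,ja,ka,la->ijkl",
                     weights, vectors, vectors, vectors, vectors)

p, r, ell = 3, 2, 1
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])          # hypothetical background patterns a_1, a_2
B = np.array([[0.0],
              [1.0],
              [1.0]])               # hypothetical foreground pattern b_1

lam = np.array([1.5, -0.5])         # fourth cumulants of z_i (illustrative values)
lam_prime = np.array([2.0, 0.25])   # fourth cumulants of z'_i (illustrative values)
nu = np.array([-3.0])               # fourth cumulant of s_1 (illustrative value)

kappa_y = rank_one_sum(lam, A)                                  # background, as in (4)
kappa_x = rank_one_sum(lam_prime, A) + rank_one_sum(nu, B)      # foreground, as in (4)

# Outcome of step 2 when the true lam_prime is known.
residual = kappa_x - rank_one_sum(lam_prime, A)

rank_foreground = np.linalg.matrix_rank(kappa_x.reshape(p * p, p * p))
rank_residual = np.linalg.matrix_rank(residual.reshape(p * p, p * p))
print(rank_foreground, rank_residual)
```

Here the flattened foreground cumulant has rank $r+\ell$, and subtracting the background terms with the right coefficients drops the rank to $\ell$.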

2.1. Related Work

We relate cICA to other contrastive models. Setting $\mathbf{z}'=\gamma\mathbf{z}$ and studying observed distributions $\mathbf{x}$ and $\mathbf{y}$ via their covariance matrices ($d=2$) specializes cICA to cPCA from [AZBZ17, AZBZ18]. cICA also relates to PCPCA [LJE20] but we do not impose distributional assumptions, beyond independence, on the variables $\mathbf{z}$ and $(\mathbf{z}',\mathbf{s})$. Finally, the cICA model fits into the contrastive latent variable model framework of [SGN19], but we disregard noise terms and do not impose $\mathbf{z}=\mathbf{z}'$.
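The specialization to cPCA can be made explicit with a short computation. This is a sketch under the stated assumption $\mathbf{z}'=\gamma\mathbf{z}$ with $\mathbf{z}$ and $\mathbf{s}$ independent; the symbols $\Lambda=\operatorname{Cov}(\mathbf{z})$ and $N=\operatorname{Cov}(\mathbf{s})$, both diagonal, are notation introduced here for illustration.

$$\begin{aligned}
\operatorname{Cov}(\mathbf{y}) &= A\Lambda A^{\mathsf{T}},\\
\operatorname{Cov}(\mathbf{x}) &= \gamma^{2}A\Lambda A^{\mathsf{T}}+BNB^{\mathsf{T}},\\
\operatorname{Cov}(\mathbf{x})-\gamma^{2}\operatorname{Cov}(\mathbf{y}) &= BNB^{\mathsf{T}}.
\end{aligned}$$

So subtracting the scalar multiple $\gamma^2$ of the background covariance from the foreground covariance leaves a matrix built only from the foreground patterns, which is the matrix whose principal components cPCA computes. In terms of (4) with $d=2$, this is the case $\lambda_i'=\gamma^{2}\lambda_i$ for all $i$.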

The setting of cICA relates to usual ICA, with block structure on the mixing matrix:

$$\text{if $\mathbf{z}'$, $\mathbf{z}$, $\mathbf{s}$ are independent,}\quad\begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix}=\begin{pmatrix}0&A&B\\ A&0&0\end{pmatrix}\begin{pmatrix}\mathbf{z}\\ \mathbf{z}'\\ \mathbf{s}\end{pmatrix};\qquad\text{if $\mathbf{z}'=\gamma\mathbf{z}$,}\quad\begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix}=\begin{pmatrix}\gamma A&B\\ A&0\end{pmatrix}\begin{pmatrix}\mathbf{z}\\ \mathbf{s}\end{pmatrix}.$$

However, learning parameters via usual ICA in either of these settings requires access to the joint distribution of $(\mathbf{x},\mathbf{y})$. In the first setting, $\mathbf{x},\mathbf{y}$ are independent, so we can build the joint distribution of $(\mathbf{x},\mathbf{y})$ from unpaired observations of $\mathbf{x}$ and $\mathbf{y}$. Identifiability can be characterized using [Com94], or using [EK04, WS24] if the model is overcomplete (i.e. the number of sources exceeds the number of observations, which occurs for $2r+\ell>2p$). However, in practice, the independence assumption on all of $\mathbf{z}',\mathbf{z},\mathbf{s}$ is too strong. In the second setting, we do not have access to the joint distribution of $(\mathbf{x},\mathbf{y})$ unless we have paired data samples, which is unrealistic in the settings we study.
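The two block-structured mixing matrices can be assembled directly. The sketch below, assuming NumPy, uses arbitrary small dimensions and randomly drawn $A$, $B$, and sources (all hypothetical) and checks that the stacked models reproduce $\mathbf{x}$ and $\mathbf{y}$ from (3). Note that it uses paired columns of sources purely to verify the algebra, which is exactly the kind of paired access that is unavailable in the second setting.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, ell, n = 3, 2, 1, 5
A = rng.standard_normal((p, r))
B = rng.standard_normal((p, ell))

# Columns are samples; the sources are drawn independently here.
z = rng.uniform(-1.0, 1.0, size=(r, n))
z_prime = rng.uniform(-1.0, 1.0, size=(r, n))
s = rng.uniform(-1.0, 1.0, size=(ell, n))

# First setting: z, z', s all independent, so 2r + ell sources in total.
mix_indep = np.block([[np.zeros((p, r)), A, B],
                      [A, np.zeros((p, r)), np.zeros((p, ell))]])
stacked = mix_indep @ np.vstack([z, z_prime, s])
x, y = stacked[:p], stacked[p:]

# Second setting: z' = gamma * z, so only r + ell sources.
gamma = 0.5
mix_scaled = np.block([[gamma * A, B],
                       [A, np.zeros((p, ell))]])
stacked_scaled = mix_scaled @ np.vstack([z, s])

print(mix_indep.shape, mix_scaled.shape)
```

The first matrix has $2p$ rows and $2r+\ell$ columns, which is where the overcompleteness condition $2r+\ell>2p$ comes from.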

In [SSDU24], the authors study multi-modal linear ICA. They recover the mixing matrices from each mode via usual linear ICA and then use a hypothesis test to decide which latent variables should be shared across modes. Our method differs from this as we seek shared patterns across datasets rather than shared latent variables.

Contrastive methods related to nonlinear ICA have been explored in the literature. Nonlinear ICA is studied using contrastive learning in [HM16, HST19, LF22]. Here contrastive is used in a different context: it describes a method to train a network to distinguish two datasets. A nonlinear contrastive method called a contrastive variational autoencoder (cVAE) is introduced in [AZ19, SGN19]. The paper [WBWL22] presents a method for cVAE using maximum mean discrepancy to prevent leakage of information between the two sets of latent variables. Identifiability of cVAE is studied using results from nonlinear ICA in [LHH+24]. These works produce a nonlinear latent encoding of data, whereas our focus is on pattern vectors to describe observed variables.

3. Hierarchical tensor decomposition

We introduce a hierarchical tensor decomposition (HTD) that decomposes an order four tensor via recursive eigendecompositions. The idea is to find a low-rank approximation of a tensor, whose rank one summands offer an interpretable basis on which to project data. Later, we use the decomposition for cICA. In this section, we define the decomposition and study its properties.

3.1. The HTD algorithm

Consider a symmetric tensor $T$ of format $p\times p\times p\times p$. We compute a rank $r$ approximation,

$$T\approx\sum_{i=1}^{r}\nu_i\mathbf{b}_i^{\otimes 4}, \tag{5}$$

as follows. Let $\operatorname{Mat}(T)$ be the flattening of $T$ that rearranges its $p^4$ entries into a matrix of size $p^2\times p^2$. The entries of $\operatorname{Mat}(T)$ are indexed $((i_1,i_2),(j_1,j_2))$, where $i_1,i_2,j_1,j_2\in[p]:=\{1,\ldots,p\}$. We compute the approximation (5) by first computing the eigendecomposition of $\operatorname{Mat}(T)$, whose eigenvectors lie in $\mathbb{R}^{p^2}$, and then by reshaping these eigenvectors into $p\times p$ matrices and computing their top eigenvalue and corresponding eigenvector. By top eigenvalues we mean those of highest magnitude. This decomposition has not to our knowledge been studied before but has connections to the hierarchical tensor representations of [Hac12, Chapter 11] and to the PARATREE model in [SRK09]; see Subsection 3.3. Here is the algorithm.

Algorithm 1 Compute unit vectors $\mathbf{b}_1,\ldots,\mathbf{b}_r$ such that $T\approx\sum_{i=1}^{r}\nu_i\mathbf{b}_i^{\otimes 4}$
1: Input: Symmetric tensor $T$ of format $p\times p\times p\times p$ and rank $r$.
2: Compute the eigendecomposition of the $p^2\times p^2$ flattening $\operatorname{Mat}(T)$. Take the top $r$ eigenvalues $\mu_1,\ldots,\mu_r$, with corresponding eigenvectors $\mathbf{v}_1,\ldots,\mathbf{v}_r\in\mathbb{R}^{p^2}$ of unit length.
3: For each $i\in[r]$, reshape $\mathbf{v}_i\in\mathbb{R}^{p^2}$ to $M_i\in\mathbb{R}^{p\times p}$.
4: For each $M_i$, find the top eigenvalue $\beta_i$ and a corresponding unit length eigenvector $\mathbf{b}_i\in\mathbb{R}^{p}$.
5: Output: Rank $r$ decomposition $\sum_{i=1}^{r}(\mu_i\beta_i^2)\mathbf{b}_i^{\otimes 4}$.
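A minimal NumPy sketch of Algorithm 1 follows. The function names `htd` and `from_rank_one_terms` are hypothetical, and the explicit symmetrization of each reshaped matrix is an added safeguard against round-off that the algorithm itself does not state. The check at the end uses an orthogonally decomposable tensor with hand-picked coefficients, for which the output is expected to reproduce the input.

```python
import numpy as np

def htd(T, r):
    """Sketch of Algorithm 1: returns coefficients mu_i * beta_i^2 and unit vectors b_i."""
    p = T.shape[0]
    # Step 2: eigendecomposition of the symmetric p^2 x p^2 flattening.
    mu, V = np.linalg.eigh(T.reshape(p * p, p * p))
    top = np.argsort(-np.abs(mu))[:r]              # eigenvalues of highest magnitude
    coeffs = np.empty(r)
    vectors = np.empty((p, r))
    for col, k in enumerate(top):
        # Step 3: reshape the eigenvector into a p x p matrix.
        M = V[:, k].reshape(p, p)
        # Step 4: top eigenpair of M (symmetrized here as a numerical safeguard).
        beta, W = np.linalg.eigh((M + M.T) / 2)
        j = np.argmax(np.abs(beta))
        coeffs[col] = mu[k] * beta[j] ** 2
        vectors[:, col] = W[:, j]
    return coeffs, vectors

def from_rank_one_terms(coeffs, vectors):
    """Assemble sum_i coeffs[i] * b_i^{(x)4} from the columns b_i of `vectors`."""
    return np.einsum("a,ia,ja,ka,la->ijkl",
                     coeffs, vectors, vectors, vectors, vectors)

# Sanity check on an orthogonally decomposable tensor (illustrative coefficients).
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 3)))
T = from_rank_one_terms(np.array([3.0, -2.0]), Q[:, :2])
coeffs, vectors = htd(T, 2)
recon_error = np.linalg.norm(T - from_rank_one_terms(coeffs, vectors))
print(coeffs, recon_error)
```

The sign of each eigenvector returned by `np.linalg.eigh` is arbitrary, but the output is unaffected: flipping $\mathbf{v}_i$ flips the sign of $\beta_i$, which enters only through $\beta_i^2$, and $\mathbf{b}_i^{\otimes 4}$ is invariant under $\mathbf{b}_i\mapsto-\mathbf{b}_i$.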

We record some observations pertaining to Algorithm 1. The matrix $\operatorname{Mat}(T)\in\mathbb{R}^{p^2\times p^2}$ is symmetric since $T$ is symmetric. The matrices $M_1,\ldots,M_r\in\mathbb{R}^{p\times p}$ are also symmetric, because the vectors $\mathbf{v}_1,\ldots,\mathbf{v}_r$ are in the column space of $\operatorname{Mat}(T)$, whose $(i_1,i_2)$-th row coincides with its $(i_2,i_1)$-th row. Although the output vectors $\mathbf{b}_i$ are in general not orthogonal, as each is an eigenvector of a distinct matrix, they can be nearly orthogonal in practice; see Section 3.2. This is because they are the leading eigenvectors of matrices that have been reshaped from orthogonal vectors $\mathbf{v}_i$.

Example 3.1 ($2\times 2\times 2\times 2$ example).

Let $r=2$. Fix

$$T=2\begin{bmatrix}1\\ 0\end{bmatrix}^{\otimes 4}+\begin{bmatrix}0.0998\\ 0.995\end{bmatrix}^{\otimes 4}.\qquad\text{Then}\qquad\operatorname{Mat}(T)=\begin{bmatrix}2.0001&0.0010&0.0010&0.0099\\ 0.0010&0.0099&0.0099&0.0983\\ 0.0010&0.0099&0.0099&0.0983\\ 0.0099&0.0983&0.0983&0.9801\end{bmatrix}$$

with eigenvalues $\mu_1=2.00019$, $\mu_2=0.99977$ and associated eigenvectors

$$\mathbf{v}_1^{\mathsf{T}}\approx\begin{bmatrix}0.99995&0.00098&0.00098&0.00985\end{bmatrix},\qquad\mathbf{v}_2^{\mathsf{T}}\approx\begin{bmatrix}-0.00995&0.0993&0.0993&0.99003\end{bmatrix}.$$

Their corresponding matrices $M_1,M_2\in\mathbb{R}^{2\times 2}$ are symmetric with top eigenvalues $\beta_1=0.99995$ and $\beta_2=0.9998$, respectively, with associated eigenvectors $\mathbf{b}_1^{\mathsf{T}}=\begin{bmatrix}0.99999&0.00099\end{bmatrix}$ and $\mathbf{b}_2^{\mathsf{T}}=\begin{bmatrix}0.09787&0.99519\end{bmatrix}$. The HTD algorithm with input $T$ and $r=2$ thus outputs

(6) i=12(μiβi2)𝐛i4=1.99999[0.999990.00099]4+0.99937[0.097870.99519]4.superscriptsubscript𝑖12subscript𝜇𝑖superscriptsubscript𝛽𝑖2superscriptsubscript𝐛𝑖tensor-productabsent41.99999superscriptmatrix0.999990.00099tensor-productabsent40.99937superscriptmatrix0.097870.99519tensor-productabsent4\sum_{i=1}^{2}(\mu_{i}\beta_{i}^{2})\mathbf{b}_{i}^{\otimes 4}=1.99999\begin{% bmatrix}0.99999\\ 0.00099\end{bmatrix}^{\otimes 4}+0.99937\begin{bmatrix}0.09787\\ 0.99519\end{bmatrix}^{\otimes 4}.∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_β start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) bold_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊗ 4 end_POSTSUPERSCRIPT = 1.99999 [ start_ARG start_ROW start_CELL 0.99999 end_CELL end_ROW start_ROW start_CELL 0.00099 end_CELL end_ROW end_ARG ] start_POSTSUPERSCRIPT ⊗ 4 end_POSTSUPERSCRIPT + 0.99937 [ start_ARG start_ROW start_CELL 0.09787 end_CELL end_ROW start_ROW start_CELL 0.99519 end_CELL end_ROW end_ARG ] start_POSTSUPERSCRIPT ⊗ 4 end_POSTSUPERSCRIPT .

We note the similarity to the input tensor T𝑇Titalic_T.

3.2. Properties of the decomposition

The HTD algorithm outputs a rank $r$ approximation of a tensor. In certain cases, the output closely approximates the input tensor, as in Example 3.1. We bound the distance between the HTD approximation and the input tensor. We give a bound that applies to all tensors in Proposition 3.2. We show that the input and output coincide for orthogonally decomposable tensors in Proposition 3.3. Our main result is Theorem 3.4, which bounds the distance between an input and output tensor for a tensor decomposition involving vectors that are close to orthogonal.

The norm $\|\cdot\|$ refers to the Frobenius norm for matrices and tensors and the $2$-norm for vectors; i.e., the square root of the sum of the squares of the entries. The $2$-norm of a matrix is denoted by $\|\cdot\|_2$.

Proposition 3.2.

Let $T$ be a symmetric tensor of format $p \times p \times p \times p$. Let $T' = \sum_{i=1}^{r} (\mu_i \beta_i^2)\, \mathbf{b}_i^{\otimes 4}$ be the rank $r$ HTD approximation of $T$. Then

\[
\|T' - T\| \leq \left( \sum_{i=r+1}^{q} \mu_i^2 \right)^{\frac{1}{2}} + \sum_{i=1}^{r} |\mu_i| (1 + |\beta_i|) \left( \sum_{j=2}^{r_i} (\beta_i^{(j)})^2 \right)^{\frac{1}{2}},
\]

where $q$ is the rank of $\operatorname{Mat}(T)$, $r_i$ is the rank of $M_i$, the numbers $\mu_1, \ldots, \mu_r$ are the eigenvalues of $\operatorname{Mat}(T)$ in descending order of magnitude, and $\beta_i := \beta_i^{(1)}$ is the highest magnitude eigenvalue of $M_i$ with $\beta_i^{(2)}, \ldots, \beta_i^{(r_i)}$ the other eigenvalues.

Proof.

We use the notation from Algorithm 1. We have

\[
\Big\| \operatorname{Mat}(T) - \sum_{i=1}^{r} \mu_i \mathbf{v}_i^{\otimes 2} \Big\|^2 = \sum_{i=r+1}^{q} \mu_i^2 \qquad \text{and} \qquad \| M_i - \beta_i \mathbf{b}_i^{\otimes 2} \|^2 = \sum_{j=2}^{r_i} (\beta_i^{(j)})^2,
\]

from the properties of the eigendecomposition of a symmetric matrix and the Frobenius norm. Let $T''$ be the $p \times p \times p \times p$ tensor obtained from reshaping the truncated eigendecomposition $\sum_{i=1}^{r} \mu_i \mathbf{v}_i^{\otimes 2}$ of $\operatorname{Mat}(T)$. Then $\|T - T''\|^2 = \sum_{i=r+1}^{q} \mu_i^2$. Let $\mathbf{B}_i \in \mathbb{R}^{p^2}$ be the vectorization of $\mathbf{b}_i^{\otimes 2} \in \mathbb{R}^{p \times p}$. Then

\begin{align*}
\|T'' - T'\| &= \Big\| \sum_{i=1}^{r} \mu_i (\mathbf{v}_i^{\otimes 2} - \beta_i^2 \mathbf{B}_i^{\otimes 2}) \Big\| \\
&\leq \sum_{i=1}^{r} |\mu_i| \, \| \mathbf{v}_i^{\otimes 2} - \beta_i^2 \mathbf{B}_i^{\otimes 2} \| \\
&\leq \sum_{i=1}^{r} |\mu_i| \left( \| \mathbf{v}_i^{\otimes 2} - \beta_i \mathbf{B}_i \otimes \mathbf{v}_i \| + \| \beta_i^2 \mathbf{B}_i^{\otimes 2} - \beta_i \mathbf{B}_i \otimes \mathbf{v}_i \| \right) \\
&= \sum_{i=1}^{r} |\mu_i| \left( \|\mathbf{v}_i\| + |\beta_i| \, \|\mathbf{B}_i\| \right) \| \mathbf{v}_i - \beta_i \mathbf{B}_i \| \\
&= \sum_{i=1}^{r} |\mu_i| (1 + |\beta_i|) \left( \sum_{j=2}^{r_i} (\beta_i^{(j)})^2 \right)^{\frac{1}{2}},
\end{align*}

where the penultimate equality follows from $\|\mathbf{x} \otimes \mathbf{y}\| = \|\mathbf{x}\| \cdot \|\mathbf{y}\|$ and the last equality uses $\|\mathbf{v}_i\| = \|\mathbf{B}_i\| = 1$. We conclude by the triangle inequality $\|T - T'\| \leq \|T - T''\| + \|T'' - T'\|$. ∎
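The bound can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes that Algorithm 1 consists of the steps used in the proof above (eigendecompose $\operatorname{Mat}(T)$, reshape each of the top $r$ eigenvectors into a $p \times p$ matrix $M_i$, and take the top eigenpair of each $M_i$). The function names `eig_by_magnitude`, `htd_with_bound` and `random_symmetric_tensor` are hypothetical and chosen for illustration.

```python
from itertools import permutations

import numpy as np


def eig_by_magnitude(sym):
    """Eigenpairs of a symmetric matrix, sorted by decreasing |eigenvalue|."""
    vals, vecs = np.linalg.eigh(sym)
    order = np.argsort(-np.abs(vals))
    return vals[order], vecs[:, order]


def htd_with_bound(T, r):
    """Rank-r HTD approximation T' of T and the right-hand side of the bound."""
    p = T.shape[0]
    mu, V = eig_by_magnitude(T.reshape(p * p, p * p))  # eigenpairs of Mat(T)
    T_prime = np.zeros_like(T)
    bound = np.sqrt(np.sum(mu[r:] ** 2))  # term from the discarded mu_i
    for i in range(r):
        M_i = V[:, i].reshape(p, p)
        M_i = (M_i + M_i.T) / 2  # guard against rounding asymmetry
        beta, B = eig_by_magnitude(M_i)
        b = B[:, 0]  # eigenvector of the highest magnitude eigenvalue
        T_prime += mu[i] * beta[0] ** 2 * np.einsum("a,b,c,d->abcd", b, b, b, b)
        bound += abs(mu[i]) * (1 + abs(beta[0])) * np.sqrt(np.sum(beta[1:] ** 2))
    return T_prime, bound


def random_symmetric_tensor(p, rng):
    """Symmetrize a Gaussian p x p x p x p array over all 24 index orders."""
    A = rng.standard_normal((p, p, p, p))
    return sum(A.transpose(perm) for perm in permutations(range(4))) / 24


rng = np.random.default_rng(0)
T = random_symmetric_tensor(3, rng)
T_prime, bound = htd_with_bound(T, r=2)
print("error:", np.linalg.norm(T_prime - T), "bound:", bound)
```

For a generic random tensor the bound is typically loose, since neither $\operatorname{Mat}(T)$ nor the matrices $M_i$ are close to low rank; the sketch only illustrates that the inequality holds.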

The quantity in Proposition 3.2 is small if $\operatorname{Mat}(T)$ is well-approximated by a matrix of rank $r$, and each $M_i$ is well-approximated by a matrix of rank one. Orthogonally decomposable tensors are those with a decomposition into orthogonal rank one terms; that is, a decomposition $T = \sum_{i=1}^{r} \nu_i \mathbf{b}_i^{\otimes 4}$, where $\mathbf{b}_1, \ldots, \mathbf{b}_r$ are orthonormal [Rob16]. For orthogonally decomposable tensors, HTD recovers the exact decomposition.

Proposition 3.3.

Let $T = \sum_{i=1}^{r} \nu_i \mathbf{b}_i^{\otimes 4}$, where the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_r$ are orthonormal and the coefficients $\nu_1, \ldots, \nu_r$ are distinct. Then the rank $r$ HTD approximation is the tensor $T$.

Proof.

The flattening $\operatorname{Mat}(T)$ has decomposition $\sum_{i=1}^{r} \nu_i \mathbf{B}_i^{\otimes 2}$, where $\mathbf{B}_i \in \mathbb{R}^{p^2}$ is the vectorization of $\mathbf{b}_i^{\otimes 2} \in \mathbb{R}^{p \times p}$. We have the orthogonality $\langle \mathbf{B}_i, \mathbf{B}_j \rangle = \langle \mathbf{b}_i, \mathbf{b}_j \rangle^2 = 0$ for all $i \neq j$, since the vectors $\mathbf{b}_i, \mathbf{b}_j$ are orthogonal. Hence this expression for $\operatorname{Mat}(T)$ is a sum of outer products of orthogonal vectors, so it is the eigendecomposition of $\operatorname{Mat}(T)$.
The matrix reshaped from the eigenvector $\mathbf{B}_i$ is $M_i = \mathbf{b}_i^{\otimes 2}$. It has top eigenvalue $1$ with corresponding eigenvector $\mathbf{b}_i$. Hence the output of HTD is $\sum_{i=1}^{r} \nu_i \mathbf{b}_i^{\otimes 4}$. ∎
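The steps of this proof can be traced numerically. The sketch below is illustrative only: it builds an orthogonally decomposable tensor from orthonormal columns, checks that the vectors $\mathbf{B}_i$ are orthonormal eigenvectors of $\operatorname{Mat}(T)$ with eigenvalues $\nu_i$, and then runs the HTD steps as they are described in the proofs of this section. The helper names `rank_one_4` and `odeco_check` are hypothetical, and the coefficients are assumed to be distinct and nonzero.

```python
import numpy as np


def rank_one_4(b):
    """The order-4 tensor b (x) b (x) b (x) b."""
    return np.einsum("a,b,c,d->abcd", b, b, b, b)


def odeco_check(nu, Q):
    """nu: distinct nonzero coefficients; Q: p x r matrix with orthonormal columns b_i."""
    nu = np.asarray(nu, dtype=float)
    p, r = Q.shape
    T = sum(nu[i] * rank_one_4(Q[:, i]) for i in range(r))
    mat = T.reshape(p * p, p * p)  # the flattening Mat(T)

    # Columns B_i = vec(b_i b_i^T): orthonormal eigenvectors of Mat(T).
    B = np.stack([np.outer(Q[:, i], Q[:, i]).ravel() for i in range(r)], axis=1)
    gram_ok = np.allclose(B.T @ B, np.eye(r))
    eig_ok = np.allclose(mat @ B, B * nu)

    # HTD steps: top r eigenpairs of Mat(T), then top eigenpair of each reshaped M_i.
    vals, vecs = np.linalg.eigh(mat)
    top = np.argsort(-np.abs(vals))[:r]
    T_out = np.zeros_like(T)
    for k in top:
        M = vecs[:, k].reshape(p, p)
        beta, U = np.linalg.eigh((M + M.T) / 2)
        j = np.argmax(np.abs(beta))
        T_out += vals[k] * beta[j] ** 2 * rank_one_4(U[:, j])
    return gram_ok, eig_ok, np.linalg.norm(T_out - T)


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 3)))  # orthonormal columns
print(odeco_check([3.0, -2.0, 1.0], Q))
```

The sign of each computed eigenvector is arbitrary, but it does not affect the output because only $\beta_i^2$ and $\mathbf{b}_i^{\otimes 4}$ enter the reconstruction.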

We extend Proposition 3.3 to decompositions where the vectors $\mathbf{b}_i$ are close to orthogonal.

Theorem 3.4.

Let $r = \mathcal{O}(1)$. Fix $T = \sum_{i=1}^{r} \nu_i \mathbf{b}_i^{\otimes 4}$ with $\nu_i$ distinct and in decreasing order of magnitude and $\mathbf{b}_i$ unit vectors with $\mathbf{b}_1^{\otimes 2}, \ldots, \mathbf{b}_r^{\otimes 2}$ linearly independent. Assume that $|\langle \mathbf{b}_i, \mathbf{b}_j \rangle| \leq \epsilon$ for all $i \neq j$, for small $\epsilon > 0$.
Suppose the output of rank $r$ HTD applied to $T$ is $T' = \sum_{i=1}^{r} \nu_i' \mathbf{b}_i'^{\otimes 4}$ with $\nu_i'$ in decreasing order of magnitude and $\mathbf{b}_i'$ unit vectors. Then

\[
|\nu_i - \nu'_i| = |\nu_i| \left( \frac{K}{\nu} \right)^{\frac{1}{2}} \mathcal{O}(\epsilon) \quad \text{and} \quad \min\{ \|\mathbf{b}_i - \mathbf{b}_i'\|, \|\mathbf{b}_i + \mathbf{b}_i'\| \} = \left( \frac{K}{\nu} \right)^{\frac{1}{4}} \mathcal{O}(\epsilon^{\frac{1}{2}}),
\]

where $\nu = \min\{ |\nu_i - \nu_j|, |\nu_i| : 1 \leq i < j \leq r \}$ and $K = \sum_{i=1}^{r} |\nu_i| (2^{i+1} - 4)$.

Note that the quantity $\min\{ \|\mathbf{b}_i - \mathbf{b}_i'\|, \|\mathbf{b}_i + \mathbf{b}_i'\| \}$ arises in Theorem 3.4 because of the sign indeterminacy of the vectors in the decompositions, due to the equality $(-\mathbf{b}_i)^{\otimes d} = \mathbf{b}_i^{\otimes d}$ for $d$ even.

We prove Theorem 3.4 via two lemmas. The condition that the matrices $\mathbf{b}_1^{\otimes 2}, \ldots, \mathbf{b}_r^{\otimes 2}$ are linearly independent ensures that $\operatorname{Mat}(T)$ has rank $r$. This condition holds for generic vectors $\mathbf{b}_i$, provided $r \leq \binom{p+1}{2}$.

Lemma 3.5.

Assume the hypotheses of Theorem 3.4. Let $\mathbf{B}_i$ be the vectorization of $\mathbf{b}_i^{\otimes 2}$ for $i \in [r]$ and define $M = \sum_{i=1}^{r} \nu_i \mathbf{B}_i^{\otimes 2}$, the flattening of tensor $T$. Then there exists a matrix $M'$ with eigendecomposition

\[
M' = \sum_{i=1}^{r} \nu_i (\mathbf{B}_i')^{\otimes 2} \tag{7}
\]

such that $\|\mathbf{B}_i' - \mathbf{B}_i\| \leq (2^i - 2)\epsilon^2$ for $i \in [r]$ and $\|M - M'\| \leq \big( \sum_{i=1}^{r} |\nu_i| (2^{i+1} - 4) \big) \epsilon^2$.

Proof.

Our first goal is to construct the vectors $\mathbf{B}_i' \in \mathbb{R}^{p^2}$. We generate orthogonal vectors via Gram-Schmidt:

\[
\mathbf{B}_j'' := \mathbf{B}_j - \sum_{i=1}^{j-1} \langle \mathbf{B}_i'', \mathbf{B}_j \rangle \mathbf{B}_i''.
\]

The vectors $\mathbf{B}_i$ satisfy $\|\mathbf{B}_i\| = 1$ for all $i$ and $|\langle \mathbf{B}_i, \mathbf{B}_j \rangle| = |\langle \mathbf{b}_i, \mathbf{b}_j \rangle|^2 \leq \epsilon^2$ for all $i \neq j$. We will prove by induction on $i$ that

\[
\|\mathbf{B}_i'' - \mathbf{B}_i\| \leq (2^{i-1} - 1)\epsilon^2 \ \text{ and } \ |\langle \mathbf{B}_i'', \mathbf{B}_j \rangle| \leq 2^{i-1} \epsilon^2 \text{ for all } j > i.
\]
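As a sanity check of these two claims, the following NumPy sketch applies the displayed recursion literally (without normalizing the vectors $\mathbf{B}_i''$) to the vectorizations of nearly orthogonal unit vectors. It is an illustration under stated assumptions, not part of the proof: the function names, the perturbation size and the tolerance `tol` are all hypothetical choices, and $\epsilon$ is taken to be the largest $|\langle \mathbf{b}_i, \mathbf{b}_j \rangle|$ over $i \neq j$.

```python
import numpy as np


def gram_schmidt_unnormalized(B):
    """Columns B_j -> B_j'' = B_j - sum_{i<j} <B_i'', B_j> B_i''."""
    B2 = np.zeros_like(B)
    for j in range(B.shape[1]):
        B2[:, j] = B[:, j]
        for i in range(j):
            B2[:, j] -= (B2[:, i] @ B[:, j]) * B2[:, i]
    return B2


def check_induction_claims(b, tol=1e-12):
    """b: p x r matrix whose columns are unit vectors b_i. Returns (eps, claims_hold)."""
    r = b.shape[1]
    gram = b.T @ b
    eps = np.max(np.abs(gram - np.diag(np.diag(gram))))  # max |<b_i, b_j>|, i != j
    B = np.stack([np.outer(b[:, i], b[:, i]).ravel() for i in range(r)], axis=1)
    B2 = gram_schmidt_unnormalized(B)
    ok = True
    for i in range(r):  # 0-based i here is index i + 1 in the text
        ok = ok and np.linalg.norm(B2[:, i] - B[:, i]) <= (2 ** i - 1) * eps ** 2 + tol
        for j in range(i + 1, r):
            ok = ok and abs(B2[:, i] @ B[:, j]) <= 2 ** i * eps ** 2 + tol
    return eps, bool(ok)


rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
b = Q + 0.01 * rng.standard_normal(Q.shape)  # small perturbation of orthonormal columns
b /= np.linalg.norm(b, axis=0)               # renormalize to unit vectors
print(check_induction_claims(b))
```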

The proof relies on the inequalities

\[ \|\mathbf{B}_{j}^{\prime\prime} - \mathbf{B}_{j}\| \leq \sum_{i=1}^{j-1} |\langle \mathbf{B}_{i}^{\prime\prime}, \mathbf{B}_{j} \rangle| \]

and

\[ |\langle \mathbf{B}_{i}^{\prime\prime}, \mathbf{B}_{j} \rangle| = |\langle \mathbf{B}_{i}, \mathbf{B}_{j} \rangle + \langle \mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}, \mathbf{B}_{j} \rangle| \leq |\langle \mathbf{B}_{i}, \mathbf{B}_{j} \rangle| + \|\mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}\|, \]

where the last inequality uses Cauchy-Schwarz. When $i = 1$, $\|\mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}\| = 0 \leq (2^{0} - 1)\epsilon^{2}$ and $|\langle \mathbf{B}_{1}^{\prime\prime}, \mathbf{B}_{j} \rangle| \leq \epsilon^{2} + 0 = 2^{1-1}\epsilon^{2}$, as desired. Suppose the statement is true for all $k < i$. Then,

\[ \|\mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}\| \leq \sum_{k=1}^{i-1} |\langle \mathbf{B}_{k}^{\prime\prime}, \mathbf{B}_{i} \rangle| \leq \sum_{k=1}^{i-1} 2^{k-1}\epsilon^{2} = (2^{i-1} - 1)\epsilon^{2} \]

and

\[ |\langle \mathbf{B}_{i}^{\prime\prime}, \mathbf{B}_{j} \rangle| \leq \epsilon^{2} + (2^{i-1} - 1)\epsilon^{2} = 2^{i-1}\epsilon^{2}. \]

This concludes the induction. We now define $\mathbf{B}_{i}^{\prime} = \frac{1}{\|\mathbf{B}_{i}^{\prime\prime}\|}\mathbf{B}_{i}^{\prime\prime}$. Then $\mathbf{B}^{\prime}_{1}, \ldots, \mathbf{B}^{\prime}_{r}$ are orthonormal, hence (7) is the eigendecomposition of $M^{\prime}$. Moreover,

\[ \|\mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}\| \leq \|\mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}^{\prime\prime}\| + \|\mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}\| = \bigl| \|\mathbf{B}_{i}^{\prime\prime}\| - 1 \bigr| + \|\mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}\| \leq 2\|\mathbf{B}_{i}^{\prime\prime} - \mathbf{B}_{i}\| \leq (2^{i} - 2)\epsilon^{2}. \]

It remains to bound $\|M - M^{\prime}\|$. We have $\|M - M^{\prime}\| \leq \sum_{i=1}^{r} |\nu_{i}| \, \|(\mathbf{B}_{i}^{\prime})^{\otimes 2} - \mathbf{B}_{i}^{\otimes 2}\|$. The summands satisfy

\begin{align*}
\|(\mathbf{B}_{i}^{\prime})^{\otimes 2} - \mathbf{B}_{i}^{\otimes 2}\| &\leq \|(\mathbf{B}_{i}^{\prime})^{\otimes 2} - \mathbf{B}_{i} \otimes \mathbf{B}_{i}^{\prime}\| + \|\mathbf{B}_{i} \otimes \mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}^{\otimes 2}\| \\
&= \|\mathbf{B}_{i}^{\prime}\| \cdot \|\mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}\| + \|\mathbf{B}_{i}\| \cdot \|\mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}\| \\
&= 2\|\mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}\| \leq (2^{i+1} - 4)\epsilon^{2},
\end{align*}

where the first line follows from the triangle inequality, the second from $\|\mathbf{a} \otimes \mathbf{b}\| = \|\mathbf{a}\| \cdot \|\mathbf{b}\|$, and the third from $\|\mathbf{B}_{i}^{\prime}\| = \|\mathbf{B}_{i}\| = 1$ together with the bound on $\|\mathbf{B}_{i}^{\prime} - \mathbf{B}_{i}\|$ above. Collecting the summands proves the claim. ∎
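As an informal numerical sanity check of the two distance bounds in this proof (not part of the argument), the following NumPy sketch runs the recursion defining $\mathbf{B}_{j}^{\prime\prime}$ on vectors $\mathbf{B}_{i} = \mathbf{b}_{i}^{\otimes 2}$ built from nearly orthogonal unit vectors $\mathbf{b}_{i}$, and compares the resulting distances with $(2^{i-1}-1)\epsilon^{2}$ and $(2^{i}-2)\epsilon^{2}$. The dimensions, the perturbation size `delta`, the random seed, and the helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def near_orthonormal_vectors(p, r, delta, seed=0):
    """Hypothetical helper: r unit vectors in R^p (as rows), small perturbations of e_1..e_r."""
    rng = np.random.default_rng(seed)
    b = np.eye(p)[:r] + delta * rng.standard_normal((r, p))
    return b / np.linalg.norm(b, axis=1, keepdims=True)


def recursion(B):
    """Apply B_j'' = B_j - sum_{i<j} <B_i'', B_j> B_i'' to the rows of B."""
    B2 = np.zeros_like(B)
    for j in range(B.shape[0]):
        B2[j] = B[j] - sum(np.dot(B2[i], B[j]) * B2[i] for i in range(j))
    return B2


p, r = 6, 4
b = near_orthonormal_vectors(p, r, delta=0.02)

# epsilon: largest absolute inner product between distinct b_i, b_j
gram = b @ b.T
eps = np.max(np.abs(gram - np.diag(np.diag(gram))))

# B_i = vec(b_i (x) b_i), so <B_i, B_j> = <b_i, b_j>^2
B = np.stack([np.outer(v, v).ravel() for v in b])
B2 = recursion(B)                                        # the vectors B_i''
Bp = B2 / np.linalg.norm(B2, axis=1, keepdims=True)      # the vectors B_i'

idx = np.arange(1, r + 1)
dist_pp = np.linalg.norm(B2 - B, axis=1)                 # ||B_i'' - B_i||
dist_p = np.linalg.norm(Bp - B, axis=1)                  # ||B_i'  - B_i||
bound_pp = (2.0 ** (idx - 1) - 1) * eps**2
bound_p = (2.0 ** idx - 2) * eps**2

for i in range(r):
    print(i + 1, dist_pp[i], bound_pp[i], dist_p[i], bound_p[i])
```

The bounds grow exponentially in $i$, so for small $\epsilon$ the observed distances typically sit well below them for $i \geq 3$.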

Lemma 3.6.

Assume the setup of Theorem 3.4 and let $M^{\prime}$ be the matrix from Lemma 3.5. Then the eigenvalues and eigenvectors of $M$ and $M^{\prime}$ are close. Let $\sum_{i=1}^{r} \mu_{i} \mathbf{v}_{i}^{\otimes 2}$ be the eigendecomposition of $M$. Then

\[ |\mu_{i} - \nu_{i}| \leq K\epsilon^{2} \quad \text{and} \quad \min\{\|\mathbf{v}_{i} - \mathbf{B}_{i}^{\prime}\|, \|\mathbf{v}_{i} + \mathbf{B}_{i}^{\prime}\|\} \leq 2\sqrt{\frac{K}{\nu}}\,\epsilon, \]

where $\nu_{1}, \ldots, \nu_{r}$ and $\mu_{1}, \ldots, \mu_{r}$ are ordered by decreasing magnitude and $K = \sum_{i=1}^{r} |\nu_{i}| (2^{i+1} - 4)$.

Proof.

We have the bound

\[ |\mu_{i} - \nu_{i}| \leq \|M - M^{\prime}\| \leq K\epsilon^{2}, \]

where the first inequality follows from Weyl’s inequality [Wey12] and the second from Lemma 3.5. For the similarity of eigenvectors, we lower bound $|\langle \mathbf{B}_{i}^{\prime}, \mathbf{v}_{i} \rangle|$ using the difference $M^{\prime}\mathbf{v}_{i} - \nu_{i}\mathbf{v}_{i}$. We have

\begin{align*}
\|M^{\prime}\mathbf{v}_{i} - \nu_{i}\mathbf{v}_{i}\| &= \|(M^{\prime} - M)\mathbf{v}_{i} + (\mu_{i} - \nu_{i})\mathbf{v}_{i}\| \\
&\leq \|M^{\prime} - M\|_{2} + |\mu_{i} - \nu_{i}| \\
&\leq \|M^{\prime} - M\| + |\mu_{i} - \nu_{i}| \leq 2K\epsilon^{2},
\end{align*}

where the second line follows from the triangle inequality and the definition of the 2-norm, and the third from the fact that the 2-norm of a matrix is bounded above by its Frobenius norm. If $r < p^{2}$, we complete $\mathbf{B}_{1}^{\prime}, \ldots, \mathbf{B}_{r}^{\prime}$ to an orthonormal basis $\mathbf{B}_{1}^{\prime}, \ldots, \mathbf{B}_{p^{2}}^{\prime}$ of $\mathbb{R}^{p^{2}}$. Then $\mathbf{v}_{i} = \sum_{j=1}^{p^{2}} \langle \mathbf{B}_{j}^{\prime}, \mathbf{v}_{i} \rangle \mathbf{B}_{j}^{\prime}$. Hence

\[ M^{\prime}\mathbf{v}_{i} - \nu_{i}\mathbf{v}_{i} = \sum_{j=1}^{r} (\nu_{j} - \nu_{i}) \langle \mathbf{B}_{j}^{\prime}, \mathbf{v}_{i} \rangle \mathbf{B}_{j}^{\prime} - \nu_{i} \sum_{j=r+1}^{p^{2}} \langle \mathbf{B}_{j}^{\prime}, \mathbf{v}_{i} \rangle \mathbf{B}_{j}^{\prime}. \]

Then $\|M^{\prime}\mathbf{v}_{i} - \nu_{i}\mathbf{v}_{i}\|^{2} \geq \sum_{j=1, j \neq i}^{p^{2}} \nu^{2} \langle \mathbf{B}_{j}^{\prime}, \mathbf{v}_{i} \rangle^{2} = \nu^{2}(1 - \langle \mathbf{B}_{i}^{\prime}, \mathbf{v}_{i} \rangle^{2})$, where $\nu = \min\{|\nu_{i} - \nu_{j}|, |\nu_{i}| : 1 \leq i < j \leq r\}$. Then

\[ |\langle \mathbf{B}_{i}^{\prime}, \mathbf{v}_{i} \rangle| \geq \left(1 - \left(\frac{2K}{\nu}\epsilon^{2}\right)^{2}\right)^{\frac{1}{2}} \geq 1 - \frac{2K}{\nu}\epsilon^{2}. \]

Since $\|\mathbf{v}_{i} \pm \mathbf{B}_{i}^{\prime}\|^{2} = \|\mathbf{v}_{i}\|^{2} + \|\mathbf{B}_{i}^{\prime}\|^{2} \pm 2\langle \mathbf{B}_{i}^{\prime}, \mathbf{v}_{i} \rangle$, we have

\[ \min\{\|\mathbf{v}_{i} - \mathbf{B}_{i}^{\prime}\|, \|\mathbf{v}_{i} + \mathbf{B}_{i}^{\prime}\|\} = \sqrt{\|\mathbf{v}_{i}\|^{2} + \|\mathbf{B}_{i}^{\prime}\|^{2} - 2|\langle \mathbf{B}_{i}^{\prime}, \mathbf{v}_{i} \rangle|} \leq 2\sqrt{\frac{K}{\nu}}\,\epsilon. \qed \]
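The two ingredients of this proof can be illustrated numerically: the eigenvalue perturbation bound $|\mu_{i} - \nu_{i}| \leq \|M - M^{\prime}\|_{2} \leq \|M - M^{\prime}\|$, and the identity expressing the sign-invariant distance between unit vectors through their inner product. The sketch below uses random symmetric matrices as stand-ins for $M$ and $M^{\prime}$; the size, seed, and perturbation scale are assumptions chosen for illustration. Note that `numpy.linalg.eigh` returns eigenvalues in ascending order by value, which is the ordering under which this sketch compares them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 9  # plays the role of p^2 with p = 3

# Symmetric stand-in for M' and a small symmetric perturbation of it standing in for M.
A = rng.standard_normal((n, n))
M_prime = (A + A.T) / 2
E = rng.standard_normal((n, n))
M = M_prime + 1e-3 * (E + E.T) / 2

nu, V_prime = np.linalg.eigh(M_prime)  # eigenvalues in ascending order
mu, V = np.linalg.eigh(M)

eig_gap = np.max(np.abs(mu - nu))             # max_i |mu_i - nu_i|
spec = np.linalg.norm(M - M_prime, 2)         # spectral (2-)norm
frob = np.linalg.norm(M - M_prime, "fro")     # Frobenius norm

# Sign-invariant distance between the top unit eigenvectors of M and M'.
v, B = V[:, -1], V_prime[:, -1]
dist = min(np.linalg.norm(v - B), np.linalg.norm(v + B))
closed_form = np.sqrt(max(0.0, 2 - 2 * abs(v @ B)))

print(eig_gap, spec, frob)
print(dist, closed_form)
```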
Proof of Theorem 3.4.

For $i \in [r]$, the matrix $M_{i} \in \mathbb{R}^{p \times p}$ reshapes the unit vector $\mathbf{v}_{i} \in \mathbb{R}^{p^{2}}$ into a $p \times p$ matrix. The output of HTD is $T^{\prime}$ as in the statement, where $\mathbf{b}_{1}^{\prime}, \ldots, \mathbf{b}_{r}^{\prime}$ are the eigenvectors of $M_{1}, \ldots, M_{r}$, respectively. The output of HTD has $r$ non-zero summands, since the matrix $\operatorname{Mat}(T)$ has rank $r$. This follows from the fact that the vectors $\mathbf{b}_{1}^{\otimes 2}, \ldots, \mathbf{b}_{r}^{\otimes 2}$ are linearly independent. Then,

\[ \|M_{i} - \mathbf{b}_{i}^{\otimes 2}\| = \|\mathbf{v}_{i} - \mathbf{B}_{i}\| \leq \|\mathbf{v}_{i} - \mathbf{B}_{i}^{\prime}\| + \|\mathbf{B}_{i} - \mathbf{B}_{i}^{\prime}\| \leq 2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^{i} - 2)\epsilon^{2}. \]

We use the distance between $M_{i}$ and $\mathbf{b}_{i}^{\otimes 2}$ to bound the distance between their eigenvectors and eigenvalues:

\[ |\beta_{i} - 1| \leq \|M_{i} - \mathbf{b}_{i}^{\otimes 2}\| \leq 2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^{i} - 2)\epsilon^{2}, \]

by Weyl’s inequality. Note from Algorithm 1 that $\nu_{i}^{\prime} = \mu_{i}\beta_{i}^{2}$. Hence

\begin{align*}
|\nu_{i}^{\prime} - \nu_{i}| &\leq |\mu_{i}\beta_{i}^{2} - \mu_{i}| + |\mu_{i} - \nu_{i}| \\
&\leq |\mu_{i}(\beta_{i} + 1)| \cdot |\beta_{i} - 1| + |\mu_{i} - \nu_{i}| \\
&\leq (|\nu_{i}| + |\nu_{i} - \mu_{i}|)(2 + |\beta_{i} - 1|)|\beta_{i} - 1| + |\mu_{i} - \nu_{i}| \\
&\leq (|\nu_{i}| + K\epsilon^{2})\left(2 + 2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^{i} - 2)\epsilon^{2}\right)\left(2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^{i} - 2)\epsilon^{2}\right) + K\epsilon^{2} \\
&= |\nu_{i}|\sqrt{\frac{K}{\nu}}\,\mathcal{O}(\epsilon),
\end{align*}

where the first and third inequalities follow from the triangle inequality, the second holds with equality by factoring $\beta_i^2 - 1 = (\beta_i + 1)(\beta_i - 1)$, and the fourth is obtained by substituting the bounds on $|\beta_i - 1|$ and $|\nu_i - \mu_i|$ obtained above.

To bound $\|\mathbf{b}_i - \mathbf{b}_i'\|$, we consider $(\mathbf{b}_i^{\otimes 2})\mathbf{b}_i' - \mathbf{b}_i'$. On the one hand, we have

\begin{align*}
\|(\mathbf{b}_i^{\otimes 2})\mathbf{b}_i' - \mathbf{b}_i'\| &= \|(\mathbf{b}_i^{\otimes 2} - M_i)\mathbf{b}_i' + (M_i - I)\mathbf{b}_i'\| \leq \|M_i - \mathbf{b}_i^{\otimes 2}\|_2 + |\beta_i - 1| \\
&\leq \|M_i - \mathbf{b}_i^{\otimes 2}\| + |\beta_i - 1| \leq 2\left(2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^i - 2)\epsilon^2\right).
\end{align*}

On the other hand, $(\mathbf{b}_i^{\otimes 2})\mathbf{b}_i' - \mathbf{b}_i' = \mathbf{b}_i\langle \mathbf{b}_i, \mathbf{b}_i'\rangle - \mathbf{b}_i'$ and $\|(\mathbf{b}_i^{\otimes 2})\mathbf{b}_i' - \mathbf{b}_i'\|^2 = 1 - \langle \mathbf{b}_i, \mathbf{b}_i'\rangle^2$.
So, $|\langle \mathbf{b}_i, \mathbf{b}_i'\rangle| \geq \left(1 - \left(2\left(2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^i - 2)\epsilon^2\right)\right)^2\right)^{\frac{1}{2}} \geq 1 - 2\left(2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^i - 2)\epsilon^2\right)$. Hence,

\begin{align*}
\min\{\|\mathbf{b}_i - \mathbf{b}_i'\|, \|\mathbf{b}_i + \mathbf{b}_i'\|\} = \sqrt{2 - 2\langle \mathbf{b}_i, \mathbf{b}_i'\rangle} \leq 2\left(2\sqrt{\frac{K}{\nu}}\,\epsilon + (2^i - 2)\epsilon^2\right)^{\frac{1}{2}}.
\end{align*}

The last expression simplifies to $\left(\frac{K}{\nu}\right)^{\frac{1}{4}} \mathcal{O}(\epsilon^{\frac{1}{2}})$. ∎

3.3. Comparison of HTD with other hierarchical tensor decompositions

We compare HTD in Algorithm 1 to other hierarchical tensor decompositions. The goal of hierarchical tensor decomposition [Hac12, Chapter 11] is to efficiently represent a tensor that lives in a high-dimensional space. Given a tensor of order $d$, a hierarchical decomposition is based on a hierarchy of vector spaces given by a dimension partition tree on indices $\{1, \ldots, d\}$, such as those in Figure 1.

(a)
\begin{forest}
[$\{1,2,\ldots,d\}$ [$\{1\}$] [$\{2,\ldots,d\}$ [$\{2\}$] [$\vdots$ [$\{d-1\}$] [$\{d\}$] ] ] ]
\end{forest}

(b)
\begin{forest}
[$\{1,2,3,4\}$ [$\{1,2\}$ [$\{1\}$] [$\{2\}$] ] [$\{3,4\}$ [$\{3\}$] [$\{4\}$] ] ]
\end{forest}

Figure 1. The dimension partition trees used in (a) the PARATREE algorithm of [SRK09] and (b) our HTD from Algorithm 1.

Hierarchical tensor representations in [Hac12, Chapter 11] start at the leaves of the tree, which are labelled by single indices. One finds subspaces $U_i \subseteq \mathbb{R}^{n_i}$ such that the tensor is well-approximated by a tensor in the lower-dimensional space $U_1 \otimes \cdots \otimes U_d \subset \mathbb{R}^{n_1} \otimes \cdots \otimes \mathbb{R}^{n_d}$. Proceeding from leaves to root, when two indices $\{i\}$ and $\{j\}$ combine to form the subset $\{i, j\}$, the representation finds a subspace $U_{ij} \subset U_i \otimes U_j$ that well-approximates the tensor.
This repeats until we have a low-dimensional subspace $U_{1 \cdots d} \subseteq \mathbb{R}^{n_1} \otimes \cdots \otimes \mathbb{R}^{n_d}$ such that the tensor $T$ lies in this subspace to reasonable accuracy. Fixing ranks in the representation fixes the allowable dimension of the subspaces $U_I$ for the subsets $I \subseteq [d]$ in the tree. See [Hac12, Figure 11.1].

The PARATREE model starts at the root of the tree. For example, if the root is the splitting of $\{1,2,3\}$ into $\{1\} \cup \{2,3\}$ (i.e. Figure 1(a) in the case $d = 3$) then one computes a decomposition of the flattened tensor in $\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2 n_3}$ to give a sum $\sum_{i=1}^{r_1} \mathbf{u}_i \otimes \mathbf{x}_i$, with $\mathbf{u}_i \in \mathbb{R}^{n_1}$ and $\mathbf{x}_i \in \mathbb{R}^{n_2 n_3}$. The second step is the splitting of indices $\{2,3\} = \{2\} \cup \{3\}$.
This decomposes each vector $\mathbf{x}_i = \sum_{j=1}^{r_2} \mathbf{v}_{i,j} \otimes \mathbf{w}_{i,j}$, where $\mathbf{x}_i \in \mathbb{R}^{n_2 n_3}$ is viewed as a matrix of size $n_2 \times n_3$. This results in the decomposition

\begin{equation}
T = \sum_{i=1}^{r_1} \mathbf{u}_i \otimes \left( \sum_{j=1}^{r_2} \mathbf{v}_{i,j} \otimes \mathbf{w}_{i,j} \right). \tag{8}
\end{equation}

This pattern can be continued for larger $d$; see [SRK09, Equation 9].
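The two-step splitting behind (8) can be sketched numerically for $d = 3$. The sketch below is illustrative only: it assumes that each splitting is computed by a truncated SVD of the corresponding flattening, and the function names `paratree_3way` and `reconstruct` are hypothetical, not taken from [SRK09].

```python
import numpy as np

def paratree_3way(T, r1, r2):
    """Root-to-leaves decomposition of an n1 x n2 x n3 tensor as in (8).

    Assumption: each split is a truncated SVD of the flattening.
    Returns a list of (u_i, v_ij, w_ij) triples, r1 * r2 in total.
    """
    n1, n2, n3 = T.shape
    # Root split {1} | {2,3}: SVD of the n1 x (n2*n3) flattening.
    U, s, Vt = np.linalg.svd(T.reshape(n1, n2 * n3), full_matrices=False)
    terms = []
    for i in range(r1):
        u_i = U[:, i]
        # x_i (with the singular value absorbed) viewed as an n2 x n3 matrix.
        X_i = (s[i] * Vt[i]).reshape(n2, n3)
        # Second split {2} | {3}: SVD of X_i.
        P, t, Qt = np.linalg.svd(X_i, full_matrices=False)
        for j in range(r2):
            terms.append((u_i, t[j] * P[:, j], Qt[j]))
    return terms

def reconstruct(terms, shape):
    """Sum the rank-one terms u ⊗ v ⊗ w back into a tensor."""
    T = np.zeros(shape)
    for u, v, w in terms:
        T += np.einsum("a,b,c->abc", u, v, w)
    return T

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5))
# With r1 = 3 and r2 = 4 no singular values are discarded, so the sum is exact.
terms = paratree_3way(T, r1=3, r2=4)
err = np.linalg.norm(T - reconstruct(terms, T.shape))
```

The number of rank-one terms is $r_1 r_2$, which illustrates why the rank of the resulting decomposition grows multiplicatively with the depth of the tree.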

Our HTD takes as input a symmetric $p \times p \times p \times p$ tensor. We use the dimension partition tree in Figure 1(b). HTD can be viewed as a symmetric analogue of the PARATREE model, but differs in that it uses a different dimension partition tree, and leverages the symmetry of the tensor and decomposition to produce a rank $r$ decomposition, rather than the rank $r_1 r_2$ (or, more generally, rank $r_1 \cdots r_{d-1}$) decomposition obtained from (8). Compared to the hierarchical tensor representations of [Hac12, Chapter 11], it differs in that the tensor is symmetric and it uses the dimension partition tree from root to leaves rather than leaves to root.

4. Tensor decompositions for cICA

The cICA model assumes $\mathbf{y} = A\mathbf{z}$ and $\mathbf{x} = A\mathbf{z}' + B\mathbf{s}$, for $A \in \mathbb{R}^{p \times r}$ and $B \in \mathbb{R}^{p \times \ell}$, see (3). This leads to the cICA tensor decompositions (4). We present two variants of cICA to compute the decompositions (4). One does not assume a relationship between $\mathbf{z}$ and $\mathbf{z}'$; we call this general cICA, see Section 4.1. The other assumes the proportional relationship $\mathbf{z}' = \gamma \mathbf{z}$ for some scalar $\gamma$; we call this proportional cICA, see Section 4.2. We explain how to use cICA for dimensionality reduction in Section 4.3. This projects data onto a subspace given by certain columns of the foreground mixing matrix $B$.
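As a rough illustration of the inputs to these decompositions, the following sketch simulates the model $\mathbf{y} = A\mathbf{z}$, $\mathbf{x} = A\mathbf{z}' + B\mathbf{s}$ and forms empirical fourth-order cumulant tensors using the standard formula for centered variables, $\kappa_4(\mathbf{x})_{ijkl} = \mathbb{E}[x_i x_j x_k x_l] - \mathbb{E}[x_i x_j]\mathbb{E}[x_k x_l] - \mathbb{E}[x_i x_k]\mathbb{E}[x_j x_l] - \mathbb{E}[x_i x_l]\mathbb{E}[x_j x_k]$. The uniform source distributions, the sample size and the helper name `fourth_cumulant` are assumptions made for this example.

```python
import numpy as np

def fourth_cumulant(X):
    """Empirical fourth-order cumulant tensor of samples X (n_samples x p)."""
    Xc = X - X.mean(axis=0)          # center the data
    n = Xc.shape[0]
    m4 = np.einsum("ni,nj,nk,nl->ijkl", Xc, Xc, Xc, Xc) / n   # fourth moment
    C = Xc.T @ Xc / n                                          # covariance
    return (m4
            - np.einsum("ij,kl->ijkl", C, C)
            - np.einsum("ik,jl->ijkl", C, C)
            - np.einsum("il,jk->ijkl", C, C))

rng = np.random.default_rng(0)
p, r, ell, n = 4, 2, 2, 2000
A = rng.standard_normal((p, r))      # background mixing matrix
B = rng.standard_normal((p, ell))    # foreground mixing matrix

# Hypothetical independent non-Gaussian sources, chosen only for illustration.
z = rng.uniform(-1, 1, size=(n, r))
z_prime = rng.uniform(-1, 1, size=(n, r))
s = rng.uniform(-1, 1, size=(n, ell))

Y = z @ A.T                          # background samples, y = A z
X = z_prime @ A.T + s @ B.T          # foreground samples, x = A z' + B s

kappa_y = fourth_cumulant(Y)
kappa_x = fourth_cumulant(X)
```

Both outputs are symmetric $p \times p \times p \times p$ tensors; with finitely many samples they only approximate the population cumulants.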

4.1. General cICA

We present Algorithm 2 for general cICA. Steps 1 and 3 both decompose a symmetric order four tensor. We use the subspace power method [KP19] in Step 1 to prioritize the accuracy of the tensor decomposition. We use Algorithm 1 in Step 3 to prioritize interpretability and efficiency.

Algorithm 2 Recover $A$ and $B$ from the cumulants of the background and foreground
Input: $\kappa_4(\mathbf{x}), \kappa_4(\mathbf{y})$ and $r, \ell$ as in (4).
1: Recover $A$: Compute the symmetric tensor decomposition of $\kappa_4(\mathbf{y})$ via the subspace power method [KP19]. This recovers $A$ up to permutation and scaling of columns.
2: Subtract background from $\kappa_4(\mathbf{x})$: Learn the coefficients $\lambda_i'$ of $\mathbf{a}_1^{\otimes 4}, \ldots, \mathbf{a}_r^{\otimes 4}$ in $\kappa_4(\mathbf{x})$ using the deflation step of the subspace power method.
3: Recover $B$: Compute the symmetric tensor decomposition of $\sum_{i=1}^{\ell} \nu_i \mathbf{b}_i^{\otimes 4} = \kappa_4(\mathbf{x}) - \sum_{i=1}^{r} \lambda_i' \mathbf{a}_i^{\otimes 4}$, using Algorithm 1.
Output: Mixing matrices $A$ and $B$.
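The background subtraction that feeds the recovery of $B$ can be illustrated on exact synthetic tensors. The sketch below assumes the vectors $\mathbf{a}_i$ and coefficients $\lambda_i'$ are already known; it does not implement the subspace power method or its deflation step, and the helper name `sym_rank_one_sum` is hypothetical.

```python
import numpy as np

def sym_rank_one_sum(weights, vectors):
    """Return sum_i weights[i] * v_i^{⊗4}, where v_i are the columns of `vectors`."""
    return np.einsum("i,ai,bi,ci,di->abcd",
                     weights, vectors, vectors, vectors, vectors)

rng = np.random.default_rng(2)
p, r, ell = 4, 2, 3
A = rng.standard_normal((p, r))
B = rng.standard_normal((p, ell))
lam_prime = np.array([1.0, -2.0])     # illustrative background coefficients
nu = np.array([0.5, 1.5, -1.0])       # illustrative foreground coefficients

# Exact foreground cumulant with the structure of the cICA decomposition.
kappa_x = sym_rank_one_sum(lam_prime, A) + sym_rank_one_sum(nu, B)

# Subtract the background part, assuming a_i and lambda_i' are known.
residual = kappa_x - sym_rank_one_sum(lam_prime, A)

# The p^2 x p^2 flattening of the residual has rank ell for generic b_j.
flat_rank = np.linalg.matrix_rank(residual.reshape(p * p, p * p), tol=1e-8)
```

After the subtraction, only the $\ell$ foreground terms remain, which is the tensor passed to Algorithm 1.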

We study the identifiability of the algorithm, that is, the uniqueness of the vectors and scalars it outputs, assuming genericity. Our genericity assumption holds almost surely in the space of parameters.

We use the following lemma.

Lemma 4.1.

Let vectors $\mathbf{a}_i \in \mathbb{R}^p$ and scalars $\lambda_i \in \mathbb{R}$ be generic. Then the decomposition $T = \sum_{i=1}^{q} \lambda_i \mathbf{a}_i^{\otimes 4}$ of a symmetric $p \times p \times p \times p$ tensor $T$ is unique for

\[
q \leq \begin{cases}
\left\lceil \frac{1}{p}\binom{p+3}{4} - 1 \right\rceil & \text{for } p \notin \{3,4,5\}, \\
\left\lceil \frac{1}{p}\binom{p+3}{4} \right\rceil & \text{for } p \in \{3,5\}, \\
9 & \text{for } p = 4, \text{ provided } q \neq 8.
\end{cases}
\]
Proof.

The rank of a generic $p \times p \times p \times p$ symmetric tensor is $\lceil \frac{1}{p}\binom{p+3}{4} \rceil$ for $p \notin \{3,4,5\}$ and $\lceil \frac{1}{p}\binom{p+3}{4} \rceil + 1$ for $p \in \{3,4,5\}$, by the Alexander-Hirschowitz theorem [JA95]. Generic rank $q$ tensors in this space, with $q$ strictly below the generic rank, have a unique symmetric tensor decomposition for $(p, q) \neq (4, 8)$ and two tensor decompositions for $p = 4, q = 8$, by [COV17, Theorem 1.1]. ∎
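For concreteness, the bound in Lemma 4.1 can be tabulated for small $p$. The helper names below are hypothetical, and the code only evaluates the formula in the statement of the lemma, restricted here to $p \geq 2$.

```python
from math import comb

def max_unique_rank(p):
    """Largest q allowed by the bound in Lemma 4.1 (ignoring the q != 8 exclusion)."""
    if p < 2:
        raise ValueError("this sketch assumes p >= 2")
    if p == 4:
        return 9
    # Integer ceiling of C(p+3, 4) / p.
    ceil_ratio = -(-comb(p + 3, 4) // p)
    # ceil(x - 1) = ceil(x) - 1, which covers the case p not in {3, 4, 5}.
    return ceil_ratio if p in (3, 5) else ceil_ratio - 1

def is_generically_unique(p, q):
    """True if (p, q) satisfies the hypotheses of Lemma 4.1."""
    if q < 1:
        return False
    if p == 4 and q == 8:
        return False
    return q <= max_unique_rank(p)

bounds = {p: max_unique_rank(p) for p in range(2, 8)}
```

For example, $p = 6$ gives $\binom{9}{4}/6 = 21$, so the lemma applies for $q \leq 20$.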

Proposition 4.2 (Identifiability of the cICA tensor decomposition).

The joint decomposition

\begin{equation}
\kappa_4(\mathbf{y}) = \sum_{i=1}^{r} \lambda_i \mathbf{a}_i^{\otimes 4}, \qquad\quad \kappa_4(\mathbf{x}) = \sum_{i=1}^{r} \lambda_i' \mathbf{a}_i^{\otimes 4} + \sum_{j=1}^{\ell} \nu_j \mathbf{b}_j^{\otimes 4}, \tag{9}
\end{equation}

is unique for generic $\mathbf{a}_i, \mathbf{b}_j, \lambda_i, \lambda_i', \nu_j$, where $i \in [r]$ and $j \in [\ell]$, when $r + \ell < \lceil \frac{1}{p}\binom{p+3}{4} \rceil$ for $p \neq 3, 4, 5$, $r + \ell \leq \lceil \frac{1}{p}\binom{p+3}{4} \rceil$ for $p = 3, 5$, and when $r + \ell \leq 9$, $r + \ell \neq 8$ for $p = 4$.

Proof.

The cICA tensor decomposition in the statement is identifiable when the symmetric tensor decomposition of $\kappa_4(\mathbf{x})$ is unique, as follows. The tensor decomposition of $\kappa_4(\mathbf{x})$ gives vectors $\mathbf{a}_i, \mathbf{b}_j$ up to permutation and scaling. Then we can solve a linear system to find the decomposition $\kappa_4(\mathbf{y}) = \sum_{i=1}^{r} \lambda_i \mathbf{a}_i^{\otimes 4}$. It therefore remains to study the identifiability of the decomposition of $\kappa_4(\mathbf{x})$. It is a symmetric $p \times p \times p \times p$ tensor of rank $r + \ell$. Hence the uniqueness follows from Lemma 4.1, setting $q = r + \ell$. ∎
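The linear system mentioned in this proof can be sketched as a least-squares problem whose columns are the vectorized rank-one tensors $\mathbf{a}_i^{\otimes 4}$. This is a minimal illustration under the assumption that the vectors $\mathbf{a}_i$ are known exactly; the function names are hypothetical.

```python
import numpy as np

def rank_one_4(a):
    """The symmetric rank-one tensor a ⊗ a ⊗ a ⊗ a."""
    return np.einsum("i,j,k,l->ijkl", a, a, a, a)

def solve_coefficients(T, vectors):
    """Least-squares coefficients lambda_i with T ≈ sum_i lambda_i * a_i^{⊗4}."""
    # Each column of the design matrix is one vectorized rank-one tensor.
    design = np.stack([rank_one_4(a).ravel() for a in vectors], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, T.ravel(), rcond=None)
    return coeffs

rng = np.random.default_rng(1)
p, r = 4, 3
A = rng.standard_normal((p, r))
lam = np.array([1.5, -0.7, 2.0])      # illustrative coefficients

# Exact background cumulant built from the known columns of A.
kappa_y = sum(l * rank_one_4(A[:, i]) for i, l in enumerate(lam))

# Recover the coefficients from the tensor and the vectors.
lam_hat = solve_coefficients(kappa_y, A.T)
```

The system has a unique solution when the tensors $\mathbf{a}_i^{\otimes 4}$ are linearly independent, which holds for generic $\mathbf{a}_i$ in the range of ranks considered here.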

We say that Algorithm 2 is identifiable if, for generic $\mathbf{a}_i, \mathbf{b}_j, \lambda_i, \lambda_i', \nu_j$ where $i \in [r]$, $j \in [\ell]$, we can uniquely recover the vectors $\mathbf{a}_1, \ldots, \mathbf{a}_r$, the coefficients $\lambda_1', \ldots, \lambda_r'$, and the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_\ell$.

Proposition 4.3.

Algorithm 2 is identifiable when $r + \ell \leq \binom{p+1}{2}$ for $p \neq 4$ and $r + \ell \leq 9$, $r, \ell \neq 8$ for $p = 4$.

To prove Proposition 4.3 and, later, Theorem 4.5, we use the following linear algebra result. See [KP19, Lemma B.1] for a proof.

Lemma 4.4.

Let $M \in \mathbb{R}^{n \times n}$, $U \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{n \times k}$ be full rank matrices with $k \leq n$. Let $C^{\ast} = (V^{\mathsf{T}} M^{-1} U)^{\dagger}$, where $\dagger$ denotes the pseudo-inverse, and $d = \operatorname{rank}(C^{\ast})$. Then

\[ \operatorname{rank}(M - U C V^{\mathsf{T}}) \geq n - d, \]

with equality if and only if $C = C^{\ast}$.
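As a numerical sanity check of Lemma 4.4, the following NumPy sketch draws generic matrices and compares the rank of $M - UCV^{\mathsf{T}}$ at $C = C^{\ast}$ and at a perturbed $C$. The sizes, the random seed, the diagonal shift that keeps $M$ well conditioned, and the rank tolerance are all illustrative assumptions, not part of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2  # illustrative sizes with k <= n

# Generic full-rank matrices; the shift by n*I only keeps M well conditioned.
M = rng.standard_normal((n, n)) + n * np.eye(n)
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

# C* = (V^T M^{-1} U)^dagger and d = rank(C*).
C_star = np.linalg.pinv(V.T @ np.linalg.solve(M, U))
d = np.linalg.matrix_rank(C_star)

# Rank at C = C*: the lemma predicts n - d.
rank_at_C_star = np.linalg.matrix_rank(M - U @ C_star @ V.T, tol=1e-8)

# Rank at a perturbed C: the lemma predicts a strictly larger rank.
C_other = C_star + 0.1 * rng.standard_normal((k, k))
rank_at_other = np.linalg.matrix_rank(M - U @ C_other @ V.T, tol=1e-8)

print(d, rank_at_C_star, rank_at_other)
```

The explicit tolerance is a design choice: the vanishing singular values at $C = C^{\ast}$ are only zero up to floating-point error.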

Proof of Proposition 4.3.

Tensors $\sum_{i=1}^{r} \lambda_i \mathbf{a}_i^{\otimes 4}$ and $\sum_{j=1}^{\ell} \nu_j \mathbf{b}_j^{\otimes 4}$ are generic rank $r$ and rank $\ell$ tensors, respectively. So, the identifiability of Steps 1 and 3 of Algorithm 2 holds if $r, \ell < \lceil \frac{1}{p} \binom{p+3}{4} \rceil$ for $p \notin \{3, 4, 5\}$, or $r, \ell \leq \lceil \frac{1}{p} \binom{p+3}{4} \rceil$ for $p \in \{3, 5\}$, or $r, \ell \leq 9$, $r, \ell \neq 8$ for $p = 4$, setting $q = r$ and $q = \ell$ in Lemma 4.1.

It remains to consider Step 2, learning the coefficients $\lambda_i'$ of $\mathbf{a}_i^{\otimes 4}$ in $\kappa_4(\mathbf{x})$. The flattening of $\kappa_4(\mathbf{x})$ has the form $M = \sum_{i=1}^{r} \lambda_i' \mathbf{A}_i^{\otimes 2} + \sum_{j=1}^{\ell} \nu_j \mathbf{B}_j^{\otimes 2} \in \mathbb{R}^{p^2 \times p^2}$, where $\mathbf{A}_i, \mathbf{B}_j \in \mathbb{R}^{p^2}$ vectorize $\mathbf{a}_i^{\otimes 2}$ and $\mathbf{b}_j^{\otimes 2}$, respectively. The scalar $\lambda_i'$ is unique if $\operatorname{rank}(M - \lambda_i' \mathbf{A}_i \otimes \mathbf{A}_i) = \operatorname{rank}(M) - 1$, by Lemma 4.4. It is $((\mathbf{A}_i^{\mathsf{T}} V) D^{-1} (\mathbf{A}_i^{\mathsf{T}} V)^{\mathsf{T}})^{-1}$, where $V D V^{\mathsf{T}}$ is the thin eigendecomposition of $M$. In particular, the coefficient $\lambda_i'$ is unique when

\[ \mathbf{a}_i^{\otimes 2} \notin \operatorname{Span}(\{\mathbf{a}_1^{\otimes 2}, \ldots, \mathbf{a}_{i-1}^{\otimes 2}, \mathbf{a}_{i+1}^{\otimes 2}, \ldots, \mathbf{a}_r^{\otimes 2}, \mathbf{b}_1^{\otimes 2}, \ldots, \mathbf{b}_\ell^{\otimes 2}\}). \]

For generic $\mathbf{a}_i$ and $\mathbf{b}_j$, this holds provided $r + \ell$ is at most $\binom{p+1}{2}$, the dimension of the space of $p \times p$ symmetric matrices. The inequalities $\binom{p+1}{2} \leq \lceil \frac{1}{p} \binom{p+3}{4} \rceil$ for $p \notin \{3, 4, 5\}$ and $\binom{p+1}{2} \leq \lceil \frac{1}{p} \binom{p+3}{4} \rceil + 1$ for $p \in \{3, 4, 5\}$ hold. Combining the above conditions, Algorithm 2 is identifiable when $r + \ell \leq \binom{p+1}{2}$ for $p \neq 4$ and $r + \ell \leq 9$, $r, \ell \neq 8$ for $p = 4$. ∎
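The closed form for $\lambda_i'$ in the proof can be checked numerically. The sketch below builds a synthetic flattening $M$ from hypothetical patterns and coefficients (all names, sizes, and the seed are assumptions made for illustration) and recovers each $\lambda_i'$ from $\mathbf{A}_i$ and the thin eigendecomposition of $M$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, ell = 4, 3, 2  # r + ell = 5 <= binom(p + 1, 2) = 10

# Hypothetical ground truth: patterns a_i, b_j and their coefficients.
a = rng.standard_normal((p, r))
b = rng.standard_normal((p, ell))
lam_prime = rng.uniform(0.5, 2.0, r) * rng.choice([-1.0, 1.0], r)
nu = rng.uniform(0.5, 2.0, ell)

# Columns A_i = vec(a_i a_i^T) and B_j = vec(b_j b_j^T) in R^{p^2}.
A = np.stack([np.outer(a[:, i], a[:, i]).ravel() for i in range(r)], axis=1)
B = np.stack([np.outer(b[:, j], b[:, j]).ravel() for j in range(ell)], axis=1)

# Flattening of kappa_4(x): M = sum lam'_i A_i A_i^T + sum nu_j B_j B_j^T.
M = (A * lam_prime) @ A.T + (B * nu) @ B.T

# Thin eigendecomposition: keep the r + ell eigenvalues of largest magnitude.
w, Q = np.linalg.eigh(M)
keep = np.argsort(np.abs(w))[-(r + ell):]
V, D = Q[:, keep], w[keep]

# lam'_i = ((A_i^T V) D^{-1} (A_i^T V)^T)^{-1}, evaluated for every i at once.
proj = A.T @ V  # row i is A_i^T V
lam_rec = 1.0 / np.sum(proj**2 / D, axis=1)

print(np.allclose(lam_rec, lam_prime))
```

Here the rank $r + \ell$ is assumed known; with sample cumulants one would instead need a threshold on the eigenvalues.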

In some settings, we assume that the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_\ell$ are orthogonal. In particular, $\ell \leq p$. This assumption is natural for visualization purposes, since the projection onto foreground patterns is orthogonal. In this case, HTD gives an exact decomposition, by Proposition 3.3. The identifiability requirements are the same as in Propositions 4.2 and 4.3, as follows. The identifiability conditions in the two propositions are unchanged under a change of basis by an invertible $p \times p$ matrix. When $\ell \leq p$, we can apply a change of basis to $\kappa_4(\mathbf{x})$ so that the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_\ell$ become orthogonal. We apply the same change of basis to $\kappa_4(\mathbf{y})$.

4.2. Proportional cICA

In this subsection, we assume a proportional relationship $\mathbf{z}' = \gamma \mathbf{z}$ for some scalar $\gamma > 0$. This assumption also appears in cPCA [AZBZ17]. There, the choice of the hyperparameter $\gamma$ is not unique. However, in our setting, which involves the fourth-order cumulants $\kappa_4(\mathbf{y})$ and $\kappa_4(\mathbf{x})$ under the assumption that $r + \ell \leq \binom{p+1}{2}$, the value of $\gamma$ is uniquely determined, with a closed-form expression; see Theorem 4.5. The details of the ensuing algorithm for computing the matrix $B$ are as follows.

Algorithm 3 Recover $B$ from the background and foreground cumulants when $\mathbf{z}' = \gamma \mathbf{z}$
1: Input: $\kappa_4(\mathbf{x}), \kappa_4(\mathbf{y})$ and $\ell$ as in (4).
2: Compute $\gamma$ using Theorem 4.5.
3: Recover $B$: Compute rank $\ell$ symmetric decomposition of $\kappa_4(\mathbf{x}) - \gamma^4 \kappa_4(\mathbf{y})$, using Algorithm 1.
4: Output: Mixing matrix $B$.

Theorem 4.5.

Consider proportional cICA with $\mathbf{z}' = \gamma \mathbf{z}$, for $\gamma > 0$. For generic $\mathbf{a}_1, \ldots, \mathbf{a}_r$ and $\mathbf{b}_1, \ldots, \mathbf{b}_\ell$ with $r + \ell \leq \binom{p+1}{2}$ and $r \neq 8$, the hyperparameter $\gamma$ is the unique value $(\frac{1}{\lambda_i} (\mathbf{A}_i^{\mathsf{T}} V D^{-1} V^{\mathsf{T}} \mathbf{A}_i)^{-1})^{\frac{1}{4}}$, where $i$ is any index between $1$ and $r$, $\mathbf{A}_i \in \mathbb{R}^{p^2}$ vectorizes $\mathbf{a}_i^{\otimes 2}$, $\lambda_i$ is the coefficient of $\mathbf{a}_i^{\otimes 4}$ in $\kappa_4(\mathbf{y})$, and $V D V^{\mathsf{T}}$ is the thin eigendecomposition of $\operatorname{Mat}(\kappa_4(\mathbf{x}))$.

Proof.

The flattenings of the cumulants $\kappa_4(\mathbf{y})$ and $\kappa_4(\mathbf{x})$ are, respectively,

\[ M_{\mathbf{y}} := \sum_{i=1}^{r} \lambda_i \mathbf{A}_i^{\otimes 2}, \qquad M_{\mathbf{x}} := \gamma^4 \left( \sum_{i=1}^{r} \lambda_i \mathbf{A}_i^{\otimes 2} \right) + \sum_{j=1}^{\ell} \nu_j \mathbf{B}_j^{\otimes 2}, \]

where $\mathbf{A}_i, \mathbf{B}_j \in \mathbb{R}^{p^2}$ vectorize the matrices $\mathbf{a}_i^{\otimes 2}$ and $\mathbf{b}_j^{\otimes 2}$, respectively, and we use that $\lambda_i' = \gamma^4 \lambda_i$. We have $\operatorname{rank} M_{\mathbf{y}} = r$ and $\operatorname{rank} M_{\mathbf{x}} = r + \ell$, by the assumptions in the statement.

Let $A \in \mathbb{R}^{p^2 \times r}$ be the matrix with columns $\mathbf{A}_1, \ldots, \mathbf{A}_r$ and define $D' = \gamma^4 \text{Diag}(\lambda_1, \ldots, \lambda_r)$. We have $\operatorname{rank}(M_{\mathbf{x}} - A D' A^{\mathsf{T}}) = \operatorname{rank}(\sum_{j=1}^{\ell} \nu_j \mathbf{B}_j^{\otimes 2}) = \ell$. Suppose that $V D V^{\mathsf{T}}$ is the thin eigendecomposition of $M_{\mathbf{x}}$. We have

\[ V^{\mathsf{T}} (M_{\mathbf{x}} - A D' A^{\mathsf{T}}) V = D - (V^{\mathsf{T}} A) D' (V^{\mathsf{T}} A)^{\mathsf{T}}. \]

We have that $\operatorname{rank} D = r + \ell$, the upper bound $\operatorname{rank} (V^{\mathsf{T}} A) D' (V^{\mathsf{T}} A)^{\mathsf{T}} = \operatorname{rank} V^{\mathsf{T}} M_{\mathbf{y}} V \leq r$, and finally that $\operatorname{rank}(D - (V^{\mathsf{T}} A) D' (V^{\mathsf{T}} A)^{\mathsf{T}}) = \operatorname{rank}(V^{\mathsf{T}} (M_{\mathbf{x}} - A D' A^{\mathsf{T}}) V) \leq \ell$. Hence

\[ D' = (A^{\mathsf{T}} V D^{-1} V^{\mathsf{T}} A)^{-1}, \]

by Lemma 4.4. The matrices $A, \text{Diag}(\lambda_1, \ldots, \lambda_r), V, D$ can be recovered uniquely from the tensor decomposition of $\kappa_4(\mathbf{y})$ and the eigendecomposition of $M_{\mathbf{x}}$. So $D'$ can also be recovered uniquely, and hence $\gamma$ is unique: comparing diagonal entries gives $\gamma^4 \lambda_i = (\mathbf{A}_i^{\mathsf{T}} V D^{-1} V^{\mathsf{T}} \mathbf{A}_i)^{-1}$ for any $i \in [r]$. ∎

One can test the proportionality assumption by checking whether the values $\left( \frac{1}{\lambda_i} (\mathbf{A}_i^{\top} V D^{-1} V^{\top} \mathbf{A}_i)^{-1} \right)^{\frac{1}{4}}$ from Theorem 4.5, where $\mathbf{A}_i$ vectorizes $\mathbf{a}_i^{\otimes 2}$, are approximately equal as $i$ varies. In practice, exact proportionality may not hold, and learning $\gamma$ via Theorem 4.5 could be challenging. An alternative is to use a sweep of $\gamma$ values and choose $\gamma$ according to visualization plots, a similar method to that used in cPCA [AZBZ17].
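The proportionality test can be sketched as follows: compute one estimate of $\gamma$ per index $i$ and compare them. The function names `vec_squares` and `gamma_estimates`, the synthetic data, and the assumption that the rank $r + \ell$ of the flattening is known are all illustrative; with exact synthetic cumulants the estimates coincide, whereas with sample cumulants one would only expect approximate agreement.

```python
import numpy as np

def vec_squares(W):
    """Return the matrix whose columns are vec(w w^T) for each column w of W."""
    return np.stack([np.outer(w, w).ravel() for w in W.T], axis=1)

def gamma_estimates(a, lam, M_x, rank):
    """One estimate of gamma per background pattern a_i.

    a    : p x r matrix of background patterns (from decomposing kappa_4(y))
    lam  : coefficients of a_i^{(x)4} in kappa_4(y)
    M_x  : p^2 x p^2 flattening of kappa_4(x)
    rank : assumed rank r + ell of M_x
    """
    w, Q = np.linalg.eigh(M_x)
    keep = np.argsort(np.abs(w))[-rank:]      # thin eigendecomposition
    V, D = Q[:, keep], w[keep]
    proj = vec_squares(a).T @ V               # row i is A_i^T V
    quad = np.sum(proj**2 / D, axis=1)        # A_i^T V D^{-1} V^T A_i
    return (1.0 / (lam * quad)) ** 0.25

# Synthetic example with exact proportionality z' = gamma * z.
rng = np.random.default_rng(1)
p, r, ell, gamma = 4, 3, 2, 1.7
a = rng.standard_normal((p, r))
b = rng.standard_normal((p, ell))
lam = rng.uniform(0.5, 2.0, r)
nu = rng.uniform(0.5, 2.0, ell)
A, B = vec_squares(a), vec_squares(b)
M_x = gamma**4 * (A * lam) @ A.T + (B * nu) @ B.T

est = gamma_estimates(a, lam, M_x, r + ell)
spread = est.max() - est.min()                # small spread supports proportionality
print(est, spread)
```

A large spread among the entries of `est` would suggest that the proportional model is a poor fit, in which case the sweep over $\gamma$ values is the more practical route.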

4.3. cICA for dimensionality reduction

Usual ICA has been used as a tool to project data, see [Dom18, GW20, LM08]. We extend this to cICA. In practice, the input to cICA consists of samples from the foreground $\mathbf{x}$ and background $\mathbf{y}$. These samples comprise the foreground data $X \in \mathbb{R}^{n \times p}$ and the background data $Y \in \mathbb{R}^{m \times p}$, where $n$ and $m$ are the numbers of samples in the foreground and background datasets, respectively. We then construct the sample cumulants $\kappa_4(\mathbf{x})$ and $\kappa_4(\mathbf{y})$ as follows.

A dataset of $n$ samples in $\mathbb{R}^p$ gives a data matrix $X \in \mathbb{R}^{n \times p}$. Its fourth cumulant is computed as follows. Let $\bar{X} \in \mathbb{R}^p$ denote the mean vector over all observations. The $p \times p$ sample covariance matrix $\Sigma$ for $X$ has entries $\sigma_{ij} = \frac{1}{n} \sum_{t=1}^{n} (X_{ti} - \bar{X}_i)(X_{tj} - \bar{X}_j)$. The fourth-order central sample moment is a $p \times p \times p \times p$ tensor with entries $M_{ijkl} = \frac{1}{n} \sum_{t=1}^{n} (X_{ti} - \bar{X}_i)(X_{tj} - \bar{X}_j)(X_{tk} - \bar{X}_k)(X_{tl} - \bar{X}_l)$. Entry $(i, j, k, l)$ of the fourth-order sample cumulant is $M_{ijkl} - \sigma_{ij} \sigma_{kl} - \sigma_{ik} \sigma_{jl} - \sigma_{il} \sigma_{jk}$. If the data $X$ are samples from a distribution $\mathbf{x}$, this sample cumulant approximates $\kappa_4(\mathbf{x})$. The computation for $\kappa_4(\mathbf{y})$ is similar.
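The sample cumulant formula translates directly into array operations. The following NumPy sketch is one possible implementation; the function name `fourth_cumulant` and the random test data are assumptions for illustration, and the dense $p \times p \times p \times p$ tensor is only practical for small $p$.

```python
import numpy as np

def fourth_cumulant(X):
    """Fourth-order sample cumulant of an n x p data matrix X (rows are samples)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                   # subtract the mean vector
    S = Xc.T @ Xc / n                         # sample covariance, entries sigma_ij
    # Fourth-order central sample moment M_ijkl.
    M4 = np.einsum("ti,tj,tk,tl->ijkl", Xc, Xc, Xc, Xc, optimize=True) / n
    # Subtract the three pairings of covariance entries.
    return (M4
            - np.einsum("ij,kl->ijkl", S, S)
            - np.einsum("ik,jl->ijkl", S, S)
            - np.einsum("il,jk->ijkl", S, S))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))              # illustrative data only
K = fourth_cumulant(X)
print(K.shape)
```

The resulting tensor is symmetric under any permutation of its four indices, which gives a quick check of an implementation.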

When p𝑝pitalic_p is large, forming the fourth cumulants may be prohibitively expensive. To get around this, one can reduce the dimension before forming the cumulants, as follows.

We combine the foreground and background datasets together to form a single dataset, a matrix of size $(m + n) \times p$. Let $U \in \mathbb{R}^{p \times k}$ have as its columns the top $k$ principal components of this combined data. The background and foreground transformed variables are then

\[ U^{\mathsf{T}} A \mathbf{z} \qquad \text{and} \qquad U^{\mathsf{T}} A \mathbf{z}' + U^{\mathsf{T}} B \mathbf{s}, \tag{10} \]

respectively, where $U^{\mathsf{T}} A \in \mathbb{R}^{k \times r}$ and $U^{\mathsf{T}} B \in \mathbb{R}^{k \times \ell}$. The recovered foreground patterns from cICA are the columns of $U^{\mathsf{T}} B$. The columns of $U U^{\mathsf{T}} B \in \mathbb{R}^{p \times \ell}$ convert these projected foreground patterns back into the original space.

In practice, for our data visualization in Section 6.2, we choose the number $k$ of PCA components to be 30 or the number of components that explains at least $90\%$ of the variance, whichever comes first.
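The PCA preprocessing step can be sketched as below. The function name `pca_basis`, its parameter names, the centering of the pooled data, and the synthetic data are assumptions made for illustration; the rule for $k$ follows the description above, taking the smaller of 30 and the number of components reaching the variance threshold.

```python
import numpy as np

def pca_basis(X, Y, max_components=30, var_threshold=0.90):
    """Top-k principal directions (p x k) of the pooled foreground/background data."""
    Z = np.vstack([X, Y])                     # combined (n + m) x p dataset
    Zc = Z - Z.mean(axis=0)
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k_var = int(np.searchsorted(explained, var_threshold)) + 1
    k = min(max_components, k_var, Vt.shape[0])
    return Vt[:k].T

# Illustrative data: 6 features driven by 2 latent directions plus small noise.
rng = np.random.default_rng(0)
Z = rng.standard_normal((35, 2)) @ rng.standard_normal((2, 6))
Z += 0.01 * rng.standard_normal(Z.shape)
X, Y = Z[:20], Z[20:]

U = pca_basis(X, Y)
X_red, Y_red = X @ U, Y @ U                   # rows are U^T x and U^T y
print(U.shape, X_red.shape, Y_red.shape)
```

The reduced matrices `X_red` and `Y_red` are then the inputs for forming the $k \times k \times k \times k$ sample cumulants.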

We compute the mixing matrix Bp×𝐵superscript𝑝B\in\mathbb{R}^{p\times\ell}italic_B ∈ blackboard_R start_POSTSUPERSCRIPT italic_p × roman_ℓ end_POSTSUPERSCRIPT with columns 𝐛1,,𝐛subscript𝐛1subscript𝐛\mathbf{b}_{1},\ldots,\mathbf{b}_{\ell}bold_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_b start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT using Algorithm 2 or 3. When employing cICA for dimensionality reduction, we project the foreground data X𝑋Xitalic_X onto XB𝑋𝐵XBitalic_X italic_B. For a two-dimensional plot, we plot the projections (X𝐛i,X𝐛j)𝑋subscript𝐛𝑖𝑋subscript𝐛𝑗(X\mathbf{b}_{i},X\mathbf{b}_{j})( italic_X bold_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X bold_b start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) for a pair i,j𝑖𝑗i,jitalic_i , italic_j. To select the most relevant vectors out of our \ellroman_ℓ recovered vectors 𝐛isubscript𝐛𝑖superscript\mathbf{b}_{i}\in\mathbb{R}^{\ell}bold_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT, we order them by the ratio

$$k(\mathbf{b}) := \frac{\mathbf{b}^{\top}\kappa_2(\mathbf{x})\mathbf{b}}{\mathbf{b}^{\top}\kappa_2(\mathbf{y})\mathbf{b}}. \tag{11}$$

We justify this ranking in Section 5.2. We interpret the axes of a cICA dimensionality reduction plot in Section 5.3.

5. Practicalities and interpretation of cICA

In this section, we discuss the practicalities of cICA: preprocessing the input to speed up the algorithm and how to choose the ranks $r$ and $\ell$. We also discuss how to interpret coordinates when viewing cICA as a dimensionality reduction method.

5.1. Preprocessing with PCA

A dataset of $n$ samples in $\mathbb{R}^{p}$ gives a data matrix $X\in\mathbb{R}^{n\times p}$. Its fourth cumulant is computed as follows. Let $\bar{X}\in\mathbb{R}^{p}$ denote the mean vector over all observations. The $p\times p$ sample covariance matrix $\Sigma$ for $X$ has entries
$$\sigma_{ij}=\frac{1}{n}\sum_{t=1}^{n}(X_{ti}-\bar{X}_{i})(X_{tj}-\bar{X}_{j}).$$
The fourth-order central sample moment is a $p\times p\times p\times p$ tensor with entries
$$M_{ijkl}=\frac{1}{n}\sum_{t=1}^{n}(X_{ti}-\bar{X}_{i})(X_{tj}-\bar{X}_{j})(X_{tk}-\bar{X}_{k})(X_{tl}-\bar{X}_{l}).$$
Entry $(i,j,k,l)$ of the fourth-order sample cumulant is
$$M_{ijkl}-\sigma_{ij}\sigma_{kl}-\sigma_{ik}\sigma_{jl}-\sigma_{il}\sigma_{jk}.$$
If the data $X$ are samples from a distribution $\mathbf{x}$, this sample cumulant approximates $\kappa_4(\mathbf{x})$. The computation for $\kappa_4(\mathbf{y})$ is similar.
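The formulas above translate almost directly into array code. The following is a minimal NumPy sketch, not the authors' implementation; the function name `sample_fourth_cumulant` and the Gaussian toy data are assumptions made for illustration.

```python
import numpy as np

def sample_fourth_cumulant(X):
    """Fourth-order sample cumulant of an (n, p) data matrix, as a (p, p, p, p) array."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                # subtract the mean vector
    sigma = Xc.T @ Xc / n                  # sample covariance, 1/n normalisation
    # fourth-order central sample moment M_{ijkl}
    M = np.einsum("ti,tj,tk,tl->ijkl", Xc, Xc, Xc, Xc) / n
    # subtract the three pairings of covariance entries
    return (M
            - np.einsum("ij,kl->ijkl", sigma, sigma)
            - np.einsum("ik,jl->ijkl", sigma, sigma)
            - np.einsum("il,jk->ijkl", sigma, sigma))

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))         # toy data: n = 2000 samples, p = 3
K4 = sample_fourth_cumulant(X)
```

The result has $p^4$ entries, which is why the dimension reduction described next matters for large $p$.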

When $p$ is large, forming the fourth cumulants may be prohibitively expensive, since each has $p^4$ entries. To get around this, one can reduce the dimension before forming the cumulants, as follows. We combine the foreground and background datasets together to form a single dataset, a matrix of size $(m+n)\times p$. Let $U\in\mathbb{R}^{p\times k}$ have as its columns the top $k$ principal components of this combined data. The background and foreground transformed variables then have the form

$$U^{\mathsf{T}}A\mathbf{z}\qquad\text{and}\qquad U^{\mathsf{T}}A\mathbf{z}'+U^{\mathsf{T}}B\mathbf{s}, \tag{12}$$

respectively, where $U^{\mathsf{T}}A\in\mathbb{R}^{k\times r}$ and $U^{\mathsf{T}}B\in\mathbb{R}^{k\times\ell}$. The recovered foreground patterns from cICA are the columns of $U^{\mathsf{T}}B$. The columns of $UU^{\mathsf{T}}B\in\mathbb{R}^{p\times\ell}$ convert these projected foreground patterns back into the original space.

In practice, for our data visualization in Section 6.2, we choose the number $k$ of PCA components to be 30 or the number of components needed to explain at least $90\%$ of the variance, whichever is smaller.
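As an illustration of this preprocessing step, here is a hedged NumPy sketch. The function `pca_basis`, its parameter names, and the random toy data are assumptions, and the rule for $k$ encodes our reading of the text (the smaller of 30 and the count needed for 90% explained variance).

```python
import numpy as np

def pca_basis(X, Y, max_components=30, variance_target=0.90):
    """Top-k principal directions of the stacked foreground/background data."""
    Z = np.vstack([X, Y])                  # combined (m + n) x p dataset
    Z = Z - Z.mean(axis=0)
    # rows of Vt are principal directions; squared singular values are
    # proportional to the variance explained by each direction
    _, svals, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = np.cumsum(svals**2) / np.sum(svals**2)
    k_var = int(np.searchsorted(explained, variance_target) + 1)
    k = min(max_components, k_var, Vt.shape[0])
    return Vt[:k].T                        # U, shape (p, k)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))         # hypothetical foreground, n x p
Y = rng.standard_normal((150, 50))         # hypothetical background, m x p
U = pca_basis(X, Y)
X_red, Y_red = X @ U, Y @ U                # reduced datasets with k columns
```

Cumulants are then formed from `X_red` and `Y_red`. If `B_red` denotes the $k\times\ell$ matrix of patterns recovered in the reduced space, `U @ B_red` maps them back to $\mathbb{R}^{p}$, matching $UU^{\mathsf{T}}B$ above.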

5.2. Choosing the ranks

When computing the tensor decompositions in cICA, a key step is to determine the ranks $r$ and $\ell$. To choose the ranks, we can use the flattenings of the cumulants, the matrices $\operatorname{Mat}(\kappa_4(\mathbf{x})),\operatorname{Mat}(\kappa_4(\mathbf{y}))\in\mathbb{R}^{p^{2}\times p^{2}}$. If the expressions for the cumulant tensors $\kappa_4(\mathbf{x})$ and $\kappa_4(\mathbf{y})$ in (4) hold exactly, and if $r+\ell\leq\binom{p+1}{2}$ and the vectors $\mathbf{a}_i,\mathbf{b}_j$ are generic, then

$$r=\operatorname{rank}(\operatorname{Mat}(\kappa_4(\mathbf{y})))\quad\text{and}\quad r+\ell=\operatorname{rank}(\operatorname{Mat}(\kappa_4(\mathbf{x}))).$$

For non-exact cumulants, such as sample cumulants, we do not work with the exact ranks of the flattening matrices, but instead examine plots of the eigenvalues in descending magnitude (see e.g. Figure 7) to choose an appropriate cut-off. We choose $r$ at the point where the decrease of the eigenvalue plot of $\mathrm{Mat}(\kappa_4(\mathbf{y}))$ slows down, choose $q$ at the point where the decrease of the eigenvalue plot of $\mathrm{Mat}(\kappa_4(\mathbf{x}))$ slows down, and set $\ell=q-r$. General cICA has hyperparameters $r$ and $\ell$; proportional cICA has one hyperparameter $\ell$.
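The rank identities can be checked numerically on exact tensors. The sketch below is an illustration under stated assumptions: the helper names `rank4` and `flattening_eigenvalues`, the random weights, and the tolerance `tol` are ours. The tensors are built as sums of fourth powers of the columns of $A$ and $B$, which is our reading of the form used in this section.

```python
import numpy as np

def flattening_eigenvalues(K4):
    """Eigenvalue magnitudes of the p^2 x p^2 flattening, in descending order."""
    p = K4.shape[0]
    mat = K4.reshape(p * p, p * p)
    mat = (mat + mat.T) / 2                # symmetrise against round-off
    return np.sort(np.abs(np.linalg.eigvalsh(mat)))[::-1]

def rank4(vectors, weights):
    """Weighted sum of fourth tensor powers of the columns of `vectors`."""
    return np.einsum("r,ir,jr,kr,lr->ijkl", weights, vectors, vectors, vectors, vectors)

rng = np.random.default_rng(1)
p, r, ell = 5, 3, 2                        # r + ell = 5 <= binom(6, 2) = 15
A = rng.standard_normal((p, r))
B = rng.standard_normal((p, ell))
K4_y = rank4(A, rng.uniform(1, 2, r))                                     # background
K4_x = rank4(A, rng.uniform(1, 2, r)) + rank4(B, rng.uniform(1, 2, ell))  # foreground

tol = 1e-8                                 # cut-off for "numerically zero"
r_hat = int(np.sum(flattening_eigenvalues(K4_y) > tol))
q_hat = int(np.sum(flattening_eigenvalues(K4_x) > tol))
ell_hat = q_hat - r_hat
```

With sample cumulants there is no clean gap at a fixed tolerance, so one would plot the output of `flattening_eigenvalues` and pick the cut-off by eye, as described above.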

We discuss how the results may be affected by an incorrect choice of $r$ and $\ell$ and justify our proposed way to order the foreground patterns $\mathbf{b}_1,\ldots,\mathbf{b}_\ell$ by importance in (11).

Let the true ranks be $r$ and $\ell$ and assume that we have used $r'$ and $\ell'$ in the input to Algorithm 2.

  • If $\ell'>\ell$, then $\ell'-\ell$ foreground patterns are noise.

  • If $\ell'<\ell$, then $\ell-\ell'$ foreground patterns are not recovered.

  • If $r'<r$, then background patterns are mixed with foreground patterns, as follows. Assuming without loss of generality that we have recovered $\mathbf{a}_1,\ldots,\mathbf{a}_{r'}$, the third step of Algorithm 2 decomposes the tensor $\sum_{i=r'+1}^{r}\lambda_i'\mathbf{a}_i^{\otimes 4}+\sum_{j=1}^{\ell}\nu_j\mathbf{b}_j^{\otimes 4}$ via HTD, as in Algorithm 1. If the orthogonality hypotheses of Proposition 3.3 hold, then the foreground patterns are recovered exactly, together with some background patterns that are incorrectly interpreted as foreground patterns. If the approximate orthogonality hypotheses of Theorem 3.4 hold, then the foreground patterns are recovered approximately, together with background patterns that are classed as foreground patterns. Without an orthogonality condition, the recovered foreground patterns $\mathbf{b}_1,\ldots,\mathbf{b}_\ell$ will be polluted but still roughly collinear to the true foreground patterns when $r-r'$ is small or when the dimension of the dataset is large, since random vectors in high dimension are almost orthogonal.

  • If $r'>r$, then foreground patterns are mixed with background noise, as follows. Some background patterns from Algorithm 2 will be noise, say $\mathbf{a}_{r+1}',\ldots,\mathbf{a}_{r'}'$. Step 2 of Algorithm 2 computes the coefficients of the tensors $(\mathbf{a}_{r+1}')^{\otimes 4},\ldots,(\mathbf{a}_{r'}')^{\otimes 4}$ in $\kappa_4(\mathbf{x})$, though they are not true rank one components of $\kappa_4(\mathbf{x})$. In Step 3, the tensor to be decomposed has the form $\sum_{i=1}^{r'-r}\mu_i(\mathbf{a}_{r+i}')^{\otimes 4}+\sum_{i=1}^{\ell}\nu_i\mathbf{b}_i^{\otimes 4}$ for some $\mu_1,\ldots,\mu_{r'-r}\in\mathbb{R}$. As in the case $r'<r$, the foreground patterns can still be exactly or approximately recovered, under the hypotheses of Proposition 3.3 and Theorem 3.4 respectively, albeit with some background noise recovered as foreground patterns.

The above discussion shows that when $r'\neq r$, the vectors $\mathbf{b}_1,\ldots,\mathbf{b}_\ell$ obtained from Algorithm 2 could represent foreground patterns, background patterns, or noise. We order the vectors according to (11). The denominator of (11) is the variance of the linearly transformed background dataset $Y\mathbf{b}$. The numerator is the variance of the linearly transformed foreground dataset $X\mathbf{b}$. Their ratio enables us to select the most relevant foreground patterns, as follows.

  • If $\mathbf{b}$ is a foreground pattern, we expect $\mathbf{b}^{\mathsf{T}}\kappa_2(\mathbf{y})\mathbf{b}$ to be small relative to $\mathbf{b}^{\mathsf{T}}\kappa_2(\mathbf{x})\mathbf{b}$, hence a large $k(\mathbf{b})$.

  • If $\mathbf{b}$ is a background pattern, we expect $\mathbf{b}^{\mathsf{T}}\kappa_2(\mathbf{x})\mathbf{b}\approx\alpha\,\mathbf{b}^{\mathsf{T}}\kappa_2(\mathbf{y})\mathbf{b}$ for some constant $\alpha$ and hence $k(\mathbf{b})\approx\alpha$.

  • If $\mathbf{b}$ is foreground noise, we expect a small $\mathbf{b}^{\mathsf{T}}\kappa_2(\mathbf{x})\mathbf{b}$, hence a small $k(\mathbf{b})$.

  • If $\mathbf{b}$ is background noise, we expect a small $\mathbf{b}^{\mathsf{T}}\kappa_2(\mathbf{y})\mathbf{b}$, hence a large $k(\mathbf{b})$. To prevent the background noise showing up in the recovered foreground pattern, we require $r'\leq r$.

In practice, we consider those patterns for which $k(\mathbf{b})$ exceeds a certain threshold, or take the patterns with the two highest values of $k(\mathbf{b})$.
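The ordering by (11) can be sketched as follows. This is an illustrative NumPy example, not the authors' code: `variance_ratio` and `rank_patterns` are hypothetical names, $\kappa_2$ is estimated by the sample covariance, and the toy data simply inflate the foreground variance along one coordinate.

```python
import numpy as np

def variance_ratio(b, X, Y):
    """k(b): variance of X b divided by variance of Y b, via sample covariances."""
    cov_x = np.cov(X, rowvar=False, bias=True)
    cov_y = np.cov(Y, rowvar=False, bias=True)
    return (b @ cov_x @ b) / (b @ cov_y @ b)

def rank_patterns(B, X, Y):
    """Column indices of B sorted by decreasing k(b), together with the ratios."""
    ratios = np.array([variance_ratio(B[:, j], X, Y) for j in range(B.shape[1])])
    order = np.argsort(ratios)[::-1]
    return order, ratios

rng = np.random.default_rng(2)
p = 4
Y = rng.standard_normal((5000, p))         # toy background
X = rng.standard_normal((5000, p))         # toy foreground
X[:, 0] *= 3.0                             # extra foreground variance along e_0
B = np.eye(p)[:, [2, 0]]                   # candidate patterns e_2 and e_0
order, ratios = rank_patterns(B, X, Y)
```

Here the candidate $e_0$ has a ratio near 9 and $e_2$ a ratio near 1, so $e_0$ is ranked first. Taking `order[:2]` gives the two patterns used for a two-dimensional plot.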

5.3. Visualization

We discuss how to interpret coordinates when using cICA for dimensionality reduction. The following proposition relates the projections $\mathbf{b}_i^{T}\mathbf{x}$ for $i\in[\ell]$ to the latent variables $s_i$.

Proposition 5.1.

Consider the cICA model in (3). Suppose $\|\mathbf{b}_i\|=1$ for $i\in[\ell]$. Assume that, for some small $\epsilon>0$, we have $|\langle\mathbf{b}_i,\mathbf{b}_j\rangle|<\epsilon$ and $|\langle\mathbf{b}_i,\mathbf{a}_k\rangle|<\epsilon$ for $i\neq j\in[\ell]$, $k\in[r]$. Then, for each $i\in[\ell]$,

$$|s_i-\mathbf{b}_i^{T}\mathbf{x}|=(rC_{\mathbf{z}'}+(\ell-1)C_{\mathbf{s}})\mathcal{O}(\epsilon),$$

where $C_{\mathbf{z}'}$ and $C_{\mathbf{s}}$ are upper bounds on the magnitudes of the random variables in $\mathbf{z}'$ and $\mathbf{s}$. In particular, $\mathbf{b}_i^{T}\mathbf{x}$ approximates the component $s_i$ with an error linear in $\epsilon$.

Proof.

Recall from (3) that $\mathbf{x}=A\mathbf{z}'+B\mathbf{s}$. Hence

$$\mathbf{b}_i^{T}\mathbf{x}=(\mathbf{b}_i^{T}A)\mathbf{z}'+(\mathbf{b}_i^{T}B)\mathbf{s}=\sum_{k=1}^{r}\langle\mathbf{b}_i,\mathbf{a}_k\rangle z'_k+\sum_{j=1,\,j\neq i}^{\ell}\langle\mathbf{b}_i,\mathbf{b}_j\rangle s_j+s_i.$$

The almost orthogonality conditions of the proposition then imply that

$$|s_i-\mathbf{b}_i^{T}\mathbf{x}|\leq\sum_{k=1}^{r}|\langle\mathbf{b}_i,\mathbf{a}_k\rangle||z'_k|+\sum_{j=1,\,j\neq i}^{\ell}|\langle\mathbf{b}_i,\mathbf{b}_j\rangle||s_j|\leq(rC_{\mathbf{z}'}+(\ell-1)C_{\mathbf{s}})\epsilon.$$ ∎
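The bound in Proposition 5.1 can be checked on simulated data. The following NumPy sketch makes several illustrative assumptions: the latent variables are uniform on $[-1,1]$ (so $C_{\mathbf{z}'}=C_{\mathbf{s}}=1$), the patterns are random unit vectors in a high-dimensional space, and $\epsilon$ is taken to be the largest inner product that occurs.

```python
import numpy as np

rng = np.random.default_rng(3)
p, r, ell, n = 200, 3, 2, 1000
A = rng.standard_normal((p, r))
B = rng.standard_normal((p, ell))
B /= np.linalg.norm(B, axis=0)             # unit-norm foreground patterns
A /= np.linalg.norm(A, axis=0)             # unit norm for A is our choice, not required

Zp = rng.uniform(-1, 1, (n, r))            # samples of z', bounded by C_z = 1
S = rng.uniform(-1, 1, (n, ell))           # samples of s, bounded by C_s = 1
X = Zp @ A.T + S @ B.T                     # rows are samples of x = A z' + B s

G = B.T @ B
off_diag = G - np.diag(np.diag(G))         # inner products <b_i, b_j>, i != j
eps = max(np.abs(B.T @ A).max(), np.abs(off_diag).max())
err = np.abs(X @ B - S).max()              # worst observed |s_i - b_i^T x|
bound = (r * 1.0 + (ell - 1) * 1.0) * eps  # (r C_z + (ell - 1) C_s) * eps
```

On this toy model `err` never exceeds `bound`. In high dimension random unit vectors are nearly orthogonal, so `eps` is small and the projections `X @ B` track the sources `S` closely.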

The almost orthogonality conditions in Proposition 5.1 are strong requirements. However, they can be relaxed: if $|\langle\mathbf{b}_i,\mathbf{b}_j\rangle|<\epsilon$ for the chosen $i,j\in[\ell]$ and the sources $s_i$ and $s_j$ have larger variance than $(\mathbf{b}_i^{\mathsf{T}}A)\mathbf{z}'$ and $(\mathbf{b}_j^{\mathsf{T}}A)\mathbf{z}'$, then plotting $X\mathbf{b}_i$ against $X\mathbf{b}_j$ still approximates the plot of $s_i$ against $s_j$.

If $(\mathbf{b}_i^{\mathsf{T}}A)\mathbf{z}'$ and $(\mathbf{b}_j^{\mathsf{T}}A)\mathbf{z}'$ are uncorrelated, we expect the plot of $X\mathbf{b}_i$ against $X\mathbf{b}_j$ to show axis-aligned clusters; otherwise, clusters may not be axis-aligned. We specify the condition for $(\mathbf{b}_i^{\mathsf{T}}A)\mathbf{z}'$ and $(\mathbf{b}_j^{\mathsf{T}}A)\mathbf{z}'$ to be uncorrelated, assuming that all variables in the tuple $\mathbf{z}'$ have the same variance.

Proposition 5.2.

Consider the cICA model in (3). Suppose that $\mathbf{z}'$ is a tuple of independent random variables with the same variance. Then $(\mathbf{b}_i^{\mathsf{T}}A)\mathbf{z}'$ and $(\mathbf{b}_j^{\mathsf{T}}A)\mathbf{z}'$ are uncorrelated if and only if $\langle\mathbf{b}_i^{\mathsf{T}}A,\mathbf{b}_j^{\mathsf{T}}A\rangle=0$.

Proof.

Write $\mathbf{u}=\mathbf{b}_{i}^{\mathsf{T}}A$ and $\mathbf{v}=\mathbf{b}_{j}^{\mathsf{T}}A$. By the bilinearity of the covariance

$$\mathrm{Cov}(\mathbf{u}\mathbf{z}^{\prime},\mathbf{v}\mathbf{z}^{\prime})=\sum_{1\leq i,j\leq r}u_{i}v_{j}\,\mathrm{Cov}(z^{\prime}_{i},z^{\prime}_{j})=\sum_{1\leq i\leq r}u_{i}v_{i}\,\mathrm{Var}(z^{\prime}_{i})=\mathrm{Var}(z^{\prime}_{1})\sum_{1\leq i\leq r}u_{i}v_{i}.$$

The last expression is zero if and only if $\langle\mathbf{u},\mathbf{v}\rangle=0$. ∎
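The identity in the proof can be checked numerically. The following is a minimal sketch, not part of the argument: the dimension, the common variance, the vectors, and the choice of exponential latent variables are all illustrative assumptions.

```python
import numpy as np

def covariance_of_linear_forms(u, v, cov_z):
    """Exact Cov(u z', v z') for row vectors u, v and latent covariance cov_z."""
    return float(u @ cov_z @ v)

r = 4
sigma2 = 2.5                      # common variance Var(z'_i); illustrative value
cov_z = sigma2 * np.eye(r)        # independent latents give a diagonal covariance

u = np.array([1.0, 2.0, 0.0, -1.0])
v_orth = np.array([2.0, -1.0, 3.0, 0.0])    # <u, v_orth> = 0
v_other = np.array([1.0, 1.0, 1.0, 1.0])    # <u, v_other> = 2

cov_orth = covariance_of_linear_forms(u, v_orth, cov_z)    # vanishes
cov_other = covariance_of_linear_forms(u, v_other, cov_z)  # sigma2 * <u, v_other>

# Monte Carlo version with independent exponential latents (variance = scale**2).
rng = np.random.default_rng(0)
z = rng.exponential(scale=np.sqrt(sigma2), size=(200_000, r))
empirical = np.cov(z @ u, z @ v_other)[0, 1]
```

The exact covariance vanishes for the orthogonal pair and equals $\mathrm{Var}(z^{\prime}_{1})\langle\mathbf{u},\mathbf{v}\rangle$ otherwise; the sampled estimate agrees up to sampling error.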

6. Numerical experiments

We investigate the performance of cICA for finding patterns in data (Section 6.1) and for data visualization (Section 6.2). Our code is available on GitHub at https://github.com/QWE123665/cICA.

6.1. Finding patterns

The cICA patterns are the foreground vectors $\mathbf{b}_{i}$. We show that cICA recovers these vectors accurately for synthetic data, comparing it with cPCA [AZBZ17] and PCPCA [LJE20]. We also apply cICA to gene expression data from [SCJ+23]. Taking monkey gene expression as the background and human gene expression as the foreground, we relate the cICA patterns to existing results to identify genes responsible for human evolution.

6.1.1. Synthetic data

We use synthetic data to assess the accuracy of the patterns recovered by cICA, both for general cICA (Algorithm 2) and proportional cICA (Algorithm 3). We compare against cPCA and PCPCA, illustrating that the cICA algorithms recover the foreground patterns more accurately when the data are generated under the model (3), which assumes independence of latent variables; see Figure 2. The details of the simulations are in Appendix A.1.

Figure 2. The similarity of the recovered vs. true foreground patterns (i.e. the accuracy of recovering matrix $B$), measured via cosine similarity in (a) and (c) and relative Frobenius error in (b) and (d). The $x$-axis is the number of variables $p$, which ranges from 4 to 12. Plots (a) and (b) refer to cICA in Algorithm 2. The interquartile range over 100 runs is shaded in red, with the best run shown as the red line. Plots (c) and (d) refer to proportional cICA in Algorithm 3, which is deterministic. For cPCA and PCPCA, we test 100 hyperparameter values and plot the one with lowest error.

We see from Figure 2 that cICA outperforms cPCA and PCPCA in recovering the foreground patterns. Figure 2(a) shows that the interquartile range for cICA in Algorithm 2 is above the maximum cosine similarity results for cPCA and PCPCA. The best performing cICA has cosine similarity above 0.9 for all tested $p$. Figure 2(b) shows analogous results with accuracy measured via relative Frobenius error. The variability in performance as $p$ changes is due to randomness in the matrix $A$. Figures 2(c) and (d) show analogous results for proportional cICA from Algorithm 3, with hyperparameter $\gamma$ learned from Theorem 4.5. The method outperforms cPCA and PCPCA, with the added benefit that no selection of hyperparameters is necessary.

6.1.2. Human and monkey gene expression data

We apply cICA to a dataset of human and monkey gene expression from [SCJ+23], in which the authors analyse human, chimp, gorilla, macaque, and marmoset datasets to identify genes that are responsible for evolutionary change. Out of 14131 genes, they identify 3383 genes with extensive differences between human and non-human primates, of which they identify a subset of 139 with deeply conserved co-expression across all non-human animals, and strongly divergent co-expression relationships in humans.

We select the 15 most variant genes among the 139 selected genes and the 15 most variant genes among the other $3244=3383-139$ genes. We combine 10000 chimp and 10000 gorilla data points to form the background dataset $Y\in\mathbb{R}^{20000\times 30}$ and 10000 human gene expression data points for the foreground dataset $X\in\mathbb{R}^{10000\times 30}$. Then we apply cICA as in Algorithm 2 and use (11) to order the $\mathbf{b}_{i}$ and extract the first two vectors $\mathbf{b}_{1},\mathbf{b}_{2}\in\mathbb{R}^{30}$. We observe that the 15 genes with highest absolute values in $\mathbf{b}_{1}$ (resp. $\mathbf{b}_{2}$) have 10 (resp. 13) genes among the 15 selected genes that come from the subset of 139 in [SCJ+23]. This demonstrates consistency with the results from [SCJ+23]: the vectors $\mathbf{b}_{i}$ assign higher weights to the genes from the subset of 139. For details, see Appendix A.2.

6.2. Data visualization

We use cICA for dimensionality reduction and data visualization, as described in Section 4.3. We investigate the performance on three datasets: mouse protein expression, corrupted MNIST images, and gene expression, the same data studied in the papers [AZBZ17, LJE20]. We quantify the performance of the methods, using the silhouette score [Rou87] of the projected data; higher values indicate better clustering of points.
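To make the metric concrete, the following is a minimal NumPy sketch of the silhouette score with Euclidean distances, following the definition in [Rou87]. It is an illustration rather than the code used for our experiments; the function name and the toy points are assumptions, and in practice a library routine such as `sklearn.metrics.silhouette_score` can be used instead.

```python
import numpy as np

def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distances."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Toy two-dimensional projection with two well-separated groups.
projected = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
score_good = silhouette_score(projected, np.array([0, 0, 1, 1]))
score_bad = silhouette_score(projected, np.array([0, 1, 0, 1]))
```

Labels that match the visible groups give a score close to 1, whereas labels that cut across the groups give a negative score.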

6.2.1. Mouse protein data

We study the mouse protein dataset from [HGC15]. The foreground data measure protein expression in the cortex of mice subjected to shock therapy, some of which have Down syndrome. The background dataset consists of protein expression measurements from mice without Down syndrome that did not receive shock therapy. We compare general cICA, proportional cICA, cPCA, and PCPCA. All four algorithms can separate the two clusters in the foreground data, corresponding to mice with Down syndrome and those without, though the projections differ. The general cICA algorithm has the highest silhouette score (0.606), followed by proportional cICA (0.604), then cPCA (0.421), and then PCPCA (0.220); see Figure 3. See Appendix A.3 for details.

Figure 3. Dimensionality reduction of the mouse protein data [HGC15] via (a) general cICA (b) proportional cICA (c) cPCA (d) PCPCA. For (a), we fix a random seed. For (b), (c), and (d), we plot the projection with the best silhouette score over 100 hyperparameter values.

6.2.2. Corrupted MNIST data

Next we explore the corrupted MNIST dataset from [AZBZ18]. The foreground data are digits 0 and 1 from the MNIST dataset superimposed with strength 0.25 onto 5000 randomly selected grass images from ImageNet. The background data are the 5000 grass images. Each image has size $28\times 28$. The projections are shown in Figure 4. All four algorithms separate the foreground data into clusters corresponding to the digits 0 and 1. The cPCA algorithm has the highest silhouette score (0.546), followed by proportional cICA (0.508), general cICA (0.451) and PCPCA (0.009). See Appendix A.4.

Figure 4. Dimensionality reduction of the corrupted MNIST data from [AZBZ18] via (a) general cICA (b) proportional cICA (c) cPCA (d) PCPCA.

6.2.3. Transplant gene expression data

We study the single-cell RNA sequencing data from [ZTB+17]. The foreground data are gene expression measurements of bone marrow mononuclear cells from patients with acute myeloid leukemia before and after they received a stem-cell transplant; the background dataset contains gene expression measurements of healthy people. The projection plots of the four algorithms are shown in Figure 5. cPCA has the highest silhouette score (0.451), followed by proportional cICA (0.402), then general cICA (0.344), then PCPCA (0.164). See Appendix A.5 for details.

Figure 5. Dimensionality reduction of the single-cell RNA sequencing data from [ZTB+17] via (a) general cICA (b) proportional cICA (c) cPCA (d) PCPCA. For (a), we fix a random seed. For (b), (c), and (d), we plot the projection with the best silhouette score over 100 hyperparameter values.

7. Summary

We have presented contrastive independent component analysis (cICA), a tool to explore patterns and visualize data in one setting relative to another. We designed algorithms for cICA based on a new hierarchical tensor decomposition that we introduce. We studied two variants: general and proportional cICA. The upside to general cICA is its higher expressivity: it is able to model background patterns that each contribute to the foreground in different relative amounts $\lambda_{i}^{\prime}/\lambda_{i}$. The advantage of proportional cICA is that it is deterministic, based solely on recursive eigendecompositions. We used our algorithms to find contrastive patterns that describe a foreground dataset relative to a background, testing the results on synthetic and real-world datasets. We saw its potential to extract foreground patterns of interest and its competitiveness with other contrastive methods.

We investigated the identifiability of cICA, via the uniqueness of its associated coupled tensor decomposition, seeing identifiability improvements relative to cPCA and PCPCA. This echoes the improved identifiability of ICA over PCA: a general linear mixing can be recovered uniquely via ICA, whereas PCA requires an orthogonal mixing.

We conclude with two directions for further study. Our cICA model describes observations as a linear mixing of independent latent variables. Dropping the linearity assumption, we may seek patterns that have a nonlinear signature across the observed variables. This would combine the nonlinear contrastive methods of [AZ19, SGN19, WBWL22, LHH+24] with approaches to find interpretable patterns, generalizing our vectors $\mathbf{b}_{i}$. Finally, dropping the independence assumption on the latent variables is also a promising direction for further study, which would connect cICA to other latent variable models such as those arising in causal disentanglement [YLC+21, SSBU23].

Acknowledgements

We thank Salil Bhate for helpful discussions. AM and AS were partially supported by the NSF (DMS-2306672 and DMR-2011754).

References

  • [AZ19] Abubakar Abid and James Zou. Contrastive variational autoencoder enhances salient features. arXiv preprint arXiv:1902.04601, 2019.
  • [AZBZ17] Abubakar Abid, Martin J Zhang, Vivek K Bagaria, and James Zou. Contrastive principal component analysis. arXiv preprint arXiv:1709.06716, 2017.
  • [AZBZ18] Abubakar Abid, Martin J Zhang, Vivek K Bagaria, and James Zou. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nature communications, 9(1):2134, 2018.
  • [CJ10] Pierre Comon and Christian Jutten. Handbook of Blind Source Separation: Independent component analysis and applications. Academic press, 2010.
  • [Com94] Pierre Comon. Independent component analysis, a new concept? Signal processing, 36(3):287–314, 1994.
  • [COV17] Luca Chiantini, Giorgio Ottaviani, and Nick Vannieuwenhoven. On generic identifiability of symmetric tensors of subgeneric rank. Transactions of the American Mathematical Society, 369(6):4021–4042, 2017.
  • [CS93] Jean-François Cardoso and Antoine Souloumiac. Blind beamforming for non-Gaussian signals. In IEE proceedings F (radar and signal processing), volume 140, pages 362–370. IET, 1993.
  • [DLCC07] Lieven De Lathauwer, Joséphine Castaing, and Jean-François Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing, 55:2965–2973, 2007.
  • [DLDMV01] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. Independent component analysis and (simultaneous) third-order tensor diagonalization. IEEE Transactions on Signal Processing, 49(10):2262–2271, 2001.
  • [Dom18] Krzysztof Domino. The use of fourth order cumulant tensors to detect outlier features modelled by a t-student copula. arXiv preprint arXiv:1804.00541, 2018.
  • [EK04] J. Eriksson and V. Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7):601–604, 2004.
  • [GW20] Xiurui Geng and Lei Wang. NPSA: Nonorthogonal principal skewness analysis. IEEE Transactions on Image Processing, 29:6396–6408, 2020.
  • [Hac12] Wolfgang Hackbusch. Tensor spaces and numerical tensor calculus, volume 42. Springer, 2012.
  • [HGC15] Clara Higuera, Katheleen J Gardiner, and Krzysztof J Cios. Self-organizing feature maps identify proteins critical to learning in a mouse model of Down syndrome. PloS one, 10(6):e0129126, 2015.
  • [HM16] Aapo Hyvarinen and Hiroshi Morioka. Unsupervised feature extraction by time-contrastive learning and nonlinear ICA. Advances in neural information processing systems, 29, 2016.
  • [HST19] Aapo Hyvarinen, Hiroaki Sasaki, and Richard Turner. Nonlinear ICA using auxiliary variables and generalized contrastive learning. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 859–868. PMLR, 2019.
  • [JA95] J. Alexander and A. Hirschowitz. Polynomial interpolation in several variables. J. Algebraic Geom., 4(4), 1995.
  • [KP19] Joe Kileel and Joao M Pereira. Subspace power method for symmetric tensor decomposition and generalized PCA. arXiv preprint arXiv:1912.04007, 2019.
  • [LF22] Qi Lyu and Xiao Fu. On finite-sample identifiability of contrastive learning-based nonlinear independent component analysis. In International Conference on Machine Learning, pages 14582–14600. PMLR, 2022.
  • [LHH+24] Romain Lopez, Jan-Christian Huetter, Ehsan Hajiramezanali, Jonathan Pritchard, and Aviv Regev. Toward the identifiability of comparative deep generative models. arXiv preprint arXiv:2401.15903, 2024.
  • [LJE20] Didong Li, Andrew Jones, and Barbara Engelhardt. Probabilistic contrastive principal component analysis. arXiv preprint arXiv:2012.07977, 2020.
  • [LM08] Lek-Heng Lim and Jason Morton. Cumulant component analysis: a simultaneous generalization of PCA and ICA. CASTA2008, 18, 2008.
  • [McC18] Peter McCullagh. Tensor methods in statistics: Monographs on statistics and applied probability. Chapman and Hall/CRC, 2018.
  • [Rob16] Elina Robeva. Orthogonal decomposition of symmetric tensors. SIAM Journal on Matrix Analysis and Applications, 37(1):86–102, 2016.
  • [Rou87] Peter J Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20:53–65, 1987.
  • [SCJ+23] Hamsini Suresh, Megan Crow, Nikolas Jorstad, Rebecca Hodge, Ed Lein, Alexander Dobin, Trygve Bakken, and Jesse Gillis. Comparative single-cell transcriptomic analysis of primate brains highlights human-specific regulatory evolution. Nature Ecology & Evolution, 7(11):1930–1943, 2023.
  • [SGN19] Kristen A Severson, Soumya Ghosh, and Kenney Ng. Unsupervised learning with contrastive latent variable models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4862–4869, 2019.
  • [SRK09] Jussi Salmi, Andreas Richter, and Visa Koivunen. Sequential unfolding SVD for tensors with applications in array signal processing. IEEE Transactions on Signal Processing, 57(12):4719–4733, 2009.
  • [SSBU23] Chandler Squires, Anna Seigal, Salil S Bhate, and Caroline Uhler. Linear causal disentanglement via interventions. In International Conference on Machine Learning, pages 32540–32560. PMLR, 2023.
  • [SSDU24] Nils Sturma, Chandler Squires, Mathias Drton, and Caroline Uhler. Unpaired multi-domain causal representation learning. Advances in Neural Information Processing Systems, 36, 2024.
  • [WBWL22] Ethan Weinberger, Nicasia Beebe-Wang, and Su-In Lee. Moment matching deep contrastive latent variable models. arXiv preprint arXiv:2202.10560, 2022.
  • [Wey12] Hermann Weyl. Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen, 71(4):441–479, 1912.
  • [WS24] Kexin Wang and Anna Seigal. Identifiability of overcomplete independent component analysis. arXiv preprint arXiv:2401.14709, 2024.
  • [YLC+21] Mengyue Yang, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang. CausalVAE: Disentangled representation learning via neural structural causal models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9593–9602, 2021.
  • [ZHPA13] James Y Zou, Daniel J Hsu, David C Parkes, and Ryan P Adams. Contrastive learning using spectral methods. Advances in Neural Information Processing Systems, 26, 2013.
  • [ZTB+17] Grace XY Zheng, Jessica M Terry, Phillip Belgrader, Paul Ryvkin, Zachary W Bent, Ryan Wilson, Solongo B Ziraldo, Tobias D Wheeler, Geoff P McDermott, Junjie Zhu, et al. Massively parallel digital transcriptional profiling of single cells. Nature communications, 8(1):14049, 2017.

Appendix A Details of numerical experiments

All experiments are run on an Apple M2 Pro with 16 GB memory. Each run of each algorithm takes at most 1 minute.

A.1. Finding patterns from synthetic data

We describe the details of the synthetic data setup in Section 6.1.1 that produced Figure 2. We consider $p\in[4,12]$. Our samples come from the distributions (3), where matrices $A\in\mathbb{R}^{p\times p}$ and $B\in\mathbb{R}^{p\times(p-1)}$ are random with unit vector columns, and the columns of $B$ are assumed to be orthogonal. We assume orthogonality of the columns of $B$ to facilitate comparison with the methods cPCA and PCPCA, which require this assumption.

For testing Algorithm 2 in Figure 2(a) and (b), the variables $s_{i}$ follow exponential distributions $\exp(\theta_{i})$, where $\theta_{i}=2$ when $i$ is odd and $\theta_{i}=1.5$ when $i$ is even. The variables $z_{i}$ and $z^{\prime}_{i}$ follow exponential distributions $\exp(\nu_{i})$ and $\exp(\nu_{i}^{\prime})$, where $\nu_{i}=2,\nu_{i}^{\prime}=1$ when $i$ is odd and $\nu_{i}=1,\nu_{i}^{\prime}=2$ when $i$ is even. We generate $10^{5}$ datapoints for both the foreground and background data and apply cICA to the sample cumulant tensors. cICA has randomness coming from the subspace power method. We apply our algorithm 100 times and get 100 recovered foreground mixing matrices $B\in\mathbb{R}^{p\times(p-1)}$.
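The sampling step of this setup can be sketched as follows. This is a hedged illustration, not the code from our repository. It assumes that model (3) has the form $\mathbf{y}=A\mathbf{z}^{\prime}$ for the background and $\mathbf{x}=A\mathbf{z}+B\mathbf{s}$ for the foreground, and that $\exp(\theta)$ denotes an exponential distribution with rate $\theta$; the function names are hypothetical.

```python
import numpy as np

def random_unit_columns(rng, rows, cols):
    """Random Gaussian matrix with each column scaled to unit norm."""
    m = rng.standard_normal((rows, cols))
    return m / np.linalg.norm(m, axis=0)

def random_orthonormal_columns(rng, rows, cols):
    """Random matrix with orthonormal columns, via a reduced QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return q

def sample_synthetic(p, n, seed=0):
    """Draw n foreground and n background samples (rows) in dimension p."""
    rng = np.random.default_rng(seed)
    A = random_unit_columns(rng, p, p)
    B = random_orthonormal_columns(rng, p, p - 1)
    odd = np.arange(1, p + 1) % 2 == 1        # parity of the index i = 1, ..., p
    theta = np.where(odd[: p - 1], 2.0, 1.5)
    nu = np.where(odd, 2.0, 1.0)
    nu_prime = np.where(odd, 1.0, 2.0)
    # Assumption: the parameters are rates, so NumPy's scale is their inverse.
    s = rng.exponential(scale=1.0 / theta, size=(n, p - 1))
    z = rng.exponential(scale=1.0 / nu, size=(n, p))
    z_prime = rng.exponential(scale=1.0 / nu_prime, size=(n, p))
    X = z @ A.T + s @ B.T     # foreground rows, assumed form x = A z + B s
    Y = z_prime @ A.T         # background rows, assumed form y = A z'
    return X, Y, A, B

X, Y, A, B = sample_synthetic(p=6, n=10**5)
```

The matrices `X` and `Y` would then be passed to the cumulant estimation step; the true `B` is kept to evaluate the recovered patterns.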

For testing Algorithm 3 in Figure 2(c) and (d), we let $z_{i},z^{\prime}_{i}$ be exponential distributions $\exp(\nu_{i}),\exp(\nu_{i}^{\prime})$ where $\nu_{i}=\nu_{i}^{\prime}=1$. We learn the hyperparameter $\gamma^{\prime}$ via Theorem 4.5. The true $\gamma^{\prime}$ is 1 and the recovered $\gamma^{\prime}$ are all in the range $[0.94,1.08]$.

We describe the implementation of the two methods we compare to. For cPCA [AZBZ17], we test 100 log-evenly spaced hyperparameters $\alpha$ between 0 and 1000 with $p-1$ components. Each run returns a matrix of size $p\times(p-1)$, whose columns are contrastive principal components with norm 1. For PCPCA, we test 100 evenly spaced hyperparameters $\gamma$ between 0 and 0.9 and fix $p-1$ components. Each run returns a matrix of size $p\times(p-1)$. We normalize the columns to unit norm, to compare PCPCA with the other algorithms.
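A possible way to set up these hyperparameter grids is sketched below. Since a log-spaced grid cannot contain 0 itself, the sketch prepends 0 to 99 log-spaced values; this construction and the lower end $10^{-1}$ are assumptions made for illustration, and the helper names are hypothetical.

```python
import numpy as np

n_grid = 100

# cPCA: zero followed by 99 log-spaced values ending at 1000.
# The lower end 1e-1 of the log-spaced part is an assumption of this sketch.
alphas = np.concatenate(([0.0], np.logspace(-1, 3, n_grid - 1)))

# PCPCA: 100 evenly spaced values between 0 and 0.9, both ends included.
gammas = np.linspace(0.0, 0.9, n_grid)

def normalize_columns(M):
    """Scale each column of M to unit Euclidean norm."""
    M = np.asarray(M, dtype=float)
    return M / np.linalg.norm(M, axis=0, keepdims=True)

def best_hyperparameter(grid, error_fn):
    """Return the grid value with the lowest error under error_fn."""
    errors = [error_fn(value) for value in grid]
    return grid[int(np.argmin(errors))]
```

Here `error_fn` stands for any map from a hyperparameter value to the error of the corresponding recovered matrix, for example the relative Frobenius error against the true $B$.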

Since the columns of $B$ that are recovered are only unique up to permutation and sign, we describe how to align the outputs. Let $B^{\prime}\in\mathbb{R}^{p\times(p-1)}$ be a recovered matrix. Rather than searching over all ways to match the columns of $B$ to those of $B^{\prime}$, we use a greedy algorithm to approximate the matching, as follows. We fix the first column of $B$, denoted $\mathbf{b}_{1}$. We choose one of the columns of $B^{\prime}$ whose cosine similarity with $\mathbf{b}_{1}$ has largest absolute value. We set this to be the first column of $B^{\prime}$, changing its sign if the cosine similarity is negative. Then we select among the remaining columns, the one with the largest absolute cosine similarity with $\mathbf{b}_{2}$ and set this as the second column of $B^{\prime}$ (again, changing the sign if the cosine similarity is negative). We continue until we reach the last column. Then we compute the relative Frobenius error and mean cosine similarity which are, respectively,

$$\sqrt{\sum_{i=1}^{p}\sum_{j=1}^{p-1}(b_{ij}-b_{ij}^{\prime})^{2}/(p-1)}\qquad\text{and}\qquad\frac{1}{p-1}\sum_{i=1}^{p-1}\langle\mathbf{b}_{i},\mathbf{b}_{i}^{\prime}\rangle.$$
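The greedy alignment and the two metrics can be sketched in NumPy as follows. This is a minimal illustration of the procedure described above, assuming that $B$ and $B^{\prime}$ have the same number of columns and that the columns have unit norm when the metrics are computed; the function names are hypothetical.

```python
import numpy as np

def greedy_align(B_true, B_rec):
    """Greedily permute and sign-flip the columns of B_rec to match B_true."""
    B_true = np.asarray(B_true, dtype=float)
    B_rec = np.asarray(B_rec, dtype=float)
    # cos[i, j] is the cosine similarity of true column i and recovered column j.
    cos = (B_true / np.linalg.norm(B_true, axis=0)).T @ (
        B_rec / np.linalg.norm(B_rec, axis=0))
    remaining = list(range(B_rec.shape[1]))
    aligned = np.empty_like(B_rec)
    for i in range(B_true.shape[1]):
        # Pick the unused recovered column most similar (in absolute value).
        j = max(remaining, key=lambda c: abs(cos[i, c]))
        sign = -1.0 if cos[i, j] < 0 else 1.0
        aligned[:, i] = sign * B_rec[:, j]
        remaining.remove(j)
    return aligned

def relative_frobenius_error(B_true, B_aligned):
    """sqrt(sum_ij (b_ij - b'_ij)^2 / (p - 1)), with p - 1 the number of columns."""
    return np.sqrt(((B_true - B_aligned) ** 2).sum() / B_true.shape[1])

def mean_cosine_similarity(B_true, B_aligned):
    """Average inner product of matched columns (unit-norm columns assumed)."""
    return np.mean(np.sum(B_true * B_aligned, axis=0))

# Toy check: a column swap and one sign flip are undone by the alignment.
B_demo = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B_scrambled = B_demo[:, [1, 0]] * np.array([-1.0, 1.0])
B_aligned = greedy_align(B_demo, B_scrambled)
```

The normalization by $p-1$ makes the first quantity a relative error, since $\|B\|_{F}^{2}=p-1$ when the columns of $B$ have unit norm.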

A.2. Finding patterns from gene expression data

We describe the patterns obtained from the comparison of human and monkey gene expression in Section 6.1.2. The selected 15 highest variance genes among the 139 selected genes in [SCJ+23] are EIF3K, NDUFA13, SARNP, MYL10, TAF9, PRCD, BBS5, MRPS14, RING1, AGPAT5, FLOT1, BTBD7, MASTL, KANK1, BDP1. The 15 highest variance genes among the remaining $3244=3383-139$ genes are LUC7L3, RBKS, RBM7, AP4S1, CLCN1, CLASP1, ADTRP, CNNM3, NDUFAF7, CNIH4, RPUSD2, NELFCD, RPP14, ROMO1, RNF181.

We use the plots of the eigenvalues of the flattenings of $\kappa_{4}(\mathbf{y}),\kappa_{4}(\mathbf{x})$ to choose $r=22$ and $\ell=46-22=24$. The top two foreground patterns are:

$$\begin{aligned}
\mathbf{b}_{1}^{\mathsf{T}}=[&-0.04,-0.041,-0.09,-0.051,-0.12,0.075,0.01,-0.004,0.002,0.007,\\
&-0.07,-0.061,0.95,0.192,-0.009,-0.007,-0.002,-0.001,-0.076,-0.042,\\
&-0.008,-0.04,0.005,-0.058,0.012,-0.012,-0.05,-0.006,-0.046,-0.005]\\
\mathbf{b}_{2}^{\mathsf{T}}=[&0.615,-0.166,0.185,0.119,0.113,-0.099,-0.118,0.011,0.045,-0.025,\\
&0.098,0.141,-0.482,-0.339,0.054,0.028,-0.005,0.03,0.247,-0.017,\\
&-0.031,0.043,0.012,0.043,0.015,0.04,0.025,0.002,0.236,-0.016],
\end{aligned}$$

where the coordinates are labelled by the 30 genes in the order listed above. The 15 genes with the largest absolute values of the top foreground pattern include 10 genes among the 139 selected in [SCJ+23]. The 15 genes with the largest absolute values of the second foreground pattern include 13 genes from [SCJ+23]. Therefore, the foreground patterns obtained via cICA demonstrate consistency with the finding in [SCJ+23] that this subset of 139 genes captures human-specific information.
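The counts can be reproduced from the rounded vectors printed above with a short script. This is an illustrative sketch with a hypothetical function name. At the printed precision, $\mathbf{b}_{1}$ has two entries of magnitude 0.04 at the cutoff (coordinates 1 and 22), so its count depends on how the tie is broken; the sketch keeps the earlier coordinate.

```python
import numpy as np

# Rounded foreground patterns as printed above; the first 15 coordinates are
# the genes from the subset of 139, the last 15 are the remaining genes.
b1 = np.array([
    -0.04, -0.041, -0.09, -0.051, -0.12, 0.075, 0.01, -0.004, 0.002, 0.007,
    -0.07, -0.061, 0.95, 0.192, -0.009, -0.007, -0.002, -0.001, -0.076, -0.042,
    -0.008, -0.04, 0.005, -0.058, 0.012, -0.012, -0.05, -0.006, -0.046, -0.005])
b2 = np.array([
    0.615, -0.166, 0.185, 0.119, 0.113, -0.099, -0.118, 0.011, 0.045, -0.025,
    0.098, 0.141, -0.482, -0.339, 0.054, 0.028, -0.005, 0.03, 0.247, -0.017,
    -0.031, 0.043, 0.012, 0.043, 0.015, 0.04, 0.025, 0.002, 0.236, -0.016])

def count_top_in_subset(b, n_top=15, n_subset=15):
    """Count how many of the n_top largest |b_i| lie in the first n_subset coordinates."""
    # A stable sort breaks ties in favour of the earlier coordinate.
    order = np.argsort(-np.abs(b), kind="stable")
    return int(np.sum(order[:n_top] < n_subset))

count_b1 = count_top_in_subset(b1)
count_b2 = count_top_in_subset(b2)
```

With this tie-breaking, the script gives 10 for $\mathbf{b}_{1}$ and 13 for $\mathbf{b}_{2}$, matching the counts stated above.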

A.3. Mouse protein data

There are 270 foreground samples. These are protein expression measurements in the cortex of mice subjected to shock therapy. Of these samples, 135 have Down syndrome and 135 do not. There are 135 background samples, protein expression measurements from mice without Down syndrome that did not receive shock therapy. Each sample measures the expression of 77 proteins; that is, $p=77$.

For cICA, we preprocess using PCA as described in Section 5.1. We take $k=15$ components, which explain $90\%$ of the variance. We then choose $r$ and $\ell$, as described in Section 5.2. That is, we compute the eigenvalues of $\mathrm{Mat}(\kappa_{4}(\mathbf{y}))$ and $\mathrm{Mat}(\kappa_{4}(\mathbf{x}))$, ranking the eigenvalues by magnitude; see Figure 6. Based on these plots, we choose $r=27$ and $\ell=53-27=26$.
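The eigenvalue computation can be sketched as follows. This is an illustration under stated assumptions, not our implementation: it uses the plain (biased) sample estimate of the fourth-order cumulant, takes $\mathrm{Mat}(\cdot)$ to be the $p^{2}\times p^{2}$ flattening that groups the first two and last two indices, and the function names are hypothetical.

```python
import numpy as np

def fourth_cumulant(samples):
    """Biased sample fourth-order cumulant tensor of an (n, p) data matrix."""
    x = samples - samples.mean(axis=0)           # centre the data
    n, p = x.shape
    m2 = x.T @ x / n                             # second moments E[x_i x_j]
    # Rows of `pairs` hold all products x_i x_j, so pairs.T @ pairs / n
    # is the flattened fourth moment E[x_i x_j x_k x_l].
    pairs = (x[:, :, None] * x[:, None, :]).reshape(n, p * p)
    m4 = (pairs.T @ pairs / n).reshape(p, p, p, p)
    return (m4
            - np.einsum("ij,kl->ijkl", m2, m2)
            - np.einsum("ik,jl->ijkl", m2, m2)
            - np.einsum("il,jk->ijkl", m2, m2))

def flattening_spectrum(kappa4):
    """Absolute eigenvalues of the p^2 x p^2 flattening, largest first."""
    p = kappa4.shape[0]
    mat = kappa4.reshape(p * p, p * p)           # symmetric for a symmetric tensor
    return np.sort(np.abs(np.linalg.eigvalsh(mat)))[::-1]

# Toy check: two independent sign variables, each with fourth cumulant -2.
toy = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
toy_spectrum = flattening_spectrum(fourth_cumulant(toy))
```

In the toy example the spectrum has exactly two nonzero values, one per independent non-Gaussian source. On real data, one reads off the number of eigenvalues that stand out from the rest in the plot of each spectrum.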

Figure 6. Absolute values of eigenvalues of $\mathrm{Mat}(\kappa_{4}(\mathbf{y}))$ (left) and $\mathrm{Mat}(\kappa_{4}(\mathbf{x}))$ (right).

For general cICA, we fix the random seed to be 0. For proportional cICA, we run the algorithm for 100 log-evenly spaced $\gamma$ between 0 and $10^{6}$. The highest silhouette score is obtained at $\gamma=0$.

We run cPCA for 100 values of $\alpha$ between 0 and 1000. These are the default values of $\alpha$ in the code of [AZBZ17]. Note that our parameters for proportional cICA are the squares of the cPCA parameters, since if $\mathbf{z}=\lambda\mathbf{z}^{\prime}$, then $\kappa_{2}(\mathbf{z})=\lambda^{2}\kappa_{2}(\mathbf{z}^{\prime})$ and $\kappa_{4}(\mathbf{z})=\lambda^{4}\kappa_{4}(\mathbf{z}^{\prime})$. We plotted the choice with highest silhouette score, which was achieved for $\alpha=26.2$.
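The scaling of the two cumulants can be verified on samples of a single variable. The following is a small sketch with hypothetical helper names; the exponential distribution and the value of $\lambda$ are arbitrary choices.

```python
import numpy as np

def cumulant2(x):
    """Sample second cumulant (variance) of a one-dimensional sample."""
    c = x - x.mean()
    return np.mean(c ** 2)

def cumulant4(x):
    """Sample fourth cumulant: fourth central moment minus 3 * variance^2."""
    c = x - x.mean()
    return np.mean(c ** 4) - 3.0 * np.mean(c ** 2) ** 2

rng = np.random.default_rng(0)
z_prime = rng.exponential(size=10_000)
lam = 3.0
z = lam * z_prime

ratio2 = cumulant2(z) / cumulant2(z_prime)   # equals lam**2
ratio4 = cumulant4(z) / cumulant4(z_prime)   # equals lam**4 = (lam**2)**2
```

The fourth-order ratio is the square of the second-order ratio, which is why a grid for $\gamma$ up to $10^{6}$ corresponds to a grid for $\alpha$ up to $1000$.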

We run PCPCA for 100 evenly spaced $\gamma^{\prime}$ values between 0 and $0.9\cdot\frac{270}{135}$. Here 270 and 135 are the numbers of samples in the foreground and background datasets, respectively. Such choices of $\gamma^{\prime}$ are in accordance with the setup in [LJE20] and are sufficient to find the highest silhouette score. The best score was obtained when $\gamma^{\prime}=0.9\cdot\frac{270}{135}$. In [LJE20], the authors take a further step to scale the probabilistic contrastive principal components, before calculating the silhouette score. The silhouette score obtained after this additional step is 0.450.

A.4. Corrupted MNIST data

For the hyperparameters of cICA, we choose the number of components to be 30, which explains $80\%$ of the variance. We then choose $r, \ell$ for general cICA and $\ell$ for proportional cICA. We order the eigenvalues of $\mathrm{Mat}(\kappa_4(\mathbf{y}))$ and $\mathrm{Mat}(\kappa_4(\mathbf{x}))$ according to their absolute values and plot parts of the ordered eigenvalues in Figure 7. Based on these plots, we choose $r = 53$ and $r + \ell = 114$.
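The ordering step can be sketched as follows. The ranks in this appendix were chosen by inspecting the plotted spectra, so the automatic "largest drop" rule below is only a hypothetical stand-in for that visual inspection. The function names and the toy spectrum are made up for illustration.

```python
def order_by_magnitude(eigenvalues):
    """Sort eigenvalues by absolute value, largest first."""
    return sorted(eigenvalues, key=abs, reverse=True)


def largest_gap_rank(eigenvalues):
    """Hypothetical heuristic: rank at which |eigenvalue| drops the most.

    Returns k such that the k largest-magnitude eigenvalues sit above the
    biggest consecutive drop in absolute value.
    """
    mags = [abs(v) for v in order_by_magnitude(eigenvalues)]
    drops = [mags[i] - mags[i + 1] for i in range(len(mags) - 1)]
    return drops.index(max(drops)) + 1


# Toy spectrum (made-up numbers): three dominant eigenvalues, then noise.
toy_spectrum = [0.02, -4.1, 3.5, -0.01, 2.9, 0.03, -0.015]
ordered = order_by_magnitude(toy_spectrum)
rank = largest_gap_rank(toy_spectrum)
print(ordered)
print(rank)
```

Sorting by absolute value matters because the flattened fourth cumulant can have negative eigenvalues, and a large negative eigenvalue carries as much signal as a large positive one.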

Figure 7. Absolute values of eigenvalues of $\mathrm{Mat}(\kappa_4(\mathbf{y}))$ (left) and $\mathrm{Mat}(\kappa_4(\mathbf{x}))$ (right).

We fix the random seed to be 0 for general cICA. The visualization for the general cICA algorithm is in Figure 4(a) and the silhouette score is 0.451.

For proportional cICA, we run the algorithm for 100 log-evenly spaced values of $\gamma$ between 0 and $10^6$. The highest silhouette score is 0.508, obtained when $\gamma = 0.21$. The plot is Figure 4(b).
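A parameter sweep of this kind can be sketched as below. The text only states that the grid is log-evenly spaced between 0 and $10^6$. The construction used here (the value 0 followed by log-spaced points starting at a hypothetical exponent `low_exp`) is an assumption, and `toy_score` is a placeholder for running proportional cICA at a given $\gamma$ and computing the silhouette score.

```python
def log_spaced_grid(low_exp, high_exp, n_values):
    """Zero followed by (n_values - 1) log-evenly spaced points.

    low_exp is a hypothetical smallest exponent; the source only states that
    the grid runs between 0 and 10**6.
    """
    n_log = n_values - 1
    step = (high_exp - low_exp) / (n_log - 1)
    return [0.0] + [10.0 ** (low_exp + i * step) for i in range(n_log)]


def best_parameter(grid, score_fn):
    """Return the (parameter, score) pair with the highest score over the grid."""
    scored = [(g, score_fn(g)) for g in grid]
    return max(scored, key=lambda pair: pair[1])


def toy_score(gamma):
    """Placeholder for 'run proportional cICA at gamma, return silhouette'."""
    return 1.0 / (1.0 + (gamma - 1.0) ** 2)


gammas = log_spaced_grid(low_exp=-2.0, high_exp=6.0, n_values=100)
best_gamma, best_score = best_parameter(gammas, toy_score)
print(len(gammas), gammas[1], gammas[-1])
print(best_gamma, best_score)
```

A logarithmic grid spends most of its points on small values of $\gamma$, which suits a sweep whose best values turn out to be well below 1.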

For cPCA, we plot the first two cPCA components. As above, we run cPCA for 100 values of $\alpha$ between 0 and 1000. The highest silhouette score is 0.546, obtained when $\alpha = 3.7$. The plot is Figure 4(c).

We run PCPCA for 100 evenly spaced values of $\gamma'$ between 0 and $0.9$, in accordance with the setup in [LJE20]. The best silhouette score for the plot of the first two probabilistic contrastive principal components is 0.009, obtained when $\gamma' = 0.9$. If we normalize the probabilistic contrastive principal components before calculating the silhouette score, the silhouette score obtained is 0.386. The plot is Figure 4(d).
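The effect of normalizing components before scoring can be illustrated with a small self-contained example. The text does not specify the normalization, so rescaling each component to zero mean and unit variance is an assumption here. The toy points and labels are made up, and `silhouette_score` is a plain re-implementation of the standard mean silhouette coefficient with Euclidean distance.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    Assumes at least two clusters; points in singleton clusters score 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def standardize(points):
    """Rescale each coordinate to zero mean and unit variance (assumed normalization)."""
    dims = range(len(points[0]))
    n = len(points)
    means = [sum(p[d] for p in points) / n for d in dims]
    stds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) for d in dims]
    return [tuple((p[d] - means[d]) / stds[d] for d in dims) for p in points]


# Toy embedding: the clusters differ along the second axis, whose scale is small.
points = [(0.0, 0.00), (10.0, 0.01), (5.0, 0.02), (1.0, 1.00), (9.0, 1.01), (4.0, 1.02)]
labels = [0, 0, 0, 1, 1, 1]

raw = silhouette_score(points, labels)
scaled = silhouette_score(standardize(points), labels)
print(raw, scaled)
```

In this toy case the first component has a much larger scale but carries no cluster information, so the raw score is dominated by it. After rescaling, the separation along the second component becomes visible to the score.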

A.5. Transplant gene expression data

There are 7525 pre-transplant patients and 4874 post-transplant patients in the foreground dataset. The background dataset consists of 4457 healthy patients. Each sample contains gene expression measurements of bone marrow mononuclear cells. We preprocess the data by log-transforming and subsetting to the 500 most variable genes, in accordance with previous analyses on these data [ZTB+17, AZBZ18, LJE20].

For the hyperparameters of cICA and HTD, we choose the number of components to be 15, which explains $54.5\%$ of the variance. We then choose $r, \ell$ for cICA and $\ell$ for proportional cICA. We order the eigenvalues of $\mathrm{Mat}(\kappa_4(\mathbf{y}))$ and $\mathrm{Mat}(\kappa_4(\mathbf{x}))$ according to their absolute values and plot parts of the ordered eigenvalues in Figure 8. We choose $r = 53$ and $r + \ell = 116$.

We fix the random seed to be 0 for cICA. The visualization for the cICA algorithm is in Figure 5(a) and the silhouette score is 0.344.

For proportional cICA, we run the algorithm for 100 log-evenly spaced values of $\gamma$ between 0 and $10^6$. The highest silhouette score is 0.402, obtained when $\gamma = 0.50$. The plot is Figure 5(b).

Figure 8. Absolute values of eigenvalues of $\mathrm{Mat}(\kappa_4(\mathbf{y}))$ (left) and $\mathrm{Mat}(\kappa_4(\mathbf{x}))$ (right).

For cPCA, we plot the first two cPCA components. As above, we run cPCA using 100 values of $\alpha$ between 0 and 1000, the default values from [AZBZ17]. The highest silhouette score is 0.457, obtained when $\alpha = 3.5$. The plot is Figure 5(c).

We run PCPCA for 100 evenly spaced values of $\gamma'$ between 0 and $0.9 \cdot \frac{12399}{4457}$, in accordance with [LJE20]. The numbers 12399 and 4457 are the sample sizes of the foreground and background datasets, respectively. In accordance with the experiment in [AZBZ17], we run PCPCA with 4 components. The best silhouette score over all values of $\gamma'$ and all pairs of probabilistic contrastive principal components is 0.164, obtained when $\gamma' = 0.41$ using the third and fourth components. If we normalize the probabilistic contrastive principal components and then calculate the silhouette score, the score obtained is 0.184. The plot is Figure 5(d).