The $G$ -invariant graph Laplacian

Eitan Rosen Department of Applied Mathematics, Tel Aviv University Paulina Hoyos Department of Mathematics, The University of Texas at Austin Xiuyuan Cheng Department of Mathematics, Duke University Joe Kileel Department of Mathematics, The University of Texas at Austin Yoel Shkolnisky Department of Applied Mathematics, Tel Aviv University

Abstract

Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group $G$ . We propose to construct the graph Laplacian by incorporating the distances between all the pairs of points generated by the action of $G$ on the data set. We deem the latter construction the “ $G$ -invariant Graph Laplacian” ( $G$ -GL). We show that the $G$ -GL converges to the Laplace-Beltrami operator on the data manifold, while enjoying a significantly improved convergence rate compared to the standard graph Laplacian which only utilizes the distances between the points in the given data set. Furthermore, we show that the $G$ -GL admits a set of eigenfunctions that have the form of certain products between the group elements and eigenvectors of certain matrices, which can be estimated from the data efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group $SU(2)$ .

1 Introduction

A popular modeling assumption in data analysis is that the observed data lie on a low-dimensional manifold $\mathcal{M}$ that is embedded in a high-dimensional Euclidean space. When $\mathcal{M}$ is a linear subspace, it can be identified by using principal component analysis (PCA). However, most often $\mathcal{M}$ is nonlinear. A leading approach for analyzing data with a nonlinear manifold structure is to encode the data by using a graph, whose vertices are the data points, and whose edge weights encode the similarities between pairs of points. These similarities can be used to form a matrix known as the graph Laplacian, and its eigenvectors and eigenvalues are used for tasks such as dimensionality reduction, clustering, and denoising ([28, 2, 43]). While the term graph Laplacian has been given different definitions in different contexts [12, 22], in this paper we adopt the definition and notation in [3] and [28].

Formally, let $\left\{x_{1},\ldots,x_{N}\right\}$ be a set of points that reside on a compact and smooth $d$ -dimensional manifold $\mathcal{M}$ embedded in $\mathbb{C}^{n}$ . We form the matrix $W\in\mathbb{R}^{N\times N}$ with $W_{ij}=K(x_{i},x_{j})$ , where $K$ is a positive semi-definite kernel function. The graph Laplacian is then defined as the matrix $L\in\mathbb{R}^{N\times N}$ given by

L=D-W,\quad D_{ii}=\sum_{j=1}^{N}W_{ij},

(1.1)

where $D$ is the $N\times N$ diagonal matrix whose $i$ -th diagonal element is $D_{ii}$ in (1.1). Various choices for the kernel $K$ have been utilized in the literature [3, 41]. In this work, we make the popular choice of $K$ being the Gaussian kernel function, due to its favorable analytical properties. In this case,

W_{ij}=K(x_{i},x_{j})=e^{-\left\lVert x_{i}-x_{j}\right\rVert^{2}/\epsilon},\quad i,j\in\left\{1,\ldots,N\right\},

(1.2)

where $\epsilon$ is a bandwidth to be determined from the data.

A particularly important matrix related to $L$ of (1.1) is the random-walk normalized graph Laplacian, defined as

\tilde{L}=D^{-1}L=I-P,\quad P=D^{-1}W.

(1.3)
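As a concrete illustration of (1.1)-(1.3), the following minimal sketch builds the Gaussian-kernel matrices with NumPy. The function name `graph_laplacians`, the sample on the unit circle, and the bandwidth value are illustrative assumptions, not part of the construction above.

```python
import numpy as np

def graph_laplacians(X, eps):
    """Return (L, L_rw, P) for points X (N x n, real or complex) and bandwidth eps."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(np.abs(diff) ** 2, axis=-1)
    W = np.exp(-sq_dists / eps)        # Gaussian kernel, eq. (1.2)
    d = W.sum(axis=1)                  # diagonal of D, eq. (1.1)
    L = np.diag(d) - W                 # graph Laplacian, eq. (1.1)
    P = W / d[:, None]                 # P = D^{-1} W, eq. (1.3)
    L_rw = np.eye(len(X)) - P          # random-walk normalized Laplacian, eq. (1.3)
    return L, L_rw, P

# Hypothetical test data: 200 points drawn uniformly from the unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
L, L_rw, P = graph_laplacians(X, eps=0.1)
```

Each row of `P` sums to one, so `L_rw` annihilates constant vectors, which is the discrete counterpart of the Laplace-Beltrami operator annihilating constant functions.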

The matrix $P$ is row-stochastic, and thus, may be viewed as the transition probability matrix of a random walk over the data points (which gives $\tilde{L}$ its name). The latter view was adopted in the seminal work [28], where the eigenvectors of $P$ (which are identical to those of $\tilde{L}$ ) and its eigenvalues are used to construct “diffusion maps”, a successful machine learning framework for dimensionality reduction and clustering of manifold data. Furthermore, in [4] it was shown that if the data points are sampled uniformly from a manifold $\mathcal{M}$ , then $\tilde{L}$ converges to the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ on $\mathcal{M}$ as $\epsilon\rightarrow 0$ and $N\rightarrow\infty$ . Formally, it was shown that for a sufficiently smooth $f$ , with high probability

\frac{4}{\epsilon}\sum_{j=1}^{N}\tilde{L}_{ij}f(x_{j})=\Delta_{\mathcal{M}}f(x_{i})+O\left(\frac{1}{N^{1/2}\epsilon^{1/2+d/4}}\right)+O(\epsilon).

(1.4)

The latter result has important theoretical and practical consequences. First, we observe that the convergence rate of the graph Laplacian to $\Delta_{\mathcal{M}}$ depends on the intrinsic dimension $d$ of $\mathcal{M}$ and not on the ambient high dimension $n$ of the data points, mitigating the “curse of dimensionality” [8]. Second, it is known that the eigenfunctions of $\Delta_{\mathcal{M}}$ provide a basis for the space $L^{2}(\mathcal{M})$ of square-integrable functions on $\mathcal{M}$ [36]. For example, when $\mathcal{M}$ is the circle $S^{1}$ , the eigenfunctions of $\Delta_{S^{1}}$ are given by the Fourier modes $\left\{e^{im\theta}\right\}$ . Recent results [9] show that the eigenvectors of $\tilde{L}$ converge to the eigenfunctions of $\Delta_{\mathcal{M}}$ . This implies that the eigenvectors of the graph Laplacian constructed from a data set sampled from $S^{1}$ are discrete approximations to the Fourier modes, giving rise to classical discrete Fourier analysis. Analogously, the eigenvectors of the graph Laplacian constructed by using a sample from a general compact manifold $\mathcal{M}$ can be employed for a data-driven discrete Fourier analysis on $\mathcal{M}$ [43].
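The circle example can be checked numerically. The sketch below assumes equispaced samples on $S^{1}$ (a simplification of uniform random sampling, under which $P$ is a symmetric circulant matrix) and an arbitrary bandwidth; it verifies that the two eigenvectors of $P$ following the constant one span the same space as the sampled Fourier modes $\cos\theta$ and $\sin\theta$.

```python
import numpy as np

N, eps = 64, 0.05                                   # illustrative values
theta = 2.0 * np.pi * np.arange(N) / N              # equispaced angles (assumption)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / eps)
P = W / W.sum(axis=1, keepdims=True)
P = 0.5 * (P + P.T)                                 # remove rounding asymmetry

evals, evecs = np.linalg.eigh(P)
order = np.argsort(evals)[::-1]                     # sort eigenvalues in descending order
evals, evecs = evals[order], evecs[:, order]
basis = evecs[:, 1:3]                               # eigenspace after the constant vector

def residual(v):
    """Distance from the normalized vector v to the span of `basis`."""
    v = v / np.linalg.norm(v)
    return np.linalg.norm(v - basis @ (basis.T @ v))

res_cos = residual(np.cos(theta))
res_sin = residual(np.sin(theta))
```

Both residuals vanish up to rounding, so in this idealized setting the leading nontrivial eigenvectors are exactly the first discrete Fourier modes.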

In various scenarios, the data set under consideration is closed under the action of a group, namely, there is a known group $G$ such that if $x_{i}$ is a point in our data set, then for each $A\in G$ the point $A\cdot x_{i}$ resulting from the action of $A$ on $x_{i}$ is a valid data point (which is not necessarily in the data set, but may be added to it). Such data sets are called “ $G$ -invariant”. For example, in electron-microscopy imaging, a method to determine the 3D structure of a molecule from its 2D images acquired by an electron microscope [20], all the images lie on a manifold of dimension 3 (diffeomorphic to the 3D rotations group $SO(3)$ ). The planar rotation of any such image is a valid image that may have been acquired by the microscope. Thus, the manifold of images is closed under the action of the rotations group $G=SO(2)$ .

In [38], it was shown how to construct the graph Laplacian from all given images and all their infinitely many in-plane rotations. This construction is deemed “the steerable graph Laplacian”. A key result of [38] is that the steerable graph Laplacian converges to the Laplace-Beltrami operator on $\mathcal{M}$ (in this case an $SO(2)$ -invariant compact manifold) faster than the graph Laplacian (1.3). Specifically, it was shown that the steerable graph Laplacian approximates $\Delta_{\mathcal{M}}$ with an error that is given asymptotically by

O\left(\frac{1}{N^{1/2}\epsilon^{1/2+(d-1)/4}}\right)+O(\epsilon).

(1.5)

The latter error converges to zero at a rate that depends on $d-1$ , and so converges to zero faster than the corresponding error term (1.4) of the standard graph Laplacian (1.3), whose convergence rate depends on $d$ . This improved convergence rate is attributed to the following two facts. The first is that all infinitely many in-plane rotations of each given image are known. The second is that the action of the rotations group on the image manifold $\mathcal{M}$ accounts for one dimension of $\mathcal{M}$ (since planar rotations are parametrized by a single angle in $[0,2\pi)$ ). Combining these two facts implies that the error depends only on $d-1$ dimensions. Furthermore, it was shown in [38] that the eigenfunctions of the steerable graph Laplacian are tensor products between certain $N$ -dimensional vectors and complex exponentials. This special form of the eigenfunctions gives rise to efficient algorithms for their computation. These eigenfunctions are used in [38] for filtering noisy functions on $\mathcal{M}$ using a Fourier-like scheme, and are shown to result in an improved error bound compared to the bound achieved by employing the eigenvectors of the standard graph Laplacian [43].

This paper is the first part of a two-part work presenting a graph Laplacian based framework for the analysis of $G$ -invariant data sets. In this paper, we extend the results of [38] (which focuses on image manifolds closed under planar rotations) to the setting where the given data points lie on an arbitrary compact manifold $\mathcal{M}$ of dimension $d$ , closed under the action of an arbitrary compact unitary matrix Lie group $G$ . An example of such a data set is a collection of subtomograms (volumes) in cryo-electron tomography [25, 15], which due to the experimental setup are arbitrarily rotated in space, that is, $G=SO(3)$ . The results in the current paper also lay the foundations for Part II [35], where we develop low-dimensional embeddings of the $G$ -invariant data (which were not proposed in [38]) of two types. The first type is a $G$ -invariant embedding, which means that any two points which are related by the action of an element of $G$ are embedded into the same point. In the context of machine learning, this embedding may be used to organize the data into clusters where the points in each cluster are related by the action of a group element (for example, images which are rotations of one another). The second type is a $G$ -equivariant embedding, which means that the embeddings of two points which are related by the action of an element of $G$ are themselves related by the action of the same element. Such embeddings may be applied, for instance, to align images which are rotations of one another.

The contributions of this paper are as follows. First, we construct the $G$ -invariant graph Laplacian ( $G$ -GL), which is conceptually the standard graph Laplacian (1.3) constructed using a data set consisting of the given data points as well as all (infinitely many) points generated by applying $G$ to the given points. Second, we show that if $d_{G}$ is the dimension of $G$ , then the $G$ -GL approximates the Laplace-Beltrami operator on the manifold with an error given asymptotically by

O\left(\frac{1}{N^{1/2}\epsilon^{1/2+(d-d_{G})/4}}\right)+O(\epsilon),

(1.6)

analogously to (1.4) and (1.5). The result (1.6) is of great practical importance, as the improved convergence rate implies that significantly less data is required in order to achieve a prescribed accuracy, compared to the standard graph Laplacian. Third, we derive the eigenfunctions of the $G$ -invariant graph Laplacian and show that they admit the form of products between certain vectors and the elements of the irreducible unitary representations of $G$ . Furthermore, we show that this form of the eigenfunctions enables their efficient computation while avoiding explicitly augmenting the input data (that is, by adding the points $A\cdot x$ for every point $x$ in the data set, and all $A\in G$ ). We then demonstrate the utility of these eigenfunctions in filtering a noisy data set sampled from the four-dimensional unit sphere. We comment that different proofs for some of the theoretical results in this paper can be found in [24]. The proof strategies in both papers differ in that here we explicitly construct a $G$ -invariant local parametrization of the data manifold, whereas [24] uses a less concrete approach by passing to the abstract quotient manifold. The advantage of the approach taken here is that while [24] uses advanced machinery from fiber bundle theory, here we employ mostly basic instruments from manifold calculus in Euclidean spaces, which is accessible to a wider audience.
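To make the data-saving claim concrete, the following back-of-the-envelope calculation balances the two error terms in (1.4) and (1.6). It is only a heuristic sketch: it assumes the hidden constants are comparable in both bounds, and writes $d'$ for the effective dimension ( $d'=d$ for the standard graph Laplacian and $d'=d-d_{G}$ for the $G$ -GL).

```latex
% Balance the variance and bias terms:
\frac{1}{N^{1/2}\epsilon^{1/2+d'/4}} \asymp \epsilon
\;\Longrightarrow\;
\epsilon \asymp N^{-2/(6+d')},
\qquad
\text{total error} \asymp N^{-2/(6+d')}.
% Number of samples needed for a target error \delta:
N(\delta) \asymp \delta^{-(6+d')/2},
\qquad
\frac{N_{\mathrm{GL}}(\delta)}{N_{G\text{-GL}}(\delta)} \asymp \delta^{-d_{G}/2}.
```

For instance, under these assumptions a three-dimensional group such as $SU(2)$ or $SO(3)$ and a target error $\delta=0.1$ give a reduction factor of roughly $10^{3/2}\approx 32$ in the number of samples.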

This paper is organized as follows. In Section 2, we review some related work on group invariance and compare it with our approach. In Section 3, we discuss the structure induced on the data manifold by the group action, and introduce some basic machinery from representation theory used in this work. In Section 4, we introduce the $G$ -invariant graph Laplacian and present its key properties. In Section 5, we demonstrate how to use the eigenfunctions of the $G$ -invariant graph Laplacian to filter noisy data sets. In Section 6 we describe the details of the numerical computation of the $G$ -GL, and discuss computational complexity. Lastly, in Section 7, we summarize our results and discuss future work.

2 Related work

Other works dealing with group invariance typically focus on rotation invariance, especially in image processing algorithms [16, 54, 26, 41, 42, 53]. There are four main approaches in the literature towards rotation invariance. The first approach is based on the steerable PCA [52, 29], which computes the PCA of a set of images and all their infinitely many rotations, namely, finds the linear subspace which best spans a set of images and all their rotations. In a sense, our work is a generalization of this approach to nonlinear manifolds and to general compact matrix Lie groups (and not just rotations). The second approach towards rotation invariance is defining a rotationally-invariant distance for measuring pairwise similarities and constructing graph Laplacians using this distance [41]. Unfortunately, it is often not obvious what invariant distance is most appropriate for the task at hand, and how to compute it efficiently. Furthermore, in general, the limiting operator resulting from such a construction is either unknown, or is not the Laplace-Beltrami operator [27], in which case, its properties are not well understood. In our approach, on the other hand, we consider not only the distance between best matching rotations of image pairs (nor any other type of a rotationally-invariant distance), but rather the standard (Euclidean) distance between all rotations of all pairs of images. We show that all these pointwise Euclidean distances can be computed efficiently by using FFT-type algorithms (when available), and that the resulting operator converges to the Laplace-Beltrami operator on the data manifold. This enables us to preserve the geometry of the underlying manifold (in contrast to various rotation-invariant distances) while making the resulting operator (the $G$ -invariant graph Laplacian) invariant to the action of the group on our data set. Moreover, our approach is applicable not only to rotations, but rather to any compact matrix Lie group $G$ . 
The third approach to group invariance is based on CNNs [45, 48, 49] that produce group equivariant features (for low dimensional rotation groups) by convolving the data with steerable basis functions in each layer. However, this approach lacks solid theory, and in particular, provides no error bounds and no means for analyzing the properties of the resulting tools. Furthermore, unlike CNNs, our approach is applied directly to unlabeled data. The fourth approach, also commonly based on CNNs, is to augment a given data set $X=\left\{x_{1},\ldots,x_{N}\right\}$ , by adding to it all the points of the form $A\cdot x_{i}$ for some finite set of elements $A\in G$ [13, 14, 18, 31, 39]. This approach suffers from several shortcomings. First, since we have chosen a finite set of elements $A_{1},\ldots,A_{K}\in G$ , the augmented data set is only approximately invariant to the action of $G$ . Second, the augmented data set is larger than the original data set by a factor of $K$ , which poses computational challenges. Third, if the data are noisy, this approach introduces correlations in the noise of different data points. In contrast, in our approach, we derive a numerically efficient construction of a $G$ -invariant operator that is equivalent to constructing the standard graph Laplacian in (1.1) from all the (infinitely many) points generated by the action of $G$ on the points in $X$ , without explicitly augmenting $X$ .

We note the work [51], which, although it does not deal with group invariance, makes an important contribution by deriving an algorithm for manifold factorization of product manifolds. We can consider the algorithm in [51] as a form of invariant learning, as the goal of the algorithm there is to learn submanifolds that are independent of the other submanifolds comprising the product, and thus, can be used to learn the submanifolds which are generated by the action of the group, and factor them out. However, as we later explain, in our setting, any sufficiently small neighborhood of the data manifold $\mathcal{M}$ is isomorphic to a product of manifolds, but $\mathcal{M}$ itself need not be a product of manifolds. In that sense, the setting in [51] is much more restrictive.

Finally, special attention should be given to [17]. Similarly to the works mentioned above, this work also defines a group invariant distance by looking at a single group element that best “aligns” a given pair of points. In that sense, its approach is fundamentally different from what we propose here. Yet, this is the first work we know of that addresses the group invariance problem for arbitrary Lie groups.

3 Preliminaries

3.1 Manifolds under actions of matrix Lie groups

In this section, we describe our model for data sets closed under the action of a matrix Lie group. In particular, we define matrix Lie groups and their action on the data set.

Definition 1.

A matrix Lie group is a smooth (that is, differentiable) manifold $G$ , whose points form a group of matrices.

For example, consider the group $SU(2)$ of $2\times 2$ unitary matrices with determinant 1. Each matrix $A\in SU(2)$ can be written using Euler angles as

A(\alpha,\beta,\gamma)=\begin{pmatrix}\cos{\frac{\beta}{2}}e^{i(\alpha+\gamma)/2}&i\sin{\frac{\beta}{2}}e^{i(\alpha-\gamma)/2}\\ i\sin{\frac{\beta}{2}}e^{-i(\alpha-\gamma)/2}&\cos{\frac{\beta}{2}}e^{-i(\alpha+\gamma)/2}\end{pmatrix},

(3.1)

where $\alpha\in[0,2\pi),\beta\in[0,\pi)$ and $\gamma\in[-2\pi,2\pi)$ . Using the fact that the squared moduli of the entries in each row of $A(\alpha,\beta,\gamma)$ sum to one, it is easily inferred that $SU(2)$ is diffeomorphic to the three-dimensional unit sphere $S^{3}$ . Other important examples for matrix Lie groups include the group of three-dimensional rotation matrices $SO(3)$ , and the $n$ -dimensional torus $\mathbb{T}^{n}$ , which is simply the group of diagonal $n\times n$ unitary matrices.
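The Euler-angle parametrization can be sanity-checked numerically. The sketch below assumes the convention in which both off-diagonal entries carry a factor $i\sin(\beta/2)$ (chosen here so that the matrix is unitary with unit determinant); the helper names `su2` and `to_s3` are hypothetical.

```python
import numpy as np

def su2(alpha, beta, gamma):
    """Euler-angle parametrization of SU(2) (convention assumed for this sketch)."""
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    return np.array([
        [c * np.exp(1j * (alpha + gamma) / 2), 1j * s * np.exp(1j * (alpha - gamma) / 2)],
        [1j * s * np.exp(-1j * (alpha - gamma) / 2), c * np.exp(-1j * (alpha + gamma) / 2)],
    ])

def to_s3(A):
    """Map A = [[a, b], [-conj(b), conj(a)]] to (Re a, Im a, Re b, Im b) in R^4."""
    a, b = A[0, 0], A[0, 1]
    return np.array([a.real, a.imag, b.real, b.imag])

A = su2(0.4, 1.3, -2.1)   # arbitrary test angles
B = su2(5.0, 0.2, 3.3)
AB = A @ B                # the product of two group elements stays in SU(2)
```

The first row $(a,b)$ of each matrix determines the whole matrix and satisfies $|a|^{2}+|b|^{2}=1$ , which is the identification with $S^{3}$ used in the text.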

Definition 2.

The action of a group $G$ of $n\times n$ matrices on a subset $S\subseteq\mathbb{C}^{n}$ is the map $\text{'}\cdot\text{'}:G\times S\rightarrow S$ , defined for each $A\in G$ and $x\in S$ by matrix multiplication on the left $A\cdot x$ . We say that a set $S$ is closed under the action of a group $G$ , or simply $G$ -invariant, if $A\cdot x\in S$ for all $x\in S$ and $A\in G$ .

In this work, we assume that we are given a data set $X=\left\{x_{1},\ldots,x_{N}\right\}$ sampled from a smooth, compact, and $G$ -invariant manifold $\mathcal{M}$ without boundary, embedded in $\mathbb{C}^{n}$ , where $G$ is a unitary matrix Lie group. In particular, the $G$ -invariance implies that $A\cdot x\in\mathcal{M}$ for all $x\in\mathcal{M}$ and $A\in G$ . An additional useful characterization of $G$ -invariant manifolds is derived from the following definition.

Definition 3.

For a fixed point $x\in\mathcal{M}$ , the orbit generated by the action of $G$ on $x$ is defined as the set

G\cdot x\coloneq\left\{A\cdot x\;:\;A\in G\right\}.

(3.2)

Thus, a manifold $\mathcal{M}$ is $G$ -invariant if $G\cdot x\subset\mathcal{M}$ for all $x\in\mathcal{M}$ , that is, $\mathcal{M}$ contains all the orbits of the action of $G$ on its points. In particular, this implies that the set

G\cdot X\coloneq\left\{A\cdot x_{i}\;:\;A\in G,\;x_{i}\in X\right\}=\bigcup_{i% =1}^{N}G\cdot x_{i},

(3.3)

of points generated by the action of $G$ on the data set $X$ is a $G$ -invariant subset in $\mathcal{M}$ . In Section 4, we construct the central object in our framework, namely, the $G$ -invariant graph Laplacian ( $G$ -GL), which is a graph Laplacian constructed by using not only the points in $X$ but rather all the points in $G\cdot X$ .

Finally, we will assume that the Lie group $G$ is also compact. In the rest of this section, we give a short introduction to the theory of harmonic analysis on compact Lie groups, which is essential for the construction of the $G$ -GL.

3.2 Haar integration

The theory of harmonic analysis on matrix Lie groups requires integrating functions over these groups. This is known as “Haar integration” since it is performed with respect to the Haar measure, which we now define.

Definition 4.

A Haar measure over a Lie group $G$ is a finite valued, non-negative function $\eta(\cdot)$ over all (Borel) subsets $S\subseteq G$ , such that

\eta(A\cdot S)=\eta(S)\quad\text{ for all}\quad A\in G.

(3.4)

By Haar’s theorem (see, e.g., [19]), for every compact matrix Lie group there exists a Haar measure which is unique up to a multiplicative constant. In this work, we choose (without loss of generality) the unique measure $\eta$ such that

\eta(G)=1,

(3.5)

and henceforth refer to this $\eta$ as “the Haar measure over $G$ ”. Essentially, the function $\eta(\cdot)$ measures the volume of subsets of the manifold $G$ . Specifically, property (3.5) makes $\eta(\cdot)$ a probability measure over $G$ . Furthermore, property (3.4), known as “left invariance”, means that multiplication by a matrix $A$ from the left maps the set $S\subseteq G$ to another subset of $G$ of the same measure, implying that $\eta(\cdot)$ is uniform over $G$ . In the context of integration, property (3.4) implies that the Haar integral is left-invariant, namely, for any $B\in G$ we have that

\int_{G}f(BA)d\eta(A)=\int_{G}f(C)d\eta(B^{*}C)=\int_{G}f(C)d\eta(C)=\int_{G}f(A)d\eta(A),

(3.6)

where we substituted $C=BA$ in the first equality, and used (3.4) in the second equality.

As an example of a Haar integration, the integral of a function $f$ over $G=SU(2)$ can be computed in terms of Euler angles by (see [11])

\int_{SU(2)}f(A)d\eta(A)=\frac{1}{16\pi^{2}}\int_{0}^{2\pi}\int_{0}^{\pi}\int_{-2\pi}^{2\pi}f(A(\alpha,\beta,\gamma))\sin\beta d\alpha d\beta d\gamma,

(3.7)

where $A(\alpha,\beta,\gamma)$ is defined in (3.1). In this case, the volume element $d\eta(A)$ induced by the Haar measure is just $d\alpha d\beta d\gamma$ multiplied by $\frac{\sin\beta}{16\pi^{2}}$ , which is the absolute value of the Jacobian determinant of the parametrization of $SU(2)$ by Euler angles.
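The formula (3.7) can be turned into a simple quadrature rule. The sketch below is one possible discretization (uniform grids in $\alpha$ and $\gamma$ , Gauss-Legendre nodes in $\beta$ ); the grid sizes, the helper names, and the off-diagonal convention of `su2` are assumptions of this example. It checks the normalization (3.5) and the left invariance (3.6) on the test function $f(A)=|A_{11}|^{2}$ .

```python
import numpy as np

def su2(alpha, beta, gamma):
    """Euler-angle parametrization of SU(2) (convention assumed for this sketch)."""
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    return np.array([
        [c * np.exp(1j * (alpha + gamma) / 2), 1j * s * np.exp(1j * (alpha - gamma) / 2)],
        [1j * s * np.exp(-1j * (alpha - gamma) / 2), c * np.exp(-1j * (alpha + gamma) / 2)],
    ])

def haar_integral(f, n_alpha=16, n_beta=16, n_gamma=32):
    """Approximate the Haar integral (3.7) of f over SU(2)."""
    alphas = 2 * np.pi * np.arange(n_alpha) / n_alpha               # grid on [0, 2*pi)
    gammas = -2 * np.pi + 4 * np.pi * np.arange(n_gamma) / n_gamma  # grid on [-2*pi, 2*pi)
    nodes, weights = np.polynomial.legendre.leggauss(n_beta)
    betas = 0.5 * np.pi * (nodes + 1)                               # map [-1, 1] to [0, pi]
    beta_weights = 0.5 * np.pi * weights
    total = 0.0
    for a in alphas:
        for b, wb in zip(betas, beta_weights):
            for g in gammas:
                total += f(su2(a, b, g)) * np.sin(b) * wb
    cell = (2 * np.pi / n_alpha) * (4 * np.pi / n_gamma)
    return total * cell / (16 * np.pi ** 2)

volume = haar_integral(lambda A: 1.0)                               # eta(G), eq. (3.5)
f = lambda A: abs(A[0, 0]) ** 2
val = haar_integral(f)
B = su2(0.3, 1.1, -0.7)                                             # arbitrary fixed group element
val_shifted = haar_integral(lambda A: f(B @ A))                     # left-translated integrand
```

Here `val` and `val_shifted` agree, as (3.6) predicts, and both equal $1/2$ , consistent with a direct evaluation of $\frac{1}{2}\int_{0}^{\pi}\cos^{2}(\beta/2)\sin\beta\,d\beta$ .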

3.3 Harmonic analysis on compact matrix Lie groups

The framework we develop below in Section 4 employs series expansions of functions over compact matrix Lie groups. The expansion of a function $f:G\rightarrow\mathbb{C}$ is obtained in terms of the elements of certain matrix-valued functions, known as the irreducible unitary representations of $G$ , which we now define.

Definition 5.

An $n$ -dimensional unitary representation of a group $G$ is a unitary matrix-valued function $U(\cdot)$ from $G$ to the group U(n) of $n\times n$ unitary matrices, such that

U(A\cdot B)=U(A)\cdot U(B),

(3.8)

and the identity element in $G$ is mapped to the identity element in U(n). The homomorphism property (3.8) implies that the set $\left\{U(A)\right\}_{A\in G}$ is also a matrix Lie group. In particular, the latter implies that each element of the matrix valued function $U(\cdot)$ is a smooth function over $G$ .

Definition 6.

A group representation $U(\cdot)$ is called reducible if there exists a unitary matrix $P$ such that $P\cdot U(A)\cdot P^{-1}$ is block diagonal for all $A\in G$ . A group representation is called irreducible if it is not reducible. We abbreviate irreducible unitary representation as IUR.

By the Peter-Weyl theorem [7], there exists a countable family $\left\{U^{\ell}\right\}$ of finite dimensional IURs of $G$ , such that the collection $\left\{U^{\ell}_{ij}(\cdot)\right\}$ of all the elements of all these IURs forms an orthogonal basis for $L^{2}(G)$ . This implies that any smooth function $f:G\rightarrow\mathbb{C}$ can be expanded in a series of the elements of the IURs of $G$ . For example, the IURs of $SU(2)$ in (3.1) are given by a sequence of matrices $\left\{U^{\ell}\right\}$ , where $\ell=0,1/2,1,3/2,\ldots$ , and $U^{\ell}(A)$ is a $(2\ell+1)\times(2\ell+1)$ dimensional matrix for each $A\in G$ (see e.g. [11]). In fact, the matrices in (3.1) correspond to the IUR of $SU(2)$ with $\ell=1/2$ .

The series expansion of a function $f:G\rightarrow\mathbb{C}$ is then given by

f(A)=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,n=1}^{d_{\ell}}\hat{f}^{\ell}_{mn}U_{mn}^{\ell}(A),\quad\hat{f}^{\ell}_{mn}=\int_{G}f(B)\overline{U^{\ell}_{mn}(B)}d\eta(B),

(3.9)

where $\mathcal{I}_{G}$ is a countable set that enumerates the IURs of $G$ , $d_{\ell}$ is the dimension of the $\ell$ -th IUR, and $\eta(\cdot)$ is the Haar measure on $G$ . The latter can also be written in the form

f(A)=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\text{trace}\left(\hat{f}^{\ell}\cdot U^{\ell}(A)\right),

(3.10)

where $\hat{f}^{\ell}$ is the $d_{\ell}\times d_{\ell}$ matrix given by

\hat{f}^{\ell}=\int_{G}f(A)\overline{U^{\ell}(A)}d\eta(A),

(3.11)

for all $\ell\in\mathcal{I}_{G}$ .

Remark 1.

The group $SO(2)$ of two-dimensional rotations is a one-dimensional matrix Lie group, whose IURs are given by the Fourier modes $\left\{e^{im\theta}\right\}_{m=-\infty}^{\infty}$ . Thus, the series expansion of a function on $SO(2)$ in terms of the IURs of $SO(2)$ is nothing but the classical Fourier series. In this sense, the expansion (3.10) can be viewed as generalized Fourier series over $G$ , with the Fourier modes replaced by the IURs $\left\{U^{\ell}\right\}$ , and with coefficients given by the matrices $\left\{\hat{f}^{\ell}\right\}$ of (3.11).
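For $G=SO(2)$ , where every IUR is one-dimensional ( $d_{\ell}=1$ ), the expansion (3.9) can be illustrated in a few lines. The sketch below approximates the Haar integral by a uniform average over $K$ angles; the function names, the grid size, and the test function are illustrative assumptions.

```python
import numpy as np

def so2_fourier_coeffs(f, max_m, K=256):
    """Coefficients f_hat_m = int f(theta) * conj(exp(i*m*theta)) d eta(theta), |m| <= max_m."""
    theta = 2 * np.pi * np.arange(K) / K
    vals = f(theta)
    ms = np.arange(-max_m, max_m + 1)
    # The Haar measure on SO(2) is d(theta) / (2*pi), approximated by a uniform average.
    coeffs = np.array([np.mean(vals * np.exp(-1j * m * theta)) for m in ms])
    return ms, coeffs

def so2_series(ms, coeffs, theta):
    """Evaluate the truncated series sum_m f_hat_m * exp(i*m*theta), i.e. (3.9) with d_l = 1."""
    theta = np.atleast_1d(theta)
    return np.sum(coeffs[None, :] * np.exp(1j * np.outer(theta, ms)), axis=1)

f = lambda t: 1.0 + 2.0 * np.cos(t) + 0.5 * np.sin(3 * t)   # band-limited test function
ms, coeffs = so2_fourier_coeffs(f, max_m=4)
test_theta = np.array([0.1, 1.7, 4.0])
recon = so2_series(ms, coeffs, test_theta)
```

Because the test function is band-limited, the truncated series reproduces it exactly (up to rounding) at arbitrary angles, not only at the grid points.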

4 The $G$ -invariant graph Laplacian

In this section, we construct the $G$ -invariant graph Laplacian ( $G$ -GL), a generalization of the standard graph Laplacian (1.1) for data sets sampled from a $G$ -invariant manifold $\mathcal{M}$ . We then compute the $G$ -GL’s eigendecomposition, and show that a proper normalization of the $G$ -GL converges to the Laplace-Beltrami operator on $\mathcal{M}$ significantly faster than (1.1).

Let $X=\left\{x_{1},\ldots,x_{N}\right\}$ be a data set sampled from a $G$ -invariant (see Definition 2) compact manifold $\mathcal{M}\subset\mathbb{C}^{n}$ . Our goal is to construct the graph Laplacian by using all the points in the $G$ -invariant set $G\cdot X$ in (3.3). As we will see shortly, our construction results in an operator (rather than a matrix) over a certain Hilbert space, which we now define.

Definition 7.

Given a data set $X=\left\{x_{1},\ldots,x_{N}\right\}$ , let $\Gamma$ be the set of pairs

\Gamma\coloneq\left\{1,\ldots,N\right\}\times G=\left\{(i,A)\;:\;i\in\left\{1,\ldots,N\right\},\;A\in G\right\},

(4.1)

where each pair $(i,A)$ corresponds to the point $A\cdot x_{i}\in G\cdot X$ . We define the Hilbert space $\mathcal{H}=L^{2}(\Gamma)$ as the space of functions of the form $f(i,A)=f_{i}(A)$ , where $f_{i}\in L^{2}(G)$ for all $i\in\left\{1,\ldots,N\right\}$ , endowed with the inner product

\left\langle f,g\right\rangle_{\mathcal{H}}=\sum_{i=1}^{N}\int_{G}f_{i}(A)\overline{g_{i}(A)}d\eta(A),

(4.2)

where $\eta(\cdot)$ is the Haar measure on $G$ .

Now, let $D$ be an $N\times N$ diagonal matrix. We define the action of $D$ on a function $f\in\mathcal{H}$ by

\left\{Df\right\}(i,A)=D_{ii}\cdot f_{i}(A),\quad A\in G,

(4.3)

where $D_{ii}$ is the $i$ ’th element on the diagonal of $D$ . Equipped with Definition 7 and (4.3), we are now ready to define the $G$ -GL.

Definition 8.

Let $W:\mathcal{H}\rightarrow\mathcal{H}$ be the operator acting on functions $f\in\mathcal{H}$ by

\left\{Wf\right\}(i,A)=\sum_{j=1}^{N}\int_{G}W_{ij}(A,B)f_{j}(B)d\eta(B),\quad W_{ij}(A,B)=e^{-\left\lVert A\cdot x_{i}-B\cdot x_{j}\right\rVert^{2}/\epsilon},

(4.4)

and let $D$ be the $N\times N$ diagonal matrix defined by

D=\operatorname{diag}\left(D_{11},\ldots,D_{NN}\right),\quad D_{ii}=\sum_{j=1}^{N}\int_{G}W_{ij}(I,C)d\eta(C),

(4.5)

where $I$ is the identity element in $G$ . The $G$ -invariant graph Laplacian ( $G$ -GL) is defined as the operator $L:\mathcal{H}\rightarrow\mathcal{H}$ given by

L=D-W.

(4.6)

Note that by (4.4), we have that $W_{ij}(A,B)=W_{ji}(B,A)$ for all $i,j\in\left\{1,\ldots,N\right\}$ and $A,B\in G$ , which implies that the operator $W$ is symmetric. Combining the latter with (4.6) implies the same for $L$ . The following result asserts that $L$ is a positive semi-definite operator.

Lemma 9.

The $G$ -GL admits the positive semi-definite quadratic form

\left\langle f,Lf\right\rangle_{\mathcal{H}}=\frac{1}{2}\sum_{i,j=1}^{N}\int_{G}\int_{G}W_{ij}(A,B)\left|f_{i}(A)-f_{j}(B)\right|^{2}d\eta(A)d\eta(B).

(4.7)

The proof of Lemma 9 is given in Appendix B. The form (4.7) is analogous to the quadratic form of the standard graph Laplacian [5]. Thus, this form is important in its own right since it can be used as a smoothness regularization term in various machine learning algorithms where the objective function is assumed to have been sampled over a compact $G$ -invariant manifold. This idea was first proposed in [5], and rigorously justified in [50]. Intuitively, the quantity $\left\langle f,Lf\right\rangle_{\mathcal{H}}$ puts large penalties on the differences $\left|f_{i}(A)-f_{j}(B)\right|$ when $W_{ij}(A,B)$ is large, that is, when the points $A\cdot x_{i}$ and $B\cdot x_{j}$ are close. Thus, the quantity $\left\langle f,Lf\right\rangle_{\mathcal{H}}$ can be viewed as imposing a notion of smoothness on functions over the domain $\Gamma$ in (4.1).
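The identity (4.7) can be checked on a discretized analog. The sketch below makes two assumptions that are not part of the construction above: the group is a hypothetical one-parameter group of diagonal unitary matrices $A(\theta)=\operatorname{diag}(e^{ik\theta})$ acting on $\mathbb{C}^{3}$ , and the Haar integral is replaced by an average over the cyclic subgroup of $K$ equispaced angles, for which the finite analog of (4.7) holds exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, K, eps = 4, 3, 12, 2.0                   # illustrative sizes and bandwidth
freqs = np.array([0, 1, 2])                    # hypothetical action: A(theta) = diag(exp(i*freqs*theta))
thetas = 2 * np.pi * np.arange(K) / K          # cyclic subgroup standing in for G
X = rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))

# Y[i, a] = A(theta_a) x_i, the discretized orbit of each data point.
Y = np.exp(1j * np.outer(thetas, freqs))[None, :, :] * X[:, None, :]

# W[i, a, j, b] = W_ij(A_a, A_b) as in (4.4).
diff = Y[:, :, None, None, :] - Y[None, None, :, :, :]
W = np.exp(-np.sum(np.abs(diff) ** 2, axis=-1) / eps)

# D_ii from (4.5); index a = 0 corresponds to the identity element.
D = W[:, 0, :, :].sum(axis=(1, 2)) / K

f = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
Wf = np.einsum('iajb,jb->ia', W, f) / K        # action of W, eq. (4.4)
Lf = D[:, None] * f - Wf                       # action of L = D - W, eq. (4.6)

lhs = np.sum(f * np.conj(Lf)) / K              # inner product <f, Lf> from (4.2)
rhs = 0.5 * np.sum(W * np.abs(f[:, :, None, None] - f[None, None, :, :]) ** 2) / K ** 2
```

The two sides agree up to rounding, and the right-hand side is a sum of non-negative terms, which is the discrete counterpart of positive semi-definiteness.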

Analogously to the results in [38, 40], below we show that the normalization

\tilde{L}=D^{-1}L=I-D^{-1}W,

(4.8)

of $L$ in (4.6) converges to the Laplace-Beltrami operator on $\mathcal{M}$ . While other useful normalizations of $L$ are possible, in the current work, we mainly focus on (4.8). Thus, we henceforth refer to (4.8) as the normalized $G$ -GL.

As mentioned above, unlike previous works [2, 41], our construction results in an operator over a Hilbert space rather than a matrix (compare with (1.1)). This is a direct consequence of the continuous nature of the set $\Gamma$ in (4.1), being a product between a discrete set and a Lie group $G$ , on account of $G$ being a smooth manifold by Definition 1. As we will see next, the continuity of $G$ also implies that the $G$ -GL admits a countably infinite basis of eigenfunctions for the space $\mathcal{H}$ (see Definition 7) instead of the finite set of eigenvectors of the graph Laplacian matrix in (1.1). In particular, the eigenfunctions of the $G$ -GL can be evaluated for any $A\in G$ .

4.1 Eigendecomposition of the $G$ -GL

We now derive the eigendecompositions of the $G$ -GL (4.6), and its normalized version (4.8). Let $\mathcal{M}\subset\mathbb{C}^{n}$ be a $G$ -invariant compact and smooth manifold, without a boundary, where $G$ is a compact Lie group of unitary $n\times n$ matrices. Let $X=\left\{x_{1},\ldots,x_{N}\right\}\subset\mathcal{M}$ be a data set sampled from $\mathcal{M}$ . By (4.4), and since $G$ is unitary, for each $i,j\in\left\{1,\ldots,N\right\}$ we have that

W_{ij}(A,B)=W_{ij}(I,A^{*}B),\quad A,B\in G.

(4.9)

That is, each function $W_{ij}$ in (4.4) only depends on the quotient $A^{*}B$ . Thus, by using (3.10), we can expand the function $W_{ij}(I,A^{*}B)\in\mathcal{H}$ in the Fourier series

W_{ij}(I,A^{*}B)=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\text{trace}\left(\hat{W}^{\ell}_{ij}U^{\ell}(A^{*}B)\right),

(4.10)

where by (3.11) and (3.6), $\hat{W}_{ij}^{\ell}$ is the $d_{\ell}\times d_{\ell}$ matrix given by

\hat{W}_{ij}^{\ell}=\int_{G}W_{ij}(I,A^{*}B)\overline{U^{\ell}(A^{*}B)}d\eta(B)=\int_{G}W_{ij}(I,A)\overline{U^{\ell}(A)}d\eta(A),

(4.11)

for each $\ell\in\mathcal{I}_{G}$ .

Clearly, the $G$ -GL is completely characterized by the set of matrices $\hat{W}^{\ell}_{ij}$ of (4.11), since for any $i$ and $j$ , the kernel function $W_{ij}(I,A^{*}B)$ can be recovered from them. The following theorem characterizes the eigendecomposition of $L$ of (4.6) in terms of certain products between the columns of the IURs $U^{\ell}$ , and the eigenvectors of the block matrices

\hat{W}^{\ell}=\begin{pmatrix}\hat{W}^{\ell}_{11}&\hat{W}^{\ell}_{12}&\cdots&\hat{W}^{\ell}_{1N}\\ \vdots&\ddots&&\vdots\\ \vdots&&\ddots&\vdots\\ \hat{W}^{\ell}_{N1}&\hat{W}^{\ell}_{N2}&\cdots&\hat{W}^{\ell}_{NN}\end{pmatrix},\quad\ell\in\mathcal{I}_{G},

(4.12)

of dimension $Nd_{\ell}\times Nd_{\ell}$ whose $ij$ -th block of size $d_{\ell}\times d_{\ell}$ is $\hat{W}^{\ell}_{ij}$ of (4.11). To derive the eigendecomposition, we introduce the following notation. For any vector $v\in\mathbb{C}^{Nd_{\ell}}$ and any $j\in\{1,\ldots,N\}$ , we denote by

e^{j}(v)=(v((j-1)d_{\ell}+1),\ldots,v(jd_{\ell}))\in\mathbb{C}^{d_{\ell}},

(4.13)

the elements $(j-1)\cdot d_{\ell}+1$ up to $j\cdot d_{\ell}$ of $v$ stacked in a $d_{\ell}$ -dimensional row vector.

Theorem 10.

For each $\ell\in\mathcal{I}_{G}$ , let $D^{\ell}$ be the $Nd_{\ell}\times Nd_{\ell}$ block-diagonal matrix whose $i$ -th block of size $d_{\ell}\times d_{\ell}$ on the diagonal is given by the product of the scalar $D_{ii}$ in (4.5) with the $d_{\ell}\times d_{\ell}$ identity matrix. Then, the $G$ -invariant graph Laplacian $L$ in (4.6) admits the following:

1.

A sequence of non-negative eigenvalues $\{\lambda_{1,\ell},\ldots,\lambda_{Nd_{\ell},\ell}\}_{\ell\in\mathcal{I}_{G}}$ , where $\lambda_{n,\ell}$ is the $n$ -th eigenvalue of the matrix $D^{\ell}-\hat{W}^{\ell}$ .

2.

A sequence $\{\Phi_{\ell,1,1},\ldots,\Phi_{\ell,d_{\ell},Nd_{\ell}}\}_{\ell\in\mathcal{I}_{G}}$ of eigenfunctions, which are orthogonal and complete in $\mathcal{H}$ and are given by

\Phi_{\ell,m,n}(i,A)=e^{i}(v_{n,\ell})\cdot U^{\ell}_{\cdot,m}(A^{*}),

(4.14)

where $v_{n,\ell}$ is the eigenvector of $D^{\ell}-\hat{W}^{\ell}$ which corresponds to its eigenvalue $\lambda_{n,\ell}$ . Furthermore, for each $n\in\{1,\ldots,Nd_{\ell}\}$ and $\ell\in\mathcal{I}_{G}$ , the eigenfunctions $\{\Phi_{\ell,1,n},\ldots,\Phi_{\ell,d_{\ell},n}\}$ correspond to the eigenvalue $\lambda_{n,\ell}$ of the $G$ -invariant graph Laplacian.
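To make the indexing in (4.13) and (4.14) concrete, the following sketch evaluates $\Phi_{\ell,m,n}(i,A)$ from an eigenvector and a representation matrix. It assumes that the eigenvector $v_{n,\ell}$ and the matrix $U^{\ell}(A^{*})$ are already available as NumPy arrays; the function names are hypothetical, and the indices $i$ and $m$ are 1-based as in the text.

```python
import numpy as np

def block(v, j, d_ell):
    """e^j(v) of (4.13): entries (j-1)*d_ell+1, ..., j*d_ell of v (1-based j)."""
    return v[(j - 1) * d_ell : j * d_ell]

def eigenfunction_value(v, i, U_A_star, m):
    """Phi_{ell,m,n}(i, A) of (4.14), given v = v_{n,ell} and U_A_star = U^ell(A^*)."""
    d_ell = U_A_star.shape[0]
    # Row block e^i(v) times the m-th column of U^ell(A^*), without conjugation.
    return block(v, i, d_ell) @ U_A_star[:, m - 1]

# Toy usage with N = 3 and d_ell = 2; at A = I the representation is the identity.
v = np.arange(1, 7).astype(complex)
value = eigenfunction_value(v, 2, np.eye(2, dtype=complex), 1)
```

With the identity matrix in place of $U^{\ell}(A^{*})$ , the value reduces to a single entry of the block $e^{i}(v)$ .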

The proof of Theorem 10 is given in Appendix C. A nearly identical theorem (Theorem 20) characterizing the eigendecomposition of the normalized $G$ -GL in (4.8) is given below in Appendix F. Theorem 20 states that for the normalized $G$ -GL we only need to replace the eigenvectors $\left\{v_{n,\ell}\right\}_{n,\ell}$ above with the eigenvectors $\left\{\tilde{v}_{n,\ell}\right\}_{n,\ell}$ of the sequence of matrices

S^{\ell}=I-(D^{\ell})^{-1}\hat{W}^{\ell},\quad\ell\in\mathcal{I}_{G},

(4.15)

with the only difference that the resulting eigenfunctions

\{\tilde{\Phi}_{\ell,1,1},\ldots,\tilde{\Phi}_{\ell,d_{\ell},Nd_{\ell}}\}_{\ell\in\mathcal{I}_{G}},

(4.16)

are no longer orthogonal due to the fact that the matrices in (4.15) are generally not Hermitian.

The form of the eigenfunctions in (4.14) is of practical importance for numerical computations, as it implies that the eigendecomposition of the $G$ -GL can be obtained by diagonalizing the sequence of matrices $\hat{W}^{\ell}$ of (4.12). Furthermore, for groups which are common in applications (e.g. $SO(3)$ ) all the elements of the Fourier matrices $\left\{\hat{W}^{\ell}\right\}$ can be computed efficiently by employing generalized FFT algorithms [11, 33]. We provide the details of such a computational procedure for the case $G=SU(2)$ in Appendix A below.
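The reduction to matrix eigenproblems can be illustrated on a toy case. The sketch below is not the $SU(2)$ procedure of Appendix A: it assumes the circle group $U(1)$ acting on $\mathbb{C}^{n}$ by $x\mapsto e^{i\theta}x$ , whose IURs are the one-dimensional characters $e^{i\ell\theta}$ , so that an ordinary FFT plays the role of the generalized FFT. It also assumes a Gaussian kernel $\exp(-\lVert x_{i}-A\cdot x_{j}\rVert^{2}/\epsilon)$ and that $D_{ii}$ in (4.5) equals $\sum_{j}\int_{G}W_{ij}(I,A)d\eta(A)$ . All function names are hypothetical.

```python
import numpy as np

def u1_fourier_blocks(X, eps, M=64):
    """W_hat[i, j, ell] ~ integral of W_ij(I, A) * conj(U^ell(A)) over U(1)."""
    theta = 2 * np.pi * np.arange(M) / M                           # uniform quadrature nodes
    rotated = np.exp(1j * theta)[None, :, None] * X[:, None, :]    # (N, M, n): e^{i theta_k} x_j
    diff = X[:, None, None, :] - rotated[None, :, :, :]            # (N, N, M, n)
    W = np.exp(-np.sum(np.abs(diff) ** 2, axis=-1) / eps)          # (N, N, M)
    return np.fft.fft(W, axis=2) / M                               # FFT index = frequency ell

def u1_gl_eigs(W_hat, ell):
    """Eigenpairs of D^ell - W_hat^ell (here d_ell = 1, so D^ell = D)."""
    D = np.diag(W_hat[:, :, 0].real.sum(axis=1))                   # assumed form of D_ii
    return np.linalg.eigh(D - W_hat[:, :, ell])

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2)) + 1j * rng.standard_normal((20, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)                      # points on S^3 in C^2
W_hat = u1_fourier_blocks(X, eps=0.5)
evals, evecs = u1_gl_eigs(W_hat, ell=2)
```

In this toy setting each matrix $D^{\ell}-\hat{W}^{\ell}$ is Hermitian with non-negative eigenvalues, in line with the first item of Theorem 10, and negative frequencies are reached through negative array indices.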

4.2 Convergence of the $G$ -invariant graph Laplacian

We now show that the normalized $G$ -invariant graph Laplacian (4.8) converges to the Laplace-Beltrami operator on $\mathcal{M}$ . Furthermore, we show that the convergence has an improved rate which scales with $d-d_{G}$ instead of $d$ , where $d_{G}$ is the dimension of the group $G$ .

Theorem 11.

Let $\mathcal{M}$ be a smooth $d$ -dimensional compact manifold without boundary, closed under the action of a matrix Lie group $G$ . Let $\left\{x_{1},\ldots,x_{N}\right\}\subset\mathcal{M}$ be i.i.d. with the uniform probability density function $p(x)=1/\text{Vol}(\mathcal{M})$ , and suppose that $A\cdot x_{i}\neq x_{i}$ for all $A\in G$ with $A\neq I$ , with probability one. Let $f:\mathcal{M}\rightarrow\mathbb{R}$ be a smooth function, and define $g\in\mathcal{H}$ so that $g(i,A)=f(A\cdot x_{i})$ . Then, with high probability, we have that

\frac{4}{\epsilon}\left\{\tilde{L}g\right\}(i,A)=\Delta_{\mathcal{M}}f(A\cdot x_{i})+O\left(\frac{1}{N^{1/2}\epsilon^{1/2+(d-d_{G})/4}}\right)+O(\epsilon).

(4.17)

The proof of Theorem 11 is given in Appendix D. We point out that the requirement that $Ax_{i}\neq x_{i}$ with probability one ensures that the orbits generated by the action of $G$ (see Definition 3) are diffeomorphic to $G$ . This eliminates pathological cases in which the convergence analysis of the $G$ -GL may become over-complicated or even superfluous, while still accounting for a broad class of data manifolds.

Inspecting (4.17), we observe that as $N\rightarrow\infty$ , the $G$ -GL estimates $\Delta_{\mathcal{M}}$ with a bias error of $O(\epsilon)$ given by the third term on the r.h.s. The second term on the r.h.s accounts for the variance of the estimator when the sample size $N$ is finite. Thus, we conclude that the $G$ -GL reduces the variance error compared to that of the standard GL in (1.4), proportionally to the dimension $d_{G}$ of $G$ . The improvement in the variance error (4.17) in comparison to (1.4) can be explained as follows. In the proof of Theorem 11, we show that any sufficiently small neighborhood $\mathcal{M}^{\prime}\subset\mathcal{M}$ can be written as a disjoint union of orbits generated by $G$ . In fact, we show that there exists a set of $d$ coordinates for $\mathcal{M}^{\prime}$ such that given a point $x\in\mathcal{M}^{\prime}$ , the first $d-d_{G}$ coordinates specify the orbit in which $x$ resides, while the last $d_{G}$ coordinates indicate the position of $x$ on that orbit. In other words, these last $d_{G}$ coordinates are the “directions” in which $G$ acts on $\mathcal{M}$ . The construction of the $G$ -GL incorporates all the points in the set $G\cdot X$ in (3.3), namely, those generated by following the directions in $\mathcal{M}$ in which $G$ acts on the data set $X\subset\mathcal{M}$ . Thus, the variance error of approximating $\Delta_{\mathcal{M}}$ by the $G$ -GL stems entirely from sampling the remaining $d-d_{G}$ directions in $\mathcal{M}$ , resulting in the reduced variance error in (4.17).

Remark 2.

In Theorem 11, we have assumed that the sampling density $p(x)$ is uniform over $\mathcal{M}$ . In Appendix E, we show that in the case where $p(x)$ is non-uniform, the operator $\tilde{L}$ in (4.8) converges to the Fokker-Planck operator $\tilde{\Delta}_{\mathcal{M}}$ , given for every smooth function $f:\mathcal{M}\rightarrow\mathbb{R}$ by

\tilde{\Delta}_{\mathcal{M}}f(x)=\Delta_{\mathcal{M}}f(x)-2\frac{\left\langle\nabla_{\mathcal{M}}f(x),\nabla_{\mathcal{M}}\tilde{p}(x)\right\rangle}{\tilde{p}(x)},

(4.18)

where $\tilde{p}$ is the probability density given by

\tilde{p}(x)=\int_{G}p(A\cdot x)d\eta(A).

(4.19)

Furthermore, we show that there exists a normalization $\bar{L}$ of $L$ in (4.6) (different from $\tilde{L}$ in (4.8)) that still converges to $\Delta_{\mathcal{M}}$ .

The practical importance of Theorem 11 is that we expect the eigenvalues and eigenfunctions of the $G$ -GL to approximate those of the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ better than the standard normalized graph Laplacian (1.3). We support this conjecture by simulations in the following section.

4.3 Numerical examples

At this point, we wish to demonstrate the improved convergence rate (4.17) with some numerical examples. In the following simulation, we let the group $G=SU(2)$ (of $2\times 2$ unitary matrices with determinant one) act on a data set sampled from the four-dimensional sphere $S^{4}$ , as follows. First, we sample a set of $N$ points $\left\{p_{1},\ldots,p_{N}\right\}\in S^{4}$ and embed them in the Euclidean space $\mathbb{C}^{3}$ via the map

x_{i}\left(p_{i,1},\ldots,p_{i,5}\right)=(p_{i,1}+ip_{i,2},p_{i,3}+ip_{i,4},p_{i,5}),\quad p_{i,1}^{2}+\cdots+p_{i,5}^{2}=1,

(4.20)

where we denote by $p_{i,j}$ the $j$ -th coordinate in $p_{i}$ . Then, we let the group $SU(2)$ act on each embedded point $x_{i}$ of (4.20) via the multiplication

\begin{pmatrix}A&\\ &1\end{pmatrix}\cdot x_{i},\quad A\in SU(2),

(4.21)

where $SU(2)$ was defined explicitly in (3.1). We then apply the $SU(2)$ -invariant graph Laplacian to the test function

f(x_{i})=\text{Re}(x_{i,1})+\text{Im}(x_{i,1})=p_{i,1}+p_{i,2},

(4.22)

at the point $x_{0}=(1/2+i/2,1/2+i/2,0)$ , where we denote by $x_{i,j}$ the $j$ -th coordinate of $x_{i}$ . It can be shown that the coordinate functions $h_{j}(p_{i})=p_{i,j}$ on $S^{4}$ are eigenfunctions of the Laplace-Beltrami operator $\Delta_{S^{4}}$ corresponding to the eigenvalue $\lambda=-4$ (see [1]). Thus, we have that $\Delta_{S^{4}}f=-4\cdot(p_{i,1}+p_{i,2})$ and $\Delta_{S^{4}}f(x_{0})=-4$ . To demonstrate the convergence and variance error of (4.17), we uniformly sample $N=5000$ points $p_{i}\in S^{4}$ , and generate the data set $X=\{x_{1},\ldots,x_{N}\}$ by using (4.20). We then approximate $\Delta_{S^{4}}f(x_{0})$ by applying $\tilde{L}$ , the normalized $SU(2)$ -GL, to the function $g(i,A)=f(A\cdot x_{i})$ for $A\in SU(2)$ by

\frac{4}{\epsilon}\left\{\tilde{L}g\right\}(0,I)=\frac{4}{\epsilon}\left[f(I\cdot x_{0})-\frac{\sum_{j=1}^{N}\int_{SU(2)}W_{0,j}(I,A)f(A\cdot x_{j})d\eta(A)}{\sum_{j=1}^{N}\int_{SU(2)}W_{0,j}(I,A)d\eta(A)}\right].

(4.23)

The quantity (4.23) can be approximated efficiently by using the parametrization (3.1) of $SU(2)$ by Euler angles, together with Gauss-Legendre quadratures to approximate the integrals over $SU(2)$ .
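A quadrature of this type can be sketched as follows. Since the parametrization (3.1) is not reproduced here, the sketch assumes Euler angles $\alpha\in[0,2\pi)$ , $\beta\in[0,\pi]$ , $\gamma\in[-2\pi,2\pi)$ and the normalized Haar density proportional to $\sin\beta$ ; under these assumptions, uniform nodes are used for $\alpha$ and $\gamma$ , and Gauss-Legendre nodes in the variable $\cos\beta$ . The function name and default resolutions are illustrative.

```python
import numpy as np

def haar_integral_euler(func, n_alpha=16, n_beta=16, n_gamma=32):
    """Approximate the normalized Haar integral of func(alpha, beta, gamma)."""
    alpha = 2 * np.pi * np.arange(n_alpha) / n_alpha                # [0, 2*pi)
    gamma = -2 * np.pi + 4 * np.pi * np.arange(n_gamma) / n_gamma   # [-2*pi, 2*pi)
    t, w = np.polynomial.legendre.leggauss(n_beta)                  # nodes, weights in t = cos(beta)
    beta = np.arccos(t)
    a, b, g = np.meshgrid(alpha, beta, gamma, indexing="ij")
    vals = func(a, b, g)                                            # shape (n_alpha, n_beta, n_gamma)
    # Uniform weights in alpha and gamma; the Gauss-Legendre weights sum to 2.
    return np.einsum("ijk,j->", vals, w) / (2.0 * n_alpha * n_gamma)

total_mass = haar_integral_euler(lambda a, b, g: np.ones_like(a))
second_moment = haar_integral_euler(lambda a, b, g: np.cos(b) ** 2)
```

The numerator and denominator of (4.23) would each be approximated by such a rule, applied to $W_{0,j}(I,A)f(A\cdot x_{j})$ and $W_{0,j}(I,A)$ respectively.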

We observe that for large values of $\epsilon$ , the error (4.17) is dominated by the term $O(\epsilon)$ , while for small values of $\epsilon$ , the error is dominated by the middle term on the r.h.s of (4.17), which accounts for the sampling variance of the approximation. Thus, we refer to the error for small values of $\epsilon$ as the “variance dominated region” of the error.

The results of this experiment are depicted in Figure 1(a), where we plot the log-error of approximation of $\Delta_{S^{4}}f(x_{0})$ by (4.23) against different values of $\log(\epsilon)$ . The slope of the log-error in the variance dominated region is -1.4122 for the normalized standard graph Laplacian (abbreviated standard-GL) and -0.7048 for the normalized $SU(2)$ -GL, supporting the classical result (1.4) for the normalized standard-GL, and (4.17) for the normalized $SU(2)$ -GL, which predict slopes of -1.5 and -0.75 respectively, when substituting $d=4$ and $d_{G}=3$ .
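The predicted slopes follow from reading off the exponent of $\epsilon$ in the variance term of (4.17) for fixed $N$ . A minimal sketch of this arithmetic is given below, under the assumption (consistent with the values quoted above) that the standard-GL rate (1.4) corresponds to setting $d_{G}=0$ ; the function name is hypothetical.

```python
def predicted_variance_slope(d, d_G=0):
    """Slope of log(variance error) versus log(epsilon): -(1/2 + (d - d_G)/4)."""
    return -(0.5 + (d - d_G) / 4.0)

standard_slope = predicted_variance_slope(4)        # standard-GL on S^4
su2_slope = predicted_variance_slope(4, d_G=3)      # SU(2)-GL on S^4
```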

As another example, we simulated the action of the torus group $\mathbb{T}^{2}$ , defined as the group of all diagonal $2\times 2$ unitary matrices, on the unit 3-sphere $S^{3}$ . In a similar fashion to our first example, we embed the sampled data points $\left\{p_{1},\ldots,p_{N}\right\}\in S^{3}$ into $\mathbb{C}^{2}$ via the map

x_{i}\left(p_{i,1},\ldots,p_{i,4}\right)=(p_{i,1}+ip_{i,2},p_{i,3}+ip_{i,4}),\quad p_{i,1}^{2}+\cdots+p_{i,4}^{2}=1,

(4.24)

and let $\mathbb{T}^{2}$ act on each embedded point $x_{i}$ by matrix multiplication. We then repeat the steps of our first simulation, computing the $\mathbb{T}^{2}$ -GL by using $N=5000$ samples from $S^{3}$ and applying it to the function $g(i,A)=f(A\cdot x_{i})$ for $f$ in (4.22) (defined over $S^{3}$ ), at the point $x_{0}=(1/2+i/2,1/2+i/2)$ . Using the fact that the coordinate functions $h_{j}(p_{i})=p_{i,j}$ on $S^{3}$ are eigenfunctions of $\Delta_{S^{3}}$ corresponding to the eigenvalue $\lambda=-3$ (see [1]), we obtain that $\Delta_{S^{3}}f=-3\cdot f$ . The plot of the logs of the approximation errors of $\Delta_{S^{3}}f(x_{0})$ by the normalized $\mathbb{T}^{2}$ -GL and the normalized standard-GL against different values of $\log(\epsilon)$ shows the same qualitative picture as in the first simulation. In particular, the slope of the log-error in the variance dominated region is $-1.2171$ for the normalized standard-GL, and $-0.7454$ for the $\mathbb{T}^{2}$ -GL, supporting the results (1.4) and (4.17), which predict slopes of -1.25 and -0.75, respectively, when $d=3$ and $d_{G}=2$ (since $\mathbb{T}^{2}$ is a two-dimensional group).

We also computed the $50$ smallest eigenvalues of the normalized $\mathbb{T}^{2}$ -GL on $S^{3}$ , scaled by $4/\epsilon$ in accordance with (4.17), and the $50$ smallest eigenvalues of the normalized standard-GL, also scaled by $4/\epsilon$ (see (1.4)). We used (the same) $N=5000$ points for the construction of both graph Laplacians, with bandwidth parameter values of $\epsilon=2^{-7}$ for the normalized $\mathbb{T}^{2}$ -GL, and $\epsilon=2^{-3}$ for the standard graph Laplacian. The values of $\epsilon$ were chosen to minimize the mean absolute error of approximating the eigenvalues of $\Delta_{S^{3}}$ by those of each graph Laplacian. The results are illustrated in Figure 3. The red bars depict the eigenvalues of $\Delta_{S^{3}}$ which are given by the $5$ unique values $0,3,8,15$ and $24$ with respective multiplicities $1,4,9,16$ and $25$ (see e.g. [1]). The green and blue bars depict the eigenvalues of the normalized $\mathbb{T}^{2}$ -GL, and those of the normalized standard-GL, respectively. While for both graph Laplacians the multiplicities are in agreement with those of $\Delta_{S^{3}}$ , it is clear that the eigenvalues of the normalized $\mathbb{T}^{2}$ -GL better approximate those of $\Delta_{S^{3}}$ than those of the normalized standard-GL.
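The reference spectrum used in this comparison can be generated directly. The sketch below relies on the standard facts that the $k$ -th distinct eigenvalue of $-\Delta_{S^{3}}$ is $k(k+2)$ with multiplicity $(k+1)^{2}$ , which reproduces the values and multiplicities listed above; the function name is hypothetical.

```python
import numpy as np

def sphere3_spectrum(n_levels):
    """Eigenvalues k(k+2) of minus the Laplace-Beltrami operator on S^3,
    each repeated (k+1)^2 times, for k = 0, ..., n_levels - 1."""
    ks = np.arange(n_levels)
    return np.repeat(ks * (ks + 2), (ks + 1) ** 2)

reference = sphere3_spectrum(5)[:50]    # the 50 smallest eigenvalues
```

Note that the five levels contain $55$ eigenvalues in total, so the $50$ smallest ones include only part of the eigenspace of the eigenvalue $24$ .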

Lastly, we illustrate how constructing the normalized $\mathbb{T}^{2}$ -GL by using all the points in $\mathbb{T}^{2}\cdot X\subset S^{3}$ is manifested in the eigenfunctions (4.14). The IURs of $\mathbb{T}^{2}$ (see Definitions 5 and 6) are all one-dimensional, and are given by the set of products of Fourier modes $\{e^{i\ell_{1}\theta}\cdot e^{i\ell_{2}\varphi}\}$ , which can be conveniently enumerated by the set $\mathcal{I}_{\mathbb{T}^{2}}=\left\{(\ell_{1},\ell_{2})\;:\;\ell_{1},\ell_{2}\in\mathbb{Z}\right\}$ . Thus, Theorem 10 implies that the eigenfunctions $\Phi_{\ell,m,n}$ in (4.14) take the form of a Kronecker product between an $N$ -dimensional vector and a bivariate function $e^{i\ell_{1}\theta}\cdot e^{i\ell_{2}\varphi}$ . To visualize the eigenfunctions, we first map the points in $\mathbb{T}^{2}\cdot X$ to $\mathbb{R}^{3}$ by using the stereographic projection from $S^{3}\subset\mathbb{R}^{4}$ . It can be shown that each orbit $\mathbb{T}^{2}\cdot x_{i}\subset S^{3}$ gets projected to a torus in $\mathbb{R}^{3}$ (a “bagel-shaped” surface), and furthermore, that the image of $S^{3}$ under this projection is a union of nested tori that fill all of $\mathbb{R}^{3}$ . Figure 2(c) depicts two of these tori (one nested inside the other), generated by the action of $\mathbb{T}^{2}$ on a pair of data points in $X$ , colored according to the values of $\text{Re}\left\{\Phi_{(2,4),1,1}\right\}$ , the real part of the function $\Phi_{(2,4),1,1}$ (i.e. $\ell=(2,4)$ ). In Figure 2(a), we show the values of $\text{Re}\left\{\Phi_{(2,4),1,1}\right\}$ at the points of the stereographic projection of $X\subset S^{3}$ , which were projected to the $xy$ -plane in $\mathbb{R}^{3}$ , and in Figure 2(b), we show the values of $\text{Re}\left\{\Phi_{(2,4),1,1}\right\}$ at the intersection of the $xy$ -plane with all the tori generated by the action of $\mathbb{T}^{2}$ on those points, which happens at planar circles. In particular, each circle in Figure 2(b) is generated by the action of $\mathbb{T}^{2}$ on a point in Figure 2(a), which illustrates how the eigenfunctions account for the group action.

5 Denoising $G$ -invariant data sets

We now demonstrate how to apply Theorem 10 to denoise a data set sampled from an $SU(2)$ -invariant manifold. In the following simulations, we generate noisy samples from the $4$ -sphere $S^{4}$ according to the following model. For a scalar $\sigma>0$ , we define the $\sigma$ -tubular neighborhood of $S^{4}$ by

S^{4}_{\sigma}=\left\{x\;:\;\min_{y\in S^{4}}\left\lVert x-y\right\rVert<\sigma\right\}.

(5.1)

The set $S^{4}_{\sigma}$ is simply a spherical shell of width $2\sigma$ in $\mathbb{R}^{5}$ . A noisy sample of $S^{4}$ is generated by drawing points uniformly from $S^{4}_{\sigma}$ for some fixed $\sigma>0$ . Thus, the parameter $\sigma$ controls the amount of noise in the data set. We generate a data set $X=\left\{x_{1},\ldots,x_{N}\right\}$ by drawing $N$ points $\left\{p_{1},\ldots,p_{N}\right\}\subset S^{4}_{\sigma}$ , and then mapping each point $p_{i}$ to a point $x_{i}\in\mathbb{C}^{3}$ by using the map (4.20).
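The noise model can be sketched as follows, assuming $\sigma<1$ . Uniform sampling from the shell is implemented here by drawing a uniform direction and a radius with density proportional to $r^{4}$ (an implementation choice, not necessarily the one used in our simulations); the function names and the sample size are illustrative. The distance of a point to $S^{4}$ is the absolute difference between its norm and one.

```python
import numpy as np

def sample_shell(N, sigma, rng):
    """Draw N points uniformly from the shell {1 - sigma < |p| < 1 + sigma} in R^5."""
    directions = rng.standard_normal((N, 5))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    lo, hi = (1 - sigma) ** 5, (1 + sigma) ** 5
    radii = (lo + (hi - lo) * rng.random(N)) ** (1 / 5)   # inverse CDF of density ~ r^4
    return radii[:, None] * directions

def embed(P):
    """The map (4.20): (p1, ..., p5) -> (p1 + i p2, p3 + i p4, p5) in C^3."""
    return np.stack(
        [P[:, 0] + 1j * P[:, 1], P[:, 2] + 1j * P[:, 3], P[:, 4] + 0j], axis=1
    )

rng = np.random.default_rng(0)
sigma = 0.1
X = embed(sample_shell(20000, sigma, rng))
dist_to_sphere = np.abs(np.linalg.norm(X, axis=1) - 1.0)  # distance of each x_i to S^4
noisy_mse = np.mean(dist_to_sphere ** 2)
```

For small $\sigma$ the resulting mean squared distance is close to $\sigma^{2}/3$ , the second moment of a uniform radial offset in $(-\sigma,\sigma)$ .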

To apply our framework to denoise the data set $X$ , we consider the action of the group $SU(2)$ on $X$ defined in (4.21). Using the notation in (4.20) and (4.21), we define the functions

F_{1}(i,A)=\left(U(A)\cdot x_{i}\right)_{1},\quad F_{2}(i,A)=\left(U(A)\cdot x_{i}\right)_{2},\quad F_{3}(i,A)=x_{i,3},

(5.2)

for all $A\in SU(2)$ and $i\in\left\{1,\ldots,N\right\}$ , where $\left(\cdot\right)_{1}$ and $\left(\cdot\right)_{2}$ denote the first and second elements of a vector in $\mathbb{C}^{3}$ . Clearly, we have that $F_{1},F_{2}$ , and $F_{3}$ are all elements of the Hilbert space $\mathcal{H}=L^{2}\left\{\left\{1,\ldots,N\right\}\times SU(2)\right\}$ . For each $k\in\left\{1,2,3\right\}$ , the function $F_{k}(i,\cdot):G\rightarrow\mathcal{M}$ is the $k$ -th coordinate of the points in the orbit $G\cdot x_{i}$ , and thus $F_{k}$ is the $k$ -th coordinate function of the points in $G\cdot X$ . Denote by $S^{4}_{\mathbb{C}}$ the embedding of $S^{4}$ in $\mathbb{C}^{3}$ by the map in (4.20). Thus, the function $F_{k}$ attains the values of the $k$ -th coordinate of $S^{4}_{\mathbb{C}}$ sampled at the points in $G\cdot X$ .
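The functions in (5.2) can be sketched in a few lines. The sketch assumes that $U(A)$ in (5.2) denotes the block matrix of (4.21), and it builds elements of $SU(2)$ from a pair of complex numbers $(a,b)$ rather than from the Euler angles of (3.1); the function names are hypothetical.

```python
import numpy as np

def su2(a, b):
    """SU(2) element [[a, b], [-conj(b), conj(a)]] after normalizing |a|^2 + |b|^2 = 1."""
    s = np.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / s, b / s
    return np.array([[a, b], [-np.conj(b), np.conj(a)]])

def act(A, x):
    """The action (4.21): the block matrix diag(A, 1) applied to x in C^3."""
    U = np.eye(3, dtype=complex)
    U[:2, :2] = A
    return U @ x

def F(k, x_i, A):
    """Coordinate function F_k(i, A) of (5.2), for k = 1, 2, 3."""
    return act(A, x_i)[k - 1]

A = su2(0.3 + 0.4j, -0.5 + 0.1j)
x0 = np.array([0.5 + 0.5j, 0.5 + 0.5j, 0.0])   # the point x_0 used in Section 4.3
```

Since the action is unitary, it preserves the norm of each point, and it leaves the third coordinate unchanged, which is why $F_{3}$ does not depend on $A$ .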

We now denoise the data set $X$ as follows. First, we construct the normalized $SU(2)$ -GL by using the points in the data set $X$ , and compute its eigenfunctions $\left\{\Phi_{\ell,m,n}\right\}$ given by (4.14), as described by Theorem 10. We choose the bandwidth parameter $\epsilon$ so as to make the matrices $\hat{W}^{\ell}$ in (4.12) sparse. Specifically, for a data set of $N=5000$ points, we first subsample $50$ points and sort the elements in each of the rows of $\hat{W}^{(0)}$ (which is real valued) corresponding to those points in descending order. The bandwidth is then chosen such that the values of the sorted elements in each row decay exponentially fast, and such that the index of the elbow of the scree plot of values in each row (defined as the first point where the derivative equals $\approx-1$ ) is $<250$ (which is $5\%$ of the values). We then expand each of the functions of (5.2) in terms of the eigenfunctions $\left\{\Phi_{\ell,m,n}\right\}$ , and truncate the expansion. A standard approach is to retain the terms in the expansion that correspond to eigenvalues $\lambda_{n,\ell}$ below some threshold value. However, we truncate the expansion using the following observation. The $4$ -sphere can be completely recovered using the five eigenfunctions that correspond to the second smallest eigenvalue of the Laplacian operator $\Delta_{S^{4}}$ . This is simply due to the fact that the coordinate functions $h_{1},\ldots,h_{5}$ defined for each $p_{i}\in S^{4}$ by $h_{j}(p_{i})=p_{i,j}$ , span the eigenspace that corresponds to the second smallest eigenvalue of $\Delta_{S^{4}}$ [1]. Thus, we expect that the functions in (5.2) should be well approximated by the space spanned by the eigenfunctions corresponding to the five smallest eigenvalues of the normalized $SU(2)$ -GL after excluding the smallest eigenvalue. This suggests retaining only the terms corresponding to the latter eigenfunctions in the expansion of each coordinate function in (5.2).
Finally, for each $i\in\left\{1,2,3\right\}$ let $\tilde{F_{i}}\in\mathbb{C}^{N}$ denote the vector of values of the truncated expansion (just described) of the function $F_{i}$ of (5.2) at the points $(j,I)$ for all $j\in\left\{1,\ldots,N\right\}$ . The denoised data points $\tilde{x}_{1},\ldots,\tilde{x}_{N}$ are then given by

\begin{pmatrix}-\;\tilde{x}_{1}\;-\\ -\;\tilde{x}_{2}\;-\\ \vdots\\ -\;\tilde{x}_{N}\;-\end{pmatrix}=\begin{pmatrix}|&|&|&|&|\\ \\ \text{Re}\left\{\tilde{F}_{1}\right\}&\text{Im}\left\{\tilde{F}_{1}\right\}&\text{Re}\left\{\tilde{F}_{2}\right\}&\text{Im}\left\{\tilde{F}_{2}\right\}&\text{Re}\left\{\tilde{F}_{3}\right\}\\ \\ |&|&|&|&|\end{pmatrix}.

(5.3)

The denoising results of $N=5000$ points sampled from $S_{\sigma}^{4}$ for various values of $\sigma$ are presented in Table 1. Defining the error of approximation of each noisy point $x_{i}$ as the distance

d_{i}=\min_{y\in S^{4}}\left\lVert x_{i}-y\right\rVert,

(5.4)

for each value of $\sigma$ , we report the mean squared error (MSE) of the approximation obtained by performing our proposed denoising procedure using the normalized $SU(2)$ -GL. For comparison, we also report the MSE for the same data sets denoised by the eigenvectors of the normalized standard GL. Denoising using the normalized standard GL is implemented by viewing each column $H_{i}$ of the matrix

\begin{pmatrix}|&&|\\ H_{1}&\cdots&H_{5}\\ |&&|\\ \end{pmatrix}=\begin{pmatrix}-\;x_{1}\;-\\ -\;x_{2}\;-\\ \vdots\\ -\;x_{N}\;-\end{pmatrix}

(5.5)

formed by stacking the data points in rows, as a sample of a coordinate function on $\mathcal{M}$ , and projecting $H_{i}$ on the eigenvectors that correspond to the five smallest eigenvalues of the standard GL, after excluding the smallest one. We observe that for moderate noise levels $\sigma=0.1,0.2$ , denoising using the normalized $SU(2)$ -GL outperforms denoising using the normalized standard-GL by an order of magnitude, recovering the 4-sphere with high accuracy.
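The baseline procedure can be sketched as follows. This is an illustration under stated assumptions rather than the exact implementation behind Table 1: it assumes a Gaussian kernel with bandwidth `eps`, the normalization $I-D^{-1}W$ , and a least-squares projection onto the retained eigenvectors (which are not orthogonal in the Euclidean inner product). The function name, the sample size, and the radial perturbation are illustrative.

```python
import numpy as np

def standard_gl_denoise(H, eps, n_keep=5):
    """Project the columns of H (N x D, points as rows) onto n_keep eigenvectors
    of I - D^{-1} W, skipping the eigenvector of the smallest eigenvalue."""
    sq = np.sum(H ** 2, axis=1)
    sqdist = np.maximum(sq[:, None] + sq[None, :] - 2 * H @ H.T, 0.0)
    W = np.exp(-sqdist / eps)
    d = W.sum(axis=1)
    # D^{-1} W is similar to the symmetric matrix S = D^{-1/2} W D^{-1/2}.
    S = W / np.sqrt(d[:, None] * d[None, :])
    mu, U = np.linalg.eigh(S)                        # ascending eigenvalues of S
    lam = 1.0 - mu[::-1]                             # ascending eigenvalues of I - D^{-1} W
    V = (U / np.sqrt(d)[:, None])[:, ::-1]           # matching eigenvectors of D^{-1} W
    basis = V[:, 1 : 1 + n_keep]                     # drop the constant eigenvector
    coeffs, *_ = np.linalg.lstsq(basis, H, rcond=None)
    return basis @ coeffs, basis, lam

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 5))
H /= np.linalg.norm(H, axis=1, keepdims=True)        # points on S^4
H *= 1.0 + 0.1 * (2.0 * rng.random((200, 1)) - 1.0)  # radial perturbation (illustrative)
denoised, basis, lam = standard_gl_denoise(H, eps=0.5)
```

The symmetric conjugate $D^{-1/2}WD^{-1/2}$ is used only so that a Hermitian eigensolver can be applied; the eigenvalues returned in `lam` are those of $I-D^{-1}W$ .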

$\sigma$	noisy data MSE	standard GL denoised data MSE	$SU(2)$ -GL denoised data MSE
0.1	3.3E-03	9.3E-04	5.04E-05
0.2	1.33E-02	3.11E-03	3.30E-04
0.4	5.33E-02	1.745E-02	1.6E-02

Table 1: MSE of noisy data before and after denoising.

6 Implementation details and computational complexity

In this section, we describe a numerical procedure to compute the eigendecomposition of the $G$ -invariant graph Laplacian in the case where $G=SU(2)$ . We point out that almost all of our analysis can be readily generalized to the case where $G$ is an arbitrary compact matrix Lie group, and we restrict ourselves to the case $G=SU(2)$ whose representation theory is well understood, for the sake of clarity and concreteness. In particular, the important case where $G=SO(3)$ is nearly identical to that of $G=SU(2)$ since the IUR’s of $SO(3)$ are a subset of those of $SU(2)$ .

With the exception of $SO(2)$ and the 2-dimensional torus $\mathbb{T}^{2}$ , the dimension of a matrix Lie group is $\geq 3$ . Thus, even for a low-dimensional group such as $SU(2)$ , the integrals in (4.11), required to construct the matrices (4.12), need to be evaluated by triple sums. Such sums are computationally expensive even for moderate values of $N$ . Fortunately, for groups such as $SU(2)$ (and the closely related $SO(3)$ ) there exist generalized FFT algorithms that compute the Fourier coefficients efficiently [33].

The general approach for numerical integration over $SU(2)$ hinges upon the fact that the elements of its IURs can be parameterized by Euler angles, and written in a separable form as a product of factors, each of which depends on a single angle. The integrals are then evaluated using quadrature formulas that are computed using FFT-type algorithms applied to each factor separately, requiring $O(\tilde{K}\log^{2}\tilde{K})$ operations where $\tilde{K}$ is a prescribed sampling resolution over the group. We give a detailed exposition of an $SU(2)$ -FFT in Appendix A below.

We now analyze the complexity of computing the eigendecomposition presented in Theorem 10 for the case where $G=SU(2)$ acts on a data set $\left\{x_{1},\ldots,x_{N}\right\}\subset\mathcal{M}\subset\mathbb{C}^{\mathcal{D}}$ by matrix multiplication. The first step of the algorithm requires computing the affinities $W_{ij}$ in (4.4) at $O(\tilde{K})$ sampling points, and in particular, the Euclidean pairwise distances inside each exponent. In practice, the matrices $A\in G$ are usually block-diagonal where each block is an IUR of $SU(2)$ (see e.g. [37, 38]). Formally, we write

A=\text{diag}(U^{\ell_{1}}(A),\ldots,U^{\ell_{S}}(A)),\quad\ell_{j}\in\mathcal{I}_{\mathcal{M}},\quad j=1,\ldots,S,

(6.1)

where $U^{\ell_{j}}$ is the $\ell_{j}$ -th dimensional IUR of $SU(2)$ , and $\mathcal{I}_{\mathcal{M}}$ is the set of IURs that appear as blocks on the diagonal of $A$ , such that $\ell_{j}\leq\ell_{j+1}$ . Note that some of the IURs may appear more than once on the diagonal. Accordingly, we can now index the coordinates of a point $x_{i}$ in the data set to match the indices of the rows of the IURs in the blocks of $A$ , by

x_{i}=\left(x_{i,(\ell_{j},m)}\right),\quad\quad m=-\ell_{j},\ldots,\ell_{j},\quad\ell_{j}\in\mathcal{I}_{\mathcal{M}}.

(6.2)

That is, the indexing (6.2) partitions $x_{i}$ into $\#\left\{\mathcal{I}_{\mathcal{M}}\right\}$ tuples of length $(2\ell_{j}+1)$ such that the action of $SU(2)$ on $x\in\mathcal{M}$ can be written as

(A\cdot x_{i})_{(\ell_{j},m)}=\sum_{r=-\ell_{j}}^{\ell_{j}}U_{m,r}^{\ell_{j}}(A)\cdot x_{i,(\ell_{j},r)},

(6.3)

for each $\ell_{j}$ and $m$ in (6.2). Altogether, in matrix form, we have that

A\cdot x_{i}=\begin{pmatrix}U^{\ell_{1}}(A)&&&&\\ &\ddots&&&\\ &&\ddots&&\\ &&&\ddots&\\ &&&&U^{\ell_{S}}(A)\end{pmatrix}\cdot\begin{pmatrix}x_{i,-\ell_{1}}\\ \vdots\\ x_{i,\ell_{1}}\\ \vdots\\ x_{i,-\ell_{S}}\\ \vdots\\ x_{i,\ell_{S}}\end{pmatrix}.

(6.4)

To compute the matrices in (4.11), we must first evaluate the Euclidean distances

\left\lVert x_{i}-A\cdot x_{j}\right\rVert,\quad A\in G,\quad i,j\in\left\{1,\ldots,N\right\}.

(6.5)

Expanding the squared norm function, we have

\lVert x_{i}-A\cdot x_{j}\rVert^{2}=\left\lVert x_{i}\right\rVert^{2}+\left\lVert x_{j}\right\rVert^{2}-2\text{Re}\left\{\left\langle x_{i},A\cdot x_{j}\right\rangle\right\}.

(6.6)
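The identity (6.6) relies on $A$ being unitary, so that $\lVert A\cdot x_{j}\rVert=\lVert x_{j}\rVert$ . A minimal numerical check is sketched below, with a random unitary matrix standing in for an element of $G$ ; the function name is hypothetical. The real part of the inner product does not depend on which argument is conjugated, so the convention used by `np.vdot` is immaterial here.

```python
import numpy as np

def sqdist_via_inner_product(x_i, x_j, A):
    """Right-hand side of (6.6); valid when A is unitary, so |A x_j| = |x_j|."""
    inner = np.vdot(x_i, A @ x_j)                  # conjugates its first argument
    return np.linalg.norm(x_i) ** 2 + np.linalg.norm(x_j) ** 2 - 2 * inner.real

rng = np.random.default_rng(0)
n = 6
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, _ = np.linalg.qr(Z)                             # Q factor of a complex matrix is unitary
x_i = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_j = rng.standard_normal(n) + 1j * rng.standard_normal(n)
direct = np.linalg.norm(x_i - A @ x_j) ** 2
expanded = sqdist_via_inner_product(x_i, x_j, A)
```

Only the inner product term depends on $A$ , which is why it is the only term that has to be evaluated over the sampling grid of the group.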

Then, expanding the inner product in the third term on the right hand side of (6.6), we get

\begin{aligned}\left\langle x_{i},A\cdot x_{j}\right\rangle&=\sum_{\ell\in\mathcal{I}_{\mathcal{M}}}\sum_{m=-\ell}^{\ell}\sum_{\left\{k:\ell_{k}=\ell\right\}}x_{i,(\ell_{k},m)}\sum_{r=-\ell}^{\ell}U^{\ell}_{mr}(A)\cdot x_{j,(\ell_{k},r)}\\ &=\sum_{\ell\in\mathcal{I}_{\mathcal{M}}}\sum_{m,r=-\ell}^{\ell}c_{(i,j),(\ell,m,r)}\cdot U^{\ell}_{mr}(A),\end{aligned}

(6.7)

where we denote

c_{(i,j),(\ell,m,r)}=\sum_{\left\{k:\ell_{k}=\ell\right\}}x_{i,(\ell_{k},m)}\cdot x_{j,(\ell_{k},r)}.

(6.8)

Given an integration parameter $K$ , we compute (6.7) and subsequently (6.6) for all matrices $A_{k_{1},k_{2},k_{3}}$ defined by using (3.1) and (6.1) as

A_{k_{1},k_{2},k_{3}}\coloneq\text{diag}(U^{\ell_{1}}(A(\pi k_{1}/K,\pi k_{2}/K,\pi k_{3}/K)),\ldots,U^{\ell_{S}}(A(\pi k_{1}/K,\pi k_{2}/K,\pi k_{3}/K))),

(6.9)

where $k_{1}=0,\ldots,K-1$ , and $k_{2}=0,\ldots,2K-1$ , and $k_{3}=-2K,\ldots,2K-1$ . Once we have computed the coefficients $c_{(i,j),(\ell,m,r)}$ , the third term in (6.6) can be computed for all $A_{k_{1},k_{2},k_{3}}$ with $O(K^{3}\log^{2}K)$ operations by using a generalized FFT algorithm for $SU(2)$ (see Appendix A). Now, since the $\ell$ -th IUR consists of $(2\ell+1)^{2}$ elements, the number of coefficients $c_{(i,j),(\ell,m,r)}$ that need to be computed for a fixed pair $i$ and $j$ amounts to

\sum_{\ell\in\mathcal{I}_{\mathcal{M}}}\sum_{k:\ell_{k}=\ell}(2\ell_{k}+1)^{2}.

(6.10)

By (6.2) and (6.4), we have

\sum_{\ell\in\mathcal{I}_{\mathcal{M}}}\sum_{k:\ell_{k}=\ell}(2\ell_{k}+1)=n,

(6.11)

where $n$ is the dimension of the points $x_{i}$ . Since (6.10) is bounded from above by the square of (6.11), we have that (6.10) is $O(n^{2})$ . Finally, once we have computed the squared distances (6.6), we use Algorithm 2 to compute the elements of the matrices $S^{\ell}$ in (4.15), and compute their eigenvectors and eigenvalues. The entire procedure is described in Algorithm 1.
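The bound relating (6.10) and (6.11) can be checked on a small example. The sketch below takes a hypothetical list of block indices $\ell_{1},\ldots,\ell_{S}$ (possibly half-integers, possibly repeated) and returns both sums; the function name and the chosen indices are illustrative.

```python
def coefficient_count(ells):
    """Return the sums (6.10) and (6.11) for the block indices ell_1, ..., ell_S."""
    dims = [int(2 * ell + 1) for ell in ells]      # block sizes 2*ell + 1
    return sum(d * d for d in dims), sum(dims)

count, n = coefficient_count([0, 0.5, 0.5, 1, 2])  # block sizes 1, 2, 2, 3, 5
```

For these blocks the point dimension is $n=13$ and the sum (6.10) equals $43$ , which is below $n^{2}=169$ as claimed.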

Algorithm 1 Evaluating the $G$ -invariant manifold harmonics

1: Input: A data set of $N$ points $\left\{x_{1},\ldots,x_{N}\right\}\subset\mathbb{C}^{\mathcal{D}}$ , integration parameter $K$ , and bandwidth parameter $\epsilon$ .

2: For every $i,j\in\left\{1,\ldots,N\right\}$ , apply Algorithm 2 with integration parameter $K$ , in conjunction with (6.6) and (6.7), to compute the affinities

W_{ij}(I,A_{k_{1},k_{2},k_{3}})=\exp\left\{-\left\|x_{i}-A_{k_{1},k_{2},k_{3}}\cdot x_{j}\right\|^{2}/\epsilon\right\},

(6.12)

where $A_{k_{1},k_{2},k_{3}}$ is defined in (6.9).

3: For every $i,j\in\left\{1,\ldots,N\right\}$ and $\ell\in\left\{0,\ldots,K-1\right\}$ , apply Algorithm 2 to evaluate the generalized Fourier coefficient matrices $\hat{W}_{ij}^{\ell}$ of (4.11).

4: For every $\ell\in\left\{0,\ldots,K-1\right\}$ , form the matrix

\tilde{S}_{\ell}=I-\left(D^{\ell}\right)^{-1}\hat{W}^{\ell},

(6.13)

from (4.15), and return its eigenvectors $\left\{\tilde{v}_{n,\ell}\right\}_{n=1}^{N}$ and eigenvalues $\left\{\tilde{\lambda}_{n,\ell}\right\}_{n=1}^{N}$ .

We now summarize the computational complexity of Algorithm 1. Given that we evaluate the $SU(2)$ Fourier series over $O(K)$ points for each Euler angle, the sampling resolution of $SU(2)$ amounts to $O(K^{3})$ points. Denoting $\tilde{K}=O(K^{3})$ , computing the distances in (6.12) requires $O(N^{2}\tilde{K}\log^{2}\tilde{K}+N^{2}n^{2})$ operations, out of which $O(N^{2}n^{2})$ operations are required to compute the coefficients $c_{(i,j),(\ell,m,r)}$ , and $O(N^{2}\tilde{K}\log^{2}\tilde{K})$ operations to compute (6.7) using a fast polynomial transform based $SU(2)$ -FFT. Forming the generalized Fourier coefficients matrices $\hat{W}^{\ell}$ of (4.11) when using a $SU(2)$ -FFT requires $O(N^{2}\tilde{K}\log^{2}\tilde{K})$ operations. Forming the sequence of matrices (6.13) in the last step of Algorithm 1 requires $O(N^{2}K)$ operations, and evaluating the eigenvalues and eigenfunctions of (6.13) requires additional $O(N^{3}K)$ operations. Thus, the computational complexity of Algorithm 1 amounts to $O(N^{3}K+N^{2}n^{2}+N^{2}\tilde{K}\log^{2}\tilde{K})$ operations in total.

7 Summary and future work

In this work, we extended the graph Laplacian to data sets that are closed under the action of a matrix Lie group. To that end, we introduced the $G$ -invariant graph Laplacian (the $G$ -GL), that incorporates the group action into its construction, by considering the pairwise distances between all points generated by applying the group action to the given data set. We have shown that the $G$ -GL converges to the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ , at a rate accelerated proportionally to the dimension of the group. This accelerated rate implies that it is advantageous to employ the $G$ -GL for graph Laplacian based methods [28, 2, 5] whenever the data set is equipped with a known group action, since faster convergence implies that significantly less data is required for a prescribed accuracy. We also derived the eigendecomposition of the $G$ -GL, showing that its eigenfunctions have a separable form, where the dependence on the group is expressed analytically using the irreducible unitary representations of the group. We then demonstrated how the $G$ -GL can be employed to denoise a noisy sample from the 4-sphere $S^{4}$ by using a discrete Fourier analysis type algorithm, with the Fourier modes replaced by the eigenfunctions of the $G$ -GL.

As for future research, an important direction is to investigate the spectral convergence (see [9]) of the $G$ -GL, that is, the convergence of its eigenvectors and eigenvalues to those of $\Delta_{\mathcal{M}}$ . Another direction is to further develop applications of the $G$ -GL, e.g., in electron-microscopy imaging [20].

Acknowledgements

PH and JK were supported in part by NSF Award DMS-2309782 and start-up grants provided by the College of Natural Sciences and Oden Institute for Computational Engineering and Sciences at the University of Texas at Austin. XC was supported in part by NSF-BSF award 2019733. ER and YS were supported by NSF-BSF award 2019733 and by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH). YS was supported also by the NIH/NIGMS Award R01GM136780-01.

Appendix A The FFT over $SU(2)$

We now describe how to efficiently compute the Fourier series of a function defined over the group $SU(2)$ whose elements are given by (3.1). The explicit form of the series is given in (3.9), with the IURs of $SU(2)$ enumerated by the set of non-negative half integers $\mathcal{I}_{SU(2)}=\left\{0,1/2,1,3/2,\ldots\right\}$ , and $d_{\ell}=2\ell+1$ for all $\ell\in\mathcal{I}_{SU(2)}$ .

Recall that the Fourier series of a function over a Lie group $G$ is an expansion in the matrix elements of its IURs. The elements of the IURs of $SU(2)$ are given by (see [47])

\begin{gathered}U^{\ell}_{mn}(\alpha,\beta,\gamma)=e^{-im\alpha}P_{mn}^{\ell}(% \cos\beta)e^{-in\gamma},\\ m,n\in\left\{-\ell,\cdots,\ell\right\},\quad\ell=0,\frac{1}{2},1,\frac{3}{2},% \ldots,\end{gathered}

(A.1)

with $P_{mn}^{\ell}(\cos\beta)$ given by

P_{mn}^{\ell}(\cos\beta)=\bigg{[}\frac{(\ell-m)!(\ell+m)!}{(\ell-n)!(\ell+n)!}% \bigg{]}^{\frac{1}{2}}\sin^{m-n}\left(\frac{\beta}{2}\right)\cos^{m+n}\left(% \frac{\beta}{2}\right)P_{\ell-m}^{(m-n,m+n)}(\cos\beta),

(A.2)

where $P^{(a,b)}_{r}$ are the Jacobi polynomials (see [47]). Now, by (3.7) and (3.9), the generalized Fourier coefficients of a function $f:SU(2)\rightarrow\mathbb{C}$ are given by

\hat{f}_{mn}^{\ell}=\frac{1}{16\pi^{2}}\int_{-2\pi}^{2\pi}\int_{0}^{\pi}\int_{0}^{2\pi}f(B(\alpha,\beta,\gamma))\overline{U^{\ell}_{mn}(\alpha,\beta,\gamma)}\sin\beta\,d\alpha\,d\beta\,d\gamma.

(A.3)

These coefficients can be approximated rapidly to an arbitrary accuracy as we now describe. Note that the functions $U^{\ell}_{mn}(\alpha,\beta,\gamma)$ in (A.1) separate into a product of factors each depending on a single angle $\alpha$ , $\beta$ , or $\gamma$ . Thus, (A.3) can be computed by integrating over each of the angles successively, as we now show.

Set a bandlimit $L\geq 0$ depending on the required accuracy, and an integration parameter $K>2L$ . We begin by evaluating the integrals

\tilde{f}_{m}(\beta,\gamma)=\int_{0}^{2\pi}f(B(\alpha,\beta,\gamma))e^{im% \alpha}d\alpha,

(A.4)

of $f$ multiplied by the conjugate of the factor depending on $\alpha$ in (A.1) for each $m\in\left\{-L,\ldots,L\right\}$ , all $\gamma\in\left\{-2\pi,-2\pi+2\pi/K,\ldots,2\pi(K-1)/K\right\}$ , and all $\beta\in\left\{\arccos(y_{k})\right\}_{k=1}^{M}$ , where $y_{k}$ are the Gauss-Legendre quadrature nodes for some $M=O(K)$ (the reason for this choice of $\beta$ s will become apparent shortly), by

\tilde{f}_{m}(\beta,\gamma)\approx\tilde{f}_{\left[m\right]}(\beta,\gamma)=% \frac{2\pi}{K}\sum_{k=0}^{K-1}f\left(B\left(2\pi k/K,\beta,\gamma\right)\right% )e^{2\pi imk/K}.

(A.5)

Using $O(K^{2})$ applications of the classical FFT, the entire computation is accomplished by using $O(K^{3}\log K)$ operations. Next, we evaluate the integrals

\accentset{\approx}{f}_{mn}(\beta)=\int_{-2\pi}^{2\pi}\tilde{f}_{m}(\beta,\gamma)e^{in\gamma}d\gamma=\int_{-2\pi}^{2\pi}\int_{0}^{2\pi}f(B(\alpha,\beta,\gamma))e^{im\alpha}e^{in\gamma}d\alpha\,d\gamma,

(A.6)

of $f$ multiplied by the conjugate of the factors in (A.1) that depend on $\alpha$ and $\gamma$ , for each $n\in\left\{-L,\ldots,L\right\}$ , and all values of $\beta$ and $m$ used in the previous computation, by

\accentset{\approx}{f}_{mn}(\beta)\approx\accentset{\approx}{f}_{\left[m\right% ]\left[n\right]}(\beta)=\frac{4\pi}{K}\sum_{k=0}^{K-1}\tilde{f}_{\left[m\right% ]}(\beta,2\pi k/K)e^{2\pi ink/K}.

(A.7)

Using $O(K^{2})$ applications of the FFT, the latter computation amounts to a total of $O(K^{3}\log K)$ operations.
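The equispaced rule behind (A.5) and (A.7) can be illustrated with a minimal sketch. It assumes a hypothetical band-limited test function of a single angle and evaluates the sum directly in $O(K^{2})$ operations rather than with an FFT; the names `alpha_integral` and `f` are illustrative only. For a trigonometric polynomial of degree at most $L$ and $K>2L$ , the rule reproduces the integral up to rounding.

```python
import cmath
import math


def alpha_integral(f, m, K):
    """Approximate int_0^{2 pi} f(alpha) e^{i m alpha} d alpha with K equispaced nodes."""
    total = 0j
    for k in range(K):
        alpha = 2.0 * math.pi * k / K
        total += f(alpha) * cmath.exp(1j * m * alpha)
    return 2.0 * math.pi / K * total


def f(alpha):
    # Hypothetical test function with bandlimit L = 2.
    return cmath.exp(-2j * alpha) + 3.0 * cmath.exp(1j * alpha)


K = 8  # any K > 2L = 4 avoids aliasing for this f
vals = {m: alpha_integral(f, m, K) for m in range(-2, 3)}
```

Only the frequencies present in `f` survive: `vals[2]` equals $2\pi$ and `vals[-1]` equals $6\pi$ up to rounding, while the remaining entries vanish. Replacing the inner loop by a classical FFT gives all values of $m$ at once in $O(K\log K)$ operations.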

Algorithm 2 $SU(2)$ -FFT

1: Input:
   1. Integration parameter $K$ .
   2. Function $f:SU(2)\rightarrow\mathbb{C}$ .
   3. Precomputed weights $\left\{w_{1},\ldots,w_{M}\right\}$ and nodes $\left\{y_{1},\ldots,y_{M}\right\}$ for Gauss-Legendre quadrature.
2: for $\ell\in\left\{0,\ldots,L\right\}$
3:  for $m\in\left\{-\ell,\ldots,\ell\right\}$
4:   for $n\in\left\{-\ell,\ldots,\ell\right\}$
5:    for $\beta\in\left\{\arccos(y_{1}),\ldots,\arccos(y_{M})\right\}$
6:     for $\gamma\in\left\{-2\pi,-2\pi+\frac{2\pi}{K},\ldots,\frac{2\pi(K-1)}{K}\right\}$

\tilde{f}_{[m]}(\beta,\gamma)=\frac{2\pi}{K}\sum_{k=0}^{K-1}f\left(\frac{2\pi k}{K},\beta,\gamma\right)e^{i2\pi mk/K},

(A.8)

7:     end for

\accentset{\approx}{f}_{\left[m\right]\left[n\right]}(\beta)=\frac{4\pi}{K}\sum_{k=-K}^{K-1}\tilde{f}_{[m]}\left(\beta,\frac{2\pi k}{K}\right)e^{i2\pi nk/K}

(A.9)

8:    end for

\hat{f}_{\left[m\right]\left[n\right]}^{\left[\ell\right]}=\frac{1}{16\pi^{2}}\sum_{k=1}^{M}w_{k}\cdot\accentset{\approx}{f}_{\left[m\right]\left[n\right]}\left(\arccos(y_{k})\right)P^{\ell}_{mn}\left(y_{k}\right)

(A.10)

9:   end for
10:  end for
11: end for
12: Output: The generalized Fourier coefficients $\hat{f}_{\left[m\right]\left[n\right]}^{\left[\ell\right]}$ .

Lastly, we evaluate

	$\displaystyle\hat{f}_{mn}^{\ell}$	$\displaystyle=\frac{1}{16\pi^{2}}\int_{-2\pi}^{2\pi}\int_{0}^{\pi}\int_{0}^{2% \pi}f(B(\alpha,\beta,\gamma))\overline{U^{\ell}_{mn}(\alpha,\beta,\gamma)}\sin% \beta\,d\alpha\,d\beta\,d\gamma,$
		$\displaystyle=\frac{1}{16\pi^{2}}\int_{0}^{\pi}\accentset{\approx}{f}_{mn}(% \beta)P_{mn}^{\ell}(\cos\beta)\sin\beta\,d\beta=\frac{1}{16\pi^{2}}\int_{-1}^{% 1}\accentset{\approx}{f}_{mn}(\arccos(y))P_{mn}^{\ell}(y)\,dy$		(A.11)

for each $\ell\in\left\{0,\ldots,L\right\}$ , and all $m$ and $n$ from the previous computation, by

\hat{f}_{mn}^{\ell}\approx\hat{f}_{\left[m\right]\left[n\right]}^{\left[\ell% \right]}=\frac{1}{16\pi^{2}}\sum_{k=1}^{M}w_{k}\cdot\accentset{\approx}{f}_{% \left[m\right]\left[n\right]}(\arccos(y_{k}))P_{mn}^{\ell}(y_{k}),

(A.12)

using Gauss-Legendre quadrature with precomputed weights $w_{1},\ldots,w_{M}$ . The latter computation is accomplished using $O(K^{3})$ direct evaluations of size $O(K)$ , amounting to a total complexity of $O(K^{4})$ operations.
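The Gauss-Legendre step in (A.12) can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the helper names `legendre` and `gauss_legendre` are hypothetical, the nodes are found by Newton iteration on the Legendre polynomial $P_{M}$ , and the integrand is an arbitrary polynomial standing in for $\accentset{\approx}{f}_{mn}(\arccos(y))P^{\ell}_{mn}(y)$ .

```python
import math


def legendre(M, y):
    """Return (P_M(y), P_M'(y)) via the three-term recurrence, for M >= 1 and |y| < 1."""
    p_prev, p = 1.0, y
    for k in range(1, M):
        p_prev, p = p, ((2 * k + 1) * y * p - k * p_prev) / (k + 1)
    dp = M * (y * p - p_prev) / (y * y - 1.0)
    return p, dp


def gauss_legendre(M):
    """Nodes y_k and weights w_k of the M-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(M):
        # Standard initial guess for the i-th root of P_M, refined by Newton steps.
        y = math.cos(math.pi * (i + 0.75) / (M + 0.5))
        for _ in range(100):
            p, dp = legendre(M, y)
            step = p / dp
            y -= step
            if abs(step) < 1e-15:
                break
        _, dp = legendre(M, y)
        nodes.append(y)
        weights.append(2.0 / ((1.0 - y * y) * dp * dp))
    return nodes, weights


M = 4
y_nodes, w = gauss_legendre(M)
# Stand-in integrand: a polynomial of degree 6 <= 2M - 1, integrated exactly.
approx = sum(wk * yk ** 6 for wk, yk in zip(w, y_nodes))
exact = 2.0 / 7.0
```

An $M$ -point rule integrates polynomials of degree up to $2M-1$ exactly, which is why $M=O(K)$ nodes suffice once the integrand in (A.11) is a polynomial in $y$ of degree $O(K)$ .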

We point out that (A.11) can also be computed using $O(K^{3}\log^{2}K)$ operations by applying fast polynomial transforms (see e.g. [34]), bringing the overall complexity of the entire algorithm to $O(K^{3}\log^{2}K)$ operations. However, after some experimentation, we found that although the direct computation of (A.12) is asymptotically more expensive, in practice evaluating it on GPUs is substantially faster than the available $O(K^{3}\log^{2}K)$ algorithms. Unfortunately, fast polynomial transform algorithms do not lend themselves easily to GPU acceleration, due to their iterative nature. The entire procedure of evaluating the integrals in (A.3) is outlined in Algorithm 2.

Lastly, we note that the method described above can be applied to $SO(3)$ by restricting all computations to the integer valued IURs of $SU(2)$ , and the angle $\gamma$ to $[0,2\pi)$ .

Appendix B Proof of Lemma 9

For any $f\in\mathcal{H}$ , expanding (4.6) by using (4.4) and (4.5), we obtain that

\left\{Lf\right\}(i,A)=D_{ii}\cdot f_{i}(A)-\sum_{j=1}^{N}\int_{G}W_{ij}(A,B)f_{j}(B)d\eta(B),\quad(i,A)\in\Gamma,

(B.1)

which implies that

\left\langle f,Lf\right\rangle_{\mathcal{H}}=\sum_{i=1}^{N}D_{ii}\cdot\int_{G}\left|f_{i}(A)\right|^{2}d\eta(A)-\sum_{i,j=1}^{N}\int_{G}\int_{G}W_{ij}(A,B)\overline{f_{i}}(A)f_{j}(B)d\eta(A)d\eta(B).

(B.2)

Next, by using the left-invariance property (3.6) of $\eta$ and (4.9), for any $i,j\in\left\{1,\ldots,N\right\}$ and $A\in G$ we have that

\int_{G}W_{ij}(I,C)d\eta(C)=\int_{G}W_{ij}(I,C)d\eta(AC)=\int_{G}W_{ij}(I,A^{*% }B)d\eta(B)=\int_{G}W_{ij}(A,B)d\eta(B),

(B.3)

where we made the change of variables $B=AC$ in the second equality. Thus, using that $W_{ij}(A,B)=W_{ji}(B,A)$ (by the definition of $W_{ij}$ in (4.4)), we can write the first expression on the r.h.s of (B.2) as

	$\displaystyle\sum_{i=1}^{N}D_{ii}\cdot\int_{G}\left\|f_{i}(A)\right\|^{2}d\eta(A)$	$\displaystyle=\sum_{i,j=1}^{N}\int_{G}\int_{G}W_{ij}(A,B)\left\|f_{i}(A)\right\|% ^{2}d\eta(A)d\eta(B)$
		$\displaystyle=\sum_{i,j=1}^{N}\int_{G}\int_{G}W_{ij}(A,B)\left\|f_{j}(B)\right\|% ^{2}d\eta(A)d\eta(B).$		(B.4)

Plugging (B.4) into (B.2) we obtain that

	$\displaystyle\left\langle f,Lf\right\rangle_{\mathcal{H}}$	$\displaystyle=\frac{1}{2}\sum_{i,j=1}^{N}\int_{G}\int_{G}W_{ij}(A,B)\Big{[}% \left\|f_{i}(A)\right\|^{2}+\left\|f_{j}(B)\right\|^{2}-f_{i}(A)\overline{f_{j}}(B% )-\overline{f_{i}(A)}f_{j}(B)\Big{]}d\eta(A)d\eta(B)$
		$\displaystyle=\frac{1}{2}\sum_{i,j=1}^{N}\int_{G}\int_{G}W_{ij}(A,B)\left\|f_{i% }(A)-f_{j}(B)\right\|^{2}d\eta(A)d\eta(B).$		(B.5)
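The identity (B.5) can be checked numerically in the simplest setting. The sketch below assumes the trivial group $G=\{I\}$ , so that the integrals over $G$ disappear and $W$ reduces to an ordinary symmetric affinity matrix; the matrix `W` and the vector `f` are arbitrary illustrative values, not data from the paper.

```python
def quadratic_form(W, f):
    """<f, Lf> with L = D - W and D_ii = sum_j W_ij (trivial-group analogue of (B.2))."""
    N = len(f)
    D = [sum(W[i]) for i in range(N)]
    Lf = [D[i] * f[i] - sum(W[i][j] * f[j] for j in range(N)) for i in range(N)]
    return sum(f[i].conjugate() * Lf[i] for i in range(N))


def dirichlet_energy(W, f):
    """(1/2) sum_ij W_ij |f_i - f_j|^2 (trivial-group analogue of (B.5))."""
    N = len(f)
    return 0.5 * sum(
        W[i][j] * abs(f[i] - f[j]) ** 2 for i in range(N) for j in range(N)
    )


# Arbitrary symmetric, non-negative affinities and a complex-valued test vector.
W = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.7],
     [0.2, 0.7, 1.0]]
f = [complex(1.0, 2.0), complex(0.0, -0.5), complex(3.0, 0.0)]

lhs = quadratic_form(W, f)
rhs = dirichlet_energy(W, f)
```

Both quantities agree up to rounding, and the right-hand side is manifestly real and non-negative, which is the property used later to conclude that the eigenvalues of $L$ are non-negative.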

Appendix C Proof of Theorem 10

For any $\Psi\in\mathcal{H}$ , by plugging (4.10) into (4.4), and using (3.9) we have for any $i\in\{1,\ldots,N\}$ that

\displaystyle\left\{W\Psi\right\}(i,A)=\sum_{j=1}^{N}\int_{G}\sum_{\ell\in% \mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,n=1}^{d_{\ell}}\left(\hat{W}^{\ell}_{ij}% \right)_{mn}U^{\ell}_{mn}\left(A^{*}B\right)\Psi_{j}(B)d\eta(B),

(C.1)

where we denote by $(\hat{W}^{\ell}_{ij})_{mn}$ and $U^{\ell}_{mn}(\cdot)$ the $(m,n)^{\text{th}}$ entries of $\hat{W}^{\ell}_{ij}$ and $U^{\ell}(\cdot)$ , respectively, and $\mathcal{I}_{G}$ enumerates the IURs of $G$ . Next, by using the homomorphism property of group representations (3.8), we have

\displaystyle\left\{W\Psi\right\}(i,A)=\sum_{j=1}^{N}\sum_{\ell\in\mathcal{I}_% {G}}d_{\ell}\cdot\sum_{m,n=1}^{d_{\ell}}\left(\hat{W}^{\ell}_{ij}\right)_{mn}% \sum_{r=1}^{d_{\ell}}U^{\ell}_{mr}\left(A^{*}\right)\int_{G}U^{\ell}_{rn}\left% (B\right)\Psi_{j}(B)d\eta(B).

(C.2)

We now show that for a given $q\in\mathcal{I}_{G}$ , $p\in\left\{1,\ldots,d_{q}\right\}$ and $s\in\{1,\ldots,N\}$ the function $\Phi_{q,p,s}$ of (4.14) is an eigenfunction of $L$ .

Extending the notation of (4.13), for any $v\in\mathbb{C}^{Nd_{\ell}}$ and all $j\in\{1,\ldots,N\}$ , we denote the $d_{\ell}$ entries of the vector $e^{j}(v)\in\mathbb{C}^{d_{\ell}}$ by

e^{j}(v)=\left(e^{j}_{1}(v),\ldots,e^{j}_{d_{\ell}}(v)\right).

(C.3)

Now, the homomorphism property (3.8) implies that $U^{q}_{\cdot,p}\left(A^{*}\right)=\overline{U^{q}_{p,\cdot}\left(A\right)}$ . Thus, plugging $\Psi=\Phi_{q,p,s}$ into (C.2), and using the notation in (C.3), and that

\Psi_{j}(A)=\Phi_{q,p,s}(j,A)=e^{j}\left(v_{s}\right)U^{q}_{\cdot,p}(A^{*}),

(C.4)

the expression for $\left\{W\Phi_{q,p,s}\right\}(i,A)$ is given by

		$\displaystyle\sum_{j=1}^{N}\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,n% =1}^{d_{\ell}}\left(\hat{W}^{\ell}_{ij}\right)_{mn}\sum_{r=1}^{d_{\ell}}U^{% \ell}_{mr}\left(A^{*}\right)\int_{G}U^{\ell}_{rn}\left(B\right)\sum_{k=1}^{d_{% \ell}}e^{j}_{k}\left(v_{s}\right)\overline{U^{q}_{p,k}\left(B\right)}d\eta(B)$		(C.5)
		$\displaystyle=\sum_{j=1}^{N}\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,% n=1}^{d_{\ell}}\left(\hat{W}^{\ell}_{ij}\right)_{mn}\sum_{r=1}^{d_{\ell}}\sum_% {k=1}^{d_{\ell}}U^{\ell}_{mr}\left(A^{*}\right)e^{j}_{k}\left(v_{s}\right)\int% _{G}U^{\ell}_{rn}\left(B\right)\overline{U^{q}_{p,k}\left(B\right)}d\eta(B)$

By Schur’s orthogonality relations (see e.g. [11]), we have

\int_{G}U^{\ell}_{rn}\left(B\right)\overline{U^{q}_{p,k}\left(B\right)}d\eta(B% )=d_{q}^{-1}\delta_{rp}\delta_{nk}\delta_{\ell q},

(C.6)

by which we get that the expression in (C.5) for $\left\{W\Phi_{q,p,s}\right\}(i,A)$ becomes

	$\displaystyle\left\{W\Phi_{q,p,s}\right\}(i,A)$	$\displaystyle=\sum_{j=1}^{N}\sum_{m,n=1}^{d_{q}}\left(\hat{W}^{q}_{ij}\right)_% {mn}U^{q}_{mp}\left(A^{*}\right)e^{j}_{n}\left(v_{s}\right)$		(C.7)
		$\displaystyle=\sum_{m=1}^{d_{q}}U^{q}_{mp}\left(A^{*}\right)\sum_{j=1}^{N}\sum% _{n=1}^{d_{q}}\left(\hat{W}^{q}_{ij}\right)_{mn}e^{j}_{n}\left(v_{s}\right).$
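Schur's orthogonality relations (C.6), used in the step above, can be verified numerically on a small example. The sketch below is an illustration under a simplifying assumption: the compact Lie group is replaced by the finite dihedral group of order 6 with its two-dimensional irreducible representation, and the Haar integral by the normalized sum over group elements. The helper names are hypothetical.

```python
import math


def rot(theta):
    """Rotation of the plane by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def refl(theta):
    """Reflection of the plane across the line at angle theta / 2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


# Two-dimensional irreducible representation of the dihedral group of order 6.
angles = [2.0 * math.pi * k / 3 for k in range(3)]
group = [rot(t) for t in angles] + [refl(t) for t in angles]
d = 2  # dimension of the representation


def schur_sum(r, n, p, k):
    """Normalized sum over the group of U_rn(g) * conj(U_pk(g)); entries are real here."""
    return sum(U[r][n] * U[p][k] for U in group) / len(group)


table = {
    (r, n, p, k): schur_sum(r, n, p, k)
    for r in range(d) for n in range(d) for p in range(d) for k in range(d)
}
```

Each entry of `table` equals $d^{-1}\delta_{rp}\delta_{nk}$ up to rounding, which mirrors the right-hand side of (C.6) for $\ell=q$ .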

Next, we notice that

\sum_{j=1}^{N}\sum_{n=1}^{d_{q}}\left(\hat{W}^{q}_{ij}\right)_{mn}e^{j}_{n}% \left(v_{s}\right)=\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\cdot v_{s},

(C.8)

where $\hat{W}^{q}$ is the block matrix of Fourier coefficients matrices of $q^{\text{th}}$ order that was defined in (4.12), and $(\hat{W}^{q})_{((i-1)d_{q}+m,\cdot)}$ is the $m^{\text{th}}$ row of the $d_{q}\times Nd_{q}$ matrix consisting of blocks $(i,1),(i,2),\ldots,(i,N)$ of $\hat{W}^{q}$ . Thus, we get that

\left\{W\Phi_{q,p,s}\right\}(i,A)=\sum_{m=1}^{d_{q}}U^{q}_{mp}\left(A^{*}% \right)\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\cdot v_{s}.

(C.9)

Next, we notice that by the definition of $D^{\ell}$ in the statement of the theorem, we have that

D^{\ell}_{(i-1)d_{\ell}+m,(i-1)d_{\ell}+m}=D_{ii},\quad m\in\{1,\ldots d_{\ell% }\},\quad i\in\{1,\ldots,N\}.

(C.10)

That is, the $(m,m)$ -th element of the $(i,i)$ -th block of the $Nd_{\ell}\times Nd_{\ell}$ matrix $D^{\ell}$ is given by $D_{ii}$ defined in (4.5), for all $m\in\{1,\ldots,d_{\ell}\}$ . Thus, by (4.6), (C.9) and (C.10), we have

$\displaystyle L\left\{\Phi_{q,p,s}\right\}(i,A)$	$\displaystyle=D_{ii}\Phi_{q,p,s}(i,A)-\sum_{m=1}^{d_{q}}U^{q}_{mp}\left(A^{*}% \right)\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\cdot v_{s}$
	$\displaystyle=D_{ii}e^{i}\left(v_{s}\right)U^{q}_{\cdot,p}(A^{*})-\sum_{m=1}^{d_{q}}U^{q}_{mp}\left(A^{*}\right)\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\cdot v_{s}$
	$\displaystyle=\sum_{m=1}^{d_{q}}U_{m,p}^{q}(A^{*})D^{q}_{(i-1)d_{q}+m,(i-1)d_{% q}+m}e^{i}_{m}(v_{s})$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad-\sum_{m=1}^{d_{q}}U^{q}_{mp}\left(% A^{*}\right)\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\cdot v_{s}$
	$\displaystyle=\sum_{m=1}^{d_{q}}U_{m,p}^{q}(A^{*})\left(D^{q}_{(i-1)d_{q}+m,% \cdot}\cdot v_{s}-\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\cdot v_{s}\right)$
	$\displaystyle=\sum_{m=1}^{d_{q}}U_{m,p}^{q}(A^{*})\left(D^{q}_{(i-1)d_{q}+m,% \cdot}-\left(\hat{W}^{q}\right)_{(i-1)d_{q}+m,\cdot}\right)v_{s}.$	(C.11)

Thus, since $v_{s}$ is an eigenvector of $D^{q}-\hat{W}^{q}$ corresponding to an eigenvalue $\lambda$ , we get

	$\displaystyle L\left\{\Phi_{q,p,s}\right\}(i,A)$	$\displaystyle=\sum_{m=1}^{d_{q}}U_{m,p}^{q}(A^{*})\lambda e^{i}_{m}(v_{s})=\lambda\sum_{m=1}^{d_{q}}e^{i}_{m}(v_{s})U_{m,p}^{q}(A^{*})$		(C.12)
		$\displaystyle=\lambda e^{i}(v_{s})U^{q}_{\cdot,p}(A^{*})=\lambda\Phi_{q,p,s}(i% ,A),$

showing that the function $\Phi_{q,p,s}$ is an eigenfunction of $L$ in (4.6).

Next, we show that the eigenfunctions in (4.14) are orthogonal. Indeed, we have that

$\displaystyle\langle\Phi_{m_{1},k_{1},\ell_{1}},\Phi_{m_{2},k_{2},\ell_{2}}\rangle_{\mathcal{H}}$	$\displaystyle=\sum_{j=1}^{N}\int_{G}\Phi_{m_{1},k_{1},\ell_{1}}(j,A)\Phi_{m_{2},k_{2},\ell_{2}}^{*}(j,A)d\eta(A)$	(C.13)
	$\displaystyle=\sum_{j=1}^{N}\int_{G}e^{j}(v_{m_{1},\ell_{1}})U_{\cdot,k_{1}}^{\ell_{1}}(A)(U_{\cdot,k_{2}}^{\ell_{2}}(A))^{*}(e^{j}(v_{m_{2},\ell_{2}}))^{*}d\eta(A)$
	$\displaystyle=\sum_{j=1}^{N}e^{j}(v_{m_{1},\ell_{1}})\left(\int_{G}U_{\cdot,k_{1}}^{\ell_{1}}(A)\left(\overline{U_{\cdot,k_{2}}^{\ell_{2}}(A)}\right)^{T}d\eta(A)\right)(e^{j}(v_{m_{2},\ell_{2}}))^{*}.$

The outer product of rows $U_{\cdot,k_{1}}^{\ell_{1}}(A)\left(\overline{U_{\cdot,k_{2}}^{\ell_{2}}}\right% )^{T}(A)$ is a $d_{\ell_{1}}\times d_{\ell_{2}}$ matrix of products between elements of the IURs of $G$ , and by Schur’s orthogonality relations, we have that

d_{\ell}\cdot\int_{G}U_{\cdot,k_{1}}^{\ell_{1}}(A)\overline{U_{k_{2},\cdot}^{% \ell_{2}}}(A)d\eta(A)=\begin{cases}I_{d_{\ell}\times d_{\ell}}&\ell_{1}=\ell_{% 2}=\ell,k_{1}=k_{2}=k,\\ 0&\text{otherwise}.\end{cases}

(C.14)

Thus, when $\ell_{1}=\ell_{2}=\ell$ and $k_{1}=k_{2}=k$ , we are left with

	$\displaystyle\langle\Phi_{m_{1},k_{1},\ell_{1}},\Phi_{m_{2},k_{2},\ell_{2}}% \rangle_{\mathcal{H}}$	$\displaystyle=d_{\ell}\cdot\sum_{j=1}^{N}e^{j}(v_{m_{1},\ell})(e^{j}(v_{m_{2},% \ell}))^{*}$		(C.15)
		$\displaystyle=d_{\ell}\cdot\langle v_{m_{1},\ell},v_{m_{2},\ell}\rangle_{% \mathbb{C}^{Nd_{\ell}}}=\begin{cases}d_{\ell}&m_{1}=m_{2}=m,\\ 0&m_{1}\neq m_{2},\end{cases}$

which shows that $\Phi_{m_{1},k_{1},\ell_{1}}$ and $\Phi_{m_{2},k_{2},\ell_{2}}$ are orthogonal.

To show that the eigenfunctions in (4.14) form a basis for $\mathcal{H}$ , we first show that the matrices $\hat{W}^{(\ell)}$ in (4.11) are Hermitian. For this we require the following result (see p.82 in [10]).

Lemma 12.

Let $G$ be a compact unitary matrix Lie group. Then, we have that

\int_{G}f(A^{*})d\eta(A)=\int_{G}f(A)d\eta(A),

(C.16)

where $\eta$ is the Haar measure over $G$ .

Now, by (4.11) and (4.4), we have

	$\displaystyle\hat{W}^{(\ell)}_{ji}$	$\displaystyle=\int_{G}W_{ji}(I,A)\overline{U^{\ell}(A)}d\eta(A)=\int_{G}W_{ij}(I,A^{*})\overline{\left(U^{\ell}(A^{*})\right)^{T}}d\eta(A)$
		$\displaystyle=\overline{\left(\int_{G}W_{ij}(I,A^{*})U^{\ell}(A^{*})d\eta(A)\right)^{T}}=\overline{\left(\int_{G}W_{ij}(I,A)U^{\ell}(A)d\eta(A)\right)^{T}}=\left(\hat{W}^{(\ell)}_{ij}\right)^{*},$

where in passing to the second equality we used the homomorphism property (3.8) that implies

I=U^{\ell}(AA^{*})=U^{\ell}(A)\cdot U^{\ell}(A^{*}),

hence $U^{\ell}(A^{*})=\left(U^{\ell}(A)\right)^{*}$ , and Lemma 12 in passing to the third equality.

Now, for $f\in L^{2}\left(\left\{1,\ldots,N\right\}\times G\right)$ and a fixed $i\in\{1,\ldots,N\}$ , we observe that since $f(i,A)\in L^{2}(G)$ then we also have $\overline{f(i,A)}\in L^{2}(G)$ , and thus we can expand $\overline{f(i,A)}$ as

\displaystyle\overline{f(i,A)}

\displaystyle=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,m^{\prime}=1}^% {d_{\ell}}\alpha^{i}_{\ell,m,m^{\prime}}U^{\ell}_{m,m^{\prime}}(A),

(C.17)

from which we get

\displaystyle f(i,A)

\displaystyle=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,m^{\prime}=1}^% {d_{\ell}}\tilde{\alpha}^{i}_{\ell,m,m^{\prime}}\overline{U^{\ell}_{m,m^{% \prime}}(A)},

(C.18)

where $\tilde{\alpha}^{i}_{\ell,m,m^{\prime}}=\overline{\alpha^{i}_{\ell,m,m^{\prime}}}$ for all $\ell\in\mathcal{I}_{G}$ and $m,m^{\prime}\in\left\{1,\ldots,d_{\ell}\right\}$ . Fix $\ell$ and $m$ . The matrix $\hat{W}^{(\ell)}$ is Hermitian, and thus admits an orthonormal basis of eigenvectors $\left\{v_{n,\ell}\right\}$ , $n=1,2,\ldots,d_{\ell}N$ . Thus, we can expand

\left(\tilde{\alpha}^{1}_{\ell,m,1},\ldots,\tilde{\alpha}^{1}_{\ell,m,d_{\ell}% },\ldots,\tilde{\alpha}^{N}_{\ell,m,1},\ldots,\tilde{\alpha}^{N}_{\ell,m,d_{% \ell}}\right)^{T}=\sum_{n=1}^{Nd_{\ell}}\beta_{\ell,m,n}v_{n,\ell},

from which we have that

\tilde{\alpha}_{\ell,m,m^{\prime}}^{i}=\sum_{n=1}^{Nd_{\ell}}\beta_{\ell,m,n}e^{i}_{m^{\prime}}(v_{n,\ell}).

(C.19)

Plugging (C.19) back into (C.18), we have

	$\displaystyle f(i,A)$	$\displaystyle=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m,m^{\prime}=1}^{d_{\ell}}\sum_{n=1}^{Nd_{\ell}}\beta_{\ell,m,n}e^{i}_{m^{\prime}}(v_{n,\ell})U_{m^{\prime},m}^{\ell}(A^{*})$
		$\displaystyle=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m=1}^{d_{\ell}}\sum_{n=1}^{Nd_{\ell}}\beta_{\ell,m,n}\sum_{m^{\prime}=1}^{d_{\ell}}e^{i}_{m^{\prime}}(v_{n,\ell})U_{m^{\prime},m}^{\ell}(A^{*})$
		$\displaystyle=\sum_{\ell\in\mathcal{I}_{G}}d_{\ell}\cdot\sum_{m=1}^{d_{\ell}}\sum_{n=1}^{Nd_{\ell}}\beta_{\ell,m,n}\Phi_{\ell,m,n}(i,A),$

which shows directly that any function $f\in L^{2}\left(\{1,\ldots,N\}\times G\right)$ can be expanded in a series of eigenfunctions of the $G$ -GL.

Lastly, the fact that the eigenvalues of $L$ are real and non-negative is a direct result of Lemma 9, coupled with the fact that $L$ is a symmetric operator, since we have that $W_{ij}{(A,B)}=W_{ji}(B,A)$ for all $A,B\in G$ and all $i,j\in\left\{1,\ldots,N\right\}$ .

Appendix D Proof of Theorem 11

The analysis that follows is a generalization of the proof of Theorem 2 in [38]. The proof is divided into four parts, given in Appendices D.1, D.2, D.5, and D.6. In part 1, we show that the $G$ -invariant graph Laplacian converges to the Laplace-Beltrami operator on the data manifold $\mathcal{M}$ . In part 2, we derive the convergence rate (the variance term) of our operator, using the proof technique derived in [40] for the standard graph Laplacian. In parts 3 and 4, we prove key results that are used in part 2. Appendices D.3 and D.4 provide some differential geometry background needed for D.5.

D.1 Convergence of the $G$ -invariant graph Laplacian

In this section, we show that for a fixed $\epsilon>0$ and as $N\rightarrow\infty$ , the normalized $G$ -invariant graph Laplacian approximates the Laplace-Beltrami operator on the data manifold $\mathcal{M}$ up to an $O(\epsilon)$ error at each data point $A\cdot x_{i}\in G\cdot X$ . We will assume w.l.o.g that $A=I$ , since all the analysis that follows can be carried out exactly in the same manner and with the same results when $A\neq I$ .

By (4.3),(4.5),(4.6), and (4.8), we can write

	$\displaystyle\frac{4}{\varepsilon}\left\{\tilde{L}g\right\}(i,I)$	$\displaystyle=\frac{4}{\varepsilon}\left[f(x_{i})-\sum_{j=1}^{N}\int_{G}D^{-1}% _{ii}{W}_{ij}(I,B)f(B\cdot x_{j})d\eta(B)\right]$
		$\displaystyle=\frac{4}{\varepsilon}\left[f(x_{i})-\frac{\frac{1}{N}\sum_{j=1}^{N}\int_{G}\exp{\left\{-{\left\|x_{i}-B\cdot x_{j}\right\|^{2}}{/\varepsilon}\right\}}f(B\cdot x_{j})d\eta(B)}{\frac{1}{N}\sum_{j=1}^{N}\int_{G}\exp{\left\{-{\left\|x_{i}-B\cdot x_{j}\right\|^{2}}{/\varepsilon}\right\}}d\eta(B)}\right].$		(D.1)

We now derive the limit of (D.1) for $N\rightarrow\infty$ and a fixed $\varepsilon>0$ , showing that it is essentially the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ with an additional bias error term of $O(\varepsilon)$ . First, let us focus on the expression

C_{i,N}^{1}\coloneq\frac{1}{N}\sum_{j=1}^{N}\int_{G}\exp{\left\{-{\left\|x_{i}% -B\cdot x_{j}\right\|^{2}}{/\varepsilon}\right\}}f(B\cdot x_{j})d\eta(B),

(D.2)

which is the numerator of the second term of (D.1) (inside the brackets). Let us define

H_{i}(x)\coloneq\int_{G}\exp{\left\{-{\left\|x_{i}-B\cdot x\right\|^{2}}{/% \varepsilon}\right\}}f(B\cdot x)d\eta(B),\quad x\in\mathcal{M}.

(D.3)

Since $\left\{x_{i}\right\}$ are i.i.d samples from $\mathcal{M}$ , the law of large numbers implies

	$\displaystyle\lim_{N\rightarrow\infty}C_{i,N}^{1}$	$\displaystyle=\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{j=1}^{N}H_{i}(x_{j})=% \lim_{N\rightarrow\infty}\frac{1}{N}\sum_{j\neq i,j=1}^{N}H_{i}(x_{j})$		(D.4)
		$\displaystyle=\mathbb{E}{\left[H_{i}(x)\right]}=\int_{\mathcal{M}}H_{i}(x)p(x)% d\omega(x),$		(D.5)

where $p(x)$ is the sampling density of the data over $\mathcal{M}$ , and $\omega(x)$ is the measure with respect to the Riemannian metric on $\mathcal{M}$ induced by the standard Euclidean inner product in $\mathbb{C}^{n}$ .

Next, we recall that $G$ acts on points $x\in\mathcal{M}$ by multiplication by unitary matrices $A$ . Consider the map $U_{A}:\mathcal{M}\rightarrow\mathcal{M}$ defined by

U_{A}(x)=A\cdot x,\quad x\in\mathcal{M}.

(D.6)

The pushforward of $\omega$ by $U_{A}$ is the measure $U_{A}^{*}(\omega)(\cdot)$ over $\mathcal{M}$ defined by

U_{A}^{*}(\omega)(S)=\omega(U_{A}^{-1}(S)),

(D.7)

for all Lebesgue-measurable subsets $S\subseteq\mathcal{M}$ . Since $U_{A}$ acts as an isometry over $\mathcal{M}$ , and the metric tensor over $\mathcal{M}$ is invariant under isometries, we conclude that $\omega(x)$ is $G$ -invariant. That is, for fixed $A\in G$ we have

U_{A}^{*}(\omega)(S)=\omega(S),

(D.8)

for all Lebesgue-measurable subsets $S\subseteq\mathcal{M}$ .

Using the latter observation, and assuming that $p$ is uniform over $\mathcal{M}$ (and so $p(x)=1/\operatorname{Vol}\{\mathcal{M}\}$ ) we have

$\displaystyle\int_{\mathcal{M}}H_{i}(x)p(x)d\omega(x)$	$\displaystyle=\frac{1}{\operatorname{Vol}\left\{\mathcal{M}\right\}}\int_{\mathcal{M}}\int_{G}\exp{\left\{-{\left\|x_{i}-B\cdot x\right\|^{2}}{/\varepsilon}\right\}}f(B\cdot x)d\eta(B)d\omega(x)$
	$\displaystyle=\frac{1}{\operatorname{Vol}\left\{\mathcal{M}\right\}}\int_{G}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)dU_{B}^{*}(\omega)(y)d\eta(B)$
	$\displaystyle=\frac{1}{\operatorname{Vol}\left\{\mathcal{M}\right\}}\int_{G}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)d\omega(y)d\eta(B)$
	$\displaystyle=\frac{1}{\operatorname{Vol}\left\{\mathcal{M}\right\}}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)d\omega(y),$	(D.9)

where in the second equality we applied the change of variables $y=B\cdot x=U_{B}(x)$ , and in the fourth equality we used that $\operatorname{Vol}(G)=1$ , by (3.5).

In a similar fashion, consider the denominator of the second term in (D.1),

C_{i,N}^{2}\coloneq\frac{1}{N}\sum_{j=1}^{N}\int_{G}\exp{\left\{-{\left\|x_{i}-B\cdot x_{j}\right\|^{2}}{/\varepsilon}\right\}}d\eta(B).

(D.10)

Repeating the calculations carried out above for $C_{i,N}^{1}$ but with $f\equiv 1$ , we get that

\lim_{N\rightarrow\infty}C_{i,N}^{2}=\frac{1}{\operatorname{Vol}\left\{% \mathcal{M}\right\}}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-x\right\|^{2}% }{/\varepsilon}\right\}}d\omega(x)=\mathbb{E}\left[G_{i}(x)\right],

(D.11)

where we defined

G_{i}(x)\coloneq\int_{G}e^{-\lVert x_{i}-B\cdot x\rVert^{2}/\epsilon}d\eta(B),\quad x\in\mathcal{M}.

(D.12)

Lastly, if we substitute (D.9) and (D.11) into (D.1), we have that

	$\displaystyle\lim_{N\rightarrow\infty}\frac{4}{\varepsilon}\left\{\tilde{L}g\right\}(i,I)$	$\displaystyle=\frac{4}{\varepsilon}\left[f(x_{i})-\frac{\frac{1}{\operatorname{Vol}\left\{\mathcal{M}\right\}}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-x\right\|^{2}}{/\varepsilon}\right\}}f(x)d\omega(x)}{\frac{1}{\operatorname{Vol}\left\{\mathcal{M}\right\}}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-x\right\|^{2}}{/\varepsilon}\right\}}d\omega(x)}\right]$		(D.13)
		$\displaystyle=\Delta_{\mathcal{M}}f(x_{i})+O(\varepsilon),$		(D.14)

where (D.14) is justified in [40].

D.2 The convergence rate

The variance error in the approximation of the Laplace-Beltrami operator by the $G$ -GL (second term on the r.h.s of (4.17)), is attributed to the difference between the values of $C_{i,N}^{1}$ and $C^{2}_{i,N}$ for a finite $N$ and their limit when $N\rightarrow\infty$ . To derive the variance error we employ the proof technique derived in [40]. As in Section D.1, we perform all our analysis in a neighbourhood of an arbitrary data point $A\cdot x_{i}$ , assuming w.l.o.g that $A=I$ .

Using the definitions (D.3) and (D.12), the normalized $G$ -GL applied to an arbitrary smooth function $f$ on $\mathcal{M}$ , and evaluated at the fixed point $(i,I)$ , can be written as

\tilde{L}\left\{f\right\}(i,I)=f(x_{i})-\frac{\sum_{j=1}^{N}H_{i}(x_{j})}{\sum% _{j=1}^{N}G_{i}(x_{j})}.

(D.15)

Following [40], we employ the Chernoff tail inequality to bound the probability of (D.15) deviating from its mean (the limit of (D.15) when $N\rightarrow\infty$ ). We now derive a bound on the probability of the $\alpha$ -error

p_{+}(N,\alpha)=Pr\left\{\frac{\sum_{j\neq i}^{N}H_{i}(x_{j})}{\sum_{j\neq i}^% {N}G_{i}(x_{j})}-\frac{\mathbb{E}\left[H_{i}\right]}{\mathbb{E}\left[G_{i}% \right]}>\alpha\right\},

(D.16)

where we point out that excluding the diagonal terms $H_{i}(x_{i})$ and $G_{i}(x_{i})$ results in an even smaller error than the variance error itself, as was shown in [38] and [40]. We also point out that a bound on the probability

p_{-}(N,\alpha)=Pr\left\{\frac{\sum_{j\neq i}^{N}H_{i}(x_{j})}{\sum_{j\neq i}^% {N}G_{i}(x_{j})}-\frac{\mathbb{E}\left[H_{i}\right]}{\mathbb{E}\left[G_{i}% \right]}<-\alpha\right\},

(D.17)

can be obtained by the same technique that we now apply to bound (D.16).

Now, defining

J_{i}(x_{j})\coloneq\mathbb{E}\left[G_{i}\right]H_{i}(x_{j})-\mathbb{E}\left[H% _{i}\right]G_{i}(x_{j})+\alpha\mathbb{E}\left[G_{i}\right]\left(\mathbb{E}% \left[G_{i}\right]-G_{i}(x_{j})\right),

(D.18)

it was shown in [40] that $p_{+}(N,\alpha)$ can be rewritten as

p_{+}(N,\alpha)=Pr\left\{\sum_{j\neq i}^{N}J_{i}(x_{j})>\alpha(N-1)\left(% \mathbb{E}\left[G_{i}\right]\right)^{2}\right\},

(D.19)

where $J_{i}(x_{j})$ are i.i.d random variables. Using the Chernoff inequality we get

p_{+}(N,\alpha)\leq\exp\left\{-\frac{\alpha^{2}(N-1)^{2}\left(\mathbb{E}\left[G_{i}\right]\right)^{4}}{2(N-1)\mathbb{E}\left[\left(J_{i}\right)^{2}\right]+O\left(\alpha\right)}\right\}.

(D.20)

From (D.18) we get by a direct calculation that

\begin{split}\mathbb{E}\left[\left(J_{i}\right)^{2}\right]=\left(\mathbb{E}% \left[G_{i}\right]\right)^{2}\mathbb{E}\left[\left(H_{i}\right)^{2}\right]-2% \mathbb{E}\left[G_{i}\right]\mathbb{E}\left[H_{i}\right]\mathbb{E}\left[H_{i}G% _{i}\right]\\ +\left(\mathbb{E}\left[H_{i}\right]\right)^{2}\mathbb{E}\left[\left(G_{i}% \right)^{2}\right]+O\left(\alpha\right).\end{split}

(D.21)

To evaluate all the moments in (D.21), and consequently the quantities $\mathbb{E}\left[\left(J_{i}\right)^{2}\right]$ and $\left(\mathbb{E}\left[G_{i}\right]\right)^{4}$ in (D.20), we use the following result from [40], which will be the key instrument in the analysis that follows.

Theorem 13.

Let $\mathcal{M}$ be a smooth and compact $d$ -dimensional submanifold, and let $f:\mathcal{M}\rightarrow\mathbb{R}$ be a smooth function. Then, for any $y\in\mathcal{M}$

\left(\pi\epsilon\right)^{-d/2}\int_{\mathcal{M}}e^{-\left\lVert y-x\right% \rVert^{2}/\epsilon}f(x)dx=f(y)+\frac{\epsilon}{4}\bigg{[}E(y)f(y)+\Delta_{% \mathcal{M}}f(y)\bigg{]}+O\left(\epsilon^{2}\right),

(D.22)

where $E(y)$ is a scalar function of the curvature of $\mathcal{M}$ at $y$ .

This shows that the integral on the l.h.s of (D.22) essentially operates as an evaluation functional of $f$ at the point $y$ , up to an $O(\epsilon)$ error.
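A small numerical sanity check of Theorem 13 can be sketched as follows. It assumes the unit circle embedded in $\mathbb{R}^{2}$ (so $d=1$ ), parametrized by arc length $\theta$ , with $\Delta_{\mathcal{M}}f=f^{\prime\prime}$ in this parametrization. Dividing the expansion (D.22) for $f$ by the same expansion for the constant function $1$ cancels the curvature term $E(y)$ , so the kernel-weighted average of $f$ should equal $f(y)+\frac{\epsilon}{4}f^{\prime\prime}(y)+O(\epsilon^{2})$ . The function name `kernel_average` and the test values are illustrative only.

```python
import math


def kernel_average(f, theta0, eps, n=4000):
    """Gaussian-kernel weighted average of f over the unit circle, centred at angle theta0."""
    num = 0.0
    den = 0.0
    for k in range(n):
        theta = -math.pi + 2.0 * math.pi * k / n
        # Squared chordal (Euclidean) distance between the points at angles theta0 and theta.
        d2 = 2.0 - 2.0 * math.cos(theta - theta0)
        weight = math.exp(-d2 / eps)
        num += weight * f(theta)
        den += weight
    return num / den


eps = 0.01
theta0 = 0.3
avg = kernel_average(math.cos, theta0, eps)

# Rescaled deviation; by (D.22) it should approximate f''(theta0) = -cos(theta0).
estimate = 4.0 / eps * (avg - math.cos(theta0))
target = -math.cos(theta0)
```

For this choice of $\epsilon$ the rescaled deviation `estimate` agrees with `target` up to an $O(\epsilon)$ discrepancy, in line with the remainder term in (D.22).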

Applying Theorem 13 to the first order moments appearing in (D.21), we immediately obtain

\mathbb{E}\left[H_{i}\right]=\frac{1}{\text{Vol}\left\{\mathcal{M}\right\}}% \int_{\mathcal{M}}e^{-\lVert x_{i}-x\rVert^{2}/\epsilon}f(x)dx=\frac{1}{\text{% Vol}\left(\mathcal{M}\right)}(\pi\epsilon)^{d/2}\big{[}f(x_{i})+O\left(% \epsilon\right)\big{]},

(D.23)

and

\mathbb{E}\left[G_{i}\right]=\frac{1}{\text{Vol}\left\{\mathcal{M}\right\}}% \int_{\mathcal{M}}e^{-\lVert x_{i}-x\rVert^{2}/\epsilon}dx=\frac{1}{\text{Vol}% \left(\mathcal{M}\right)}(\pi\epsilon)^{d/2}\big{[}1+O\left(\epsilon\right)% \big{]}.

(D.24)

Thus, in order to evaluate $\mathbb{E}\left[\left(J_{i}\right)^{2}\right]$ in (D.21), it remains to evaluate the second order moments $\mathbb{E}\left[\left(H_{i}\right)^{2}\right],\mathbb{E}\left[\left(G_{i}% \right)^{2}\right]$ and $\mathbb{E}\left[H_{i}G_{i}\right]$ , which we carry out in two steps in the following two sections.

First, in Section D.5 we construct a local parametrization of $\mathcal{M}$ in a sufficiently small neighborhood $\mathcal{M}^{\prime}$ of $x_{i}$ , such that each $x\in\mathcal{M}^{\prime}$ is mapped to a unique pair $(z,B)$ , where all the $z$ values reside on a $(d-d_{G})$ -dimensional submanifold $\mathcal{N}\subset\mathcal{M}^{\prime}$ , and $B\in G$ . Next, in Section D.6, we use the results of Section D.5 to reduce integration over $\mathcal{M}$ in the expressions for the second order moments in (D.21) to integration over $\mathcal{N}$ , leading to the following lemma.

Lemma 14.

There exist a smooth function $\mu(x)>0$ over $\mathcal{N}$ , and a smooth function $p_{\mathcal{N}}(x)>0$ over $\mathcal{M}^{\prime}$ such that

	$\displaystyle\mathbb{E}\left[\left(H_{i}(x)\right)^{2}\right]=\frac{(\pi% \epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\bigg{[}\frac{f^{2}(x_{i})% p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon)\bigg{]},$		(D.25)
	$\displaystyle\mathbb{E}\left[\left(G_{i}(x)\right)^{2}\right]=\frac{(\pi% \epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\bigg{[}\frac{p_{\mathcal{% N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon)\bigg{]},$		(D.26)
	$\displaystyle\mathbb{E}\left[H_{i}(x)G_{i}(x)\right]=\frac{(\pi\epsilon)^{% \left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\bigg{[}\frac{f(x_{i})p_{\mathcal{N}}% (x_{i})}{\mu^{2}(x_{i})}+O(\epsilon)\bigg{]}.$		(D.27)

By Lemma 14, (D.24) and (D.23), we have

$\displaystyle\mathbb{E}\left[\left(J_{i}\right)^{2}\right]$	$\displaystyle=\frac{1}{\text{Vol}\left(\mathcal{M}\right)^{2}}\frac{(\pi% \epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\left(\pi\epsilon\right)^{% d}\bigg{[}\frac{f^{2}(x_{i})p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon% )\bigg{]}$	(D.28)
	$\displaystyle-2\frac{1}{\text{Vol}\left(\mathcal{M}\right)^{2}}\frac{(\pi% \epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\left(\pi\epsilon\right)^{% d}\bigg{[}\frac{f^{2}(x_{i})p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon% )\bigg{]}$
	$\displaystyle+\frac{1}{\text{Vol}\left(\mathcal{M}\right)^{2}}\frac{(\pi% \epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\left(\pi\epsilon\right)^{% d}\bigg{[}\frac{f^{2}(x_{i})p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon% )\bigg{]}+O\left(\alpha\right)$
	$\displaystyle=\frac{1}{\text{Vol}\left(\mathcal{M}\right)^{2}}\frac{(\pi% \epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\left(\pi\epsilon\right)^{% d}\cdot O(\epsilon)=O\left(\epsilon^{3d/2+(d_{G}+2)/2}\right)+O\left(\alpha% \right).$

We now obtain Theorem 11, and in particular (4.17), by repeating the computations in equations (121)-(128) in the proof of Lemma 10 in [38], with $d_{G}=1$ replaced by an arbitrary group dimension $d_{G}$ .

D.3 Real manifolds embedded in $\mathbb{C}^{n}$

Before we continue with the proof of Theorem 11, we describe the way we view real manifolds embedded in a complex vector space.

First, we say that a $d$ -dimensional manifold $\mathcal{M}\subset\mathbb{C}^{n}$ is real if its charts are given by maps of the form $\Phi:U\rightarrow\mathbb{R}^{d}$ , where $U$ is an open subset in $\mathcal{M}$ . This is in contrast to complex manifolds that admit charts that map open subsets in the manifold to the unit disk in $\mathbb{C}^{n}$ . The crucial distinction between the two is that real manifolds admit a differentiable structure where the transition maps between charts are differentiable with respect to real variables, while complex manifolds admit transition maps that are holomorphic.

Specifically, we can formulate our entire analysis in a real space by identifying $\mathbb{C}^{n}$ with $\mathbb{R}^{2n}$ via the map

z\mapsto\tilde{z}=\begin{pmatrix}\text{Re}\left\{z\right\}\\ \text{Im}\left\{z\right\}\end{pmatrix},\quad z\in\mathbb{C}^{n}.

(D.29)

If we equip $\mathbb{C}^{n}$ with the real valued inner product given by

\left\langle u,v\right\rangle=\text{Re}\left\{\left\langle u,v\right\rangle_{% \mathbb{C}^{n}}\right\},

(D.30)

the map (D.29) becomes an isometry, since

\text{Re}\left\{\left\langle z,w\right\rangle_{\mathbb{C}^{n}}\right\}=\left% \langle\tilde{z},\tilde{w}\right\rangle,\quad z,w\in\mathbb{C}^{n}.

(D.31)
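The identity (D.31) is easy to confirm numerically. The sketch below is only an illustration: the helper names `realify`, `complex_inner` and `real_dot`, as well as the test vectors, are hypothetical and chosen for this example.

```python
def realify(z):
    """Map z in C^n to (Re z, Im z) in R^(2n), as in (D.29)."""
    return [c.real for c in z] + [c.imag for c in z]

def complex_inner(u, v):
    """Standard Hermitian inner product on C^n (conjugate-linear in v)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def real_dot(a, b):
    """Euclidean dot product on R^(2n)."""
    return sum(x * y for x, y in zip(a, b))

z = [1 + 2j, -0.5 + 0.25j, 3j]
w = [2 - 1j, 1.5 + 4j, -1 + 1j]

lhs = complex_inner(z, w).real           # Re <z, w>_{C^n}
rhs = real_dot(realify(z), realify(w))   # <z~, w~> in R^(2n)
print(lhs, rhs)                          # the two values agree
```

The real part does not depend on which argument of the Hermitian product is conjugated, so the check is insensitive to that convention.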

Now, let $\mathcal{M}\subset\mathbb{C}^{n}$ be an embedded $d$ -dimensional submanifold, and let $\left(U,\Phi\right)$ be a local chart on $\mathcal{M}$ , that is, $U\subset\mathcal{M}$ is an open subset, and $\Phi$ is a diffeomorphism that maps $U$ onto an open subset of $\mathbb{R}^{d}$ , where we identify $U$ with the set

\tilde{U}=\left\{\tilde{u}\;:\;u\in U\right\}

using the map $\tilde{(\cdot)}$ in (D.29). The inverse map $\Phi^{-1}(u_{1},\ldots,u_{d})$ parametrizes the points $x\in U$ as $x=x(u_{1},\ldots,u_{d})=\Phi^{-1}(u_{1},\ldots,u_{d})$ . The Jacobian matrix of the latter parametrization is given in coordinates by

J_{u}=\begin{pmatrix}\frac{\partial x_{1}}{\partial u_{1}}&\dots&\frac{% \partial x_{1}}{\partial u_{d}}\\ \vdots&\dots&\vdots\\ \frac{\partial x_{n}}{\partial u_{1}}&\dots&\frac{\partial x_{n}}{\partial u_{% d}}\end{pmatrix}.

(D.32)

Thus, denoting the $k$ -th column of $J_{u}$ by

\frac{\partial x}{\partial u_{k}}=\left(\frac{\partial x_{1}}{\partial u_{k}},\ldots,\frac{\partial x_{n}}{\partial u_{k}}\right)^{T},\quad k=1,\ldots,d,

(D.33)

the metric tensor induced on $\mathcal{M}$ by the inner product (D.30) is given in local coordinates as

\begin{pmatrix}\text{Re}\left\{\left\langle\frac{\partial x}{\partial u_{1}},\frac{\partial x}{\partial u_{1}}\right\rangle_{\mathbb{C}^{n}}\right\}&\dots&\text{Re}\left\{\left\langle\frac{\partial x}{\partial u_{1}},\frac{\partial x}{\partial u_{d}}\right\rangle_{\mathbb{C}^{n}}\right\}\\ \vdots&\ddots&\vdots\\ \text{Re}\left\{\left\langle\frac{\partial x}{\partial u_{d}},\frac{\partial x}{\partial u_{1}}\right\rangle_{\mathbb{C}^{n}}\right\}&\dots&\text{Re}\left\{\left\langle\frac{\partial x}{\partial u_{d}},\frac{\partial x}{\partial u_{d}}\right\rangle_{\mathbb{C}^{n}}\right\}\end{pmatrix}=\text{Re}\left\{J^{*}J\right\}.

(D.34)

The latter enables us to integrate smooth functions over open subsets $U\subset\mathcal{M}$ by

\int_{U}f(x)dx=\int_{\Phi\left(U\right)}f(x(u))dV(u),

(D.35)

where $V(u)$ is the volume form on $\mathcal{M}$ given by

V(u)=\sqrt{\det\left\{\text{Re}\left\{J^{*}J\right\}\right\}},

(D.36)

and

dV(u)=V(u)du=V(u)du_{1}\dots du_{d}.

(D.37)
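To illustrate (D.36) and (D.37), the following sketch computes the volume density $\sqrt{\det\text{Re}\{J^{*}J\}}$ for a one-dimensional real manifold embedded in $\mathbb{C}^{2}$. The curve $x(u)=(e^{iu},e^{2iu})$, the finite-difference Jacobian, and the function names are assumptions made for this example only; for this curve $\text{Re}\{J^{*}J\}=1+4=5$.

```python
import cmath
import math

def x(u):
    """A closed curve in C^2 (hypothetical example), parametrized by u in [0, 2*pi)."""
    return [cmath.exp(1j * u), cmath.exp(2j * u)]

def volume_density(u, h=1e-6):
    """sqrt(det Re{J^* J}) for d = 1, with J estimated by central differences."""
    xp, xm = x(u + h), x(u - h)
    J = [(a - b) / (2 * h) for a, b in zip(xp, xm)]
    gram = sum((c.conjugate() * c).real for c in J)  # Re{J^* J}, a 1x1 matrix here
    return math.sqrt(gram)

# Integrate dV(u) = V(u) du over the chart domain to get the length of the curve.
num = 2000
step = 2 * math.pi / num
length = step * sum(volume_density(k * step) for k in range(num))

print(volume_density(0.3), math.sqrt(5))        # density vs. exact value
print(length, 2 * math.pi * math.sqrt(5))       # length vs. exact value
```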

D.4 Coordinate charts on Lie groups

The next part of the proof of Theorem 11 also requires us to define coordinate charts on Lie groups. The standard coordinates on a Lie group $G$ are given by the exponential map over the Lie algebra of $G$ . In detail, the Lie algebra $\mathfrak{g}$ of $G$ is the tangent space to $G$ at the identity $I_{G}$ . By a theorem (due to von Neumann, see [23]), there exists a sufficiently small neighborhood $\mathcal{N}_{I}\subset G$ of $I_{G}$ , such that for each $A\in\mathcal{N}_{I}$ there exists a unique element $X\in\mathfrak{g}$ such that $\exp(X)=A$ , where $\exp(\cdot)$ is the matrix exponential. Thus, choosing a basis $\left\{X_{1},...,X_{d_{G}}\right\}$ for $\mathfrak{g}$ , we can write each element $X\in\mathfrak{g}$ as a linear combination

X=\sum_{i=1}^{d_{G}}u_{i}X_{i},

(D.38)

inducing a coordinate chart for $\mathcal{N}_{I}\subset G$ , such that the elements of $\mathcal{N}_{I}$ are given explicitly by the matrix valued map

A_{I}(u_{1},\ldots,u_{d_{G}})\coloneq\exp\left(\sum_{i=1}^{d_{G}}u_{i}X_{i}% \right),\quad(u_{1},\ldots,u_{d_{G}})\in U,

(D.39)

where $U\subset\mathbb{R}^{d_{G}}$ is an open subset. Multiplying the elements of $\mathcal{N}_{I}$ by a fixed element $B\in G$ (either from the left or the right) translates $\mathcal{N}_{I}$ to a neighborhood $\mathcal{N}_{B}$ of $B$ . A chart for $\mathcal{N}_{B}$ is thus given by (also see [44])

A_{I}(u_{1},\ldots,u_{d_{G}})\cdot B=\exp\left(\sum_{i=1}^{d_{G}}u_{i}X_{i}% \right)\cdot B,\quad(u_{1},\ldots,u_{d_{G}})\in U.

(D.40)
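For concreteness, the charts (D.39) and (D.40) can be sketched for $G=SU(2)$. The basis $X_{k}=i\sigma_{k}$ (with $\sigma_{k}$ the Pauli matrices), the truncated power series used for the matrix exponential, and all function names below are assumptions made for this illustration; the paper does not fix a particular basis or implementation.

```python
def matmul(A, B):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm(X, terms=40):
    """Matrix exponential of a 2x2 matrix by a truncated power series."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for m in range(1, terms):
        term = [[c / m for c in row] for row in matmul(term, X)]  # X^m / m!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

# Assumed basis of su(2): i times the Pauli matrices (skew-Hermitian, traceless).
BASIS = [
    [[0j, 1j], [1j, 0j]],
    [[0j, 1 + 0j], [-1 + 0j, 0j]],
    [[1j, 0j], [0j, -1j]],
]

def chart(u, B=None):
    """A_I(u) = exp(sum_k u_k X_k); if B is given, return the translated chart A_I(u) * B."""
    X = [[sum(u[k] * BASIS[k][i][j] for k in range(3)) for j in range(2)] for i in range(2)]
    A = expm(X)
    return A if B is None else matmul(A, B)

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

A = chart([0.3, -0.2, 0.5])
gram = matmul(dagger(A), A)                       # should be the identity
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]       # should equal 1
print(gram, det)
```

Since the basis elements are skew-Hermitian and traceless, every matrix produced by `chart` is unitary with unit determinant, and right multiplication by a fixed $B\in SU(2)$ preserves both properties.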

Hence, an atlas of charts for $G$ can be obtained by choosing a finite cover of $G$ (since $G$ is compact) by such neighborhoods. Equipped with the map (D.40), a chart for a neighborhood of $B\cdot x\in G\cdot x$ is obtained by multiplying $x$ by (D.40) on the left, that is

(u_{1},\ldots,u_{d_{G}})\mapsto A_{I}(u_{1},\ldots,u_{d_{G}})\cdot B\cdot x,\quad(u_{1},\ldots,u_{d_{G}})\in U.

(D.41)

Since $G\cdot x$ is diffeomorphic to $G$ , and thus compact, an atlas of charts for $G\cdot x$ is obtained by choosing a finite covering of $G\cdot x$ . In this work, to simplify notation, we refer to all charts in the atlas for $G\cdot x$ by the notation $A(u_{1},\ldots,u_{d_{G}})\cdot x$ , where we define implicitly that

A(u_{1},\ldots,u_{d_{G}})\cdot x=A_{I}(u_{1},\ldots,u_{d_{G}})\cdot B\cdot x,

(D.42)

whenever we refer to points in a chart for a neighborhood of $B\cdot x$ .

D.5 G-invariant local parametrization of the data manifold

In this section, we construct a parametrization of $\mathcal{M}$ in a local neighborhood $\mathcal{M}^{\prime}$ around the point $B\cdot x_{i}\in G\cdot x_{i}$ , which takes the form of a product between $G$ and a certain $(d-d_{G})$ -dimensional submanifold in $\mathcal{M}$ , and derive the integration volume form over $\mathcal{M}^{\prime}$ in terms of the resulting local coordinates. The parametrization we construct is $G$ -invariant in the sense that for every $x\in\mathcal{M}^{\prime}$ we have $G\cdot x\subset\mathcal{M}^{\prime}$ . As in the previous sections, for the rest of this section, we will assume w.l.o.g that $B=I$ .

To construct our parametrization, for each $x\in\mathcal{M}$ we consider the solution to the minimization problem

\hat{A}(x)=\operatorname*{argmin}_{A\in G}\lVert Ax-x_{i}\rVert,

(D.43)

and the value $z(x)=\hat{A}(x)\cdot x$ . In other words, for each $x\in\mathcal{M}$ we solve for the element $\hat{A}(x)\in G$ such that $z(x)$ is the point on the orbit $G\cdot x$ closest to $x_{i}$ . Since $G$ is compact, a solution for (D.43) exists. In Lemma 17 below, we prove that there exists a certain neighborhood $\mathcal{M}^{\prime}\subset\mathcal{M}$ of $x_{i}$ , such that the solution of (D.43) is also unique for each $x$ in this neighborhood. Subsequently, we parameterize points $x\in\mathcal{M}^{\prime}$ by

x(z,\hat{B})=\hat{B}\cdot z,\quad z\in\mathcal{N},

(D.44)

where $\hat{B}(x)=(\hat{A}(x))^{*}$ , and $\mathcal{N}$ is the set of unique solutions of (D.43) for all $x\in\mathcal{M}^{\prime}$ . The proof of Lemma 17 requires the notion of a $\delta$ -neighborhood of a manifold.

Definition 15.

Let $M\subset\mathbb{C}^{n}$ be a smooth compact embedded submanifold. Given a $\delta>0$ , the $\delta$ -neighborhood of $M$ is defined as

M_{\delta}=\left\{y\in\mathbb{C}^{n}\;:\;\left\lVert x-y\right\rVert<\delta% \text{ for some }x\in M\right\}.

(D.45)

Our proof also requires the following property of $\delta$ -neighborhoods.

Theorem 16.

There exists a $\delta>0$ such that any $x\in M_{\delta}$ has a unique closest point in $M$ .

For a proof, see Theorem 6.24 and Proposition 6.25 in [30].
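A toy instance of Theorem 16 may help fix ideas. In the sketch below, $M$ is taken to be the unit circle in $\mathbb{C}$ (an assumption made only for illustration), for which any $\delta<1$ works: every point $y\neq 0$ has the unique closest point $y/|y|$, while the center $y=0$ is equidistant from all of $M$. The brute-force search and the name `closest_on_circle` are hypothetical.

```python
import cmath
import math

def closest_on_circle(y, num=20_000):
    """Brute-force closest point to y on the unit circle, over a uniform grid of angles."""
    best = min(range(num), key=lambda k: abs(cmath.exp(2j * math.pi * k / num) - y))
    return cmath.exp(2j * math.pi * best / num)

y = 0.7 * cmath.exp(1.1j)          # a point within distance 0.3 of the circle
p = closest_on_circle(y)
print(abs(p - y / abs(y)))         # small: the closest point is the radial projection y/|y|

# At the center, all points of the circle are at distance exactly 1 (no unique minimizer).
spread = max(abs(cmath.exp(2j * math.pi * k / 100)) for k in range(100)) - \
         min(abs(cmath.exp(2j * math.pi * k / 100)) for k in range(100))
print(spread)
```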

Lemma 17.

There exists a $\delta>0$ such that the problem (D.43) has a unique solution for any $x$ in a $\delta$ -neighborhood of $G\cdot x_{i}$ . Furthermore, the $\delta$ -neighborhood $\left(G\cdot x_{i}\right)_{\delta}$ is $G$ -invariant.

Proof.

By assumption, with probability one we have that $Ax_{i}\neq x_{i}$ for all $A\in G$ with $A\neq I$ . Since $G$ is a smooth manifold, the map $A\mapsto A\cdot x_{i}$ , $A\in G$ , is a smooth injective map onto the orbit $G\cdot x_{i}\subset\mathcal{M}$ . This implies that $G\cdot x_{i}$ is a smooth $d_{G}$ -dimensional compact embedded submanifold in $\mathcal{M}$ , diffeomorphic to $G$ . By Theorem 16, there exists a $\delta>0$ such that for any $x\in\left(G\cdot x_{i}\right)_{\delta}$ (see Definition 15) there exists a unique solution to the problem

	$\displaystyle\min_{y\in G\cdot x_{i}}\left\lVert x-y\right\rVert$	$\displaystyle=\min_{A\in G}\left\lVert x-A\cdot x_{i}\right\rVert=\min_{A\in G% }\left\lVert A^{*}x-x_{i}\right\rVert$
		$\displaystyle=\min_{A\in G}\left\lVert Ax-x_{i}\right\rVert=\min_{z\in G\cdot x% }\left\lVert z-x_{i}\right\rVert,$		(D.46)

which shows that there exists a unique point $z\in G\cdot x$ closest to $x_{i}$ , which is given by $z=\hat{A}(x)\cdot x$ , where $\hat{A}(x)$ is the unique solution to (D.43).

Moreover, we observe that

d=\left\lVert z-x_{i}\right\rVert=\left\lVert Bz-Bx_{i}\right\rVert,

(D.47)

for all $B\in G$ , which shows that any point on the orbit $G\cdot x$ has a point $y\in G\cdot x_{i}$ at the minimal distance $d$ . Thus, using that $d<\delta$ (since $z\in(G\cdot x_{i})_{\delta}$ ), Definition 15 implies that

G\cdot x\subset\left(G\cdot x_{i}\right)_{\delta},

(D.48)

for all $x\in\left(G\cdot x_{i}\right)_{\delta}$ , which shows that the $\delta$ -neighborhood $\left(G\cdot x_{i}\right)_{\delta}$ is $G$ -invariant. ∎

Now, let us denote

\mathcal{M}^{\prime}\coloneq\left(G\cdot x_{i}\right)_{\delta}\cap\mathcal{M},

(D.49)

for $\delta>0$ as in Lemma 17. The subset $\mathcal{M}^{\prime}$ is an open subset of $\mathcal{M}$ and thus a submanifold in $\mathcal{M}$ . Furthermore, Lemma 17 also implies that the neighborhood $\left(G\cdot x_{i}\right)_{\delta}$ is $G$ -invariant, and since $\mathcal{M}$ is also $G$ -invariant, then so is $\mathcal{M}^{\prime}$ . Let us further denote by

\mathcal{N}\coloneq\left\{z\;:\;\min_{A\in G}\lVert Ax-x_{i}\rVert=\left\lVert z% -x_{i}\right\rVert,\quad x\in\mathcal{M}^{\prime}\right\},

(D.50)

the set resulting from solving (D.43) for $x\in\mathcal{M}^{\prime}$ . Using (D.49) and (D.50) we can write

\mathcal{M}^{\prime}=G\cdot\mathcal{N}=\left\{A\cdot z\;:\;A\in G,\quad z\in% \mathcal{N}\right\}.

(D.51)

We now show that $\mathcal{N}$ is an embedded compact $(d-d_{G})$ -dimensional submanifold in $\mathcal{M}$ . We do this by deriving an explicit solution for (D.43) for all $x\in\mathcal{M}^{\prime}$ . Note that (D.51) is diffeomorphic to the product space $G\times\mathcal{N}$ , which implies that $\mathcal{M}^{\prime}$ is a $d$ -dimensional submanifold in $\mathcal{M}$ .

Now, denoting $u=\left(u_{1},\ldots,u_{d_{G}}\right)$ , and differentiating the norm in (D.43) with respect to $u_{k}$ for each $k\in\left\{1,\ldots,d_{G}\right\}$ , the solution $\hat{A}(x)$ for (D.43) is given by $A(u)$ that solves the set of $d_{G}$ equations

\displaystyle\text{Re}\left\{\left\langle\frac{\partial A(u)\cdot x}{\partial u% _{k}},A(u)\cdot x-x_{i}\right\rangle\right\}=0,\quad k=1,\ldots,d_{G},

(D.52)

which are equivalent to

\displaystyle\text{Re}\left\{\left\langle\frac{\partial A(u)\cdot z}{\partial u% _{k}}\bigg{|}_{u=u^{*}},z-x_{i}\right\rangle\right\}=0,\quad k=1,\ldots,d_{G},

(D.53)

where $A(u^{*})=I_{G}$ , since we defined $z$ to be the closest point in $G\cdot x$ to $x_{i}$ . In particular, we have $z=\hat{A}(x)\cdot x$ , where $\hat{A}$ is the unique solution to (D.43). The first argument of the inner product in (D.53) is a vector tangent to $G\cdot z$ at $z$ , since it is the derivative of the map $u\mapsto A(u)\cdot z$ (an explicit parametrization of the orbit $G\cdot z$ ) at $u=u^{*}$ , for which $A(u^{*})=I_{G}$ . Thus, by our discussion in Section D.3, and in particular (D.29)-(D.31), equation (D.53) simply implies that the closest point to $x_{i}$ on the orbit $G\cdot x$ is the point $z$ such that $z-x_{i}$ is perpendicular to the tangent space of $G\cdot x$ at $z$ . We may now rewrite (D.53) as

\text{Re}\left\{\left\langle\frac{\partial A(u)}{\partial u_{k}}\bigg{|}_{u=u^% {*}}\cdot z,z\right\rangle\right\}=\text{Re}\left\{z^{*}Y_{k}z\right\},\quad Y% _{k}=\left(\frac{\partial A(u)}{\partial u_{k}}\bigg{|}_{u=u^{*}}\right)^{*},% \quad k=1,\ldots,d_{G},

(D.54)

where $Y_{k}$ resides in the tangent space to $G$ at $I$ , given by the Lie algebra $\mathfrak{g}$ of $G$ . Now, the Lie-algebra $\mathfrak{u}(n)$ of $\text{U}(n)$ is the space of all $n\times n$ skew-Hermitian matrices, and by a theorem (see [21]), if $G$ is a Lie subgroup of $\text{U}(n)$ , then $\mathfrak{g}$ is a $d_{G}$ -dimensional subspace of $\mathfrak{u}(n)$ . Using the fact that the diagonal entries of skew-Hermitian matrices are all purely imaginary, we have for any $Y\in\mathfrak{g}$

$\displaystyle z^{*}Yz$	$\displaystyle=i\sum_{j=1}^{n}\text{Im}\left\{(Y)_{jj}\right\}\left|z_{j}\right|^{2}+\sum_{i<j}^{n}(Y)_{ij}\overline{z_{i}}z_{j}+\sum_{i>j}^{n}(Y)_{ij}\overline{z_{i}}z_{j}$
	$\displaystyle=i\sum_{j=1}^{n}\text{Im}\left\{(Y)_{jj}\right\}\left|z_{j}\right|^{2}+\sum_{i<j}^{n}(Y)_{ij}\overline{z_{i}}z_{j}-\sum_{i<j}^{n}(\overline{Y})_{ij}z_{i}\overline{z_{j}}$
	$\displaystyle=i\sum_{j=1}^{n}\text{Im}\left\{(Y)_{jj}\right\}\left|z_{j}\right|^{2}+2i\cdot\text{Im}\left\{\sum_{i<j}^{n}(Y)_{ij}\overline{z_{i}}z_{j}\right\},$	(D.55)

where in passing to the second equality we switched the roles of $i$ and $j$ in the third sum and used that $Y_{ij}=-\left(\overline{Y}\right)_{ji}$ since $Y$ is skew-Hermitian. In particular, $z^{*}Yz$ is purely imaginary. Plugging (D.55) into (D.54), we obtain

\text{Re}\left\{\left\langle\frac{\partial A(u)}{\partial u_{k}}\bigg{|}_{u=u^% {*}}\cdot z,z\right\rangle\right\}=0,\quad k=1,\ldots,d_{G}.

(D.56)
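The fact behind (D.56), namely that $z^{*}Yz$ has zero real part for any skew-Hermitian $Y$, can be checked numerically. The following sketch is illustrative only: the random construction $Y=M-M^{*}$, the dimension, and the helper names are assumptions made for this example.

```python
import random

def skew_hermitian(n, rng):
    """Build a random skew-Hermitian matrix as Y = M - M^*."""
    M = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
    return [[M[i][j] - M[j][i].conjugate() for j in range(n)] for i in range(n)]

def quad_form(Y, z):
    """Evaluate the quadratic form z^* Y z."""
    n = len(z)
    return sum(z[i].conjugate() * Y[i][j] * z[j] for i in range(n) for j in range(n))

rng = random.Random(0)
n = 4
Y = skew_hermitian(n, rng)
z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

q = quad_form(Y, z)
print(q.real, q.imag)   # the real part vanishes up to rounding error
```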

Then, substituting (D.56) into (D.53), we are left with

\text{Re}\left\{\left\langle Y_{k}\cdot z,x_{i}\right\rangle\right\}=0,\quad k% =1,\ldots,d_{G},

(D.57)

by which we can write the set $\mathcal{N}$ in (D.50) as

\mathcal{N}=\left\{z:\text{Re}\left\{\left\langle Y_{k}\cdot z,x_{i}\right% \rangle\right\}=0,\quad k=1,\ldots,d_{G},\quad z\in\mathcal{M}^{\prime}\right\}.

(D.58)
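The construction leading to (D.57) and (D.58) can be made explicit in the simplest case $d_{G}=1$. The sketch below assumes, purely for illustration, that $G=U(1)$ acts on $\mathbb{C}^{n}$ by scalar phases $x\mapsto e^{i\theta}x$, so that the Lie algebra is spanned by $\pm i\cdot I$ (the sign does not affect the orthogonality condition); the vectors `x_i` and `x` and the name `align_phase` are hypothetical. In this case (D.43) has a closed-form solution, and the resulting $z$ satisfies the constraint in (D.57).

```python
import cmath
import math

def inner(u, v):
    """Standard Hermitian inner product on C^n (conjugate-linear in v)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def align_phase(x, x_i):
    """Solve argmin_theta || e^{i theta} x - x_i || in closed form.

    Minimizing the distance is equivalent to maximizing Re{ e^{i theta} <x, x_i> },
    which gives theta = -arg <x, x_i> (assuming <x, x_i> is nonzero).
    """
    theta = -cmath.phase(inner(x, x_i))
    z = [cmath.exp(1j * theta) * c for c in x]
    return theta, z

x_i = [1 + 0.5j, -0.3j, 0.8 + 0j]
x = [0.2 - 1j, 0.4 + 0.1j, -0.5 + 0.6j]

theta, z = align_phase(x, x_i)
Yz = [1j * c for c in z]            # tangent vector to the orbit at z
print(inner(Yz, x_i).real)          # vanishes: z satisfies the constraint in (D.57)
```

For a general group such as $SU(2)$ no such closed form is assumed here, and the minimization would have to be carried out numerically.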

We now observe that $\mathcal{N}$ is the intersection of the open subset $\mathcal{M}^{\prime}$ of $\mathcal{M}$ with the subspace of $\mathbb{C}^{n}$ defined by the $d_{G}$ linear constraints in (D.57). In the following lemma we show that $\mathcal{N}$ is a $(d-d_{G})$ -dimensional submanifold in $\mathcal{M}$ .

Lemma 18.

The set $\mathcal{N}$ in (D.58) is a $(d-d_{G})$ -dimensional submanifold in $\mathcal{M}$ .

Proof.

In the following, we use the formulation of real manifolds in $\mathbb{C}^{n}$ presented in Section D.3. In particular, by using the map $\tilde{(\cdot)}$ in (D.29), let us define

\tilde{\mathcal{M}}=\left\{\tilde{x}\;:\;x\in\mathcal{M}\right\},\quad\tilde{x% }=\left(\text{Re}\left\{x\right\},\text{Im}\left\{x\right\}\right)^{T}.

(D.59)

Clearly, the manifold $\tilde{\mathcal{M}}$ is diffeomorphic to $\mathcal{M}$ , and by (D.30) and (D.31), the map $\tilde{(\cdot)}$ restricted to $\mathcal{M}$ is a Riemannian isometry, preserving the metric tensor of $\mathcal{M}$ . Furthermore, defining

\tilde{\mathcal{N}}=\left\{\tilde{z}\in\mathbb{R}^{2n}:\text{Re}\left\{\left\langle Y_{k}\cdot z,x_{i}\right\rangle_{\mathbb{C}^{n}}\right\}=0,\quad k=1,\ldots,d_{G},\quad z\in\mathcal{M}^{\prime}\right\},

(D.60)

we have that $z\in\mathcal{N}\iff\tilde{z}\in\tilde{\mathcal{N}}$ , that is, the map $\tilde{(\cdot)}$ restricted to $\mathcal{N}$ is a bijection (and an isometry) onto $\tilde{\mathcal{N}}$ . Thus, it suffices to show that $\tilde{\mathcal{N}}$ is a $(d-d_{G})$ -dimensional submanifold in $\tilde{\mathcal{M}}$ , which we now do.

The proof utilizes the implicit function theorem. By a theorem (see proposition 5.16 in [30]), there exists a neighborhood $\tilde{U}$ of $\tilde{x}_{i}$ in $\tilde{\mathcal{M}}$ , a diffeomorphism onto its image $\Phi:\tilde{U}\rightarrow\mathbb{R}^{2n-d}$ , and $c\in\mathbb{R}^{2n-d}$ such that $\tilde{U}$ can be parameterized as

\Phi(\tilde{u}_{1},\ldots,\tilde{u}_{2n})=c,

(D.61)

where $(\tilde{u}_{1},\ldots,\tilde{u}_{2n})=\tilde{u}\in\mathbb{R}^{2n}$ are coordinates for $\tilde{U}$ . In other words, the neighborhood $\tilde{U}\subset\tilde{\mathcal{M}}$ of $\tilde{x}_{i}$ is a level set of $\Phi$ . Now, consider the set of equations

$\displaystyle F_{1}(\tilde{u})=$	$\displaystyle\Phi_{1}(\tilde{u})-c_{1}=0,$
	$\displaystyle\vdots$
$\displaystyle F_{2n-d}(\tilde{u})=$	$\displaystyle\Phi_{2n-d}(\tilde{u})-c_{2n-d}=0.$	(D.62)

Since $\Phi$ is a diffeomorphism, its differential has full rank for all points in $\tilde{U}$ . Hence, the matrix

\begin{pmatrix}-&\nabla F_{1}&-\\ &\vdots&\\ -&\nabla F_{2n-d}&-\end{pmatrix}=\begin{pmatrix}-&\nabla\Phi_{1}&-\\ &\vdots&\\ -&\nabla\Phi_{2n-d}&-\end{pmatrix}

(D.63)

has full rank for all $\tilde{u}\in\tilde{U}$ .

Next, let $u=(\tilde{u}_{1},\ldots,\tilde{u}_{n})^{T}+i\cdot(\tilde{u}_{n+1},\ldots,% \tilde{u}_{2n})^{T}$ , and consider the set of equations

$\displaystyle H_{1}(\tilde{u})=$	$\displaystyle\text{Re}\left\{\left\langle Y_{1}\cdot u,x_{i}\right\rangle% \right\}=0,$
$\displaystyle\vdots$
$\displaystyle H_{d_{G}}(\tilde{u})=$	$\displaystyle\text{Re}\left\{\left\langle Y_{d_{G}}\cdot u,x_{i}\right\rangle% \right\}=0.$	(D.64)

By a direct computation, we get that

\begin{pmatrix}-&\nabla H_{1}&-\\ &\vdots&\\ -&\nabla H_{d_{G}}&-\\ \end{pmatrix}=-\begin{pmatrix}-&\widetilde{Y_{1}\cdot x_{i}}&-\\ &\vdots&\\ -&\widetilde{Y_{d_{G}}\cdot x_{i}}&-\end{pmatrix},

(D.65)

where $\widetilde{Y_{k}\cdot x_{i}}$ is the image of the map $\tilde{(\cdot)}$ applied to $Y_{k}\cdot x_{i}$ . Now, we observe that by (D.54), we have that

\frac{\partial A(u)}{\partial u_{k}}\cdot x_{i}=Y_{k}\cdot x_{i},\quad k\in% \left\{1,\ldots,d_{G}\right\}

(D.66)

hence, the vectors $Y_{1}\cdot x_{i},\ldots,Y_{d_{G}}\cdot x_{i}$ are the rows of the differential of the map $u\mapsto A(u)\cdot x_{i}$ (the orbit map induced by the exponential coordinates of Section D.4) at $x_{i}$ , which has full rank, since by assumption this map is a diffeomorphism. Thus, the vectors $Y_{1}\cdot x_{i},\ldots,Y_{d_{G}}\cdot x_{i}$ are linearly independent. Since the map $\tilde{(\cdot)}$ is an isometry, we infer that the vectors $\widetilde{Y_{1}\cdot x_{i}},\ldots,\widetilde{Y_{d_{G}}\cdot x_{i}}$ are also linearly independent, whence we get that (D.65) has full rank.

Next, we observe that since $A\mapsto Ax_{i}$ is a diffeomorphism onto $G\cdot x_{i}$ , by (D.66), the vectors $Y_{1}\cdot x_{i},\ldots,Y_{d_{G}}\cdot x_{i}$ reside in $T_{x_{i}}(G\cdot x_{i})\subset T_{x_{i}}\mathcal{M}$ , the tangent space to $G\cdot x_{i}$ at $x_{i}$ , and since $\tilde{(\cdot)}$ is a Riemannian isometry of $\mathcal{M}$ onto $\tilde{\mathcal{M}}$ , we conclude that the vectors $\widetilde{Y_{1}\cdot x_{i}},\ldots,\widetilde{Y_{d_{G}}\cdot x_{i}}$ are tangent to $\tilde{\mathcal{M}}$ . On the other hand, the vectors $\nabla F_{1},\ldots,\nabla F_{2n-d}$ are all perpendicular to the neighborhood $\tilde{U}$ of $\tilde{x}_{i}$ , since it is defined as the level set $F(\tilde{u})=0$ , and therefore, they are perpendicular to all the vectors $\widetilde{Y_{1}\cdot x_{i}},\ldots,\widetilde{Y_{d_{G}}\cdot x_{i}}$ tangent to $\tilde{\mathcal{M}}$ at $\tilde{x}_{i}$ . Hence, the $(2n-(d-d_{G}))\times(2n)$ matrix

\begin{pmatrix}-&\nabla F_{1}&-\\ &\vdots&\\ -&\nabla F_{2n-d}&-\\ \\ -&\nabla H_{1}&-\\ &\vdots&\\ -&\nabla H_{d_{G}}&-\end{pmatrix}

(D.67)

has full rank at $\tilde{x}_{i}$ . Thus, there exists a subset of $2n-(d-d_{G})$ columns of (D.67) that form a $(2n-(d-d_{G}))\times(2n-(d-d_{G}))$ matrix $\tilde{D}_{\tilde{x_{i}}}$ , which has full rank. In particular, we have that $\det\left(\tilde{D}_{\tilde{x_{i}}}\right)\neq 0$ . Lastly, by (D.57), the point $\tilde{x}_{i}$ is a solution of (D.64), and by construction, also a solution of (D.62), and thus a solution of the combined system whose Jacobian is (D.67). Hence, by the implicit function theorem, there exists an open subset $\tilde{V}\subset\tilde{U}$ , and open subsets $\tilde{U}_{1}$ and $\tilde{U}_{2}$ such that $\tilde{V}=\tilde{U}_{1}\times\tilde{U}_{2}$ , and coordinates $\tilde{u}_{i_{1}},\ldots,\tilde{u}_{i_{d-d_{G}}}\in\tilde{U}_{1}$ , and smooth functions $g_{1},\ldots,g_{2n-(d-d_{G})}$ from $\tilde{U}_{1}$ onto $\tilde{U}_{2}$ such that

\tilde{V}=\left\{(\tilde{u}_{i_{1}},\ldots,\tilde{u}_{i_{d-d_{G}}},g_{1},% \ldots,g_{2n-(d-d_{G})})\;:\;\tilde{u}_{i_{1}},\ldots,\tilde{u}_{i_{d-d_{G}}}% \in\tilde{U}_{1}\right\}.

(D.68)

We can now redefine the set $\tilde{\mathcal{N}}$ in (D.60) as

\tilde{\mathcal{N}}=\left\{\tilde{z}\in\mathbb{R}^{2n}:\text{Re}\left\{\left% \langle Y_{k}\cdot z,x_{i}\right\rangle_{\mathbb{C}^{n}}\right\}=0,\quad k=1,% \ldots,d_{G},\quad z\in V\right\},

(D.69)

where

V=\left\{x\;:\;\tilde{x}\in\tilde{V}\right\}.

(D.70)

By (D.68), we conclude that $\tilde{\mathcal{N}}$ is a $(d-d_{G})$ -dimensional smooth submanifold in $\tilde{\mathcal{M}}$ of (D.59). Now, we can redefine $\mathcal{N}$ in (D.58) as

\mathcal{N}=\left\{z\;:\;\tilde{z}\in\tilde{\mathcal{N}}\right\}.

(D.71)

Moreover, we can take $\tilde{V}$ to be closed in $\mathbb{C}^{n}$ and small enough so that $V\subset\mathcal{M}^{\prime}$ , which guarantees that the problem (D.43) has a unique solution for each $x$ in $\mathcal{M}^{\prime}=G\cdot\mathcal{N}$ . Furthermore, since $\mathcal{N}$ and $\mathcal{M}$ are isometric to $\tilde{\mathcal{N}}$ and $\tilde{\mathcal{M}}$ , respectively, we conclude that $\mathcal{N}$ is a $(d-d_{G})$ -dimensional compact submanifold in $\mathcal{M}$ . ∎

Next, we show how to integrate over $\mathcal{M}^{\prime}$ using our $G$ -invariant parametrization. Let $z(w)=z\left(w_{1},\ldots,w_{d-d_{G}}\right)$ denote some coordinate chart on $\mathcal{N}$ in (D.58), and let $A(u)=A(u_{1},\ldots,u_{d_{G}})$ be the coordinate chart on $G$ in (D.39). The integral of a smooth function $h(x)$ over $\mathcal{M}^{\prime}$ is given by the change of variables (see [46])

\int_{\mathcal{M}^{\prime}}h(x)dx=\int_{z\in\mathcal{N}}\int_{G}h(A\cdot z)dV(% A\cdot z),

(D.72)

where we denote

V(A\cdot z)=\sqrt{\left|\det\left\{g_{\mathcal{M}^{\prime}}(A(u)\cdot z(w))% \right\}\right|}

(D.73)

and

dV(A\cdot z)=\sqrt{\left|\det\left\{g_{\mathcal{M}^{\prime}}(A(u)\cdot z(w))% \right\}\right|}dw_{1}\ldots dw_{d-d_{G}}du_{1}\ldots du_{d_{G}},

(D.74)

is the volume form at $x=A\cdot z$ , and $g_{\mathcal{M}^{\prime}}(x)$ is the metric tensor on $\mathcal{M}^{\prime}$ given by

g_{\mathcal{M}^{\prime}}(x)=\text{Re}\left\{J^{*}_{\mathcal{M}^{\prime}}(x)J_{% \mathcal{M}^{\prime}}(x)\right\},

(D.75)

and $J_{\mathcal{M}^{\prime}}$ is the Jacobian change of variables matrix, given explicitly by

J_{\mathcal{M}^{\prime}}(w,u)=\bigg{(}J_{w}\quad J_{u}\bigg{)},\quad J_{w}=% \left(\frac{\partial x}{\partial w_{1}}\cdots\frac{\partial x}{\partial w_{d-d% _{G}}}\right),\quad J_{u}=\left(\frac{\partial x}{\partial u_{1}}\cdots\frac{% \partial x}{\partial u_{d_{G}}}\right).

(D.76)

In the following section we prove Lemma 14, which requires a careful asymptotic approximation of the second moment $\mathbb{E}\left[\left(H_{i}\right)^{2}\right]$ in (D.25) with respect to the uniform distribution over $\mathcal{M}$ . The proof employs the relationship between the Haar measure on $G$ , and a certain measure induced by our $G$ -invariant parametrization on orbits of the form $G\cdot z$ , which we now define.

First, we note that the diffeomorphism $A\mapsto A\cdot z$ of $G$ onto $G\cdot z$ admits an inverse map $\Phi:G\cdot z\rightarrow G$ given by

\Phi(A\cdot z)=A,\quad A\cdot z\in G\cdot z,

(D.77)

which induces a topology on $G\cdot z$ given by

\mathcal{T}_{G\cdot z}=\left\{\Phi^{-1}(H)\;|\;H\text{ is a Borel measurable % subset in }G\right\}.

(D.78)

Next, consider the function $\mu_{z}$ over $\mathcal{T}_{G\cdot z}$ defined by

\mu_{z}(F)=\int_{u:A(u)\in\Phi(F)}d\mu_{z}(u),

(D.79)

where

d\mu_{z}(u)\coloneqq\sqrt{\left|\det(\text{Re}\{J_{u}^{*}\left(A(u)\cdot z% \right)J_{u}\left(A(u)\cdot z\right)\})\right|}du,

(D.80)

and $J_{u}$ was defined in (D.76). The following lemma asserts that $\mu_{z}$ is a measure over $G\cdot z$ , and characterizes its relationship to the Haar measure on $G$ .

Lemma 19.

For every $z\in\mathcal{N}$ , the function $\mu_{z}$ is a measure over $G\cdot z$ with the topology $\mathcal{T}_{G\cdot z}$ . Furthermore, define the pushforward of $\mu_{z}$ by the map $\Phi$ in (D.77), as the function $\Phi_{*}(\mu_{z})$ over the Borel $\sigma$ -algebra of $G$ given by

\Phi_{*}(\mu_{z})(H)=\mu_{z}(\Phi^{-1}(H)),

(D.81)

for every Borel subset $H\subseteq G$ . Then, with probability one we have that $\mu_{z}$ is a measure over $G\cdot z$ . Furthermore, there exists a constant $\mu(z)>0$ such that

\Phi_{*}(\mu_{z})(H)=\mu(z)\eta(H),\quad H\subseteq G,

(D.82)

where $\eta$ is the Haar measure over $G$ .

Proof.

To see that $\mu_{z}$ is a measure, first we note that since $\sqrt{\left|\det(\text{Re}\{J_{u}^{*}J_{u}\})\right|}$ is non-negative then so is $\mu_{z}(\cdot)$ . Furthermore, $\mu_{z}(\cdot)$ is bounded since for any $F\in\mathcal{T}_{G\cdot z}$ we have that

\mu_{z}(F)=\int_{u:A(u)\in\Phi(F)}d\mu_{z}(u)\leq\int_{u:A(u)\in G}d\mu_{z}(u)% =\text{Vol}(G\cdot z)<\infty,

(D.83)

where the last inequality is due to the fact that $G\cdot z$ is compact. Thus, it only remains to show that $\mu_{z}(\cdot)$ is countably additive over $\mathcal{T}_{G\cdot z}$ . Indeed, the map $\Phi$ being a homeomorphism preserves the topology of $G$ (see [32]), and in particular, it holds that for any countable family of disjoint open sets $F_{1},F_{2},\ldots\in\mathcal{T}_{G\cdot z}$ we have

\Phi\left(\bigcup_{k=1}^{\infty}F_{k}\right)=\bigcup_{k=1}^{\infty}\Phi\left(F% _{k}\right).

(D.84)

In other words, the map $\Phi$ preserves disjoint unions. Thus, by (D.79) and (D.84) we have

\mu_{z}\left(\bigcup_{k=1}^{\infty}F_{k}\right)=\sum_{k=1}^{\infty}\mu_{z}(F_{% k}).

(D.85)

We conclude that $\mu_{z}(\cdot)$ is a measure over $G\cdot z$ , the latter having the topology of $G$ (induced by $\Phi$ ).

Next, we show that $\mu_{z}$ is left invariant under the action of $G$ , that is, for any measurable subset $F\in\mathcal{T}_{G\cdot z}$ , we have

\mu_{z}(B\cdot F)=\mu_{z}(F),\quad B\in G.

(D.86)

Indeed, by (D.76) we have

J_{u}=\left[J_{u_{1}}\;\cdots\;J_{u_{d_{G}}}\right]=\left[\frac{\partial A}{% \partial u_{1}}\cdot z\;\cdots\;\frac{\partial A}{\partial u_{d_{G}}}\cdot z% \right],

(D.87)

and thus

(J_{u}^{*}J_{u})_{ij}=z^{*}\left(\frac{\partial A}{\partial u_{i}}\right)^{*}% \frac{\partial A}{\partial u_{j}}z,\quad 1\leq i,j\leq d_{G}.

(D.88)

Thus, for a fixed $B\in G$ , since $B$ is unitary and does not depend on $u$ , we have

z^{*}\left(\frac{\partial\left(B\cdot A\right)}{\partial u_{i}}\right)^{*}% \frac{\partial\left(B\cdot A\right)}{\partial u_{j}}z=z^{*}\left(\frac{% \partial A}{\partial u_{i}}\right)^{*}\frac{\partial A}{\partial u_{j}}z.

(D.89)

Now, the map $\Phi:G\cdot z\rightarrow G$ induces a measure $\Phi_{*}(\mu_{z})$ on $G$ via pushforward, defined explicitly by

\Phi_{*}(\mu_{z})(H)=\mu_{z}(\Phi^{-1}(H)),

(D.90)

for every Borel measurable subset $H\subseteq G$ . Intuitively, the function $\Phi_{*}(\mu_{z})$ measures the volume of a subset $H\subseteq G$ by first mapping $H$ into the orbit $G\cdot z$ , and then measuring the volume of the image $H\cdot z\subseteq G\cdot z$ . By (D.86), for a fixed $A\in G$ and any open subset $H\subseteq G$ we have

\Phi_{*}(\mu_{z})(AH)=\mu_{z}(\Phi^{-1}(AH))=\mu_{z}(AH\cdot z)=\mu_{z}(H\cdot z% )=\Phi_{*}(\mu_{z})(H),

(D.91)

which shows that the measure $\Phi_{*}(\mu_{z})$ is left-invariant. By Haar’s theorem for compact groups there exists, up to multiplication by a positive scalar, a unique left invariant measure over $G$ . It follows that for every $z\in\mathcal{N}$ there exists a positive scalar $\mu(z)\in\mathbb{R}$ such that

\mu(z)\eta(H)=\Phi_{*}(\mu_{z})(H),\quad H\subseteq G,

(D.92)

which in turn implies that $\Phi_{*}(\mu_{z})$ is related to the Haar measure by

\eta(H)=\frac{\Phi_{*}(\mu_{z})(H)}{\mu(z)},\quad z\in\mathcal{N}.

(D.93)

In particular, plugging $H=G$ into (D.93), and using (3.5) we get

\mu(z)=\frac{\Phi_{*}(\mu_{z})(G)}{\eta(G)}=\mu_{z}(\Phi^{-1}(G))=\mu_{z}(G% \cdot z),

(D.94)

which shows that $\mu(z)$ is the volume of the orbit $G\cdot z$ . By assumption, with probability one we have that $A\cdot z\neq z$ for all $A\in G$ with $A\neq I$ , and thus the map $\Phi^{-1}(A)=A\cdot z$ is a diffeomorphism of $G$ onto $G\cdot z$ . Hence, we conclude that $\mu_{z}(G\cdot z)>0$ . ∎
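As a minimal illustration of Lemma 19, consider again the assumed toy case $G=U(1)$ acting by scalar phases, with the single chart $u\mapsto e^{iu}z$ on $[0,2\pi)$. The density in (D.80) is then constant in $u$, which reflects the invariance (D.86), and $\mu(z)$ equals the length $2\pi\lVert z\rVert$ of the orbit. The point `z`, the finite-difference Jacobian, and the name `density` are hypothetical choices for this sketch.

```python
import cmath
import math

def density(z, u, h=1e-6):
    """d mu_z / du = sqrt(Re{J_u^* J_u}) for the orbit map u -> e^{iu} z (d_G = 1)."""
    dA = (cmath.exp(1j * (u + h)) - cmath.exp(1j * (u - h))) / (2 * h)  # dA/du
    J = [dA * c for c in z]
    return math.sqrt(sum(abs(c) ** 2 for c in J))

z = [0.6 + 0.2j, -1.0j, 0.3 + 0j]
norm_z = math.sqrt(sum(abs(c) ** 2 for c in z))

# Integrate the density over the chart domain to obtain mu(z) = mu_z(G . z).
num = 1000
step = 2 * math.pi / num
mu = step * sum(density(z, k * step) for k in range(num))

print(density(z, 0.0), density(z, 2.0), norm_z)   # constant density equal to ||z||
print(mu, 2 * math.pi * norm_z)                   # orbit volume
```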

D.6 Proof of Lemma 14

In this section, we evaluate the second order moment $\mathbb{E}\left[\left(H_{i}\right)^{2}\right]$ , which appears in the evaluation of (D.21). The evaluation of the remaining second order moments in (D.21) is done in a very similar fashion.

Now, recall that in Section D.5 we constructed a $G$ -invariant parametrization of a certain neighborhood $\mathcal{M}^{\prime}\subset\mathcal{M}$ of the data point $x_{i}$ , such that $\left\lVert x-x_{i}\right\rVert>\delta$ for all $x\notin\mathcal{M}^{\prime}$ . Thus, by (D.3) we have

\mathbb{E}\left[\left(H_{i}\right)^{2}\right]=\int_{\mathcal{M}}H_{i}^{2}(x)p(x)dx=\int_{\mathcal{M}^{\prime}}H_{i}^{2}(x)p(x)dx+O\left(e^{-\delta^{2}/\epsilon}\right),

(D.95)

where the second equality stems from the fact that, by (D.49), every $x\in\mathcal{M}\setminus\mathcal{M}^{\prime}$ is at distance at least $\delta$ from the orbit $G\cdot x_{i}$ , and the integrand of $H_{i}(x)$ is a Gaussian of width $\epsilon$ centered at $x_{i}$ . We point out that the exponentially small error term on the r.h.s of (D.95) is negligible with respect to the polynomial asymptotic error in (D.25) that we are about to derive, and thus will be dropped in all subsequent analysis. Furthermore, for a fixed $x\in\mathcal{M}^{\prime}$ there exist $z\in\mathcal{N}$ and $B^{\prime}\in G$ such that

H_{i}(x)=\int_{G}e^{-\lVert x_{i}-B\cdot B^{\prime}z\rVert^{2}/\epsilon}f(B\cdot B^{\prime}z)d\eta(B)=\int_{G}e^{-\lVert x_{i}-B\cdot z\rVert^{2}/\epsilon}f(B\cdot z)d\eta(B)=H_{i}(z),

(D.96)

where in the second equality we used the change of variables $B\mapsto B\cdot B^{\prime}$ together with the right invariance of the Haar measure $\eta$ on a compact group. Therefore, continuing from (D.95) and using (D.72) we can write

\begin{aligned}\mathbb{E}\left[\left(H_{i}\right)^{2}\right]&=\int_{\mathcal{N}}\int_{G}\left(H_{i}(B\cdot z)\right)^{2}\frac{1}{\text{Vol}\left(\mathcal{M}\right)}V(B\cdot z)d\eta(B)dz\\ &=\int_{\mathcal{N}}\left(H_{i}(z)\right)^{2}p_{\mathcal{N}}(z)dz,\end{aligned}

(D.97)

where we defined

p_{\mathcal{N}}(z)=\frac{1}{\text{Vol}\left(\mathcal{M}\right)}\int_{G}V(B\cdot z)d\eta(B),

(D.98)

where we used that $\text{Vol}(G)=1$ .
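The orbit invariance $H_{i}(B^{\prime}\cdot z)=H_{i}(z)$ in (D.96) can be checked numerically in a toy setting. The sketch below assumes $G=U(1)$ acting on $\mathbb{C}^{2}$ by $A(\theta)\cdot x=(e^{i\theta}x_{1},e^{2i\theta}x_{2})$ , replaces the Haar integral by an average over an equispaced angle grid, and uses an arbitrary test function; the action, the function `f`, and all names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def act(theta, x):
    """Hypothetical unitary U(1) action on C^2: diag(e^{i theta}, e^{2i theta})."""
    phases = np.exp(1j * np.outer(theta, [1.0, 2.0]))  # shape (K, 2)
    return phases * x                                   # one orbit point per row

def f(x):
    """Arbitrary smooth test function on C^2 (an assumption for illustration)."""
    return np.real(x[..., 0]) + np.abs(x[..., 1]) ** 2

def H(x_i, x, eps, K=4096):
    """Quadrature for H_i(x) = int_G exp(-|x_i - B.x|^2 / eps) f(B.x) d eta(B)."""
    theta = 2 * np.pi * np.arange(K) / K               # equispaced grid on U(1)
    orbit = act(theta, x)                              # points B.x, shape (K, 2)
    dist2 = np.sum(np.abs(x_i - orbit) ** 2, axis=1)
    return np.mean(np.exp(-dist2 / eps) * f(orbit))    # Haar measure has total mass 1

x_i = np.array([1.0 + 0.0j, 0.5 + 0.0j])
z = np.array([0.9 + 0.1j, 0.4 - 0.2j])
eps = 0.5

h_z = H(x_i, z, eps)
h_shifted = H(x_i, act(np.array([0.7]), z)[0], eps)    # evaluate at B'.z, B' = A(0.7)
print(abs(h_z - h_shifted))                            # agreement up to quadrature error
```

Because the integrand is a smooth periodic function of the angle, the equispaced average is a very accurate quadrature, so the two values should agree up to rounding even though the shift $0.7$ is not a grid point.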

Next, writing

\left\lVert x_{i}-B\cdot z\right\rVert^{2}=\left\lVert x_{i}-z+z-B\cdot z\right\rVert^{2}=\left\lVert x_{i}-z\right\rVert^{2}+2\text{Re}\left\{\left\langle x_{i}-z,z-B\cdot z\right\rangle\right\}+\left\lVert z-B\cdot z\right\rVert^{2},

and defining

\delta_{i}(z,x)\coloneqq 2\text{Re}\left\{\left\langle x_{i}-z,z-x\right\rangle\right\},

(D.99)

we have

H_{i}(z)=e^{-\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}e^{-\delta_{i}(z,B\cdot z)/\epsilon}f(B\cdot z)d\eta(B).

Taylor expanding the function $e^{-x}$ around $x=0$ and evaluating at $x=\delta_{i}(z,B\cdot z)/\epsilon$ , we get

e^{-\delta_{i}(z,B\cdot z)/\epsilon}=1+O\left(\frac{\delta_{i}(z,B\cdot z)}{\epsilon}\right),

from which we have

\begin{aligned}H_{i}(z)=e^{-\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}\bigg[&\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}f(B\cdot z)d\eta(B)\\ &+O\left(\frac{1}{\epsilon}\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}\delta_{i}(z,B\cdot z)f(B\cdot z)d\eta(B)\right)\bigg].\end{aligned}

(D.100)

We now proceed to evaluate (D.100) term by term.

For the first term, we have

\begin{aligned}\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}f(B\cdot z)d\eta(B)&=\frac{1}{\mu(z)}\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}f(B\cdot z)d\Phi_{*}(\mu_{z})(B)\\ &=\frac{1}{\mu(z)}\int_{G}e^{-\left\lVert z-\Phi^{-1}(B)\right\rVert^{2}/\epsilon}f\left(\Phi^{-1}(B)\right)d\Phi_{*}(\mu_{z})(B)\\ &=\frac{1}{\mu(z)}\int_{G\cdot z}e^{-\left\lVert z-x\right\rVert^{2}/\epsilon}f(x)d\mu_{z}(x)\\ &=\frac{\left(\pi\epsilon\right)^{d_{G}/2}}{\mu(z)}(f(z)+O(\epsilon)),\end{aligned}

(D.101)

where we used (D.90) and (D.93) in the first equality, the definition of the map $\Phi$ in (D.77) in the second equality, the change of variables theorem for pushforward measures (see Theorem 3.6.1. in [6]) for $x=\Phi^{-1}(B)$ in the third one, and Theorem 13 in the last equality, where we note that $d\mu_{z}(x)$ is the Riemannian volume element of the $d_{G}$ -dimensional manifold $G\cdot z$ .
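As a sanity check on the leading term of (D.101), the following sketch evaluates the orbit integral numerically in a toy case. It assumes $G=U(1)$ acting on $\mathbb{C}$ by multiplication, so that $d_{G}=1$ , the orbit of $z$ is a circle of radius $|z|$ , and $\mu(z)=2\pi|z|$ . The group, the test function `f`, and the quadrature are assumptions made only for illustration.

```python
import numpy as np

def orbit_integral(z, f, eps, K=1 << 16):
    """Approximate int_G exp(-|z - B.z|^2 / eps) f(B.z) d eta(B) for G = U(1) on C."""
    theta = 2 * np.pi * np.arange(K) / K      # equispaced quadrature nodes on U(1)
    orbit = np.exp(1j * theta) * z            # the orbit G.z, a circle of radius |z|
    weights = np.exp(-np.abs(z - orbit) ** 2 / eps)
    return np.mean(weights * f(orbit))        # normalized Haar measure

def leading_term(z, f, eps, d_G=1):
    """Right-hand side of (D.101) without the O(eps) correction."""
    mu = 2 * np.pi * np.abs(z)                # orbit volume mu(z) in this toy case
    return (np.pi * eps) ** (d_G / 2) / mu * f(z)

f = lambda x: 2.0 + np.real(x)                # hypothetical smooth test function
z = 1.0 + 0.0j

# Ratio of the numerical integral to the leading term; expected to be 1 + O(eps).
ratios = {eps: orbit_integral(z, f, eps) / leading_term(z, f, eps)
          for eps in (1e-1, 1e-2, 1e-3)}
for eps, ratio in ratios.items():
    print(eps, ratio)
```

In this toy case the ratios should approach one as $\epsilon$ decreases, with a deviation that shrinks roughly in proportion to $\epsilon$ , consistent with the $O(\epsilon)$ remainder in (D.101).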

For the second term in (D.100), using the same change of variables as in (D.101) we have

\begin{aligned}&\frac{1}{\epsilon}\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}\delta_{i}(z,B\cdot z)f(B\cdot z)d\eta(B)=\frac{1}{\epsilon\mu(z)}\int_{G\cdot z}e^{-\left\lVert z-x\right\rVert^{2}/\epsilon}\delta_{i}(z,x)f(x)d\mu_{z}(x)\\ &=\frac{\left(\pi\epsilon\right)^{d_{G}/2}}{\epsilon\mu(z)}\bigg[\delta_{i}(z,z)f(z)+\frac{\epsilon}{4}\bigg[E(z)\delta_{i}(z,z)+\Delta_{G\cdot z}\biggl\{\delta_{i}(z,x)f(x)\biggr\}\bigg|_{x=z}\bigg]+O(\epsilon^{2})\bigg]\\ &=\frac{\left(\pi\epsilon\right)^{d_{G}/2}}{\epsilon\mu(z)}\bigg[\frac{\epsilon}{4}\Delta_{G\cdot z}\biggl\{\delta_{i}(z,x)f(x)\biggr\}\bigg|_{x=z}+O(\epsilon^{2})\bigg],\end{aligned}

(D.102)

where we used that $\delta_{i}(z,z)=0$ by (D.99). Furthermore, we have

\begin{aligned}\Delta_{G\cdot z}\biggl\{\delta_{i}(z,x)f(x)\biggr\}\big|_{x=z}&=\Delta_{G\cdot z}f(x)\big|_{x=z}\cdot\delta_{i}(z,z)-2\left\langle\nabla_{G\cdot z}\delta_{i}(z,x)\big|_{x=z},\nabla_{G\cdot z}f(z)\right\rangle+f(z)\cdot\Delta_{G\cdot z}\delta_{i}(z,x)\big|_{x=z}\\ &=f(z)\Delta_{G\cdot z}\delta_{i}(z,x)\big|_{x=z},\end{aligned}

(D.103)-(D.104)

where we have used the multivariate version of the formula for the second derivative of a product of functions (see Lemma 3.3 in [36]), and that by (D.99)

\nabla_{G\cdot z}\delta_{i}(z,x)\big|_{x=z}=-2\text{Re}\left\{\begin{pmatrix}|&\cdots&|\\ \nabla_{G\cdot z}x_{1}&\cdots&\nabla_{G\cdot z}x_{n}\\ |&\cdots&|\end{pmatrix}^{*}\cdot(x_{i}-z)\right\}=0,

(D.105)

where we used the fact that $x$ in (D.105) is a coordinate function on $G\cdot z$ , that is, the function $x(p)$ returns the coordinates in $\mathbb{C}^{n}$ of $p\in G\cdot z$ , and thus the vectors $\nabla_{G\cdot z}x_{k}|_{x=z}$ reside in the tangent space $T_{z}\left\{G\cdot z\right\}$ to $G\cdot z$ at $x=z$ , and that, by construction, the vector $x_{i}-z$ is perpendicular to $T_{z}\left\{G\cdot z\right\}$ (see (D.53)). Continuing, we now define the function

q(z)\coloneqq\frac{f(z)}{4}\Delta_{G\cdot z}\delta_{i}(z,x)\big|_{x=z}=-\frac{f(z)}{2}\text{Re}\big\{\left\langle x_{i}-z,\Delta_{G\cdot z}x\big|_{x=z}\right\rangle\big\},

(D.106)

for all $z\in\mathcal{N}$ , where the expression $\Delta_{G\cdot z}x\big|_{x=z}$ is the vector valued function whose entries are the Laplacians of each coordinate function of the parametrization of $G\cdot z$ via $x(B)=B\cdot z$ , evaluated at $z$ . By the Cauchy-Schwarz inequality combined with the compactness of $G\cdot z$ we get

q(x_{i})=0,\quad q(z)=O(\left\lVert x_{i}-z\right\rVert),

(D.107)

which leads to

\frac{1}{\epsilon}\int_{G}e^{-\left\lVert z-B\cdot z\right\rVert^{2}/\epsilon}\delta_{i}(z,B\cdot z)f(B\cdot z)d\eta(B)=\frac{(\pi\epsilon)^{d_{G}/2}}{\epsilon\mu(z)}[\epsilon q(z)+O(\epsilon^{2})]=\frac{(\pi\epsilon)^{d_{G}/2}}{\mu(z)}[q(z)+O(\epsilon)].

(D.108)
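For completeness, the bound (D.107) used above can be spelled out as follows. This is a sketch under the assumption that $f$ and $\Delta_{G\cdot z}x\big|_{x=z}$ are bounded on the neighborhood under consideration, with $C$ denoting a constant (introduced here only for illustration) that bounds $\frac{1}{2}\left|f(z)\right|\left\lVert\Delta_{G\cdot z}x\big|_{x=z}\right\rVert$ .

```latex
\left| q(z) \right|
  = \frac{\left| f(z) \right|}{2}
    \left| \text{Re}\left\{ \left\langle x_{i}-z,\ \Delta_{G\cdot z}x\big|_{x=z} \right\rangle \right\} \right|
  \leq \frac{\left| f(z) \right|}{2}
    \left\lVert x_{i}-z \right\rVert
    \left\lVert \Delta_{G\cdot z}x\big|_{x=z} \right\rVert
  \leq C \left\lVert x_{i}-z \right\rVert .
```

Setting $z=x_{i}$ makes the first argument of the inner product vanish, which gives $q(x_{i})=0$ .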

Now, substituting (D.101) and (D.108) into (D.100) we have that

H_{i}(z)=\frac{e^{-\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}}{\mu(z)}\left(\pi\epsilon\right)^{d_{G}/2}\bigg[f(z)+O(q(z))+O\left(\epsilon\right)\bigg].

(D.109)

Thus, we get

\left(H_{i}(z)\right)^{2}=\frac{e^{-2\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}}{\mu^{2}(z)}\left(\pi\epsilon\right)^{d_{G}}\big[f^{2}(z)+O(2f(z)q(z))+O\left(\left\lVert x_{i}-z\right\rVert^{2}\right)+O\left(\epsilon\right)\big],

(D.110)

after suppressing higher order terms. Plugging (D.110) into (D.97), we get

\begin{aligned}\mathbb{E}\left[\left(H_{i}(x)\right)^{2}\right]&=\int_{\mathcal{N}}\left(H_{i}(z)\right)^{2}p_{\mathcal{N}}(z)dz\\ &=\left(\pi\epsilon\right)^{d_{G}}\int_{\mathcal{N}}\frac{e^{-2\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}}{\mu^{2}(z)}\big[f^{2}(z)+O(2f(z)q(z))+O\left(\left\lVert x_{i}-z\right\rVert^{2}\right)+O\left(\epsilon\right)\big]p_{\mathcal{N}}(z)dz.\end{aligned}

(D.111)

Applying Theorem 13 to each term inside the integral (D.111), we have that

\int_{\mathcal{N}}e^{-2\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}\frac{f^{2}(z)p_{\mathcal{N}}(z)}{\mu^{2}(z)}dz=\left(\pi\epsilon/2\right)^{(d-d_{G})/2}\bigg[\frac{f^{2}(x_{i})p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon)\bigg],

(D.112)

and that

\begin{aligned}\int_{\mathcal{N}}e^{-2\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}\frac{f(z)q(z)p_{\mathcal{N}}(z)}{\mu^{2}(z)}dz&=\left(\pi\epsilon/2\right)^{(d-d_{G})/2}\bigg[\frac{f(x_{i})q(x_{i})p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon)\bigg]\\ &=(\pi\epsilon/2)^{\left(d-d_{G}\right)/2}\cdot O(\epsilon),\end{aligned}

(D.113)

where we used that by (D.107) we have that $q(x_{i})=0$ , and using that $\left\lVert x_{i}-z\right\rVert^{2}$ vanishes at $z=x_{i}$ we get that

\int_{\mathcal{N}}\frac{e^{-2\left\lVert x_{i}-z\right\rVert^{2}/\epsilon}}{\mu^{2}(z)}O(\left\lVert x_{i}-z\right\rVert^{2})p_{\mathcal{N}}(z)dz=\left(\pi\epsilon/2\right)^{(d-d_{G})/2}\cdot O(\epsilon).

(D.114)
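The extra factor of $\epsilon$ in (D.114) can be seen in a flat model. As an illustrative assumption, replace $\mathcal{N}$ by $\mathbb{R}^{m}$ with $m=d-d_{G}$ , place $x_{i}$ at the origin, and drop the smooth density factors; the second moment of the Gaussian is then available in closed form:

```latex
\int_{\mathbb{R}^{m}} e^{-2\lVert u\rVert^{2}/\epsilon}\,\lVert u\rVert^{2}\,du
  = \left(\frac{\pi\epsilon}{2}\right)^{m/2}\cdot\frac{m\,\epsilon}{4},
\qquad m = d-d_{G}.
```

Compared with the zeroth moment $\left(\pi\epsilon/2\right)^{m/2}$ , the quadratic weight contributes one additional power of $\epsilon$ , which is the scaling stated in (D.114).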

Finally, plugging (D.112), (D.113) and (D.114) into (D.111), we get that

\mathbb{E}\left[\left(H_{i}(x)\right)^{2}\right]=\frac{(\pi\epsilon)^{\left(d+d_{G}\right)/2}}{2^{(d-d_{G})/2}}\bigg[\frac{f^{2}(x_{i})p_{\mathcal{N}}(x_{i})}{\mu^{2}(x_{i})}+O(\epsilon)\bigg],

(D.115)

which finishes the proof of (D.25) in Lemma 14.
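The prefactor in (D.115) comes from combining the factor $\left(\pi\epsilon\right)^{d_{G}}$ in (D.111) with the factor $\left(\pi\epsilon/2\right)^{(d-d_{G})/2}$ in (D.112)-(D.114). The bookkeeping of the exponents is:

```latex
\left(\pi\epsilon\right)^{d_{G}}\left(\frac{\pi\epsilon}{2}\right)^{(d-d_{G})/2}
  = \frac{\left(\pi\epsilon\right)^{d_{G}+(d-d_{G})/2}}{2^{(d-d_{G})/2}}
  = \frac{\left(\pi\epsilon\right)^{(d+d_{G})/2}}{2^{(d-d_{G})/2}} .
```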

Appendix E Non-uniform sampling distribution

First, let us compute the limiting operator resulting from assuming a non-uniform sampling distribution $p(x)$ in the setting of Theorem 11, by repeating the analysis of the bias error at the beginning of D.1 under this assumption. Fixing an $\epsilon>0$ , we compute the limit of (D.1) as $N\rightarrow\infty$ . By (D.2)-(D.5), we have that the limit of $C_{i,N}^{1}$ in (D.2) evaluates as

\begin{aligned}\lim_{N\rightarrow\infty}C_{i,N}^{1}&=\int_{\mathcal{M}}H_{i}(x)p(x)d\omega(x)\\ &=\int_{\mathcal{M}}\int_{G}\exp{\left\{-{\left\|x_{i}-A\cdot x\right\|^{2}}{/\varepsilon}\right\}}f(A\cdot x)p(x)d\eta(A)d\omega(x).\end{aligned}

(E.1)

Making the change of variables $y=A\cdot x$ , and using (D.6)-(D.8) we have that

\begin{aligned}\lim_{N\rightarrow\infty}C_{i,N}^{1}&=\int_{G}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)p(A^{*}\cdot y)dU_{A}^{*}(\omega)(y)d\eta(A)\\ &=\int_{G}\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)p(A^{*}\cdot y)d\omega(y)d\eta(A)\\ &=\int_{\mathcal{M}}\int_{G}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)p(A^{*}\cdot y)d\eta(A)d\omega(y)\\ &=\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)\tilde{p}(y)d\omega(y),\end{aligned}

(E.2)

where we defined

\tilde{p}(y)=\int_{G}p(A^{*}\cdot y)d\eta(A).

(E.3)

Similarly, we get that the limit of $C_{i,N}^{2}$ in (D.10) as $N\rightarrow\infty$ is given by

\begin{aligned}\lim_{N\rightarrow\infty}C_{i,N}^{2}&=\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{j=1}^{N}\int_{G}\exp{\left\{-{\left\|x_{i}-B\cdot x_{j}\right\|^{2}}{/\varepsilon}\right\}}d\eta(B)\\ &=\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}\tilde{p}(y)d\omega(y).\end{aligned}

(E.4)

By (E.2) and (E.4), we obtain that the limit of (D.1) when $N\rightarrow\infty$ is given by

\begin{aligned}\lim_{N\rightarrow\infty}\frac{4}{\varepsilon}\left\{\tilde{L}g\right\}(i,I)&=\lim_{N\rightarrow\infty}\frac{4}{\varepsilon}\left[f(x_{i})-\frac{C_{i,N}^{1}}{C_{i,N}^{2}}\right]\\ &=\frac{4}{\varepsilon}\left[f(x_{i})-\frac{\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)\tilde{p}(y)d\omega(y)}{\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}\tilde{p}(y)d\omega(y)}\right].\end{aligned}

(E.5)

By the results in [28], we get that

\lim_{\varepsilon\rightarrow 0}\lim_{N\rightarrow\infty}\frac{4}{\varepsilon}\left\{\tilde{L}g\right\}(i,I)=\Delta_{\mathcal{M}}f(x_{i})-2\frac{\left\langle\nabla_{\mathcal{M}}f(x_{i}),\nabla_{\mathcal{M}}\tilde{p}(x_{i})\right\rangle}{\tilde{p}(x_{i})}.

(E.6)

This shows that when $p(x)$ is non-uniform, the normalized $G$ -GL converges to an operator that is different from the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ , namely, the Fokker-Planck operator on $\mathcal{M}$ , which depends on the density $\tilde{p}(x)$ .

Nevertheless, following [38] and [28], we now show that we can retrieve $\Delta_{\mathcal{M}}$ by normalizing the kernel function $W_{ij}(A,B)$ in (4.9), as follows. Let us define for all $i,j\in\left\{1,\ldots,N\right\}$

\bar{W}_{ij}(A,B)\coloneqq\frac{W_{ij}(A,B)}{D_{ii}D_{jj}},

(E.7)

and for all $i\in\left\{1,\ldots,N\right\}$

\bar{D}_{ii}\coloneqq\sum_{j=1}^{N}\int_{G}\bar{W}_{ij}(I,A)d\eta(A).

(E.8)

Then, we define the density-normalized $G$ -invariant graph Laplacian $\bar{L}$ as

\bar{L}=I-\bar{D}^{-1}\bar{W}.

(E.9)
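The normalization steps (E.7)-(E.9) can be sketched on a discretized toy problem. The example below assumes points in $\mathbb{R}^{2}$ , approximates $G$ by a finite cyclic group of $K$ rotations so that Haar integrals become averages, and assumes that $\bar{L}$ acts by $\{\bar{L}g\}(i,I)=g(i,I)-\bar{D}_{ii}^{-1}\sum_{j}\int_{G}\bar{W}_{ij}(I,A)g(j,A)d\eta(A)$ . The data, the group discretization, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, eps = 40, 16, 0.3

# Toy data: N points in R^2, with G approximated by the K-element rotation group.
X = rng.normal(size=(N, 2))
angles = 2 * np.pi * np.arange(K) / K
R = np.stack([np.array([[np.cos(a), -np.sin(a)],
                        [np.sin(a),  np.cos(a)]]) for a in angles])

# orbit[j, k] = A_k . x_j and kern[i, j, k] = W_ij(I, A_k) = exp(-|x_i - A_k.x_j|^2 / eps).
orbit = np.einsum("kab,jb->jka", R, X)
diff = X[:, None, None, :] - orbit[None, :, :, :]
kern = np.exp(-np.sum(diff ** 2, axis=-1) / eps)

# D_ii = sum_j int_G W_ij(I, A) d eta(A); the Haar integral becomes a mean over k.
D = kern.mean(axis=2).sum(axis=1)

# (E.7): divide the kernel by D_ii * D_jj; (E.8): recompute the degrees.
kern_bar = kern / (D[:, None, None] * D[None, :, None])
D_bar = kern_bar.mean(axis=2).sum(axis=1)

def apply_L_bar(g):
    """Evaluate {L_bar g}(i, I) for g given as an (N, K) array of values g(j, A_k)."""
    averaged = np.einsum("ijk,jk->i", kern_bar, g) / K
    return g[:, 0] - averaged / D_bar          # column 0 corresponds to the identity

const_out = apply_L_bar(np.ones((N, K)))       # constants lie in the null space
print(np.max(np.abs(const_out)))
```

By construction the normalized kernel averages to one in each row, so a constant function is mapped to zero, as expected of any discretization of $I-\bar{D}^{-1}\bar{W}$ .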

By repeating the computations in equations $(190)-(191)$ in [38], we obtain that

\displaystyle\lim_{N\rightarrow\infty}\frac{4}{\varepsilon}\left\{\tilde{L}g% \right\}(i,I)=\frac{4}{\varepsilon}\left[f(x_{i})-\frac{\int_{\mathcal{M}}\exp% {\left\{-{\left\|x_{i}-y\right\|^{2}}{/\varepsilon}\right\}}f(y)\hat{p}(y)d% \omega(y)}{\int_{\mathcal{M}}\exp{\left\{-{\left\|x_{i}-y\right\|^{2}}{/% \varepsilon}\right\}}\hat{p}(y)d\omega(y)}\right],

(E.10)

where

\hat{p}(x)=\frac{\tilde{p}(x)}{\int_{\mathcal{M}}\exp{\left\{-{\left\|x-y\right\|^{2}}{/\varepsilon}\right\}}\tilde{p}(y)d\omega(y)}.

(E.11)

Then, by a derivation in [28] we obtain that

\lim_{\varepsilon\rightarrow 0}\lim_{N\rightarrow\infty}\frac{4}{\varepsilon}\left\{\bar{L}g\right\}(i,I)=\Delta_{\mathcal{M}}f(x_{i}).

(E.12)

Appendix F Eigendecomposition of the normalized $G$ -GL

We now restate Theorem 10 for the operator $\tilde{L}$ in (4.8), the normalized version of the $G$ -GL. The proof is obtained by repeating that of Theorem 10, with the matrices $D^{(\ell)}-\hat{W}^{(\ell)}$ replaced by the matrix sequence

S^{(\ell)}=I-(D^{\ell})^{-1}\hat{W}^{\ell},\quad\ell\in\mathcal{I},

(F.1)

of (4.15), with required changes made in equations (C)-(C.12), and omitting the proof of orthogonality, which does not hold in this case.

Theorem 20.

For each $\ell\in\mathbb{N}$ , let $D^{\ell}$ be the $Nd_{\ell}\times Nd_{\ell}$ block-diagonal matrix whose $i$ -th block of size $d_{\ell}\times d_{\ell}$ on the diagonal is given by the product of the scalar $D_{ii}$ in (4.5) with the $d_{\ell}\times d_{\ell}$ identity matrix. Then, the normalized $G$ -invariant graph Laplacian admits the following:

1. A sequence of non-negative eigenvalues $\{\tilde{\lambda}_{1,\ell},\ldots,\tilde{\lambda}_{Nd_{\ell},\ell}\}_{\ell\in\mathcal{I}}$ , where $\tilde{\lambda}_{n,\ell}$ is the $n$ -th eigenvalue of the matrix $S^{(\ell)}=I-(D^{\ell})^{-1}\hat{W}^{\ell}$ .

2. A sequence $\{\tilde{\Phi}_{\ell,1,1},\ldots,\tilde{\Phi}_{\ell,d_{\ell},Nd_{\ell}}\}_{\ell\in\mathcal{I}}$ of eigenfunctions, which are complete in $\mathcal{H}$ and are given by

\tilde{\Phi}_{\ell,m,n}(\cdot,A)=\begin{pmatrix}-&e^{1}(\tilde{v}_{n,\ell})&-\\ &\vdots&\\ -&e^{N}(\tilde{v}_{n,\ell})&-\end{pmatrix}\cdot U^{(\ell)}_{\cdot,m}(A^{*}),

(F.2)

where $\tilde{v}_{n,\ell}$ is the eigenvector of $S^{(\ell)}=I-(D^{\ell})^{-1}\hat{W}^{\ell}$ which corresponds to its eigenvalue $\tilde{\lambda}_{n,\ell}$ . For each $n\in\{1,\ldots,Nd_{\ell}\}$ and $\ell\in\mathcal{I}_{G}$ , the eigenfunctions $\{\tilde{\Phi}_{\ell,-\ell,n},\ldots,\tilde{\Phi}_{\ell,\ell,n}\}$ correspond to the eigenvalue $\tilde{\lambda}_{n,\ell}$ of the normalized $G$ -invariant graph Laplacian.
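The remark that orthogonality is lost for the normalized operator can be illustrated with a small numerical sketch. It assumes only that the block matrix is Hermitian and that the diagonal scaling is positive; the matrices below are random stand-ins (hypothetical, not the $\hat{W}^{\ell}$ and $D^{\ell}$ of the paper), and the sketch makes no claim about the sign of the eigenvalues. Under these assumptions $I-D^{-1}\hat{W}$ is similar to the Hermitian matrix $I-D^{-1/2}\hat{W}D^{-1/2}$ , so its eigenvalues are real and its eigenvectors are orthogonal only in the $D$ -weighted inner product.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Hypothetical stand-ins: a Hermitian matrix for W_hat and a positive diagonal for D.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
W_hat = B @ B.conj().T
d = np.abs(W_hat).sum(axis=1)

S = np.eye(n) - W_hat / d[:, None]                            # I - D^{-1} W_hat
S_sym = np.eye(n) - W_hat / np.sqrt(d[:, None] * d[None, :])  # I - D^{-1/2} W_hat D^{-1/2}

lam, U = np.linalg.eigh(S_sym)      # real eigenvalues, orthonormal columns
V = U / np.sqrt(d)[:, None]         # v = D^{-1/2} u are eigenvectors of S

residual = np.linalg.norm(S @ V - V * lam)      # checks S v_k = lam_k v_k
gram = V.conj().T @ V                           # generally not the identity
weighted_gram = V.conj().T @ (d[:, None] * V)   # identity in the D-weighted product
print(residual, np.allclose(weighted_gram, np.eye(n)))
```

Working with the Hermitian conjugate and mapping back through $D^{-1/2}$ is a common design choice, since it allows a Hermitian eigensolver to be used in place of a general one.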

References

[1] S. Axler, P. Bourdon, and W. Ramey. Harmonic Function Theory. Springer, Springer-Verlag New York, Inc., 2001.
[2] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
[3] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
[4] M. Belkin and P. Niyogi. Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference. MIT Press, 2007.
[5] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(6):2399–2434, 2006.
[6] V.I Bogachev. Measure Theory. Springer, Springer-Verlag Heidelberg, Inc., 2007.
[7] D. Bump. Lie Groups. Springer, Springer-Verlag New York, Inc., 2004.
[8] L. Chen. Curse of Dimensionality, pages 545–546. Springer US, Boston, MA, 2009.
[9] X. Cheng and N. Wu. Eigen-convergence of gaussian kernelized graph Laplacian by manifold heat interpolation. Applied and Computational Harmonic Analysis, 61:132–190, 2022.
[10] G.S. Chirikjian. Stochastic Models, Information Theory, and Lie Groups, Volume 2. Birkhäuser, Birkhäuser Boston, 2010.
[11] G.S Chirikjian and A.B Kyatkin. Engineering applications of noncommutative harmonic analysis with emphasis on rotation and motion groups. CRC Press LLC, CRC Press LLC, Boca Raton, Florida., 2001.
[12] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[13] Taco Cohen and Max Welling. Group equivariant convolutional networks. ArXiv, abs/1602.07576, 2016.
[14] S. Dieleman, J. De Fauw, and K. Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1889–1898, New York, New York, USA, 20–22 Jun 2016. PMLR.
[15] Andreas Doerr. Cryo-electron tomography. Nat Methods, 14(7):664–665, 2017.
[16] M. Eller and M. Fornasier. Rotation invariance in exemplar-based image inpainting. Variational Methods: In Imaging and Geometric Control, 18:108, 2017.
[17] Y. Fan, T. Gao, and Z. J. Zhao. Unsupervised co-learning on g-manifolds across irreducible representations, 2019.
[18] B. Fasel and D. Gatica-Perez. Rotation-invariant neoperceptron. In 18th International Conference on Pattern Recognition (ICPR’06), volume 3, pages 336–339, 2006.
[19] G.B. Folland. A Course in Abstract Harmonic Analysis. CRC Press, Boca Raton, Florida, 2001.
[20] Joachim Frank. Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford, 2006.
[21] J. Gallier and J. Quaintance. Differential Geometry and Lie Groups: A Computational Perspective. Number 12 in Geometry and Computing. Springer, 2020.
[22] C. Godsil and G.F. Royle. Algebraic Graph Theory. Graduate Texts in Mathematics. Springer, 2001.
[23] B. Hall. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Springer, Springer International Publishing Switzerland, 2015.
[24] P. Hoyos and J. Kileel. Diffusion maps for group-invariant manifolds. Arxiv:2303.16169, 2023.
[25] Ryan K. Hylton and Matthew T. Swulius. Challenges and triumphs in cryo-electron tomography. iScience, 24(9):102959, 2021.
[26] Z. Ji, Q. Chen, Q.-S. Sun, and D.-S. Xia. A moment-based nonlocal-means algorithm for image denoising. Information Processing Letters, 109(23-24):1238–1244, 2009.
[27] J. Kileel, A. Moscovich, N. Zelesko, and A. Singer. Manifold learning with arbitrary norms. Journal of Fourier Analysis and Applications, 27(5), 2021.
[28] S. Lafon and R.R. Coifman. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, 2006.
[29] B. Landa and Y. Shkolnisky. Steerable principal components for space-frequency localized images. SIAM Journal on Imaging Sciences, 10(2):508–534, 2017.
[30] J.M Lee. Introduction to smooth manifolds, Second Edition. Number 218 in Graduate Texts in Mathematics. Springer, 2013.
[31] D. Marcos, M. Volpi, and D. Tuia. Learning rotation invariant convolutional filters for texture classification. 2016 23rd International Conference on Pattern Recognition (ICPR), pages 2012–2017, 2016.
[32] J. Munkres. Topology. Pearson Modern Classics for Advanced Mathematics. Pearson Education, Inc., 2000.
[33] D. Potts, J. Prestin, and J. Vollrath. A fast algorithm for nonequispaced fourier transforms on the rotation group. Numerical Algorithms, 52:355–384, 2009.
[34] D. Potts, G. Steidl, and M. Tasche. Fast algorithms for discrete polynomial transforms. Mathematics of computation, 67(224):1577–1590, 1998.
[35] E. Rosen, X. Cheng, and Y. Shkolnisky. G-invariant diffusion maps. ArXiv:2306.07350, 2023.
[36] S. Rosenberg. The Laplacian on a Riemannian manifold: an introduction to analysis on manifolds. Cambridge University Press, 1997.
[37] N. Sharon, J. Kileel, Y. Khoo, B. Landa, and A. Singer. Method of moments for 3d single particle ab initio modeling with non-uniform distribution of viewing angles. Inverse Problems, 36(4), 2020.
[38] Y. Shkolnisky and B. Landa. The steerable graph laplacian and its application to filtering image datasets. SIAM Journal on Imaging Sciences, 11(4):2254–2304, 2018.
[39] P.Y. Simard, D. Steinkraus, and J.C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., pages 958–963, 2003.
[40] A. Singer. From graph to manifold laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
[41] A. Singer and H.-T. Wu. Vector diffusion maps and the connection Laplacian. Communications on pure and applied mathematics, 65(8):1067–1144, 2012.
[42] A. Singer, Z. Zhao, Y. Shkolnisky, and R. Hadani. Viewing angle classification of cryo-electron microscopy images using eigenvectors. SIAM Journal on Imaging Sciences, 4(2):723–759, 2011.
[43] R. Talmon, I. Cohen, S. Gannot, and R.R. Coifman. Diffusion maps for signal processing: A deeper look at manifold-learning techniques based on kernels and graphs. IEEE Signal Processing Magazine, 30:75–86, 2013.
[44] K. Tapp. Matrix Groups for Undergraduates, volume 29 of Student Mathematical Library. American Mathematical Society, 2005.
[45] N. Thomas, T. E. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. CoRR, abs/1802.08219, 2018.
[46] L. Tu. Differential Geometry, Connections,Curvature, and Characteristic Classes. Graduate Texts in Mathematics. Springer, 2017.
[47] N. Vilenkin. Special Functions and the Theory of Group Representations. The American Mathematical Society, 1968.
[48] M. Weiler, M. Geiger, M. Welling, W. Boomsma, and T. Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pages 10402–10413, Red Hook, NY, USA, 2018. Curran Associates Inc.
[49] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow. Harmonic networks: Deep translation and rotation equivariance. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7168–7177, 2017.
[50] X. Cheng and H.-T. Wu. Convergence of graph laplacian with knn self-tuned kernels. ArXiv:2011.01479, 2020.
[51] S. Zhang, A. Moscovich, and A. Singer. Product manifold learning. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 3241–3249. PMLR, 2021.
[52] Z. Zhao, Y. Shkolnisky, and A. Singer. Fast steerable principal component analysis. IEEE Transactions on Computational Imaging, 2(1):1–12, 2016.
[53] Z. Zhao and A. Singer. Rotationally invariant image representation for viewing direction classification in cryo-EM. Journal of structural biology, 186(1):153–166, 2014.
[54] S. Zimmer, S. Didas, and J. Weickert. A rotationally invariant block matching strategy improving image denoising with non-local means. In Proc. 2008 International Workshop on Local and Non-Local Approximation in Image Processing, pages 135–142, 2008.

