Eitan Rosen
Department of Applied Mathematics, Tel Aviv University
Paulina Hoyos
Department of Mathematics, The University of Texas at Austin
Xiuyuan Cheng
Department of Mathematics, Duke University
Joe Kileel
Department of Mathematics, The University of Texas at Austin
Yoel Shkolnisky
Department of Applied Mathematics, Tel Aviv University
Abstract
Graph Laplacian based algorithms for data lying on a manifold have proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group $G$. We propose to construct the graph Laplacian by incorporating the distances between all the pairs of points generated by the action of $G$ on the data set. We deem the latter construction the “$G$-invariant Graph Laplacian” ($G$-GL).
We show that the $G$-GL converges to the Laplace-Beltrami operator on the data manifold, while enjoying a significantly improved convergence rate compared to the standard graph Laplacian, which only utilizes the distances between the points in the given data set. Furthermore, we show that the $G$-GL admits a set of eigenfunctions that have the form of certain products between the group elements and eigenvectors of certain matrices, which can be estimated from the data efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group $SU(2)$.
1 Introduction
A popular modeling assumption in data analysis is that the observed data lie on a low-dimensional manifold $\mathcal{M}$ that is embedded in a high-dimensional Euclidean space. When $\mathcal{M}$ is a linear subspace, it can be identified by using principal component analysis (PCA). However, most often $\mathcal{M}$ is nonlinear. A leading approach for analyzing data with a nonlinear manifold structure is to encode the data by using a graph, whose vertices are the data points, and whose edge weights encode the similarities between pairs of points. These similarities can be used to form a matrix known as the graph Laplacian, and its eigenvectors and eigenvalues are used for tasks such as dimensionality reduction, clustering, and denoising ([28, 2, 43]). While the term graph Laplacian has been given different definitions in different contexts [12, 22], in this paper we adopt the definition and notation in [3] and [28].
Formally, let be a set of points that reside on a compact and smooth -dimensional manifold embedded in . We form the matrix with , where is a positive semi-definite kernel function.
The graph Laplacian is then defined as the matrix $L$ given by
$$L = D - W, \qquad D_{ii} = \sum_{j=1}^{N} W_{ij}, \tag{1.1}$$
where $D$ is the diagonal matrix whose $i$-th diagonal element is given by $D_{ii}$ in (1.1). Various choices for the kernel have been utilized in the literature [3, 41]. In this work, we make the popular choice of the Gaussian kernel function, due to its favorable analytical properties. In this case,
(1.2)
where is a bandwidth to be determined from the data.
A particularly important matrix related to $L$ of (1.1) is the random-walk normalized graph Laplacian, defined as
$$\tilde{L} = D^{-1}L = I - P, \qquad P = D^{-1}W. \tag{1.3}$$
The matrix $P$ is row-stochastic, and thus may be viewed as the transition probability matrix of a random walk over the data points (hence the name).
The latter view was adopted in the seminal work [28], where the eigenvectors of $P$ (which are identical to those of $\tilde{L}$) and its eigenvalues are used to construct “diffusion maps”, a successful machine learning framework for dimensionality reduction and clustering of manifold data.
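The construction described above can be sketched numerically as follows. This is a minimal illustration rather than the paper's implementation: the function name `graph_laplacian`, the test data on the unit sphere, and the kernel scaling $\exp(-\|x_i - x_j\|^2/\varepsilon)$ are assumptions made for the example.

```python
import numpy as np

def graph_laplacian(X, eps):
    """Return (W, D, L, P) for the rows of X using a Gaussian kernel.

    The scaling exp(-||x_i - x_j||^2 / eps) is an assumed convention.
    """
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)   # clip negative round-off
    W = np.exp(-sq_dists / eps)            # affinity matrix
    d = W.sum(axis=1)                      # degrees, the diagonal of D
    D = np.diag(d)
    L = D - W                              # unnormalized graph Laplacian
    P = W / d[:, None]                     # random-walk matrix D^{-1} W
    return W, D, L, P

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # sample points on the unit sphere S^2
W, D, L, P = graph_laplacian(X, eps=0.5)
```

Each row of `P` sums to one, which is the row-stochastic property that underlies the random-walk interpretation, and each row of `L` sums to zero.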
Furthermore, in [4] it was shown that if the data points are sampled uniformly from a manifold , then converges to the Laplace-Beltrami operator on as and . Formally, it was shown that for a sufficiently smooth , with high probability
(1.4)
The latter result has important theoretical and practical consequences. First, we observe that the convergence rate of the graph Laplacian to depends on the intrinsic dimension of and not on the ambient high dimension of the data points, mitigating the “curse of dimensionality” [8]. Second, it is known that the eigenfunctions of provide a basis for the space of square-integrable functions on [36]. For example, when is the circle , the eigenfunctions of are given by the Fourier modes . Recent results [9] show that the eigenvectors of converge to the eigenfunctions of . This implies that the eigenvectors of the graph Laplacian constructed from a data set sampled from are discrete approximations to the Fourier modes, giving rise to classical discrete Fourier analysis.
Analogously, the eigenvectors of the graph Laplacian constructed by using a sample from a general compact manifold can be employed for a data-driven discrete Fourier analysis on [43].
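The circle example can be checked directly in the idealized case of equispaced samples, where the random-walk matrix is circulant and the sampled Fourier modes are exact eigenvectors. The sketch below assumes equispaced points and an arbitrary bandwidth; for a random sample the relation holds only approximately.

```python
import numpy as np

N, eps = 64, 0.1
theta = 2 * np.pi * np.arange(N) / N
X = np.column_stack([np.cos(theta), np.sin(theta)])   # equispaced points on S^1

sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
W = np.exp(-sq_dists / eps)
P = W / W.sum(axis=1, keepdims=True)                  # row-stochastic matrix

# P is circulant here, so each sampled Fourier mode exp(i m theta)
# should satisfy P v = lambda v up to round-off.
residuals = []
for m in range(4):
    v = np.exp(1j * m * theta)
    lam = (P @ v)[0] / v[0]                           # candidate eigenvalue
    residuals.append(np.linalg.norm(P @ v - lam * v))
```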
In various scenarios, the data set under consideration is closed under the action of a group, namely, there is a known group $G$ such that if $x$ is a point in our data set, then for each $g \in G$ the point $g \cdot x$ resulting from the action of $g$ on $x$ is a valid data point (which is not necessarily in the data set, but may be added to it). Such data sets are called “$G$-invariant”. For example, in electron-microscopy imaging, a method to determine the 3D structure of a molecule from its 2D images acquired by an electron microscope [20], all the images lie on a manifold of dimension 3 (diffeomorphic to the 3D rotations group $SO(3)$). The planar rotation of any such image is a valid image that may have been acquired by the microscope. Thus, the manifold of images is closed under the action of the rotations group $SO(2)$.
In [38], it was shown how to construct the graph Laplacian from all given images and all their infinitely many in-plane rotations. This construction is deemed “the steerable graph Laplacian”. A key result of [38] is that the steerable graph Laplacian converges to the Laplace-Beltrami operator on (in this case a -invariant compact manifold) faster than the graph Laplacian (1.3). Specifically, it was shown that the steerable graph Laplacian approximates with an error that is given asymptotically by
(1.5)
The latter error converges to zero at a rate that depends on $d-1$, and so converges to zero faster than the corresponding error term in (1.4) of the standard graph Laplacian (1.3), whose convergence rate depends on $d$. This improved convergence rate is attributed to the following two facts. The first is that all infinitely many in-plane rotations of each given image are known. The second is that the action of the rotations group $SO(2)$ on the image manifold $\mathcal{M}$ accounts for one dimension of $\mathcal{M}$ (since planar rotations are parametrized by a single angle in $[0, 2\pi)$). Combining these two facts implies that the error depends only on $d-1$ dimensions.
Furthermore, it was shown in [38] that the eigenfunctions of the steerable graph Laplacian are tensor products between certain -dimensional vectors and complex exponentials. This special form of the eigenfunctions gives rise to efficient algorithms for their computation. These eigenfunctions are used in [38] for filtering noisy functions on using a Fourier-like scheme, and are shown to result in an improved error bound compared to the bound achieved by employing the eigenvectors of the standard graph Laplacian [43].
This paper is the first part of a two part work presenting a graph Laplacian based framework for the analysis of -invariant data sets.
In this paper, we extend the results of [38] (which focuses on image manifolds closed under planar rotations) to the setting where the given data points lie on an arbitrary compact manifold of dimension , closed under the action of an arbitrary compact unitary matrix Lie group . An example of such a data set is a collection of subtomograms (volumes) in cryo-electron tomography [25, 15], which due to the experimental setup are arbitrarily rotated in space, that is, . The results in the current paper also lay the foundations for Part II [35], where we develop low-dimensional embeddings of the -invariant data (which were not proposed in [38]) of two types. The first type is a -invariant embedding, which means that any two points which are related by the action of an element of are embedded into the same point. In the context of machine learning, this embedding may be used to organize the data into clusters where the points in each cluster are related by the action of a group element (for example, images which are rotations of one another). The second type is a -equivariant embedding, which means that the embeddings of two points which are related by the action of an element of , are themselves related by the action of the same element. Such embeddings may be applied, for instance, to align images which are rotations of one another.
The contributions of this paper are as follows.
First, we construct the -invariant graph Laplacian (-GL), which is conceptually the standard graph Laplacian (1.3) constructed using a data set consisting of the given data points as well as all (infinitely many) points generated by applying to the given points. Second, we show that if is the dimension of , then the -GL approximates the Laplace-Beltrami operator on the manifold with an error given asymptotically by
(1.6)
analogously to (1.4) and (1.5). The result (1.6) is of great practical importance, as the improved convergence rate implies that significantly less data is required in order to achieve a prescribed accuracy, compared to the standard graph Laplacian.
Third, we derive the eigenfunctions of the $G$-invariant graph Laplacian and show that they admit the form of products between certain vectors and the elements of the irreducible unitary representations of $G$. Furthermore, we show that this form of the eigenfunctions enables their efficient computation while avoiding explicitly augmenting the input data (that is, adding the points $g \cdot x$ for every point $x$ in the data set and all $g \in G$).
We then demonstrate the utility of these eigenfunctions in filtering a noisy data set sampled from the four-dimensional unit sphere.
We comment that different proofs for some of the theoretical results in this paper can be found in [24]. The proof strategies in both papers differ in that here we explicitly construct a -invariant local parametrization of the data manifold, whereas [24] uses a less concrete approach by passing to the abstract quotient manifold. The advantage of the approach taken here is that while [24] uses advanced machinery from fiber bundle theory, here we employ mostly basic instruments from manifold calculus in Euclidean spaces, which is accessible to a wider audience.
This paper is organized as follows. In Section 2, we review some related work on group invariance and compare it with our approach. In Section 3, we discuss the structure induced on the data manifold by the group action, and introduce some basic machinery from representation theory used in this work. In Section 4, we introduce the -invariant graph Laplacian and present its key properties. In Section 5, we demonstrate how to use the eigenfunctions of the -invariant graph Laplacian to filter noisy data sets. In Section 6 we describe the details of the numerical computation of the -GL, and discuss computational complexity. Lastly, in Section 7, we summarize our results and discuss future work.
2 Related work
Other works dealing with group invariance typically focus on rotation invariance, especially in image processing algorithms [16, 54, 26, 41, 42, 53]. There are four main approaches in the literature towards rotation invariance. The first approach is based on the steerable PCA [52, 29], which computes the PCA of a set of images and all their infinitely many rotations, namely, finds the linear subspace which best spans a set of images and all their rotations. In a sense, our work is a generalization of this approach to nonlinear manifolds and to general compact matrix Lie groups (and not just rotations). The second approach towards rotation invariance is defining a rotationally-invariant distance for measuring pairwise similarities and constructing graph Laplacians using this distance [41]. Unfortunately, it is often not obvious what invariant distance is most appropriate for the task at hand, and how to compute it efficiently. Furthermore, in general, the limiting operator resulting from such a construction is either unknown, or is not the Laplace-Beltrami operator [27], in which case its properties are not well understood.
In our approach, on the other hand, we consider not only the distance between best matching rotations of image pairs (nor any other type of a rotationally-invariant distance), but rather the standard (Euclidean) distance between all rotations of all pairs of images. We show that all these pointwise Euclidean distances can be computed efficiently by using FFT-type algorithms (when available), and that the resulting operator converges to the Laplace-Beltrami operator on the data manifold. This enables us to preserve the geometry of the underlying manifold (in contrast to various rotation-invariant distances) while making the resulting operator (the -invariant graph Laplacian) invariant to the action of the group on our data set. Moreover, our approach is applicable not only to rotations, but rather to any compact matrix Lie group . The third approach to group invariance is based on CNNs [45, 48, 49] that produce group equivariant features (for low dimensional rotation groups) by convolving the data with steerable basis functions in each layer. However, this approach lacks solid theory, and in particular, provides no error bounds and no means for analyzing the properties of the resulting tools. Furthermore, unlike CNNs, our approach is applied directly to unlabeled data.
The fourth approach, also commonly based on CNNs, is to augment a given data set by adding to it all the points of the form $g \cdot x$, where $x$ is a data point and $g$ ranges over some finite set of elements of $G$ [13, 14, 18, 31, 39]. This approach suffers from several shortcomings. First, since only a finite set of group elements is used, the augmented data set is only approximately invariant to the action of $G$. Second, the augmented data set is larger than the original data set by a factor equal to the number of group elements used, which poses computational challenges. Third, if the data are noisy, this approach introduces correlations in the noise of different data points. In contrast, in our approach, we derive a numerically efficient construction of a $G$-invariant operator that is equivalent to constructing the standard graph Laplacian in (1.1) from all the (infinitely many) points generated by the action of $G$ on the points in the data set, without explicitly augmenting it.
We note the work [51], which although does not deal with group invariance, makes an important contribution by deriving an algorithm for manifold factorization of product manifolds. We can consider the algorithm in [51] as a form of invariant learning, as the goal of the algorithm there is to learn submanifolds that are independent of the other submanifolds comprising the product, and thus, can be used to learn the submanifolds which are generated by the action of the group, and factor them out. However, as we later explain, in our setting, any sufficiently small neighborhood of the data manifold is isomorphic to a product of manifolds, but itself need not be a product of manifolds. In that sense, the setting in [51] is much more restrictive.
Finally, special attention should be given to [17]. Similarly to the works mentioned above, this work also defines a group invariant distance by looking at a single group element that best “aligns” a given pair of points. In that sense, its approach is fundamentally different from what we propose here. Yet, this is the first work we know of that addresses the group invariance problem for arbitrary Lie groups.
3 Preliminaries
3.1 Manifolds under actions of matrix Lie groups
In this section, we describe our model for data sets closed under the action of a matrix Lie group. In particular, we define matrix Lie groups and their action on the data set.
Definition 1.
A matrix Lie group is a smooth (that is, differentiable) manifold , whose points form a group of matrices.
For example, consider the group of unitary matrices with determinant 1. Each matrix can be written using Euler angles as
(3.1)
where and . Using the fact that
the sum of squares of the entries of equals one, it is easily inferred that is diffeomorphic to the three-dimensional unit sphere . Other important examples for matrix Lie groups include the group of three-dimensional rotation matrices , and the -dimensional torus , which is simply the group of diagonal unitary matrices.
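The identification of $SU(2)$ with the unit 3-sphere can be illustrated with a short sketch. It does not use the Euler-angle form (3.1); instead it assumes the standard parametrization of $SU(2)$ by two complex numbers $a, b$ with $|a|^2 + |b|^2 = 1$, and the helper name `su2_from_s3` is hypothetical.

```python
import numpy as np

def su2_from_s3(q):
    """Map a unit vector q = (q0, q1, q2, q3) on S^3 to a matrix in SU(2)."""
    a = q[0] + 1j * q[1]
    b = q[2] + 1j * q[3]
    return np.array([[a, -np.conj(b)],
                     [b,  np.conj(a)]])

rng = np.random.default_rng(1)
q = rng.standard_normal(4)
q /= np.linalg.norm(q)                  # random point on the unit 3-sphere
A = su2_from_s3(q)

is_unitary = np.allclose(A.conj().T @ A, np.eye(2))
det_A = np.linalg.det(A)                # equals |a|^2 + |b|^2 = 1
first_col_norm_sq = np.sum(np.abs(A[:, 0])**2)   # |a|^2 + |b|^2

# The action by left multiplication preserves Euclidean norms.
x = np.array([1.0 + 0.5j, -0.3j])
norm_preserved = np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))
```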
Definition 2.
The action of a group of matrices on a subset is the map , defined for each and by matrix multiplication on the left .
We say that a set is closed under the action of a group , or simply -invariant, if for all and .
In this work, we assume that we are given a data set sampled from a smooth, compact, and -invariant manifold without boundary, embedded in , where is a unitary matrix Lie group. In particular, the -invariance implies that for all and .
An additional useful characterization of -invariant manifolds is derived from the following definition.
Definition 3.
For a fixed point , the orbit generated by the action of on is defined as the set
(3.2)
Thus, a manifold is -invariant if for all , that is, contains all the orbits of the action of on its points. In particular, this implies that the set
(3.3)
of points generated by the action of on the data set is a -invariant subset in .
In Section 4, we construct the central object in our framework, namely, the -invariant graph Laplacian (-GL), which is a graph Laplacian constructed by using not only the points in but rather all the points in .
Finally, we will assume that the Lie group is also compact. In the rest of this section, we give a short introduction to the theory of harmonic analysis on compact Lie groups, which is essential for the construction of the -GL.
3.2 Haar integration
The theory of harmonic analysis on matrix Lie groups requires integrating functions over these groups. This is known as “Haar integration” since it is performed with respect to the Haar measure, which we now define.
Definition 4.
A Haar measure over a Lie group is a finite valued, non-negative function over all (Borel) subsets , such that
(3.4)
By Haar’s theorem (see, e.g., [19]), for every compact matrix Lie group $G$ there exists a Haar measure which is unique up to a multiplicative constant. In this work, we choose (without loss of generality) the unique measure such that
(3.5)
and henceforth refer to this as “the Haar measure over ”.
Essentially, the Haar measure assigns a volume to subsets of the manifold $G$. Specifically, property (3.5) makes it a probability measure over $G$. Furthermore, property (3.4), known as “left invariance”, means that multiplication by a group element from the left maps a subset of $G$ to another subset of $G$ of the same measure, implying that the measure is uniform over $G$.
In the context of integration, property (3.4) implies that the Haar integral is left-invariant, namely, for any we have that
(3.6)
where we substituted in the first equality, and used (3.4) in the second equality.
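Left invariance of the Haar integral is easy to verify numerically on the simplest compact group, the circle group $U(1)$, where the normalized Haar measure is $d\theta/2\pi$. The following sketch is only an illustration under that assumption; the test function and the helper `haar_integral_u1` are hypothetical.

```python
import numpy as np

def haar_integral_u1(f, K=256):
    """Approximate the Haar integral over U(1), normalized to total mass 1,
    by an equispaced average over the angle."""
    theta = 2 * np.pi * np.arange(K) / K
    return np.mean(f(np.exp(1j * theta)))

# Test function on U(1): cos^2(theta) + 0.5 sin(3 theta) + 2
f = lambda g: np.real(g)**2 + 0.5 * np.imag(g**3) + 2.0
h = np.exp(1j * 0.7)                           # a fixed group element

lhs = haar_integral_u1(lambda g: f(h * g))     # integral of f(h g)
rhs = haar_integral_u1(f)                      # integral of f(g)
total_mass = haar_integral_u1(lambda g: np.ones_like(g, dtype=float))
```

Because the integrand is a trigonometric polynomial of low degree, the equispaced average is exact up to round-off, so `lhs` and `rhs` agree and both equal 2.5.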
As an example of a Haar integration, the integral of a function over can be computed in terms of Euler angles by (see [11])
(3.7)
where is defined in (3.1). In this case, the volume element induced by the Haar measure is just multiplied by , which is the absolute value of the Jacobian determinant of the parametrization of by Euler angles.
3.3 Harmonic analysis on compact matrix Lie groups
The framework we develop below in Section 4 employs series expansions of functions over compact matrix Lie groups. The expansion of a function is obtained in terms of the elements of certain matrix-valued functions, known as the irreducible unitary representations of , which we now define.
Definition 5.
An -dimensional unitary representation of a group is a unitary matrix-valued function from to the group U(n) of unitary matrices, such that
(3.8)
and the identity element in is mapped to the identity element in U(n). The homomorphism property (3.8), implies that the set is also a matrix Lie group. In particular, the latter implies that each element of the matrix valued function is a smooth function over .
Definition 6.
A group representation is called reducible if there exists a unitary matrix such that is block diagonal for all . A group representation is called irreducible if it is not reducible. We abbreviate irreducible unitary representation as IUR.
By the Peter-Weyl theorem [7], there exists a countable family of finite dimensional IURs of , such that the collection of all the elements of all these IURs forms an orthogonal basis for .
This implies that any smooth function can be expanded in a series of the elements of the IURs of .
For example, the IURs of in (3.1) are given by a sequence of matrices , where , and is a dimensional matrix for each (see e.g. [11]). In fact, the matrices in (3.1) correspond to the IUR of with .
The series expansion of a function is then given by
(3.9)
where is a countable set that enumerates the IURs of , is the dimension of the -th IUR, and is the Haar measure on . The latter can also be written in the form
(3.10)
where is the matrix given by
(3.11)
for all .
Remark 1.
The group of two-dimensional rotations is a one dimensional matrix Lie group, whose IURs are given by the Fourier modes . Thus, the series expansion of an -valued function in terms of the IURs of is nothing but the classical Fourier series. In this sense, the expansion (3.10) can be viewed as generalized Fourier series over , with the Fourier modes replaced by the IURs , and with coefficients given by the matrices of (3.11).
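For $SO(2)$, the expansion in Remark 1 reduces to the classical Fourier series, whose coefficients can be approximated with the FFT. The sketch below assumes the convention $c_m = \frac{1}{2\pi}\int_0^{2\pi} f(\theta) e^{-im\theta}\, d\theta$ and an arbitrary smooth test function; the sign and normalization conventions of (3.10) and (3.11) may differ.

```python
import numpy as np

K = 128
theta = 2 * np.pi * np.arange(K) / K
f = np.exp(np.cos(theta)) + 0.3 * np.sin(2 * theta)   # smooth function on SO(2)

# c_m = (1 / 2 pi) * integral of f(theta) exp(-i m theta) d theta
coeffs = np.fft.fft(f) / K
freqs = np.fft.fftfreq(K, d=1.0 / K).astype(int)      # integer frequencies m

# Truncated expansion: sum over |m| <= M of c_m exp(i m theta)
M = 10
keep = np.abs(freqs) <= M
f_trunc = (coeffs[keep][None, :] *
           np.exp(1j * np.outer(theta, freqs[keep]))).sum(axis=1)
max_err = np.max(np.abs(f - f_trunc.real))
```

Since the coefficients of a smooth function decay rapidly, a small number of modes already reproduces `f` to high accuracy.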
4 The -invariant graph Laplacian
In this section, we construct the -invariant graph Laplacian (-GL) - a generalization of the standard graph Laplacian (1.1) for data sets sampled from a -invariant manifold . We then compute the -GL’s eigendecomposition, and show that a proper normalization of the -GL converges to the Laplace-Beltrami operator on significantly faster than (1.1).
Let be a data set sampled from a -invariant (see Definition 2) compact manifold .
Our goal is to construct the graph Laplacian by using all the points in the -invariant set in (3.3). As we will see shortly, our construction results in an operator (rather than a matrix) over a certain Hilbert space, which we now define.
Definition 7.
Given a data set , let be the set of pairs
(4.1)
where each pair corresponds to the point .
We define the Hilbert space as the space of functions of the form , where for all , endowed with the inner product
(4.2)
where is the Haar measure on .
Now, let be an diagonal matrix. We define the action of on a function by
(4.3)
where is the ’th element on the diagonal of . Equipped with Definition 7 and (4.3), we are now ready to define the -GL.
Definition 8.
Let be the operator acting on functions by
(4.4)
and let be the diagonal matrix defined by
(4.5)
where is the identity element in . The -invariant graph Laplacian (-GL) is defined as the operator given by
(4.6)
Note that by (4.4), we have that for all and , which implies that the operator is symmetric. Combining the latter with (4.6) implies the same for . The following result asserts that is a positive semi-definite operator.
Lemma 9.
The -GL admits the positive semi-definite quadratic form
(4.7)
The proof of Lemma 9 is given in Appendix B.
The form (4.7) is analogous to the quadratic form of the standard graph Laplacian [5]. Thus, this form is important in its own right, since it can be used as a smoothness regularization term in various machine learning algorithms where the objective function is assumed to have been sampled over a compact -invariant manifold. This idea was first proposed in [5], and rigorously justified in [50]. Intuitively, the quantity puts large penalties on the differences when is large, that is, when there exist such that the points and are close.
Thus, the quantity can be viewed as imposing a notion of smoothness on functions over the domain in (4.1).
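For comparison, the quadratic form of the standard graph Laplacian $L = D - W$ satisfies the well-known identity $f^{T} L f = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$, which is the discrete counterpart of the smoothness penalty discussed above. The sketch below checks this identity on random data; the data, bandwidth, and kernel scaling are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
W = np.exp(-sq_dists / 1.0)               # Gaussian affinities (assumed scaling)
L = np.diag(W.sum(axis=1)) - W            # standard graph Laplacian D - W

f = rng.standard_normal(50)               # a function sampled at the data points
quad_form = f @ L @ f
pairwise = 0.5 * np.sum(W * (f[:, None] - f[None, :])**2)
```

The two quantities agree, and the pairwise form makes the non-negativity of the quadratic form evident.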
Analogously to the results in [38, 40], below we show that the normalization
(4.8)
of in (4.6) converges to the Laplace-Beltrami operator on .
While other useful normalizations of are possible, in the current work, we mainly focus on (4.8). Thus, we henceforth refer to (4.8) as the normalized -GL.
As mentioned above, unlike previous works [2, 41], our construction results in an operator over a Hilbert space rather than a matrix (compare with (1.1)). This is a direct consequence of the continuous nature of the set in (4.1), being a product between a discrete set and a Lie group , on account of being a smooth manifold by Definition 1.
As we will see next, the continuity of also implies that the $G$-GL admits a countably infinite basis of eigenfunctions for the space (see Definition 7) instead of the finite set of eigenvectors of the graph Laplacian matrix in (1.1). In particular, the eigenfunctions of the $G$-GL can be evaluated for any .
4.1 Eigendecomposition of the -GL
We now derive the eigendecompositions of the $G$-GL (4.6) and of its normalized version (4.8).
Let be a -invariant compact and smooth manifold, without a boundary, where is a compact Lie group of unitary matrices. Let be a data set sampled from .
By (4.4), and since is unitary, for each we have that
(4.9)
That is, each function in (4.4) only depends on the quotient .
Thus, by using (3.10), we can expand the function in the Fourier series
Clearly, the -GL is completely characterized by the set of matrices of (4.11), since for any and , the kernel function can be recovered from them.
The following theorem characterizes the eigendecomposition of of (4.6) in terms of certain products between the columns of the IURs , and the eigenvectors of the block matrices
(4.12)
of dimension whose -th block of size is of (4.11).
To derive the eigendecomposition, we introduce the following notation.
For any vector and any , we denote by
(4.13)
the elements up to of stacked in a -dimensional row vector.
Theorem 10.
For each , let be the block-diagonal matrix whose -th block of size on the diagonal is given by the product of the scalar in (4.5) with the identity matrix.
Then, the -invariant graph Laplacian in admits the following:
1.
A sequence of non-negative eigenvalues , where is the -th eigenvalue of the matrix .
2.
A sequence of eigenfunctions, which are orthogonal and complete in and are given by
(4.14)
where is the eigenvector of which corresponds to its eigenvalue . Furthermore, for each and , the eigenfunctions correspond to the eigenvalue of the -invariant graph Laplacian.
The proof of Theorem 10 is given in Appendix C.
A nearly identical theorem (Theorem 20) characterizing the eigendecomposition of the normalized -GL in (4.8) is given below in Appendix F. Theorem 20 states that for the normalized -GL we only need to replace the eigenvectors above with the eigenvectors of the sequence of matrices
(4.15)
with the only difference that the resulting eigenfunctions
(4.16)
are no longer orthogonal due to the fact that the matrices in (4.15) are generally not Hermitian.
The form of the eigenfunctions in (4.14) is of practical importance for numerical computations, as it implies that the eigendecomposition of the $G$-GL can be obtained by diagonalizing the sequence of matrices of (4.12). Furthermore, for groups which are common in applications (e.g., $SU(2)$), all the elements of the Fourier matrices can be computed efficiently by employing generalized FFT algorithms [11, 33]. We provide the details of such a computational procedure for the case $G = SU(2)$ in Appendix A below.
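To make the computational idea concrete, the following sketch treats the simplest case $G = U(1)$ acting on points of $\mathbb{C}^n$ by a scalar phase, where the IURs are one-dimensional and the generalized FFT is the ordinary FFT. It is not the $SU(2)$ procedure of Appendix A. The kernel $W_{ij}(\theta) = \exp(-\|x_i - e^{i\theta}x_j\|^2/\varepsilon)$, the Fourier convention, the form $D - \hat{W}^{(m)}$ of the block matrices, and the helper name `u1_invariant_blocks` are all assumptions made for illustration.

```python
import numpy as np

def u1_invariant_blocks(X, eps, K=64, m_max=3):
    """Fourier blocks of W_ij(theta) = exp(-||x_i - e^{i theta} x_j||^2 / eps).

    X has shape (N, n) with complex rows; U(1) acts by a scalar phase.
    Returns the degrees d (length N) and a dict m -> N x N matrix W_hat^(m).
    """
    theta = 2 * np.pi * np.arange(K) / K
    sq_norms = np.sum(np.abs(X)**2, axis=1)
    gram = X.conj() @ X.T                  # gram[i, j] = sum_k conj(x_ik) x_jk
    # ||x_i - e^{i theta} x_j||^2 = |x_i|^2 + |x_j|^2 - 2 Re(e^{i theta} gram_ij)
    cross = np.real(gram[:, :, None] * np.exp(1j * theta)[None, None, :])
    sq_dists = sq_norms[:, None, None] + sq_norms[None, :, None] - 2.0 * cross
    W_theta = np.exp(-sq_dists / eps)      # shape (N, N, K)
    W_hat = np.fft.fft(W_theta, axis=2) / K    # average of W e^{-i m theta}
    d = W_hat[:, :, 0].real.sum(axis=1)    # D_ii = sum_j of the theta-average of W_ij
    return d, {m: W_hat[:, :, m % K] for m in range(-m_max, m_max + 1)}

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 2)) + 1j * rng.standard_normal((30, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # points on S^3 inside C^2
d, blocks = u1_invariant_blocks(X, eps=0.5)

spectra = {}
for m, W_m in blocks.items():
    A_m = np.diag(d) - W_m                 # assumed block matrix for frequency m
    spectra[m] = np.linalg.eigvalsh(A_m)   # real spectrum, since A_m is Hermitian
```

No augmented data set is ever formed: the $K$ rotated copies enter only through the inner products `gram`, and each frequency $m$ requires diagonalizing a single $N \times N$ Hermitian matrix.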
4.2 Convergence of the -invariant graph Laplacian
We now show that the normalized $G$-invariant graph Laplacian (4.8) converges to the Laplace-Beltrami operator on $\mathcal{M}$. Furthermore, we show that the convergence has an improved rate which scales with $d - d_G$ instead of $d$, where $d_G$ is the dimension of the group $G$.
Theorem 11.
Let be a smooth -dimensional compact manifold without boundary, closed under the action of a matrix Lie group . Let be i.i.d with the uniform probability density function , and suppose that for all with probability one. Let be a smooth function, and define so that . Then, with high probability, we have that
(4.17)
The proof of Theorem 11 is given in Appendix D.
We point out that the requirement that with probability one ensures that the orbits generated by the action of (see Definition 3) are diffeomorphic to . This eliminates pathological cases in which the convergence analysis of the -GL may become over-complicated or even superfluous, while still accounting for a broad class of data manifolds.
Inspecting (4.17), we observe that as $N \to \infty$, the $G$-GL estimates the Laplace-Beltrami operator with a bias error given by the third term on the r.h.s. The second term on the r.h.s. accounts for the variance of the estimator when the sample size $N$ is finite. Thus, we conclude that the $G$-GL reduces the variance error compared to that of the standard GL in (1.4), in proportion to the dimension of $G$.
The improvement in the variance error (4.17) in comparison to (1.4) can be explained as follows.
In the proof of Theorem 11, we show that any sufficiently small neighborhood in $\mathcal{M}$ can be written as a disjoint union of orbits generated by $G$. In fact, we show that there exists a set of coordinates for such a neighborhood such that, given a point $x$ in it, the first $d - d_G$ coordinates specify the orbit in which $x$ resides, while the last $d_G$ coordinates indicate the position of $x$ on that orbit. In other words, these last $d_G$ coordinates are the “directions” in which $G$ acts on $\mathcal{M}$.
The construction of the -GL incorporates all the points in the set in (3.3), namely, those generated by following the directions in in which acts on the data set . Thus, the variance error of approximating by the -GL stems entirely from sampling the remaining directions in , resulting in the reduced variance error in (4.17).
Remark 2.
In Theorem 11, we have assumed that the sampling density is uniform over . In Appendix E, we show that in the case where is non-uniform, the operator in (4.8) converges to the Fokker-Planck operator , given for every smooth function by
(4.18)
where is the probability density given by
(4.19)
Furthermore, we show that there exists a normalization of in (4.6) (different from in (4.8)) that still converges to .
The practical importance of Theorem 11 is that we expect the eigenvalues and eigenfunctions of the -GL to approximate those of the Laplace-Beltrami operator better than the standard normalized graph Laplacian (1.3). We support this conjecture by simulations in the following section.
4.3 Numerical examples
At this point, we wish to demonstrate the improved convergence rate (4.17) with some numerical examples. In the following simulation, we let the group (of unitary matrices with determinant one) act on a data set sampled from the four-dimensional sphere , as follows. First, we sample a set of points and embed them in the Euclidean space via the map
(4.20)
where we denote by the -th coordinate in .
Then, we let the group act on each embedded point of (4.20) via the multiplication
(4.21)
where was defined explicitly in (3.1).
We then apply the -invariant graph Laplacian to the test function
(4.22)
at the point , where we denote by the -th coordinate of .
It can be shown that the coordinate functions on are eigenfunctions of the Laplace-Beltrami operator corresponding to the eigenvalue (see [1]). Thus, we have that and . To demonstrate the convergence and variance error of (4.17), we uniformly sample points , and generate the data set by using (4.20). We then approximate by applying , the normalized -GL, to the function for
by
(4.23)
Figure 1 (panels (a) and (b)): Improved convergence rates of the $SU(2)$-invariant GL and the $\mathbb{T}^2$-invariant GL.
The quantity (4.23) can be approximated efficiently by using the parametrization (3.1) of by Euler angles, together with Gauss-Legendre quadratures to approximate the integrals over .
We observe that for large values of $\varepsilon$, the error (4.17) is dominated by the bias term, while for small values of $\varepsilon$, the error is dominated by the middle term on the r.h.s. of (4.17), which accounts for the sampling variance of the approximation. Thus, we refer to the error for small values of $\varepsilon$ as the “variance dominated region” of the error.
The results of this experiment are depicted in Figure 1(a), where we plot the log-error of approximation of by (4.23) against different values of .
The slope of the log-error in the variance dominated region is -1.4122 for the normalized standard graph Laplacian (abbreviated standard-GL) and -0.7048 for the normalized -GL, supporting the classical result (1.4) for the normalized standard-GL, and (4.17) for the normalized -GL, which predict slopes of -1.5 and -0.75 respectively, when substituting and .
As another example, we simulated the action of the torus group , defined as the group of all diagonal unitary matrices, on the unit 3-sphere . In a similar fashion to our first example, we embed the sampled data points into via the map
(4.24)
and let act on each point by matrix multiplication. We then repeat the steps of our first simulation, computing the -GL by using samples from and applying it to the function for in (4.22) (defined over ), at the point . Using the fact that the coordinate functions on are eigenfunctions of corresponding to the eigenvalue (see [1]), we obtain that .
The plot of the logs of the approximation errors of by the normalized -GL and the normalized standard-GL against different values of shows the same qualitative picture as in the first simulation. In particular, the slope of the log-error in the variance dominated region is for the normalized standard-GL, and for the -GL, supporting the results (1.4) and (4.17), which predict slopes of -1.25 and -0.75, respectively, when and (since is a two-dimensional group).
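The predicted slopes quoted for both experiments can be reproduced with a short sketch. It assumes that the variance term scales like the bandwidth raised to the power -(1/2 + (d - dim G)/4), with dim G = 0 for the standard graph Laplacian; this form is inferred from the four slopes stated in the text rather than copied from (1.4) or (4.17). The function names and the synthetic error curve are hypothetical.

```python
import numpy as np

def predicted_variance_slope(manifold_dim, group_dim=0):
    """Slope of log-error vs. log-bandwidth in the variance dominated region.

    Assumes the variance term scales like eps**-(1/2 + (d - dim G)/4),
    which reproduces the four slopes quoted in the text.
    """
    return -(0.5 + (manifold_dim - group_dim) / 4.0)

def fitted_slope(eps_values, errors):
    """Least-squares slope of log(error) against log(eps)."""
    slope, _ = np.polyfit(np.log(eps_values), np.log(errors), 1)
    return slope

# Synthetic power-law errors (not simulation results), used only to
# show how a slope would be read off a log-log plot.
eps = np.logspace(-3, -2, 10)
synthetic_errors = 0.1 * eps ** predicted_variance_slope(4, 3)
```

With these definitions, `predicted_variance_slope(4)` and `predicted_variance_slope(4, 3)` give -1.5 and -0.75, and `predicted_variance_slope(3)` and `predicted_variance_slope(3, 2)` give -1.25 and -0.75.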
Figure 2: The real part of the eigenfunction . Panel (a) shows the values at points in the data set which were projected to the plane. Panel (b) shows the values at circles generated by the action of on . Panel (c) shows the nested tori obtained via stereographic projection onto of two of the orbits in generated by the action of .
Figure 3: The 50 smallest eigenvalues of the normalized -GL, scaled by (green), the normalized standard-GL, also scaled by (blue), and the 50 smallest eigenvalues of (red). Both graph Laplacians were computed by using the same data points, where for the normalized -GL, and for the normalized standard-GL.
We also computed the smallest eigenvalues of the normalized -GL on , scaled by in accordance with (4.17), and the smallest eigenvalues of the normalized standard-GL, also scaled by (see (1.4)). We used (the same) points for the construction of both graph Laplacians, with bandwidth parameter values of for the normalized -GL, and for the standard graph Laplacian. The values of were chosen to minimize the mean absolute error of approximating the eigenvalues of by those of each graph Laplacian.
The results are illustrated in Figure 3. The red bars depict the eigenvalues of which are given by the unique values and with respective multiplicities and (see e.g. [1]). The green and blue bars depict the eigenvalues of the normalized -GL, and those of the normalized standard-GL, respectively.
While for both graph Laplacians the multiplicities are in agreement with those of , it is clear that the eigenvalues of the normalized -GL better approximate those of than those of the normalized standard-GL.
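For reference, the spectrum of the Laplacian on a round unit sphere can be listed from the classical formulas: on the n-sphere, the k-th distinct eigenvalue is k(k + n - 1) with multiplicity C(n + k, n) - C(n + k - 2, n). The sketch below uses the non-negative sign convention, which may differ from the convention used for the values in Figure 3; the function name is hypothetical.

```python
from math import comb

def sphere_laplacian_spectrum(n, count):
    """First `count` eigenvalues (with multiplicity) of the Laplacian on S^n.

    Uses the classical formulas: eigenvalue k*(k + n - 1) with multiplicity
    C(n + k, n) - C(n + k - 2, n), in the non-negative sign convention.
    """
    eigenvalues = []
    k = 0
    while len(eigenvalues) < count:
        mult = comb(n + k, n) - (comb(n + k - 2, n) if k >= 2 else 0)
        eigenvalues.extend([k * (k + n - 1)] * mult)
        k += 1
    return eigenvalues[:count]

first_50_on_s3 = sphere_laplacian_spectrum(3, 50)
```

On the 3-sphere this yields the distinct values 0, 3, 8, 15, 24 with full multiplicities 1, 4, 9, 16, 25, so the first 50 eigenvalues end partway through the fifth eigenspace.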
Lastly, we illustrate how constructing the normalized -GL by using all the points in is manifested in the eigenfunctions (4.14). The IURs of (see Definitions 5 and 6) are all one-dimensional, and are given by the set of products of Fourier modes , which can be conveniently enumerated by the set . Thus, Theorem 10 implies that the eigenfunctions in (4.14) take the form of a Kronecker product between an -dimensional vector and a bivariate function .
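The one-dimensional representations of the two-dimensional torus can be checked numerically. The sketch below, with hypothetical function names, forms products of Fourier modes indexed by integer pairs and verifies on a uniform grid that they are orthonormal with respect to the normalized uniform measure on the torus, in line with Schur's orthogonality relations.

```python
import numpy as np

def torus_character(m1, m2, theta1, theta2):
    """One-dimensional representation exp(i*(m1*theta1 + m2*theta2)) of the 2-torus."""
    return np.exp(1j * (m1 * theta1 + m2 * theta2))

def character_gram(indices, n_grid=16):
    """Gram matrix of characters under the normalized uniform measure on the torus."""
    t = 2 * np.pi * np.arange(n_grid) / n_grid
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    chars = np.array([torus_character(m1, m2, T1, T2).ravel() for m1, m2 in indices])
    # Entry (a, b) approximates the integral of conj(char_a) * char_b.
    return chars.conj() @ chars.T / n_grid**2

gram = character_gram([(0, 0), (1, 0), (0, 1), (1, -2), (-3, 2)])
```

The uniform grid integrates these trigonometric polynomials exactly as long as the frequency differences stay below the grid size, so the Gram matrix equals the identity up to rounding.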
To visualize the eigenfunctions, we first map the points in to by using the stereographic projection from . It can be shown that each orbit gets projected to a torus in (a “bagel-shaped” surface), and furthermore, that the image of under this projection is a union of nested tori that fill all of . Figure 2(c) depicts two of these tori (one nested inside the other), generated by the action of on a pair of data points in , colored according to the values of , the real part of the function (i.e. ). In Figure 2(a), we show the values of at the points of the stereographic projection of , which were projected to the -plane in , and in Figure 2(b), we show the values of at the intersection of the -plane with all the tori generated by the action of on those points, which happens at planar circles. In particular, each circle in Figure 2(b) is generated by the action of on a point in Figure 2(a), which illustrates how the eigenfunctions account for the group action.
5 Denoising -invariant data sets
We now demonstrate how to apply Theorem 10 to denoise a data set sampled from an -invariant manifold. In the following simulations, we generate noisy samples from the -sphere according to the following model. For a scalar , we define the -tubular neighborhood of by
(5.1)
The set is simply a spherical shell of width in . A noisy sample of is generated by drawing points uniformly from for some fixed . Thus, the parameter controls the amount of noise in the data set.
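A minimal sketch of this sampling model follows. It assumes that the shell consists of all points in five-dimensional Euclidean space whose distance from the unit sphere is at most `eps`, and that "uniformly" means uniform with respect to volume; both are assumptions about (5.1), and the function name and the parameter `eps` are hypothetical.

```python
import numpy as np

def sample_spherical_shell(n_points, eps, dim=5, seed=0):
    """Draw points uniformly (by volume) from {x in R^dim : | ||x|| - 1 | <= eps}."""
    rng = np.random.default_rng(seed)
    # Uniform directions: normalized Gaussian vectors.
    directions = rng.standard_normal((n_points, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Inverse-CDF sampling of the radius: the volume grows like r**dim.
    lo, hi = (1 - eps) ** dim, (1 + eps) ** dim
    radii = (lo + rng.random(n_points) * (hi - lo)) ** (1.0 / dim)
    return directions * radii[:, None]

noisy = sample_spherical_shell(2000, eps=0.1)
radii = np.linalg.norm(noisy, axis=1)
```

Sampling the radius uniformly in an interval instead would give a slightly different distribution; the choice should follow whichever convention (5.1) prescribes.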
We generate a data set by drawing points , and then mapping each point to a point by using the map (4.20).
To apply our framework to denoise the data set , we consider the action of the group on defined in (4.21).
Using the notation in (4.20) and (4.21), we define the functions
(5.2)
for all and , where and denote the first and second elements of a vector in .
Clearly, we have that , and are all elements of the Hilbert space .
For each , the function is the -th coordinate of the points in the orbit , and thus is the -th coordinate function of the points in . Denote by the embedding of in by the map in (4.20).
Thus, the function attains the values of the -th coordinate of sampled at the points in .
We now denoise the data set as follows. First, we construct the normalized -GL by using the points in the data set , and compute its eigenfunctions given by (4.14), as described by Theorem 10. We choose the bandwidth parameter so as to make the matrices in (4.12) sparse. Specifically, for a data set of points, we first subsample points and sort the elements in each of the rows of (which is real valued) corresponding to those points in descending order. The bandwidth is then chosen such that the values of the sorted elements in each row decay exponentially fast, and such that the index of the elbow of the scree plot of values in each row (defined as the first point where the derivative equals ) is (which is of the values).
We then expand each of the functions of (5.2) in terms of the eigenfunctions , and truncate the expansion.
A standard approach is to retain the terms in the expansion that correspond to eigenvalues above some threshold value. However, we truncate the expansion using the following observation. The -sphere can be completely recovered using the five eigenfunctions that correspond to the second leading eigenvalue of the Laplacian operator . This is simply due to the fact that the coordinate functions defined for each by , span the eigenspace that corresponds to the second smallest eigenvalue of [1].
Thus, we expect that the functions in (5.2) should be well approximated by the space spanned by the eigenfunctions corresponding to the five smallest eigenvalues of the normalized -GL after excluding the smallest eigenvalue. This suggests retaining only the terms corresponding to the latter eigenfunctions in the expansion of each coordinate function in (5.2). Finally, for each let denote the vector of values of the truncated expansion (just described) of the function of (5.2) at the points for all . The denoised data points are then given by
(5.3)
The denoising results of points sampled from for various values of are presented in Table 1. Defining the error of approximation of each noisy point as the distance
(5.4)
for each value of , we report the mean squared error (MSE) of the approximation obtained by performing our proposed denoising procedure using the normalized -GL. For comparison, we also report the MSE for the same data sets denoised by the eigenvectors of the normalized standard GL. Denoising using the normalized standard GL is implemented by viewing each column of the matrix
(5.5)
formed by stacking the data points in rows, as a sample of a coordinate function on , and projecting on the eigenvectors that correspond to the five smallest eigenvalues of the standard GL, after excluding the smallest one.
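The baseline just described can be sketched as follows. This is not the authors' implementation: it assumes a Gaussian kernel and a random-walk normalization of the standard graph Laplacian (the exact form of (1.3) may differ), and it projects each coordinate column in the degree-weighted inner product in which the eigenvectors of that normalization are orthogonal. The function name and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def denoise_with_standard_gl(X, bandwidth, n_keep=5):
    """Project the columns of X (N x D, real) onto low-frequency GL eigenvectors.

    Assumed construction: Gaussian kernel exp(-dist**2 / bandwidth) and the
    random-walk normalization I - D^{-1} W, diagonalized via its symmetric
    conjugate I - D^{-1/2} W D^{-1/2}.
    """
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    W = np.exp(-dist2 / bandwidth)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))
    _, U = np.linalg.eigh(np.eye(len(X)) - S)   # eigenvalues in ascending order
    U = U[:, 1:1 + n_keep]                      # drop the trivial eigenvector
    # Eigenvectors of I - D^{-1} W are D^{-1/2} U; project in the D-weighted inner product.
    root = np.sqrt(deg)[:, None]
    return (U @ (U.T @ (root * X))) / root, deg

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # clean points on the 4-sphere
X_denoised, deg = denoise_with_standard_gl(X, bandwidth=0.5)
```

Because the constant eigenvector is excluded, each denoised column is orthogonal to the constant function in the degree-weighted inner product, which is a convenient structural check of the projection.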
We observe that for moderate noise levels , denoising using the normalized -GL outperforms denoising using the normalized standard-GL by an order of magnitude, recovering the 4-sphere with high accuracy.
Noise level    Noisy data MSE    Standard GL denoised data MSE    -GL denoised data MSE
0.1            3.3E-03           9.3E-04                          5.04E-05
0.2            1.33E-02          3.11E-03                         3.30E-04
0.4            5.33E-02          1.745E-02                        1.6E-02
Table 1: MSE of noisy data before and after denoising.
6 Implementation details and computational complexity
In this section, we describe a numerical procedure to compute the eigendecomposition of the -invariant graph Laplacian in the case where . We point out that almost all of our analysis can be readily generalized to the case where is an arbitrary compact matrix Lie group, and we restrict ourselves to the case , whose representation theory is well understood, for the sake of clarity and concreteness. In particular, the important case where is nearly identical to that of since the IURs of are a subset of those of .
With the exception of and the 2-dimensional torus , the dimension of a matrix Lie group is . Thus, even for a low-dimensional group such as , the integrals in (4.11), required to construct the matrices (4.12), need to be evaluated by triple sums. Such sums are computationally expensive even for moderate values of . Fortunately, for groups such as (and the closely related ) there exist generalized FFT algorithms that compute the Fourier coefficients efficiently [33].
The general approach for numerical integration over hinges upon the fact that the elements of its IURs can be parameterized by Euler angles, and written in a separable form as a product of factors, each of which depends on a single angle. The integrals are then evaluated using quadrature formulas that are computed using FFT-type algorithms applied to each factor separately, requiring operations where is a prescribed sampling resolution over the group. We give a detailed exposition of an -FFT in Appendix A below.
We now analyze the complexity of computing the eigendecomposition presented in Theorem 10 for the case where acts on a data set by matrix multiplication.
The first step of the algorithm requires computing the affinities in (4.4) at sampling points, and in particular, the Euclidean pairwise distances inside each exponent.
In practice, the matrices are usually block-diagonal where each block is an IUR of (see e.g. [37, 38]). Formally, we write
(6.1)
where is the -th dimensional IUR of , and is the set of IURs that appear as blocks on the diagonal of , such that . Note that some of the IURs may appear more than once on the diagonal.
Accordingly, we can now index the coordinates of a point in the data set to match the indices of the rows of the IURs in the blocks of , by
(6.2)
That is, the indexing (6.2) partitions into tuples of length such that the action of on can be written as
(6.3)
for each and in (6.2). Altogether, in matrix form, we have that
(6.4)
To compute the matrices in (4.11), we must first evaluate the Euclidean distances
(6.5)
Expanding the squared norm function, we have
(6.6)
Then, expanding the inner product in the third term on the right hand side of (6.6), we get
(6.7)
where we denote
(6.8)
Given an integration parameter , we compute (6.7) and subsequently (6.6) for all matrices defined by using (3.1) and (6.1) as
(6.9)
where , and , and .
Once we have computed the coefficients , the third term in (6.6) can be computed for all with operations by using a generalized FFT algorithm for (see Appendix A).
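The expansion of the squared distance relies on the group acting by unitary matrices, so that the norm of a transformed point equals the norm of the original. The sketch below checks the three-term identity numerically for a random unitary matrix; it assumes that (6.6) has the form of two squared norms minus twice the real part of an inner product, and all names are hypothetical.

```python
import numpy as np

def random_unitary(n, seed=0):
    """Random unitary matrix from the QR factorization of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

def squared_distance_expanded(x, y, A):
    """||x - A y||^2 written as ||x||^2 + ||y||^2 - 2 Re <x, A y> for unitary A."""
    return np.vdot(x, x).real + np.vdot(y, y).real - 2 * np.vdot(x, A @ y).real

rng = np.random.default_rng(1)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)
A = random_unitary(6)
direct = np.linalg.norm(x - A @ y) ** 2
expanded = squared_distance_expanded(x, y, A)
```

Only the inner-product term depends on the group element, which is why it is the term evaluated for all group samples at once by a fast transform.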
Now, since the -th IUR consists of elements, the number of coefficients that need to be computed for a fixed pair and amounts to
where is the dimension of the points . Since (6.10) is bounded from above by the square of (6.11), we have that (6.10) is .
Finally, once we have computed the squared distances (6.6), we use Algorithm 2 to compute the elements of the matrices in (4.15), and compute their eigenvectors and eigenvalues. The entire procedure is described in Algorithm 1.
Algorithm 1 Evaluating the -invariant manifold harmonics
1:Input: A data set of points , integration parameter , and bandwidth parameter .
2:For every , apply Algorithm 2 with integration parameter , in conjunction with (6.6) and (6.7) to compute the affinities
3:For every and , apply Algorithm 2 to evaluate the generalized Fourier coefficient matrices of (4.11).
4:For every form the matrix
(6.13)
from (4.15), and return its eigenvectors and eigenvalues .
We now summarize the computational complexity of Algorithm 1. Given that we evaluate the Fourier series over points for each Euler angle, the sampling resolution of amounts to points. Denoting , computing the distances in (6.12) requires operations, out of which operations are required to compute the coefficients , and operations to compute (6.7) using a fast polynomial transform based -FFT.
Forming the generalized Fourier coefficients matrices of (4.11) when using a -FFT requires operations. Forming the sequence of matrices (6.13) in the last step of Algorithm 1 requires operations, and evaluating the eigenvalues and eigenfunctions of (6.13) requires additional operations. Thus, the computational complexity of Algorithm 1 amounts to operations in total.
7 Summary and future work
In this work, we extended the graph Laplacian to data sets that are closed under the action of a matrix Lie group. To that end, we introduced the -invariant graph Laplacian (the -GL), that incorporates the group action into its construction, by considering the pairwise distances between all points generated by applying the group action to the given data set. We have shown that the -GL converges to the Laplace-Beltrami operator , at a rate accelerated proportionally to the dimension of the group. This accelerated rate implies that it is advantageous to employ the -GL for graph Laplacian based methods [28, 2, 5] whenever the data set is equipped with a known group action, since faster convergence implies that significantly less data is required for a prescribed accuracy. We also derived the eigendecomposition of the -GL, showing that its eigenfunctions have a separable form, where the dependence on the group is expressed analytically using the irreducible unitary representations of the group.
We then demonstrated how the -GL can be employed to denoise a noisy sample from the 4-sphere by using a discrete Fourier analysis type algorithm, with the Fourier modes replaced by the eigenfunctions of the -GL.
As for future research, an important direction is to investigate the spectral convergence (see [9]) of the -GL, that is, the convergence of its eigenvectors and eigenvalues to those of . Another direction could be to further develop applications of the -GL, e.g., in electron-microscopy imaging [20].
Acknowledgements
PH and JK were supported in part by NSF Award DMS-2309782 and start-up grants provided by the College of Natural Sciences and Oden Institute for Computational Engineering and Sciences at the University of Texas at Austin. XC was supported in part by NSF-BSF award 2019733. ER and YS were supported by NSF-BSF award 2019733 and by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH). YS was supported also by the NIH/NIGMS Award R01GM136780-01.
Appendix A The FFT over
We now describe how to efficiently compute the Fourier series of a function defined over the group whose elements are given by (3.1).
The explicit form of the series is given in (3.9), with the IURs of enumerated by the set of non-negative half integers , and for all .
Recall that the Fourier series of a function over a Lie group is expressed in terms of the elements of its IURs.
The elements of the IURs of are given by (see [47])
(A.1)
with given by
(A.2)
where are the Jacobi polynomials (see [47]).
Now, by (3.7) and (3.9), the generalized Fourier coefficients of a function are given by
(A.3)
These coefficients can be approximated rapidly to an arbitrary accuracy as we now describe.
Note that the functions in separate into a product of factors each depending on a single angle , , or . Thus, (A.3) can be computed by integrating over each of the angles successively, as we now show.
Set a bandlimit depending on the required accuracy, and an integration parameter . We begin by evaluating the integrals
(A.4)
of multiplied by the conjugate of the factor depending on in (A.1)
for each , all , and all , where are the Gauss-Legendre quadrature nodes for some (the reason for this choice of s will become apparent shortly), by
(A.5)
Using applications of the classical FFT, the entire computation is accomplished by using operations. Next, we evaluate the integrals
(A.6)
of multiplied by the conjugate of the factors in (A.1) that depend on and , for each , and all values of and used in the previous computation, by
(A.7)
Using applications of the FFT, the latter computation amounts to a total of operations.
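The role of the classical FFT in these steps can be illustrated in one variable. The sketch below computes all Fourier coefficients of a periodic function from equispaced samples with a single FFT, which is equivalent to applying the trapezoid rule to each coefficient integral. The normalization by 1/(2 pi) and the function name are assumptions made for this example and need not match (A.4)-(A.7).

```python
import numpy as np

def fourier_coefficients(samples):
    """Coefficients c_m = (1/2pi) * integral f(a) exp(-i m a) da from equispaced samples.

    The trapezoid rule on n equispaced nodes is evaluated for all m at once by one FFT;
    entry m of the result holds c_m (negative m wrap around to index n + m).
    """
    samples = np.asarray(samples)
    return np.fft.fft(samples, axis=-1) / samples.shape[-1]

n = 32
alpha = 2 * np.pi * np.arange(n) / n
f = 2.0 + np.exp(3j * alpha) + 0.5 * np.exp(-1j * alpha)
c = fourier_coefficients(f)
```

For a band-limited integrand the result is exact up to rounding, and applying the transform along one axis of a multi-dimensional array handles all remaining indices in a batch.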
Algorithm 2 -FFT
1:Input:
   1. Integration parameter .
   2. Function .
   3. Precomputed weights and nodes for Gauss-Legendre quadrature.
2:for do
3:   for do
4:      for do
5:         for do
6:            for do
                 (A.8)
7:            end for
              (A.9)
8:         end for
           (A.10)
9:      end for
10:   end for
11:end for
12:The generalized Fourier coefficients .
Lastly, we evaluate
(A.11)
for each , and all and from the previous computation, by
(A.12)
using Gauss-Legendre quadrature with precomputed weights . The latter computation is accomplished using direct evaluations of size , amounting to a total complexity of operations.
We point out that (A.12) can also be computed using operations by applying fast polynomial transforms (see e.g. [34]), bringing the overall complexity of the entire algorithm to operations. However, after some experimentation, we found that while the direct computation of (A.12) is asymptotically more expensive, in practice, utilizing GPUs to evaluate it is substantially faster than the available algorithms. Unfortunately, fast polynomial transform algorithms do not lend themselves easily to GPU acceleration, due to their iterative nature. The entire procedure of evaluating the integrals in (A.3) is outlined in Algorithm 2.
Lastly, we note that the method described above can be applied to by restricting all computations to the integer valued IURs of , and the angle to .
For any , expanding (4.6) by using (4.4) and (4.5), we obtain that
(B.1)
which implies that
(B.2)
Next, by using the left-invariance property (3.6) of and (4.9), for any and we have that
(B.3)
where we made the change of variables in the second equality.
Thus, using that (by the definition of in (4.4)), we can write the first expression on the r.h.s of (B.2) as
For any , by plugging (4.10) into (4.4), and using (3.9) we have for any that
(C.1)
where we denote by and the entries of and , respectively, and enumerates the IURs of . Next, by using the homomorphism property of group representations (3.8), we have
(C.2)
We now show that for a given , and the function of (4.14) is an eigenfunction of .
Extending the notation of (4.13), for any and all , we denote the entries of the vector by
(C.3)
Now, the homomorphism property (3.8) implies that . Thus, plugging into (C.2), and using the notation in (C.3), and that
(C.4)
the expression for is given by
(C.5)
By Schur’s orthogonality relations (see e.g. [11]), we have
(C.6)
by which we get that the expression in (C.5) for becomes
(C.7)
Next, we notice that
(C.8)
where is the block matrix of Fourier coefficients matrices of order that was defined in (4.12), and is the row of the matrix consisting of blocks of . Thus, we get that
(C.9)
Next, we notice that by the definition of in statement of the theorem, we have that
(C.10)
That is, the -th element of the -th block of the matrix is given by defined in , for all .
Thus, by (4.6), (C.9) and (C.10), we have
(C.11)
Thus, since is an eigenvector of corresponding to an eigenvalue , we get
(C.12)
showing that the function
is an eigenfunction of in (4.6).
Next, we show that the eigenfunctions in (4.14) are orthogonal. Indeed, we have that
(C.13)
The outer product of rows is a matrix of products between elements of the IURs of , and by Schur’s orthogonality relations, we have that
(C.14)
Thus, when and , we are left with
(C.15)
which shows that and are orthogonal.
To show that the eigenfunctions in (4.14) form a basis for , we first assert that the matrices in (4.11) are Hermitian. For the latter we require the following result (see p. 82 in [10]).
Lemma 12.
Let be a compact unitary matrix Lie group. Then, we have that
which shows directly that any function can be expanded in a series of eigenfunctions of the -GL.
Lastly, the fact that the eigenvalues of are real and non-negative is a direct result of Lemma 9, coupled with the fact that is a symmetric operator, since we have that for all and all .
The analysis that follows is a generalization of the proof of Theorem 2 in [38].
The proof is divided into 4 parts, given in Appendices D.1, D.2, D.5 and D.6.
In part 1, we show that the -invariant graph Laplacian converges to the Laplace-Beltrami operator on the data manifold .
In part 2, we derive the convergence rate (the variance term) of our operator, using the proof technique derived in [40] for the standard graph Laplacian. In parts 3 and 4, we provide proofs of key results that are used in part 2. Appendices D.3 and D.4 provide some differential geometry background needed for D.5.
D.1 Convergence of the -invariant graph Laplacian
In this section, we show that for a fixed and as , the normalized -invariant graph Laplacian approximates the Laplace-Beltrami operator on the data manifold up to an error at each data point . We will assume w.l.o.g that , since all the analysis that follows can be carried out exactly in the same manner and with the same results when .
We now derive the limit of (D.1) for and a fixed , showing that it is essentially the Laplace-Beltrami operator with an additional bias error term of .
First, let us focus on the expression
(D.2)
which is the numerator of the second term of (D.1) (inside the brackets).
Let us define
(D.3)
Since are i.i.d samples from , the law of large numbers implies
(D.4)
(D.5)
where is the sampling density of the data over , and is the measure with respect to the Riemannian metric on induced by the standard Euclidean inner product in .
Next, we recall that acts on points by multiplication by unitary matrices .
Consider the map defined by
(D.6)
The pushforward of by is the measure over defined by
(D.7)
for all Lebesgue-measurable subsets .
Since acts as an isometry over , and the metric tensor over is invariant under isometries, we conclude that is -invariant. That is, for fixed we have
(D.8)
for all Lebesgue-measurable subsets .
Using the latter observation, and assuming that is uniform over (and so ) we have
(D.9)
where in the second equality we applied the change of variables , and in the fourth equality that , by (3.5).
In a similar fashion, if we consider the denominator of the second term in (D.1)
(D.10)
and by repeating the calculations carried out above for but with , we get that
(D.11)
where we defined
(D.12)
Lastly, if we substitute (D.9) and (D.11) into (D.1), we have that
The variance error in the approximation of the Laplace-Beltrami operator by the -GL (second term on the r.h.s of (4.17)) is attributed to the difference between the values of and for a finite and their limit when .
To derive the variance error we employ the proof technique derived in [40]. As in Section D.1, we perform all our analysis in a neighbourhood of an arbitrary data point , assuming w.l.o.g that .
Using the definitions (D.3) and (D.12), the normalized -GL applied to an arbitrary smooth function on , and evaluated at the fixed point can be written as
(D.15)
Following [40], we employ the Chernoff tail inequality to bound the probability of (D.15) deviating from its mean (the limit of (D.15) when ).
We now derive a bound on the probability of the -error
(D.16)
where we point out that excluding the diagonal terms and results in an even smaller error than the variance error itself, as was shown in [38] and [40].
We also point out that a bound on the probability
(D.17)
can be obtained by the same technique that we now apply to bound (D.16).
To evaluate all the moments in (D.21), and consequently the quantities and in (D.20), we use the following result from [40], which will be the key instrument in the analysis that follows.
Theorem 13.
Let be a smooth and compact -dimensional submanifold, and let be a smooth function. Then, for any
(D.22)
where is a scalar function of the curvature of at .
This shows that the integral on the l.h.s of (D.22) essentially operates as an evaluation functional of at the point , up to an error.
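This behaviour can be observed numerically on the unit circle (a one-dimensional manifold). The sketch below assumes a Gaussian kernel of the form exp(-dist^2/eps) normalized by (pi*eps)^(-1/2); since (D.22) is not reproduced here, this normalization is an assumption, and the function name is hypothetical. The test function is cos(theta), evaluated at theta = 0, where it equals 1.

```python
import numpy as np

def kernel_average_on_circle(f, theta0, eps, n_grid=20000):
    """(pi*eps)^(-1/2) * integral over the unit circle of exp(-|x - y|^2 / eps) f(y) dy.

    Points are x = (cos theta0, sin theta0), y = (cos theta, sin theta), so the squared
    chord length is 2 - 2*cos(theta - theta0); the integral uses the trapezoid rule.
    """
    theta = 2 * np.pi * np.arange(n_grid) / n_grid
    dist2 = 2.0 - 2.0 * np.cos(theta - theta0)
    integrand = np.exp(-dist2 / eps) * f(theta)
    return integrand.sum() * (2 * np.pi / n_grid) / np.sqrt(np.pi * eps)

err_coarse = abs(kernel_average_on_circle(np.cos, 0.0, 0.04) - 1.0)
err_fine = abs(kernel_average_on_circle(np.cos, 0.0, 0.01) - 1.0)
```

Reducing the bandwidth by a factor of four reduces the deviation from the function value by roughly the same factor, consistent with an error that is first order in the bandwidth.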
Applying Theorem 13 to the first order moments appearing in (D.21), we immediately obtain
(D.23)
and
(D.24)
Thus, in order to evaluate in (D.21), it remains to evaluate the second order moments and , which we carry out in two steps in the following two sections.
First, in Section D.5 we construct a local parametrization of in a sufficiently small neighborhood of , such that each is mapped to a unique pair , where all the values reside on a -dimensional submanifold , and .
Next, in Section D.6, we use the results of Section D.5 to reduce integration over in the expressions for the second order moments in (D.21) to integration over , leading to the following lemma.
Lemma 14.
There exist a smooth function over , and a smooth function over such that
We now obtain Theorem 11, and in particular (4.17), by repeating the computations in equations in the proof of Lemma in [38], with replaced by an arbitrary group dimension .
D.3 Real manifolds embedded in
Before we continue with the proof of Theorem 11, we describe the way we view real manifolds embedded in a complex vector space.
First, we say that a -dimensional manifold is real if its charts are given by maps of the form , where is an open subset in . This is in contrast to complex manifolds that admit charts that map open subsets in the manifold to the unit disk in . The crucial distinction between the two is that real manifolds admit a differentiable structure where the transition maps between charts are differentiable with respect to real variables, while complex manifolds admit transition maps that are holomorphic.
Specifically, we can formulate our entire analysis in a real space by identifying with via the map
(D.29)
If we equip with the real valued inner product given by
Now, let be an embedded -dimensional submanifold, and let be a local chart on , that is, is an open subset, and is a diffeomorphism that maps onto an open subset of , where we identify with the set
using the map in (D.29).
The inverse map parametrizes the points as . The Jacobian matrix of the latter parametrization is given in coordinates by
(D.32)
Thus, denoting
(D.33)
the metric tensor induced on by the dot product (D.30) is given in local coordinates as
(D.34)
The latter enables us to integrate smooth functions over open subsets by
(D.35)
where is the volume form on given by
(D.36)
and
(D.37)
D.4 Coordinate charts on Lie groups
The next part of the proof of Theorem 11 also requires us to define coordinate charts on Lie groups.
The standard coordinates on a Lie group are given by the exponential map over the Lie-algebra of . In detail, the Lie-algebra of is the tangent space to at the identity . By a theorem (due to von Neumann, see [23]), there exists a sufficiently small neighborhood of , where for each there exists a unique element such that , where is the matrix exponential. Thus, choosing a basis for , we can write each element as a linear combination
(D.38)
inducing a coordinate chart for , such that the elements of are given explicitly by the matrix valued map
(D.39)
where is an open subset.
Multiplying the elements of by a fixed element (either from the left or the right) translates to a neighborhood of .
A chart for is thus given by (also see [44])
(D.40)
Hence, an atlas of charts for can be obtained by choosing a finite cover of (since is compact) by such neighborhoods.
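The construction of such a chart can be sketched for SU(2). The basis below (i times the Pauli matrices) is one hypothetical choice of basis for the Lie algebra, and the helper names are not taken from the text; the matrix exponential of a skew-Hermitian matrix is computed through the eigendecomposition of the associated Hermitian matrix.

```python
import numpy as np

def expm_skew_hermitian(X):
    """Matrix exponential of a skew-Hermitian X via the Hermitian matrix H = -i X."""
    eigvals, V = np.linalg.eigh(-1j * X)
    return (V * np.exp(1j * eigvals)) @ V.conj().T

# Hypothetical basis of su(2): i times the Pauli matrices.
BASIS = [
    1j * np.array([[0, 1], [1, 0]], dtype=complex),
    1j * np.array([[0, -1j], [1j, 0]], dtype=complex),
    1j * np.array([[1, 0], [0, -1]], dtype=complex),
]

def chart_inverse(coords, g0=np.eye(2)):
    """Map coordinates in R^3 to the group element g0 @ exp(sum_k coords[k] * BASIS[k])."""
    X = sum(c * E for c, E in zip(coords, BASIS))
    return g0 @ expm_skew_hermitian(X)

g = chart_inverse([0.3, -0.2, 0.5])
identity = chart_inverse([0.0, 0.0, 0.0])
```

Since the basis matrices are skew-Hermitian and traceless, every matrix produced this way is unitary with determinant one, and the zero coordinates map to the translating element `g0`.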
Equipped with the map (D.40), a chart for a neighborhood of is obtained by multiplying by (D.40) on the left, that is
(D.41)
Since is diffeomorphic to , and thus compact, an atlas of charts for is obtained by choosing a finite covering of .
In this work, to simplify notation, we refer to all charts in the atlas for by the notation , where we define implicitly that
(D.42)
whenever we refer to points in a chart for a neighborhood of .
D.5 G-invariant local parametrization of the data manifold
In this section, we construct a parametrization of in a local neighbourhood around the point , which takes the form of a product between and a certain -dimensional submanifold in , and derive the integration volume form over in terms of the resulting local coordinates. The parametrization we construct is -invariant in the sense that for every we have . As in the previous sections, for the rest of this section, we will assume w.l.o.g that .
To construct our parametrization, for each we consider the solution to the minimization problem
(D.43)
and the value . In other words, for each we solve for the element such that is the point on the orbit closest to .
Since is compact, a solution for (D.43) exists.
In Lemma 17 below, we prove that there exists a certain neighborhood of , such that the solution of (D.43) is also unique for each in this neighborhood.
Subsequently, we parameterize points by
(D.44)
where , and is the set of unique solutions of (D.43) for all .
The proof of Lemma 17 requires the notion of a -neighborhood of a manifold.
Definition 15.
Let be a smooth compact embedded submanifold. Given a , the -neighborhood of is defined as
(D.45)
Our proof also requires the following property of -neighborhoods.
Theorem 16.
There exists a such that any has a unique closest point in .
For a proof, see Theorem 6.24 and Proposition 6.25 in [30].
Lemma 17.
There exists a such that the problem (D.43) has a unique solution for any in a -neighborhood of . Furthermore, the -neighborhood is -invariant.
Proof.
By assumption, with probability one we have that for all . Since is a smooth manifold, the map , , is a smooth injective map onto the orbit . This implies that is a smooth -dimensional compact embedded submanifold in , diffeomorphic to .
By Theorem 16, there exists a such that for any (see Definition 15) there exists a unique solution to the problem
(D.46)
which shows that there exists a unique point closest to , which is given by , where is the unique solution to (D.43).
Moreover, we observe that
(D.47)
for all , which shows that any point on the orbit has a point at the minimal distance . Thus, using that (since ) Definition 15 implies that
(D.48)
for all , which shows that the -neighborhood is -invariant.
∎
Now, let us denote
(D.49)
for as in Lemma 17. The subset is an open subset of and thus a submanifold in . Furthermore, Lemma 17 also implies that the neighborhood is -invariant, and since is also -invariant then so is .
Let us further denote by
(D.50)
the set resulting from solving (D.43) for .
Using (D.49) and (D.50) we can write
(D.51)
We now show that is an embedded compact -dimensional submanifold in . We do this by deriving an explicit solution for (D.43) for all .
Note that (D.51) is diffeomorphic to the product space , which implies that is a -dimensional submanifold in .
Now, denoting , and differentiating the norm in (D.43) with respect to for each , the solution for (D.43) is given by that solves the set of equations
(D.52)
which are equivalent to
(D.53)
where , since we defined to be the closest point in to . In particular, we have , where is the unique solution to (D.43).
The expression on the l.h.s of the inner product in (D.53) is a vector tangent to at , since it is the derivative of the map (an explicit parametrization of the orbit ) at , for which .
Thus, by our discussion in Section D.3, and in particular (D.29)-(D.31), equation (D.53) simply implies that the closest point to on the orbit is such that is perpendicular to the tangent space of at .
We may now rewrite (D.53) as
(D.54)
where resides in the tangent space to at , given by the Lie algebra of . Now, the Lie algebra of is the space of all skew-Hermitian matrices, and by a theorem (see [21]), if is a Lie subgroup of , then is a -dimensional subspace of .
Using the fact that the diagonal entries of skew-Hermitian matrices are all purely imaginary, we have for any
(D.55)
where in passing to the second equality we switched the roles of and in the third sum and used that since is skew-Hermitian.
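A standard consequence of the skew-Hermitian structure, of the kind used in this step, is that the quadratic form of a skew-Hermitian matrix is purely imaginary, so its real part vanishes. The sketch below checks this numerically on a random matrix. It is an illustration under that assumption rather than a restatement of (D.55), and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Any matrix of the form B - B^* is skew-Hermitian.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B - B.conj().T

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Quadratic form x^* A x; np.vdot conjugates its first argument.
q = np.vdot(x, A @ x)

# The diagonal of A is purely imaginary, and so is the quadratic form.
diag_real = np.abs(np.diag(A).real).max()
quad_real = abs(q.real)
```

The off-diagonal terms cancel in conjugate pairs, and the diagonal terms contribute only imaginary parts.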
Plugging (D.5) into (D.54), we obtain
(D.56)
Then, substituting (D.56) into (D.53), we are left with
We now observe that is the intersection of an open neighborhood of with the subspace of defined by the linear constraints in (D.57). In the following lemma we show that is a -dimensional submanifold in .
Lemma 18.
The set in (D.58) is a -dimensional submanifold in .
Proof.
In the following, we use the formulation of real manifolds in presented in Section D.3. In particular, by using the map in (D.29), let us define
(D.59)
Clearly, the manifold is diffeomorphic to , and by (D.30) and (D.31), the map restricted to is a Riemannian isometry, preserving the metric tensor of .
Furthermore, defining
(D.60)
we have that , that is, the map restricted to is a bijection (and an isometry) onto .
Thus, it suffices to show that is a -dimensional submanifold in , which we now do.
The proof utilizes the implicit function theorem.
By a theorem (see Proposition 5.16 in [30]), there exists a neighborhood of in , a diffeomorphism onto its image , and such that can be parameterized as
(D.61)
where are coordinates for .
In other words, the neighborhood of is a level set of . Now, consider the set of equations
(D.62)
Since is a diffeomorphism, its differential has full rank for all points in . Hence, the matrix
(D.63)
has full rank for all .
Next, let , and consider the set of equations
(D.64)
By a direct computation, we get that
(D.65)
where is the image of the map applied to .
Now, we observe that by (D.54), we have that
(D.66)
hence, the vectors are the rows of the differential of the map at , which has full rank, since by assumption the map is a diffeomorphism. Thus, the vectors are linearly independent. Since the map is an isometry, we infer that the vectors are also linearly independent, whence we get that (D.65) has full rank.
Next, we observe that since is a diffeomorphism onto , by (D.66), the vectors reside in , the tangent space to at , and since is a Riemannian isometry of onto , we conclude that the vectors are tangent to . On the other hand, the vectors are all perpendicular to the neighborhood of , since it is defined as the level set , and therefore, they are perpendicular to all the vectors tangent to at . Hence, the matrix
(D.67)
has full rank at . Thus, there exists a subset of columns of (D.67) that form a matrix , which has full rank. In particular, we have that .
Lastly, by (D.57), the point is a solution of (D.5), and by construction, also a solution of (D.5), and thus a solution of (D.67).
Hence, by the implicit function theorem, there exists an open subset , and open subsets and such that , and coordinates , and smooth functions from onto such that
By (D.68), we conclude that is a -dimensional smooth submanifold in of (D.59).
Now, we can redefine in (D.58) as
(D.71)
Moreover, we can take to be closed in and small enough so that , which guarantees that the problem (D.43) has a unique solution for each in . Furthermore, since and are isometric to and , respectively, we conclude that is a -dimensional compact submanifold in .
∎
Next, we show how to integrate over using our -invariant parametrization.
Let denote some coordinate chart on in (D.58), and let be the coordinate chart on in (D.39).
The integral of a smooth function over is given by the change of variables (see [46])
(D.72)
where we denote
(D.73)
and
(D.74)
is the volume form at , and is the metric tensor on given by
(D.75)
and is the Jacobian change of variables matrix, given explicitly by
(D.76)
In the following section we prove Lemma 14, which requires a careful asymptotic approximation of the second moment in (D.25) with respect to the uniform distribution over . The proof employs the relationship between the Haar measure on , and a certain measure induced by our -invariant parametrization on orbits of the form , which we now define.
First, we note that the diffeomorphism admits an inverse map given by
(D.77)
which induces a topology on given by
(D.78)
Next, consider the function over defined by
(D.79)
where
(D.80)
and was defined in (D.76). The following lemma asserts that is a measure over , and characterizes its relationship to the Haar measure on .
Lemma 19.
For every , the function is a measure over with the topology . Furthermore, define the pushforward of by the map in (D.77), as the function over the Borel -algebra of given by
(D.81)
for every Borel subset . Then, with probability one we have that is a measure over . Furthermore, there exists a constant such that
(D.82)
where is the Haar measure over .
Proof.
To see that is a measure, first we note that since is non-negative then so is . Furthermore, is bounded since for any we have that
(D.83)
where the last inequality is due to the fact that is compact.
Thus, it only remains to show that is countably additive over . Indeed, the map being a homeomorphism preserves the topology of (see [32]), and in particular, it holds that for any countable family of disjoint open sets we have
(D.84)
In other words, the map preserves disjoint unions.
Thus, by (D.79) and (D.84) we have
(D.85)
We conclude that is a measure over , the latter having the topology of (induced by ).
Next, we show that is left invariant under the action of , that is, for any measurable subset , we have
Now, the map
induces a measure on via pushforward, defined explicitly by
(D.90)
for every Borel measurable subset .
Intuitively, the function measures the volume of a subset by first mapping into the orbit , and then measuring the volume of the image .
By (D.86), for a fixed and any open subset we have
(D.91)
which shows that the measure is left-invariant. By Haar’s theorem for compact groups there exists, up to multiplication by a positive scalar, a unique left invariant measure over . It follows that for every there exists a positive scalar such that
(D.92)
which in turn implies that is related to the Haar measure by
(D.93)
In particular, plugging into (D.93), and using (3.5) we get
(D.94)
which shows that is the volume of the orbit . By assumption, with probability one we have that for all , and thus the map is a diffeomorphism of onto . Hence, we conclude that .
∎
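The relation between the orbit measure and the Haar measure in Lemma 19 can be illustrated for the circle group of scalar phases acting on complex vectors. This choice is an assumption made for the example, not the general group of the lemma. In this case the normalized Haar measure is the angle measure divided by 2π, the orbit of a vector is a circle of length 2π times its norm, and the length of an arc of the orbit equals the orbit volume times the Haar measure of the corresponding set of angles. The function names below are hypothetical.

```python
import numpy as np

def haar_mean(f, m=4096):
    """Integrate f over U(1) against the normalized Haar measure."""
    theta = 2.0 * np.pi * np.arange(m) / m
    return np.mean(f(np.exp(1j * theta)))

def f(z):
    """A smooth test function on the circle group."""
    return (z ** 2).real + 3.0 * z.imag + 1.0

# Left invariance: translating the argument by a fixed group element g
# does not change the integral.
g = np.exp(1j * 0.9)
lhs = haar_mean(lambda h: f(g * h))
rhs = haar_mean(f)

def arc_length(x, a, b, m=20001):
    """Polygonal length of the orbit arc {exp(i*theta) x : a <= theta <= b}."""
    theta = np.linspace(a, b, m)
    pts = np.exp(1j * theta)[:, None] * x[None, :]
    return np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

rng = np.random.default_rng(3)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)

vol_orbit = arc_length(x, 0.0, 2.0 * np.pi)  # close to 2*pi*||x||
a, b = 0.3, 1.1
mu_arc = arc_length(x, a, b)                 # pushed-forward measure of the arc
haar_arc = (b - a) / (2.0 * np.pi)           # Haar measure of the angle set
```

Here `mu_arc` agrees with `vol_orbit * haar_arc` up to discretization error, mirroring the statement that the induced measure is a constant multiple of the Haar measure, with the constant equal to the orbit volume.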
In this section, we evaluate the second order moment , which appears in the evaluation of (D.21). The evaluation of the remaining second order moments in is done in a very similar fashion.
Now, recall that in Section D.5 we constructed a -invariant parametrization of a certain neighborhood of the data point , such that for all .
Thus, by (D.3) we have
(D.95)
where the second equality stems from the fact that , and the integrand of is a Gaussian of width centered at .
We point out that the exponentially small error term on the r.h.s of (D.95) is negligible with respect to the polynomial asymptotic error in (D.25) that we are about to derive, and thus will be dropped in all subsequent analysis.
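To see why such a truncation error is negligible, consider a one-dimensional kernel of the form exp(-t^2/ε). This form and the radius `delta = 0.5` are assumptions of the sketch. The fraction of the kernel's mass lying outside a fixed radius is given by the complementary error function and decays faster than any power of ε.

```python
import math

def gaussian_tail_fraction(delta, eps):
    """Fraction of the integral of exp(-t^2/eps) over R lying in |t| > delta.

    Substituting t = sqrt(eps) * s reduces the ratio to erfc(delta / sqrt(eps)).
    """
    return math.erfc(delta / math.sqrt(eps))

delta = 0.5
tails = {eps: gaussian_tail_fraction(delta, eps) for eps in (1e-1, 1e-2, 1e-3)}
```

For a fixed radius, the tail fraction drops by many orders of magnitude each time ε is reduced by a factor of ten, whereas a polynomial error term shrinks only by a fixed power of ten.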
Furthermore, for a fixed there exist and such that
(D.96)
where in the second equality we used the change of variables , and that the Haar measure on a compact group is right invariant. Therefore, continuing from (D.95) and using (D.72) we can write
where we used (D.90) and (D.93) in the first equality, the definition of the map in (D.77) in the second equality, the change of variables theorem for pushforward measures (see Theorem 3.6.1. in [6]) for in the third one, and Theorem 13 in the last equality, where we note that is the Riemannian volume element of the -dimensional manifold .
For the second term in (D.100), using the same change of variables as in (D.6) we have
(D.102)
where we used that by (D.99).
Furthermore, we have
(D.103)
(D.104)
where we have used the multivariate version of the formula for the second derivative of a product of functions (see Lemma 3.3 in [36]), and that by (D.99)
(D.105)
where we used the fact that in (D.105) is a coordinate function on , that is, the function returns the coordinates in of , and thus the vectors reside in the tangent space to at , and that, by construction, the vector is perpendicular to (see (D.53)).
Continuing, we now define the function
(D.106)
for all , where the expression is the vector valued function whose entries are the Laplacians of each coordinate function of the parametrization of via , evaluated at .
By the Cauchy-Schwarz inequality combined with the compactness of we get
(D.107)
which leads to
(D.108)
Now, substituting (D.6) and (D.108) into (D.100) we have that
(D.109)
Thus, we get
(D.110)
after suppressing higher order terms. Plugging (D.110) into (D.97), we get
(D.111)
Applying Theorem 13 to each term inside the integral (D.111), we have that
(D.112)
and that
(D.113)
where we used that by (D.107) we have that , and using that vanishes at we get that
First, let us compute the limiting operator resulting from assuming a non-uniform sampling distribution in the setting of Theorem 11, by repeating the analysis of the bias error at the beginning of D.1 under this assumption. Fixing an , we compute the limit of (D.1) as . By (D.2)-(D.5), we have that the limit of in (D.2) evaluates as
(E.1)
Making the change of variables , and using (D.6)-(D.8) we have that
(E.2)
where we defined
(E.3)
Similarly, we get that the limit of in (D.10) as is given by
(E.4)
By (E) and (E), we obtain that the limit of (D.1) when is given by
This shows that when the sampling density is non-uniform, the normalized -GL converges to an operator that is different from the Laplace-Beltrami operator , namely, the Fokker-Planck operator on which depends on the density .
Nevertheless, following [38] and [28], we now show that we can retrieve by normalizing the kernel function in (4.9), as follows. Let us define for all
(E.7)
and for all
(E.8)
Then, we define the density-normalized -invariant graph Laplacian as
(E.9)
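For orientation, the analogous density normalization for the standard (non-invariant) graph Laplacian, in the spirit of [28], can be sketched as follows. This is a minimal illustration under the assumption of a Gaussian kernel exp(-r^2/ε) and does not implement the -invariant construction. The function name, the bandwidth value, and the test data are hypothetical.

```python
import numpy as np

def density_normalized_laplacian(X, eps):
    """Unscaled random-walk Laplacian I - P built from a density-corrected kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)              # Gaussian affinities
    d = W.sum(axis=1)                  # proportional to a kernel density estimate
    W_tilde = W / np.outer(d, d)       # divide out the density on both sides
    d_tilde = W_tilde.sum(axis=1)
    P = W_tilde / d_tilde[:, None]     # row-stochastic transition matrix
    return np.eye(X.shape[0]) - P

# Non-uniform sample of the unit circle: angles cluster near zero.
rng = np.random.default_rng(4)
theta = 2.0 * np.pi * rng.uniform(size=300) ** 2
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

L = density_normalized_laplacian(X, eps=0.05)
row_sums = L.sum(axis=1)               # zero: constants lie in the null space
```

Dividing the kernel by the estimated density at both endpoints before the row normalization is what removes the density-dependent drift term in the limit.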
By repeating the computations in equations in [38], we obtain that
Appendix F Eigendecomposition of the normalized -GL
We now restate Theorem 10 for the operator in (4.8), the normalized version of the -GL. The proof is obtained by repeating that of Theorem 10, with the matrices replaced by the matrix sequence
(F.1)
of (4.15), with the required changes made in equations (C)-(C.12), and omitting the proof of orthogonality, which does not hold in this case.
Theorem 20.
For each , let be the block-diagonal matrix whose -th block of size on the diagonal is given by the product of the scalar in (4.5) with the identity matrix.
Then, the normalized -invariant graph Laplacian admits the following:
1.
A sequence of non-negative eigenvalues , where is the -th eigenvalue of the matrix .
2.
A sequence of eigenfunctions, which are complete in and are given by
(F.2)
where is the eigenvector of which corresponds to its eigenvalue . For each and , the eigenvectors correspond to the eigenvalue of the normalized -invariant graph Laplacian.
References
[1]
S. Axler, P. Bourdon, and W. Ramey.
Harmonic Function Theory.
Springer-Verlag New York, Inc., 2001.
[2]
M. Belkin and P. Niyogi.
Laplacian eigenmaps for dimensionality reduction and data
representation.
Neural computation, 15(6):1373–1396, 2003.
[3]
M. Belkin and P. Niyogi.
Laplacian eigenmaps for dimensionality reduction and data
representation.
Neural computation, 15(6):1373–1396, 2003.
[4]
M. Belkin and P. Niyogi.
Advances in Neural Information Processing Systems 19:
Proceedings of the 2006 Conference.
MIT Press, 2007.
[5]
M. Belkin, P. Niyogi, and V. Sindhwani.
Manifold regularization: A geometric framework for learning from
labeled and unlabeled examples.
Journal of Machine Learning Research, 7(6):2399–2434, 2006.
[7]
D. Bump.
Lie Groups.
Springer-Verlag New York, Inc., 2004.
[8]
L. Chen.
Curse of Dimensionality, pages 545–546.
Springer US, Boston, MA, 2009.
[9]
X. Cheng and N. Wu.
Eigen-convergence of gaussian kernelized graph Laplacian by
manifold heat interpolation.
Applied and Computational Harmonic Analysis, 61:132–190, 2022.
[10]
G.S. Chirikjian.
Stochastic Models, Information Theory, and Lie Groups, Volume
2.
Birkhäuser, Birkhäuser Boston, 2010.
[11]
G.S Chirikjian and A.B Kyatkin.
Engineering applications of noncommutative harmonic analysis
with emphasis on rotation and motion groups.
CRC Press LLC, Boca Raton, Florida, 2001.
[12]
F. R. K. Chung.
Spectral Graph Theory.
American Mathematical Society, 1997.
[13]
Taco Cohen and Max Welling.
Group equivariant convolutional networks.
ArXiv, abs/1602.07576, 2016.
[14]
S. Dieleman, J. De Fauw, and K. Kavukcuoglu.
Exploiting cyclic symmetry in convolutional neural networks.
In Proceedings of The 33rd International Conference on Machine
Learning, volume 48 of Proceedings of Machine Learning Research, pages
1889–1898, New York, New York, USA, 20–22 Jun 2016. PMLR.
[15]
Andreas Doerr.
Cryo-electron tomography.
Nat Methods, 14(7):664–665, 2017.
[16]
M. Eller and M. Fornasier.
Rotation invariance in exemplar-based image inpainting.
Variational Methods: In Imaging and Geometric Control, 18:108,
2017.
[17]
Y. Fan, T. Gao, and Z. J. Zhao.
Unsupervised co-learning on g-manifolds across irreducible
representations, 2019.
[18]
B. Fasel and D. Gatica-Perez.
Rotation-invariant neoperceptron.
In 18th International Conference on Pattern Recognition
(ICPR’06), volume 3, pages 336–339, 2006.
[19]
G.B. Folland.
A Course in Abstract Harmonic Analysis.
CRC Press, Boca Raton, Florida, 2001.
[20]
Joachim Frank.
Three-Dimensional Electron Microscopy of Macromolecular
Assemblies: Visualization of Biological Molecules in Their Native State.
Oxford, 2006.
[21]
J. Gallier and J. Quaintance.
Differential Geometry and Lie Groups: A Computational
Perspective.
Number 12 in Geometry and Computing. Springer, 2020.
[22]
C. Godsil and G.F. Royle.
Algebraic Graph Theory.
Graduate Texts in Mathematics. Springer, 2001.
[23]
B. Hall.
Lie Groups, Lie Algebras, and Representations: An Elementary
Introduction.
Springer International Publishing Switzerland, 2015.
[24]
P. Hoyos and J. Kileel.
Diffusion maps for group-invariant manifolds.
Arxiv:2303.16169, 2023.
[25]
Ryan K. Hylton and Matthew T. Swulius.
Challenges and triumphs in cryo-electron tomography.
iScience, 24(9):102959, 2021.
[26]
Z. Ji, Q. Chen, Q.-S. Sun, and D.-S. Xia.
A moment-based nonlocal-means algorithm for image denoising.
Information Processing Letters, 109(23-24):1238–1244, 2009.
[27]
J. Kileel, A. Moscovich, N. Zelesko, and A. Singer.
Manifold learning with arbitrary norms.
Journal of Fourier Analysis and Applications, 27(5), 2021.
[28]
S. Lafon and R.R. Coifman.
Diffusion maps.
Applied and Computational Harmonic Analysis, 21:5–30, 2006.
[29]
B. Landa and Y. Shkolnisky.
Steerable principal components for space-frequency localized images.
SIAM Journal on Imaging Sciences, 10(2):508–534, 2017.
[30]
J.M Lee.
Introduction to smooth manifolds, Second Edition.
Number 218 in Graduate Texts in Mathematics. Springer, 2013.
[31]
D. Marcos, M. Volpi, and D. Tuia.
Learning rotation invariant convolutional filters for texture
classification.
2016 23rd International Conference on Pattern Recognition
(ICPR), pages 2012–2017, 2016.
[32]
J. Munkres.
Topology.
Pearson Modern Classics for Advanced Mathematics. Pearson Education,
Inc., 2000.
[33]
D. Potts, J. Prestin, and J. Vollrath.
A fast algorithm for nonequispaced fourier transforms on the rotation
group.
Numerical Algorithms, 52:355–384, 2009.
[34]
D. Potts, G. Steidl, and M. Tasche.
Fast algorithms for discrete polynomial transforms.
Mathematics of computation, 67(224):1577–1590, 1998.
[35]
E. Rosen, X. Cheng, and Y. Shkolnisky.
G-invariant diffusion maps.
ArXiv:2306.07350, 2023.
[36]
S. Rosenberg.
The Laplacian on a Riemannian manifold: an introduction to
analysis on manifolds.
Cambridge University Press, 1997.
[37]
N. Sharon, J. Kileel, Y. Khoo, B. Landa, and A. Singer.
Method of moments for 3d single particle ab initio modeling with
non-uniform distribution of viewing angles.
Inverse Problems, 36(4), 2020.
[38]
Y. Shkolnisky and B. Landa.
The steerable graph laplacian and its application to filtering image
datasets.
SIAM Journal on Imaging Sciences, 11(4):2254–2304, 2018.
[39]
P.Y. Simard, D. Steinkraus, and J.C. Platt.
Best practices for convolutional neural networks applied to visual
document analysis.
In Seventh International Conference on Document Analysis and
Recognition, 2003. Proceedings., pages 958–963, 2003.
[40]
A. Singer.
From graph to manifold laplacian: The convergence rate.
Applied and Computational Harmonic Analysis, 21(1):128–134,
2006.
[41]
A. Singer and H.-T. Wu.
Vector diffusion maps and the connection Laplacian.
Communications on pure and applied mathematics,
65(8):1067–1144, 2012.
[42]
A. Singer, Z. Zhao, Y. Shkolnisky, and R. Hadani.
Viewing angle classification of cryo-electron microscopy images using
eigenvectors.
SIAM Journal on Imaging Sciences, 4(2):723–759, 2011.
[43]
R. Talmon, I. Cohen, S. Gannot, and R.R. Coifman.
Diffusion maps for signal processing: A deeper look at
manifold-learning techniques based on kernels and graphs.
IEEE Signal Processing Magazine, 30:75–86, 2013.
[44]
K. Tapp.
Matrix Groups for Undergraduates, volume 29 of Student
Mathematical Library.
American Mathematical Society, 2005.
[45]
N. Thomas, T. E. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley.
Tensor field networks: Rotation- and translation-equivariant neural
networks for 3D point clouds.
CoRR, abs/1802.08219, 2018.
[46]
L. Tu.
Differential Geometry, Connections, Curvature, and Characteristic
Classes.
Graduate Texts in Mathematics. Springer, 2017.
[47]
N. Vilenkin.
Special Functions and the Theory of Group Representations.
The American Mathematical Society, 1968.
[48]
M. Weiler, M. Geiger, M. Welling, W. Boomsma, and T. Cohen.
3D steerable CNNs: Learning rotationally equivariant features in
volumetric data.
In Proceedings of the 32nd International Conference on Neural
Information Processing Systems, NIPS’18, pages 10402–10413, Red Hook, NY,
USA, 2018. Curran Associates Inc.
[49]
D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow.
Harmonic networks: Deep translation and rotation equivariance.
2017 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pages 7168–7177, 2017.
[50]
X. Cheng and H.-T. Wu.
Convergence of graph laplacian with knn self-tuned kernels.
ArXiv:2011.01479, 2020.
[51]
S. Zhang, A. Moscovich, and A. Singer.
Product manifold learning.
In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of
The 24th International Conference on Artificial Intelligence and Statistics,
volume 130 of Proceedings of Machine Learning Research, pages
3241–3249. PMLR, 2021.
[52]
Z. Zhao, Y. Shkolnisky, and A. Singer.
Fast steerable principal component analysis.
IEEE Transactions on Computational Imaging, 2(1):1–12, 2016.
[53]
Z. Zhao and A. Singer.
Rotationally invariant image representation for viewing direction
classification in cryo-EM.
Journal of structural biology, 186(1):153–166, 2014.
[54]
S. Zimmer, S. Didas, and J. Weickert.
A rotationally invariant block matching strategy improving image
denoising with non-local means.
In Proc. 2008 International Workshop on Local and Non-Local
Approximation in Image Processing, pages 135–142, 2008.