License: arXiv.org perpetual non-exclusive license
arXiv:2403.03670v1 [cs.LG] 06 Mar 2024

CDC: A Simple Framework for Complex Data Clustering

Z. Kang, X. Xie, B. Li, and E. Pan are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.

Zhao Kang, Xuanting Xie, Bingheng Li and Erlin Pan
Abstract

In today’s data-driven digital era, both the volume and the complexity of collected data (e.g., multi-view, non-Euclidean, and multi-relational data) are growing exponentially or even faster. Clustering, which extracts valid knowledge from data in an unsupervised manner, is extremely useful in practice. However, existing methods are developed independently to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first utilize graph filtering to fuse geometric structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We illustrate the cluster-ability of the proposed method theoretically and experimentally. In particular, we deploy CDC on graph data with 111M nodes.

Index Terms:
Anchor graph, clustering, large-scale data, topology structure, multiview learning

I Introduction

Clustering is a fundamental technique for unsupervised learning that groups data points into different clusters without labels. It is driven by diverse applications in scientific research and industrial development, which give rise to complex data types [1], such as multi-view, non-Euclidean, and multi-relational data. Specifically, in many real-world applications, data are often gathered from multiple sources or different extractors and therefore exhibit different features; such data are known as multi-view data [2]. Although each view may be noisy and incomplete, important factors, such as geometry and semantics, tend to be shared across all views. Different views also provide complementary information, so it is paramount for multi-view clustering (MVC) methods to integrate diverse features. For example, [3] learns a consensus graph from multiple views with a rank constraint on the corresponding Laplacian matrix for clustering. [4] employs intra-view collaborative learning to harvest complementary and consistent information among different views.

Along with the development of sophisticated data collection and storage techniques, the size of data is increasing explosively. To handle large-scale data efficiently, several MVC methods with linear complexity have been proposed. [5] learns an anchor graph for each view and concatenates them for multi-view subspace clustering. [6], [7], and [8] construct bipartite graphs or learn representations based on anchors. [9] effectively integrates the collaborative information from multiple views by jointly learning discrete representations and binary cluster structures. Despite this progress, these methods often perform unstably across datasets because of the randomness in anchor selection.

Recently, non-Euclidean graph data have become pervasive since they contain not only node attributes but also topological structure information, which characterizes relations among data points [10]. Social network users, for example, have their own profiles and social relationships reflected by the topological graph. Traditional clustering methods exploit either attributes or graph structure and therefore cannot achieve the best performance [11]. The Graph Neural Network (GNN) is a powerful tool for simultaneously exploring node attributes and structure [12]. Based on it, several graph clustering methods have been designed [13, 14, 15]. In some applications, the graph could exhibit multi-view attributes or multiple relations. To cluster multi-view graphs, [16] learns a representation for each view and forces them to be close. To handle multi-relational graphs, [17] finds the most informative graph to recover multiple graphs.

Despite the remarkable success of GNN-based methods in graph clustering, one crucial issue remains: scalability, which prevents their deployment on web-scale graph data. For example, ogbn-papers100M contains more than 100M nodes, which cannot be processed by most graph clustering methods. Although [18, 19] have made advances in scalable graph clustering by applying a light-weight encoder and contrastive learning, their performance depends heavily on graph augmentation. Therefore, scalability for graph clustering is still under-explored, and more dedicated efforts are urgently needed.

Specialized methods have been developed to address one of the above problems at a time, but a unified model for complex data clustering that generalizes well while remaining scalable is still lacking. To fill this gap, we propose a simple yet effective framework for Complex Data Clustering (CDC). We first use graph filtering to fuse raw features and topology information, which produces clustering-friendly representations and provides a flexible way to handle different types of data. Rather than constructing a complete graph from all data points, CDC learns anchor graphs, resulting in linear computational complexity. In particular, we generate anchors adaptively with a similarity-preserving regularizer to alleviate the randomness of anchor selection. To summarize, we make the following contributions:

  • We propose a simple clustering framework for complex data, e.g., single-view and multi-view, graph and non-graph, small-scale and large-scale data. Our method has linear time and space complexity.

  • (Section III) We are the first to propose a similarity-preserving regularizer to automatically learn high-quality anchors from data.

  • (Section V) CDC achieves impressive performance on 14 complex datasets. Most notably, it scales to a graph with more than 111M nodes.

II Related Work

II-A Multi-view Clustering

MVC methods generally focus on enhancing performance by utilizing the global consensus and complementary information among multiple views. [20] mines view-shared information by adding a sample-level contrastive module to align angles between representations. [21] uses the Hilbert-Schmidt Independence Criterion (HSIC) to explore the underlying cluster structure shared by multiple views. [22] generates an automatic partitioning of multi-view data via a multi-objective clustering framework. [23] achieves cross-view consensus by projecting data points into a space where geometric and cluster assignments are consistent.
Different from shallow methods, deep MVC methods learn good representations via designed neural networks. [24] applies an attention encoder and multi-view mutual information maximization to capture the complementary information, consistency information, and internal relations of each view. Recently, some methods have incorporated contrastive learning mechanisms to obtain clustering-favorable representations. For example, [25] performs instance-level and category-level contrastive learning to improve cross-view consistency. However, these methods are not scalable to large-scale data. To reduce the complexity, [5, 26] construct bipartite graphs between cluster centroids of $K$-means and raw data points, where the anchors are chosen randomly and fixed for subsequent learning. [27] leverages features, anchors, and neighbors jointly to construct bipartite graphs. [28] captures the view-specific and consistent information by constructing a consensus graph from view-independent anchors. Although they all have linear complexity, their performance could be sub-optimal since the pre-defined anchors are not updated according to the downstream task. In contrast, we generate high-quality anchors adaptively, which is efficient and stable on complex data.

II-B Graph Clustering

Graph clustering methods aim to group nodes based on node attributes and topological structure information. Some representation learning methods, such as Node2vec [29] and GAE [30], can be used to learn embeddings for traditional clustering techniques. However, the obtained embeddings might not be appropriate for clustering because they are not specifically learned for cluster-ability. MVGRL [31], BGRL [32], and GRACE [33] obtain classification-favorable representations via contrastive graph learning, but are not applicable to large-scale graphs due to the computational cost of data augmentation. Although MCGC [11] is augmentation-free by regarding k-nearest neighbors as positive pairs, the algorithm has quadratic complexity. Other deep graph clustering methods, like SDCN [34], DFCN [35], and DCRN [36], achieve promising performance by training MLPs and GNNs jointly on small-scale and medium-scale graphs. MvAGC [37] has low complexity, but is not efficient owing to its anchor sampling strategy. Hence, these graph clustering methods cannot effectively and efficiently handle large-scale graphs. Though S$^{3}$GC [18] obtains promising results on extra-large-scale graphs via a random walk-based sampler and a light-weight encoder, its training is computationally expensive. Our method can handle graph clustering in linear time with promising performance.

III Methodology

Notation

Define the generic data as $\mathcal{G}=\{\mathcal{V},E_{1},\ldots,E_{V_{1}},X^{1},\ldots,X^{V_{2}}\}$, where $\mathcal{V}$ represents the set of $N$ nodes, $e_{ij}\in E_{v}$ denotes the relationship between node $i$ and node $j$ in the $v$-th view. $V_{1}\geq 0$ and $V_{2}>0$ are the number of relational graphs and attributes, and the data is non-graph when initial $V_{1}=0$. $X^{v}=\{x_{1}^{v},\ldots,x_{N}^{v}\}^{\top}\in\mathbb{R}^{N\times d_{v}}$ is the feature matrix, $d_{v}$ is the dimension of features.
Adjacency matrices $\{\widetilde{A}^{v}\}_{v=1}^{V_{1}}\in\mathbb{R}^{N\times N}$ characterize the initial graph structure. For non-graph data, we construct adjacency matrices in each view via the 5-nearest neighbor method. There are $V$ views after graph filtering for each dataset, where $V=V_{1}\times V_{2}$ for graph data and $V=V_{1}=V_{2}$ for non-graph data. $\{D^{v}\}_{v=1}^{V_{1}}$ represent the degree matrices in various views.
The normalized adjacency matrix is $A^{v}=(D^{v})^{-\frac{1}{2}}(\widetilde{A}^{v}+I)(D^{v})^{-\frac{1}{2}}$ and the corresponding graph Laplacian is $L^{v}=I-A^{v}$.
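
The notation above can be made concrete with a minimal sketch. It assumes that $D^{v}$ holds the degrees of $\widetilde{A}^{v}+I$ (the text does not say whether self-loops are counted) and uses a brute-force nearest-neighbor search that is only suitable for small $N$. The function names `knn_adjacency`, `normalized_adjacency`, and `laplacian` are illustrative and not taken from the paper.

```python
import numpy as np
import scipy.sparse as sp

def knn_adjacency(X, k=5):
    """Symmetric k-nearest-neighbor adjacency (dense distances, small N only)."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest points
    rows = np.repeat(np.arange(n), k)
    A = sp.csr_matrix((np.ones(rows.size), (rows, nbrs.ravel())), shape=(n, n))
    return ((A + A.T) > 0).astype(float)      # keep an edge if either side picked it

def normalized_adjacency(A_tilde):
    """A = D^{-1/2} (A_tilde + I) D^{-1/2}; D = degrees of A_tilde + I (assumption)."""
    n = A_tilde.shape[0]
    A_hat = sp.csr_matrix(A_tilde) + sp.identity(n, format="csr")
    deg = np.asarray(A_hat.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    return (D_inv_sqrt @ A_hat @ D_inv_sqrt).tocsr()

def laplacian(A):
    """Graph Laplacian L = I - A for a normalized adjacency A."""
    return sp.identity(A.shape[0], format="csr") - A
```

For non-graph data, `normalized_adjacency(knn_adjacency(X, k=5))` would give one candidate $A^{v}$ per view under these assumptions.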

III-A Graph Filtering

Filtered features are more clustering-favorable [38], and we apply graph filtering to remove undesirable high-frequency noise while preserving the graph’s geometric features. Similar to [11], the smoothed representation $H$ is obtained by solving the following optimization problem:

$$\min_{H}\|H-X\|_{F}^{2}+\frac{1}{2}\operatorname{Tr}\left(H^{\top}LH\right). \tag{1}$$

We keep the first-order Taylor series of $H$ from Eq. (1) and apply $k$-order filtering, which yields:

$$H=\left(I-\frac{1}{2}L\right)^{k}X, \tag{2}$$

where $k$ is a non-negative integer that controls the depth of feature aggregation and the smoothness of the representation. In addition to learning smooth features, graph filtering is also used to unify different types of data into our framework.
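
To see where Eq. (2) comes from, set the gradient of Eq. (1) to zero: for a symmetric $L$ this gives $2(H-X)+LH=0$, i.e., $H=(I+\frac{1}{2}L)^{-1}X\approx(I-\frac{1}{2}L)X$ to first order, and applying the filter $k$ times gives Eq. (2). The sketch below is one possible way to compute it with sparse products; the function name `graph_filter` and the choice to pass the normalized adjacency $A$ (so that $I-\frac{1}{2}L=\frac{1}{2}(I+A)$) are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp

def graph_filter(A, X, k=2):
    """Return H = (I - L/2)^k X with L = I - A, using k sparse-dense products."""
    n = A.shape[0]
    # I - L/2 = I - (I - A)/2 = (I + A)/2, which stays as sparse as A.
    F = 0.5 * (sp.identity(n, format="csr") + sp.csr_matrix(A))
    H = np.asarray(X, dtype=float)
    for _ in range(k):
        H = F @ H   # each step costs on the order of nnz(A) * d operations
    return H
```

The matrix power is never formed explicitly, so the cost grows with the number of edges rather than with $N^{2}$.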

III-B Anchor Graph Learning

We use the idea of data self-expression to capture the relations among data points, i.e., each sample can be represented as a linear combination of other data points. The combination coefficient matrix can be regarded as a reconstructed graph [11]. To reduce the computational complexity, $m$ representative samples $B\in\mathbb{R}^{m\times d}$, called anchors, are selected to construct an anchor graph $Z\in\mathbb{R}^{m\times N}$ [39]. However, the performance of this approach is unstable since it selects anchors in a probabilistic way. Moreover, once the anchors are chosen, they are not updated, which could lead to sub-optimal performance. To remove the uncertainty in anchor selection, we propose to learn anchors from data, i.e., the anchors $B$ are generated adaptively. To guarantee the quality of anchors, we enforce that the similarity between $B$ and $H$ is preserved, i.e., $BH^{\top}=Z$. Then we formalize the graph learning problem as:

$$\min_{Z,B}\left\|H^{\top}-B^{\top}Z\right\|_{F}^{2}+\beta\left\|Z\right\|_{F}^{2},\quad\text{s.t. } BH^{\top}=Z, \tag{3}$$

where $\beta$ is a balance parameter. To make it easy to solve, we relax the above problem to:

$$\min_{Z,B}\left\|H^{\top}-B^{\top}Z\right\|_{F}^{2}+\beta\left\|Z\right\|_{F}^{2}+\alpha\left\|BH^{\top}-Z\right\|_{F}^{2}. \tag{4}$$

Eq. (4) has two advantages over other anchor-based methods: efficiency and adaptive generation of high-quality anchors. First, existing methods often repeat the procedure many times to reduce the uncertainty in results, which is time-consuming and unsuitable for large-scale data. Second, existing methods perform anchor selection and graph learning in two separate steps. By contrast, we follow a joint learning approach, in which anchors and the anchor graph mutually boost each other.
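
As a small illustration of the three terms in Eq. (4), the following sketch evaluates the relaxed objective for given $H$, $B$, and $Z$. The function name `anchor_objective` is hypothetical; only the formula itself comes from the text.

```python
import numpy as np

def anchor_objective(H, B, Z, alpha, beta):
    """Value of Eq. (4) for H (N, d), anchors B (m, d), anchor graph Z (m, N)."""
    recon = np.linalg.norm(H.T - B.T @ Z, "fro") ** 2       # self-expression error
    ridge = beta * np.linalg.norm(Z, "fro") ** 2             # penalty on Z
    sim = alpha * np.linalg.norm(B @ H.T - Z, "fro") ** 2    # similarity-preserving term
    return recon + ridge + sim
```

Tracking this value across iterations is a simple way to check that an implementation of the alternating updates behaves as expected.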
For a multi-view scenario, each view could contribute differently. Therefore, we introduce learnable weights $\{\lambda_{v}\}_{v=1}^{V}$ and obtain a consensus anchor graph $Z$ by solving the following model:

$$\begin{aligned}\min_{Z,\{B^{v}\}_{v=1}^{V},\{\lambda_{v}\}_{v=1}^{V}}\ &\sum_{v=1}^{V}\lambda_{v}^{2}\left(\left\|H^{v\top}-B^{v\top}Z\right\|_{F}^{2}+\alpha\left\|B^{v}H^{v\top}-Z\right\|_{F}^{2}\right)+\beta\left\|Z\right\|_{F}^{2},\\ \text{s.t. }&\sum_{v=1}^{V}\lambda_{v}=1,\ \lambda_{v}>0.\end{aligned} \tag{5}$$

Note that we learn anchors for each view to capture distinctive information. After constructing the anchor graph $Z$, $Z^{\top}\Delta Z$ can be used as input to obtain the spectral embedding for clustering in traditional anchor-based methods, where $\Delta$ is a diagonal matrix with $\Delta_{ii}=\sum_{j=1}^{N}Z_{ji}$. According to [40], the right singular vectors of $Z$ are the same as the eigenvectors of $Z^{\top}\Delta Z$. Consequently, we perform singular value decomposition (SVD) on $Z$ and then run $K$-means on the right singular vectors to produce the final result, which needs $\mathcal{O}(m^{2}N)$ instead of $\mathcal{O}(N^{3})$.
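
The final clustering step can be sketched as follows. The economy SVD of the $m\times N$ matrix $Z$ never forms an $N\times N$ matrix, which is where the $\mathcal{O}(m^{2}N)$ cost comes from. The helper names are hypothetical, and the small `kmeans` routine with farthest-first initialization is a deterministic stand-in chosen for this example, not the $K$-means variant used by the authors.

```python
import numpy as np

def anchor_spectral_embedding(Z, n_clusters):
    """Leading right singular vectors of Z (m, N), one embedding row per data point."""
    # With full_matrices=False, Vt has shape (min(m, N), N).
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_clusters].T

def kmeans(E, n_clusters, n_iter=50):
    """Plain Lloyd iterations with farthest-first initialization (a simplification)."""
    centers = [E[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d)])      # next center: point farthest from all chosen
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):          # keep the old center if a cluster empties
                centers[c] = E[labels == c].mean(axis=0)
    return labels
```

A call such as `kmeans(anchor_spectral_embedding(Z, c), c)` would then return one cluster label per data point.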

III-C Optimization

To solve Eq. (5), we use an alternating optimization strategy.

III-C1 Initialization of $B^{v}$

We could optionally initialize $B^{v}$ with the cluster centers obtained by dividing $H^{v}$ into $m$ partitions with the $K$-means algorithm.
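
One possible realization of this optional initialization uses SciPy's `kmeans2`; the choice of that routine and of the `"++"` seeding is an assumption made for this sketch, and `init_anchors` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def init_anchors(H, m):
    """Initial anchors B (m, d): centroids of an m-way K-means partition of H (N, d)."""
    centroids, _ = kmeans2(np.asarray(H, dtype=float), m, minit="++")
    return centroids
```

Because the anchors are refined later by the alternating updates, the quality of this starting point matters less than it would for methods with fixed anchors.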

III-C2 Update $Z$

By fixing $\{B^{v}\}_{v=1}^{V}$ and $\{\lambda_{v}\}_{v=1}^{V}$ and setting the derivative of the objective function with respect to $Z$ to zero, we have:

$$Z=(1+\alpha)\left[\beta I_{m}+\sum_{v=1}^{V}\lambda_{v}^{2}\left(B^{v}B^{v\top}+\alpha I_{m}\right)\right]^{-1}\left(\sum_{v=1}^{V}\lambda_{v}^{2}B^{v}H^{v\top}\right). \tag{6}$$

For the single-view scenario, the solution is $Z=(1+\alpha)\left(BB^{\top}+(\alpha+\beta)I_{m}\right)^{-1}BH^{\top}$.
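
The closed-form update in Eq. (6) can be sketched as below. The function name `update_Z` and the list-of-views calling convention are illustrative choices; the sketch solves the $m\times m$ linear system instead of forming the inverse explicitly, which is a common numerical preference rather than something the text prescribes.

```python
import numpy as np

def update_Z(Hs, Bs, lam, alpha, beta):
    """Eq. (6): Hs[v] is (N, d_v), Bs[v] is (m, d_v), lam[v] is the weight of view v."""
    m = Bs[0].shape[0]
    N = Hs[0].shape[0]
    lhs = beta * np.eye(m)
    rhs = np.zeros((m, N))
    for H, B, w in zip(Hs, Bs, lam):
        lhs += w**2 * (B @ B.T + alpha * np.eye(m))   # m x m system matrix
        rhs += w**2 * (B @ H.T)                        # m x N right-hand side
    return (1.0 + alpha) * np.linalg.solve(lhs, rhs)
```

Passing a single view with weight 1 reproduces the single-view formula stated above.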

III-C3 Update $\{B^{v}\}_{v=1}^{V}$

By fixing $Z$ and $\{\lambda_{v}\}_{v=1}^{V}$ and setting the derivative of Eq. (5) with respect to $B^{v}$ to zero, we obtain:

$$RB^{v}+\alpha B^{v}T^{v}=C^{v}, \tag{7}$$

where $R=ZZ^{\top}\in\mathbb{R}^{m\times m}$, $T^{v}=H^{v\top}H^{v}\in\mathbb{R}^{d_{v}\times d_{v}}$, and $C^{v}=(1+\alpha)ZH^{v}\in\mathbb{R}^{m\times d_{v}}$. The coefficient of $B^{v}T^{v}$ is $\alpha$ because this term comes from the similarity-preserving penalty $\alpha\|B^{v}H^{v\top}-Z\|_{F}^{2}$. Then we can obtain $B^{v}$ by solving this Sylvester equation.
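
A minimal sketch of the anchor update follows. Differentiating the view-$v$ part of Eq. (5) with respect to $B^{v}$ gives $2Z(Z^{\top}B^{v}-H^{v})+2\alpha(B^{v}H^{v\top}-Z)H^{v}=0$, i.e., $RB^{v}+\alpha B^{v}T^{v}=C^{v}$, so the sketch uses the coefficient $\alpha$ that this differentiation produces. It relies on `scipy.linalg.solve_sylvester`, which solves $AX+XB=Q$; the function name `update_B` is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def update_B(H, Z, alpha):
    """Solve R B + alpha * B T = C for the anchors B (m, d) of a single view."""
    R = Z @ Z.T                  # (m, m)
    T = H.T @ H                  # (d, d)
    C = (1.0 + alpha) * Z @ H    # (m, d)
    # solve_sylvester(a, b, q) returns X such that a X + X b = q.
    return solve_sylvester(R, alpha * T, C)
```

Both coefficient matrices have sizes that depend only on $m$ and $d_{v}$, so the solve itself does not scale with $N$ once $R$, $T^{v}$, and $C^{v}$ are formed.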

III-C4 Update $\{\lambda_{v}\}_{v=1}^{V}$

Fixing $Z$ and $\{B^{v}\}_{v=1}^{V}$, we let $M_{v}=\left\|H^{v\top}-B^{v\top}Z\right\|_{F}^{2}+\alpha\left\|B^{v}H^{v\top}-Z\right\|_{F}^{2}$. Then the problem is simplified as:

$$\min_{\lambda_{v}}\sum_{v=1}^{V}\lambda_{v}^{2}M_{v},\quad\text{s.t. }\sum_{v=1}^{V}\lambda_{v}=1,\ \lambda_{v}>0. \tag{8}$$

This is a standard quadratic programming problem, which yields: $\lambda_{v}=\dfrac{1/M_{v}}{\sum_{p=1}^{V}1/M_{p}}$.
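
The closed form follows from the Lagrangian $\sum_{v}\lambda_{v}^{2}M_{v}-\mu(\sum_{v}\lambda_{v}-1)$: stationarity gives $2\lambda_{v}M_{v}=\mu$, so $\lambda_{v}\propto 1/M_{v}$, and normalizing to sum to one gives the stated result. A small sketch, assuming all $M_{v}>0$ (the function name `update_lambda` is illustrative):

```python
import numpy as np

def update_lambda(M):
    """Closed-form weights for Eq. (8): lambda_v proportional to 1 / M_v."""
    inv = 1.0 / np.asarray(M, dtype=float)   # assumes every M_v is strictly positive
    return inv / inv.sum()
```

For example, with residuals $M=(1,3)$ the weights are $(0.75, 0.25)$, so the view with the smaller residual receives the larger weight.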

Comment The optimization procedure will monotonically decrease the objective function value in Eq. (5) in each iteration. Since the objective function has a lower bound, such as zero, the above iteration converges.
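
The monotone decrease can be checked numerically with a compact sketch of the whole alternating scheme. This is an illustration under stated assumptions, not the authors' implementation: anchors are initialized from randomly chosen rows, the anchor update uses the coefficient $\alpha$ obtained by differentiating Eq. (5) with respect to $B^{v}$, and the names `objective` and `cdc_alternate` are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def objective(Hs, Bs, Z, lam, alpha, beta):
    """Value of Eq. (5)."""
    total = beta * np.sum(Z**2)
    for H, B, w in zip(Hs, Bs, lam):
        total += w**2 * (np.sum((H.T - B.T @ Z) ** 2) + alpha * np.sum((B @ H.T - Z) ** 2))
    return total

def cdc_alternate(Hs, m, alpha=1.0, beta=1.0, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    N = Hs[0].shape[0]
    idx = rng.choice(N, size=m, replace=False)   # random rows as initial anchors
    Bs = [H[idx].copy() for H in Hs]
    lam = np.full(len(Hs), 1.0 / len(Hs))
    history = []
    for _ in range(n_iter):
        # Z-step: closed form of Eq. (6).
        lhs = beta * np.eye(m)
        rhs = np.zeros((m, N))
        for H, B, w in zip(Hs, Bs, lam):
            lhs += w**2 * (B @ B.T + alpha * np.eye(m))
            rhs += w**2 * (B @ H.T)
        Z = (1.0 + alpha) * np.linalg.solve(lhs, rhs)
        # B-step: one Sylvester equation per view.
        Bs = [solve_sylvester(Z @ Z.T, alpha * (H.T @ H), (1.0 + alpha) * Z @ H) for H in Hs]
        # lambda-step: weights proportional to 1 / M_v.
        M = np.array([np.sum((H.T - B.T @ Z) ** 2) + alpha * np.sum((B @ H.T - Z) ** 2)
                      for H, B in zip(Hs, Bs)])
        lam = (1.0 / M) / np.sum(1.0 / M)
        history.append(objective(Hs, Bs, Z, lam, alpha, beta))
    return Z, Bs, lam, history
```

Each of the three steps exactly minimizes the objective over its own block of variables, so the recorded values in `history` should be non-increasing up to floating-point error.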

III-D Complexity Analysis

The adjacency graph is often sparse in real-world scenarios. Consequently, we implement graph filtering with sparse matrix techniques, which takes time linear in $N$, whereas dense multiplication takes $\mathcal{O}(f_1 N^2)$ in general, where $f_1=\sum_{v=1}^{V} d_v$. Assuming $t$ iterations in total, the optimization of $Z$ takes $\mathcal{O}(t\max(m^3, m f_1 N))$: all multiplications and additions take $\mathcal{O}(t m f_1 N)$, and the matrix inversion needs $\mathcal{O}(t m^3)$. The optimization of $B^v$ and $\{\lambda_v\}_{v=1}^{V}$ takes $\mathcal{O}(t f_2)$ and $\mathcal{O}(t m f_1 N)$, respectively, where $f_2=\sum_{v=1}^{V} d_v^3$. It is worth pointing out that the complexity of anchor generation does not depend on $N$, so it is not limited by the size of the data. We then perform SVD on $Z$ and run $K$-means to obtain the clustering result, which takes $\mathcal{O}(m^2 N)$ and $\mathcal{O}(\bar{t} c m N)$, respectively, where $\bar{t}$ is the number of $K$-means iterations and $c$ is the number of clusters. In practice, $d \ll N$, $m \ll N$, and $t \ll N$, while $c$ and $\bar{t}$ are constants, so the proposed method has linear time complexity. Moreover, the largest space cost is $m \times N$ or $N \times d$, which means our approach also has linear space complexity.
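To make the linear-time claim for graph filtering concrete, the following is a minimal NumPy sketch of $H=(I-\frac{1}{2}L)^k X$ computed from an edge list, so that each filtering step costs time proportional to the number of edges times the feature dimension. It is a sketch under stated assumptions, not the authors' implementation: $A$ is taken to be the symmetrically normalized adjacency $D^{-1/2} A D^{-1/2}$ without self-loops, the edge list is undirected without duplicates, and the names `normalized_edges` and `graph_filter` are illustrative.

```python
import numpy as np

def normalized_edges(edges, n):
    """Return (rows, cols, weights) of D^{-1/2} A D^{-1/2} for an undirected edge list."""
    # Store every undirected edge in both directions.
    rows = np.array([u for u, v in edges] + [v for u, v in edges])
    cols = np.array([v for u, v in edges] + [u for u, v in edges])
    deg = np.bincount(rows, minlength=n).astype(float)
    deg[deg == 0] = 1.0  # isolated nodes never appear in rows/cols; avoid 0-division
    weights = 1.0 / np.sqrt(deg[rows] * deg[cols])
    return rows, cols, weights

def graph_filter(edges, X, k):
    """Apply (I - L/2)^k with L = I - A, using only sparse edge-wise operations."""
    n = X.shape[0]
    rows, cols, w = normalized_edges(edges, n)
    H = X.astype(float).copy()
    for _ in range(k):
        AH = np.zeros_like(H)
        # Sparse product A @ H: one scaled row addition per stored edge.
        np.add.at(AH, rows, w[:, None] * H[cols])
        H = 0.5 * (H + AH)  # (I - L/2) H = (A + I) H / 2
    return H

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
X = np.arange(8, dtype=float).reshape(4, 2)
H = graph_filter(edges, X, k=2)
print(H)
```

The loop never forms an $N \times N$ matrix, which is the point of the sparse implementation described above.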

We compare the complexity of CDC with that of the baselines in Table I, omitting the iteration number $t$. $\widehat{P}$ denotes the average degree of the graph in S$^3$GC. $l$ and $K$ are the number of view groups and the number of nearest neighbors for each view group, where a view group is a set of multiple randomly selected views. $B$ is the batch size; the remaining symbols are the same as in the main body of CDC. Our method has a clear advantage: its cost is linear in $N$, and its only super-linear term, $d^3$, depends on the feature dimension alone. For high-dimensional data, dimension reduction techniques can be applied.

TABLE I: Brief complexity analysis of recent SOTA methods.

| Type | Method | Time | Space |
|---|---|---|---|
| Single-view | MVGRL | $\mathcal{O}(dN^2+d^2N)$ | $\mathcal{O}(N^2+dN)$ |
| Single-view | S$^3$GC | $\mathcal{O}(\widehat{P}dN)$ | $\mathcal{O}(d(B\widehat{P}+N))$ |
| Multi-view | MCGC | $\mathcal{O}(N^2+dN)$ | $\mathcal{O}(N^2)$ |
| Multi-view | MvAGC | $\mathcal{O}(mdN)$ | $\mathcal{O}((m+d)N)$ |
| Non-graph | EOMSC-CA | $\mathcal{O}((m^2+d)N+m^3)$ | $\mathcal{O}((m+d)N)$ |
| Non-graph | FastMICE | $\mathcal{O}(lm^{\frac{1}{2}}V^{\frac{1}{2}}N)$ | $\mathcal{O}((c+K+l+V)N)$ |
| Proposed | CDC | $\mathcal{O}((m^2+d)N+d^3)$ | $\mathcal{O}((m+d)N)$ |
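To give a feel for what the asymptotic entries in Table I mean at scale, the short calculation below plugs the Papers100M sizes from Table II into the CDC row and into the quadratic MVGRL row. This is only an order-of-magnitude illustration: big-O expressions hide constants, and the anchor count `m = 100` is an assumed, illustrative value rather than a setting reported for this dataset.

```python
# Papers100M: N samples and d feature dimensions (Table II).
# m = 100 is an illustrative anchor count (assumption).
N, d, m = 111_059_956, 128, 100

cdc_time = (m**2 + d) * N + d**3        # O((m^2 + d)N + d^3), CDC row
quadratic_time = d * N**2 + d**2 * N    # O(dN^2 + d^2 N), MVGRL row
cdc_space = (m + d) * N                 # O((m + d)N), CDC row
quadratic_space = N**2 + d * N          # O(N^2 + dN), MVGRL row

print(f"time ratio  (quadratic / CDC): {quadratic_time / cdc_time:.3g}")
print(f"space ratio (quadratic / CDC): {quadratic_space / cdc_space:.3g}")
```

Under these assumptions the quadratic terms exceed the linear ones by several orders of magnitude, which is consistent with quadratic methods running out of memory on the larger graphs.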

IV Theoretical Analysis

We establish theoretical support for our method: 1) filtered features encode node attribute and topology structure; 2) the learned anchor graph is clustering-favorable.

Definition IV.1 (Grouping effect [41]).

Given two nodes $i$ and $j$ that are similar in terms of local topology and node features, i.e., $\mathcal{V}_i \rightarrow \mathcal{V}_j \Longleftrightarrow \left(\|A_i-A_j\|^2 \rightarrow 0\right) \wedge \left(\|x_i-x_j\|^2 \rightarrow 0\right)$, the matrix $G$ is said to have a grouping effect if $\mathcal{V}_i \rightarrow \mathcal{V}_j \Longrightarrow |G_{ip}-G_{jp}| \rightarrow 0, \ \forall\, 1 \leq p \leq N$.

Theorem IV.2.

Define the distance between filtered nodes $i$ and $j$ as $\|h_i-h_j\|^2$. Then $\|h_i-h_j\|^2 \leq \frac{1}{2^{2k}}\left[\left\|(A_i-A_j)\sum_{i=0}^{k-1}\binom{i}{N}A^iX\right\|^2+\|x_i-x_j\|^2\right]$, i.e., the filtered features $H$ preserve both topology and attribute similarity.

Proof.

Since $L=I-A$, we have $I-\frac{1}{2}L=\frac{A+I}{2}$, and hence $H=(I-\frac{1}{2}L)^kX=\frac{(A+I)^k}{2^k}X$. Expanding gives:

$$
\begin{aligned}
H &= \frac{(A+I)^k}{2^k}X = \frac{A\sum_{i=0}^{k-1}\binom{i}{N}A^i+I}{2^k}X \\
&= \frac{A\sum_{i=0}^{k-1}\binom{i}{N}A^{i}X+X}{2^k}.
\end{aligned}
$$

The distance between nodes $i$ and $j$ is then:

$$
\begin{aligned}
\|h_i-h_j\|^2 &= \left\|\frac{\left(A\sum_{i=0}^{k-1}\binom{i}{N}A^iX+X\right)_i-\left(A\sum_{i=0}^{k-1}\binom{i}{N}A^iX+X\right)_j}{2^k}\right\|^2 \\
&= \frac{1}{2^{2k}}\left\|(A_i-A_j)\sum_{i=0}^{k-1}\binom{i}{N}A^iX+(x_i-x_j)\right\|^2 \\
&\leq \frac{1}{2^{2k}}\left[\left\|(A_i-A_j)\sum_{i=0}^{k-1}\binom{i}{N}A^iX\right\|^2+\|x_i-x_j\|^2\right].
\end{aligned}
\tag{9}
$$

So, if $\mathcal{V}_i \rightarrow \mathcal{V}_j$, then $\|h_i-h_j\| \rightarrow 0$. However, when nodes are similar in only one space, i.e., either $\|A_i-A_j\|^2 \rightarrow 0$ or $\|x_i-x_j\|^2 \rightarrow 0$, $\|h_i-h_j\|^2$ has a non-zero upper bound unless $k$ is large enough. This indicates that the filtered representations of nodes that are similar in both attribute and topology space get closer, and that the graph filtering order $k$ adjusts this bias.
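The effect described by Theorem IV.2 can be checked numerically on a toy graph. The sketch below is an illustration under assumptions, not the authors' code: it uses a hand-made 5-node graph in which nodes 0 and 1 have identical neighborhoods, takes $A$ to be the symmetrically normalized adjacency without self-loops, and gives the two nodes nearly identical random attributes.

```python
import numpy as np

# Nodes 0 and 1 share the neighbours {2, 3}; node 4 hangs off node 3.
A_raw = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
deg = A_raw.sum(axis=1)
A = A_raw / np.sqrt(np.outer(deg, deg))   # D^{-1/2} A D^{-1/2}

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X[1] = X[0] + 1e-3                        # nearly identical attributes

def filtered(X, k):
    """H = ((A + I) / 2)^k X, the filter form used in the proof."""
    return np.linalg.matrix_power((A + np.eye(len(A))) / 2, k) @ X

gaps = {}
for k in (1, 2, 4):
    H = filtered(X, k)
    gaps[k] = np.linalg.norm(H[0] - H[1])
    # Print the gap between the similar pair and between a dissimilar pair.
    print(k, gaps[k], np.linalg.norm(H[0] - H[4]))
```

In this toy graph the rows $A_0$ and $A_1$ coincide, so the topology term vanishes and the gap between the two filtered rows equals $\|x_0-x_1\|/2^k$, matching the $\frac{1}{2^{2k}}$ factor on the squared distance.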

Theorem IV.3.

Let $G=Z^{\top}$. Then $|G_{ip}-G_{jp}|^2 \leq \|h_i-h_j\|^2\|C_2\|_F^2$, where $C_2$ is a constant matrix. Hence $\mathcal{V}_i \rightarrow \mathcal{V}_j \Longrightarrow |G_{ip}-G_{jp}|^2 \rightarrow 0$, i.e., the learned anchor graph $Z$ has a grouping effect.

Proof.

Define $G^{*}=Z^{*\top}=(1+\alpha)(HB^{\top})(BB^{\top}+(\alpha+\beta)I_m)^{-1}$ and $\mathcal{L}_i=\|h_i-g_iB\|^2+\alpha\|h_iB^{\top}-g_i\|^2+\beta\|g_i\|^2$, where $g_i$ is the $i$-th row of $G$. Setting $\frac{\partial \mathcal{L}_i}{\partial G_{ip}}\big|_{g_i=g_i^{*}}=0$ yields $G_{ip}=\frac{(h_i-g_iB)B_p^{\top}+\alpha h_iB_p^{\top}}{\beta-\alpha}$. Let $C_1=(BB^{\top}+(\alpha+\beta)I_m)^{-1}$, so that $g_i=(1+\alpha)h_iB^{\top}C_1$. Eventually,

$$
\begin{aligned}
G_{ip} &= \frac{[h_i-(1+\alpha)h_iB^{\top}C_1B]B_p^{\top}+\alpha h_iB_p^{\top}}{\beta-\alpha} \\
&= \frac{h_i(1+\alpha)(I_d-B^{\top}C_1B)B_p^{\top}}{\beta-\alpha}.
\end{aligned}
$$

Denoting $C_2=\frac{(1+\alpha)(I_d-B^{\top}C_1B)B_p^{\top}}{\beta-\alpha}$, we obtain $|G_{ip}-G_{jp}|^2 \leq \|h_i-h_j\|^2\|C_2\|^2$. ∎

This indicates that local structures of similar nodes tend to be identical on the learned graph $Z$, so the corresponding nodes are clustered into the same group. In other words, the learned graph is clustering-friendly. To intuitively demonstrate the grouping effect of the anchor graph, we plot diagrams of $Z$ in Fig. 1. $Z$ on ACM has a stronger grouping effect than the one on Pubmed.
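The grouping effect can also be checked numerically from the closed form $G^{*}=(1+\alpha)HB^{\top}(BB^{\top}+(\alpha+\beta)I_m)^{-1}$ given at the start of the proof. The sketch below, on random data with illustrative sizes and hyperparameters (all assumptions), verifies that this $G^{*}$ is a stationary point of $\|H-GB\|_F^2+\alpha\|HB^{\top}-G\|_F^2+\beta\|G\|_F^2$, and that each entry gap $|G_{ip}-G_{jp}|$ is bounded by $\|h_i-h_j\|$ times a constant. The constant used here is the norm of a column of the hypothetical matrix `M` with $G^{*}=HM$, which follows from Cauchy-Schwarz and plays the role of $\|C_2\|$ in the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 50, 8, 5
alpha, beta = 1.0, 10.0
H = rng.normal(size=(N, d))   # filtered features, one row per node
B = rng.normal(size=(m, d))   # anchors, one row per anchor

C1 = np.linalg.inv(B @ B.T + (alpha + beta) * np.eye(m))
M = (1 + alpha) * B.T @ C1    # d x m, so that G = H @ M
G = H @ M                     # G = Z^T, shape N x m

# Gradient of the objective with respect to G, evaluated at the closed form.
grad = -2 * (H - G @ B) @ B.T - 2 * alpha * (H @ B.T - G) + 2 * beta * G

# Entry-wise gap between rows i and j versus the Cauchy-Schwarz bound.
i, j = 0, 1
lhs = np.abs(G[i] - G[j])                                  # |G_ip - G_jp| for all p
rhs = np.linalg.norm(H[i] - H[j]) * np.linalg.norm(M, axis=0)
print(np.abs(grad).max(), bool(np.all(lhs <= rhs)))
```

Because $G^{*}$ is linear in $H$, any two nodes whose filtered features are close necessarily receive close rows in the anchor graph.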

Figure 1: Visualization of the grouping effect of the learned graph $Z$. Panels: (a) ACM (ACC = 93.6%); (b) Pubmed (ACC = 74.1%).

V Experiments

V-A Datasets and Metrics

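TABLE II: Statistical information of datasets.

| Type | Category | Dataset | Samples | Edges / Dims | Clusters |
|---|---|---|---|---|---|
| Graph | Single-view | Citeseer | 3327 | 4614 / 3703 | 6 |
| Graph | Single-view | Pubmed | 19717 | 44325 / 500 | 3 |
| Graph | Multi-relational | ACM | 3025 | 29281, 2210761 / 1830 | 3 |
| Graph | Multi-relational | DBLP | 4057 | 11113, 5000495, 6776335 / 334 | 4 |
| Graph | Multi-attribute | AMAP | 7487 | 119043 / 745, 7487 | 8 |
| Graph | Multi-attribute | AMAC | 13381 | 245778 / 767, 13381 | 10 |
| Graph | Extra-/Large-scale | Products | 2449029 | 61859140 / 100 | 47 |
| Graph | Extra-/Large-scale | Papers100M | 111059956 | 1615685872 / 128 | 172 |
| Non-graph | Large-scale multi-view | YTF-31 | 101499 | 507495 / 64, 512, 64, 647, 838 | 31 |
| Non-graph | Large-scale multi-view | YTF-400 | 398191 | 1990955 / 944, 576, 512, 640 | 400 |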

To show the effectiveness and efficiency of CDC, we evaluate it on 10 benchmark datasets, including 6 multi-view datasets and 4 single-view datasets. More specifically, ACM and DBLP [17] are multi-view graphs with multiple relations; AMAP and AMAC [37] are multi-attribute graphs; YTF-31 [7] and YTF-400 [27] are multi-view non-graph data (YouTube-Faces); Citeseer, Pubmed [12], Products, and Papers100M [42] are single-view graphs, where the latter two are from the Open Graph Benchmark [43]. The statistical information of these datasets is shown in Table II. Most notably, YTF-400 is the largest multi-view non-graph dataset, while Papers100M is the largest graph used in the clustering task. We adopt four popular clustering metrics: Accuracy (ACC), Normalized Mutual Information (NMI), F1 score, and Adjusted Rand Index (ARI). For all of them, higher values indicate better performance.
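Because cluster ids are arbitrary, clustering accuracy is usually computed after matching predicted clusters to ground-truth classes. The paper does not spell out its matching procedure, so the following is a minimal sketch assuming the common best one-to-one mapping definition. The function name `clustering_accuracy` is illustrative, and the brute-force search over permutations is only practical for a handful of clusters; the Hungarian algorithm is the usual choice at scale.

```python
from collections import Counter
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """ACC: best fraction of matches over one-to-one maps from clusters to classes."""
    counts = Counter(zip(y_pred, y_true))   # contingency table
    pred_ids = sorted(set(y_pred))
    true_ids = sorted(set(y_true))
    # Pad with None so every predicted cluster gets a slot, even if it
    # has to stay unmatched (a missing key in the Counter counts as 0).
    slots = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(slots, len(pred_ids)):
        matched = sum(counts[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, matched)
    return best / len(y_true)

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [1, 1, 0, 0, 0, 2]
print(clustering_accuracy(y_true, y_pred))
```

In the example, mapping cluster 1 to class 0, cluster 0 to class 1, and cluster 2 to class 2 matches five of the six samples.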

V-B Experimental Setup

We compare CDC with a number of single-view and multi-view methods.
Single-view graph: Baselines include MinCutPool [44], METIS [45], Node2vec [29], DGI [46], DMoN [47], GRACE [33], BGRL [32], MVGRL [31], and S$^3$GC [18].

METIS uses only structural information to partition graphs. Node2vec is a well-known graph embedding algorithm based on random walks. MinCutPool and DMoN integrate spectral clustering with graph neural networks. DGI learns node representations by maximizing mutual information between patch representations and corresponding high-level summaries of graphs. GRACE, BGRL, and MVGRL are three contrastive graph representation learning methods. S$^3$GC is a recent scalable graph clustering method that uses a lightweight encoder and a random-walk-based sampler.

Multi-view graph: There are 10 baselines for multi-view graph clustering, including SDCN [34], DAEGC [14], O2MAC [17], HDMI [48], CMGEC [24], COMPLETER [49], MvAGC [37], MCGC [11], MVGRL [31], and MAGCN [16]. The first six methods are only applicable to data with multiple graphs or multiple attributes, whereas the last four are applicable to general multi-view graph data.

Graph attention auto-encoders and GCNs are used in DAEGC and SDCN, respectively. To obtain a consistent embedding, CMGEC adds a graph fusion network to multiple graph auto-encoders. In O2MAC, the most informative view is selected to learn the cluster representation. HDMI learns node embeddings by using high-order mutual information. MAGCN applies graph auto-encoders to both attributes and topological graphs to learn consensus representations. Through contrastive mechanisms, COMPLETER and MVGRL learn a common representation shared across multiple views and graphs. MCGC uses a contrastive regularizer to boost the quality of the learned graph. In MvAGC, high-order topological interactions are explored to improve clustering performance.

Non-graph: We compare CDC with six scalable MVC methods on non-graph data, including BMVC [9], LMVSC [5], MSGL [26], FPMVS [40], EOMSC-CA [7], and FastMICE [27].

BMVC learns discrete representations and binary cluster structures jointly to integrate collaborative information. MSGL and LMVSC are two scalable subspace clustering methods. FPMVS and EOMSC-CA are two adaptive anchor-based algorithms. CDC differs from them in two ways: 1) CDC uses a similarity-preserving regularizer, while anchor matrices are assumed to be unitary in FPMVS and EOMSC-CA; 2) the complexity of anchor generation in CDC does not depend on the data size. FastMICE constructs anchor graphs by using features, anchors, and neighbors jointly.

Parameter setting: The balance parameters $\alpha$ and $\beta$ are selected from $\{10^{-3}, 1, 10, 10^{3}, 10^{4}\}$. The number of anchors $m$ is selected from $\{c, 10, 30, 50, 70, 100\}$. All experiments are conducted on the same machine with an Intel(R) Core(TM) i9-12900K CPU, two GeForce RTX 3090 GPUs, and 128GB RAM.
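The candidate values above define a small search space. The sketch below enumerates it; whether the authors search it exhaustively is not stated, so the full Cartesian product is an assumption, and `parameter_grid` is an illustrative name. Duplicate anchor counts are merged when the cluster number $c$ coincides with one of the fixed values.

```python
from itertools import product

def parameter_grid(c):
    """All (alpha, beta, m) combinations for a dataset with c clusters."""
    alphas = [1e-3, 1, 1e1, 1e3, 1e4]
    betas = [1e-3, 1, 1e1, 1e3, 1e4]
    anchors = sorted({c, 10, 30, 50, 70, 100})  # set removes a duplicate if c == 10
    return list(product(alphas, betas, anchors))

grid = parameter_grid(c=3)   # e.g. ACM or Pubmed, which have 3 clusters
print(len(grid), grid[0])
```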

V-C Results

TABLE III: Results on the extra-large-scale graph Papers100M.

| Metrics | $K$-means | Node2vec | DGI | S$^3$GC | CDC |
|---|---|---|---|---|---|
| ACC | 0.144 | 0.175 | 0.151 | 0.173 | 0.174 |
| NMI | 0.368 | 0.380 | 0.416 | 0.453 | 0.427 |
| ARI | 0.074 | 0.112 | 0.096 | 0.110 | 0.114 |
| F1 | 0.101 | 0.099 | 0.111 | 0.118 | 0.119 |

V-C1 Single-view Scenario

The results on the small-scale graph Citeseer, medium-scale graph Pubmed, large-scale graph Products, and extra-large-scale graph Papers100M are shown in Table III and Table IV. Note that most neural network-based methods cannot handle large and extra-large-scale graphs. On Citeseer and Pubmed, our method achieves the best results in most metrics, and on Products and Papers100M it produces competitive results. In particular, on Pubmed, CDC surpasses the most recent S$^3$GC method by about 3% in all metrics. CDC also shows a slight advantage over S$^3$GC on the largest dataset, Papers100M, in ACC, ARI, and F1. Furthermore, CDC incurs a lower time cost than S$^3$GC: it takes $\sim$5 min and $\sim$4 h on Products and Papers100M, while S$^3$GC consumes $\sim$1 h and $\sim$24 h, which demonstrates CDC's efficiency on large and extra-large-scale graphs. Compared with many GNN-based methods, such as MinCutPool, DGI, DMoN, GRACE, BGRL, and MVGRL, CDC demonstrates a clear edge.

TABLE IV: Results on single-view graphs. "-" denotes that the method ran out of memory (OM) or didn't converge. The best results are denoted with red and the second best with blue.

| Method | Citeseer ACC | Citeseer NMI | Citeseer ARI | Citeseer F1 | Pubmed ACC | Pubmed NMI | Pubmed ARI | Pubmed F1 | Products ACC | Products NMI | Products ARI | Products F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MinCutPool | 0.537 | 0.295 | 0.262 | 0.516 | 0.521 | 0.214 | 0.175 | 0.445 | 0.257 | 0.430 | 0.180 | 0.130 |
| METIS | 0.413 | 0.170 | 0.150 | 0.400 | 0.693 | 0.297 | 0.323 | 0.682 | 0.294 | 0.468 | 0.220 | 0.145 |
| Node2vec | 0.421 | 0.240 | 0.116 | 0.401 | 0.641 | 0.288 | 0.258 | 0.634 | 0.357 | 0.489 | 0.247 | 0.170 |
| DGI | 0.686 | 0.435 | 0.445 | 0.643 | 0.657 | 0.322 | 0.292 | 0.654 | 0.320 | 0.467 | 0.192 | 0.174 |
| DMoN | 0.385 | 0.303 | 0.200 | 0.437 | 0.351 | 0.257 | 0.108 | 0.343 | 0.304 | 0.428 | 0.210 | 0.139 |
| GRACE | 0.631 | 0.399 | 0.377 | 0.603 | 0.637 | 0.308 | 0.276 | 0.628 | - | - | - | - |
| BGRL | 0.675 | 0.422 | 0.428 | 0.631 | 0.654 | 0.315 | 0.285 | 0.649 | - | - | - | - |
| MVGRL | 0.703 | 0.459 | 0.471 | 0.654 | 0.675 | 0.345 | 0.310 | 0.672 | - | - | - | - |
| S$^3$GC | 0.688 | 0.441 | 0.448 | 0.643 | 0.713 | 0.333 | 0.345 | 0.703 | 0.402 | 0.536 | 0.25 | 0.23 |
| CDC | 0.709 | 0.444 | 0.471 | 0.661 | 0.741 | 0.371 | 0.383 | 0.737 | 0.366 | 0.390 | 0.121 | 0.187 |

V-C2 Multi-view Scenario

Clustering on Multi-view Graphs

The clustering results of CDC on multi-relational and multi-attribute graphs are reported in Table V and Table VI. CDC outperforms all other methods on the four benchmarks in all metrics. For example, compared to the second-best method MCGC, ACC, NMI, and ARI on ACM, AMAP, and AMAC are improved by more than 5%, 7%, and 9% on average, respectively. Although MvAGC samples nodes as anchors, it takes more time than CDC because its sampling strategy is inefficient. Specifically, CDC is more than $2\times$ and $5\times$ faster on the multi-relational and multi-attribute graphs, respectively. Compared to the other methods, the advantage is even more significant. Therefore, CDC is a promising clustering method for graph data of various forms.

TABLE V: Results on the multi-relational graph.

| Method | ACM ACC | ACM NMI | ACM ARI | ACM F1 | DBLP ACC | DBLP NMI | DBLP ARI | DBLP F1 |
|---|---|---|---|---|---|---|---|---|
| SDCN | 0.863 | 0.578 | 0.639 | 0.862 | 0.650 | 0.298 | 0.310 | 0.638 |
| DAEGC | 0.891 | 0.643 | 0.705 | 0.891 | 0.873 | 0.674 | 0.701 | 0.862 |
| O2MAC | 0.904 | 0.692 | 0.739 | 0.905 | 0.907 | 0.729 | 0.778 | 0.901 |
| HDMI | 0.874 | 0.645 | 0.674 | 0.872 | 0.885 | 0.692 | 0.753 | 0.865 |
| CMGEC | 0.909 | 0.691 | 0.723 | 0.907 | 0.910 | 0.724 | 0.786 | 0.904 |
| MvAGC | 0.898 | 0.674 | 0.721 | 0.899 | 0.928 | 0.773 | 0.828 | 0.923 |
| MCGC | 0.915 | 0.713 | 0.763 | 0.916 | 0.930 | 0.775 | 0.830 | 0.925 |
| CDC | 0.936 | 0.769 | 0.817 | 0.936 | 0.933 | 0.781 | 0.836 | 0.929 |

TABLE VI: Results on the multi-attribute graph.

| Method | AMAP ACC | AMAP NMI | AMAP ARI | AMAP F1 | AMAC ACC | AMAC NMI | AMAC ARI | AMAC F1 |
|---|---|---|---|---|---|---|---|---|
| COMPLETER | 0.368 | 0.261 | 0.076 | 0.307 | 0.242 | 0.156 | 0.054 | 0.160 |
| MVGRL | 0.505 | 0.433 | 0.238 | 0.460 | 0.245 | 0.101 | 0.055 | 0.171 |
| MAGCN | 0.517 | 0.390 | 0.240 | 0.474 | - | - | - | - |
| MvAGC | 0.678 | 0.524 | 0.397 | 0.640 | 0.580 | 0.396 | 0.322 | 0.412 |
| MCGC | 0.716 | 0.615 | 0.432 | 0.686 | 0.597 | 0.532 | 0.390 | 0.520 |
| CDC | 0.795 | 0.707 | 0.620 | 0.730 | 0.647 | 0.604 | 0.437 | 0.546 |

Clustering on Multi-view Non-graph data

TABLE VII: Results on large-scale multi-view non-graph data.

| Method | YTF-31 ACC | YTF-31 NMI | YTF-31 F1 | YTF-400 ACC | YTF-400 NMI | YTF-400 F1 |
|---|---|---|---|---|---|---|
| BMVC | 0.090 | 0.059 | 0.058 | - | - | - |
| LMVSC | 0.140 | 0.118 | 0.083 | 0.489 | 0.767 | 0.589 |
| MSGL | 0.167 | 0.001 | 0.151 | 0.502 | 0.738 | 0.606 |
| FPMVS | 0.230 | 0.234 | 0.140 | 0.562 | 0.797 | 0.472 |
| EOMSC-CA | 0.265 | 0.003 | 0.164 | 0.570 | 0.779 | 0.408 |
| FastMICE | 0.275 | 0.236 | 0.295 | 0.564 | 0.798 | 0.509 |
| CDC | 0.285 | 0.260 | 0.298 | 0.571 | 0.745 | 0.591 |

To process non-graph data, we manually construct 5-nearest neighbor graphs for graph filtering. Table VII shows the clustering results on YTF-31 and YTF-400. We find that most existing methods can't handle YTF-400, which is the largest non-graph multi-view dataset. CDC still achieves the best results in most cases. Though some other methods also use anchors, their computation time is still high. Specifically, CDC takes $\sim$20 s and $\sim$1 min on the two datasets, respectively, while EOMSC-CA needs $\sim$2 min and $\sim$6 min, and FastMICE takes $\sim$30 s and $\sim$3 min. This verifies that CDC is also a promising clustering method for non-graph data. The time cost of several recent SOTA methods is summarized in Fig. 2.

Figure 2: Run time of existing SOTA methods on various datasets.
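The 5-nearest neighbor graph used for non-graph data can be sketched as follows. This is a minimal illustration under assumptions: Euclidean distance, directed edges from each sample to its neighbors, and a brute-force $\mathcal{O}(N^2)$ distance matrix, which would need to be replaced by a blocked or approximate search for datasets of the size of YTF-400. The name `knn_edges` is illustrative.

```python
import numpy as np

def knn_edges(X, k=5):
    """Directed k-nearest-neighbour edge list: one row (i, j) per neighbour j of i."""
    sq = np.sum(X**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                    # a sample is not its own neighbour
    # argpartition puts the k smallest entries of each row first (in no particular order).
    nbrs = np.argpartition(dist, k, axis=1)[:, :k]
    rows = np.repeat(np.arange(X.shape[0]), k)
    return np.stack([rows, nbrs.ravel()], axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
E = knn_edges(X, k=5)
print(E.shape)
```

Such a construction yields exactly $5N$ edges, which agrees with the edge counts listed for YTF-31 and YTF-400 in Table II ($5 \times 101499 = 507495$ and $5 \times 398191 = 1990955$).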

V-D Ablation Study

V-D1 Effect of Similarity-Preserving

Anchors are generated adaptively in the similarity space constrained by a similarity-preserving (marked as SP) regularizer. To clearly show the effect of SP, we remove it from the model and test the performance of CDC w/o SP on Pubmed and ACM in Table VIII. Similarity preserving clearly improves the clustering performance, by 3% on average. Moreover, CDC takes less time than CDC w/o SP on both datasets. The reason is that the computation complexity of $B^v=(ZZ^{\top})^{-1}(ZH^v)$ is $\mathcal{O}(md_vN)$, which is higher than the $\mathcal{O}(d_v^3)$ in CDC. Therefore, as a bonus, the SP regularizer helps to reduce the complexity of anchor generation. Moreover, it improves the quality of anchors. As observed in Fig. 3, CDC achieves the best results with a few anchors, which further reduces the computation cost. In fact, too many anchors could deteriorate the performance, since some noisy anchors that are not representative could be introduced.
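The anchor update quoted above for the variant without SP, $B^v=(ZZ^{\top})^{-1}(ZH^v)$, is the normal-equation solution of the least-squares problem $\min_B \|H^v-Z^{\top}B\|_F^2$. The minimal NumPy sketch below, on random data with illustrative sizes and names (assumptions, not the authors' code), shows where the dependence on $N$ comes from: the product $ZH^v$ touches every sample.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, d_v = 4, 200, 6
Z = rng.normal(size=(m, N))      # anchor graph, m x N
H = rng.normal(size=(N, d_v))    # filtered features of one view, N x d_v

# Z @ H costs O(m d_v N); solving the m x m system afterwards costs O(m^3 + m^2 d_v).
B = np.linalg.solve(Z @ Z.T, Z @ H)

# Reference: generic least-squares solver for H ~ Z^T B.
B_ref = np.linalg.lstsq(Z.T, H, rcond=None)[0]
print(np.abs(B - B_ref).max())
```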

TABLE VIII: Results of CDC with/without GF and SP.

| Method | Pubmed ACC | Pubmed NMI | Pubmed F1 | Pubmed Time (s) | ACM ACC | ACM NMI | ACM F1 | ACM Time (s) |
|---|---|---|---|---|---|---|---|---|
| CDC | 0.741 | 0.371 | 0.737 | 2.03 | 0.936 | 0.769 | 0.936 | 0.81 |
| w/o SP | 0.707 | 0.349 | 0.704 | 5.91 | 0.918 | 0.710 | 0.919 | 1.03 |
| | (-0.034) | (-0.022) | (-0.033) | (+3.88) | (-0.018) | (-0.059) | (-0.017) | (+0.22) |
| w/o GF | 0.626 | 0.256 | 0.639 | 2.86 | 0.872 | 0.585 | 0.871 | 0.88 |
| | (-0.115) | (-0.115) | (-0.098) | (+0.83) | (-0.064) | (-0.184) | (-0.055) | (+0.07) |
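The average changes quoted in the text for the two ablations can be reproduced from the delta rows of Table VIII. The short sketch below averages the absolute changes in ACC, NMI and F1 over Pubmed and ACM; the numbers are copied from the table as printed, and the variable names are ours.

```python
# Absolute drops in ACC, NMI, F1 copied from the delta rows of Table VIII
# (Pubmed first, then ACM).
drop_without_sp = [0.034, 0.022, 0.033, 0.018, 0.059, 0.017]
drop_without_gf = [0.115, 0.115, 0.098, 0.064, 0.184, 0.055]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

avg_sp = mean(drop_without_sp)  # roughly 0.03, i.e. about 3 points
avg_gf = mean(drop_without_gf)  # roughly 0.105, i.e. about 10 points
print(f"w/o SP: {100 * avg_sp:.2f} points, w/o GF: {100 * avg_gf:.2f} points")
```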
Figure 3: Results on ACM and Pubmed with different anchor number $m$. Panels: (a) ACM; (b) Pubmed.

V-D2 Effect of Graph Filtering

Graph filtering (marked as GF) is applied to integrate node attributes and topology in our method. Besides showing theoretically that the anchor graph learned from filtered representations is clustering-favorable, we also verify this experimentally in Table VIII. The performance of CDC w/o GF drops by about 10% on average, which validates the significance of graph filtering. We also observe an increase in run time, which could be caused by slower convergence due to the loss of cluster-ability.
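For intuition on how filtering fuses attributes and topology, the following is a minimal sketch of a low-pass graph filter. It assumes a filter of the form $(I-\frac{1}{2}L)^{k}$ with $L$ the symmetrically normalized Laplacian of the graph with added self-loops; this is a common low-pass choice and is used here only as an assumption, since the exact filter is defined earlier in the paper and not in this section. The function name and the default order $k$ are hypothetical.

```python
import numpy as np

def low_pass_filter(A, X, k=2):
    """Smooth node features X over adjacency A with (I - L/2)^k.

    Assumptions: self-loops are added, and L is the symmetrically
    normalized Laplacian of the resulting graph.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    n = A.shape[0]
    A_hat = A + np.eye(n)              # self-loops keep every degree positive
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    L = np.eye(n) - A_norm             # normalized Laplacian
    F = np.eye(n) - 0.5 * L            # one low-pass filtering step
    H = X
    for _ in range(k):
        H = F @ H                      # apply the filter k times
    return H
```

Each application pulls the features of connected nodes toward each other, so nodes in the same densely connected group obtain more similar representations.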

V-D3 Robustness on Heterophily

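TABLE IX: Results on heterophilic graphs.

| Methods | Texas ACC | Texas NMI | Cornell ACC | Cornell NMI | Wisconsin ACC | Wisconsin NMI | Squirrel ACC | Squirrel NMI |
|---|---|---|---|---|---|---|---|---|
| CDRS | 0.599 | 0.154 | - | - | 0.562 | 0.137 | - | - |
| CGC | 0.615 | 0.215 | 0.446 | 0.141 | 0.559 | 0.230 | 0.272 | 0.030 |
| CDC | 0.672 | 0.293 | 0.514 | 0.142 | 0.637 | 0.318 | 0.279 | 0.043 |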

In some real-world applications, graphs could be heterophilic, where connected nodes tend to have different labels [50]. To show the robustness of CDC on heterophily, we report the results on several popular heterophilic graphs, including Texas, Cornell, Wisconsin [51], and Squirrel [52]. As shown in Table IX, CDC outperforms the recent SOTA methods CDRS [53] and CGC [50]. Although the low-pass filter used here is considered less useful for heterophily, CDC still works well because of the high-quality anchors and clustering-friendly graph. In fact, there are few graph clustering methods for heterophily; future work, such as an omnipotent filter, could contribute a lot to clustering on heterophilic graphs.
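One common way to quantify how heterophilic a graph is uses the edge homophily ratio, i.e., the fraction of edges that connect nodes with the same label. The sketch below is illustrative only: the function name and the toy graph are ours, and the measure is not claimed to be the one used to select the datasets above.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Toy path graph 0-1-2-3 with alternating labels: every edge joins
# nodes of different classes, so the graph is fully heterophilic.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 1, 0, 1]
ratio = edge_homophily(edges, labels)
```

A ratio close to 0 indicates strong heterophily, while a ratio close to 1 indicates strong homophily.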

VI Parameter Analysis

There are two trade-off parameters, $\alpha$ and $\beta$, in our model. As shown in Fig. 4, although CDC works well for a wide range of $\alpha$ and $\beta$, fine-tuning does enhance its performance. $\beta$ has less impact than $\alpha$, which indicates that the similarity-preserving regularizer is more important. Since CDC has linear complexity, the fine-tuning procedure takes little time.
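A parameter sweep of this kind can be sketched as a plain grid search over $\alpha$ and $\beta$. Everything specific below is an assumption made for illustration: the grid values, the function names, and in particular `toy_score`, which is a hypothetical stand-in for running the clustering method and returning its accuracy.

```python
from itertools import product

def grid_search(evaluate, alphas, betas):
    """Return (best_score, best_alpha, best_beta) over the Cartesian grid."""
    best = None
    for alpha, beta in product(alphas, betas):
        score = evaluate(alpha, beta)
        if best is None or score > best[0]:
            best = (score, alpha, beta)
    return best

# Hypothetical stand-in for "run the method and return accuracy";
# it exists only to make the sketch runnable.
def toy_score(alpha, beta):
    return -(alpha - 10.0) ** 2 - 0.01 * (beta - 1.0) ** 2

grid = [0.01, 0.1, 1.0, 10.0, 100.0]  # assumed grid, not from the paper
best_score, best_alpha, best_beta = grid_search(toy_score, grid, grid)
```

Because each evaluation is a full run of the method, the total cost of the sweep is the grid size times the cost of a single run.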

Figure 4: Accuracy on ACM and Pubmed with different $\alpha$ and $\beta$. Panels: (a) ACM; (b) Pubmed.

We also visualize the objective function value of CDC on ACM, Pubmed, and YTF-31 in Fig. 5. The losses converge quickly.
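In practice, convergence of an objective like this is often detected with a relative-change test on consecutive loss values. The sketch below shows one such test; the criterion, the tolerance, and the function name are assumptions for illustration, since the stopping rule is not stated in this section.

```python
def has_converged(losses, tol=1e-4):
    """True once the relative change between the last two losses is below tol."""
    if len(losses) < 2:
        return False
    prev, curr = losses[-2], losses[-1]
    # Guard against division-like blow-up when the previous loss is near zero.
    return abs(prev - curr) <= tol * max(abs(prev), 1e-12)
```

An iterative solver would append the objective value after each update and stop as soon as `has_converged` returns `True`.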

Figure 5: The objective function value of CDC. Panels: (a) loss of CDC on ACM; (b) loss of CDC on Pubmed; (c) loss of CDC on YTF-31.

VII Conclusion

In this paper, we propose a simple framework for clustering complex data, which is readily applicable to graph and non-graph, multi-view and single-view data. The developed method has linear complexity and sound theoretical properties. With graph filtering, we integrate deep structural information and learn representations with cluster-ability. In particular, a similarity-preserving regularizer is designed to adaptively generate high-quality anchors, which alleviates the burden and randomness of anchor selection. CDC demonstrates its effectiveness and efficiency with impressive results on 14 complex datasets. Notably, it even exceeds the performance of many complex GNN-based methods. In light of the simplicity of the proposed framework and its effectiveness on various types of data, this work could have a broad impact on the clustering community and has high potential for deployment in real applications. One potential limitation of CDC is that it might not handle high-dimensional data efficiently, since anchor generation has cubic complexity in the sample dimension.

References

  • [1] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern recognition letters, vol. 31, no. 8, pp. 651–666, 2010.
  • [2] J. Zhao, X. Xie, X. Xu, and S. Sun, “Multi-view learning overview: Recent progress and new challenges,” Information Fusion, vol. 38, pp. 43–54, 2017.
  • [3] K. Zhan, F. Nie, J. Wang, and Y. Yang, “Multiview consensus graph clustering,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1261–1270, 2018.
  • [4] X. Yang, C. Deng, Z. Dang, and D. Tao, “Deep multiview collaborative clustering,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
  • [5] Z. Kang, W. Zhou, Z. Zhao, J. Shao, M. Han, and Z. Xu, “Large-scale multi-view subspace clustering in linear time,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 04, 2020, pp. 4412–4419.
  • [6] X. Li, H. Zhang, R. Wang, and F. Nie, “Multiview clustering: A scalable and parameter-free bipartite graph fusion method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 330–344, 2020.
  • [7] S. Liu, S. Wang, P. Zhang, K. Xu, X. Liu, C. Zhang, and F. Gao, “Efficient one-pass multi-view subspace clustering with consensus anchors,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 7, 2022, pp. 7576–7584.
  • [8] M. Sun, P. Zhang, S. Wang, S. Zhou, W. Tu, X. Liu, E. Zhu, and C. Wang, “Scalable multi-view subspace clustering with unified anchors,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 3528–3536.
  • [9] Z. Zhang, L. Liu, F. Shen, H. T. Shen, and L. Shao, “Binary multi-view clustering,” IEEE transactions on pattern analysis and machine intelligence, vol. 41, no. 7, pp. 1774–1782, 2018.
  • [10] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, “Community preserving network embedding,” in Proceedings of the AAAI conference on artificial intelligence, vol. 31, no. 1, 2017.
  • [11] E. Pan and Z. Kang, “Multi-view contrastive graph clustering,” Advances in neural information processing systems, vol. 34, pp. 2148–2159, 2021.
  • [12] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in 5th International Conference on Learning Representations, 2017.
  • [13] X. Zhang, H. Liu, Q. Li, and X. Wu, “Attributed graph clustering via adaptive graph convolution,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, S. Kraus, Ed., 2019, pp. 4327–4333.
  • [14] C. Wang, S. Pan, R. Hu, G. Long, J. Jiang, and C. Zhang, “Attributed graph clustering: A deep attentional embedding approach,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019, pp. 3670–3676.
  • [15] S. Pan, R. Hu, S.-f. Fung, G. Long, J. Jiang, and C. Zhang, “Learning graph embedding with adversarial training methods,” IEEE transactions on cybernetics, vol. 50, no. 6, pp. 2475–2487, 2019.
  • [16] J. Cheng, Q. Wang, Z. Tao, D. Xie, and Q. Gao, “Multi-view attribute graph convolution networks for clustering,” in Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, 2021, pp. 2973–2979.
  • [17] S. Fan, X. Wang, C. Shi, E. Lu, K. Lin, and B. Wang, “One2multi graph autoencoder for multi-view graph clustering,” in Proceedings of The Web Conference 2020, 2020, pp. 3070–3076.
  • [18] F. Devvrit, A. Sinha, I. S. Dhillon, and P. Jain, “S3GC: Scalable self-supervised graph clustering,” in Advances in Neural Information Processing Systems, 2022.
  • [19] Y. Liu, K. Liang, J. Xia, S. Zhou, X. Yang, X. Liu, and S. Z. Li, “Dink-net: Neural clustering on large graphs,” in International Conference on Machine Learning, ICML 2023.   PMLR, 2023.
  • [20] D. J. Trosten, S. Lokse, R. Jenssen, and M. Kampffmeyer, “Reconsidering representation alignment for multi-view clustering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1255–1265.
  • [21] R. Li, C. Zhang, Q. Hu, P. Zhu, and Z. Wang, “Flexible multi-view representation learning for subspace clustering.” pp. 2916–2922, 2019.
  • [22] S. Mitra, M. Hasanuzzaman, and S. Saha, “A unified multi-view clustering algorithm using multi-objective optimization coupled with generative model,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 14, no. 1, pp. 1–31, 2020.
  • [23] X. Peng, Z. Huang, J. Lv, H. Zhu, and J. T. Zhou, “Comic: Multi-view clustering without parameter selection,” in International conference on machine learning.   PMLR, 2019, pp. 5092–5101.
  • [24] Y. Wang, D. Chang, Z. Fu, and Y. Zhao, “Consistent multiple graph embedding for multi-view clustering,” IEEE Transactions on Multimedia, 2021.
  • [25] Y. Lin, Y. Gou, X. Liu, J. Bai, J. Lv, and X. Peng, “Dual contrastive prediction for incomplete multi-view representation learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  • [26] Z. Kang, Z. Lin, X. Zhu, and W. Xu, “Structured graph learning for scalable subspace clustering: From single view to multiview,” IEEE Transactions on Cybernetics, vol. 52, no. 9, pp. 8976–8986, 2022.
  • [27] D. Huang, C.-D. Wang, and J.-H. Lai, “Fast multi-view clustering via ensembles: Towards scalability, superiority, and simplicity,” IEEE Transactions on Knowledge and Data Engineering, 2023.
  • [28] S. Liu, X. Liu, S. Wang, X. Niu, and E. Zhu, “Fast incomplete multi-view clustering with view-independent anchors,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • [29] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 2016, pp. 855–864.
  • [30] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” NIPS Workshop on Bayesian Deep Learning, 2016.
  • [31] K. Hassani and A. H. Khasahmadi, “Contrastive multi-view representation learning on graphs,” in International Conference on Machine Learning.   PMLR, 2020, pp. 4116–4126.
  • [32] S. Thakoor, C. Tallec, M. G. Azar, R. Munos, P. Veličković, and M. Valko, “Bootstrapped representation learning on graphs,” in ICLR 2021 Workshop on Geometrical and Topological Representation Learning, 2021.
  • [33] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Deep Graph Contrastive Representation Learning,” in ICML Workshop on Graph Representation Learning and Beyond, 2020.
  • [34] D. Bo, X. Wang, C. Shi, M. Zhu, E. Lu, and P. Cui, “Structural deep clustering network,” in Proceedings of The Web Conference 2020, 2020, pp. 1400–1410.
  • [35] W. Tu, S. Zhou, X. Liu, X. Guo, Z. Cai, E. Zhu, and J. Cheng, “Deep fusion clustering network,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 11, 2021, pp. 9978–9987.
  • [36] Y. Liu, W. Tu, S. Zhou, X. Liu, L. Song, X. Yang, and E. Zhu, “Deep graph clustering via dual correlation reduction,” in Proc. of AAAI, 2022.
  • [37] Z. Lin and Z. Kang, “Graph filter-based multi-view attributed graph clustering.” in IJCAI, 2021, pp. 2723–2729.
  • [38] M. Hamidouche, C. Lassance, Y. Hu, L. Drumetz, B. Pasdeloup, and V. Gripon, “Improving classification accuracy with graph filtering,” in 2021 IEEE International Conference on Image Processing (ICIP).   IEEE, 2021, pp. 334–338.
  • [39] Z. Kang, W. Zhou, Z. Zhao, J. Shao, M. Han, and Z. Xu, “Large-scale multi-view subspace clustering in linear time,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 04, 2020, pp. 4412–4419.
  • [40] S. Wang, X. Liu, X. Zhu, P. Zhang, Y. Zhang, F. Gao, and E. Zhu, “Fast parameter-free multi-view subspace clustering with consensus anchor guidance,” IEEE Transactions on Image Processing, vol. 31, pp. 556–568, 2021.
  • [41] X. Li, B. Kao, C. Shan, D. Yin, and M. Ester, “CAST: A correlation-based adaptive spectral clustering algorithm on multi-scale data,” in The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.   ACM, 2020, pp. 439–449.
  • [42] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” Advances in neural information processing systems, vol. 30, 2017.
  • [43] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, “Open graph benchmark: Datasets for machine learning on graphs,” Advances in neural information processing systems, vol. 33, pp. 22118–22133, 2020.
  • [44] F. M. Bianchi, D. Grattarola, and C. Alippi, “Spectral clustering with graph neural networks for graph pooling,” in International Conference on Machine Learning.   PMLR, 2020, pp. 874–883.
  • [45] G. Karypis and V. Kumar, “Metis: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices,” 1997.
  • [46] P. Velickovic, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep graph infomax.” ICLR, vol. 2, no. 3, p. 4, 2019.
  • [47] A. Tsitsulin, J. Palowitch, B. Perozzi, and E. Müller, “Graph clustering with graph neural networks,” in Proceedings of the 16th International Workshop on Mining and Learning with Graphs (MLG), 2020.
  • [48] B. Jing, C. Park, and H. Tong, “Hdmi: High-order deep multiplex infomax,” in Proceedings of the Web Conference 2021, 2021, pp. 2414–2424.
  • [49] Y. Lin, Y. Gou, Z. Liu, B. Li, J. Lv, and X. Peng, “Completer: Incomplete multi-view clustering via contrastive prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11174–11183.
  • [50] X. Xie, W. Chen, Z. Kang, and C. Peng, “Contrastive graph clustering with adaptive filter,” Expert Systems with Applications, vol. 219, p. 119645, 2023.
  • [51] H. Pei, B. Wei, K. C. Chang, Y. Lei, and B. Yang, “Geom-gcn: Geometric graph convolutional networks,” in 8th International Conference on Learning Representations, ICLR 2020, 2020.
  • [52] B. Rozemberczki, C. Allen, and R. Sarkar, “Multi-scale attributed node embedding,” Journal of Complex Networks, vol. 9, no. 2, 2021.
  • [53] P. Zhu, J. Li, Y. Wang, B. Xiao, S. Zhao, and Q. Hu, “Collaborative decision-reinforced self-supervision for attributed graph clustering,” IEEE Transactions on Neural Networks and Learning Systems, 2022.