CDC: A Simple Framework for Complex Data Clustering
Z. Kang, X. Xie, B. Li, and E. Pan are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
Abstract
In today’s data-driven digital era, both the amount and the complexity of collected data (e.g., multi-view, non-Euclidean, and multi-relational data) are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are developed independently to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first use graph filtering to fuse geometric structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We demonstrate the clusterability of the proposed method both theoretically and experimentally. In particular, we deploy CDC on a graph with 111M nodes.
Index Terms:
Anchor graph, clustering, large-scale data, topology structure, multi-view learning

I Introduction
Clustering is a fundamental unsupervised learning technique that groups data points into clusters without labels. It is driven by diverse applications in scientific research and industrial development, which give rise to complex data types [1], such as multi-view, non-Euclidean, and multi-relational data. In many real-world applications, data are gathered from multiple sources or by different feature extractors and therefore exhibit different features; such data are dubbed multi-view data [2]. Although each view may be noisy and incomplete, important factors, such as geometry and semantics, tend to be shared across all views. Different views also provide complementary information, so it is paramount for multi-view clustering (MVC) methods to integrate diverse features. For example, [3] learns a consensus graph from multiple views for clustering, with a rank constraint on the corresponding Laplacian matrix. [4] employs intra-view collaborative learning to harvest complementary and consistent information among different views.
Along with the development of sophisticated data collection and storage techniques, the size of data is increasing explosively. To handle large-scale data efficiently, several MVC methods with linear complexity have been proposed. [5] learns an anchor graph for each view and concatenates them for multi-view subspace clustering. [6], [7], and [8] construct bipartite graphs or learn representations based on anchors. [9] integrates the collaborative information from multiple views by jointly learning discrete representations and binary cluster structures. Despite this progress, these methods often perform unstably across datasets because of the randomness in anchor selection.
Recently, non-Euclidean graph data have become pervasive. They contain not only node attributes but also topological structure information, which characterizes the relations among data points [10]. Social network users, for example, have their own profiles as well as social relationships reflected by the topological graph. Traditional clustering methods exploit either attributes or graph structure alone and therefore cannot achieve the best performance [11]. Graph Neural Networks (GNNs) are a powerful tool for exploring node attributes and structure simultaneously [12], and several graph clustering methods have been built on them [13, 14, 15]. In some applications, the graph may have multi-view attributes or multiple relations. To cluster multi-view graphs, [16] learns a representation for each view and forces these representations to be close. To handle multi-relational graphs, [17] finds the most informative graph and uses it to recover the multiple graphs.
Despite the remarkable success of GNN-based methods in graph clustering, one crucial issue, scalability, still prevents their deployment on web-scale graph data. For example, ogbn-papers100M contains more than 100M nodes, which most graph clustering methods cannot process. Although [18, 19] have advanced scalable graph clustering by applying a light-weight encoder and contrastive learning, their performance depends heavily on graph augmentation. Therefore, scalable graph clustering remains under-explored, and more dedicated efforts are urgently needed.
In summary, specialized methods have been developed to address each of the above problems individually, but a unified model for complex data clustering that generalizes well while remaining scalable is still lacking. To fill this gap, we propose a simple yet effective framework for Complex Data Clustering (CDC). We first use graph filtering to fuse raw features and topology information, which produces clustering-friendly representations and provides a flexible way to handle different types of data. Rather than constructing a complete graph over all data points, CDC learns anchor graphs, resulting in linear computational complexity. In particular, we generate anchors adaptively with a similarity-preserving regularizer to alleviate the randomness of anchor selection. To summarize, we make the following contributions:
- We propose a simple clustering framework for complex data, e.g., single-view and multi-view, graph and non-graph, small-scale and large-scale data. Our method has linear time and space complexity.
- (Section III) We are the first to propose a similarity-preserving regularizer that automatically learns high-quality anchors from data.
- (Section IV) We show theoretically that the filtered features encode both node attributes and topological structure, and that the learned anchor graph is clustering-favorable.
- (Section V) CDC achieves impressive performance on 14 complex datasets. Most notably, it scales to a graph with more than 111M nodes.
II Related Work
II-A Multi-view Clustering
MVC methods generally improve performance by exploiting the global consensus and complementary information among multiple views. [20] mines view-shared information by adding a sample-level contrastive module that aligns the angles between representations. [21] uses the Hilbert-Schmidt Independence Criterion (HSIC) to explore the underlying cluster structure shared by multiple views. [22] automatically partitions multi-view data via a multi-objective clustering framework. [23] achieves cross-view consensus by projecting data points into a space where geometric structures and cluster assignments are consistent.
Different from shallow methods, deep MVC methods learn good representations with specially designed neural networks. [24] applies an attention encoder and multi-view mutual information maximization to capture the complementary information, the consistency information, and the internal relations of each view. Recently, some methods have incorporated contrastive learning mechanisms to obtain clustering-favorable representations. For example, [25] performs instance-level and category-level contrastive learning to improve cross-view consistency. However, these methods do not scale to large-scale data. To reduce the complexity, [5, 26] construct bipartite graphs between the cluster centroids of k-means and the raw data points, where the anchors are chosen randomly and then fixed for subsequent learning. [27] leverages features, anchors, and neighbors jointly to construct bipartite graphs. [28] captures view-specific and consistent information by constructing a consensus graph from view-independent anchors. Although they all have linear complexity, their performance may be sub-optimal since the pre-defined anchors are not updated according to the downstream task. In contrast, we generate high-quality anchors adaptively, which is efficient and stable on complex data.

II-B Graph Clustering
Graph clustering methods aim to group nodes based on node attributes and topological structure information. Some representation learning methods, such as Node2vec [29] and GAE [30], can be used to learn embeddings for traditional clustering techniques. However, the obtained embeddings might not be appropriate for clustering because they are not learned specifically for clustering. MVGRL [31], BGRL [32], and GRACE [33] obtain classification-favorable representations via contrastive graph learning, but they are not applicable to large-scale graphs due to the computational cost of data augmentation. Although MCGC [11] is augmentation-free, regarding k-nearest neighbors as positive pairs, the algorithm has quadratic complexity. Other deep graph clustering methods, such as SDCN [34], DFCN [35], and DCRN [36], achieve promising performance by jointly training MLPs and GNNs on small- and medium-scale graphs. MvAGC [37] has low complexity but is not efficient in practice owing to its anchor sampling strategy. Hence, these graph clustering methods cannot handle large-scale graphs both effectively and efficiently. Although SGC [18] obtains promising results on extra-large-scale graphs via a random walk-based sampler and a light-weight encoder, its training is computationally expensive. Our method handles graph clustering in linear time with promising performance.
III Methodology
Notation
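Define the generic data as $\mathcal{G} = \{\mathcal{V}, \mathcal{E}^1, \dots, \mathcal{E}^{V_G}, X^1, \dots, X^{V_X}\}$, where $\mathcal{V}$ represents the set of $N$ nodes and $e_{ij} \in \mathcal{E}^v$ denotes the relationship between node $i$ and node $j$ in the $v$-th view. $V_G$ and $V_X$ are the numbers of relational graphs and attributes, respectively, and the data are non-graph when initially $V_G = 0$. $X^v \in \mathbb{R}^{N \times d_v}$ is the feature matrix, where $d_v$ is the feature dimension. Adjacency matrices $\{A^v\}_{v=1}^{V_G} \in \mathbb{R}^{N \times N}$ characterize the initial graph structure. For non-graph data, we construct an adjacency matrix in each view via the 5-nearest neighbor method. There are $V$ views after graph filtering for each dataset, where $V = V_G \times V_X$ for graph data and $V = V_X$ for non-graph data. $\{D^v\}$ denote the degree matrices of $\{A^v + I\}$ in the various views. The normalized adjacency matrix is $\tilde{A}^v = (D^v)^{-1/2}(A^v + I)(D^v)^{-1/2}$, and the corresponding graph Laplacian is $\tilde{L}^v = I - \tilde{A}^v$.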
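As an illustration of the preprocessing for non-graph data, the sketch below builds a 5-nearest-neighbor graph and its normalized adjacency and Laplacian. It is a minimal sketch under stated assumptions, not the authors' code: it uses scikit-learn's `kneighbors_graph`, symmetrizes the directed kNN graph by keeping an edge if either endpoint selects the other, and adds self-loops before symmetric normalization. The helper name `normalized_knn_graph` is hypothetical, and the paper's exact symmetrization and self-loop convention may differ.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.neighbors import kneighbors_graph


def normalized_knn_graph(X, n_neighbors=5):
    """Return (A_norm, L_norm) for a symmetric kNN graph with self-loops.

    Assumptions (not from the paper): A_hat = A + I, D = diag(A_hat 1),
    A_norm = D^{-1/2} A_hat D^{-1/2}, and L_norm = I - A_norm.
    """
    n = X.shape[0]
    # Directed kNN graph: row i marks the n_neighbors nearest points of x_i.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity",
                         include_self=False)
    # Symmetrize: keep an edge if either endpoint selects the other.
    A = A.maximum(A.T)
    A_hat = (A + sp.identity(n, format="csr")).tocsr()
    deg = np.asarray(A_hat.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    A_norm = (d_inv_sqrt @ A_hat @ d_inv_sqrt).tocsr()
    L_norm = sp.identity(n, format="csr") - A_norm
    return A_norm, L_norm


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # toy non-graph data: 100 samples, 8 features
A_norm, L_norm = normalized_knn_graph(X, n_neighbors=5)
```

Everything stays sparse, so memory grows with the number of edges (about 5N here) rather than with N².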
III-A Graph Filtering
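Filtered features are more clustering-favorable [38], so we apply graph filtering to remove undesirable high-frequency noise while preserving the graph’s geometric features. Similar to [11], the smoothed representation $H^v$ is obtained by solving the following optimization problem:

$$\min_{H^v} \; \|H^v - X^v\|_F^2 + s\,\mathrm{Tr}\big((H^v)^\top \tilde{L}^v H^v\big), \qquad (1)$$

where $s > 0$ balances fidelity to the raw features against smoothness over the graph. Setting the gradient to zero gives $H^v = (I + s\tilde{L}^v)^{-1} X^v$. We keep the first-order Taylor expansion $(I + s\tilde{L}^v)^{-1} \approx I - s\tilde{L}^v$, set $s = 1/2$, and apply $k$-order filtering, which yields:

$$H^v = \Big(I - \frac{\tilde{L}^v}{2}\Big)^{k} X^v, \qquad (2)$$

where $k$ is a non-negative integer that controls the depth of feature aggregation and the smoothness of the representation. In addition to learning smooth features, graph filtering is also used to unify different types of data into our framework.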
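Graph filtering can be applied without forming a dense N × N matrix. The sketch below assumes the low-pass filter has the form H = (I - L̃/2)^k X, with L̃ a sparse normalized Laplacian, and applies the operator k times; each step costs time proportional to the number of nonzeros times the feature dimension. The function name `graph_filter` is hypothetical, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp


def graph_filter(L_norm, X, k=2):
    """Return H = (I - L_norm / 2)^k X using k sparse-dense products.

    L_norm: sparse (N x N) normalized Laplacian; X: dense (N x d) features.
    Each step costs O(nnz(L_norm) * d), i.e. linear in the number of edges.
    """
    H = np.asarray(X, dtype=float)
    for _ in range(k):
        # One application of the low-pass operator (I - L/2) = (I + A_norm)/2.
        H = H - 0.5 * (L_norm @ H)
    return H


# Toy example: a 4-node path graph with self-loops and one-hot features.
A = sp.csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
A_hat = A + sp.identity(4, format="csr")
deg = np.asarray(A_hat.sum(axis=1)).ravel()
D_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
L_norm = sp.identity(4, format="csr") - D_inv_sqrt @ A_hat @ D_inv_sqrt
H = graph_filter(L_norm, np.eye(4), k=2)
```

Repeated products are preferred over computing the matrix power because powers of a sparse matrix quickly become dense.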
III-B Anchor Graph Learning
We use the idea of data self-expression to capture the relations among data points, i.e., each sample can be represented as a linear combination of other data points. The combination coefficient matrix can be regarded as a reconstructed graph [11]. To reduce the computational complexity, representative samples called anchors are selected to construct an anchor graph [39]. However, the performance of this approach is unstable because the anchors are introduced in a probabilistic way. Moreover, once the anchors are chosen, they are not updated, which can lead to sub-optimal performance. To eliminate the uncertainty of anchor selection, we propose to learn anchors from data, i.e., anchors are generated adaptively. To guarantee the quality of the anchors, we enforce that the similarity between the filtered features and the anchors is preserved. We then formalize the graph learning problem as:
(3)
where a balance parameter trades off the terms of the objective. To make the problem easier to solve, we relax it to:
(4)
The relaxed model has two advantages over other anchor-based methods: anchors are generated efficiently, and they are generated adaptively with high quality. First, existing methods often repeat anchor selection many times to reduce the uncertainty of the results, which is time-consuming and unsuitable for large-scale data; learned anchors avoid this repetition. Second, existing methods perform anchor selection and graph learning in two separate steps. By contrast, we follow a joint learning approach, in which anchors and anchor graphs mutually boost each other.
In the multi-view scenario, each view may contribute differently. Therefore, we introduce learnable view weights and obtain a consensus anchor graph by solving the following model:
(5)
Note that we learn anchors for each view to capture view-specific information. In traditional anchor-based methods, once the anchor graph $Z$ is constructed, a full similarity matrix is built from $Z$ (normalized by a diagonal matrix of anchor degrees) and used as input to compute the spectral embedding for clustering. According to [40], the right singular vectors of the normalized anchor graph are the same as the eigenvectors of this similarity matrix. Consequently, we perform singular value decomposition (SVD) on the normalized anchor graph and then run k-means on its right singular vectors to produce the final result, which requires time linear in the number of samples instead of cubic.
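The SVD shortcut described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the anchor graph `Z` is stored as an N × m nonnegative matrix, the diagonal normalizer holds its column sums (anchor degrees), and k-means comes from scikit-learn. With this N × m orientation the relevant vectors are the left singular vectors; with the m × N orientation used in [40] they are the right singular vectors, which is the same computation transposed. The function name `anchor_spectral_clustering` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def anchor_spectral_clustering(Z, n_clusters, seed=0):
    """Cluster N samples given an (N x m) nonnegative anchor graph Z.

    Assumption: the implied N x N similarity is W = Z diag(Z^T 1)^{-1} Z^T.
    Its leading eigenvectors are the leading left singular vectors of
    Z_hat = Z diag(Z^T 1)^{-1/2}, so a thin SVD costing O(N m^2) replaces
    an O(N^3) eigendecomposition of W.
    """
    anchor_degree = Z.sum(axis=0)                # diagonal of the normalizer
    Z_hat = Z / np.sqrt(anchor_degree + 1e-12)   # scales column j by deg_j^{-1/2}
    U, s, _ = np.linalg.svd(Z_hat, full_matrices=False)
    embedding = U[:, :n_clusters]                # singular values sorted descending
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embedding)
    return labels, embedding, s


rng = np.random.default_rng(0)
Z = rng.random((200, 20))
Z /= Z.sum(axis=1, keepdims=True)  # each row sums to 1, like a stochastic anchor graph
labels, emb, s = anchor_spectral_clustering(Z, n_clusters=3)
```

The N × N matrix W is never formed; it appears only in the docstring and in the check that its eigenvectors match the embedding.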
III-C Optimization
To solve Eq. (5), we adopt an alternating optimization strategy.
III-C1 Initialization of the anchors
We can optionally initialize the anchors of each view with cluster centers, obtained by partitioning the filtered features into as many groups as there are anchors with k-means.
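A sketch of this optional k-means initialization, assuming scikit-learn's `KMeans` and one view's filtered features stored row-wise in `H`; the helper `init_anchors_kmeans` and the sizes used here are hypothetical. The returned centers form an (anchors × features) matrix that can seed the alternating updates.

```python
import numpy as np
from sklearn.cluster import KMeans


def init_anchors_kmeans(H, n_anchors, seed=0):
    """Return an (n_anchors x d) anchor matrix: k-means centers of the rows of H."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=seed).fit(H)
    return km.cluster_centers_


rng = np.random.default_rng(1)
H = rng.normal(size=(500, 16))  # stand-in for the filtered features of one view
B0 = init_anchors_kmeans(H, n_anchors=10)
```

For very large N, scikit-learn's `MiniBatchKMeans` exposes the same `cluster_centers_` attribute and is a cheaper drop-in.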
III-C2 Update of the anchor graph
Fixing the anchors and the view weights, we set the derivative of the objective function with respect to the anchor graph to zero and obtain:
(6)
In the single-view scenario, this reduces to a simpler closed-form solution.
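III-C3 Update of the anchors
Fixing the anchor graph and the view weights, Eq. (5) can be rewritten as:
(7)
where the three coefficient matrices do not depend on the anchors. The anchors can then be obtained by solving the resulting Sylvester equation.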
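The anchor step ends in a Sylvester equation. The sketch below does not reproduce the coefficient matrices of Eq. (7); it uses hypothetical placeholders `P`, `Q`, `R` in the generic form P B + B Q = R, with B an (anchors × features) matrix, purely to show how such an equation is solved with SciPy's `solve_sylvester` (Bartels-Stewart algorithm). The solve itself costs time that depends only on the number of anchors and the feature dimension.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
m, d = 10, 16  # number of anchors, feature dimension (illustrative sizes)

# Hypothetical coefficient matrices for P B + B Q = R. Symmetric positive
# definite P and Q guarantee a unique solution (no eigenvalue pair sums to 0).
G1 = rng.normal(size=(m, m))
G2 = rng.normal(size=(d, d))
P = G1 @ G1.T + m * np.eye(m)
Q = G2 @ G2.T + d * np.eye(d)
R = rng.normal(size=(m, d))

# Bartels-Stewart solve; cost is O(m^3 + d^3), independent of the sample count.
B = solve_sylvester(P, Q, R)
```

The cubic dependence on d in this solve mirrors the limitation on high-dimensional data noted in the conclusion.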
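III-C4 Update of the view weights
Fixing the anchor graph and the anchors, and collecting the per-view terms that no longer vary, the problem simplifies to:
(8)
This is a standard quadratic programming problem with a closed-form solution.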
Remark. Each step of the optimization procedure does not increase the objective function value of Eq. (5), so the objective decreases monotonically across iterations. Since the objective function is bounded below (e.g., by zero), the sequence of objective values converges.
III-D Complexity Analysis
The adjacency graph is often sparse in real-world scenarios. Consequently, we implement graph filtering with sparse matrix techniques, whose cost is linear in the number of edges, whereas a general dense matrix multiplication costs time quadratic in the number of nodes. Assuming a fixed total number of iterations, updating the anchor graph takes time linear in the number of nodes: all multiplications and additions are linear in the number of nodes, and the only matrix inverse involves a matrix whose size is set by the number of anchors. It is worth pointing out that anchor generation has constant complexity with respect to the data size, i.e., its cost does not grow with the number of samples. We perform SVD on the normalized anchor graph and run k-means to obtain the clustering result; for a fixed number of k-means iterations and clusters, both steps are linear in the number of nodes. In practice, the number of anchors, the feature dimension, the number of clusters, and the iteration numbers are much smaller than the number of nodes and can be treated as constants, so the proposed method has linear time complexity. Moreover, the largest space cost is linear in the number of nodes, which means our approach has linear space complexity.
We compare our complexity with those of the baselines in Table I, where iteration numbers are omitted. For SGC, the complexity depends on the average degree of the graph. For FastMICE, it depends on the number of view groups and the number of nearest neighbors for each view group, where a view group is a group of multiple randomly selected views. The batch size also appears for the batch-trained methods, and the remaining symbols follow the notation of CDC above. It can be seen that our method has clear advantages; its only bottleneck is the feature dimension. Moreover, for high-dimensional data, dimension reduction techniques can be applied first.
TABLE I: Brief time and space complexity analysis of recent SOTA methods.
| Category | Methods |
|---|---|
| Single-view | MVGRL, SGC |
| Multi-view | MCGC, MvAGC |
| Non-graph | EOMSC-CA, FastMICE |
| Proposed | CDC |

IV Theoretical Analysis
We establish theoretical support for our method: 1) the filtered features encode both node attributes and topological structure; 2) the learned anchor graph is clustering-favorable.
Definition IV.1 (Grouping effect [41]).
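Given two nodes $i$ and $j$ that are similar in terms of local topology and node features, i.e., $\|\tilde{a}_i - \tilde{a}_j\|_2 \to 0$ and $\|x_i - x_j\|_2 \to 0$, where $\tilde{a}_i$ is the $i$-th row of $\tilde{A}$ and $x_i$ is the $i$-th row of $X$, a matrix $Z$ is said to have a grouping effect if $\|z_i - z_j\|_2 \to 0$, where $z_i$ is the $i$-th row of $Z$.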
Theorem IV.2.
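Define the distance between filtered nodes $i$ and $j$ as $\|h_i - h_j\|_2$, where $h_i$ is the $i$-th row of $H = (I - \tilde{L}/2)^k X$. If $\|\tilde{a}_i - \tilde{a}_j\|_2 \to 0$ and $\|x_i - x_j\|_2 \to 0$, then $\|h_i - h_j\|_2 \to 0$, i.e., the filtered features preserve both topology and attribute similarity.
Proof.
Note that $\tilde{L} = I - \tilde{A}$, so $I - \tilde{L}/2 = (I + \tilde{A})/2$. Then $H = \big((I + \tilde{A})/2\big)^k X$. Expanding it binomially gives:
$$H = \frac{1}{2^k}\sum_{r=0}^{k} \binom{k}{r} \tilde{A}^{r} X.$$
Then compute the distance between nodes $i$ and $j$. For $r \ge 1$, the $i$-th row of $\tilde{A}^r$ equals $\tilde{a}_i \tilde{A}^{r-1}$, so, with $\|\cdot\|_2$ denoting the spectral norm for matrices,
$$\|h_i - h_j\|_2 \le \frac{1}{2^k}\|x_i - x_j\|_2 + \frac{1}{2^k}\sum_{r=1}^{k}\binom{k}{r}\,\|\tilde{a}_i - \tilde{a}_j\|_2\,\big\|\tilde{A}^{r-1}X\big\|_2. \qquad (9)$$
So, if $\|\tilde{a}_i - \tilde{a}_j\|_2 \to 0$ and $\|x_i - x_j\|_2 \to 0$, then $\|h_i - h_j\|_2 \to 0$. However, when nodes are similar in only one space, i.e., either $\|\tilde{a}_i - \tilde{a}_j\|_2 \to 0$ or $\|x_i - x_j\|_2 \to 0$, $\|h_i - h_j\|_2$ has a non-zero upper bound unless $k$ is large enough. This indicates that the filtered representations of nodes that are similar in both the attribute and topology spaces get closer, and the graph filtering order adjusts this bias.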
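∎
Theorem IV.3.
The learned anchor graph can be written as $Z = HM$, where $M$ is a constant matrix. If $\|\tilde{a}_i - \tilde{a}_j\|_2 \to 0$ and $\|x_i - x_j\|_2 \to 0$, then $\|z_i - z_j\|_2 \to 0$, i.e., the learned anchor graph has a grouping effect.
Proof.
Since $z_i = h_i M$ for every row $i$, we have $\|z_i - z_j\|_2 = \|(h_i - h_j)M\|_2 \le \|h_i - h_j\|_2\,\|M\|_2$.
Remarking that $\|h_i - h_j\|_2 \to 0$ by Theorem IV.2, we obtain $\|z_i - z_j\|_2 \to 0$. ∎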
This indicates that the local structures of similar nodes tend to be identical in the learned graph $Z$, which leads the corresponding nodes to be clustered into the same group. In other words, the learned graph is clustering-friendly. To demonstrate the grouping effect of the anchor graph intuitively, we plot five diagrams of $Z$ in Fig. 1.
Figure 1: Visualizations of the learned anchor graph $Z$; $Z$ on ACM has a stronger grouping effect than the one on Pubmed.

V Experiments
V-A Datasets and Metrics
TABLE II: Statistics of the datasets.
| Type | Dataset | Samples | Edges / Dims | Clusters |
|---|---|---|---|---|
| Graph, single-view | Citeseer | 3327 | 4614 / 3703 | 6 |
| Graph, single-view | Pubmed | 19717 | 44325 / 500 | 3 |
| Graph, multi-relational | ACM | 3025 | 29281, 2210761 / 1830 | 3 |
| Graph, multi-relational | DBLP | 4057 | 11113, 5000495, 6776335 / 334 | 4 |
| Graph, multi-attribute | AMAP | 7487 | 119043 / 745, 7487 | 8 |
| Graph, multi-attribute | AMAC | 13381 | 245778 / 767, 13381 | 10 |
| Graph, extra-/large-scale | Products | 2449029 | 61859140 / 100 | 47 |
| Graph, extra-/large-scale | Papers100M | 111059956 | 1615685872 / 128 | 172 |
| Non-graph, large-scale multi-view | YTF-31 | 101499 | 507495 / 64, 512, 64, 647, 838 | 31 |
| Non-graph, large-scale multi-view | YTF-400 | 398191 | 1990955 / 944, 576, 512, 640 | 400 |

To show the effectiveness and efficiency of CDC, we evaluate it on 10 benchmark datasets, including 6 multi-view and 4 single-view datasets. More specifically, ACM and DBLP [17] are multi-view graphs with multiple relations; AMAP and AMAC [37] are multi-attribute graphs; YTF-31 [7] and YTF-400 [27] are multi-view non-graph data (YouTube Faces); and Citeseer, Pubmed [12], Products, and Papers100M [42] are single-view graphs, where the latter two are from the Open Graph Benchmark [43]. The statistics of these datasets are shown in Table II. Most notably, YTF-400 is the largest multi-view non-graph dataset, while Papers100M is the largest graph used in the clustering task. We adopt four popular clustering metrics: accuracy (ACC), normalized mutual information (NMI), F1 score, and adjusted Rand index (ARI). Higher values indicate better performance.
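ACC requires matching predicted clusters to ground-truth classes, since cluster indices are arbitrary. A common way to compute it, sketched below under the assumption that labels are non-negative integers, is the Hungarian algorithm via SciPy's `linear_sum_assignment`; NMI and ARI come directly from scikit-learn. The helper name `clustering_accuracy` is hypothetical, the paper does not state its exact evaluation code, and F1 is omitted because its conventions (e.g., macro-F1 after the same matching) vary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one mapping of clusters to classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    # count[i, j] = number of samples in predicted cluster i with true class j.
    count = np.zeros((n, n), dtype=np.int64)
    np.add.at(count, (y_pred, y_true), 1)
    # Hungarian algorithm on negated counts maximizes the matched samples.
    rows, cols = linear_sum_assignment(-count)
    return count[rows, cols].sum() / y_true.size


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]  # same partition, permuted labels
acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)
ari = adjusted_rand_score(y_true, y_pred)
```

All three metrics equal 1 here because the two labelings describe the same partition.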
V-B Experimental Setup
We compare CDC with a number of single-view methods as well as multi-view methods.
Single-view graph. Baselines include MinCutPool [44], METIS [45], Node2vec [29], DGI [46], DMoN [47], GRACE [33], BGRL [32], MVGRL [31], and SGC [18]. METIS uses only structural information to partition graphs. Node2vec is a well-known graph embedding algorithm based on random walks. MinCutPool and DMoN integrate spectral clustering with graph neural networks. DGI learns node representations by maximizing the mutual information between patch representations and the corresponding high-level summaries of graphs. GRACE, BGRL, and MVGRL are contrastive graph representation learning methods. SGC is a recent scalable graph clustering method that uses a light-weight encoder and a random walk-based sampler.
Multi-view graph. There are 10 baselines for multi-view graph clustering: SDCN [34], DAEGC [14], O2MAC [17], HDMI [48], CMGEC [24], COMPLETER [49], MvAGC [37], MCGC [11], MVGRL [31], and MAGCN [16]. The first six methods are only applicable to data with multiple graphs or multiple attributes, whereas the last four are applicable to general multi-view graph data.
SDCN and DAEGC use GCNs and graph attention auto-encoders, respectively. To obtain consistent embeddings, CMGEC adds a graph fusion network to multiple graph auto-encoders. In O2MAC, the most informative view is selected to learn the cluster representation. HDMI learns node embeddings using high-order mutual information. MAGCN applies graph auto-encoders to both attributes and topological graphs to learn consensus representations. Through contrastive mechanisms, COMPLETER and MVGRL learn a common representation shared across multiple views and graphs. MCGC uses a contrastive regularizer to boost the quality of the learned graph. MvAGC explores high-order topological interactions to improve clustering performance.
Non-graph. We compare CDC with six scalable MVC methods on non-graph data: BMVC [9], LMVSC [5], MSGL [26], FPMVS [40], EOMSC-CA [7], and FastMICE [27].
BMVC jointly learns discrete representations and binary cluster structures to integrate collaborative information. MSGL and LMVSC are two scalable subspace clustering methods. FPMVS and EOMSC-CA are two adaptive anchor-based algorithms. CDC differs from them in two respects: 1) CDC uses a similarity-preserving regularizer, whereas FPMVS and EOMSC-CA assume the anchor matrices to be unitary; 2) the complexity of anchor generation in CDC does not depend on the data size. FastMICE constructs anchor graphs by jointly using features, anchors, and neighbors.
Parameter setting. The effects of the two balance parameters and of the number of anchors are studied in Section VI and Fig. 3, respectively. All experiments are conducted on the same machine with an Intel(R) Core(TM) i9-12900K CPU, two GeForce RTX 3090 GPUs, and 128 GB RAM.
V-C Results
TABLE III: Results on the extra-large-scale graph Papers100M.
| Metric | k-means | Node2vec | DGI | SGC | CDC |
|---|---|---|---|---|---|
| ACC | 0.144 | 0.175 | 0.151 | 0.173 | 0.174 |
| NMI | 0.368 | 0.380 | 0.416 | 0.453 | 0.427 |
| ARI | 0.074 | 0.112 | 0.096 | 0.110 | 0.114 |
| F1 | 0.101 | 0.099 | 0.111 | 0.118 | 0.119 |

V-C1 Single-view Scenario
The results on the small-scale graph Citeseer, the medium-scale graph Pubmed, the large-scale graph Products, and the extra-large-scale graph Papers100M are shown in Table III and Table IV. Note that most neural network-based methods cannot handle large and extra-large-scale graphs. On Citeseer and Pubmed, our method achieves the best results in most metrics (on Citeseer, MVGRL attains a higher NMI and ties CDC on ARI). On Products and Papers100M, our method produces competitive results: it is second only to SGC in ACC on Products, and on Papers100M it is on par with SGC, slightly ahead in ACC, ARI, and F1 but behind in NMI. In particular, on Pubmed, CDC surpasses the recent SGC method by 2.8 to 3.8 percentage points in all metrics. Furthermore, CDC has a lower time cost than SGC: it takes 5 min and 4 h on Products and Papers100M, respectively, while SGC takes 1 h and 24 h, which demonstrates CDC's efficiency on large and extra-large-scale graphs. Compared with many GNN-based methods, such as MinCutPool, DGI, DMoN, GRACE, BGRL, and MVGRL, CDC shows a clear advantage.
TABLE IV: Results on single-view graphs. "-" denotes that the method ran out of memory (OM) or did not converge.
| Method | Citeseer ACC | Citeseer NMI | Citeseer ARI | Citeseer F1 | Pubmed ACC | Pubmed NMI | Pubmed ARI | Pubmed F1 | Products ACC | Products NMI | Products ARI | Products F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MinCutPool | 0.537 | 0.295 | 0.262 | 0.516 | 0.521 | 0.214 | 0.175 | 0.445 | 0.257 | 0.430 | 0.180 | 0.130 |
| METIS | 0.413 | 0.170 | 0.150 | 0.400 | 0.693 | 0.297 | 0.323 | 0.682 | 0.294 | 0.468 | 0.220 | 0.145 |
| Node2vec | 0.421 | 0.240 | 0.116 | 0.401 | 0.641 | 0.288 | 0.258 | 0.634 | 0.357 | 0.489 | 0.247 | 0.170 |
| DGI | 0.686 | 0.435 | 0.445 | 0.643 | 0.657 | 0.322 | 0.292 | 0.654 | 0.320 | 0.467 | 0.192 | 0.174 |
| DMoN | 0.385 | 0.303 | 0.200 | 0.437 | 0.351 | 0.257 | 0.108 | 0.343 | 0.304 | 0.428 | 0.210 | 0.139 |
| GRACE | 0.631 | 0.399 | 0.377 | 0.603 | 0.637 | 0.308 | 0.276 | 0.628 | - | - | - | - |
| BGRL | 0.675 | 0.422 | 0.428 | 0.631 | 0.654 | 0.315 | 0.285 | 0.649 | - | - | - | - |
| MVGRL | 0.703 | 0.459 | 0.471 | 0.654 | 0.675 | 0.345 | 0.310 | 0.672 | - | - | - | - |
| SGC | 0.688 | 0.441 | 0.448 | 0.643 | 0.713 | 0.333 | 0.345 | 0.703 | 0.402 | 0.536 | 0.25 | 0.23 |
| CDC | 0.709 | 0.444 | 0.471 | 0.661 | 0.741 | 0.371 | 0.383 | 0.737 | 0.366 | 0.390 | 0.121 | 0.187 |

V-C2 Multi-view Scenario
Clustering on Multi-view Graphs
CDC's clustering results on multi-relational and multi-attribute graphs are reported in Table V and Table VI, respectively. CDC outperforms all other methods on all four benchmarks in every metric, by large margins on ACM, AMAP, and AMAC. For example, compared with the second-best method MCGC, ACC, NMI, and ARI on ACM, AMAP, and AMAC are improved by about 5, 7, and 9 percentage points on average, respectively. Although MvAGC samples nodes as anchors, it takes more time than CDC because its sampling strategy is inefficient; CDC is considerably faster than MvAGC on both the multi-relational and the multi-attribute graphs. The speed advantage over the other methods is even larger. Therefore, CDC is a promising clustering method for graph data of various forms.
TABLE V: Results on the multi-relational graphs.
| Method | ACM ACC | ACM NMI | ACM ARI | ACM F1 | DBLP ACC | DBLP NMI | DBLP ARI | DBLP F1 |
|---|---|---|---|---|---|---|---|---|
| SDCN | 0.863 | 0.578 | 0.639 | 0.862 | 0.650 | 0.298 | 0.310 | 0.638 |
| DAEGC | 0.891 | 0.643 | 0.705 | 0.891 | 0.873 | 0.674 | 0.701 | 0.862 |
| O2MAC | 0.904 | 0.692 | 0.739 | 0.905 | 0.907 | 0.729 | 0.778 | 0.901 |
| HDMI | 0.874 | 0.645 | 0.674 | 0.872 | 0.885 | 0.692 | 0.753 | 0.865 |
| CMGEC | 0.909 | 0.691 | 0.723 | 0.907 | 0.910 | 0.724 | 0.786 | 0.904 |
| MvAGC | 0.898 | 0.674 | 0.721 | 0.899 | 0.928 | 0.773 | 0.828 | 0.923 |
| MCGC | 0.915 | 0.713 | 0.763 | 0.916 | 0.930 | 0.775 | 0.830 | 0.925 |
| CDC | 0.936 | 0.769 | 0.817 | 0.936 | 0.933 | 0.781 | 0.836 | 0.929 |

TABLE VI: Results on the multi-attribute graphs.
| Method | AMAP ACC | AMAP NMI | AMAP ARI | AMAP F1 | AMAC ACC | AMAC NMI | AMAC ARI | AMAC F1 |
|---|---|---|---|---|---|---|---|---|
| COMPLETER | 0.368 | 0.261 | 0.076 | 0.307 | 0.242 | 0.156 | 0.054 | 0.160 |
| MVGRL | 0.505 | 0.433 | 0.238 | 0.460 | 0.245 | 0.101 | 0.055 | 0.171 |
| MAGCN | 0.517 | 0.390 | 0.240 | 0.474 | - | - | - | - |
| MvAGC | 0.678 | 0.524 | 0.397 | 0.640 | 0.580 | 0.396 | 0.322 | 0.412 |
| MCGC | 0.716 | 0.615 | 0.432 | 0.686 | 0.597 | 0.532 | 0.390 | 0.520 |
| CDC | 0.795 | 0.707 | 0.620 | 0.730 | 0.647 | 0.604 | 0.437 | 0.546 |

Clustering on Multi-view Non-graph Data
TABLE VII: Results on large-scale multi-view non-graph data.
| Method | YTF-31 ACC | YTF-31 NMI | YTF-31 F1 | YTF-400 ACC | YTF-400 NMI | YTF-400 F1 |
|---|---|---|---|---|---|---|
| BMVC | 0.090 | 0.059 | 0.058 | - | - | - |
| LMVSC | 0.140 | 0.118 | 0.083 | 0.489 | 0.767 | 0.589 |
| MSGL | 0.167 | 0.001 | 0.151 | 0.502 | 0.738 | 0.606 |
| FPMVS | 0.230 | 0.234 | 0.140 | 0.562 | 0.797 | 0.472 |
| EOMSC-CA | 0.265 | 0.003 | 0.164 | 0.570 | 0.779 | 0.408 |
| FastMICE | 0.275 | 0.236 | 0.295 | 0.564 | 0.798 | 0.509 |
| CDC | 0.285 | 0.260 | 0.298 | 0.571 | 0.745 | 0.591 |

To process non-graph data, we construct 5-nearest neighbor graphs for graph filtering. Table VII shows the clustering results on YTF-31 and YTF-400. Some existing methods, such as BMVC, cannot handle YTF-400, the largest multi-view non-graph dataset. CDC still achieves the best results in most cases. Although some other methods also use anchors, their computational cost remains high. Specifically, CDC takes 20 s and 1 min on YTF-31 and YTF-400, respectively, while EOMSC-CA needs 2 min and 6 min and FastMICE takes 30 s and 3 min on these two datasets. This verifies that CDC is also a promising clustering method for non-graph data. The time cost of several recent SOTA methods is summarized in Fig. 2.
Figure 2: Run time of existing SOTA methods on various datasets.

V-D Ablation Study
V-D1 Effect of Similarity-Preserving
Anchors are generated adaptively in the similarity space constrained by a similarity-preserving (SP) regularizer. To show the effect of SP clearly, we remove it from the model and test the resulting CDC w/o SP on Pubmed and ACM in Table VIII. Similarity preservation clearly improves the clustering performance, by about 3 percentage points on average. Moreover, CDC takes less time than CDC w/o SP on both datasets. The reason is that, without SP, updating the anchors has a higher computational complexity than in CDC, where anchor generation does not depend on the data size. Therefore, as a bonus, the SP regularizer reduces the complexity of anchor generation, in addition to improving the quality of the anchors. As observed in Fig. 3, CDC achieves the best results with only a few anchors, which further reduces the computational cost. In fact, too many anchors can deteriorate the performance, since noisy, non-representative anchors may be introduced.
TABLE VIII: Results of CDC with and without GF and SP.
| Method | Pubmed ACC | Pubmed NMI | Pubmed F1 | Pubmed Time (s) | ACM ACC | ACM NMI | ACM F1 | ACM Time (s) |
|---|---|---|---|---|---|---|---|---|
| CDC | 0.741 | 0.371 | 0.737 | 2.03 | 0.936 | 0.769 | 0.936 | 0.81 |
| w/o SP | 0.707 (-0.034) | 0.349 (-0.022) | 0.704 (-0.033) | 5.91 (+3.88) | 0.918 (-0.018) | 0.710 (-0.059) | 0.919 (-0.017) | 1.03 (+0.22) |
| w/o GF | 0.626 (-0.115) | 0.256 (-0.115) | 0.639 (-0.098) | 2.86 (+0.83) | 0.872 (-0.064) | 0.585 (-0.184) | 0.871 (-0.055) | 0.88 (+0.07) |

Figure 3: Results on (a) ACM and (b) Pubmed with different numbers of anchors.

V-D2 Effect of Graph Filtering
Graph filtering (GF) is applied to integrate node attributes and topology in our method. Besides showing theoretically that the anchor graph learned from filtered representations is clustering-favorable, we also verify this experimentally in Table VIII. The performance of CDC w/o GF drops by about 10 percentage points on average, which validates the importance of graph filtering. We also observe an increase in run time, which may be caused by slower convergence due to the loss of clusterability.
V-D3 Robustness on Heterophily
TABLE IX: Results on heterophilic graphs.
| Method | Texas ACC | Texas NMI | Cornell ACC | Cornell NMI | Wisconsin ACC | Wisconsin NMI | Squirrel ACC | Squirrel NMI |
|---|---|---|---|---|---|---|---|---|
| CDRS | 0.599 | 0.154 | - | - | 0.562 | 0.137 | - | - |
| CGC | 0.615 | 0.215 | 0.446 | 0.141 | 0.559 | 0.230 | 0.272 | 0.030 |
| CDC | 0.672 | 0.293 | 0.514 | 0.142 | 0.637 | 0.318 | 0.279 | 0.043 |

In some real-world applications, graphs can be heterophilic, i.e., connected nodes tend to have different labels [50]. To show the robustness of CDC under heterophily, we report results on several popular heterophilic graphs: Texas, Cornell, Wisconsin [51], and Squirrel [52]. As shown in Table IX, CDC outperforms the recent SOTA methods CDRS [53] and CGC [50] on all four graphs in both metrics. Although the low-pass filter we use is considered less suitable for heterophily, CDC still works well because of its high-quality anchors and clustering-friendly graph. In fact, few graph clustering methods target heterophily, and future work, such as an omnipotent filter, could contribute substantially to clustering heterophilic graphs.
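One common way to quantify how heterophilic a graph is, is the edge homophily ratio: the fraction of edges whose two endpoints share a label. The paper does not state which measure it relies on, so the sketch below is only an illustration of the notion; the function name `edge_homophily` is hypothetical.

```python
import numpy as np


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label.

    edges: (E x 2) integer array of node index pairs; labels: length-N array.
    Values near 1 indicate homophily; low values indicate heterophily.
    """
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return same.mean()


# A 4-cycle 0-1-2-3-0 with alternating labels: every edge is heterophilic.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
h = edge_homophily(edges, [0, 1, 0, 1])  # -> 0.0
```

Under this measure, a low-pass filter averages features across edges that mostly connect different classes, which is why such filters are usually considered less suitable for heterophilic graphs.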
VI Parameter Analysis
There are two trade-off parameters in our model. As shown in Figure 4, although CDC works well over a wide range of values for both parameters, fine-tuning does enhance its performance. The parameter weighting the similarity-preserving regularizer has the larger impact, which indicates that this regularizer is more important. Since CDC has linear complexity, fine-tuning takes little time.
Figure 4: Accuracy on (a) ACM and (b) Pubmed with different values of the two trade-off parameters.
We also plot the objective function value of CDC on ACM, Pubmed, and YTF-31 in Fig. 5, which shows that the objective converges quickly.
Figure 5: The objective function value of CDC on (a) ACM, (b) Pubmed, and (c) YTF-31.

VII Conclusion
In this paper, we propose a simple framework for clustering complex data, which is readily applicable to graph and non-graph, multi-view and single-view data. The developed method has linear complexity and favorable theoretical properties. With graph filtering, we integrate deep structural information and learn clustering-friendly representations. In particular, a similarity-preserving regularizer is designed to adaptively generate high-quality anchors, which alleviates the burden and randomness of anchor selection. CDC demonstrates its effectiveness and efficiency with impressive results on 14 complex datasets; it even exceeds the performance of many complex GNN-based methods. Given the simplicity of the proposed framework and its effectiveness on various types of data, this work could have a broad impact on the clustering community and has high potential for deployment in real applications. One potential limitation of CDC is that it may not handle high-dimensional data efficiently, since anchor generation has cubic complexity in the feature dimension.
References
- [1] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
- [2] J. Zhao, X. Xie, X. Xu, and S. Sun, “Multi-view learning overview: Recent progress and new challenges,” Information Fusion, vol. 38, pp. 43–54, 2017.
- [3] K. Zhan, F. Nie, J. Wang, and Y. Yang, “Multiview consensus graph clustering,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1261–1270, 2018.
- [4] X. Yang, C. Deng, Z. Dang, and D. Tao, “Deep multiview collaborative clustering,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
- [5] Z. Kang, W. Zhou, Z. Zhao, J. Shao, M. Han, and Z. Xu, “Large-scale multi-view subspace clustering in linear time,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 4412–4419.
- [6] X. Li, H. Zhang, R. Wang, and F. Nie, “Multiview clustering: A scalable and parameter-free bipartite graph fusion method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 330–344, 2020.
- [7] S. Liu, S. Wang, P. Zhang, K. Xu, X. Liu, C. Zhang, and F. Gao, “Efficient one-pass multi-view subspace clustering with consensus anchors,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 7, 2022, pp. 7576–7584.
- [8] M. Sun, P. Zhang, S. Wang, S. Zhou, W. Tu, X. Liu, E. Zhu, and C. Wang, “Scalable multi-view subspace clustering with unified anchors,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 3528–3536.
- [9] Z. Zhang, L. Liu, F. Shen, H. T. Shen, and L. Shao, “Binary multi-view clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1774–1782, 2018.
- [10] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, “Community preserving network embedding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
- [11] E. Pan and Z. Kang, “Multi-view contrastive graph clustering,” Advances in Neural Information Processing Systems, vol. 34, pp. 2148–2159, 2021.
- [12] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in 5th International Conference on Learning Representations, 2017.
- [13] X. Zhang, H. Liu, Q. Li, and X. Wu, “Attributed graph clustering via adaptive graph convolution,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, S. Kraus, Ed., 2019, pp. 4327–4333.
- [14] C. Wang, S. Pan, R. Hu, G. Long, J. Jiang, and C. Zhang, “Attributed graph clustering: A deep attentional embedding approach,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019, pp. 3670–3676.
- [15] S. Pan, R. Hu, S.-f. Fung, G. Long, J. Jiang, and C. Zhang, “Learning graph embedding with adversarial training methods,” IEEE Transactions on Cybernetics, vol. 50, no. 6, pp. 2475–2487, 2019.
- [16] J. Cheng, Q. Wang, Z. Tao, D. Xie, and Q. Gao, “Multi-view attribute graph convolution networks for clustering,” in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2021, pp. 2973–2979.
- [17] S. Fan, X. Wang, C. Shi, E. Lu, K. Lin, and B. Wang, “One2multi graph autoencoder for multi-view graph clustering,” in Proceedings of The Web Conference 2020, 2020, pp. 3070–3076.
- [18] F. Devvrit, A. Sinha, I. S. Dhillon, and P. Jain, “S3GC: Scalable self-supervised graph clustering,” in Advances in Neural Information Processing Systems, 2022.
- [19] Y. Liu, K. Liang, J. Xia, S. Zhou, X. Yang, X. Liu, and S. Z. Li, “Dink-Net: Neural clustering on large graphs,” in International Conference on Machine Learning, ICML 2023. PMLR, 2023.
- [20] D. J. Trosten, S. Lokse, R. Jenssen, and M. Kampffmeyer, “Reconsidering representation alignment for multi-view clustering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1255–1265.
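- [21] R. Li, C. Zhang, Q. Hu, P. Zhu, and Z. Wang, “Flexible multi-view representation learning for subspace clustering,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019, pp. 2916–2922.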
- [22] S. Mitra, M. Hasanuzzaman, and S. Saha, “A unified multi-view clustering algorithm using multi-objective optimization coupled with generative model,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 14, no. 1, pp. 1–31, 2020.
- [23] X. Peng, Z. Huang, J. Lv, H. Zhu, and J. T. Zhou, “COMIC: Multi-view clustering without parameter selection,” in International Conference on Machine Learning. PMLR, 2019, pp. 5092–5101.
- [24] Y. Wang, D. Chang, Z. Fu, and Y. Zhao, “Consistent multiple graph embedding for multi-view clustering,” IEEE Transactions on Multimedia, 2021.
- [25] Y. Lin, Y. Gou, X. Liu, J. Bai, J. Lv, and X. Peng, “Dual contrastive prediction for incomplete multi-view representation learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
- [26] Z. Kang, Z. Lin, X. Zhu, and W. Xu, “Structured graph learning for scalable subspace clustering: From single view to multiview,” IEEE Transactions on Cybernetics, vol. 52, no. 9, pp. 8976–8986, 2022.
- [27] D. Huang, C.-D. Wang, and J.-H. Lai, “Fast multi-view clustering via ensembles: Towards scalability, superiority, and simplicity,” IEEE Transactions on Knowledge and Data Engineering, 2023.
- [28] S. Liu, X. Liu, S. Wang, X. Niu, and E. Zhu, “Fast incomplete multi-view clustering with view-independent anchors,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- [29] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
- [30] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” NIPS Workshop on Bayesian Deep Learning, 2016.
- [31] K. Hassani and A. H. Khasahmadi, “Contrastive multi-view representation learning on graphs,” in International Conference on Machine Learning. PMLR, 2020, pp. 4116–4126.
- [32] S. Thakoor, C. Tallec, M. G. Azar, R. Munos, P. Veličković, and M. Valko, “Bootstrapped representation learning on graphs,” in ICLR 2021 Workshop on Geometrical and Topological Representation Learning, 2021.
- [33] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Deep graph contrastive representation learning,” in ICML Workshop on Graph Representation Learning and Beyond, 2020.
- [34] D. Bo, X. Wang, C. Shi, M. Zhu, E. Lu, and P. Cui, “Structural deep clustering network,” in Proceedings of The Web Conference 2020, 2020, pp. 1400–1410.
- [35] W. Tu, S. Zhou, X. Liu, X. Guo, Z. Cai, E. Zhu, and J. Cheng, “Deep fusion clustering network,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 11, 2021, pp. 9978–9987.
- [36] Y. Liu, W. Tu, S. Zhou, X. Liu, L. Song, X. Yang, and E. Zhu, “Deep graph clustering via dual correlation reduction,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
- [37] Z. Lin and Z. Kang, “Graph filter-based multi-view attributed graph clustering,” in IJCAI, 2021, pp. 2723–2729.
- [38] M. Hamidouche, C. Lassance, Y. Hu, L. Drumetz, B. Pasdeloup, and V. Gripon, “Improving classification accuracy with graph filtering,” in 2021 IEEE International Conference on Image Processing (ICIP). IEEE, 2021, pp. 334–338.
- [39] Z. Kang, W. Zhou, Z. Zhao, J. Shao, M. Han, and Z. Xu, “Large-scale multi-view subspace clustering in linear time,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 4412–4419.
- [40] S. Wang, X. Liu, X. Zhu, P. Zhang, Y. Zhang, F. Gao, and E. Zhu, “Fast parameter-free multi-view subspace clustering with consensus anchor guidance,” IEEE Transactions on Image Processing, vol. 31, pp. 556–568, 2021.
- [41] X. Li, B. Kao, C. Shan, D. Yin, and M. Ester, “CAST: A correlation-based adaptive spectral clustering algorithm on multi-scale data,” in The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2020, pp. 439–449.
- [42] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” Advances in Neural Information Processing Systems, vol. 30, 2017.
- [43] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, “Open graph benchmark: Datasets for machine learning on graphs,” Advances in Neural Information Processing Systems, vol. 33, pp. 22118–22133, 2020.
- [44] F. M. Bianchi, D. Grattarola, and C. Alippi, “Spectral clustering with graph neural networks for graph pooling,” in International Conference on Machine Learning. PMLR, 2020, pp. 874–883.
- [45] G. Karypis and V. Kumar, “METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices,” 1997.
- [46] P. Velickovic, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep graph infomax,” in International Conference on Learning Representations, 2019.
- [47] A. Tsitsulin, J. Palowitch, B. Perozzi, and E. Müller, “Graph clustering with graph neural networks,” in Proceedings of the 16th International Workshop on Mining and Learning with Graphs (MLG), 2020.
- [48] B. Jing, C. Park, and H. Tong, “HDMI: High-order deep multiplex infomax,” in Proceedings of the Web Conference 2021, 2021, pp. 2414–2424.
- [49] Y. Lin, Y. Gou, Z. Liu, B. Li, J. Lv, and X. Peng, “COMPLETER: Incomplete multi-view clustering via contrastive prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11174–11183.
- [50] X. Xie, W. Chen, Z. Kang, and C. Peng, “Contrastive graph clustering with adaptive filter,” Expert Systems with Applications, vol. 219, p. 119645, 2023.
- [51] H. Pei, B. Wei, K. C. Chang, Y. Lei, and B. Yang, “Geom-GCN: Geometric graph convolutional networks,” in 8th International Conference on Learning Representations, ICLR 2020, 2020.
- [52] B. Rozemberczki, C. Allen, and R. Sarkar, “Multi-scale attributed node embedding,” Journal of Complex Networks, vol. 9, no. 2, 2021.
- [53] P. Zhu, J. Li, Y. Wang, B. Xiao, S. Zhao, and Q. Hu, “Collaborative decision-reinforced self-supervision for attributed graph clustering,” IEEE Transactions on Neural Networks and Learning Systems, 2022.