HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection

Yali Fu1    **dong Li1    Jiahong Liu2   Qianli Xing1    Qi Wang1,3     Irwin King2
1Jilin University,  2The Chinese University of Hong Kong,
3Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Ministry of Education, China
{fuyl23, jdli21}@mails.jlu.edu.cn, {qianlixing, qiwang}@jlu.edu.cn,
 [email protected], [email protected]
 Equal Contribution. Corresponding Author.
Abstract

Unsupervised graph-level anomaly detection (UGAD) has attracted increasing attention in recent years. However, most existing methods rely solely on traditional graph neural networks to explore pairwise relationships, and pairwise edges alone are insufficient to describe the multifaceted relationships involved in anomalies. There is an urgent need to exploit node group information, which plays a crucial role in UGAD. In addition, most previous works ignore global underlying properties (e.g., hierarchy and power-law structure) that are common in real-world graph datasets and are therefore indispensable for the UGAD task. In this paper, we propose a novel Dual Hyperbolic Contrastive Learning framework for Unsupervised Graph-Level Anomaly Detection (HC-GLAD for short). To exploit node group connections, we construct hypergraphs based on gold motifs and subsequently perform hypergraph convolution. Furthermore, to preserve the hierarchy of real-world graphs, we introduce hyperbolic geometry into this field and conduct both graph and hypergraph embedding learning in hyperbolic space with the hyperboloid model. To the best of our knowledge, this is the first work to simultaneously apply hypergraphs with node group connections and hyperbolic geometry to this field. Extensive experiments on several real-world datasets from different fields demonstrate the superiority of HC-GLAD on the UGAD task. The code is available at https://github.com/Yali-F/HC-GLAD.

1 Introduction

Graph-level anomaly detection helps uncover anomalous behaviors hidden within complex graph structures 2022_WSDM_GLocalKD ; 2022_ScientificReports_GLADC ; 2023(2024)_NeurIPS_SIGNET , which has been widely applied in various fields, including social network analysis, bioinformatics, and network security. Unlike traditional anomaly detection methods that focus on individual data points or samples, graph-level anomaly detection focuses on the overall structure, topology, or features of the entire graph. Recently, there has been a growing interest in unsupervised graph-level anomaly detection as it offers an advantage by not relying on labeled data, rendering it applicable across various real-world scenarios. Despite the considerable research and exploration already conducted in this area 2023_WSDM_GOOD-D ; 2023_DASFAA_TUAF ; 2023_ECMLPKDD_CVTGAD ; 2023_ECMLPKDD_HimNet , there are still several issues that need to be further explored.

Firstly, most existing methods exploit only pairwise relationships to conduct unsupervised graph-level anomaly detection. However, in the real world, the relationships among graph data are not merely pairwise. They also encompass more complex relationships, such as node group relationships. For instance, as shown in Figure 1(a), the decisive factor in determining whether a graph is normal or anomalous lies in the number of distinct groups to which the central group is connected outwardly 2023(2024)_NeurIPS_SIGNET . This parallels the classification of molecules in chemistry and the assessment of whether molecular substances are anomalous. In fact, node group information needs to be considered not only in the field of chemical molecules but also in other real-world scenarios. Therefore, there is an urgent need to exploit node group information to capture key patterns for UGAD.

Figure 1: Toy examples illustrating the two primary challenges, (a) node group information and (b) hierarchy information, together with (c) the differing characteristics of hyperbolic space and Euclidean space.

Secondly, the majority of current methods are based on GNNs established in Euclidean space 2022_WSDM_GLocalKD ; 2023_WSDM_GOOD-D . However, the dimensionality of Euclidean space fundamentally limits its ability to represent complex patterns 2014_NeurIPS_ARE . It has been demonstrated that numerous real-world datasets exhibit characteristics akin to those of complex networks, including latent hierarchical structure and power-law degree distributions 2003_PhysicalReview_Hierarchical ; 2020_ICML_WHC_Hyperbolic . For example, as shown in Figure 1(b), a small group of nodes is organized hierarchically into increasingly large groups. This tree-like hierarchical organization leads to a power-law distribution of node degrees 2003_PhysicalReview_Hierarchical ; 2017_NeurIPS_PoincareBall ; 2023_TKDE_HGSR_Hyperbolic . Nevertheless, Euclidean space cannot embed latent hierarchies without high distortion, and Euclidean methods are ill-equipped to model the hierarchy information of graph data 2020_ICML_WHC_Hyperbolic ; 2011_ISGD_DelaunayEmbedding_Hyperbolic , as shown in Figure 1(c). Therefore, it is necessary to employ a new paradigm or space to exploit the latent hierarchical information in UGAD.

Based on the aforementioned challenges and analysis, we propose a novel Dual Hyperbolic Contrastive Learning framework for Unsupervised Graph-Level Anomaly Detection, namely HC-GLAD. Concretely, for the first challenge, we build a hypergraph based on the gold motif and perform hypergraph convolution to exploit node group information. By constructing a hypergraph with anomaly awareness and then applying hypergraph convolution, information is transferred between node groups through hyperedges, which compensates for the shortcomings of existing methods. For the second challenge, we incorporate hyperbolic geometry into UGAD and conduct both graph and hypergraph embedding learning in hyperbolic space, utilizing the hyperboloid model to exploit the latent hierarchical information and enhance the performance of UGAD. Hyperbolic space can naturally represent and capture hierarchical information in graph data with a rich hierarchical or tree-like structure 2020_ICML_WHC_Hyperbolic . The power-law distribution and hierarchical structure in hyperbolic space are interrelated and mutually reinforcing, and they collectively shape the features and properties of the network 2019_AAAI_HHNE_Hyperbolic . Our major contributions are summarized as follows:

  • We propose HC-GLAD, a novel dual hyperbolic contrastive learning framework for unsupervised graph-level anomaly detection. To the best of our knowledge, this is the first work to introduce hypergraphs exploiting node group connections together with hyperbolic geometry to the unsupervised graph-level anomaly detection task.

  • We utilize hypergraphs built on the gold motif to explore node group information. In addition, we employ hyperbolic geometry to leverage latent hierarchical information that Euclidean space cannot embed without high distortion. The advantages of hypergraph learning, hyperbolic learning, and contrastive learning are integrated in a unified framework to jointly improve model performance.

  • We conduct extensive experiments on 12 real-world datasets, demonstrating the effectiveness and superiority of HC-GLAD for unsupervised graph-level anomaly detection.

2 Related Work

2.1 Graph-Level Anomaly Detection

In the context of graph data analysis, the objective of graph-level anomaly detection is to discern abnormal graphs from normal ones, wherein anomalous graphs often represent minority but pivotal patterns 2021_TKDE_SurveyofGAD . OCGIN 2021(2023)_BigData_OCGIN is the first representative model, and it integrates one-class classification and the graph isomorphism network (GIN) 2019_ICLR_GIN for graph-level anomaly detection. OCGTL 2022_IJCAI_OCGTL integrates the strengths of deep one-class classification and neural transformation learning. GLocalKD 2022_WSDM_GLocalKD implements joint random distillation to detect both locally anomalous and globally anomalous graphs by training one graph neural network to predict another graph neural network. GOOD-D 2023_WSDM_GOOD-D introduces perturbation-free graph data augmentation and performs hierarchical contrastive learning to detect anomalous graphs based on semantic inconsistency at different levels. TUAF 2023_DASFAA_TUAF builds triple-unit graphs and further learns triple representations to simultaneously capture abundant information on edges and their corresponding nodes. CVTGAD 2023_ECMLPKDD_CVTGAD applies transformers and cross-attention to UGAD, directly exploiting relationships across different views. SIGNET 2023(2024)_NeurIPS_SIGNET proposes a multi-view subgraph information bottleneck framework that infers anomaly scores and provides subgraph-level explanations.

2.2 Hyperbolic Learning on Graphs

Hyperbolic learning has attracted considerable research attention because hyperbolic space has a geometric property that Euclidean space lacks: its volume increases exponentially with its radius 2022_arXiv_Survey_Hyperbolic ; 2022_KDD_HICF . HGNN (hyperbolic graph neural network) 2019_NeurIPS_HGNN_Hyperbolic generalizes graph neural networks to Riemannian manifolds and improves the performance of the full-graph classification task. It fully utilizes the power of hyperbolic geometry and demonstrates that hyperbolic representations are suitable for capturing high-level structural information. HGCN (hyperbolic graph convolutional neural network) 2019_NeurIPS_HGCN_Hyperbolic leverages both the expressiveness of GCNs and hyperbolic geometry. $\kappa$-GCN 2020_ICML_k-GCN extends GCNs to stereographic models with both positive and negative curvatures, thereby offering a unified approach. HAT (hyperbolic graph attention network) 2021_BigData_HAT_Hyperbolic proposes a hyperbolic multi-head attention mechanism to acquire robust node representations of graphs in hyperbolic space and further improves the accuracy of node classification. LGCN 2021_WWW_LGCN introduces a unified framework of graph operations on the hyperboloid (i.e., feature transformation and non-linear activation), and proposes an elegant hyperbolic neighborhood aggregation based on the centroid of Lorentzian distance. HRCF 2022_WWW_HRCF_Hyperbolic designs a geometry-aware hyperbolic regularizer to boost the optimization process through root alignment and an origin-aware penalty, and it enhances the performance of hyperbolic-powered collaborative filtering. HyperIMBA 2023_WWW_HyperIMBA_Hyperbolic explores the hierarchy-imbalance issue on hierarchical structures and captures the implicit hierarchy of graph nodes through hyperbolic geometry.

2.3 Hypergraph Learning

Due to the capability and flexibility in modeling complex correlations of graph data, hypergraph learning has earned more attention from both academia and industry 2022_TPAMI_Survey_Hypergraph . Hypergraphs naturally depict a wide array of systems characterized by group relationships among their interacting parts 2023_ACM_Survey_HyperGraph . HGNN (hypergraph neural network) 2019_AAAI_HGNN_Hypergraph designs a hyperedge convolution operation and encodes high-order data correlation in a hypergraph structure. HyperGCN 2019_NeurIPS_HyperGCN utilizes tools from spectral theory of hypergraphs and introduces a novel way to train GCN for semi-supervised learning and combinatorial optimization tasks. HGNN+ 2022_TPAMI_HGNN+_Hypergraph conceptually introduces "hyperedge group", and it bridges multi-modal/multi-type data and hyperedge. DHCF 2020_KDD_DHCF constructs two hypergraphs (i.e., user and item hypergraph) and introduces a jump hypergraph convolution (jHConv) to enhance collaborative filtering recommendation performance. HHGR 2021_CIKM_HHGR_Hypergraph builds user-level and group-level hypergraphs and employs a hierarchical hypergraph convolution network to capture complex high-order relationships within and beyond groups, thus improving the performance of group recommendation. DH-HGCN 2022_SIGIR_DH-HGCN_Hypergraph utilizes both a hypergraph convolution network and homogeneity study to explicitly learn high-order relationships among items and users to enhance multiple social recommendation performance. HCCF 2022_SIGIR_HCCF_Hypergraph designs a hypergraph-enhanced cross-view contrastive learning architecture to jointly capture local and global collaborative relations in recommender system.

A more extensive review of the literature is provided in Appendix A.

3 Methodology

In this section, we introduce the preliminaries and the dual hyperbolic contrastive learning framework for unsupervised graph-level anomaly detection (HC-GLAD). The overall framework and a brief procedure are illustrated in Figure 2. The pseudo-code of HC-GLAD is given in Appendix B.1.

Figure 2: The overall framework of HC-GLAD.

3.1 Preliminaries

Notations. We denote a graph as $G=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ is the set of edges. The topology (i.e., structure) information of $G$ is represented by the adjacency matrix $A\in\mathbb{R}^{n\times n}$, where $n$ is the number of nodes. $A_{i,j}=1$ if there is an edge between nodes $v_{i}$ and $v_{j}$; otherwise, $A_{i,j}=0$. We denote an attributed graph as $G=(\mathcal{V},\mathcal{E},\mathcal{X})$, where $\mathcal{X}\in\mathbb{R}^{n\times d_{attr}}$ is the matrix of node features. Each row of $\mathcal{X}$ is a node's feature vector of dimension $d_{attr}$. The graph set is denoted as $\mathcal{G}=\{G_{1},G_{2},...,G_{m}\}$, where $m$ is the number of graphs in $\mathcal{G}$.

Problem Definition. In this work, we focus on the unsupervised graph-level anomaly detection task: in the training phase, we train the model using only normal graphs; in the inference phase, given a graph set $\mathcal{G}$ containing normal graphs and anomalous graphs, HC-GLAD aims to distinguish the anomalous graphs from the normal graphs.

3.2 Data Preprocessing

Graph Data Augmentation. We employ the perturbation-free graph augmentation strategy 2023_WSDM_GOOD-D ; 2023_AAAI_FedStar to generate two augmented views (i.e., $view_{1}$ and $view_{2}$) for an input graph $G$. Concretely, $view_{1}$ focuses more on attributes and is built directly by integrating the node attributes $\mathcal{X}$ (for attributed graphs) and the adjacency matrix $A$. $view_{2}$ focuses more on structure and is built from structural encodings of the graph topology, which are then combined with the adjacency matrix $A$.

Hypergraph Construction. After obtaining two augmented views of a graph, we essentially have two augmented graphs. Inspired by 2020_VLDB_MoCHy ; 2021_WWW_MHCN , we leverage ternary relationships between nodes, using the "gold motif" (i.e., a triangular relationship formed by three nodes) to initially construct the hypergraph. Given the adjacency matrix $A$ of an augmented graph ($view_{1}$ and $view_{2}$), we first construct the relationship matrix $A_{relation}$ of the hypergraph using the gold motif. It can be calculated by:

$$A_{relation}=(AA^{T})\odot A=(AA)\odot A, \tag{1}$$

where $A^{T}=A$ because $G$ is an undirected graph, so $A$ is symmetric.

We determine the higher-order relationships between vertices based on the matrix $\hat{A}_{relation}=A_{relation}+I_{N}$, where $I_{N}$ is the identity matrix. We further build the incidence matrix $\mathbf{H}_{inc}$: if vertex $v_{i}$ is connected by hyperedge $\epsilon$, then $H_{inc(i\epsilon)}=1$; otherwise it is 0. While thoroughly investigating and utilizing the gold motif, we must also consider instances that do not constitute this kind of high-order relationship and ensure the integrity of the entire graph. Therefore, we also include the edges that are not part of any high-order relationship in the incidence matrix $\mathbf{H}_{inc}$. Finally, we obtain a hypergraph $HyperG$ with $N$ vertices and $M$ hyperedges.
The high-order relationships in the hypergraph $HyperG$ can be represented simply by the incidence matrix $\mathbf{H}_{inc}\in\mathbb{R}^{N\times M}$.
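The construction above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the function names `relation_matrix` and `build_incidence` are hypothetical, and the rule "one hyperedge per triangle, plus one two-node hyperedge for every edge that lies in no triangle" is one plausible reading of the text, not necessarily the exact procedure in the released code. The self-loop term $I_{N}$ is not modeled here.

```python
from itertools import combinations


def relation_matrix(A):
    """Eq. (1): A_rel = (A A) * A (element-wise) for a symmetric 0/1 matrix A.

    A_rel[i][j] counts the triangles that contain edge (i, j).
    """
    n = len(A)
    return [
        [A[i][j] * sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def build_incidence(A):
    """Return (H_inc, hyperedges) under the assumed construction rule.

    Assumption: one hyperedge per triangle, plus one two-node hyperedge
    for every edge that is not part of any triangle.
    """
    n = len(A)
    A_rel = relation_matrix(A)
    hyperedges = [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if A[i][j] and A[j][k] and A[i][k]
    ]
    hyperedges += [
        (i, j)
        for i, j in combinations(range(n), 2)
        if A[i][j] and A_rel[i][j] == 0
    ]
    # H_inc has one row per vertex and one column per hyperedge.
    H_inc = [[1 if v in e else 0 for e in hyperedges] for v in range(n)]
    return H_inc, hyperedges


# Toy graph: triangle 0-1-2 with a pendant edge 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
H_inc, hyperedges = build_incidence(A)
print(hyperedges)
```

On this toy graph the triangle becomes a single three-node hyperedge, while the pendant edge is kept as a two-node hyperedge so that no part of the original graph is lost.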

3.3 Lorentz Manifold

Hyperbolic space, defined by its constant negative curvature, diverges from the flatness of Euclidean geometry. The Lorentz manifold is often favored for its numerical stability, making it a popular choice in hyperbolic geometry applications 2018_ICML_nickel_Hyperbolic_RSGD .

Definition 1 (Lorentzian Inner Product)

The inner product $\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}$ for vectors $\mathbf{x},\mathbf{y}\in\mathbb{R}^{d+1}$ is defined by the expression $\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}=-x_{0}y_{0}+\sum_{i=1}^{d}x_{i}y_{i}$.

Definition 2 (Lorentz Manifold)

A $d$-dimensional Lorentz manifold, denoted as $\mathcal{L}^{d}$, with a constant negative curvature, is defined as the Riemannian manifold $(\mathbb{H}^{d},g_{\ell})$. Here, we adopt the constant negative curvature of $-1$, $g_{\ell}$ is the metric tensor represented by $\operatorname{diag}([-1,1,\ldots,1])$, and $\mathbb{H}^{d}$ is the set of all vectors $\mathbf{x}\in\mathbb{R}^{d+1}$ satisfying $\langle\mathbf{x},\mathbf{x}\rangle_{\mathcal{L}}=-1$ and $x_{0}>0$.

Next, the corresponding Lorentzian distance function for two points $\mathbf{x},\mathbf{y}\in\mathcal{L}^{d}$ is given by:

$$d_{\mathcal{L}}(\mathbf{x},\mathbf{y})=\operatorname{arcosh}(-\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}). \tag{2}$$

Definition 3 (Tangent Space)

For a point $\mathbf{x}\in\mathcal{L}^{d}$, the tangent space $\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$ consists of all vectors $\mathbf{v}$ that are orthogonal to $\mathbf{x}$ under the Lorentzian inner product. This orthogonality is defined such that $\langle\mathbf{x},\mathbf{v}\rangle_{\mathcal{L}}=0$. Therefore, the tangent space can be expressed as: $\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}=\left\{\mathbf{v}:\langle\mathbf{x},\mathbf{v}\rangle_{\mathcal{L}}=0\right\}.$

Definition 4 (Exponential and Logarithmic Maps)

Let $\mathbf{v}\in\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$. The exponential map $\exp_{\mathbf{x}}:\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}\rightarrow\mathcal{L}^{d}$ and the logarithmic map $\log_{\mathbf{x}}:\mathcal{L}^{d}\rightarrow\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$ are defined as follows:

$$\exp_{\mathbf{x}}(\mathbf{v})=\cosh(\|\mathbf{v}\|_{\mathcal{L}})\mathbf{x}+\sinh(\|\mathbf{v}\|_{\mathcal{L}})\frac{\mathbf{v}}{\|\mathbf{v}\|_{\mathcal{L}}}, \tag{3}$$
$$\log_{\mathbf{x}}(\mathbf{y})=d_{\mathcal{L}}(\mathbf{x},\mathbf{y})\frac{\mathbf{y}+\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}\mathbf{x}}{\left\|\mathbf{y}+\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}\mathbf{x}\right\|_{\mathcal{L}}}, \tag{4}$$

where $\|\mathbf{v}\|_{\mathcal{L}}=\sqrt{\langle\mathbf{v},\mathbf{v}\rangle_{\mathcal{L}}}$ denotes the norm of $\mathbf{v}$ in $\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$.
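To make Definitions 1 to 4 concrete, the following dependency-free sketch implements the Lorentzian inner product, the distance of Eq. (2), and the maps of Eqs. (3) and (4) on plain Python lists. The function names, the clamping of numerical round-off, and the handling of zero-length vectors are assumptions added for illustration; they are not taken from the original implementation.

```python
import math


def lorentz_inner(x, y):
    """Definition 1: -x0*y0 + sum_{i>=1} xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x, y):
    """Eq. (2); the argument is clamped at 1 to absorb rounding error."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def exp_map(x, v):
    """Eq. (3): move from x along the tangent vector v onto the manifold."""
    norm = math.sqrt(max(lorentz_inner(v, v), 0.0))
    if norm < 1e-12:  # a zero tangent vector maps to x itself
        return list(x)
    return [math.cosh(norm) * xi + math.sinh(norm) * vi / norm
            for xi, vi in zip(x, v)]


def log_map(x, y):
    """Eq. (4): tangent vector at x that points towards y."""
    xy = lorentz_inner(x, y)
    u = [yi + xy * xi for xi, yi in zip(x, y)]
    norm = math.sqrt(max(lorentz_inner(u, u), 0.0))
    if norm < 1e-12:  # y coincides with x
        return [0.0] * len(x)
    dist = lorentz_distance(x, y)
    return [dist * ui / norm for ui in u]


origin = [1.0, 0.0, 0.0]  # the point (1, 0, 0) on the manifold for d = 2
v = [0.0, 0.3, -0.4]      # tangent at the origin: its Lorentzian product with origin is 0
y = exp_map(origin, v)
print(lorentz_inner(y, y), lorentz_distance(origin, y), log_map(origin, y))
```

In this example the mapped point satisfies the manifold constraint of Definition 2 up to floating-point error, its distance from the starting point equals the Lorentzian norm of `v`, and the logarithmic map recovers `v`.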

For computational convenience, the origin of the Lorentz manifold, denoted as $\mathbf{o}=(1,0,0,\ldots,0)$ in $\mathcal{L}^{d}$, is selected as the reference point for the exponential and logarithmic maps. This choice allows for simplified expressions of these mappings.

$$\exp_{\mathbf{o}}(\mathbf{v})=\exp_{\mathbf{o}}\left(\left[0,\mathbf{v}^{E}\right]\right)=\left(\cosh\left(\|\mathbf{v}^{E}\|_{2}\right),\sinh\left(\|\mathbf{v}^{E}\|_{2}\right)\frac{\mathbf{v}^{E}}{\|\mathbf{v}^{E}\|_{2}}\right), \tag{5}$$

where $(,)$ denotes concatenation and $\cdot^{E}$ denotes the embedding in Euclidean space 2021_WWW_LGCN .
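As a consistency check, the short derivation below sketches why Eq. (5) follows from Eq. (3) and why its output stays on the manifold. It uses only Definitions 1, 2 and 4; the intermediate steps are added here for illustration and do not appear in the original text.

```latex
% A vector of the form [0, v^E] is tangent at the origin o = (1, 0, ..., 0):
\mathbf{v} = [0, \mathbf{v}^{E}], \qquad
\langle \mathbf{o}, \mathbf{v} \rangle_{\mathcal{L}}
  = -1 \cdot 0 + \sum_{i=1}^{d} 0 \cdot v^{E}_{i} = 0, \qquad
\|\mathbf{v}\|_{\mathcal{L}}
  = \sqrt{-0^{2} + \|\mathbf{v}^{E}\|_{2}^{2}} = \|\mathbf{v}^{E}\|_{2}.

% Substituting x = o into Eq. (3):
\exp_{\mathbf{o}}(\mathbf{v})
  = \cosh(\|\mathbf{v}^{E}\|_{2})\,(1, \mathbf{0})
  + \sinh(\|\mathbf{v}^{E}\|_{2})\,\frac{(0, \mathbf{v}^{E})}{\|\mathbf{v}^{E}\|_{2}}
  = \left( \cosh(\|\mathbf{v}^{E}\|_{2}),\;
           \sinh(\|\mathbf{v}^{E}\|_{2})\,\frac{\mathbf{v}^{E}}{\|\mathbf{v}^{E}\|_{2}} \right).

% The result satisfies the constraints of Definition 2:
\langle \exp_{\mathbf{o}}(\mathbf{v}), \exp_{\mathbf{o}}(\mathbf{v}) \rangle_{\mathcal{L}}
  = -\cosh^{2}(\|\mathbf{v}^{E}\|_{2}) + \sinh^{2}(\|\mathbf{v}^{E}\|_{2}) = -1,
\qquad
\cosh(\|\mathbf{v}^{E}\|_{2}) > 0.
```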

3.4 Hyperbolic (Hyper-)Graph Convolution

Before we conduct hyperbolic (hyper-)graph convolution, we insert a value 0 in the zeroth dimension of the Euclidean state of the node for both $view_{1}$ and $view_{2}$. Referring to Eq. (5), the initial hyperbolic node state $\mathbf{e}^{0}$ can be obtained by:

$$e^{0}_{i}=\exp_{\mathbf{o}}([0,\mathbf{x}_{i}]), \tag{6}$$

where $\mathbf{x}$ is the initial feature (or encoding) from the augmented graphs (i.e., $view_{1}$ and $view_{2}$). $[0,\mathbf{x}]$ denotes the operation of inserting the value 0 into the zeroth dimension of $\mathbf{x}$, so that the resulting vector always lies in the tangent space at the origin before it is mapped to $\mathbf{e}^{0}$ 2022_KDD_HICF ; 2021_WWW_HGCF . The superscript 0 in $e^{0}_{i}$ indicates the initial state.

3.4.1 Hyperbolic Graph Aggregation

Following 2022_KDD_HICF ; 2021_WWW_HGCF , we first map the initial embedding $e^{0}_{i}$ in hyperbolic space to the tangent space using the logarithmic map. Then, we select GCN as our fundamental graph encoder to perform graph convolution aggregation. The propagation rule in the $l$-th layer on $view_{1}$ can be expressed as:

$$\mathbf{H}^{(view_{1},~l)}_{graph}=\sigma\left(\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}}\mathbf{H}^{(view_{1},~l-1)}_{graph}\mathbf{W}^{(l-1)}\right), \tag{7}$$

where $\hat{\mathbf{A}} = \mathbf{A} + \mathbf{I}_N$ is the adjacency matrix of the input graph $G_i$ with added self-connections, and $\mathbf{I}_N$ is the identity matrix. $\hat{\mathbf{D}}$ is the degree matrix of $\hat{\mathbf{A}}$, $\mathbf{H}^{(view_1,\, l-1)}_{graph}$ is the node embedding matrix of $view_1$ in the $(l-1)$-th layer, $\mathbf{W}^{(l-1)}$ is a layer-specific trainable weight matrix, and $\sigma(\cdot)$ is a non-linear activation function 2016_arXiv_GCN . The embeddings of $view_2$ are computed in the same way. After we obtain the final embedding $\mathbf{h}^{l}$ of node $i$ in the tangent space, we map it from the tangent space back to hyperbolic space using the exponential map (defined in Definition 4).
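The log-map, aggregate, exp-map pipeline described above can be sketched as follows. This is a simplified illustration, not the released implementation: it assumes the unit hyperboloid with origin $(1, 0, \dots, 0)$, uses ReLU for $\sigma$, and applies the GCN layer to the spatial tangent coordinates only (the leading zero coordinate is dropped and implicitly restored). The names `expmap0`, `logmap0`, and `gcn_layer` are hypothetical.

```python
import numpy as np

def expmap0(u):
    """Map the tangent vector [0, u] at the origin onto the unit hyperboloid."""
    norm = np.maximum(np.linalg.norm(u, axis=-1, keepdims=True), 1e-9)
    return np.concatenate([np.cosh(norm), np.sinh(norm) * u / norm], axis=-1)

def logmap0(x):
    """Inverse of expmap0: spatial tangent coordinates of hyperboloid points."""
    spatial = x[..., 1:]
    norm = np.maximum(np.linalg.norm(spatial, axis=-1, keepdims=True), 1e-9)
    dist = np.arccosh(np.maximum(x[..., :1], 1.0))  # distance to the origin
    return dist * spatial / norm

def gcn_layer(adj, h, weight):
    """Eq. (7) with ReLU: sigma(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy path graph
points = expmap0(0.5 * rng.normal(size=(3, 4)))  # toy hyperbolic embeddings

h = logmap0(points)                                   # hyperbolic -> tangent
h = gcn_layer(adj, h, 0.3 * rng.normal(size=(4, 4)))  # aggregate in tangent space
out = expmap0(h)                                      # tangent -> hyperbolic
```

Every row of `out` satisfies $-x_0^2 + \lVert x_{1:} \rVert^2 = -1$, so the aggregated embeddings lie on the hyperboloid again.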

3.4.2 Hyperbolic Hypergraph Aggregation

Similar to hyperbolic graph aggregation, we first map the initial embedding $e^{0}_{i}$ in hyperbolic space to the tangent space using the logarithmic map, then we employ HGCN as our fundamental hypergraph encoder to perform hypergraph convolution aggregation. The propagation rule in the $l$-th layer on $view_1$ can be expressed as:

$$\mathbf{H}^{(view_1,\, l)}_{hyperg} = \sigma\left(\mathbf{D}^{-1/2}_{hyperg} \mathbf{H}_{inc} \mathbf{W} \mathbf{B}^{-1} \mathbf{H}^{T}_{inc} \mathbf{D}^{-1/2}_{hyperg} \mathbf{H}^{(view_1,\, l-1)}_{hyperg} \mathbf{P}\right), \tag{8}$$

where $\mathbf{D}_{hyperg} \in \mathbb{R}^{N \times N}$ is the vertex degree matrix, $\mathbf{B} \in \mathbb{R}^{M \times M}$ is the hyperedge degree matrix, $\mathbf{W} \in \mathbb{R}^{M \times M}$ is the hyperedge weight matrix, and $\mathbf{P} \in \mathbb{R}^{F^{(l-1)} \times F^{(l)}}$ is the weight matrix between the $(l-1)$-th and $l$-th layers 2021_PR_HGCN_HGAT_HyperGraph . The embeddings of $view_2$ are computed in the same way. After we obtain the final embedding $\mathbf{h}^{l}$ of node $i$ in the tangent space, we map it from the tangent space back to hyperbolic space using the exponential map (defined in Definition 4).
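A minimal NumPy sketch of one hypergraph convolution layer in the form of Eq. (8) is given below. It assumes ReLU as $\sigma$, a diagonal hyperedge weight matrix that defaults to the identity, and that every hyperedge contains at least one node; the function name `hypergraph_conv` and the toy incidence matrix are hypothetical.

```python
import numpy as np

def hypergraph_conv(h_inc, x, p, edge_weights=None):
    """One layer of Eq. (8) with ReLU as the activation.

    h_inc: (N, M) incidence matrix, x: (N, F_in) features, p: (F_in, F_out).
    """
    n_edges = h_inc.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    d_v = h_inc @ w            # vertex degrees (diagonal of D_hyperg)
    d_e = h_inc.sum(axis=0)    # hyperedge degrees (diagonal of B)
    dv_inv_sqrt = np.zeros_like(d_v)
    dv_inv_sqrt[d_v > 0] = d_v[d_v > 0] ** -0.5  # isolated nodes keep a zero row
    left = dv_inv_sqrt[:, None] * h_inc          # D^-1/2 H_inc
    theta = (left * (w / d_e)) @ left.T          # D^-1/2 H_inc W B^-1 H_inc^T D^-1/2
    return np.maximum(theta @ x @ p, 0.0)

# Toy hypergraph: 4 nodes, hyperedges {0, 1, 2} and {2, 3}.
h_inc = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# Sanity check input: the square roots of the vertex degrees.
x = np.sqrt(h_inc.sum(axis=1, keepdims=True))
out = hypergraph_conv(h_inc, x, np.eye(1))
```

With unit hyperedge weights and an identity projection, the vector of square-root vertex degrees is a fixed point of the normalized operator, so `out` reproduces `x`. This gives a quick check that the normalization is wired correctly.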

3.5 Multi-Level Contrast

Following 2023_WSDM_GOOD-D ; 2023_ECMLPKDD_CVTGAD , we design a contrastive strategy that considers both node-level and graph-level contrast to train the model. Our proposed model comprises a graph-channel and a hypergraph-channel, and their methods for computing multi-level contrast are similar. We therefore describe the procedure using the graph-channel as the example.

Node-level Contrast. For an input graph $G_i$, we first map the node embeddings into the node-level contrast space with an MLP-based projection head, and then we construct a node-level contrastive loss to maximize the agreement between the embeddings belonging to different views on the node level:

$$\mathcal{L}_{node} = \frac{1}{|\mathcal{B}|} \sum_{G_j \in \mathcal{B}} \frac{1}{2|\mathcal{V}_{G_j}|} \sum_{v_i \in \mathcal{V}_{G_j}} \left[ l\left(\mathbf{h}^{(view_1)}_{i}, \mathbf{h}^{(view_2)}_{i}\right) + l\left(\mathbf{h}^{(view_2)}_{i}, \mathbf{h}^{(view_1)}_{i}\right) \right], \tag{9}$$
$$l\left(\mathbf{h}^{(view_1)}_{i}, \mathbf{h}^{(view_2)}_{i}\right) = -\log \frac{e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_1)}_{i},\, \mathbf{h}^{(view_2)}_{i}\right)/\tau\right)}}{\sum_{v_k \in \mathcal{V}_{G_j} \backslash v_i} e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_1)}_{i},\, \mathbf{h}^{(view_2)}_{k}\right)/\tau\right)}}. \tag{10}$$

In Eq. (9), $\mathcal{B}$ is the training/testing batch and $\mathcal{V}_{G_j}$ is the node set of graph $G_j$. The calculation of $l\left(\mathbf{h}^{(view_2)}_{i}, \mathbf{h}^{(view_1)}_{i}\right)$ and $l\left(\mathbf{h}^{(view_1)}_{i}, \mathbf{h}^{(view_2)}_{i}\right)$ is the same, and we briefly show the calculation of $l\left(\mathbf{h}^{(view_1)}_{i}, \mathbf{h}^{(view_2)}_{i}\right)$ in Eq. (10). In Eq. (10), $H_{Dist}(\cdot, \cdot)$ is the function that measures the hyperbolic distance between different views. In this work, we compute the Lorentzian distance as Eq. (2) indicates.
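Eqs. (9) and (10) can be sketched for a single graph as follows. The distance function here is an assumption: it uses the geodesic distance $\operatorname{arcosh}(-\langle x, y \rangle_{\mathcal{L}})$ on the unit hyperboloid as a stand-in for $H_{Dist}$, which may differ from the Lorentzian distance of Eq. (2). The temperature value, the helper names, and the random embeddings are likewise illustrative.

```python
import numpy as np

def lift_to_hyperboloid(u):
    """Place spatial coordinates u on the unit hyperboloid: x0 = sqrt(1 + ||u||^2)."""
    x0 = np.sqrt(1.0 + np.sum(u * u, axis=-1, keepdims=True))
    return np.concatenate([x0, u], axis=-1)

def pairwise_hyperbolic_distance(x, y):
    """Assumed stand-in for H_Dist: geodesic distance arcosh(-<x, y>_L)."""
    inner = -np.outer(x[:, 0], y[:, 0]) + x[:, 1:] @ y[:, 1:].T
    return np.arccosh(np.maximum(-inner, 1.0))  # clamp guards rounding error

def directional_loss(dist, tau):
    """Eq. (10) for every node i, where dist[i, k] = H_Dist(h_i^(a), h_k^(b))."""
    sim = np.exp(-dist / tau)
    pos = np.diag(sim)            # same node in the other view
    neg = sim.sum(axis=1) - pos   # all other nodes v_k != v_i
    return -np.log(pos / neg)

def node_level_loss(h1, h2, tau=0.2):
    """Inner part of Eq. (9) for a single graph (needs at least two nodes)."""
    dist = pairwise_hyperbolic_distance(h1, h2)
    both = directional_loss(dist, tau) + directional_loss(dist.T, tau)
    return both.sum() / (2 * h1.shape[0])

rng = np.random.default_rng(0)
view1 = lift_to_hyperboloid(rng.normal(size=(5, 3)))

aligned = node_level_loss(view1, view1)                        # views agree
shuffled = node_level_loss(view1, np.roll(view1, 1, axis=0))   # views mismatched
```

The loss is lower when the two views agree node by node than when the second view is shifted by one node, which is the behaviour the contrastive objective rewards. Averaging `node_level_loss` over the graphs in a batch gives the outer sum of Eq. (9).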

Graph-level Contrast. To obtain the graph embedding $\mathbf{h}_{G_i}$ of graph $G_i$, we simply apply mean pooling over the embeddings of the nodes in $G_i$. We first map the graph embedding into the graph-level contrast space with an MLP-based projection head. Similar to the node-level loss $\mathcal{L}_{node}$, we then construct a graph-level loss for mutual agreement maximization on the graph level:

$$\mathcal{L}_{graph} = \frac{1}{2|\mathcal{B}|} \sum_{G_i \in \mathcal{B}} \left[ l\left(\mathbf{h}^{(view_1)}_{G_i}, \mathbf{h}^{(view_2)}_{G_i}\right) + l\left(\mathbf{h}^{(view_2)}_{G_i}, \mathbf{h}^{(view_1)}_{G_i}\right) \right], \tag{11}$$
$$l\left(\mathbf{h}^{(view_1)}_{G_i}, \mathbf{h}^{(view_2)}_{G_i}\right) = -\log \frac{e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_1)}_{G_i},\, \mathbf{h}^{(view_2)}_{G_i}\right)/\tau\right)}}{\sum_{G_j \in \mathcal{B} \backslash G_i} e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_1)}_{G_i},\, \mathbf{h}^{(view_2)}_{G_j}\right)/\tau\right)}}, \tag{12}$$

where the notation is similar to that of the node-level loss, and $l\left(\mathbf{h}^{(view_2)}_{G_i}, \mathbf{h}^{(view_1)}_{G_i}\right)$ is calculated in the same way as $l\left(\mathbf{h}^{(view_1)}_{G_i}, \mathbf{h}^{(view_2)}_{G_i}\right)$. The training loss function on the graph channel is:

$$\mathcal{L}_{graph-channel} = \xi_1 \mathcal{L}_{node} + \xi_2 \mathcal{L}_{graph}, \tag{13}$$

where $\xi_1$ and $\xi_2$ are trade-off parameters, and we set $\xi_1 = 1$ and $\xi_2 = 1$ in the experiments of this work for simplicity. The training loss function on the hypergraph channel is calculated in the same way as the one on the graph channel. Therefore, in the training phase, we employ the loss function:

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{graph-channel} + \lambda_2 \mathcal{L}_{hypergraph-channel}. \tag{14}$$

3.6 Anomaly Scoring

In the inference phase, we calculate the anomaly score from both the graph-channel and the hypergraph-channel. For simplicity and efficiency, we directly use $\mathcal{L}_{total}$ (Eq. (14)), evaluated on an input graph $G_i$, as its final anomaly score:

$$score_{G_i} = \mathcal{L}_{total}. \tag{15}$$
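How Eqs. (13) to (15) combine into a per-graph score can be sketched in a few lines. The per-graph loss values below are made up for illustration, the default $\lambda_1 = \lambda_2 = 0.5$ is an assumption, and the function names are hypothetical.

```python
def channel_loss(node_loss, graph_loss, xi1=1.0, xi2=1.0):
    """Eq. (13): weighted sum of node-level and graph-level contrast."""
    return xi1 * node_loss + xi2 * graph_loss

def anomaly_score(graph_terms, hyper_terms, lambda1=0.5, lambda2=0.5):
    """Eqs. (14)-(15): weighted sum of both channels, used directly as the score."""
    return (lambda1 * channel_loss(*graph_terms)
            + lambda2 * channel_loss(*hyper_terms))

# Hypothetical per-graph (node-level, graph-level) loss values for two test graphs.
per_graph = {
    "G_a": {"graph": (0.2, 0.3), "hyper": (0.1, 0.4)},
    "G_b": {"graph": (1.0, 1.5), "hyper": (0.9, 1.6)},
}

scores = {name: anomaly_score(t["graph"], t["hyper"]) for name, t in per_graph.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # most anomalous first
```

Since only normal graphs are seen during training, a graph whose two views are hard to align receives a larger loss and is ranked as more anomalous.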

4 Experiments

4.1 Experimental Setup

Datasets. We conduct experiments on 12 open-source datasets from TUDataset 2020_arXiv_TuDataset , which cover small molecules, bioinformatics, and social networks. We follow the settings in 2022_WSDM_GLocalKD ; 2023_WSDM_GOOD-D to define anomalies, while the remaining graphs are viewed as normal data (i.e., normal graphs). Similar to 2022_WSDM_GLocalKD ; 2023_WSDM_GOOD-D ; 2021_BigData_OCGIN , only normal data are utilized during the training phase.

Baselines. We select 9 representative baselines to compare with our proposed model. For the non-end-to-end methods, we mainly select two categories: (i) kernel + detector. We adopt the Weisfeiler-Lehman kernel (WL in short) 2011_JMLR_WL_graph_Kernel and the propagation kernel (PK in short) 2016_ML_PK_graph_Kernel to first obtain representations, and then use one-class SVM (OCSVM in short) 2001_JMLR_OCSVM and isolation forest (iF in short) 2008_ICDM_iF_graph_Kernel to detect anomalies. Combining these kernels and detectors yields four baselines: PK-OCSVM, PK-iF, WL-OCSVM, and WL-iF. (ii) GCL model + detector. Since our model follows the paradigm of graph contrastive learning, we select two classic graph-level contrastive learning models (i.e., InfoGraph 2020_ICLR_InfoGraph and GraphCL 2020_NeurIPS_GraphCL ) to first obtain representations, and then use iF as the detector (i.e., InfoGraph-iF and GraphCL-iF). For the end-to-end methods, we select 3 classical models: OCGIN 2021_BigData_OCGIN , GLocalKD 2022_WSDM_GLocalKD , and GOOD-D 2023_WSDM_GOOD-D .

Metrics and Implementations. Following 2022_WSDM_GLocalKD ; 2022_ScientificReports_GLADC ; 2023_WSDM_GOOD-D ; 2023_ECMLPKDD_CVTGAD , we adopt a popular graph-level anomaly detection metric, the area under the receiver operating characteristic curve (AUC), to evaluate methods. A higher AUC value corresponds to better anomaly detection performance. We use Riemannian SGD with weight decay to learn the network parameters 2022_KDD_HICF ; 2013_TAC_Riemannian_SGD . In practice, we implement HC-GLAD with PyTorch 2019_NeurIPS_PyTorch_Library . Appendix C provides more details of the datasets and implementation.
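For reference, AUC can be computed directly from anomaly scores through its rank interpretation: it is the probability that a randomly chosen anomalous graph receives a higher score than a randomly chosen normal graph, with ties counted as one half. The sketch below is a plain-Python illustration with made-up labels and scores; the function name `roc_auc` is hypothetical and the code is not taken from the paper's evaluation script.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    labels: 1 for anomalous graphs, 0 for normal graphs.
    scores: anomaly scores, higher means more anomalous.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(pos) * len(neg))

# Illustrative labels and scores: three of the four pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based formulation is preferable for large test sets.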

Table 1: The performance comparison in terms of AUC (in percent, mean value ± standard deviation). The best performance is highlighted in bold, and the second-best performance is \ul{underlined}. †: we report the result from 2023_WSDM_GOOD-D .

| Dataset | PK-OCSVM† | PK-iF† | WL-OCSVM† | WL-iF† | InfoGraph-iF† | GraphCL-iF† | OCGIN† | GLocalKD† | GOOD-D† | HC-GLAD |
|---|---|---|---|---|---|---|---|---|---|---|
| PROTEINS-full | 50.49±4.92 | 60.70±2.55 | 51.35±4.35 | 61.36±2.54 | 57.47±3.03 | 60.18±2.53 | 70.89±2.44 | \ul{77.30±5.15} | 71.97±3.86 | \textbf{77.51±2.58} |
| ENZYMES | 53.67±2.66 | 51.30±2.01 | 55.24±2.66 | 51.60±3.81 | 53.80±4.50 | 53.60±4.88 | 58.75±5.98 | 61.39±8.81 | \ul{63.90±3.69} | \textbf{65.39±6.23} |
| AIDS | 50.79±4.30 | 51.84±2.87 | 50.12±3.43 | 61.13±0.71 | 70.19±5.03 | 79.72±3.98 | 78.16±3.05 | 93.27±4.19 | \ul{97.28±0.69} | \textbf{99.51±0.38} |
| DHFR | 47.91±3.76 | 52.11±3.96 | 50.24±3.13 | 50.29±2.77 | 52.68±3.21 | 51.10±2.35 | 49.23±3.05 | 56.71±3.57 | \textbf{62.67±3.11} | \ul{61.16±4.20} |
| BZR | 46.85±5.31 | 55.32±6.18 | 50.56±5.87 | 52.46±3.30 | 63.31±8.52 | 60.24±5.37 | 65.91±1.47 | 69.42±7.78 | \ul{75.16±5.15} | \textbf{75.75±9.11} |
| COX2 | 50.27±7.91 | 50.05±2.06 | 49.86±7.43 | 50.27±0.34 | 53.36±8.86 | 52.01±3.17 | 53.58±5.05 | 59.37±12.67 | \textbf{62.65±8.14} | \ul{59.98±7.44} |
| DD | 48.30±3.98 | 71.32±2.41 | 47.99±4.09 | 70.31±1.09 | 55.80±1.77 | 59.32±3.92 | 72.27±1.83 | \textbf{80.12±5.24} | 73.25±3.19 | \ul{77.66±1.73} |
| REDDIT-B | 45.68±2.24 | 46.72±3.42 | 49.31±2.33 | 48.26±0.32 | 68.50±5.56 | 71.80±4.38 | 75.93±8.65 | 77.85±2.62 | \textbf{88.67±1.24} | \ul{79.09±2.52} |
| HSE | 57.02±8.42 | 56.87±10.51 | 62.72±10.13 | 53.02±5.12 | 53.56±3.98 | 51.18±2.71 | \ul{64.84±4.70} | 59.48±1.44 | \textbf{69.65±2.14} | 64.05±4.75 |
| MMP | 46.65±6.31 | 50.06±3.73 | 55.24±3.26 | 52.68±3.34 | 54.59±2.01 | 54.54±1.86 | \textbf{71.23±0.16} | 67.84±0.59 | 70.51±1.56 | \ul{70.96±4.45} |
| p53 | 46.74±4.88 | 50.69±2.02 | 54.59±4.46 | 50.85±2.16 | 52.66±1.95 | 53.29±2.32 | 58.50±0.37 | \ul{64.20±0.81} | 62.99±1.55 | \textbf{66.01±1.77} |
| PPAR-gamma | 53.94±6.94 | 45.51±2.58 | 57.91±6.13 | 49.60±0.22 | 51.40±2.53 | 50.30±1.56 | \textbf{71.19±4.28} | 64.59±0.67 | 67.34±1.71 | \ul{69.51±5.04} |
| Avg.Rank | 8.75 | 7.83 | 7.25 | 7.58 | 6.33 | 6.67 | 3.83 | 3.00 | \ul{2.08} | \textbf{1.67} |

4.2 Overall Performance

The AUC results of HC-GLAD, along with nine other baseline methods, are summarized in Table 1. As depicted in Table 1, HC-GLAD outperforms the other methods by securing first place on 5 datasets and second place on 6 datasets, while maintaining a competitive performance on the remaining dataset. Furthermore, HC-GLAD achieves the best average rank among all methods across the 12 datasets. Our observations indicate that graph kernel-based methods exhibit the poorest performance among baselines. This underperformance is attributed to their limited ability to identify regular patterns and essential graph information, rendering them less effective with complex datasets. GCL-based methods show a moderate level of performance, highlighting the competitive potential of graph contrastive learning for UGAD tasks. In conclusion, the competitive performance of our proposed model underscores the effectiveness of incorporating node group connections, as well as integrating hypergraph learning and hyperbolic geometry into graph-level anomaly detection. These findings also validate that HC-GLAD possesses inherent capabilities to capture the fundamental characteristics of normal graphs, consequently delivering superior anomaly detection performance.

4.3 Ablation Study

We conduct an ablation study on four representative datasets to investigate the effects of the two key components: the hypergraph-channel and hyperbolic learning. For convenience, let w/o HyperG and w/o HyperB denote the variants of HC-GLAD without the hypergraph-channel and without hyperbolic learning, respectively. The results are illustrated in Figure LABEL:fig:_Ablation-study. HC-GLAD consistently achieves the best performance against the two variants, indicating that both hypergraph learning and hyperbolic learning are needed to obtain the best detection performance. The weaker performance of w/o HyperG shows the importance of considering node group information and introducing hypergraph learning to this field, and the weaker performance of w/o HyperB shows the importance of introducing hyperbolic learning to UGAD. Additionally, we find that on these datasets, hyperbolic learning has a more pronounced impact than hypergraph learning.

4.4 Hyper-parameter Analysis

Trade-off parameter $\lambda_1$. In $\mathcal{L}_{total}$, $\lambda_1$ and $\lambda_2$ are trade-off parameters that determine the weights of the graph-channel and hypergraph-channel, respectively. To investigate their impact on model performance, we conduct experiments on four representative datasets. The results are illustrated in Figure LABEL:fig:_Hyper-parameter-lambda_1. For simplicity, we set $\lambda_2 = 1 - \lambda_1$. We observe that as $\lambda_1$ increases from 0.1 to 0.9, the performance trend varies across different datasets. However, its variation does not cause significant changes in model performance, indicating that the overall model performance remains relatively stable. This implies a relatively high robustness of the proposed model.

Hidden Dimension. To investigate the impact of the hidden dimension on model performance, we conduct experiments on five representative datasets. The results are illustrated in Figure LABEL:fig:_Hyper-parameter-HddenDimension. Based on our observations, we can preliminarily conclude that higher dimensionality does not necessarily lead to better performance. In certain intervals, increasing the dimensionality can actually degrade the model's performance. On most datasets, however, changing the dimensionality has only a minimal impact, and the performance of the model remains relatively stable.

4.5 Visualization

To better understand our proposed model, we employ t-SNE 2008_JMLR_tSNE to visualize the embeddings learned by HC-GLAD, as shown in Figure LABEL:fig:_Visualization. We can see that the embeddings of $view_1$ and $view_2$ learned via the graph-channel or hypergraph-channel can already separate most normal graphs from anomalous graphs. However, it is ultimately the scoring mechanism designed in HC-GLAD that distinctly differentiates normal graphs from anomalous graphs.

5 Conclusion and Limitation

In this paper, we propose a novel framework named HC-GLAD, which integrates the strengths of hypergraph learning and hyperbolic learning to jointly enhance the performance of UGAD. Concretely, we employ hypergraphs built on gold motifs to exploit node group information and utilize hyperbolic geometry to explore the latent hierarchical information. To the best of our knowledge, this is the first work to introduce hypergraphs exploiting node group connections together with hyperbolic geometry to the UGAD task. Through extensive experiments, we validate the superiority of HC-GLAD on 12 real-world datasets from different fields. One limitation of our method is that integrating multiple learning paradigms in one framework may increase the computational cost. In the future, we will explore the design of lightweight yet efficient frameworks to overcome this limitation. We believe that our work can contribute to the advancement of the relevant field, promote societal progress, and benefit humanity.

References

  • [1] Rongrong Ma, Guansong Pang, Ling Chen, and Anton van den Hengel. Deep graph-level anomaly detection by glocal knowledge distillation. In Proceedings of the fifteenth ACM international conference on web search and data mining, pages 704–714, 2022.
  • [2] Xuexiong Luo, Jia Wu, Jian Yang, Shan Xue, Hao Peng, Chuan Zhou, Hongyang Chen, Zhao Li, and Quan Z Sheng. Deep graph level anomaly detection with contrastive learning. Scientific Reports, 12(1):19867, 2022.
  • [3] Yixin Liu, Kaize Ding, Qinghua Lu, Fuyi Li, Leo Yu Zhang, and Shirui Pan. Towards self-interpretable graph-level anomaly detection. Advances in Neural Information Processing Systems, 36, 2024.
  • [4] Yixin Liu, Kaize Ding, Huan Liu, and Shirui Pan. Good-d: On unsupervised graph out-of-distribution detection. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 339–347, 2023.
  • [5] Zhenyang Yu, Xinye Wang, Bingzhe Zhang, Zhaohang Luo, and Lei Duan. Tuaf: Triple-unit-based graph-level anomaly detection with adaptive fusion readout. In International Conference on Database Systems for Advanced Applications, pages 415–430. Springer, 2023.
  • [6] Jindong Li, Qianli Xing, Qi Wang, and Yi Chang. Cvtgad: Simplified transformer with cross-view attention for unsupervised graph-level anomaly detection. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 185–200. Springer, 2023.
  • [7] Chaoxi Niu, Guansong Pang, and Ling Chen. Graph-level anomaly detection via hierarchical memory networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 201–218. Springer, 2023.
  • [8] Maximilian Nickel, Xueyan Jiang, and Volker Tresp. Reducing the rank in relational factorization models by including observable patterns. Advances in Neural Information Processing Systems, 27, 2014.
  • [9] Erzsébet Ravasz and Albert-László Barabási. Hierarchical organization in complex networks. Physical review E, 67(2):026112, 2003.
  • [10] Joey Bose, Ariella Smofsky, Renjie Liao, Prakash Panangaden, and Will Hamilton. Latent variable modelling with hyperbolic normalizing flows. In International Conference on Machine Learning, pages 1045–1055. PMLR, 2020.
  • [11] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in neural information processing systems, 30, 2017.
  • [12] Yonghui Yang, Le Wu, Kun Zhang, Richang Hong, Hailin Zhou, Zhiqiang Zhang, Jun Zhou, and Meng Wang. Hyperbolic graph learning for social recommendation. IEEE Transactions on Knowledge and Data Engineering, 2023.
  • [13] Rik Sarkar. Low distortion delaunay embedding of trees in hyperbolic plane. In International symposium on graph drawing, pages 355–366. Springer, 2011.
  • [14] Xiao Wang, Yiding Zhang, and Chuan Shi. Hyperbolic heterogeneous information network embedding. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 5337–5344, 2019.
  • [15] Xiaoxiao Ma, Jia Wu, Shan Xue, Jian Yang, Chuan Zhou, Quan Z Sheng, Hui Xiong, and Leman Akoglu. A comprehensive survey on graph anomaly detection with deep learning. IEEE Transactions on Knowledge and Data Engineering, 35(12):12012–12038, 2021.
  • [16] Lingxiao Zhao and Leman Akoglu. On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights. Big Data, 11(3):151–180, 2023.
  • [17] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
  • [18] Chen Qiu, Marius Kloft, Stephan Mandt, and Maja Rudolph. Raising the bar in graph-level anomaly detection. In International Joint Conference on Artificial Intelligence, 2022.
  • [19] Menglin Yang, Min Zhou, Zhihao Li, Jiahong Liu, Lujia Pan, Hui Xiong, and Irwin King. Hyperbolic graph neural networks: A review of methods and applications. arXiv preprint arXiv:2202.13852, 2022.
  • [20] Menglin Yang, Zhihao Li, Min Zhou, Jiahong Liu, and Irwin King. Hicf: Hyperbolic informative collaborative filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2212–2221, 2022.
  • [21] Qi Liu, Maximilian Nickel, and Douwe Kiela. Hyperbolic graph neural networks. Advances in neural information processing systems, 32, 2019.
  • [22] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional neural networks. Advances in neural information processing systems, 32, 2019.
  • [23] Gregor Bachmann, Gary Bécigneul, and Octavian Ganea. Constant curvature graph convolutional networks. In International conference on machine learning, pages 486–496. PMLR, 2020.
  • [24] Yiding Zhang, Xiao Wang, Chuan Shi, Xunqiang Jiang, and Yanfang Ye. Hyperbolic graph attention network. IEEE Transactions on Big Data, 8(6):1690–1701, 2021.
  • [25] Yiding Zhang, Xiao Wang, Chuan Shi, Nian Liu, and Guojie Song. Lorentzian graph convolutional networks. In Proceedings of the web conference 2021, pages 1249–1261, 2021.
  • [26] Menglin Yang, Min Zhou, Jiahong Liu, Defu Lian, and Irwin King. Hrcf: Enhancing collaborative filtering via hyperbolic geometric regularization. In Proceedings of the ACM Web Conference 2022, pages 2462–2471, 2022.
  • [27] Xingcheng Fu, Yuecen Wei, Qingyun Sun, Haonan Yuan, Jia Wu, Hao Peng, and Jianxin Li. Hyperbolic geometric graph representation learning for hierarchy-imbalance node classification. In Proceedings of the ACM Web Conference 2023, pages 460–468, 2023.
  • [28] Yue Gao, Zizhao Zhang, Haojie Lin, Xibin Zhao, Shaoyi Du, and Changqing Zou. Hypergraph learning: Methods and practices. IEEE transactions on pattern analysis and machine intelligence, 44(5):2548–2566, 2022.
  • [29] Alessia Antelmi, Gennaro Cordasco, Mirko Polato, Vittorio Scarano, Carmine Spagnuolo, and Dingqi Yang. A survey on hypergraph representation learning. ACM Computing Surveys, 56(1):1–38, 2023.
  • [30] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 3558–3565, 2019.
  • [31] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, and Partha Talukdar. Hypergcn: A new method for training graph convolutional networks on hypergraphs. Advances in neural information processing systems, 32, 2019.
  • [32] Yue Gao, Yifan Feng, Shuyi Ji, and Rongrong Ji. Hgnn+: General hypergraph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3181–3199, 2022.
  • [33] Shuyi Ji, Yifan Feng, Rongrong Ji, Xibin Zhao, Wanwan Tang, and Yue Gao. Dual channel hypergraph collaborative filtering. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2020–2029, 2020.
  • [34] Junwei Zhang, Min Gao, Junliang Yu, Lei Guo, Jundong Li, and Hongzhi Yin. Double-scale self-supervised hypergraph learning for group recommendation. In Proceedings of the 30th ACM international conference on information & knowledge management, pages 2557–2567, 2021.
  • [35] Jiadi Han, Qian Tao, Yufei Tang, and Yuhan Xia. Dh-hgcn: dual homogeneity hypergraph convolutional network for multiple social recommendations. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pages 2190–2194, 2022.
  • [36] Lianghao Xia, Chao Huang, Yong Xu, Jiashu Zhao, Dawei Yin, and Jimmy Huang. Hypergraph contrastive collaborative filtering. In Proceedings of the 45th International ACM SIGIR conference on research and development in information retrieval, pages 70–79, 2022.
  • [37] Yue Tan, Yixin Liu, Guodong Long, Jing Jiang, Qinghua Lu, and Chengqi Zhang. Federated learning on non-iid graphs via structural knowledge sharing. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 9953–9961, 2023.
  • [38] Geon Lee, Jihoon Ko, and Kijung Shin. Hypergraph motifs: concepts, algorithms, and discoveries. arXiv preprint arXiv:2003.01853, 2020.
  • [39] Junliang Yu, Hongzhi Yin, Jundong Li, Qinyong Wang, Nguyen Quoc Viet Hung, and Xiangliang Zhang. Self-supervised multi-channel hypergraph convolutional network for social recommendation. In Proceedings of the web conference 2021, pages 413–424, 2021.
  • [40] Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the lorentz model of hyperbolic geometry. In International conference on machine learning, pages 3779–3788. PMLR, 2018.
  • [41] Jianing Sun, Zhaoyue Cheng, Saba Zuberi, Felipe Pérez, and Maksims Volkovs. Hgcf: Hyperbolic graph convolution networks for collaborative filtering. In Proceedings of the Web Conference 2021, pages 593–601, 2021.
  • [42] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • [43] Song Bai, Feihu Zhang, and Philip HS Torr. Hypergraph convolution and hypergraph attention. Pattern Recognition, 110:107637, 2021.
  • [44] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. Tudataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020.
  • [45] Lingxiao Zhao and Leman Akoglu. On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights. Big Data, 11(3):151–180, 2023.
  • [46] Nino Shervashidze, Pascal Schweitzer, Erik Jan Van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.
  • [47] Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine Learning, 102:209–245, 2016.
  • [48] Larry M Manevitz and Malik Yousef. One-class svms for document classification. Journal of machine Learning research, 2(Dec):139–154, 2001.
  • [49] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 eighth ieee international conference on data mining, pages 413–422. IEEE, 2008.
  • [50] Fan-Yun Sun, Jordon Hoffman, Vikas Verma, and Jian Tang. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In International Conference on Learning Representations, 2020.
  • [51] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. Advances in neural information processing systems, 33:5812–5823, 2020.
  • [52] Silvere Bonnabel. Stochastic gradient descent on riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013.
  • [53] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • [54] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
  • [55] Lirong Wu, Haitao Lin, Cheng Tan, Zhangyang Gao, and Stan Z Li. Self-supervised learning on graphs: Contrastive, generative, or predictive. IEEE Transactions on Knowledge and Data Engineering, 35(4):4216–4235, 2021.
  • [56] Yixin Liu, Ming Jin, Shirui Pan, Chuan Zhou, Yu Zheng, Feng Xia, and Philip S Yu. Graph self-supervised learning: A survey. IEEE transactions on knowledge and data engineering, 35(6):5879–5900, 2022.
  • [57] Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. Gcc: Graph contrastive coding for graph neural network pre-training. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1150–1160, 2020.
  • [58] Petar Velickovic, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. In International Conference on Learning Representations, 2019.
  • [59] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, pages 2069–2080, 2021.
  • [60] Yixin Liu, Yizhen Zheng, Daokun Zhang, Vincent Lee, and Shirui Pan. Beyond smoothing: Unsupervised graph representation learning with edge heterophily discriminating. In Proceedings of the AAAI conference on artificial intelligence, 2023.
  • [61] M. Gromov. Hyperbolic Groups, pages 75–263. Springer New York, New York, NY, 1987.
  • [62] Alexandru Tifrea, Gary Bécigneul, and Octavian-Eugen Ganea. Poincare glove: Hyperbolic word embeddings. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
  • [63] Réka Albert, Bhaskar DasGupta, and Nasim Mobasheri. Topological implications of negative curvature for biological and social networks. Phys. Rev. E, 89:032811, Mar 2014.
  • [64] Menglin Yang, Min Zhou, Hui Xiong, and Irwin King. Hyperbolic temporal network embedding. IEEE Trans. Knowl. Data Eng., 35(11):11489–11502, 2023.

Appendix A Supplementary Related Work

A.1 Graph Contrastive Learning

Graph contrastive learning employs the principle of mutual information maximization to extract rich representations by optimizing instances with similar semantic content [55, 56]. This approach has been widely applied and achieves outstanding performance in unsupervised graph representation learning [50, 51, 57, 58, 59, 60]. For example, GraphCL [51] proposes four types of data augmentations for graph-structured data to create pairs for contrastive learning. In the context of graph classification, InfoGraph [50] aims to maximize the mutual information between graph-level and substructure-level representations, with the latter being computed at various scales. Recent research has also applied graph contrastive learning to graph-level anomaly detection. For instance, GLADC [2] captures both node-level and graph-level representations using a dual-graph encoder module within a contrastive learning framework. GOOD-D [4] detects anomalous graphs by identifying semantic inconsistencies across different granularities through a hierarchical contrastive learning framework. CVTGAD [6] similarly incorporates graph contrastive learning principles, utilizing a transformer for unsupervised graph-level anomaly detection and explicitly accounting for co-occurrence between different views.

Appendix B Method Discussion

B.1 Algorithm

The overall algorithm of HC-GLAD is summarized in Algorithm 1.

Input: Graph set $\mathcal{G}=\{G_{1},G_{2},\ldots,G_{m}\}$;
Output: The anomaly score $Score_{G}$ for each graph;
Initialize: (i) graph data augmentation: obtain two augmented graphs (i.e., $view_{1}$ and $view_{2}$) using the perturbation-free graph augmentation strategy [4, 37];
(ii) hypergraph construction: construct the hypergraph by "gold motif".
Training Phase
for $i=1$ to $s\_epochs$ do
       Obtain initial hyperbolic node state $e^{0}$ by Eq. (6).
       Hyperbolic graph aggregation.
       Hyperbolic hypergraph aggregation.
       Graph-channel: (i) conduct node-level contrast by Eq. (9);
                           (ii) conduct graph-level contrast by Eq. (11).
       Hypergraph-channel: (i) conduct node-level contrast by Eq. (9);
                                  (ii) conduct graph-level contrast by Eq. (11).
       Calculate graph-channel loss by Eq. (13).
       Calculate hypergraph-channel loss in the same way as the graph-channel loss.
       Calculate the total loss by Eq. (14).
end for
Inference Phase
for $G_{i}$ in graph set $\mathcal{G}$ do
       Calculate anomaly scores via Eq. (15).
end for
Algorithm 1 HC-GLAD
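
As an illustration of the first step in the training loop, the following minimal sketch maps a Euclidean feature vector onto the hyperboloid. Eq. (6) is not reproduced in this appendix, so the sketch assumes the standard exponential map at the origin of the hyperboloid model with curvature $-1$, which is a common choice in hyperbolic graph neural networks; it may differ from the exact form used in HC-GLAD. The function names `minkowski_dot`, `expmap0`, and `lorentz_distance` are hypothetical, and plain Python is used instead of the PyTorch implementation.

```python
import math


def minkowski_dot(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap0(v):
    """Map a tangent vector v at the origin onto the unit hyperboloid.

    Assumption: curvature -1 and origin o = (1, 0, ..., 0).
    """
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        # The zero vector maps to the origin of the hyperboloid.
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]


def lorentz_distance(x, y):
    """Geodesic distance on the unit hyperboloid."""
    # Clamp to 1.0 to guard against rounding slightly below the acosh domain.
    return math.acosh(max(1.0, -minkowski_dot(x, y)))


origin = [1.0, 0.0, 0.0]
e0 = expmap0([0.3, 0.4])  # Euclidean norm of the input is 0.5
```

Under these assumptions, `e0` satisfies $\langle e^{0}, e^{0}\rangle_{\mathcal{L}} = -1$, and its distance to the origin equals the Euclidean norm of the input vector.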

Appendix C Supplement of Experiments

C.1 Datasets

More details about the datasets used in our experiments are given in Table 2.

Table 2: The statistics of datasets of our experiments from TUDataset [44].
Dataset PROTEINS_full ENZYMES AIDS DHFR BZR COX2 DD REDDIT-B HSE MMP p53 PPAR-gamma
Graphs 1113 600 2000 467 405 467 1178 2000 8417 7558 8903 8451
Avg. Nodes 39.06 32.63 15.69 42.43 35.75 41.22 284.32 429.63 16.89 17.62 17.92 17.38
Avg. Edges 72.82 62.14 16.20 44.54 38.36 43.45 715.66 497.75 17.23 17.98 18.34 17.72

C.2 Hyperbolicity

To measure hyperbolic nature in the datasets, we introduce the hyperbolicity $\delta$ proposed by Gromov [61]. In general, the hyperbolicity $\delta$ quantifies the tree-likeness of a graph. The lower the value of $\delta$, the more tree-like the structure, suitable to embed in hyperbolic space [62]. When $\delta = 0$, the graph can be considered a tree [63, 64]. The hyperbolicity is based on the 4-node condition, a quadruple of distinct nodes $n_{1}$, $n_{2}$, $n_{3}$, $n_{4}$ in a graph. Let $\pi=(\pi_{1},\pi_{2},\pi_{3},\pi_{4})$ be a permutation of node indices 1, 2, 3, and 4, such that

$$
\begin{split}
S_{n_{1},n_{2},n_{3},n_{4}} &= d(n_{\pi_{1}},n_{\pi_{2}})+d(n_{\pi_{3}},n_{\pi_{4}})\\
&\leq M_{n_{1},n_{2},n_{3},n_{4}} = d(n_{\pi_{1}},n_{\pi_{3}})+d(n_{\pi_{2}},n_{\pi_{4}})\\
&\leq L_{n_{1},n_{2},n_{3},n_{4}} = d(n_{\pi_{1}},n_{\pi_{4}})+d(n_{\pi_{2}},n_{\pi_{3}}),
\end{split}
\tag{16}
$$

where $d$ is the shortest path length, and define

$$
\delta^{+}=\frac{L_{n_{1},n_{2},n_{3},n_{4}}-M_{n_{1},n_{2},n_{3},n_{4}}}{2}. \tag{17}
$$

The worst-case hyperbolicity [61] is defined as the maximum value of $\delta^{+}$ among all quadruples in the graph, i.e.,

$$
\delta_{worst}=\max_{n_{1},n_{2},n_{3},n_{4}}\{\delta^{+}\}. \tag{18}
$$

The average hyperbolicity [63] is defined as the average value of $\delta^{+}$ among all quadruples in the graph, i.e.,

$$
\delta_{avg}=\frac{1}{\binom{n}{4}}\sum_{n_{1},n_{2},n_{3},n_{4}}\{\delta^{+}\}, \tag{19}
$$

where $n$ is the number of nodes in the graph.

A graph $\mathcal{G}$ is called $\delta$-hyperbolic if $\delta_{worst}(\mathcal{G})\leq\delta$ [63]. We adopt the aforementioned $\delta_{worst}$ as the hyperbolicity $\delta$ of the datasets, which to some extent reflects the underlying hyperbolic geometry of the graph. Additionally, we report the average hyperbolicity $\delta_{avg}$, which is robust to the addition or removal of an edge from the graph [25]. Given that the time complexity for calculating $\delta_{worst}$ and $\delta_{avg}$ is $O(n^{4})$, we employ a random sampling method to approximate the calculations [22, 24, 25]. The results are illustrated in Table 3.
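
The definitions in Eqs. (16)–(19) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code used to produce the reported values: it assumes a connected, unweighted graph with at least four nodes, given as an adjacency dictionary, and the names `bfs_distances`, `delta_plus`, and `hyperbolicity` as well as the `num_samples` and `seed` parameters are hypothetical. With `num_samples=None` all quadruples are enumerated; otherwise random quadruples are drawn, which mirrors the sampling idea described above without reproducing any specific sampling scheme.

```python
import itertools
import random
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def delta_plus(dist, a, b, c, d):
    """Four-point value (L - M) / 2 for one quadruple, as in Eq. (17)."""
    sums = sorted([
        dist[a][b] + dist[c][d],
        dist[a][c] + dist[b][d],
        dist[a][d] + dist[b][c],
    ])  # sums == [S, M, L]
    return (sums[2] - sums[1]) / 2


def hyperbolicity(adj, num_samples=None, seed=0):
    """Return (delta_worst, delta_avg) for a connected graph with >= 4 nodes."""
    nodes = list(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    if num_samples is None:
        quads = itertools.combinations(nodes, 4)  # exact: all quadruples
    else:
        rng = random.Random(seed)
        quads = (rng.sample(nodes, 4) for _ in range(num_samples))
    values = [delta_plus(dist, *q) for q in quads]
    return max(values), sum(values) / len(values)


# A path is a tree; a 4-cycle is the smallest graph with delta_worst = 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(hyperbolicity(path))   # (0.0, 0.0)
print(hyperbolicity(cycle))  # (1.0, 1.0)
```

The path graph yields zero hyperbolicity, consistent with the statement that $\delta = 0$ corresponds to a tree, while the 4-cycle yields $\delta_{worst} = 1$.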

Table 3: The hyperbolicity $\delta$ and average hyperbolicity $\delta_{avg}$ of datasets.
Dataset PROTEINS_full ENZYMES AIDS DHFR BZR COX2 DD REDDIT-B HSE MMP p53 PPAR-gamma
$\delta$ 1.09 1.15 0.74 1.01 1.11 1.00 3.74 0.97 0.76 0.77 0.78 0.77
$\delta_{avg}$ 0.14 0.15 0.15 0.12 0.18 0.09 0.64 0.05 0.12 0.12 0.12 0.12

C.3 Hyper-parameters Analysis

Number of Layers in GNN Encoder. To investigate the impact of the number of GNN encoder layers on model performance, we conduct experiments on four representative datasets. The results are shown in Figure LABEL:fig:_Hyper-parameter-GNNLayerNumber. We observe that the model already exhibits promising performance when the number of layers is set to 2, and that increasing the number of layers does not lead to significant performance improvements. On the contrary, when the number of layers reaches 6, performance commonly degrades, which we attribute to over-smoothing.

C.4 Hyper-parameters Selection

We select the hyper-parameters of our proposed model through grid search. Concretely, the hyper-parameters for each dataset are provided in the Anonymous GitHub repository. The grid search is conducted on the following search space:

  • Number of epochs: {10, 50, 100, 200, 500, 1000, 1500, 2000, 2500}

  • Learning rate: {1e-2, 1e-3, 1e-4, 1e-5}

  • Layer number of GNN encoders: {2, 3, 4, 5, 6, 7}

  • Hidden dimension of encoders: {2, 4, 8, 16, 32, 64}

  • Model type of graph-channel encoder: {GIN, GCN, GAT}

  • Model type of hypergraph-channel encoder: {HGNN, HGCN, HGAT}

  • Layer number of MLP encoders: {1, 2, 3, 4, 5}

  • Trade-off parameter of $\mathcal{L}_{total}$ for graph-channel and hypergraph-channel: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}

  • Weight decay of optimizer: {0.001, 0.005, 0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30}

  • Momentum of optimizer: {0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99}

  • Temperature coefficient in contrastive loss function: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2}
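
The search space above can be written down as a small configuration sketch. The dictionary keys and the helper `iter_configs` are hypothetical names chosen for illustration and are not taken from the released code. The sketch only enumerates configurations lazily and counts them; how the grid was actually traversed for each dataset is not specified here.

```python
import itertools
import math

# Search space as listed above; key names are illustrative assumptions.
search_space = {
    "epochs": [10, 50, 100, 200, 500, 1000, 1500, 2000, 2500],
    "lr": [1e-2, 1e-3, 1e-4, 1e-5],
    "gnn_layers": [2, 3, 4, 5, 6, 7],
    "hidden_dim": [2, 4, 8, 16, 32, 64],
    "graph_encoder": ["GIN", "GCN", "GAT"],
    "hypergraph_encoder": ["HGNN", "HGCN", "HGAT"],
    "mlp_layers": [1, 2, 3, 4, 5],
    "trade_off": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "weight_decay": [0.001, 0.005, 0.01, 0.02, 0.05,
                     0.10, 0.15, 0.20, 0.25, 0.30],
    "momentum": [0.90, 0.91, 0.92, 0.93, 0.94,
                 0.95, 0.96, 0.97, 0.98, 0.99],
    "temperature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
                    0.7, 0.8, 0.9, 1.0, 1.1, 1.2],
}


def iter_configs(space):
    """Lazily yield one dict per point of the Cartesian product."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# Size of the full grid, computed without materializing it.
total = math.prod(len(v) for v in search_space.values())
first = next(iter_configs(search_space))
print(total)  # 699840000
```

The full Cartesian product contains 699,840,000 configurations, so in a setting like this the generator form is useful for iterating over a subset of the grid rather than materializing it.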