HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection

Yali Fu¹ **dong Li¹ Jiahong Liu² Qianli Xing¹ Qi Wang¹,³ Irwin King²
¹Jilin University, ²The Chinese University of Hong Kong,
³Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Ministry of Education, China
{fuyl23, jdli21}@mails.jlu.edu.cn, {qianlixing, qiwang}@jlu.edu.cn,
[email protected], [email protected] Equal Contribution. Corresponding Author.

Abstract

Unsupervised graph-level anomaly detection (UGAD) has garnered increasing attention in recent years due to its significance. However, most existing methods rely only on traditional graph neural networks to explore pairwise relationships, and pairwise edges alone are not enough to describe the multifaceted relationships involved in anomalies. There is an urgent need to exploit node group information, which plays a crucial role in UGAD. In addition, most previous works ignore global underlying properties (e.g., hierarchy and power-law structure) that are common in real-world graph datasets and are therefore indispensable factors for the UGAD task. In this paper, we propose a novel Dual Hyperbolic Contrastive Learning framework for Unsupervised Graph-Level Anomaly Detection (HC-GLAD for short). To exploit node group connections, we construct hypergraphs based on gold motifs and subsequently perform hypergraph convolution. Furthermore, to preserve the hierarchy of real-world graphs, we introduce hyperbolic geometry into this field and conduct both graph and hypergraph embedding learning in hyperbolic space with the hyperboloid model. To the best of our knowledge, this is the first work to simultaneously apply hypergraphs with node group connections and hyperbolic geometry in this field. Extensive experiments on several real-world datasets from different fields demonstrate the superiority of HC-GLAD on the UGAD task. The code is available at https://github.com/Yali-F/HC-GLAD.

1 Introduction

Graph-level anomaly detection helps uncover anomalous behaviors hidden within complex graph structures 2022_WSDM_GLocalKD ; 2022_ScientificReports_GLADC ; 2023(2024)_NeurIPS_SIGNET , which has been widely applied in various fields, including social network analysis, bioinformatics, and network security. Unlike traditional anomaly detection methods that focus on individual data points or samples, graph-level anomaly detection focuses on the overall structure, topology, or features of the entire graph. Recently, there has been a growing interest in unsupervised graph-level anomaly detection as it offers an advantage by not relying on labeled data, rendering it applicable across various real-world scenarios. Despite the considerable research and exploration already conducted in this area 2023_WSDM_GOOD-D ; 2023_DASFAA_TUAF ; 2023_ECMLPKDD_CVTGAD ; 2023_ECMLPKDD_HimNet , there are still several issues that need to be further explored.

Firstly, most existing methods only exploit the pairwise relationships to conduct unsupervised graph-level anomaly detection. However, in the real world, the relationship among graph data is not merely pairwise. It also encompasses more complex relationships, such as node group relationships. For instance, as shown in Figure 1(a), the decisive factor in determining whether a graph is a normal graph or an anomalous graph lies in the number of distinct groups to which the central group is connected outwardly 2023(2024)_NeurIPS_SIGNET . This parallels the classification of molecules in chemistry and the assessment of whether molecule substances are anomalous. In fact, node group information needs to be considered not only in the field of chemical molecules but also in other real-world scenarios. Therefore, there is an urgent need to exploit node group information to capture key patterns for UGAD.

Figure 1: Toy examples illustrating the two primary challenges: (a) node group information and (b) hierarchy information; (c) contrasts the characteristics of hyperbolic space and Euclidean space.

Secondly, the majority of current methods are based on GNNs established in Euclidean space 2022_WSDM_GLocalKD ; 2023_WSDM_GOOD-D . However, the dimensionality of Euclidean space is a fundamental limitation on its ability to represent complex patterns 2014_NeurIPS_ARE . It has been demonstrated that numerous real-world datasets exhibit characteristics akin to those of complex networks, including latent hierarchical structure and power-law degree distributions 2003_PhysicalReview_Hierarchical ; 2020_ICML_WHC_Hyperbolic . For example, as shown in Figure 1(b), small groups of nodes are organized hierarchically into increasingly large groups. This tree-like hierarchical organization leads to a power-law distribution of node degrees 2003_PhysicalReview_Hierarchical ; 2017_NeurIPS_PoincareBall ; 2023_TKDE_HGSR_Hyperbolic . Nevertheless, Euclidean space cannot embed latent hierarchies without suffering from high distortion, and Euclidean methods are ill-equipped to model the hierarchy information of graph data 2020_ICML_WHC_Hyperbolic ; 2011_ISGD_DelaunayEmbedding_Hyperbolic , as shown in Figure 1(c). Therefore, it is necessary to employ a new paradigm or space to exploit the latent hierarchical information in UGAD.

Based on the aforementioned challenges and analysis, we propose a novel Dual Hyperbolic Contrastive Learning framework for Unsupervised Graph-Level Anomaly Detection, namely HC-GLAD. Concretely, for the first challenge, we build a hypergraph based on the gold motif and perform hypergraph convolution to exploit node group information. By constructing a hypergraph with anomaly awareness and then applying hypergraph convolution, information is transferred between node groups through hyperedges, which compensates for the shortcomings of existing methods. For the second challenge, we incorporate hyperbolic geometry into UGAD and conduct both graph and hypergraph embedding learning in hyperbolic space, utilizing the hyperboloid model to exploit the latent hierarchical information and enhance the performance of UGAD. Hyperbolic space can naturally represent and capture hierarchical information in graph data with a rich hierarchical or tree-like structure 2020_ICML_WHC_Hyperbolic . The power-law distribution and hierarchical structure in hyperbolic space are interrelated, mutually reinforcing, and collectively shaping the features and properties of the network 2019_AAAI_HHNE_Hyperbolic . Our major contributions are summarized as follows:

• We propose a novel dual hyperbolic contrastive learning framework for unsupervised graph-level anomaly detection (HC-GLAD). To the best of our knowledge, this is the first work to introduce hypergraphs that exploit node group connections, together with hyperbolic geometry, to the unsupervised graph-level anomaly detection task.

• We utilize hypergraphs to explore node group information based on the gold motif. In addition, we employ hyperbolic geometry to leverage latent hierarchical information that cannot be captured well in Euclidean space. The advantages of hypergraph learning, hyperbolic learning, and contrastive learning are integrated in a unified framework to jointly improve model performance.

• We conduct extensive experiments on 12 real-world datasets, demonstrating the effectiveness and superiority of HC-GLAD for unsupervised graph-level anomaly detection.

2 Related Work

2.1 Graph-Level Anomaly Detection

In the context of graph data analysis, the objective of graph-level anomaly detection is to discern abnormal graphs from normal ones, wherein anomalous graphs often represent rare but pivotal patterns 2021_TKDE_SurveyofGAD . OCGIN 2021(2023)_BigData_OCGIN is the first representative model; it integrates one-class classification and the graph isomorphism network (GIN) 2019_ICLR_GIN for graph-level anomaly detection. OCGTL 2022_IJCAI_OCGTL integrates the strengths of deep one-class classification and neural transformation learning. GLocalKD 2022_WSDM_GLocalKD implements joint random distillation to detect both locally anomalous and globally anomalous graphs by training one graph neural network to predict another graph neural network. GOOD-D 2023_WSDM_GOOD-D introduces perturbation-free graph data augmentation and performs hierarchical contrastive learning to detect anomalous graphs based on semantic inconsistency at different levels. TUAF 2023_DASFAA_TUAF builds triple-unit graphs and further learns triple representations to simultaneously capture abundant information on edges and their corresponding nodes. CVTGAD 2023_ECMLPKDD_CVTGAD applies a transformer and cross-attention to UGAD, directly exploiting relationships across different views. SIGNET 2023(2024)_NeurIPS_SIGNET proposes a multi-view subgraph information bottleneck framework that infers anomaly scores and provides subgraph-level explanations.

2.2 Hyperbolic Learning on Graphs

Hyperbolic learning has attracted considerable attention because of the favorable geometric properties of hyperbolic space compared with Euclidean space (i.e., its volume grows exponentially with its radius) 2022_arXiv_Survey_Hyperbolic ; 2022_KDD_HICF . HGNN (hyperbolic graph neural network) 2019_NeurIPS_HGNN_Hyperbolic generalizes graph neural networks to Riemannian manifolds and improves the performance of the full-graph classification task. It fully utilizes the power of hyperbolic geometry and demonstrates that hyperbolic representations are suitable for capturing high-level structural information. HGCN (hyperbolic graph convolutional neural network) 2019_NeurIPS_HGCN_Hyperbolic leverages both the expressiveness of GCNs and hyperbolic geometry. $\kappa$-GCN 2020_ICML_k-GCN extends GCNs to stereographic models with both positive and negative curvatures, thereby offering a unified approach. HAT (hyperbolic graph attention network) 2021_BigData_HAT_Hyperbolic proposes a hyperbolic multi-head attention mechanism to acquire robust node representations in hyperbolic space and further improves the accuracy of node classification. LGCN 2021_WWW_LGCN introduces a unified framework of graph operations on the hyperboloid (i.e., feature transformation and non-linear activation), and proposes an elegant hyperbolic neighborhood aggregation based on the centroid of Lorentzian distance. HRCF 2022_WWW_HRCF_Hyperbolic designs a geometry-aware hyperbolic regularizer that boosts the optimization process through root alignment and an origin-aware penalty, enhancing the performance of hyperbolic collaborative filtering. HyperIMBA 2023_WWW_HyperIMBA_Hyperbolic explores the hierarchy-imbalance issue on hierarchical structures and captures the implicit hierarchy of graph nodes with hyperbolic geometry.

2.3 Hypergraph Learning

Due to the capability and flexibility in modeling complex correlations of graph data, hypergraph learning has earned more attention from both academia and industry 2022_TPAMI_Survey_Hypergraph . Hypergraphs naturally depict a wide array of systems characterized by group relationships among their interacting parts 2023_ACM_Survey_HyperGraph . HGNN (hypergraph neural network) 2019_AAAI_HGNN_Hypergraph designs a hyperedge convolution operation and encodes high-order data correlation in a hypergraph structure. HyperGCN 2019_NeurIPS_HyperGCN utilizes tools from spectral theory of hypergraphs and introduces a novel way to train GCN for semi-supervised learning and combinatorial optimization tasks. HGNN⁺ 2022_TPAMI_HGNN+_Hypergraph conceptually introduces "hyperedge group", and it bridges multi-modal/multi-type data and hyperedge. DHCF 2020_KDD_DHCF constructs two hypergraphs (i.e., user and item hypergraph) and introduces a jump hypergraph convolution (jHConv) to enhance collaborative filtering recommendation performance. HHGR 2021_CIKM_HHGR_Hypergraph builds user-level and group-level hypergraphs and employs a hierarchical hypergraph convolution network to capture complex high-order relationships within and beyond groups, thus improving the performance of group recommendation. DH-HGCN 2022_SIGIR_DH-HGCN_Hypergraph utilizes both a hypergraph convolution network and homogeneity study to explicitly learn high-order relationships among items and users to enhance multiple social recommendation performance. HCCF 2022_SIGIR_HCCF_Hypergraph designs a hypergraph-enhanced cross-view contrastive learning architecture to jointly capture local and global collaborative relations in recommender system.

A more extensive review of the literature is provided in Appendix A.

3 Methodology

In this section, we introduce the preliminaries and the dual hyperbolic contrastive learning framework for unsupervised graph-level anomaly detection (HC-GLAD). The overall framework and a brief procedure are illustrated in Figure 2. The pseudo-code of HC-GLAD is given in Appendix B.1.

3.1 Preliminaries

Notations. We denote a graph as $G=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ is the set of edges. The topology (i.e., structure) information of $G$ is represented by the adjacency matrix $A\in\mathbb{R}^{n\times n}$, where $n$ is the number of nodes. $A_{i,j}=1$ if there is an edge between nodes $v_{i}$ and $v_{j}$; otherwise, $A_{i,j}=0$. We denote an attributed graph as $G=(\mathcal{V},\mathcal{E},\mathcal{X})$, where $\mathcal{X}\in\mathbb{R}^{n\times{d_{attr}}}$ is the node feature matrix. Each row of $\mathcal{X}$ is a node’s feature vector of dimension $d_{attr}$. The graph set is denoted as $\mathcal{G}=\{G_{1},G_{2},...,G_{m}\}$, where $m$ is the number of graphs in $\mathcal{G}$.

Problem Definition. In this work, we focus on unsupervised graph-level anomaly detection task: in the training phase, we train the model only using normal graphs; in the inference phase, given a graph set $\mathcal{G}$ containing normal graphs and anomalous graphs, HC-GLAD aims to distinguish the anomalous graphs that are different from the normal graphs.

3.2 Data Preprocessing

Graph Data Augmentation. We employ the perturbation-free graph augmentation strategy 2023_WSDM_GOOD-D ; 2023_AAAI_FedStar to generate two augmented views (i.e., $view_{1}$ and $view_{2}$ ) for an input graph $G$ . Concretely, $view_{1}$ focuses more on attribute and is directly built by integrating the node attribute $\mathcal{X}$ (for attributed graph) and adjacency matrix $A$ . $view_{2}$ focuses more on structure and is built by structural encodings from the graph topology and then it is combined with adjacency matrix $A$ .

Hypergraph Construction. After obtaining two augmented views of a graph, we essentially have two augmented graphs. Inspired by 2020_VLDB_MoCHy ; 2021_WWW_MHCN , we leverage ternary relationships between nodes, using the "gold motif" (i.e., a triangular relationship formed by three nodes) as the starting point for constructing the hypergraph. Given the adjacency matrix $A$ of an augmented graph ($view_{1}$ or $view_{2}$), we first construct the relationship matrix $A_{relation}$ of the hypergraph by using the gold motif. It can be calculated by:

$$A_{relation}=(AA^{T})\odot A=(AA)\odot A, \tag{1}$$

where $A^{T}=A$ because $G$ is an undirected graph, so $A$ is symmetric.

We determine the higher-order relationships between vertices based on the matrix $\hat{A}_{relation}=A_{relation}+I_{N}$, where $I_{N}$ is the identity matrix. We then build the incidence matrix $\mathbf{H}_{inc}$: if vertex $v_{i}$ is contained in hyperedge $\epsilon$, then $H_{inc(i\epsilon)}=1$; otherwise, $H_{inc(i\epsilon)}=0$. While thoroughly investigating and utilizing the gold motif, we must also consider instances that do not constitute this kind of high-order relationship and ensure the integrity of the entire graph. Therefore, we also include the edges that are not part of any high-order relationship in the incidence matrix $\mathbf{H}_{inc}$. Finally, we obtain a hypergraph $HyperG$ with $N$ vertices and $M$ hyperedges, whose high-order relationships are represented by the incidence matrix $\mathbf{H}_{inc}\in\mathbb{R}^{N\times M}$.
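The construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the text does not spell out how hyperedges are enumerated, so the sketch assumes one hyperedge per triangle plus one two-node hyperedge for every edge that lies in no triangle. The function names `motif_relation` and `build_incidence` are hypothetical and are not taken from the released code.

```python
import numpy as np
from itertools import combinations

def motif_relation(A):
    """Eq. (1): entry (i, j) counts the triangles that contain edge (i, j)."""
    return (A @ A.T) * A

def build_incidence(A):
    """Assumed construction: one hyperedge per triangle (gold motif),
    plus a two-node hyperedge for each edge that is in no triangle."""
    n = A.shape[0]
    A_rel = motif_relation(A)
    hyperedges = []
    # Triangles: node triples whose three pairwise edges all exist.
    for i, j, k in combinations(range(n), 3):
        if A[i, j] and A[j, k] and A[i, k]:
            hyperedges.append((i, j, k))
    # Edges outside every triangle are kept so that no structure is lost.
    for i, j in combinations(range(n), 2):
        if A[i, j] and A_rel[i, j] == 0:
            hyperedges.append((i, j))
    H = np.zeros((n, len(hyperedges)))
    for e, nodes in enumerate(hyperedges):
        H[list(nodes), e] = 1.0
    return H, hyperedges

# Toy graph: triangle 0-1-2 with a pendant edge 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_rel = motif_relation(A)
H_inc, hyperedges = build_incidence(A)
```

On the toy graph, the triangle becomes a single three-node hyperedge and the pendant edge survives as a two-node hyperedge, so the incidence matrix has $N=4$ rows and $M=2$ columns.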

3.3 Lorentz Manifold

Hyperbolic space, defined by its constant negative curvature, diverges from the flatness of Euclidean geometry. The Lorentz manifold is often favored for its numerical stability, making it a popular choice in hyperbolic geometry applications 2018_ICML_nickel_Hyperbolic_RSGD .

Definition 1 (Lorentzian Inner Product)

The inner product $\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}$ for vectors $\mathbf{x},\mathbf{y}\in\mathbb{R}^{d+1}$ is defined by the expression $\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}=-x_{0}y_{0}+\sum_{i=1}^{d}x_{i}y_{i}$.

Definition 2 (Lorentz Manifold)

A $d$ -dimensional Lorentz manifold, denoted as $\mathcal{L}^{d}$ , with a constant negative curvature, is defined as the Riemannian manifold $(\mathbb{H}^{d},g_{\ell})$ . Here, we adopt the constant negative curvature of $-1$ , and $g_{\ell}$ is the metric tensor represented by $\operatorname{diag}([-1,1,\ldots,1])$ , and $\mathbb{H}^{d}$ is the set of all vectors $\mathbf{x}\in\mathbb{R}^{d+1}$ satisfying $\langle\mathbf{x},\mathbf{x}\rangle_{\mathcal{L}}=-1$ and $x_{0}>0$ .

Next, the corresponding Lorentzian distance function for two points $\mathbf{x},\mathbf{y}\in\mathcal{L}^{d}$ is provided as:

$$d_{\mathcal{L}}(\mathbf{x},\mathbf{y})=\operatorname{arcosh}\left(-\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}\right). \tag{2}$$
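As a quick numerical illustration of Definitions 1 and 2 and Eq. (2), the following NumPy sketch evaluates the Lorentzian inner product and distance. The helper `lift`, which places Euclidean coordinates on the hyperboloid by solving for $x_{0}$, and the clipping guard are illustrative assumptions rather than part of the method.

```python
import numpy as np

def lorentz_inner(x, y):
    """Definition 1: -x_0 * y_0 + sum_{i>=1} x_i * y_i."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def lorentz_distance(x, y):
    """Eq. (2). The clip guards against arguments slightly below 1
    that can arise from floating-point rounding."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def lift(u):
    """Hypothetical helper: put spatial coordinates u on the hyperboloid
    by choosing x_0 = sqrt(1 + ||u||^2), so that <x, x>_L = -1."""
    x0 = np.sqrt(1.0 + np.sum(u * u, axis=-1, keepdims=True))
    return np.concatenate([x0, u], axis=-1)

o = np.array([1.0, 0.0, 0.0])            # origin of the 2-dimensional manifold
x = lift(np.array([np.sinh(1.0), 0.0]))  # equals (cosh 1, sinh 1, 0)

on_manifold = lorentz_inner(x, x)   # close to -1
dist_to_origin = lorentz_distance(o, x)  # close to 1
```

The point $(\cosh 1,\sinh 1,0)$ lies at hyperbolic distance 1 from the origin, which gives a simple sanity check for an implementation.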

Definition 3 (Tangent Space)

For a point $\mathbf{x}\in\mathcal{L}^{d}$, the tangent space $\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$ consists of all vectors $\mathbf{v}$ that are orthogonal to $\mathbf{x}$ under the Lorentzian inner product. This orthogonality is defined such that $\langle\mathbf{x},\mathbf{v}\rangle_{\mathcal{L}}=0$. Therefore, the tangent space can be expressed as: $\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}=\left\{\mathbf{v}:\langle\mathbf{x},\mathbf{v}\rangle_{\mathcal{L}}=0\right\}.$

Definition 4 (Exponential and Logarithmic Maps)

Let $\mathbf{v}\in\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$. The exponential map $\exp_{\mathbf{x}}:\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}\rightarrow\mathcal{L}^{d}$ and the logarithmic map $\log_{\mathbf{x}}:\mathcal{L}^{d}\rightarrow\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$ are defined as follows:

$$\exp_{\mathbf{x}}(\mathbf{v})=\cosh(\|\mathbf{v}\|_{\mathcal{L}})\mathbf{x}+\sinh(\|\mathbf{v}\|_{\mathcal{L}})\frac{\mathbf{v}}{\|\mathbf{v}\|_{\mathcal{L}}}, \tag{3}$$

$$\log_{\mathbf{x}}(\mathbf{y})=d_{\mathcal{L}}(\mathbf{x},\mathbf{y})\frac{\mathbf{y}+\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}\mathbf{x}}{\left\|\mathbf{y}+\langle\mathbf{x},\mathbf{y}\rangle_{\mathcal{L}}\mathbf{x}\right\|_{\mathcal{L}}}, \tag{4}$$

where $\|\mathbf{v}\|_{\mathcal{L}}=\sqrt{\langle\mathbf{v},\mathbf{v}\rangle_{\mathcal{L}}}$ denotes the norm of $\mathbf{v}$ in $\mathcal{T}_{\mathbf{x}}\mathcal{L}^{d}$.

For computational convenience, the origin of the Lorentz manifold, denoted as $\mathbf{o}=(1,0,0,\ldots,0)$ in $\mathcal{L}^{d}$, is selected as the reference point for the exponential and logarithmic maps. This choice allows for simplified expressions of these mappings.

$$\exp_{\mathbf{o}}(\mathbf{v})=\exp_{\mathbf{o}}\left(\left[0,\mathbf{v}^{E}\right]\right)=\left(\cosh\left(\|\mathbf{v}^{E}\|_{2}\right),\sinh\left(\|\mathbf{v}^{E}\|_{2}\right)\frac{\mathbf{v}^{E}}{\|\mathbf{v}^{E}\|_{2}}\right), \tag{5}$$

where the $(,)$ denotes concatenation and the $\cdot^{E}$ denotes the embedding in Euclidean space 2021_WWW_LGCN .
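Eq. (5) and its inverse can be sketched as follows. The logarithmic map at the origin is obtained by substituting $\mathbf{x}=\mathbf{o}$ into Eq. (4): since $\langle\mathbf{o},\mathbf{y}\rangle_{\mathcal{L}}=-y_{0}$, the direction reduces to the normalized spatial part of $\mathbf{y}$ and the length to $\operatorname{arcosh}(y_{0})$. The function names and the small `eps` guard against division by zero are assumptions made for this illustration.

```python
import numpy as np

def exp_map_origin(v_e, eps=1e-9):
    """Eq. (5): map Euclidean tangent coordinates v^E to the hyperboloid."""
    norm = np.linalg.norm(v_e, axis=-1, keepdims=True)
    direction = v_e / np.maximum(norm, eps)   # eps avoids 0/0 at the origin
    return np.concatenate([np.cosh(norm), np.sinh(norm) * direction], axis=-1)

def log_map_origin(y, eps=1e-9):
    """Eq. (4) at x = o; returns the Euclidean part of the tangent vector
    (the zeroth tangent coordinate is always 0 and is dropped)."""
    dist = np.arccosh(np.clip(y[..., :1], 1.0, None))  # d(o, y) = arcosh(y_0)
    spatial = y[..., 1:]
    norm = np.linalg.norm(spatial, axis=-1, keepdims=True)
    return dist * spatial / np.maximum(norm, eps)

v = np.array([[0.3, -0.4],
              [0.0, 0.0]])          # second row is the zero tangent vector
y = exp_map_origin(v)                # points on the hyperboloid
v_back = log_map_origin(y)           # round trip back to tangent coordinates

# Lorentzian self inner product of every mapped point; should be -1.
self_inner = -y[:, 0] ** 2 + np.sum(y[:, 1:] ** 2, axis=1)
```

The zero tangent vector is mapped to the origin $\mathbf{o}$, and the round trip recovers the input up to floating-point error.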

3.4 Hyperbolic (Hyper-)Graph Convolution

Before we conduct hyperbolic (hyper-)graph convolution, we insert a value 0 in the zeroth dimension of the Euclidean state of the node for both $view_{1}$ and $view_{2}$. Following Eq. (5), the initial hyperbolic node state $\mathbf{e}^{0}$ is obtained by:

$$\mathbf{e}^{0}_{i}=\exp_{\mathbf{o}}([0,\mathbf{x}_{i}]), \tag{6}$$

where $\mathbf{x}$ is the initial feature (or encoding) from the augmented graphs (i.e., $view_{1}$ and $view_{2}$). $[0,\mathbf{x}]$ denotes the operation of inserting the value 0 into the zeroth dimension of $\mathbf{x}$, so that $[0,\mathbf{x}]$ always lies in the tangent space at the origin 2022_KDD_HICF ; 2021_WWW_HGCF and $\mathbf{e}^{0}$ lies on the manifold. The superscript 0 in $\mathbf{e}^{0}_{i}$ indicates the initial state.

3.4.1 Hyperbolic Graph Aggregation

Following 2022_KDD_HICF ; 2021_WWW_HGCF , we first map the initial embedding $e^{0}_{i}$ in hyperbolic space to the tangent space using the logarithmic map. Then, we select GCN as our fundamental graph encoder to perform graph convolution aggregation. The propagation rule in the $l$ -th layer on the $view_{1}$ can be expressed as:

$$\mathbf{H}^{(view_{1},~l)}_{graph}=\sigma\left(\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}}\mathbf{H}^{(view_{1},~l-1)}_{graph}\mathbf{W}^{(l-1)}\right), \tag{7}$$

where $\hat{\mathbf{A}}=\mathbf{A}+\mathbf{I}_{N}$ is the adjacency matrix of the input graph $G_{i}$ with added self-connections, and $\mathbf{I}_{N}$ is the identity matrix. $\hat{\mathbf{D}}$ is the degree matrix of $\hat{\mathbf{A}}$, $\mathbf{H}^{(view_{1},~l-1)}_{graph}$ is the node embedding matrix in the $(l-1)$-th layer on $view_{1}$, $\mathbf{W}^{(l-1)}$ is a layer-specific trainable weight matrix, and $\sigma(\cdot)$ is a non-linear activation function 2016_arXiv_GCN . The propagation on $view_{2}$ is computed in the same way. After we obtain the final embedding $\mathbf{h}^{l}$ of node $i$ in the tangent space, we map it from the tangent space back to hyperbolic space using the exponential map (Definition 4).
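The pipeline of this subsection (logarithmic map, Euclidean GCN propagation in the tangent space, exponential map) can be sketched in NumPy as below. This is a minimal illustration, not the released implementation: ReLU is assumed for $\sigma$, the weights are random, a single layer is used, and all function names are hypothetical.

```python
import numpy as np

def exp_map_origin(v, eps=1e-9):
    """Eq. (5): tangent coordinates at the origin -> hyperboloid."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.concatenate(
        [np.cosh(norm), np.sinh(norm) * v / np.maximum(norm, eps)], axis=-1)

def log_map_origin(y, eps=1e-9):
    """Inverse of exp_map_origin; returns Euclidean tangent coordinates."""
    dist = np.arccosh(np.clip(y[..., :1], 1.0, None))
    spatial = y[..., 1:]
    norm = np.linalg.norm(spatial, axis=-1, keepdims=True)
    return dist * spatial / np.maximum(norm, eps)

def gcn_layer(A, H, W):
    """Eq. (7) with sigma = ReLU (an assumed choice of activation)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-connections
    d_inv_sqrt = A_hat.sum(axis=1) ** -0.5         # diagonal of D_hat^{-1/2}
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)            # path graph on 3 nodes
X = 0.1 * rng.normal(size=(3, 4))                  # initial node features
W = 0.1 * rng.normal(size=(4, 2))                  # random layer weights

E0 = exp_map_origin(X)       # Eq. (6): initial hyperbolic states, shape (3, 5)
H0 = log_map_origin(E0)      # back to the tangent space, equals X
H1 = gcn_layer(A, H0, W)     # one propagation step in the tangent space
E1 = exp_map_origin(H1)      # final hyperbolic embeddings, shape (3, 3)
```

Because the aggregation is performed in the tangent space at the origin, any Euclidean encoder could in principle be substituted for `gcn_layer` without changing the two mapping steps.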

3.4.2 Hyperbolic Hypergraph Aggregation

Similar to hyperbolic graph aggregation, we first map the initial embedding $e^{0}_{i}$ in hyperbolic space to the tangent space using the logarithmic map, then we employ HGCN as our fundamental hypergraph encoder to perform hypergraph convolution aggregation. The propagation rule in the $l$ -th layer on the $view_{1}$ can be expressed as:

$$\mathbf{H}^{(view_{1},~l)}_{hyperg}=\sigma\left(\mathbf{D}^{-1/2}_{hyperg}\mathbf{H}_{inc}\mathbf{W}\mathbf{B}^{-1}\mathbf{H}^{T}_{inc}\mathbf{D}^{-1/2}_{hyperg}\mathbf{H}^{(view_{1},~l-1)}_{hyperg}\mathbf{P}\right), \tag{8}$$

where $\mathbf{D}_{hyperg}\in\mathbb{R}^{N\times N}$ is the vertex degree matrix, $\mathbf{B}\in\mathbb{R}^{M\times M}$ is the hyperedge degree matrix, $\mathbf{W}\in\mathbb{R}^{M\times M}$ is the hyperedge weight matrix, and $\mathbf{P}\in\mathbb{R}^{F^{(l-1)}\times F^{(l)}}$ is the weight matrix between the $(l-1)$-th and $l$-th layers 2021_PR_HGCN_HGAT_HyperGraph . The propagation on $view_{2}$ is computed in the same way. After we obtain the final embedding $\mathbf{h}^{l}$ of node $i$ in the tangent space, we map it from the tangent space back to hyperbolic space using the exponential map (Definition 4).
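A minimal NumPy sketch of the propagation in Eq. (8) is given below. It assumes unit hyperedge weights by default, ReLU for $\sigma$, and that every vertex belongs to at least one hyperedge (otherwise $\mathbf{D}^{-1/2}_{hyperg}$ is undefined). The function names are hypothetical.

```python
import numpy as np

def hypergraph_propagation(H_inc, w=None):
    """Return D^{-1/2} H W B^{-1} H^T D^{-1/2} from Eq. (8).
    Assumes every vertex lies in at least one hyperedge."""
    n, m = H_inc.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    d_v = H_inc @ w               # vertex degrees (weighted hyperedge count)
    d_e = H_inc.sum(axis=0)       # hyperedge degrees (number of member nodes)
    Dv_inv_sqrt = np.diag(d_v ** -0.5)
    return Dv_inv_sqrt @ H_inc @ np.diag(w / d_e) @ H_inc.T @ Dv_inv_sqrt

def hypergraph_conv(H_inc, X, P, w=None):
    """One layer of Eq. (8) with sigma = ReLU (an assumed activation)."""
    return np.maximum(hypergraph_propagation(H_inc, w) @ X @ P, 0.0)

# Incidence matrix of a toy hypergraph: hyperedge {0, 1, 2} and hyperedge {2, 3}.
H_inc = np.array([[1, 0],
                  [1, 0],
                  [1, 1],
                  [0, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))       # tangent-space node states
P = rng.normal(size=(3, 2))       # layer weight matrix

M = hypergraph_propagation(H_inc)
out = hypergraph_conv(H_inc, X, P)
d_v = H_inc.sum(axis=1)           # vertex degrees under unit weights
```

A useful sanity check is that the propagation matrix is symmetric and has $\mathbf{D}^{1/2}_{hyperg}\mathbf{1}$ as an eigenvector with eigenvalue 1, which follows directly from the normalization.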

3.5 Multi-Level Contrast

Following 2023_WSDM_GOOD-D ; 2023_ECMLPKDD_CVTGAD , we design a contrastive strategy that considers both node-level and graph-level contrast to train the model. Our proposed model comprises a graph channel and a hypergraph channel, and the multi-level contrast is computed similarly in both. We therefore describe it below for the graph channel.

Node-level Contrast. For an input graph $G_{i}$ , we first map node embedding into node-level contrast space with MLP-based projection head, and then we construct node-level contrastive loss to maximize the agreement between the embeddings belonging to different views on the node level:

$$\mathcal{L}_{node}=\frac{1}{|\mathcal{B}|}\sum_{G_{j}\in\mathcal{B}}\frac{1}{2|\mathcal{V}_{G_{j}}|}\sum_{v_{i}\in\mathcal{V}_{G_{j}}}\left[l\left(\mathbf{h}^{(view_{1})}_{i},\mathbf{h}^{(view_{2})}_{i}\right)+l\left(\mathbf{h}^{(view_{2})}_{i},\mathbf{h}^{(view_{1})}_{i}\right)\right], \tag{9}$$

$$l\left(\mathbf{h}^{(view_{1})}_{i},\mathbf{h}^{(view_{2})}_{i}\right)=-\log\frac{e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_{1})}_{i},~\mathbf{h}^{(view_{2})}_{i}\right)/\tau\right)}}{\sum_{v_{k}\in\mathcal{V}_{G_{j}}\backslash v_{i}}e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_{1})}_{i},~\mathbf{h}^{(view_{2})}_{k}\right)/\tau\right)}}. \tag{10}$$

In Eq. (9), $\mathcal{B}$ is the training/testing batch and $\mathcal{V}_{G_{j}}$ is the node set of graph $G_{j}$ . The calculation of $\emph{l}\left(\mathbf{h}^{(view_{2})}_{i},\mathbf{h}^{(view_{1})}_{i}\right)$ and $\emph{l}\left(\mathbf{h}^{(view_{1})}_{i},\mathbf{h}^{(view_{2})}_{i}\right)$ is the same, and we briefly show the calculation of $\emph{l}\left(\mathbf{h}^{(view_{1})}_{i},\mathbf{h}^{(view_{2})}_{i}\right)$ in Eq. (10). In Eq. (10), the $H_{Dist}\left(.,.\right)$ is the function to measure the hyperbolic distance between different views. In this work, we compute the Lorentzian distance as Eq. (2) indicates.
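For a single graph, Eqs. (9) and (10) can be sketched as follows, with the Lorentzian distance of Eq. (2) as $H_{Dist}$. The temperature value $\tau=0.2$, the toy points, and the helper names are assumptions made for illustration; the MLP projection head is omitted. As in Eq. (10), the denominator sums over the other nodes $v_{k}\neq v_{i}$ only.

```python
import numpy as np

def lift(u):
    """Hypothetical helper: place spatial coordinates on the hyperboloid."""
    x0 = np.sqrt(1.0 + np.sum(u * u, axis=-1, keepdims=True))
    return np.concatenate([x0, u], axis=-1)

def pairwise_lorentz_distance(X, Y):
    """Eq. (2) for every pair (row of X, row of Y)."""
    inner = -np.outer(X[:, 0], Y[:, 0]) + X[:, 1:] @ Y[:, 1:].T
    return np.arccosh(np.clip(-inner, 1.0, None))

def directed_loss(H1, H2, tau):
    """Eq. (10) for every node i: positive pair over the other nodes k != i."""
    sim = np.exp(-pairwise_lorentz_distance(H1, H2) / tau)
    pos = np.diag(sim)
    neg = sim.sum(axis=1) - pos        # exclude k = i from the denominator
    return -np.log(pos / neg)

def node_contrast_loss(H1, H2, tau=0.2):
    """Inner term of Eq. (9) for one graph (needs at least two nodes)."""
    return 0.5 * np.mean(directed_loss(H1, H2, tau) + directed_loss(H2, H1, tau))

U = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
view1 = lift(U)

# Identical views: every node is closest to its own counterpart.
aligned = node_contrast_loss(view1, view1)
# Cyclically shifted second view: node i no longer matches its counterpart.
shuffled = node_contrast_loss(view1, view1[[1, 2, 3, 0]])
```

The loss is lower when corresponding nodes agree across views, which is what makes it usable as a per-graph inconsistency measure at inference time.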

Graph-level Contrast. To obtain graph embedding $\mathbf{h}_{G_{i}}$ of graph $G_{i}$ , we employ mean pooling simply on embedding of nodes in graph $G_{i}$ . We first map graph embedding into graph-level contrast space with MLP-based projection head. Similar to the node-level loss $\mathcal{L}_{node}$ , we then construct a graph-level loss for mutual agreement maximization on graph level:

$$\mathcal{L}_{graph}=\frac{1}{2|\mathcal{B}|}\sum_{G_{i}\in\mathcal{B}}\left[l\left(\mathbf{h}^{(view_{1})}_{G_{i}},\mathbf{h}^{(view_{2})}_{G_{i}}\right)+l\left(\mathbf{h}^{(view_{2})}_{G_{i}},\mathbf{h}^{(view_{1})}_{G_{i}}\right)\right], \tag{11}$$

$$l\left(\mathbf{h}^{(view_{1})}_{G_{i}},\mathbf{h}^{(view_{2})}_{G_{i}}\right)=-\log\frac{e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_{1})}_{G_{i}},\mathbf{h}^{(view_{2})}_{G_{i}}\right)/\tau\right)}}{\sum_{G_{j}\in\mathcal{B}\backslash G_{i}}e^{\left(-H_{Dist}\left(\mathbf{h}^{(view_{1})}_{G_{i}},\mathbf{h}^{(view_{2})}_{G_{j}}\right)/\tau\right)}}, \tag{12}$$

where notations are similar to node-level loss, and $\emph{l}\left(\mathbf{h}^{(view_{2})}_{G_{i}},\mathbf{h}^{(view_{1})}_{G_{i}}\right)$ is calculated in the same way as $\emph{l}\left(\mathbf{h}^{(view_{1})}_{G_{i}},\mathbf{h}^{(view_{2})}_{G_{i}}\right)$ . The training loss function on graph channel is:

$$\mathcal{L}_{\text{graph-channel}}=\xi_{1}~\mathcal{L}_{node}+\xi_{2}~\mathcal{L}_{graph}, \tag{13}$$

where $\xi_{1}$ and $\xi_{2}$ are trade-off parameters; for simplicity, we set $\xi_{1}=1$ and $\xi_{2}=1$ in all experiments. The training loss on the hypergraph channel is calculated in the same way as the one on the graph channel. Therefore, in the training phase, we employ the following loss function:

$$\mathcal{L}_{total}=\lambda_{1}~\mathcal{L}_{\text{graph-channel}}+\lambda_{2}~\mathcal{L}_{\text{hypergraph-channel}}. \tag{14}$$

3.6 Anomaly Scoring

In the inference phase, we calculate anomaly score from both graph-channel and hypergraph-channel. For simplicity and efficiency, we directly employ the $\mathcal{L}_{total}$ (Eq. (14)) as the final anomaly score for an input graph $G_{i}$ as:

$$score_{G_{i}}=\mathcal{L}_{total}. \tag{15}$$
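The way Eqs. (13) to (15) combine into a per-graph score can be illustrated with a few lines of arithmetic. The per-graph loss values below are made-up numbers, and $\lambda_{1}=\lambda_{2}=0.5$ is a hypothetical setting chosen only for the example; the function names are likewise assumptions.

```python
import numpy as np

def channel_score(node_loss, graph_loss, xi1=1.0, xi2=1.0):
    """Eq. (13), evaluated per graph; xi1 = xi2 = 1 as in the experiments."""
    return xi1 * np.asarray(node_loss) + xi2 * np.asarray(graph_loss)

def anomaly_scores(graph_channel, hypergraph_channel, lam1=0.5, lam2=0.5):
    """Eqs. (14) and (15); lam1 and lam2 are hypothetical values."""
    return lam1 * graph_channel + lam2 * hypergraph_channel

# Made-up per-graph losses for two test graphs (not real results).
node_g, graph_g = np.array([0.5, 2.0]), np.array([0.25, 1.5])    # graph channel
node_h, graph_h = np.array([0.75, 2.5]), np.array([0.5, 1.0])    # hypergraph channel

scores = anomaly_scores(channel_score(node_g, graph_g),
                        channel_score(node_h, graph_h))
most_anomalous = int(np.argmax(scores))   # index of the highest-scoring graph
```

A graph whose two views disagree strongly in either channel receives a larger loss and therefore a larger anomaly score.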

4 Experiments

4.1 Experimental Setup

Datasets. We conduct experiments on 12 open-source datasets from TUDataset 2020_arXiv_TuDataset , which cover small molecules, bioinformatics, and social networks. We follow the settings in 2022_WSDM_GLocalKD ; 2023_WSDM_GOOD-D to define the anomalous class, and the remaining graphs are treated as normal data (i.e., normal graphs). Similar to 2022_WSDM_GLocalKD ; 2023_WSDM_GOOD-D ; 2021_BigData_OCGIN , only normal data are used during the training phase.

Baselines. We select 9 representative baselines to compare with our proposed model. For the non-end-to-end method, we mainly select two categories: (i) kernel + detector. We adopt Weisfeiler-Lehman kernel (WL in short) 2011_JMLR_WL_graph_Kernel and propagation kernel (PK in short) 2016_ML_PK_graph_Kernel to first obtain representations, and then we take one-class SVM (OCSVM in short) 2001_JMLR_OCSVM and isolation forest (iF in short) 2008_ICDM_iF_graph_Kernel to detect anomaly. After arranging and combining the above kernels and detectors, there are four baselines available: PK-OCSVM, PK-iF, WL-OCSVM, and WL-iF. (ii) GCL model + detector. Considering that we used the paradigm of graph contrastive learning, we select two classic graph-level contrastive learning models (i.e., InfoGraph 2020_ICLR_InfoGraph and GraphCL 2020_NeurIPS_GraphCL ) to first obtain representations, and then we take iF as detector to detect anomaly (i.e., InfoGraph-iF, GraphCL-iF). For the end-to-end method, we select 3 classical models: OCGIN 2021_BigData_OCGIN , GLocalKD 2022_WSDM_GLocalKD and GOOD-D 2023_WSDM_GOOD-D .

Metrics and Implementations. Following 2022_WSDM_GLocalKD ; 2022_ScientificReports_GLADC ; 2023_WSDM_GOOD-D ; 2023_ECMLPKDD_CVTGAD , we adopt a popular graph-level anomaly detection metric, the area under the receiver operating characteristic curve (AUC), to evaluate methods. A higher AUC value corresponds to better anomaly detection performance. We use Riemannian SGD with weight decay to learn the network parameters 2022_KDD_HICF ; 2013_TAC_Riemannian_SGD . In practice, we implement HC-GLAD with PyTorch 2019_NeurIPS_PyTorch_Library . Appendix C provides more details on the datasets and implementation.
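For readers less familiar with the metric, AUC equals the probability that a randomly chosen anomalous graph receives a higher score than a randomly chosen normal graph, with ties counted as one half. The following NumPy sketch computes it directly from that definition. It is an illustrative stand-in, not the evaluation code used for the reported results, and the scores and labels are made-up values.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via pairwise comparison of anomalous (label 1) and
    normal (label 0) scores; ties contribute 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))

# Made-up anomaly scores: two normal graphs followed by two anomalous graphs.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = roc_auc(scores, labels)
```

In this toy case three of the four anomalous-normal pairs are ordered correctly, so the AUC is 0.75.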

Table 1: The performance comparison in terms of AUC (in percent, mean value ± standard deviation). The best performance is highlighted in bold, and the second-best performance is underlined. †: we report the result from 2023_WSDM_GOOD-D .

Dataset	PK-OCSVM†	PK-iF†	WL-OCSVM†	WL-iF†	InfoGraph-iF†	GraphCL-iF†	OCGIN†	GLocalKD†	GOOD-D†	HC-GLAD
PROTEINS-full	50.49±4.92	60.70±2.55	51.35±4.35	61.36±2.54	57.47±3.03	60.18±2.53	70.89±2.44	\ul{77.30±5.15}	71.97±3.86	\textbf{77.51±2.58}
ENZYMES	53.67±2.66	51.30±2.01	55.24±2.66	51.60±3.81	53.80±4.50	53.60±4.88	58.75±5.98	61.39±8.81	\ul{63.90±3.69}	\textbf{65.39±6.23}
AIDS	50.79±4.30	51.84±2.87	50.12±3.43	61.13±0.71	70.19±5.03	79.72±3.98	78.16±3.05	93.27±4.19	\ul{97.28±0.69}	\textbf{99.51±0.38}
DHFR	47.91±3.76	52.11±3.96	50.24±3.13	50.29±2.77	52.68±3.21	51.10±2.35	49.23±3.05	56.71±3.57	\textbf{62.67±3.11}	\ul{61.16±4.20}
BZR	46.85±5.31	55.32±6.18	50.56±5.87	52.46±3.30	63.31±8.52	60.24±5.37	65.91±1.47	69.42±7.78	\ul{75.16±5.15}	\textbf{75.75±9.11}
COX2	50.27±7.91	50.05±2.06	49.86±7.43	50.27±0.34	53.36±8.86	52.01±3.17	53.58±5.05	59.37±12.67	\textbf{62.65±8.14}	\ul{59.98±7.44}
DD	48.30±3.98	71.32±2.41	47.99±4.09	70.31±1.09	55.80±1.77	59.32±3.92	72.27±1.83	\textbf{80.12±5.24}	73.25±3.19	\ul{77.66±1.73}
REDDIT-B	45.68±2.24	46.72±3.42	49.31±2.33	48.26±0.32	68.50±5.56	71.80±4.38	75.93±8.65	77.85±2.62	\textbf{88.67±1.24}	\ul{79.09±2.52}
HSE	57.02±8.42	56.87±10.51	62.72±10.13	53.02±5.12	53.56±3.98	51.18±2.71	\ul{64.84±4.70}	59.48±1.44	\textbf{69.65±2.14}	64.05±4.75
MMP	46.65±6.31	50.06±3.73	55.24±3.26	52.68±3.34	54.59±2.01	54.54±1.86	\textbf{71.23±0.16}	67.84±0.59	70.51±1.56	\ul{70.96±4.45}
p53	46.74±4.88	50.69±2.02	54.59±4.46	50.85±2.16	52.66±1.95	53.29±2.32	58.50±0.37	\ul{64.20±0.81}	62.99±1.55	\textbf{66.01±1.77}
PPAR-gamma	53.94±6.94	45.51±2.58	57.91±6.13	49.60±0.22	51.40±2.53	50.30±1.56	\textbf{71.19±4.28}	64.59±0.67	67.34±1.71	\ul{69.51±5.04}
Avg.Rank	8.75	7.83	7.25	7.58	6.33	6.67	3.83	3.00	\ul{2.08}	\textbf{1.67}

4.2 Overall Performance

The AUC results of HC-GLAD, along with nine other baseline methods, are summarized in Table 1. As depicted in Table 1, HC-GLAD outperforms the other methods by securing first place on 5 datasets and second place on 6 datasets, while maintaining a competitive performance on the remaining dataset. Furthermore, HC-GLAD achieves the best average rank among all methods across the 12 datasets. Our observations indicate that graph kernel-based methods exhibit the poorest performance among baselines. This underperformance is attributed to their limited ability to identify regular patterns and essential graph information, rendering them less effective with complex datasets. GCL-based methods show a moderate level of performance, highlighting the competitive potential of graph contrastive learning for UGAD tasks. In conclusion, the competitive performance of our proposed model underscores the effectiveness of incorporating node group connections, as well as integrating hypergraph learning and hyperbolic geometry into graph-level anomaly detection. These findings also validate that HC-GLAD possesses inherent capabilities to capture the fundamental characteristics of normal graphs, consequently delivering superior anomaly detection performance.
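The average rank reported for HC-GLAD can be reproduced from Table 1 with a short calculation. The per-dataset ranks below are read off the table by comparing HC-GLAD's mean AUC with the nine baselines (1 = best); the variable names are illustrative.

```python
# Rank of HC-GLAD on each dataset, read from the mean AUC values in Table 1.
hc_glad_rank = {
    "PROTEINS-full": 1, "ENZYMES": 1, "AIDS": 1, "DHFR": 2,
    "BZR": 1, "COX2": 2, "DD": 2, "REDDIT-B": 2,
    "HSE": 3, "MMP": 2, "p53": 1, "PPAR-gamma": 2,
}

first_places = sum(r == 1 for r in hc_glad_rank.values())
second_places = sum(r == 2 for r in hc_glad_rank.values())
avg_rank = sum(hc_glad_rank.values()) / len(hc_glad_rank)   # 20 / 12
```

The counts of first and second places match the statement above, and the mean of the twelve ranks is 20/12, which rounds to the 1.67 listed in the Avg.Rank row.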

4.3 Ablation Study

We conduct an ablation study on four representative datasets to investigate the effects of the two key components: the hypergraph-channel and hyperbolic learning. For convenience, let w/o HyperG and w/o HyperB denote the variants of HC-GLAD without the hypergraph-channel and without hyperbolic learning, respectively. The results are illustrated in Figure LABEL:fig:_Ablation-study. HC-GLAD consistently outperforms both variants, indicating that hypergraph learning and hyperbolic learning are both needed to reach the best detection performance. The weaker performance of w/o HyperG supports the importance of considering node group information and introducing hypergraph learning to this field, and the weaker performance of w/o HyperB supports the importance of introducing hyperbolic learning to UGAD. Additionally, on these datasets, hyperbolic learning has a more pronounced impact than hypergraph learning.

4.4 Hyper-parameter Analysis

Trade-off parameter $\lambda_{1}$. In $\mathcal{L}_{total}$, $\lambda_{1}$ and $\lambda_{2}$ are trade-off parameters that determine the weights of the graph-channel and hypergraph-channel, respectively. To investigate their impact on model performance, we conduct experiments on four representative datasets. The results are illustrated in Figure LABEL:fig:_Hyper-parameter-lambda_1. For simplicity, we set $\lambda_{2}=1-\lambda_{1}$. As $\lambda_{1}$ increases from 0.1 to 0.9, the performance trend varies across datasets, but the changes are not significant and the overall model performance remains relatively stable. This suggests that the proposed model is relatively robust to the choice of $\lambda_{1}$.
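The weighting described above can be sketched in a few lines. This is a minimal illustration that assumes $\mathcal{L}_{total}$ is a weighted sum of the two channel losses with $\lambda_{2}=1-\lambda_{1}$; the function name `total_loss` and the numeric loss values are hypothetical placeholders, not measured results or the released implementation.

```python
def total_loss(loss_graph, loss_hypergraph, lambda_1):
    """Weighted sum of the two channel losses, with lambda_2 = 1 - lambda_1.

    Assumes L_total = lambda_1 * L_graph + lambda_2 * L_hypergraph.
    """
    if not 0.0 <= lambda_1 <= 1.0:
        raise ValueError("lambda_1 must lie in [0, 1]")
    lambda_2 = 1.0 - lambda_1
    return lambda_1 * loss_graph + lambda_2 * loss_hypergraph


# Sweep lambda_1 over 0.1, ..., 0.9, mirroring the range of the sensitivity study.
# The two loss values are placeholders chosen only to make the sweep printable.
example_graph_loss, example_hypergraph_loss = 2.0, 4.0
for k in range(1, 10):
    lam = k / 10
    print(lam, total_loss(example_graph_loss, example_hypergraph_loss, lam))
```

Tying $\lambda_{2}$ to $\lambda_{1}$ reduces the sweep to a single free parameter, which is what makes a one-dimensional sensitivity plot possible.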

Hidden Dimension. To investigate the impact of the hidden dimension on model performance, we conduct experiments on five representative datasets. The results are illustrated in Figure LABEL:fig:_Hyper-parameter-HddenDimension. We can preliminarily conclude that a higher dimensionality does not necessarily lead to better performance; in certain intervals, increasing the dimensionality actually degrades performance. On most datasets, however, the impact of the hidden dimension is minimal and the performance of the model remains relatively stable.

4.5 Visualization

To better understand our proposed model, we employ t-SNE [54] to visualize the embeddings learned by HC-GLAD, as shown in Figure LABEL:fig:_Visualization. The embeddings of $view_{1}$ and $view_{2}$ learned via the graph-channel or the hypergraph-channel can already separate most normal graphs from anomalous graphs. However, it is ultimately the mechanism designed in HC-GLAD that distinctly differentiates normal graphs from anomalous graphs.

5 Conclusion and Limitation

In this paper, we propose a novel framework named HC-GLAD, which integrates the strengths of hypergraph learning and hyperbolic learning to jointly enhance the performance of UGAD. Concretely, we employ hypergraphs built on gold motifs to exploit node group information and utilize hyperbolic geometry to explore the latent hierarchical information. To the best of our knowledge, this is the first work to introduce hypergraphs exploiting node group connections together with hyperbolic geometry to the UGAD task. Through extensive experiments, we validate the superiority of HC-GLAD on 12 real-world datasets from different fields. One limitation of our method is that integrating multiple learning paradigms in one framework may increase the computational cost. In the future, we will explore the design of lightweight yet efficient frameworks to overcome this limitation. We believe that our work can contribute to the advancement of the relevant field, promote societal progress, and benefit humanity.

References

[1] Rongrong Ma, Guansong Pang, Ling Chen, and Anton van den Hengel. Deep graph-level anomaly detection by glocal knowledge distillation. In Proceedings of the fifteenth ACM international conference on web search and data mining, pages 704–714, 2022.
[2] Xuexiong Luo, Jia Wu, Jian Yang, Shan Xue, Hao Peng, Chuan Zhou, Hongyang Chen, Zhao Li, and Quan Z Sheng. Deep graph level anomaly detection with contrastive learning. Scientific Reports, 12(1):19867, 2022.
[3] Yixin Liu, Kaize Ding, Qinghua Lu, Fuyi Li, Leo Yu Zhang, and Shirui Pan. Towards self-interpretable graph-level anomaly detection. Advances in Neural Information Processing Systems, 36, 2024.
[4] Yixin Liu, Kaize Ding, Huan Liu, and Shirui Pan. Good-d: On unsupervised graph out-of-distribution detection. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 339–347, 2023.
[5] Zhenyang Yu, Xinye Wang, Bingzhe Zhang, Zhaohang Luo, and Lei Duan. Tuaf: Triple-unit-based graph-level anomaly detection with adaptive fusion readout. In International Conference on Database Systems for Advanced Applications, pages 415–430. Springer, 2023.
[6] Jindong Li, Qianli Xing, Qi Wang, and Yi Chang. Cvtgad: Simplified transformer with cross-view attention for unsupervised graph-level anomaly detection. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 185–200. Springer, 2023.
[7] Chaoxi Niu, Guansong Pang, and Ling Chen. Graph-level anomaly detection via hierarchical memory networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 201–218. Springer, 2023.
[8] Maximilian Nickel, Xueyan Jiang, and Volker Tresp. Reducing the rank in relational factorization models by including observable patterns. Advances in Neural Information Processing Systems, 27, 2014.
[9] Erzsébet Ravasz and Albert-László Barabási. Hierarchical organization in complex networks. Physical review E, 67(2):026112, 2003.
[10] Joey Bose, Ariella Smofsky, Renjie Liao, Prakash Panangaden, and Will Hamilton. Latent variable modelling with hyperbolic normalizing flows. In International Conference on Machine Learning, pages 1045–1055. PMLR, 2020.
[11] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in neural information processing systems, 30, 2017.
[12] Yonghui Yang, Le Wu, Kun Zhang, Richang Hong, Hailin Zhou, Zhiqiang Zhang, Jun Zhou, and Meng Wang. Hyperbolic graph learning for social recommendation. IEEE Transactions on Knowledge and Data Engineering, 2023.
[13] Rik Sarkar. Low distortion delaunay embedding of trees in hyperbolic plane. In International symposium on graph drawing, pages 355–366. Springer, 2011.
[14] Xiao Wang, Yiding Zhang, and Chuan Shi. Hyperbolic heterogeneous information network embedding. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 5337–5344, 2019.
[15] Xiaoxiao Ma, Jia Wu, Shan Xue, Jian Yang, Chuan Zhou, Quan Z Sheng, Hui Xiong, and Leman Akoglu. A comprehensive survey on graph anomaly detection with deep learning. IEEE Transactions on Knowledge and Data Engineering, 35(12):12012–12038, 2021.
[16] Lingxiao Zhao and Leman Akoglu. On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights. Big Data, 11(3):151–180, 2023.
[17] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
[18] Chen Qiu, Marius Kloft, Stephan Mandt, and Maja Rudolph. Raising the bar in graph-level anomaly detection. In International Joint Conference on Artificial Intelligence, 2022.
[19] Menglin Yang, Min Zhou, Zhihao Li, Jiahong Liu, Lujia Pan, Hui Xiong, and Irwin King. Hyperbolic graph neural networks: A review of methods and applications. arXiv preprint arXiv:2202.13852, 2022.
[20] Menglin Yang, Zhihao Li, Min Zhou, Jiahong Liu, and Irwin King. Hicf: Hyperbolic informative collaborative filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2212–2221, 2022.
[21] Qi Liu, Maximilian Nickel, and Douwe Kiela. Hyperbolic graph neural networks. Advances in neural information processing systems, 32, 2019.
[22] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional neural networks. Advances in neural information processing systems, 32, 2019.
[23] Gregor Bachmann, Gary Bécigneul, and Octavian Ganea. Constant curvature graph convolutional networks. In International conference on machine learning, pages 486–496. PMLR, 2020.
[24] Yiding Zhang, Xiao Wang, Chuan Shi, Xunqiang Jiang, and Yanfang Ye. Hyperbolic graph attention network. IEEE Transactions on Big Data, 8(6):1690–1701, 2021.
[25] Yiding Zhang, Xiao Wang, Chuan Shi, Nian Liu, and Guojie Song. Lorentzian graph convolutional networks. In Proceedings of the web conference 2021, pages 1249–1261, 2021.
[26] Menglin Yang, Min Zhou, Jiahong Liu, Defu Lian, and Irwin King. Hrcf: Enhancing collaborative filtering via hyperbolic geometric regularization. In Proceedings of the ACM Web Conference 2022, pages 2462–2471, 2022.
[27] Xingcheng Fu, Yuecen Wei, Qingyun Sun, Haonan Yuan, Jia Wu, Hao Peng, and Jianxin Li. Hyperbolic geometric graph representation learning for hierarchy-imbalance node classification. In Proceedings of the ACM Web Conference 2023, pages 460–468, 2023.
[28] Yue Gao, Zizhao Zhang, Haojie Lin, Xibin Zhao, Shaoyi Du, and Changqing Zou. Hypergraph learning: Methods and practices. IEEE transactions on pattern analysis and machine intelligence, 44(5):2548–2566, 2022.
[29] Alessia Antelmi, Gennaro Cordasco, Mirko Polato, Vittorio Scarano, Carmine Spagnuolo, and Dingqi Yang. A survey on hypergraph representation learning. ACM Computing Surveys, 56(1):1–38, 2023.
[30] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 3558–3565, 2019.
[31] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, and Partha Talukdar. Hypergcn: A new method for training graph convolutional networks on hypergraphs. Advances in neural information processing systems, 32, 2019.
[32] Yue Gao, Yifan Feng, Shuyi Ji, and Rongrong Ji. Hgnn+: General hypergraph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3181–3199, 2022.
[33] Shuyi Ji, Yifan Feng, Rongrong Ji, Xibin Zhao, Wanwan Tang, and Yue Gao. Dual channel hypergraph collaborative filtering. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2020–2029, 2020.
[34] Junwei Zhang, Min Gao, Junliang Yu, Lei Guo, Jundong Li, and Hongzhi Yin. Double-scale self-supervised hypergraph learning for group recommendation. In Proceedings of the 30th ACM international conference on information & knowledge management, pages 2557–2567, 2021.
[35] Jiadi Han, Qian Tao, Yufei Tang, and Yuhan Xia. Dh-hgcn: dual homogeneity hypergraph convolutional network for multiple social recommendations. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pages 2190–2194, 2022.
[36] Lianghao Xia, Chao Huang, Yong Xu, Jiashu Zhao, Dawei Yin, and Jimmy Huang. Hypergraph contrastive collaborative filtering. In Proceedings of the 45th International ACM SIGIR conference on research and development in information retrieval, pages 70–79, 2022.
[37] Yue Tan, Yixin Liu, Guodong Long, Jing Jiang, Qinghua Lu, and Chengqi Zhang. Federated learning on non-iid graphs via structural knowledge sharing. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 9953–9961, 2023.
[38] Geon Lee, Jihoon Ko, and Kijung Shin. Hypergraph motifs: concepts, algorithms, and discoveries. arXiv preprint arXiv:2003.01853, 2020.
[39] Junliang Yu, Hongzhi Yin, Jundong Li, Qinyong Wang, Nguyen Quoc Viet Hung, and Xiangliang Zhang. Self-supervised multi-channel hypergraph convolutional network for social recommendation. In Proceedings of the web conference 2021, pages 413–424, 2021.
[40] Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the lorentz model of hyperbolic geometry. In International conference on machine learning, pages 3779–3788. PMLR, 2018.
[41] Jianing Sun, Zhaoyue Cheng, Saba Zuberi, Felipe Pérez, and Maksims Volkovs. Hgcf: Hyperbolic graph convolution networks for collaborative filtering. In Proceedings of the Web Conference 2021, pages 593–601, 2021.
[42] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[43] Song Bai, Feihu Zhang, and Philip HS Torr. Hypergraph convolution and hypergraph attention. Pattern Recognition, 110:107637, 2021.
[44] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. Tudataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020.
[45] L Zhao and L Akoglu. On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights. Big Data, 11(3):151–180, 2021.
[46] Nino Shervashidze, Pascal Schweitzer, Erik Jan Van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.
[47] Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine Learning, 102:209–245, 2016.
[48] Larry M Manevitz and Malik Yousef. One-class svms for document classification. Journal of machine Learning research, 2(Dec):139–154, 2001.
[49] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 eighth ieee international conference on data mining, pages 413–422. IEEE, 2008.
[50] Fan-Yun Sun, Jordon Hoffman, Vikas Verma, and Jian Tang. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In International Conference on Learning Representations, 2020.
[51] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. Advances in neural information processing systems, 33:5812–5823, 2020.
[52] Silvere Bonnabel. Stochastic gradient descent on riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013.
[53] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
[54] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
[55] Lirong Wu, Haitao Lin, Cheng Tan, Zhangyang Gao, and Stan Z Li. Self-supervised learning on graphs: Contrastive, generative, or predictive. IEEE Transactions on Knowledge and Data Engineering, 35(4):4216–4235, 2021.
[56] Yixin Liu, Ming Jin, Shirui Pan, Chuan Zhou, Yu Zheng, Feng Xia, and Philip S Yu. Graph self-supervised learning: A survey. IEEE transactions on knowledge and data engineering, 35(6):5879–5900, 2022.
[57] Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. Gcc: Graph contrastive coding for graph neural network pre-training. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1150–1160, 2020.
[58] Petar Velickovic, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. ICLR (Poster), 2(3):4, 2019.
[59] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, pages 2069–2080, 2021.
[60] Yixin Liu, Yizhen Zheng, Daokun Zhang, Vincent Lee, and Shirui Pan. Beyond smoothing: Unsupervised graph representation learning with edge heterophily discriminating. In Proceedings of the AAAI conference on artificial intelligence, 2023.
[61] M. Gromov. Hyperbolic Groups, pages 75–263. Springer New York, New York, NY, 1987.
[62] Alexandru Tifrea, Gary Bécigneul, and Octavian-Eugen Ganea. Poincare glove: Hyperbolic word embeddings. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
[63] Réka Albert, Bhaskar DasGupta, and Nasim Mobasheri. Topological implications of negative curvature for biological and social networks. Phys. Rev. E, 89:032811, Mar 2014.
[64] Menglin Yang, Min Zhou, Hui Xiong, and Irwin King. Hyperbolic temporal network embedding. IEEE Trans. Knowl. Data Eng., 35(11):11489–11502, 2023.

Appendix A Supplementary Related Work

A.1 Graph Contrastive Learning

Graph contrastive learning employs the principle of mutual information maximization to extract rich representations by optimizing instances with similar semantic content [55, 56]. This approach has gained widespread application for achieving outstanding performance in unsupervised graph representation learning [50, 51, 57, 58, 59, 60]. For example, GraphCL [51] proposes four types of data augmentations for graph-structured data to create pairs for contrastive learning. In the context of graph classification, InfoGraph [50] aims to maximize the mutual information between graph-level and substructure-level representations, with the latter being computed at various scales. Recent research has also applied graph contrastive learning to the field of graph-level anomaly detection. For instance, GLADC [2] captures both node-level and graph-level representations using a dual-graph encoder module within a contrastive learning framework. GOOD-D [4] detects anomalous graphs by identifying semantic inconsistencies across different granularities through a hierarchical contrastive learning framework. CVTGAD [6] similarly incorporates graph contrastive learning principles, utilizing transformer for unsupervised graph anomaly detection and explicitly accounting for co-occurrence between different views.

Appendix B Method Discussion

B.1 Algorithm

The overall algorithm of HC-GLAD is summarized in Algorithm 1.

Algorithm 1 HC-GLAD

Input: Graph set $\mathcal{G}=\{G_{1},G_{2},...,G_{m}\}$;
Output: The anomaly scores $Score_{G}$ for each graph;
Initialize: (i) graph data augmentation: obtain two augmented graphs (i.e., $view_{1}$ and $view_{2}$) using the perturbation-free graph augmentation strategy [4, 37];
(ii) hypergraph construction: construct the hypergraph by "gold motif".

Training Phase
for $i=1$ to $s\_epochs$ do
    Obtain the initial hyperbolic node state $e^{0}$ by Eq. (6).
    Hyperbolic graph aggregation.
    Hyperbolic hypergraph aggregation.
    Graph-channel: (i) conduct node-level contrast by Eq. (9); (ii) conduct graph-level contrast by Eq. (11).
    Hypergraph-channel: (i) conduct node-level contrast by Eq. (9); (ii) conduct graph-level contrast by Eq. (11).
    Calculate the graph-channel loss by Eq. (13).
    Calculate the hypergraph-channel loss in the same way as the graph-channel loss.
    Calculate the total loss by Eq. (14).
end for

Inference Phase
for $G_{i}$ in graph set $\mathcal{G}$ do
    Calculate anomaly scores via Eq. (15).
end for

Appendix C Supplement of Experiments

C.1 Datasets

Further details on the datasets used in our experiments are given in Table 2.

Table 2: The statistics of datasets of our experiments from TUDataset [44].

Dataset	PROTEINS_full	ENZYMES	AIDS	DHFR	BZR	COX2	DD	REDDIT-B	HSE	MMP	p53	PPAR-gamma
Graphs	1113	600	2000	467	405	467	1178	2000	8417	7558	8903	8451
Avg. Nodes	39.06	32.63	15.69	42.43	35.75	41.22	284.32	429.63	16.89	17.62	17.92	17.38
Avg. Edges	72.82	62.14	16.20	44.54	38.36	43.45	715.66	497.75	17.23	17.98	18.34	17.72

C.2 Hyperbolicity

To measure the hyperbolic nature of the datasets, we use the hyperbolicity $\delta$ proposed by Gromov [61]. In general, the hyperbolicity $\delta$ quantifies the tree-likeness of a graph: the lower the value of $\delta$, the more tree-like the structure and the better suited it is to embedding in hyperbolic space [62]. When $\delta=0$, the graph can be considered a tree [63, 64]. The hyperbolicity is based on the 4-node condition, which is evaluated on a quadruple of distinct nodes $n_{1}$, $n_{2}$, $n_{3}$, $n_{4}$ in a graph. Let $\pi=(\pi_{1},\pi_{2},\pi_{3},\pi_{4})$ be a permutation of the node indices 1, 2, 3, and 4 such that

$$\begin{split}S_{n_{1},n_{2},n_{3},n_{4}}&=d(n_{\pi_{1}},n_{\pi_{2}})+d(n_{\pi_{3}},n_{\pi_{4}})\\ &\leq M_{n_{1},n_{2},n_{3},n_{4}}=d(n_{\pi_{1}},n_{\pi_{3}})+d(n_{\pi_{2}},n_{\pi_{4}})\\ &\leq L_{n_{1},n_{2},n_{3},n_{4}}=d(n_{\pi_{1}},n_{\pi_{4}})+d(n_{\pi_{2}},n_{\pi_{3}}),\end{split} \tag{16}$$

where $d$ is the shortest path length, and define

$$\delta^{+}=\frac{L_{n_{1},n_{2},n_{3},n_{4}}-M_{n_{1},n_{2},n_{3},n_{4}}}{2}. \tag{17}$$

The worst-case hyperbolicity [61] is defined as the maximum value of $\delta^{+}$ among all quadruples in the graph, i.e.,

$$\delta_{worst}=\max_{n_{1},n_{2},n_{3},n_{4}}\{\delta^{+}\}. \tag{18}$$

The average hyperbolicity [63] is defined as the average value of $\delta^{+}$ among all quadruples in the graph, i.e.,

$$\delta_{avg}=\frac{1}{\binom{n}{4}}\sum_{n_{1},n_{2},n_{3},n_{4}}\{\delta^{+}\}, \tag{19}$$

where $n$ is the number of nodes in the graph.

A graph $\mathcal{G}$ is called $\delta$-hyperbolic if $\delta_{worst}(\mathcal{G})\leq\delta$ [63]. We adopt the aforementioned $\delta_{worst}$ as the hyperbolicity $\delta$ of the datasets, which to some extent reflects the underlying hyperbolic geometry of the graph. Additionally, we report the average hyperbolicity $\delta_{avg}$, which is robust to the addition or removal of an edge from the graph [25]. Given that the time complexity of calculating $\delta_{worst}$ and $\delta_{avg}$ is $O(n^{4})$, we employ a random sampling method to approximate the calculations [22, 24, 25]. The results are illustrated in Table 3.

Table 3: The hyperbolicity $\delta$ and average hyperbolicity $\delta_{avg}$ of datasets.

Dataset	PROTEINS_full	ENZYMES	AIDS	DHFR	BZR	COX2	DD	REDDIT-B	HSE	MMP	p53	PPAR-gamma
$\delta$	1.09	1.15	0.74	1.01	1.11	1.00	3.74	0.97	0.76	0.77	0.78	0.77
$\delta_{avg}$	0.14	0.15	0.15	0.12	0.18	0.09	0.64	0.05	0.12	0.12	0.12	0.12
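The sampling-based approximation of $\delta_{worst}$ and $\delta_{avg}$ described in this subsection can be sketched as follows. This is a minimal, standard-library illustration of Eqs. (16)-(19) and not the authors' implementation: the sampling scheme (uniform quadruples of distinct nodes), the default `num_samples`, the adjacency-dict input format, and the restriction to connected, undirected, unweighted graphs with at least four nodes are all assumptions made for the example.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def delta_plus(dist, a, b, c, d):
    """(L - M) / 2 for one quadruple, where S <= M <= L are the three pair sums."""
    sums = sorted([
        dist[a][b] + dist[c][d],
        dist[a][c] + dist[b][d],
        dist[a][d] + dist[b][c],
    ])
    return (sums[2] - sums[1]) / 2


def sampled_hyperbolicity(adj, num_samples=1000, seed=0):
    """Estimate (delta_worst, delta_avg) from randomly sampled quadruples.

    Assumes a connected, undirected, unweighted graph with at least four nodes,
    given as a dict mapping each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}  # all-pairs distances via BFS
    values = []
    for _ in range(num_samples):
        a, b, c, d = rng.sample(nodes, 4)  # four distinct nodes
        values.append(delta_plus(dist, a, b, c, d))
    return max(values), sum(values) / len(values)


# A path graph is a tree, so every quadruple has delta^+ = 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# A 4-cycle contains a single quadruple, for which delta^+ = 1.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

print(sampled_hyperbolicity(path))
print(sampled_hyperbolicity(cycle))
```

Sampling trades exactness for cost: the estimate of $\delta_{worst}$ is a lower bound on the true maximum, since a sample can miss the worst quadruple, while the estimate of $\delta_{avg}$ is an unbiased sample mean.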

C.3 Hyper-parameters Analysis

Number of Layers in GNN Encoder. To investigate the impact of the number of GNN encoder layers on model performance, we conduct experiments on four representative datasets. The results are illustrated in Figure LABEL:fig:_Hyper-parameter-GNNLayerNumber. When the number of layers is set to 2, the model already exhibits promising performance, and increasing the number of layers does not lead to significant improvements. Conversely, when the number of layers reaches 6, performance commonly degrades, which we attribute to over-smoothing.

C.4 Hyper-parameters Selection

We select the hyper-parameters of our proposed model through grid search. The hyper-parameters chosen for each dataset are listed in the Anonymous GitHub repository. The grid search is conducted on the following search space:

• Number of epochs: {10, 50, 100, 200, 500, 1000, 1500, 2000, 2500}
• Learning rate: {1e-2, 1e-3, 1e-4, 1e-5}
• Layer number of GNN encoders: {2, 3, 4, 5, 6, 7}
• Hidden dimension of encoders: {2, 4, 8, 16, 32, 64}
• Model type of graph-channel encoder: {GIN, GCN, GAT}
• Model type of hypergraph-channel encoder: {HGNN, HGCN, HGAT}
• Layer number of MLP encoders: {1, 2, 3, 4, 5}
• Trade-off parameter of $\mathcal{L}_{total}$ for graph-channel and hypergraph-channel: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
• Weight decay of optimizer: {0.001, 0.005, 0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30}
• Momentum of optimizer: {0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99}
• Temperature coefficient in contrastive loss function: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2}
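The search space listed above can be written down and enumerated lazily, as in the sketch below. The dictionary keys (`epochs`, `lambda_1`, and so on) are hypothetical names chosen for this example, and the code only enumerates configurations; how each configuration is trained and scored, and whether the full Cartesian product was actually explored, is not specified here and is left out.

```python
import itertools
import math

# Search space copied from the list above; the key names are illustrative only.
search_space = {
    "epochs": [10, 50, 100, 200, 500, 1000, 1500, 2000, 2500],
    "learning_rate": [1e-2, 1e-3, 1e-4, 1e-5],
    "gnn_layers": [2, 3, 4, 5, 6, 7],
    "hidden_dim": [2, 4, 8, 16, 32, 64],
    "graph_encoder": ["GIN", "GCN", "GAT"],
    "hypergraph_encoder": ["HGNN", "HGCN", "HGAT"],
    "mlp_layers": [1, 2, 3, 4, 5],
    "lambda_1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "weight_decay": [0.001, 0.005, 0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "momentum": [0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99],
    "temperature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2],
}


def iter_configs(space):
    """Lazily yield one dict per point of the Cartesian product of the space."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


# Size of the full grid, computed without materialising it.
total = math.prod(len(values) for values in search_space.values())
first = next(iter_configs(search_space))

print("configurations in the full grid:", total)
print("first configuration:", first)
```

The full Cartesian product contains 699,840,000 configurations, so in practice such a space is usually searched one parameter (or a small group of parameters) at a time rather than exhaustively.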