Generating Explanations for Cellular Neural Networks

Akshit Sinha*, Sreeram Vennam*, Charu Sharma, Ponnurangam Kumaraguru akshit.sinha, [email protected] charu.sharma, [email protected] IIIT, Hyderabad, India

Abstract.

Recent advancements in graph learning have contributed to explaining predictions generated by Graph Neural Networks. However, existing methodologies often fall short when applied to real-world datasets. We introduce HOGE, a framework that captures higher-order structures using cell complexes, which excel at modeling higher-order relationships. Higher-order structures are ubiquitous in the real world, for example in molecules and social networks, so our work enhances the practical applicability of graph explanations. HOGE produces clearer and more accurate explanations than prior methods. Our method can be integrated with all existing graph explainers and therefore fits into current frameworks. We evaluate HOGE on GraphXAI benchmark datasets, where it achieves improved or comparable performance with minimal computational overhead. Ablation studies show that the performance gain can be attributed to the higher-order structures introduced by cell complexes.

Graph Learning, Graph Interpretability, Cell Complexes, Explainability in Graph Learning, Graph Classification

Pre-print, 2024. Copyright: CC. Submission ID: 126.

Figure 1. (a) Ground truth for an example from BENZENE (Agarwal et al., 2023). (b) Explanations generated by GNNExplainer. (c) Explanations generated by HOGE-GNNExplainer. In all cases, green nodes and edges signify the subgraph considered important for GNN prediction (the explanation). Incorporating HOGE substantially increases the explanation accuracy for this example from 0.125 to 1.

1. Introduction

Graph neural networks (GNNs) (Kipf and Welling, 2017) have become increasingly important in machine learning, since in many real-world applications, including social, information, chemical, and biological domains, data can be naturally modeled as graphs (Perozzi et al., 2014; Kipf and Welling, 2017; Veličković et al., 2018). Despite their power in capturing intricate node features and structural information, GNNs face unique interpretability challenges due to the message passing mechanism. Interpretability is crucial for sensitive domains like healthcare and finance, where decisions must be transparent and justifiable. As the use of GNNs expands across various fields, understanding their complex decision-making processes becomes important to ensure that the outcomes are ethical and accountable. Enhancing the interpretability of GNNs is essential for both human comprehension and building trust to facilitate informed decision-making by stakeholders. Effective explainability translates complex model outputs into human-intelligible insights. This simplifies the underlying rationale of GNN predictions and encourages users to leverage these advanced technologies confidently. Various strategies and frameworks have been proposed to address these challenges. Popular examples include GNNExplainer (Ying et al., 2019), a perturbation-based method that identifies important subgraphs, and GraphLIME (Huang et al., 2020), which fits a simple and interpretable surrogate model to the locality of the prediction. However, these methods often fall short in real-world scenarios, as the example in Figure 1 shows. This is the gap our work aims to fill by introducing a novel framework that leverages higher-order structures to enhance graph explainability.

*Both authors contributed equally to this research.

Another emerging research area in recent years is the use of higher-order structures, such as simplicial or cell complexes, to model data. A cell complex is a mathematical structure consisting of vertices, edges, faces, and their higher-dimensional counterparts, assembled in a way that defines a topological space. Prior work has also introduced message-passing schemes to model such higher-order relationships. This exploration is motivated by the inherent limitations of traditional GNNs, which struggle with expressive power and long-range interactions because they typically do not model complex, multi-level relationships in data. Ebli et al. (2020) and Bodnar et al. (2021) provide different strategies to model these higher-order structures and create more powerful neural networks. They show that incorporating higher-order structures improves performance on benchmark datasets, an improvement attributed to the increased expressivity and information exchange during message passing in such networks. While these approaches have improved the modeling of higher-order relationships, their potential for enhancing explainability has not yet been explored, a gap our work aims to fill.

This paper explores graph explainability through the lens of higher-order structures. Prior studies demonstrated that higher-order structures enhance performance on benchmarks, suggesting potential benefits for explainability. This can be achieved by modifying the underlying structure and making the final data representations richer in information.

To study the influence of higher order structures on model explainability, we introduce HOGE: Higher Order Graph Explainer, a novel framework designed to utilize higher-order structures in graph explanations. The framework for HOGE is outlined in Figure 2, and visually summarises the operational pipeline for our method. Our approach only augments the input graph, ensuring that HOGE remains model-agnostic and explainer-agnostic. This allows it to integrate seamlessly into any existing GNN pipeline.

Unlike traditional methods that analyze sensitivity or generate subgraphs based on existing graph topology, HOGE leverages the complex interrelationships made possible by cell complexes. This technique enables a more comprehensive capture of multi-level interactions within the graph. This allows HOGE to not only enhance the accuracy of explanations but also to provide a deeper understanding of the underlying mechanisms by modifying the graph’s structure itself. This approach represents an exploration of higher-order structures in the domain of graph explainability. We systematically extend the concepts and methodologies introduced in prior work to analyze the potential of higher-dimensional structures in enhancing existing graph explainability methods.

Our experiments demonstrate that explanations using higher-order structures are more accurate and show how significantly these structures contribute to the explanation process. Our findings show that lifting a graph to a higher-order structure before training a GNN enhances explanation accuracy significantly. Furthermore, ablation studies reveal that lifting graphs to cell complexes before the training phase of GNNs is responsible for achieving high explanation accuracy. However, the process of lifting graphs is less critical during the training and inference phases of graph explainers. This suggests that the improvements in explanations stem more from the increased information flow and denser graph representations provided by higher-order structures, and not from the structures themselves. In particular, the main contributions of our paper are as follows:

(1) To the best of our knowledge, this is the first attempt at studying graph explainability using higher-order structures for graph classification tasks.

(2) We propose HOGE, a novel framework that enhances the explainability of graph neural networks by utilizing higher-order structures. HOGE is designed to be both model-agnostic and explainer-agnostic, allowing seamless integration into any graph learning pipeline.

(3) We perform an empirical analysis of our proposed method for various graph neural network explainers. We observe that HOGE significantly improves over graph-based explainers. We attribute this to increased information flow and structural changes, providing critical insights into their application during the training and inference phases. All our code and models used for experiments will be made publicly available after the review period.

2. Related Work

Various methods have been proposed to explain predictions generated by GNNs. In this section, we focus on a few popular methods that have come out in recent years (2019-2021). Explainability methods for GNNs can be categorized based on the approach they take. For a comprehensive overview of GNN explainability methods, please refer to (Kakkad et al., 2023). Traditional methods include perturbation-based, surrogate-based, and gradient-based methods.

Perturbation-based methods, such as GNNExplainer (Ying et al., 2019), PGExplainer (Luo et al., 2020), and SubgraphX (Yuan et al., 2021), assess the impact of subgraph structures and node features on GNN performance by introducing perturbations. GNNExplainer generates an explanation mask for individual predictions, while PGExplainer uses a neural network to identify key edges. SubgraphX employs Shapley values and the Monte Carlo Tree Search algorithm to quantify feature importance.

Surrogate methods, on the other hand, use a two-step process where data generated from a graph’s local neighborhood informs a surrogate model. This model then provides insights into the original model’s decision-making process. For example, GraphLIME (Huang et al., 2020) utilizes the Hilbert-Schmidt Independence Criterion Lasso, a kernel-based approach, to provide interpretable local explanations.

Gradient-based methods, among the earliest developed to explain GNN predictions, assess how changes in input affect predictions by analyzing gradients. Methods like Sensitivity Analysis and Guided-BP compute feature importance directly from these gradients. Grad-CAM (Selvaraju et al., 2019), specifically, calculates node importance by summing the feature maps of node embeddings weighted by their gradients, offering a nuanced view of feature relevance.

Building on these foundational methods, recent advancements such as RG-Explainer (Shan et al., 2021) utilize reinforcement learning to craft explanatory subgraphs tailored to both the model and individual instances, enhancing the precision of explanations. Similarly, MATE (Spinelli et al., 2024) enhances explanation quality by iteratively training both a GNN and a baseline explainer. This allows the model to develop internal representations that are inherently more interpretable. This emphasis on enhancing internal representations aligns with our approach, where we elevate graph structures into higher-order cell complexes to achieve even deeper explanatory insight.

While existing methods for GNN explainability have advanced our understanding of how GNNs process information, they predominantly focus on low-dimensional, direct interactions within graphs. HOGE introduces a significant departure from these approaches by integrating higher-order structures into the explanation process.

3. Proposed Approach

3.1. Preliminaries

Graphs typically model pairwise interactions. They are generalized by simplicial and cell complexes, which encapsulate higher-dimensional relationships, allowing for group-wise interactions among points. (For a more comprehensive treatment of cell complexes, we point the reader to https://jeffe.cs.illinois.edu/teaching/comptop/2009/notes/cell-complexes.pdf.) Unlike simplicial complexes, which have a rigid combinatorial structure that constrains transformation possibilities, cell complexes provide greater flexibility. This flexibility enhances the ability to control message passing and decouple input from computational graphs, facilitating better data interpretations (Bodnar et al., 2022).

Although this work does not extensively engage with algebraic topology concepts associated with cell complexes, we nevertheless list some fundamental definitions that are essential for understanding the proposed framework (Bodnar et al., 2022; Yang et al., 2022; Ebli et al., 2020).

Definition 3.1 ($p$-cell).

A $p$ -cell in a cell complex refers to an element of dimension $p$ . In analogy to traditional graphs where we have vertices (0-dimensional) and edges (1-dimensional), cell complexes include these and extend to higher dimensions. For instance, 0-cells are points, 1-cells are edges, and 2-cells can be envisioned as surfaces of geometric figures. For example, in a three-dimensional cell complex, a 2-cell could represent the triangular surface of a tetrahedron.

Definition 3.2 (Faces/Cofaces).

In a cell complex, a $p$ -cell $\chi_{p}$ is considered a face or boundary of a $(p+1)$ -cell $\chi_{(p+1)}$ if the set of points composing $\chi_{p}$ is a subset of those composing $\chi_{(p+1)}$ . Conversely, $\chi_{(p+1)}$ is referred to as the coface or coboundary of $\chi_{p}$ . For example, in a 3D complex, if a triangle (2-cell) is part of the surface of a tetrahedron (3-cell), the triangle is a face of the tetrahedron, and the tetrahedron is a coface of the triangle.

Definition 3.3 ($p$-chain).

In a given cell complex, a $p$ -chain is simply defined as the set of all $p$ -dimensional cells. For example, in a 2D complex such as a triangle, the set of all vertices is the 0-chain of the cell complex.

Definition 3.4 ($p$-skeleton).

The $p$ -skeleton of a cell complex $\chi$ is defined as the subcomplex $\chi^{(p)}$ consisting of cells of dimension at most $p$ .

Definition 3.5 (Connection).

Within a cell complex, a connection is analogous to an edge in traditional graphs. It connects two cells, either of the same dimension (horizontal connection) or different dimensions (vertical connection). A horizontal connection links two $p$ -cells that share a common coface, whereas vertical connections link a $p$ -cell to its corresponding faces or cofaces.

3.2. Lifting Graphs to Higher Dimensions

Graphs are typically represented through adjacency matrices describing the connections between the nodes. This notion can be naturally extended to cell complexes. This section introduces the lifting operation on graphs (Bodnar et al., 2022). Within cell complexes, each $p$ -chain possesses a distinct adjacency matrix $A_{p}$ , representing the interconnections among cells of the same dimension. Furthermore, incidence matrices $B_{(p)(p+1)}$ are employed to portray the relationships between $p$ -cells and their respective $(p+1)$ -cell cofaces, illustrating a hierarchical structural integration. See Figure 3 for an example.

Using these definitions, we can represent a cell complex through a complete adjacency matrix, following a representation introduced by Yang et al. (Yang et al., 2022) for simplicial complexes.

$$A_{c}=\begin{bmatrix}A_{0}&B_{01}&0\\ B_{01}^{T}&A_{1}&B_{12}\\ 0&B_{12}^{T}&A_{2}\end{bmatrix}$$

In the context of this work, where the primary focus is on 0-cells or nodes, the full adjacency matrix is simplified to a partial adjacency matrix. This is achieved by disregarding the adjacency matrices of 1-cells and 2-cells. This approach not only maintains topological information but also enhances computational efficiency by reducing the complexity of data representation.

$$A_{c}=\begin{bmatrix}A_{0}&B_{01}&0\\ B_{01}^{T}&0&B_{12}\\ 0&B_{12}^{T}&0\end{bmatrix}$$
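The construction of this partial adjacency matrix can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `lift_to_cell_complex`, the input layout (cycles given as lists of edge indices), and the toy graph are all assumptions made for the example.

```python
def lift_to_cell_complex(num_nodes, edges, cycles):
    """Build the partial block adjacency matrix of a lifted graph.

    edges  : list of (u, v) node pairs, one per 1-cell
    cycles : list of 2-cells, each a list of indices into `edges`
    (Hypothetical helper; the input format is an assumption.)
    """
    n0, n1, n2 = num_nodes, len(edges), len(cycles)
    size = n0 + n1 + n2
    A = [[0] * size for _ in range(size)]

    for j, (u, v) in enumerate(edges):
        # A_0 block: the original node-node adjacency.
        A[u][v] = A[v][u] = 1
        # B_01 block: every edge cell is connected to its two endpoints.
        e = n0 + j
        for node in (u, v):
            A[node][e] = A[e][node] = 1

    for c, edge_ids in enumerate(cycles):
        # B_12 block: every cycle cell is connected to the edges on its boundary.
        f = n0 + n1 + c
        for j in edge_ids:
            e = n0 + j
            A[e][f] = A[f][e] = 1

    # The A_1 and A_2 blocks are left at zero, as in the partial matrix.
    return A


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cycles = [[0, 1, 2]]  # the triangle, listed by edge index
A_c = lift_to_cell_complex(4, edges, cycles)

num_cells = len(A_c)
num_connections = sum(map(sum, A_c)) // 2  # symmetric matrix, count each pair once
```

For this toy graph the lifted complex has 9 cells (4 nodes, 4 edges, 1 cycle) and 15 connections (4 from $A_{0}$, 8 from $B_{01}$, 3 from $B_{12}$).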

To quantify the increase in the size of the adjacency matrix when converting from graphs to cell complexes, we present theoretical results below and empirical results in section 5.4.

Theorem 3.6.

For a graph $G(V,E)$ with adjacency matrix $A$ having cycles of length at most $K$ , let $W_{k}$ represent the number of closed walks of length $k$ which are not $k$ -cycles. The corresponding cell complex $\chi$ will have cells $V_{\chi}$ and connections $E_{\chi}$ such that

$$|V_{\chi}|=|V|+|E|+\sum_{k=3}^{K}\frac{1}{2k}\left[\operatorname{tr}\left(A^{(k)}\right)-W_{k}\right] \tag{1}$$

$$|E_{\chi}|=3|E|+\frac{1}{2}\sum_{k=3}^{K}\left[\operatorname{tr}\left(A^{(k)}\right)-W_{k}\right] \tag{2}$$

For $V_{\chi}$ , the proof follows by construction. We restrict cell complexes to have at most 2-cells. Let us define the set of 0-cells as $V$ , 1-cells as $E$ , and 2-cells as the set of all cycles present in the graph, of length at most $K$ . Thus, the number of cells that will be present in the cell complex of graph $G$ is exactly the total number of vertices, edges, and cycles. The third term in Equation 1 represents the total number of cycles present in the graph. This term is calculated using the formula for finding the number of $k$ -length cycles in a graph, introduced in prior work done by Movarraei and Boxwala (Gerbner et al., 2018), where they show that

$$|C_{k}|=\frac{1}{2k}\left[\operatorname{tr}\left(A^{(k)}\right)-W_{k}\right]$$

where $|C_{k}|$ is the number of cycles of length $k$.

Equation 2 can also be proved by construction. The connections in $\chi$ are of three types:

(1) The original edges of the graph, equivalently represented by $A_{0}$,

(2) The connections between nodes (0-cells) and edges (1-cells), equivalently represented by $B_{01}$,

(3) The connections between edges (1-cells) and cycles (2-cells), equivalently represented by $B_{12}$.

The number of connections present in $A_{0}$ is simply $|E|$ . For the incidence matrix $B_{01}$ , 2 connections are added per 1-cell, as each 1-cell is connected to two 0-cells. For the incidence matrix $B_{12}$ , a connection is added between a cycle (2-cell) and each edge (1-cell) contained in it. As each $k$ -cycle will consist of $k$ edges, we add $k$ connections for each $k$ -cycle. This leads us to the following

$$|E_{\chi}|=|E|+2|E|+\sum_{k=3}^{K}k\cdot\frac{1}{2k}\left[\operatorname{tr}\left(A^{(k)}\right)-W_{k}\right]$$

$$|E_{\chi}|=3|E|+\frac{1}{2}\sum_{k=3}^{K}\left[\operatorname{tr}\left(A^{(k)}\right)-W_{k}\right] \tag{3}$$
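As a sanity check, the two counts in Theorem 3.6 can be evaluated numerically for a small graph. The sketch below makes two assumptions that are not stated in the text: it reads $A^{(k)}$ as the $k$-th matrix power of $A$, and it restricts itself to $K=3$, where $W_{3}=0$ because every closed walk of length 3 in a simple graph is a triangle. The helper names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python product of two square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def lifted_sizes_k3(A):
    """Cell and connection counts of Theorem 3.6 for K = 3 (triangles only)."""
    num_nodes = len(A)
    num_edges = sum(map(sum, A)) // 2
    closed_walks_3 = trace(matmul(matmul(A, A), A))  # tr(A^3), assumed meaning of A^(3)
    w3 = 0  # assumption: simple graph, so no non-cycle closed 3-walks
    num_triangles = (closed_walks_3 - w3) // 6       # 1 / (2k) with k = 3
    num_cells = num_nodes + num_edges + num_triangles
    num_connections = 3 * num_edges + (closed_walks_3 - w3) // 2
    return num_cells, num_connections


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
cells, connections = lifted_sizes_k3(A)
```

Here $|V|=4$, $|E|=4$ and $\operatorname{tr}(A^{3})=6$, so the formulas give $4+4+1=9$ cells and $3\cdot 4+\tfrac{1}{2}\cdot 6=15$ connections.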

3.3. Generating Cell Complexes Explanations

Building on the structural representation of cell complexes via adjacency matrices, these matrices can now be integrated into any GNN architecture. This integration enables the GNN to perform message passing across the higher-dimensional structures encapsulated within the complexes. The motivation for utilizing higher-order structures stems from prior findings that such configurations enhance model accuracy (Ebli et al., 2020), primarily because more information is exchanged during message passing. We hypothesize that by leveraging cell complexes, GNNs can achieve a more nuanced representation of data, which in turn enhances the capability of graph explainers to generate more accurate and insightful explanations of model predictions.

We illustrate the change by integrating HOGE with GNNExplainer and showing how adding cell complexes modifies the explainer’s objective. Originally, the optimization objective of GNNExplainer is as follows

$$\min_{M}-\sum_{c=1}^{C}\mathbb{1}[y=c]\log P_{\Phi}(Y=y\mid G=A_{c}\odot\sigma(M),X=X_{c}) \tag{4}$$

In this function, $A_{c}$ is the adjacency matrix representing the computation graph of the node to be explained, $M$ is the mask being optimized, $\sigma$ is the sigmoid that maps the mask entries to $[0,1]$, $C$ is the number of classes, $y$ is the class label, $P_{\Phi}$ denotes the prediction probability of the model $\Phi$, $G$ is the graph obtained by applying the mask $M$ to the adjacency matrix $A_{c}$, and $X_{c}$ is the feature matrix. The goal is to minimize the log loss (negative log-likelihood) of the correct class prediction.

Transitioning from traditional computation graphs, our approach uses cell complexes as the input to this objective, which brings richer structural information into the computation. Consequently, the learned edge mask is adapted to represent the computational complex, incorporating both the connectivity and the hierarchical structure inherent in cell complexes. Let $A_{\chi}$ be the computational complex of a 0-cell. Again, since we are only explaining predictions through nodes, we omit the higher-order adjacency matrices. For a two-layer GCN, $A_{\chi}$ can be understood as having two parts: (1) the horizontal computation graph, which contains horizontal connections between 0-cells, and (2) the vertical computation graph, which contains vertical connections between cells across dimensions. This is shown in Figure 4.

In this setting, $A_{c}$ can be extended to $A_{\chi}$ as below, where $B_{(p)(p+1)}$ represents the incidence matrices derived from $A_{c}$.

$$A_{\chi}=\begin{bmatrix}A_{c}&B_{01}&0\\ B_{01}^{T}&0&B_{12}\\ 0&B_{12}^{T}&0\end{bmatrix} \tag{5}$$

Consequently, with a richer representation of the underlying graph, we can modify (4) to (6):

$$\min_{M_{\chi}}-\sum_{c=1}^{C}\mathbb{1}[y=c]\,P_{\chi} \tag{6}$$

$$P_{\chi}=\log P_{\Phi}(Y=y\mid G_{\chi}=A_{\chi}\odot\sigma(M_{\chi}),X=X_{c})$$

In summary, we train a GNN on the cell complex generated by lifting the graph and then provide the trained GNN and cell complex as inputs to a graph explainer. The explainer then learns $M_{\chi}$ , which is the connection mask for the cell complex, and quantifies the importance of all the connections in the cell complex. Following this, the explainer outputs the final edge importance mask for the input graph.

Table 1. Dataset details of Graph Classification datasets from the GraphXAI (Agarwal et al., 2023) Benchmark. $\varnothing$ indicates average.

Dataset	No. of graphs	$\varnothing$ No. of vertices	$\varnothing$ No. of edges	$\varnothing$ Degree	$\varnothing$ No. of Cycles	Max Cycle Length
Benzene	12000	20.58	43.63	2.12	2.242	6
Mutagenicity	1768	29.1	60.83	2.08	2.20	6
Fluoride Carbonyl	8671	21.42	45.44	2.12	2.256	6

3.4. Information Propagation

With the explainer generating an importance mask $M_{\chi}$ for the cell complex connections, we must propagate this information from the higher-order structures back to the base graph structure. We term this process information propagation, as it involves transferring the learned importance values from the complex to the original domain. A visual representation is shown in Figure 5. There are multiple ways to propagate information to the base graph; for brevity, this section describes only one in detail, which we term Direct Propagation. We also introduce three alternative methods: 0-skeleton Propagation, 1-skeleton Propagation and Hierarchical Propagation. All methods are described and evaluated in section 5.3.

Formally, let $I^{G}_{e}$ denote the derived importance of edge $e$ in the base graph $G$ , and $I^{\chi}_{e}$ represent the importance of the corresponding connection $e$ in the cell complex $\chi$ . We define the information propagation function as:

$$I^{G}_{e}=\tanh\left(I^{\chi}_{e}+\frac{1}{|\mathcal{C}^{\chi}_{e}|}\sum_{\tau\in\mathcal{C}^{\chi}_{e}}I^{\chi}_{\tau}\right) \tag{7}$$

where $\mathcal{C}^{\chi}_{e}$ is the set of all connections in $\chi$ that constitute the higher-order structures (e.g., cycles) of which $e$ is a part. For non-negative importance values, the $\tanh$ activation bounds the propagated values within $[0, 1)$, while the second term captures the average influence of higher-order structures on the base edge. The $\tanh$ was chosen over other non-linear functions owing to its steeper slope, which polarises the importance values towards the bounds. This creates a notable distinction between important and unimportant edges.
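Direct Propagation as written in Equation 7 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `direct_propagation`, the dictionary layout, and the mask values are hypothetical, and an edge that lies on no higher-order structure (empty $\mathcal{C}^{\chi}_{e}$) is assumed to receive no contribution from the second term, a case the equation leaves undefined.

```python
import math


def direct_propagation(edge_importance, related_importances):
    """tanh(own score + mean score of the related connections), per Eq. (7).

    edge_importance     : {edge: importance of that edge's connection in the complex}
    related_importances : {edge: importances of the connections in C_e}
    """
    propagated = {}
    for edge, own in edge_importance.items():
        related = related_importances.get(edge, [])
        # Assumption: an empty C_e contributes nothing instead of dividing by zero.
        mean_related = sum(related) / len(related) if related else 0.0
        propagated[edge] = math.tanh(own + mean_related)
    return propagated


# Hypothetical explainer outputs, all in [0, 1].
edge_importance = {("a", "b"): 0.9, ("b", "c"): 0.1}
related_importances = {
    ("a", "b"): [0.8, 0.7, 0.9],  # connections of a ring that contains edge (a, b)
    ("b", "c"): [],               # edge (b, c) lies on no cycle
}
scores = direct_propagation(edge_importance, related_importances)
```

In this toy case the ring edge ends up near 0.94 and the other edge near 0.10, which shows how the higher-order term widens the gap between the two.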

This information propagation step is crucial as it translates the higher-order explanations generated by HOGE into a format that is comprehensible and actionable for the original graph domain. By explicitly accounting for the hierarchical relationships between the base graph and its lifted representation, we ensure that the explanatory power of higher-order structures is effectively distilled and propagated to the base level.

3.5. Dimension Masking

To gain more control over the incorporation of higher-order structures from the cell complex, we introduce a dimension masking strategy. This selectively includes or excludes specific dimensions when constructing the cell complex adjacency matrix $A_{c}$ .

We define a dimension mask $D_{m}$ , which is a binary matrix of the same dimensions as $A_{c}$ . The entries in $D_{m}$ corresponding to the dimensions we want to keep are set to 1, while the entries for dimensions to be excluded are set to 0. Using $D_{m}$ , we compute the masked adjacency matrix $C_{0}$ as:

$$C_{0}=(A_{c}\odot D_{m})\odot\sigma(M) \tag{8}$$

where $\odot$ denotes the elementwise product, and $\sigma(M)$ applies the learned importance mask $M$ using a non-linear activation function.
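A small sketch of Equation 8 may help make the masking concrete. It assumes one particular reading of $D_{m}$, namely that an entry is 1 only when both cells it connects belong to a kept dimension; it also assumes $\sigma$ is the sigmoid. The helper names and the toy complex are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dimension_mask(cell_dims, keep):
    """Binary mask D_m: 1 where both the row cell and the column cell are kept."""
    n = len(cell_dims)
    return [[1 if cell_dims[i] in keep and cell_dims[j] in keep else 0
             for j in range(n)] for i in range(n)]


def masked_adjacency(A_c, D_m, M):
    """C_0 = (A_c * D_m) * sigmoid(M), with every product taken elementwise."""
    n = len(A_c)
    return [[A_c[i][j] * D_m[i][j] * sigmoid(M[i][j]) for j in range(n)]
            for i in range(n)]


# Toy complex: two nodes (dimension 0) joined by one edge cell (dimension 1).
cell_dims = [0, 0, 1]
A_c = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
M = [[0.0] * 3 for _ in range(3)]          # untrained mask, sigmoid(0) = 0.5
D_m = dimension_mask(cell_dims, keep={0})  # keep only the 0-cells
C_0 = masked_adjacency(A_c, D_m, M)
```

With only dimension 0 kept, the node-node entry survives with weight 0.5 while every connection touching the edge cell is zeroed out.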

This dimension masking technique serves two main purposes. Firstly, it enables a systematic evaluation of the relative importance of different dimensional structures within the cell complex. By selectively masking out certain dimensions, we can identify the most informative subspaces contributing to the explanation process.

Secondly, dimension masking acts as a regularization mechanism. During training, we can disable specific higher-order dimensions that may introduce noise or cause overfitting. This encourages the model to learn representations that are robust to irrelevant higher-order features, potentially enhancing its generalization performance.

With this, we generate an explanation for the input graph by propagating information from the generated cell complex back to the original graph.

4. Experimental Settings

4.1. Datasets

We evaluate HOGE on GraphXAI (Agarwal et al., 2023) benchmark datasets. Table 1 briefly describes some properties of the datasets. These datasets are real-world molecular datasets, which were chosen as molecules inherently include higher-order structures like rings.

Benzene The Benzene (Agarwal et al., 2023) dataset contains 12,000 molecular graphs extracted from the ZINC15 (Sterling and Irwin, 2015) database and labeled into two classes. The task is to identify whether a given molecule contains a benzene ring or not. Explanations are based on the presence or absence of benzene rings.

Mutagenicity The Mutagenicity (Kazius et al., 2005) dataset contains 1,768 graphs, each representing a molecule. Molecules are labeled into two classes according to their mutagenic properties. Ground truth explanations are based on the presence or absence of chosen toxicophores: NH2, NO2, aliphatic halide, nitroso, and azo-type.

Fluoride Carbonyl The Fluoride Carbonyl (Agarwal et al., 2023) dataset contains 8,671 molecular graphs. Molecules are labeled into two classes, where a positive sample indicates that the molecule contains a fluoride and a carbonyl functional group. The ground-truth explanations consist of combinations of fluoride atoms and carbonyl functional groups within a given molecule.

4.2. Evaluation Criteria

The task of a graph explainer method can be seen as binary classification, predicting whether an edge in the graph belongs in the ground truth explanation or not.

4.2.1. Graph Explanation Accuracy

We adopt Graph Explanation Accuracy (GEA) from GraphXAI (Agarwal et al., 2023). It measures the correctness of the generated explanation using the Jaccard index between the ground truth and prediction.

$$\mathrm{JAC}(Y,\hat{Y})=\frac{TP(Y,\hat{Y})}{TP(Y,\hat{Y})+FP(Y,\hat{Y})+FN(Y,\hat{Y})}$$

where $Y$ is the ground truth binary edge mask and $\hat{Y}$ is the predicted binary edge mask.
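The Jaccard index above can be computed directly from two binary edge masks. The following is a minimal sketch; the function name and the example masks are made up for illustration, and returning 0.0 when both masks are empty is an assumed convention that the formula itself does not define.

```python
def jaccard(y_true, y_pred):
    """Jaccard index TP / (TP + FP + FN) between two binary edge masks."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = tp + fp + fn
    # Assumption: score 0.0 when neither mask marks any edge.
    return tp / denom if denom else 0.0


# Hypothetical masks over six edges.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
gea = jaccard(y_true, y_pred)  # TP = 2, FP = 1, FN = 1
```

Two of the four edges marked by either mask are shared, so the score for this example is 0.5.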

4.2.2. Area Under Curve (AUC)

The Area Under Curve (AUC) metric quantifies the performance of a binary classifier by measuring the area under the Receiver Operating Characteristic (ROC) curve. This curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

$$\text{AUC}=\int_{0}^{1}\text{TPR}\,d(\text{FPR})$$

where the TPR and FPR values are traced out along the ROC curve as the decision threshold $t$ varies.
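The area under the ROC curve equals the probability that a randomly chosen positive edge receives a higher score than a randomly chosen negative edge, with ties counted as one half. The sketch below uses that pairwise formulation instead of numerical integration; the function name and the scores are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs that are ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative edge")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical ground-truth mask and soft edge importances.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(y_true, scores)
```

Three of the four positive-negative pairs are ordered correctly here, giving an AUC of 0.75. The quadratic loop is fine for molecule-sized graphs; a rank-based version would be preferable for large edge sets.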

4.2.3. F1 Score

The $F1$ Score is a widely used metric for evaluating the accuracy of binary classification models, particularly in situations where class distributions are imbalanced. It is the harmonic mean of precision and recall, providing a balance between the two by penalizing extreme values. The formula for the F1 Score is given by:

$$F1=2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}$$

where Precision is defined as the ratio of true positive observations to the total predicted positives, and Recall is the ratio of true positives to the total actual positives.
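For binary edge masks, precision, recall and F1 can be computed as in the sketch below. The function name and masks are illustrative, and returning 0.0 when a denominator vanishes is an assumed convention.

```python
def f1_score(y_true, y_pred):
    """F1 as the harmonic mean of precision and recall for binary edge masks."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumption: undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical masks over six edges.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
f1 = f1_score(y_true, y_pred)  # precision = recall = 2/3
```

Since precision and recall are both 2/3 in this example, their harmonic mean is also 2/3.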

4.3. Baselines

For our comparative analysis, we select three established graph explanation methods as baselines, each paired with its HOGE counterpart. These explainers are evaluated for their efficacy in extracting meaningful explanations from graph neural networks (GNNs). Below, we detail each baseline and specify the underlying GNN models used in our experiments:

Table 2. Results on graph classification datasets for the HOGE framework. The better result between a baseline explainer and its HOGE version is underlined. We find that HOGE consistently achieves comparable or superior performance across explainers and datasets. *Although AE gets the best results on Fluoride Carbonyl, as we describe in section 5.1.2, the results are not reliable.

Dataset	Metrics	Explainer
		GNNE	HOGE-GNNE	GME	HOGE-GME	AE	HOGE-AE
Benzene	GEA	0.124 $\pm$ 0.170	0.745 $\pm$ 0.069	0.072 $\pm$ 0.021	0.113 $\pm$ 0.022	0.032 $\pm$ 0.021	0.084 $\pm$ 0.031
	AUC	0.45 $\pm$ 0.119	0.884 $\pm$ 0.03	0.503 $\pm$ 0.007	0.509 $\pm$ 0.006	0.374 $\pm$ 0.023	0.515 $\pm$ 0.018
	F1	0.165 $\pm$ 0.208	0.831 $\pm$ 0.057	0.101 $\pm$ 0.028	0.160 $\pm$ 0.031	0.059 $\pm$ 0.038	0.146 $\pm$ 0.051
Fluoride Carbonyl	GEA	0.134 $\pm$ 0.008	0.178 $\pm$ 0.046	0.045 $\pm$ 0.019	0.050 $\pm$ 0.017	0.238 $\pm$ 0.128*	0.012 $\pm$ 0.004
	AUC	0.416 $\pm$ 0.025	0.554 $\pm$ 0.03	0.504 $\pm$ 0.002	0.501 $\pm$ 0.003	0.624 $\pm$ 0.091*	0.483 $\pm$ 0.004
	F1	0.226 $\pm$ 0.014	0.276 $\pm$ 0.073	0.072 $\pm$ 0.030	0.078 $\pm$ 0.028	0.456 $\pm$ 0.105*	0.018 $\pm$ 0.006
Mutagenicity	GEA	0.234 $\pm$ 0.021	0.161 $\pm$ 0.001	0.057 $\pm$ 0.009	0.069 $\pm$ 0.005	0.194 $\pm$ 0.037	0.228 $\pm$ 0.065
	AUC	0.614 $\pm$ 0.023	0.503 $\pm$ 0.001	0.497 $\pm$ 0.006	0.506 $\pm$ 0.002	0.617 $\pm$ 0.051	0.626 $\pm$ 0.060
	F1	0.334 $\pm$ 0.024	0.261 $\pm$ 0.007	0.092 $\pm$ 0.014	0.113 $\pm$ 0.009	0.302 $\pm$ 0.048	0.338 $\pm$ 0.110

GNNExplainer (GNNE) (Ying et al., 2019) A method that utilizes the idea of perturbations and aims to identify the most influential subgraph and node features responsible for a model’s prediction. This method optimizes for a mask that minimizes the difference between the predictions of the original and the perturbed graph, thereby highlighting crucial components. In our experiments, we employ GNNE with a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) to elucidate its explanatory capabilities across different graph datasets.

Attention Explainer (AE) Leveraging the intrinsic attention mechanisms of Graph Attention Networks (GAT) (Veličković et al., 2018), this explainer aggregates attention scores across various layers and heads. This approach assumes that higher attention weights correlate with higher relevance to the model’s decision-making process. AE is specifically paired with GAT in our experimental setup to assess how attention-based explanations align with model predictions.

Graph Mask Explainer (GME) (Schlichtkrull et al., 2022) Also a perturbation-based approach, GME focuses on determining the minimal yet most informative subgraph that influences a GNN’s output. By iteratively masking out parts of the input graph, GME identifies critical nodes and edges that significantly affect the prediction accuracy. Like GNNE, GME is applied in conjunction with a GCN to explore the utility and limitations of perturbation-based explanations in our study.

Through our experiments, we seek to understand the comparative effectiveness of these baselines in providing transparent and actionable insights into GNN decisions. We hypothesize that the integration of higher-order explanations in HOGE versions may reveal more nuanced and comprehensive interpretative details, potentially leading to more robust and interpretable machine learning models.

4.4. Implementation Details

To ensure robust and generalizable results, all experiments are conducted across ten different dataset seeds, each featuring distinct train-test-validation splits. We report the average performance metrics for the explanations along with their standard deviations to capture variability and ensure reproducibility.
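The aggregation over seeds can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator is reported.

```python
import statistics


def summarize_over_seeds(scores):
    """Return (mean, std) of one metric measured once per dataset seed."""
    mean = statistics.fmean(scores)
    # Assumption: sample standard deviation; the paper does not specify this.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def format_cell(scores, digits=3):
    """Format a table cell in the 'mean $\\pm$ std' style used in the tables."""
    mean, std = summarize_over_seeds(scores)
    return f"{mean:.{digits}f} $\\pm$ {std:.{digits}f}"
```

Calling `format_cell` with the ten per-seed values of a metric would produce one entry in the style of the results tables.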

For the evaluation of explanation methods, we systematically select 500 graphs from each seed within every dataset to form a comprehensive test suite. This structured sampling allows for balanced and fair assessments across varying data distributions.

The experiments leverage two prominent Graph Neural Network (GNN) architectures:

• Graph Convolutional Network (GCN) (Kipf and Welling, 2017): Employs spectral-based convolution layers that effectively capture graph topology.
• Graph Attention Network (GAT) (Veličković et al., 2018): Integrates attention mechanisms to weigh the importance of nodes dynamically, based on their neighborhood.

Both architectures are configured with two convolution layers, each followed by a ReLU activation function (Agarap, 2018). The network topology concludes with a global mean pooling layer to aggregate node features and a final linear layer equipped with a Sigmoid activation function (Narayan, 1997) to produce output predictions.

Training is conducted using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.01 and a weight decay parameter set at $5\times 10^{-4}$ . Each model undergoes 100 epochs of training, with the best-performing model selected based on the lowest validation loss.
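The stated hyperparameters and the checkpoint-selection rule can be summarized in a short sketch. The class and function names are hypothetical, and breaking ties in favour of the earliest epoch is an assumption; only the numeric values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters stated in the text; field names are illustrative."""
    learning_rate: float = 0.01
    weight_decay: float = 5e-4
    epochs: int = 100
    conv_layers: int = 2


def select_best_epoch(val_losses):
    """Return (epoch_index, loss) for the lowest validation loss.

    Ties are resolved in favour of the earliest epoch (an assumption).
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")
    best = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best, val_losses[best]
```

In a real training loop, the model weights saved at the returned epoch index would be the ones passed on to the explainers.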

Our computational environment consists of 4 NVIDIA GeForce RTX 2080 Ti GPUs and 40 Intel Xeon E5-2640 v4 CPU cores, supported by at least 80GB of RAM. This hardware setup ensures sufficient computational power to handle the intensive training and evaluation processes involved in our experiments.

5. Results

5.1. Evaluating HOGE

The core findings of our study are summarized in Table 2, which compares the performance of HOGE against standard baseline explainers across multiple datasets.

5.1.1. Overall Performance

The results in Table 2 show that augmenting explainer models with HOGE improves their performance in most settings, with the exceptions discussed below. These results reinforce that utilizing higher-order structures facilitates better explanations.

5.1.2. Specific Dataset Insights

• Benzene: A notable increase in performance (500% for GNNE) is observed with HOGE for all explainer models on the Benzene dataset. The HOGE-enhanced version of GNNE (HOGE-GNNE) leads in performance, while the HOGE versions of AE and GME, despite their improvements, still lag behind HOGE-GNNE.
• Fluoride Carbonyl: On this dataset, we generally observe performance improvements with the adoption of HOGE; for example, we see a 32% increase for GNNE. The exception is HOGE-AE, where performance did not improve. Detailed analysis revealed that the underlying GAT model struggled with the dataset, predicting only the majority class and failing to capture meaningful patterns. This, in turn, resulted in AE predicting almost randomly, as can be seen from its standard deviation. HOGE-AE, on the other hand, is stable in its results, with small deviations.
• Mutagenicity: This dataset, known for its complexity, showed performance gains with HOGE for the AE (17%) and GME (21%) explainers, but not for GNNE. We attribute this to the fact that the underlying GNN with cell complex inputs performed worse than the traditional GNN on the binary classification task. The weaker GNN may in turn have produced worse explanations.
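The text does not state how the percentage gains are computed. Assuming they are relative changes with respect to the baseline explainer's score, the calculation can be sketched as below; the function name is hypothetical, and the example values are the accuracies quoted in the caption of Figure 1.

```python
def relative_gain_percent(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent.

    Assumption: the reported gains are relative changes; the text does not
    define them explicitly.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (improved - baseline) / baseline


# Figure 1 example: explanation accuracy rises from 0.125 to 1.
print(relative_gain_percent(0.125, 1.0))
```

Under this reading, a gain of 500% corresponds to a score six times the baseline.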

5.1.3. Discussion

The findings from these experiments affirm that incorporating higher-order structures enhances the capability of graph explainers to present human-interpretable substructures. However, we also note that the influence of higher-order structures is dependent on the presence of underlying higher-order structures in the base graph, as HOGE constructs cell complexes directly from the base graph without introducing any external structures.

These results collectively highlight the impacts of higher-order information in enhancing the interpretability and performance of explanation methods in graph neural networks.

5.2. Ablation Studies

Our method relies on higher-order cell complexes that come from lifting the graph. To analyze how important these cell complexes are, we evaluate the two configurations described below. Table 3 displays the results of our study.

5.2.1. Training on Graphs, Explaining on Complexes

We train a model on the original graphs. To generate explanations, we lift the graphs to cell complexes and pass the graph-trained model, together with the cell complexes, as input to the graph explainers. We find that if the model has not seen these higher-order structures during training, it does not benefit from them during explanation. This underscores the importance of training the model on cell complexes.

5.2.2. Training on Complexes, Explaining on Graphs

We train a model on the cell complexes. For generating explanations, we only provide the base graph, along with this model to graph explainers. We find that the performance drastically drops, with AUC scores being comparable to not using HOGE at all. This shows us that our method strongly relies on higher-order relations, which supports our hypothesis that higher-order structures improve explanation accuracy.
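The ablation grid can be checked mechanically. The sketch below copies the AUC values from Table 3 and verifies which training/explanation combination scores highest for each configuration; the variable and function names are illustrative only.

```python
# AUC values copied from Table 3.
# Inner keys are (training data, explanation data).
G, C = "graphs", "complexes"
TABLE_3 = {
    "Benzene + GNNE": {(G, G): 0.6727, (G, C): 0.6800, (C, G): 0.4739, (C, C): 0.8906},
    "Benzene + AE": {(G, G): 0.3790, (G, C): 0.3790, (C, G): 0.3730, (C, C): 0.5674},
    "Benzene + GME": {(G, G): 0.5047, (G, C): 0.4909, (C, G): 0.5082, (C, C): 0.5664},
    "Mutagenicity + GNNE": {(G, G): 0.4148, (G, C): 0.4148, (C, G): 0.4739, (C, C): 0.598},
    "Mutagenicity + AE": {(G, G): 0.5435, (G, C): 0.5435, (C, G): 0.5955, (C, C): 0.6789},
    "Mutagenicity + GME": {(G, G): 0.4777, (G, C): 0.4804, (C, G): 0.5129, (C, C): 0.5337},
}


def best_setting(aucs):
    """Return the (training, explanation) pair with the highest AUC."""
    return max(aucs, key=aucs.get)


def gain_over_graph_only(aucs):
    """AUC difference between the full HOGE setting and the graph-only setting."""
    return aucs[(C, C)] - aucs[(G, G)]


for name, aucs in TABLE_3.items():
    print(f"{name}: best={best_setting(aucs)}, gain={gain_over_graph_only(aucs):+.4f}")
```

For all six configurations the highest AUC is obtained when both training and explanation use cell complexes, which is the pattern the ablation text describes.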

5.3. Information Propagation Methods

This subsection elaborates on various methods of information (or importance) propagation used to derive graph explanations from cell complex explanations, as initially outlined in Section 3.4 and Section 3.5. These methods are specifically tested on the Benzene dataset described in Section 4.1.

Information propagation is crucial in our methodology as it determines how significance from complex graph structures (cycles and edges) is relayed to simpler underlying graphs. The following list details the different approaches we employed:

Table 3. AUC Scores for the Ablation Study. The best results are highlighted in bold. We see that the best performance is achieved when both training and explanation are done using cell complexes. A significant drop is seen in the AUC score when either training or explanation is done using just graphs, indicating the importance of cell complexes in the HOGE framework.

Configuration	Training Data	Explanation Data
		Graphs	Complexes
Benzene + GNNE	Graphs	0.6727	0.6800
	Complexes	0.4739	0.8906
Benzene + AE	Graphs	0.3790	0.3790
	Complexes	0.3730	0.5674
Benzene + GME	Graphs	0.5047	0.4909
	Complexes	0.5082	0.5664
Mutagenicity + GNNE	Graphs	0.4148	0.4148
	Complexes	0.4739	0.598
Mutagenicity + AE	Graphs	0.5435	0.5435
	Complexes	0.5955	0.6789
Mutagenicity + GME	Graphs	0.4777	0.4804
	Complexes	0.5129	0.5337

(1) 0-skeleton Propagation: In this method, we do not transfer importance from complex structures like cycles and edges, and instead use only the 0-skeleton of the cell complex to pass information. This utilizes the idea of Dimension Masking introduced in Section 3.5. Essentially, this approach treats the base graph as isolated from its higher-dimensional complexes, focusing solely on its inherent properties without considering extended topological features.
(2) 1-skeleton Propagation: Here, the importance is transferred solely from the 1-cells and 0-cells down to the base graph, ignoring any contribution from the 2-cells. This method allows us to investigate the impact of linear connections between nodes while still omitting cyclic structures. Again, we utilize Dimension Masking to achieve this effect.
(3) Hierarchical Propagation: This approach transfers importance to the base graph hierarchically. Specifically, the importance of 2-cells is mapped to their corresponding faces, which are 1-cells, and then further down to the faces of the 1-cells, which are the 0-cells. This is then transferred to the base graph, facilitating a direct flow of information from more complex to simpler structures and effectively tracing the influence of higher-order relations to the base graph.
(4) Direct Propagation: In contrast to hierarchical propagation, this method transfers the importance of all cells in the cell complex directly to the base graph. If a 1-cell or 2-cell is deemed important, its importance is distributed across all the edges in the base graph that constitute it. This ensures that we maintain the integral value of cyclic structures in the explanation process. This is the method used for the results in the preceding tables.
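A minimal sketch of Direct Propagation combined with Dimension Masking is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name, the dictionary-based data layout, and the choice of `max` to combine an edge's own score with the scores of the rings containing it are all assumptions. Hierarchical Propagation is not sketched because the text does not specify how importance is split among faces.

```python
def propagate_direct(node_scores, edge_scores, ring_scores, ring_boundaries, max_dim=2):
    """Map cell importances onto base-graph nodes and edges.

    node_scores:     {node: score} for 0-cells
    edge_scores:     {(u, v): score} for 1-cells
    ring_scores:     {ring_id: score} for 2-cells
    ring_boundaries: {ring_id: [(u, v), ...]} boundary edges of each 2-cell
    max_dim:         highest cell dimension allowed to contribute
                     (0 = 0-skeleton, 1 = 1-skeleton, 2 = all cells)
    """
    nodes = dict(node_scores)  # 0-cells correspond one to one to graph nodes
    edges = {edge: 0.0 for edge in edge_scores}
    if max_dim >= 1:
        for edge, score in edge_scores.items():
            edges[edge] = max(edges[edge], score)
    if max_dim >= 2:
        for ring_id, score in ring_scores.items():
            # A ring's importance goes to every base-graph edge on its boundary.
            # Assumption: scores are combined with max, not summed.
            for edge in ring_boundaries[ring_id]:
                edges[edge] = max(edges[edge], score)
    return nodes, edges


# Toy example: a 4-cycle (0-1-2-3) with a pendant edge (3, 4).
NODE_SCORES = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.5, 4: 0.6}
EDGE_SCORES = {(0, 1): 0.2, (1, 2): 0.1, (2, 3): 0.3, (0, 3): 0.4, (3, 4): 0.8}
RING_SCORES = {"r0": 0.9}
RING_BOUNDARIES = {"r0": [(0, 1), (1, 2), (2, 3), (0, 3)]}

for dim in (0, 1, 2):
    _, edge_importance = propagate_direct(
        NODE_SCORES, EDGE_SCORES, RING_SCORES, RING_BOUNDARIES, max_dim=dim
    )
    print(dim, edge_importance)
```

With `max_dim=2` every edge of the ring inherits the ring's score, which illustrates how direct propagation keeps a cycle intact in the explanation; lowering `max_dim` reproduces the 1-skeleton and 0-skeleton variants.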

Table 4 presents the comparative performance of these methods. Notably, the 0-skeleton Propagation approach yields the least effective results, underscoring the limitations of disregarding higher-order structures in graph explanations. This finding aligns with our in-depth analysis in Section 5.2. Furthermore, the Hierarchical Propagation method underperforms relative to the Direct Propagation technique, likely due to the latter’s ability to preserve the holistic influence of cycles, which is critical when cycles contribute significantly to the graph’s characteristics. We also find that 1-skeleton Propagation performs the best. This is likely because the edges present in a cycle already aid the flow of information by themselves, whereas spreading importance directly from cycles may over-emphasize them and lead to worse performance.

Table 4. Information propagation method performance results on the Benzene dataset. (1) 0-skeleton Propagation (2) 1-skeleton Propagation (3) Hierarchical Propagation and (4) Direct Propagation. The best results are highlighted in bold.

Metrics	(1)	(2)	(3)	(4)
GEA	0.635	0.806	0.672	0.717
AUC	0.481	0.919	0.847	0.872
F1	0.152	0.876	0.759	0.807

5.4. Performance Analysis

Empirically, we analyzed the size difference between a graph and its corresponding cell complex. We present these results in Table 5. We find that for the datasets used here, the cell complexes have 3.2x more cells than nodes in the original graphs and 3.5x more connections than edges in the original graph on average. Notably, we find that the average degree remains nearly the same before and after lifting the graph (roughly 2.1 versus 2.3), indicating that while the cell complexes are larger, they are not substantially denser. This shows that our implementation of lifting graphs to complexes described in Section 3.2 does not incur a substantial performance overhead.

Table 5. Increase in size when lifting a graph to a cell complex. $\varnothing\delta$ represents the average degree for both graphs and cell complexes.

Dataset	Graphs			Cell Complexes
	$\varnothing|V|$	$\varnothing|E|$	$\varnothing\delta$	$\varnothing|V_{\chi}|$	$\varnothing|E_{\chi}|$	$\varnothing\delta$
Benzene	20.58	43.63	2.12	66.47	156.04	2.34
Mutagenicity	29.1	60.83	2.08	92.18	207.64	2.25
Fluoride Carbonyl	21.42	45.44	2.12	69.04	162.48	2.32
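The reported growth factors can be reproduced from the per-dataset averages in Table 5. The sketch below assumes the headline figures are unweighted means of the per-dataset ratios, which the text does not state; names are illustrative.

```python
# Per-dataset averages copied from Table 5:
# (nodes, edges, cells, cell connections)
TABLE_5 = {
    "Benzene": (20.58, 43.63, 66.47, 156.04),
    "Mutagenicity": (29.1, 60.83, 92.18, 207.64),
    "Fluoride Carbonyl": (21.42, 45.44, 69.04, 162.48),
}


def growth_factors(row):
    """Return (cells / nodes, connections / edges) for one dataset."""
    nodes, edges, cells, connections = row
    return cells / nodes, connections / edges


def mean_growth(table):
    """Unweighted mean of the growth factors over datasets (an assumption)."""
    factors = [growth_factors(row) for row in table.values()]
    n = len(factors)
    return sum(f[0] for f in factors) / n, sum(f[1] for f in factors) / n


cell_growth, connection_growth = mean_growth(TABLE_5)
print(f"cells: {cell_growth:.1f}x, connections: {connection_growth:.1f}x")
```

Rounded to one decimal, the two means match the 3.2x and 3.5x figures quoted in the text.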

We perform further analyses and report the average time (over 10 runs) for the conversion to cell complex, GNN training, and prediction explanation for both graphs and cell complexes. We perform these tests on the Benzene dataset with a GCN model and GNNExplainer as the explainer. The GNN is trained for 50 epochs, while GNNExplainer is trained for 200 epochs. Results are detailed in Table 6. The conversion to a cell complex is quick and contributes negligibly to the overhead. The main computational overhead comes from training the GNN on cell complexes, which is expected since they are larger. However, the overhead is relatively small compared to the increase in size. GNNExplainer takes similar amounts of time when run with graphs and complexes, indicating that adding higher-order structures does not slow down the generation of explanations. The slow performance of GNNExplainer can be attributed to the fact that explanations are not generated in batches; instead, it explains one graph at a time.

Table 6. Results for lifting time, training time, and explanation time for graphs and complexes (in seconds). Lifting is performed on all 12000 samples in the Benzene dataset, training is performed on 8400 samples, and explanation on 500 samples.

Component	Graphs	Cell Complexes
Lifting Graph	-	1.89s
GNN Training	20.64s	47.35s
Explanation	82.58s	93.36s
Total	103.22s	142.6s
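The timings in Table 6 can be turned into relative overheads with a few lines of arithmetic. The sketch below only re-uses the numbers from the table; the names are illustrative, and the lifting time for plain graphs is entered as 0.0 because that step does not apply to them.

```python
# Wall-clock times in seconds copied from Table 6.
GRAPH_TIMES = {"lifting": 0.0, "training": 20.64, "explanation": 82.58}
COMPLEX_TIMES = {"lifting": 1.89, "training": 47.35, "explanation": 93.36}


def total(times):
    """Sum of all pipeline stages."""
    return sum(times.values())


def ratio(complex_seconds, graph_seconds):
    """How many times longer the cell-complex pipeline takes for one stage."""
    return complex_seconds / graph_seconds


training_ratio = ratio(COMPLEX_TIMES["training"], GRAPH_TIMES["training"])
explanation_ratio = ratio(COMPLEX_TIMES["explanation"], GRAPH_TIMES["explanation"])
total_ratio = ratio(total(COMPLEX_TIMES), total(GRAPH_TIMES))
print(f"training {training_ratio:.2f}x, explanation {explanation_ratio:.2f}x, "
      f"total {total_ratio:.2f}x")
```

Training takes roughly 2.3 times longer on cell complexes, which is less than the 3.2x to 3.5x growth in input size reported in Table 5, while the end-to-end pipeline is about 1.4 times slower.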

6. Limitations

While HOGE demonstrates significant advancements in explaining the predictions of Graph Neural Networks by utilizing higher-order structures, some limitations warrant further discussion. Firstly, the computational complexity associated with lifting graphs to higher-dimensional structures can be substantial, especially with large-scale graphs common in domains like social networking and bioinformatics. Because we primarily test on small molecular and synthetic datasets, our experiments show minimal overhead; scalability to even larger graphs remains an open problem.

Moreover, the effectiveness of HOGE is heavily reliant on the inherent properties of the dataset, particularly the presence of meaningful higher-order structures. In datasets where such structures are sparse or irrelevant, the benefits of HOGE may not be as pronounced. This can potentially lead to unnecessary computational expenses without corresponding gains in explainability.

7. Conclusion

HOGE represents a novel approach to enhancing the explainability of Graph Neural Networks through the integration of higher-order structures. Our experiments demonstrate that incorporating cell complexes can lead to more accurate and interpretable explanations of GNN predictions with minimal computational overhead on the tested benchmarks. We also test the importance of higher-order structures using ablation studies and show that the best results are obtained for graph predictions when both GNN training and explaining are done using cell complexes. Looking ahead, there are several promising directions for extending this work. Firstly, expanding the applicability of HOGE to a broader range of GNN architectures and graph explainers could further validate and refine the approach. Additionally, we hope this work motivates investigating methods to reduce the computational complexity associated with higher-dimensional structures, making HOGE more practical for large-scale applications.

References

Agarap (2018) Abien Fred Agarap. 2018. Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375 (2018).
Agarwal et al. (2023) Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, and Marinka Zitnik. 2023. Evaluating explainability for graph neural networks. Scientific Data 10, 1 (2023), 144.
Bodnar et al. (2021) Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yuguang Wang, Pietro Liò, Guido F Montufar, and Michael Bronstein. 2021. Weisfeiler and Lehman Go Cellular: CW Networks. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 2625–2640.
Bodnar et al. (2022) Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yu Guang Wang, Pietro Liò, Guido Montúfar, and Michael Bronstein. 2022. Weisfeiler and Lehman Go Cellular: CW Networks. arXiv:2106.12575 [cs.LG]
Ebli et al. (2020) Stefania Ebli, Michaël Defferrard, and Gard Spreemann. 2020. Simplicial neural networks. arXiv preprint arXiv:2010.03633 (2020).
Gerbner et al. (2018) Dániel Gerbner, Balázs Keszegh, Cory Palmer, and Balázs Patkós. 2018. On the number of cycles in a graph with restricted cycle lengths. SIAM Journal on Discrete Mathematics 32, 1 (2018), 266–279.
Huang et al. (2020) Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, Dawei Yin, and Yi Chang. 2020. GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks. arXiv:2001.06216 [cs.LG]
Kakkad et al. (2023) Jaykumar Kakkad, Jaspal Jannu, Kartik Sharma, Charu Aggarwal, and Sourav Medya. 2023. A Survey on Explainability of Graph Neural Networks. arXiv:2306.01958 [cs.LG]
Kazius et al. (2005) Jeroen Kazius, Ross McGuire, and Roberta Bursi. 2005. Derivation and validation of toxicophores for mutagenicity prediction. Journal of medicinal chemistry 48, 1 (2005), 312–320.
Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]
Luo et al. (2020) Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. 2020. Parameterized Explainer for Graph Neural Network. arXiv:2011.04573 [cs.LG]
Narayan (1997) Sridhar Narayan. 1997. The generalized sigmoid activation function: Competitive supervised learning. Information Sciences 99, 1 (1997), 69–82. https://doi.org/10.1016/S0020-0255(96)00200-9
Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14). ACM. https://doi.org/10.1145/2623330.2623732
Schlichtkrull et al. (2022) Michael Sejr Schlichtkrull, Nicola De Cao, and Ivan Titov. 2022. Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking. arXiv:2010.00577 [cs.CL]
Selvaraju et al. (2019) Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2019. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision 128, 2 (Oct. 2019), 336–359. https://doi.org/10.1007/s11263-019-01228-7
Shan et al. (2021) Caihua Shan, Yifei Shen, Yao Zhang, Xiang Li, and Dongsheng Li. 2021. Reinforcement Learning Enhanced Explainer for Graph Neural Networks. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 22523–22533. https://proceedings.neurips.cc/paper_files/paper/2021/file/be26abe76fb5c8a4921cf9d3e865b454-Paper.pdf
Spinelli et al. (2024) Indro Spinelli, Simone Scardapane, and Aurelio Uncini. 2024. A Meta-Learning Approach for Training Explainable Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 35, 4 (April 2024), 4647–4655. https://doi.org/10.1109/tnnls.2022.3171398
Sterling and Irwin (2015) Teague Sterling and John J. Irwin. 2015. ZINC 15 – Ligand Discovery for Everyone. Journal of Chemical Information and Modeling 55, 11 (2015), 2324–2337. https://doi.org/10.1021/acs.jcim.5b00559 PMID: 26479676.
Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. arXiv:1710.10903 [stat.ML]
Yang et al. (2022) Ruochen Yang, Frederic Sala, and Paul Bogdan. 2022. Efficient Representation Learning for Higher-Order Data with Simplicial Complexes. In Learning on Graphs Conference. PMLR, 13–1.
Ying et al. (2019) Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. 2019. GNNExplainer: Generating Explanations for Graph Neural Networks. arXiv:1903.03894 [cs.LG]
Yuan et al. (2021) Hao Yuan, Haiyang Yu, Jie Wang, Kang Li, and Shuiwang Ji. 2021. On Explainability of Graph Neural Networks via Subgraph Explorations. arXiv:2102.05152 [cs.LG]