Generating Explanations for Cellular Neural Networks

Akshit Sinha*, Sreeram Vennam*, Charu Sharma, Ponnurangam Kumaraguru
akshit.sinha, [email protected]; charu.sharma, [email protected]
IIIT, Hyderabad, India
Abstract.

Recent advancements in graph learning have contributed to explaining predictions generated by Graph Neural Networks. However, existing methodologies often fall short when applied to real-world datasets. We introduce HOGE, a framework that captures higher-order structures using cell complexes, which excel at modeling higher-order relationships. Higher-order structures are ubiquitous in the real world, for example in molecules and social networks, so our work significantly enhances the practical applicability of graph explanations. HOGE produces clearer and more accurate explanations compared to prior methods. Our method can be integrated with all existing graph explainers, ensuring seamless integration into current frameworks. We evaluate on GraphXAI benchmark datasets, where HOGE achieves improved or comparable performance with minimal computational overhead. Ablation studies show that the observed performance gain can be attributed to the higher-order structures that come from introducing cell complexes.

Graph Learning, Graph Interpretability, Cell Complexes, Explainability in Graph Learning, Graph Classification
Pre-print, 2024.
Figure 1. (a) Ground Truth for an example from BENZENE (Agarwal et al., 2023) (b) Explanations generated by GNNExplainer (c) Explanations generated by HOGE-GNNExplainer. In all cases, green nodes and edges signify the subgraph considered important for GNN prediction (the explanation). Incorporating HOGE substantially increases the explanation accuracy for this example from 0.125 to 1.

1. Introduction

Graph neural networks (GNNs) (Kipf and Welling, 2017) have become increasingly important in machine learning, since in many real-world applications, including social, information, chemical, and biological domains, data can be naturally modeled as graphs (Perozzi et al., 2014; Kipf and Welling, 2017; Veličković et al., 2018). Despite their power in capturing intricate node features and structural information, GNNs face unique interpretability challenges due to the message passing mechanism. Interpretability is crucial for sensitive domains like healthcare and finance, where decisions must be transparent and justifiable. As the use of GNNs expands across various fields, understanding their complex decision-making processes becomes important to ensure that the outcomes are ethical and accountable. Enhancing the interpretability of GNNs is essential for both human comprehension and building trust to facilitate informed decision-making by stakeholders. Effective explainability translates complex model outputs into human-intelligible insights. This simplifies the underlying rationale of GNN predictions and encourages users to leverage these advanced technologies confidently. Various strategies and frameworks to address these challenges have been proposed. Some popular works include GNNExplainer (Ying et al., 2019), a perturbation-based method to generate important subgraphs, and GraphLIME (Huang et al., 2020), which fits a simple and interpretable linear surrogate model to the locality of the prediction. However, these methods often fall short in real-world scenarios, as shown in the example in Figure 1. Existing methods fail to adequately explain GNNs for real-world applications; this is the gap our work aims to fill by introducing a novel framework that leverages higher-order structures to enhance graph explainability.

* Both authors contributed equally to this research.

Figure 2. Visual representation of HOGE. The input graph is lifted to a cell complex, which is given as input to a GNN and an explainer. Information propagation is then done on the output cell complex explanation to map it to an explanation for the original graph. The green color on cells, nodes and edges signifies the substructure (complex or graph) considered important for GNN prediction (the explanation).

Another emerging research area in recent years is the exploration of higher-order structures such as simplicial or cell complexes in modeling the data. A cell complex is a mathematical structure consisting of vertices, edges, faces, and their higher-dimensional counterparts, assembled in a way that defines a topological space. Prior work has also introduced message-passing schemes to model such higher-order relationships. This exploration is motivated by the inherent limitations of traditional GNNs, which struggle with expressive power and with managing long-range interactions because they typically do not model complex, multi-level relationships in data. Ebli et al. (Ebli et al., 2020) and Bodnar et al. (Bodnar et al., 2021) provide different strategies to model these higher-order structures and create more powerful neural networks. They show that incorporating higher-order structures improves performance on benchmark datasets. This improvement has been attributed to the increased expressivity and information exchange taking place during message passing in such neural networks. While these approaches have improved the modeling of higher-order relationships, their potential for enhancing explainability has not yet been explored, a gap our work aims to fill.

This paper explores graph explainability through the lens of higher-order structures. Prior studies demonstrated that higher-order structures enhance performance on benchmarks, suggesting potential benefits for explainability. This can be achieved by modifying the underlying structure and making the final data representations richer in information.

To study the influence of higher order structures on model explainability, we introduce HOGE: Higher Order Graph Explainer, a novel framework designed to utilize higher-order structures in graph explanations. The framework for HOGE is outlined in Figure 2, and visually summarises the operational pipeline for our method. Our approach only augments the input graph, ensuring that HOGE remains model-agnostic and explainer-agnostic. This allows it to integrate seamlessly into any existing GNN pipeline.

Unlike traditional methods that analyze sensitivity or generate subgraphs based on existing graph topology, HOGE leverages the complex interrelationships made possible by cell complexes. This technique enables a more comprehensive capture of multi-level interactions within the graph. This allows HOGE to not only enhance the accuracy of explanations but also to provide a deeper understanding of the underlying mechanisms by modifying the graph’s structure itself. This approach represents an exploration of higher-order structures in the domain of graph explainability. We systematically extend the concepts and methodologies introduced in prior work to analyze the potential of higher-dimensional structures in enhancing existing graph explainability methods.

Our experiments demonstrate that explanations using higher-order structures are more accurate and show how significantly these structures contribute to the explanation process. Our findings show that lifting a graph to a higher-order structure before training a GNN enhances explanation accuracy significantly. Furthermore, ablation studies reveal that lifting graphs to cell complexes before the training phase of GNNs is responsible for achieving high explanation accuracy. However, the process of lifting graphs is less critical during the training and inference phases of graph explainers. This suggests that the improvements in explanations stem more from the increased information flow and denser graph representations provided by higher-order structures, and not from the structures themselves. In particular, the main contributions of our paper are as follows:

  (1) To the best of our knowledge, this is the first attempt at studying graph explainability using higher-order structures for graph classification tasks.

  (2) We propose HOGE, a novel framework that enhances the explainability of graph neural networks by utilizing higher-order structures. HOGE is designed to be both model-agnostic and explainer-agnostic, allowing seamless integration into any graph learning pipeline.

  (3) We perform an empirical analysis of our proposed method for various graph neural network explainers. We observe that HOGE significantly improves over graph-based explainers. We attribute this to increased information flow and structural changes, providing critical insights into their application during the training and inference phases. All our code and models used for experiments will be made publicly available after the review period.

2. Related Work

Various methods have been proposed to explain predictions generated by GNNs. In this section, we focus on a few popular methods that have come out in recent years (2019-2021). Explainability methods for GNNs can be categorized based on the approach they take. For a comprehensive overview of GNN explainability methods, please refer to (Kakkad et al., 2023). Traditional methods include perturbation-based, surrogate-based, and gradient-based methods.

Perturbation-based methods, such as GNNExplainer (Ying et al., 2019), PGExplainer (Luo et al., 2020), and SubgraphX (Yuan et al., 2021), assess the impact of subgraph structures and node features on GNN performance by introducing perturbations. GNNExplainer generates an explanation mask for individual predictions, while PGExplainer uses a neural network to identify key edges. SubgraphX employs Shapley values and the Monte Carlo Tree Search algorithm to quantify feature importance.

Surrogate methods, on the other hand, use a two-step process where data generated from a graph’s local neighborhood informs a surrogate model. This model then provides insights into the original model’s decision-making process. For example, GraphLIME (Huang et al., 2020) utilizes the Hilbert-Schmidt Independence Criterion Lasso, a kernel-based approach, to provide interpretable local explanations.

Gradient-based methods, among the earliest developed to explain GNN predictions, assess how changes in input affect predictions by analyzing gradients. Methods like Sensitivity Analysis and Guided-BP compute feature importance directly from these gradients. Grad-CAM (Selvaraju et al., 2019), specifically, calculates node importance by summing the feature maps of node embeddings weighted by their gradients, offering a nuanced view of feature relevance.

Building on these foundational methods, recent advancements such as RG-Explainer (Shan et al., 2021) utilize reinforcement learning to craft explanatory subgraphs tailored to both the model and individual instances, enhancing the precision of explanations. Similarly, MATE (Spinelli et al., 2024) enhances explanation quality by iteratively training both a GNN and a baseline explainer. This allows the model to develop internal representations that are inherently more interpretable. This emphasis on enhancing internal representations aligns with our approach, where we elevate graph structures into higher-order cell complexes to achieve even deeper explanatory insight.

While existing methods for GNN explainability have advanced our understanding of how GNNs process information, they predominantly focus on low-dimensional, direct interactions within graphs. HOGE introduces a significant departure from these approaches by integrating higher-order structures into the explanation process.

3. Proposed Approach

3.1. Preliminaries

Graphs typically model pairwise interactions. They are generalized by simplicial and cell complexes, which encapsulate higher-dimensional relationships, allowing for group-wise interactions among points (for a more comprehensive understanding of cell complexes, we point the reader to https://jeffe.cs.illinois.edu/teaching/comptop/2009/notes/cell-complexes.pdf). Unlike simplicial complexes, which have a rigid combinatorial structure that constrains transformation possibilities, cell complexes provide greater flexibility. This flexibility enhances the ability to control message passing and decouple input from computational graphs, facilitating better data interpretations (Bodnar et al., 2022).

Although this work does not extensively engage with algebraic topology concepts associated with cell complexes, we nevertheless list some fundamental definitions that are essential for understanding the proposed framework (Bodnar et al., 2022; Yang et al., 2022; Ebli et al., 2020).

Definition 3.1 ($p$-cell).

A $p$-cell in a cell complex refers to an element of dimension $p$. In analogy to traditional graphs where we have vertices (0-dimensional) and edges (1-dimensional), cell complexes include these and extend to higher dimensions. For instance, 0-cells are points, 1-cells are edges, and 2-cells can be envisioned as surfaces of geometric figures. For example, in a three-dimensional cell complex, a 2-cell could represent the triangular surface of a tetrahedron.

Definition 3.2 (Faces/Cofaces).

In a cell complex, a $p$-cell $\chi_p$ is considered a face or boundary of a $(p+1)$-cell $\chi_{(p+1)}$ if the set of points composing $\chi_p$ is a subset of those composing $\chi_{(p+1)}$. Conversely, $\chi_{(p+1)}$ is referred to as the coface or coboundary of $\chi_p$. For example, in a 3D complex, if a triangle (2-cell) is part of the surface of a tetrahedron (3-cell), the triangle is a face of the tetrahedron, and the tetrahedron is a coface of the triangle.

Definition 3.3 ($p$-chain).

In a given cell complex, a $p$-chain is simply defined as the set of all $p$-dimensional cells. For example, in a 2D complex such as a triangle, the set of all vertices is the 0-chain of the cell complex.

Definition 3.4 ($p$-skeleton).

The $p$-skeleton of a cell complex $\chi$ is defined as the subcomplex $\chi^{(p)}$ consisting of cells of dimension at most $p$.

Definition 3.5 (connection).

Within a cell complex, a connection is analogous to an edge in traditional graphs. It connects two cells, either of the same dimension (horizontal connection) or different dimensions (vertical connection). A horizontal connection links two $p$-cells that share a common coface, whereas vertical connections link a $p$-cell to its corresponding faces or cofaces.
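The definitions above can be made concrete with a small sketch. It is an illustration only, not the authors' implementation: it assumes each cell is stored as the set of points composing it, uses a filled triangle as the complex, and the helper names (`faces`, `cofaces`, `horizontal_connections`, `vertical_connections`) are hypothetical.

```python
from itertools import combinations

# Assumed toy complex: a filled triangle with three 0-cells, three 1-cells
# and one 2-cell. Each cell is the set of points composing it.
cells = {
    0: [frozenset({0}), frozenset({1}), frozenset({2})],
    1: [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})],
    2: [frozenset({0, 1, 2})],
}


def faces(cell, p):
    """(p-1)-cells whose points are a proper subset of the given p-cell."""
    return [c for c in cells.get(p - 1, []) if c < cell]


def cofaces(cell, p):
    """(p+1)-cells whose points strictly contain the given p-cell."""
    return [c for c in cells.get(p + 1, []) if cell < c]


def horizontal_connections(p):
    """Pairs of p-cells that share at least one common coface."""
    return [
        (a, b)
        for a, b in combinations(cells[p], 2)
        if set(cofaces(a, p)) & set(cofaces(b, p))
    ]


def vertical_connections(p):
    """Pairs linking each p-cell to each of its cofaces."""
    return [(c, cf) for c in cells[p] for cf in cofaces(c, p)]
```

In this toy complex, two 0-cells are horizontally connected exactly when an edge joins them, which recovers the ordinary graph adjacency.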

3.2. Lifting Graphs to Higher Dimensions

Graphs are typically represented through adjacency matrices describing the connections between the nodes. This notion can be naturally extended to cell complexes. This section introduces the lifting operation on graphs (Bodnar et al., 2022). Within cell complexes, each $p$-chain possesses a distinct adjacency matrix $A_p$, representing the interconnections among cells of the same dimension. Furthermore, incidence matrices $B_{(p)(p+1)}$ are employed to portray the relationships between $p$-cells and their respective $(p+1)$-cell cofaces, illustrating a hierarchical structural integration. See Figure 3 for an example.

Using these definitions, we can represent a cell complex through a complete adjacency matrix, following a representation introduced by Yang et al. (Yang et al., 2022) for simplicial complexes.

$$A_c = \begin{bmatrix} A_0 & B_{01} & 0 \\ B_{01}^{T} & A_1 & B_{12} \\ 0 & B_{12}^{T} & A_2 \end{bmatrix}$$
Figure 3. Lifting a graph to a cell complex and the corresponding incidence matrices for the cell complex.

In the context of this work, where the primary focus is on 0-cells or nodes, the full adjacency matrix is simplified to a partial adjacency matrix. This is achieved by disregarding the adjacency matrices of 1-cells and 2-cells. This approach not only maintains topological information but also enhances computational efficiency by reducing the complexity of data representation.

$$A_c = \begin{bmatrix} A_0 & B_{01} & 0 \\ B_{01}^{T} & 0 & B_{12} \\ 0 & B_{12}^{T} & 0 \end{bmatrix}$$
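The block structure of the partial adjacency matrix can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: incidence entries are taken to be unsigned 0/1, cycles are supplied by the caller as lists of edge indices, and the function name `lift_to_partial_adjacency` is hypothetical.

```python
def lift_to_partial_adjacency(num_nodes, edges, cycles):
    """Assemble the partial adjacency matrix [[A0, B01, 0], [B01^T, 0, B12], [0, B12^T, 0]].

    edges:  list of (u, v) node pairs (the 1-cells)
    cycles: list of cycles, each given as a list of edge indices (the 2-cells)
    Rows and columns are ordered as nodes, then edges, then cycles.
    """
    n0, n1, n2 = num_nodes, len(edges), len(cycles)
    size = n0 + n1 + n2
    matrix = [[0] * size for _ in range(size)]

    def connect(i, j):
        # Connections are undirected, so the matrix stays symmetric.
        matrix[i][j] = 1
        matrix[j][i] = 1

    for e_idx, (u, v) in enumerate(edges):
        connect(u, v)            # A0 block: original graph edge
        connect(u, n0 + e_idx)   # B01 block: node u is a face of this edge
        connect(v, n0 + e_idx)   # B01 block: node v is a face of this edge
    for c_idx, cycle in enumerate(cycles):
        for e_idx in cycle:
            connect(n0 + e_idx, n0 + n1 + c_idx)  # B12 block: edge is a face of this cycle
    return matrix


# Triangle graph lifted with its single 3-cycle as a 2-cell.
triangle_edges = [(0, 1), (1, 2), (0, 2)]
triangle = lift_to_partial_adjacency(3, triangle_edges, [[0, 1, 2]])
```

The edge-edge and cycle-cycle blocks stay zero, mirroring the decision to disregard the adjacency matrices of 1-cells and 2-cells.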

To quantify the increase in the size of the adjacency matrix when converting from graphs to cell complexes, we present theoretical results below and empirical results in section 5.4.

Theorem 3.6.

For a graph $G(V,E)$ with adjacency matrix $A$ having cycles of length at most $K$, let $W_k$ represent the number of closed walks of length $k$ which are not $k$-cycles. The corresponding cell complex $\chi$ will have cells $V_\chi$ and connections $E_\chi$ such that

$$|V_\chi| = |V| + |E| + \sum_{k=3}^{K} \frac{1}{2k}\left[\mathrm{tr}\left(A^{(k)}\right) - W_k\right] \tag{1}$$

$$|E_\chi| = 3|E| + \frac{1}{2}\sum_{k=3}^{K} \left[\mathrm{tr}\left(A^{(k)}\right) - W_k\right] \tag{2}$$

For $V_\chi$, the proof follows by construction. We restrict cell complexes to have at most 2-cells. Let us define the set of 0-cells as $V$, 1-cells as $E$, and 2-cells as the set of all cycles present in the graph, of length at most $K$. Thus, the number of cells that will be present in the cell complex of graph $G$ is exactly the total number of vertices, edges, and cycles. The third term in Equation 1 represents the total number of cycles present in the graph. This term is calculated using the formula for finding the number of $k$-length cycles in a graph, introduced in prior work done by Movarraei and Boxwala (Gerbner et al., 2018), where they show that

$$|C_k| = \frac{1}{2k}\left[\mathrm{tr}\left(A^{(k)}\right) - W_k\right]$$

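where $|C_k|$ is the number of cycles of length $k$, and $\mathrm{tr}(A^{(k)})$, the trace of the $k$-th power of $A$, counts all closed walks of length $k$. The factor $\frac{1}{2k}$ accounts for each $k$-cycle being traversed from $k$ starting vertices in 2 directions.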
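As a quick sanity check of the cycle-counting formula, consider a single triangle. This is an illustrative example rather than one taken from the experiments, and it reads $A^{(k)}$ as the $k$-th matrix power of $A$. Every closed walk of length 3 in a simple graph is a 3-cycle, so $W_3 = 0$.

```latex
\[
A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix},
\qquad
\mathrm{tr}\left(A^{(3)}\right) = 6,
\qquad
W_3 = 0
\]
\[
|C_3| = \frac{1}{2 \cdot 3}\left[6 - 0\right] = 1
\]
\[
|V_\chi| = |V| + |E| + |C_3| = 3 + 3 + 1 = 7,
\qquad
|E_\chi| = 3|E| + \frac{1}{2}\left[6 - 0\right] = 9 + 3 = 12
\]
```

The lifted triangle therefore has 7 cells (3 nodes, 3 edges, 1 cycle) and 12 connections (3 original edges, 6 node-edge connections, 3 edge-cycle connections).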

Equation 2 can also be proved by construction. The connections in $\chi$ are of three types:

  (1) The original edges of the graph, equivalently represented by $A_0$,

  (2) The connections between nodes (0-cells) and edges (1-cells), equivalently represented by $B_{01}$,

  (3) The connections between edges (1-cells) and cycles (2-cells), equivalently represented by $B_{12}$.

The number of connections present in $A_0$ is simply $|E|$. For the incidence matrix $B_{01}$, 2 connections are added per 1-cell, as each 1-cell is connected to two 0-cells. For the incidence matrix $B_{12}$, a connection is added between a cycle (2-cell) and each edge (1-cell) contained in it. As each $k$-cycle will consist of $k$ edges, we add $k$ connections for each $k$-cycle. This leads us to the following:

$$|E_\chi| = |E| + 2|E| + \sum_{k=3}^{K} \left( k \cdot \frac{1}{2k}\left[\mathrm{tr}\left(A^{(k)}\right) - W_k\right] \right)$$

$$|E_\chi| = 3|E| + \frac{1}{2}\sum_{k=3}^{K} \left[\mathrm{tr}\left(A^{(k)}\right) - W_k\right] \tag{3}$$
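The counting argument can be checked by brute force on small graphs. The sketch below is an illustration under assumptions, not the authors' lifting code: it enumerates simple cycles directly instead of using the trace formula, treats every cycle of length at most `max_len` as a 2-cell, and the names `simple_cycles` and `lifted_sizes` are hypothetical.

```python
def simple_cycles(num_nodes, edges, max_len):
    """Enumerate simple cycles with at most max_len edges.

    Each cycle is returned as a frozenset of its undirected edges, so the
    two traversal directions of the same cycle collapse into one entry.
    """
    adj = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    cycles = set()

    def extend(start, path):
        for nxt in adj[path[-1]]:
            if nxt == start and len(path) >= 3:
                closed = path + [start]
                cycles.add(frozenset(frozenset(e) for e in zip(closed, closed[1:])))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit nodes larger than the start, so each cycle is
                # rooted at its smallest node.
                extend(start, path + [nxt])

    for start in range(num_nodes):
        extend(start, [start])
    return cycles


def lifted_sizes(num_nodes, edges, max_len):
    """Return (number of cells, number of connections) of the lifted complex."""
    cycles = simple_cycles(num_nodes, edges, max_len)
    num_cells = num_nodes + len(edges) + len(cycles)
    # |E| original edges, 2|E| node-edge connections, k connections per k-cycle.
    num_connections = len(edges) + 2 * len(edges) + sum(len(c) for c in cycles)
    return num_cells, num_connections


# A benzene-like ring, and two triangles sharing an edge.
hexagon = [(i, (i + 1) % 6) for i in range(6)]
diamond = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]
hexagon_sizes = lifted_sizes(6, hexagon, 6)
diamond_sizes = lifted_sizes(4, diamond, 6)
```

For the ring, the single 6-cycle adds one cell and six connections. For the two triangles sharing an edge, the enclosing 4-cycle is also counted as a 2-cell, since the construction above includes all cycles up to the length bound.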

3.3. Generating Cell Complexes Explanations

Given the structural representation of cell complexes via adjacency matrices, these matrices can now be integrated into any GNN architecture. This integration enables the GNN to perform message passing across higher-dimensional structures encapsulated within the complexes. The motivation for utilizing higher-order structures stems from prior findings suggesting that such configurations significantly enhance model accuracy (Ebli et al., 2020). This enhancement is primarily attributed to the increased volume of information exchanged. We hypothesize that by leveraging cell complexes, GNNs can achieve a more nuanced representation of data, which in turn enables graph explainers to generate more accurate and insightful explanations of model predictions.

We illustrate how adding cell complexes changes the working of a graph explainer by integrating HOGE with GNNExplainer. The original optimization objective of GNNExplainer is as follows:

$$\min_{M} \sum_{c=1}^{C} 1[y=c] \log P_{\Phi}\left(Y=y \mid G = A_c \odot \sigma(M),\ X = X_c\right) \tag{4}$$

In this function, $A_c$ is the adjacency matrix representing the computation graph of the node to be explained, $M$ is the mask optimized to produce the explanation, $\sigma$ is the sigmoid function that maps the mask entries to $[0, 1]$, $C$ is the number of classes, $y$ is the class label, $P_\Phi$ denotes the prediction probability given by the model $\Phi$, $G$ represents the graph after applying the mask $M$ to the adjacency matrix $A_c$, and $X_c$ is the feature matrix. The goal is to minimize the log loss of the correct class prediction across all classes.
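The masking term $A_c \odot \sigma(M)$ in the objective can be illustrated in isolation. This is a minimal sketch with plain Python lists and a hypothetical helper name (`masked_adjacency`); it shows only how a real-valued mask softly rescales existing connections, not the optimization loop of GNNExplainer.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued mask entry into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def masked_adjacency(adjacency, mask):
    """Element-wise product of the adjacency matrix with sigmoid(mask).

    Entries that are zero in the adjacency matrix stay zero, so the mask
    can only down-weight existing connections, never create new ones.
    """
    return [
        [a * sigmoid(m) for a, m in zip(adj_row, mask_row)]
        for adj_row, mask_row in zip(adjacency, mask)
    ]


adjacency = [[0, 1], [1, 0]]
neutral = masked_adjacency(adjacency, [[0.0, 0.0], [0.0, 0.0]])
confident = masked_adjacency(adjacency, [[0.0, 10.0], [10.0, 0.0]])
```

A mask entry of zero gives a weight of 0.5, while large positive or negative entries push the weight towards 1 or 0, which is what lets the learned mask be read as a connection importance score.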

Figure 4. Horizontal and vertical message passing. Message passing across the same dimension is horizontal and across dimensions is vertical.

Transitioning from traditional computation graphs, our approach utilizes cell complexes as the input to the function. This shift allows the integration of more complex structural data into the computational framework. Consequently, the learned edge mask is adapted to represent the computational complex, incorporating both the connectivity and hierarchical structure inherent in cell complexes. Let $A_\chi$ be the computational complex of a 0-cell. Again, since we are only explaining predictions through nodes, we omit the higher-order adjacency matrices. For a two-layer GCN, $A_\chi$ can be understood as having two parts: (1) the horizontal computation graph, which contains horizontal connections between 0-cells, and (2) the vertical computation graph, which contains vertical connections between cells across dimensions. This is shown in Figure 4.

In this setting, $A_c$ can be extended to $A_\chi$ as below, where $B_{(p)(p+1)}$ represent the incidence matrices derived from $A_c$.

$$A_\chi = \begin{bmatrix} A_c & B_{01} & 0 \\ B_{01}^{T} & 0 & B_{12} \\ 0 & B_{12}^{T} & 0 \end{bmatrix} \tag{5}$$

Consequently, with a richer representation of the underlying graph, we can modify (4) to (6):

$$\min_{M_\chi} \sum_{c=1}^{C} 1[y=c]\, P_\chi \tag{6}$$

$$P_\chi = \log P_{\Phi}\left(Y=y \mid G_\chi = A_\chi \odot \sigma(M_\chi),\ X = X_c\right)$$

In summary, we train a GNN on the cell complex generated by lifting the graph and then provide the trained GNN and cell complex as inputs to a graph explainer. The explainer then learns $M_\chi$, which is the connection mask for the cell complex, and quantifies the importance of all the connections in the cell complex. Following this, the explainer outputs the final edge importance mask for the input graph.

Table 1. Dataset details of Graph Classification datasets from the GraphXAI (Agarwal et al., 2023) Benchmark. ∅ indicates average.

| Dataset | No. of graphs | ∅ No. of vertices | ∅ No. of edges | ∅ Degree | ∅ No. of cycles | Max cycle length |
|---|---|---|---|---|---|---|
| Benzene | 12000 | 20.58 | 43.63 | 2.12 | 2.242 | 6 |
| Mutagenicity | 1768 | 29.1 | 60.83 | 2.08 | 2.20 | 6 |
| Fluoride Carbonyl | 8671 | 21.42 | 45.44 | 2.12 | 2.256 | 6 |

3.4. Information Propagation

With the explainer generating an importance mask $M_\chi$ for the cell complex connections, we must propagate this information from the higher-order structures back to the base graph structure. We term this process information propagation, as it involves transferring the learned importance values from the complex to the original domain. A visual representation is shown in Figure 5. There are multiple ways to propagate information to the base graph; for brevity, this section describes one in detail, which we term Direct Propagation. We also introduce three alternative methods: 0-skeleton Propagation, 1-skeleton Propagation and Hierarchical Propagation. All methods are described and evaluated in section 5.3.

Formally, let $I^{G}_{e}$ denote the derived importance of edge $e$ in the base graph $G$, and $I^{\chi}_{e}$ represent the importance of the corresponding connection $e$ in the cell complex $\chi$. We define the information propagation function as:

$$I^{G}_{e} = \tanh\left(I^{\chi}_{e} + \frac{1}{|\mathcal{C}^{\chi}_{e}|} \sum_{\tau \in \mathcal{C}^{\chi}_{e}} I^{\chi}_{\tau}\right) \tag{7}$$

where $\mathcal{C}^{\chi}_{e}$ is the set of all connections in $\chi$ that constitute the higher-order structures (e.g., cycles) of which $e$ is a part. For non-negative importance values, the $\tanh$ activation ensures that the propagated importance values are bounded within the range $[0, 1)$, while the summation term captures the cumulative influence of higher-order structures on the base edge. We chose $\tanh$ over other non-linear functions owing to its steeper slope, which pushes the importance values towards the bounds. This creates a notable distinction between important and unimportant edges.
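To make Equation 7 concrete, the following minimal Python sketch applies Direct Propagation to a single base edge. The function name `direct_propagation`, its argument names, the example importance values, and the zero higher-order term for an edge that belongs to no higher-order structure are illustrative assumptions; they are not taken from the HOGE implementation.

```python
import math


def direct_propagation(edge_importance, member_importances):
    """Sketch of Eq. (7): tanh(I_e + mean of the importances in C_e).

    edge_importance: importance of the connection e in the cell complex.
    member_importances: importances of the connections in C_e, i.e. the
        connections forming the higher-order structures that contain e.
    """
    if member_importances:
        higher_order = sum(member_importances) / len(member_importances)
    else:
        # Assumption: Eq. (7) is undefined for an empty C_e, so we fall back
        # to a zero higher-order contribution.
        higher_order = 0.0
    return math.tanh(edge_importance + higher_order)


# Hypothetical values: an edge with low importance of its own that lies on an
# important ring, and an identical edge that lies on no ring.
on_ring = direct_propagation(0.2, [0.9, 0.8, 0.7])
off_ring = direct_propagation(0.2, [])
```

In this example the edge on the ring ends up with a much higher importance than the edge outside any ring, which is the intended effect of the summation term.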

This information propagation step is crucial as it translates the higher-order explanations generated by HOGE into a format that is comprehensible and actionable for the original graph domain. By explicitly accounting for the hierarchical relationships between the base graph and its lifted representation, we ensure that the explanatory power of higher-order structures is effectively distilled and propagated to the base level.

Figure 5. Information propagation from 2-cells and 1-cells down to the base graph. The 2-cells transfer importance to all edges that make up the 2-cell, and the 1-cells transfer importance to the underlying edges.

3.5. Dimension Masking

To gain more control over the incorporation of higher-order structures from the cell complex, we introduce a dimension masking strategy. This selectively includes or excludes specific dimensions when constructing the cell complex adjacency matrix $A_{c}$.

We define a dimension mask $D_{m}$, which is a binary matrix of the same dimensions as $A_{c}$. The entries in $D_{m}$ corresponding to the dimensions we want to keep are set to 1, while the entries for dimensions to be excluded are set to 0. Using $D_{m}$, we compute the masked adjacency matrix $C_{0}$ as:

$$C_{0} = (A_{c} \odot D_{m}) \odot \sigma(M) \tag{8}$$

where $\odot$ denotes the elementwise product, and $\sigma(M)$ applies the learned importance mask $M$ using a non-linear activation function.
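The masking in Equation 8 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: we take $\sigma$ to be the logistic sigmoid, and we assume that an entry of $D_{m}$ is kept only when both cells it connects have a retained dimension. The helper names (`dimension_mask`, `masked_adjacency`) and the toy complex are hypothetical.

```python
import math


def sigmoid(x):
    # Assumption: sigma in Eq. (8) is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-x))


def dimension_mask(cell_dims, keep_dims):
    """Binary mask D_m with the same shape as A_c.

    Assumption: entry (i, j) is 1 only if both cell i and cell j have a
    dimension listed in keep_dims.
    """
    n = len(cell_dims)
    return [
        [1 if cell_dims[i] in keep_dims and cell_dims[j] in keep_dims else 0
         for j in range(n)]
        for i in range(n)
    ]


def masked_adjacency(A_c, D_m, M):
    """Sketch of Eq. (8): C_0 = (A_c * D_m) * sigma(M), all elementwise."""
    n = len(A_c)
    return [
        [A_c[i][j] * D_m[i][j] * sigmoid(M[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy complex: cells 0 and 1 are 0-cells (vertices), cell 2 is a 1-cell.
cell_dims = [0, 0, 1]
A_c = [[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]]
M = [[0.0] * 3 for _ in range(3)]  # untrained mask, so sigma(M) = 0.5 everywhere

D_m = dimension_mask(cell_dims, keep_dims={0})  # keep only the 0-cells
C_0 = masked_adjacency(A_c, D_m, M)
```

With `keep_dims={0}`, only the connection between the two 0-cells survives in `C_0`; every entry that touches the 1-cell is zeroed out.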

This dimension masking technique serves two main purposes. Firstly, it enables a systematic evaluation of the relative importance of different dimensional structures within the cell complex. By selectively masking out certain dimensions, we can identify the most informative subspaces contributing to the explanation process.

Secondly, dimension masking acts as a regularization mechanism. During training, we can disable specific higher-order dimensions that may introduce noise or cause overfitting. This encourages the model to learn representations that are robust to irrelevant higher-order features, potentially enhancing its generalization performance.

With this, we generate an explanation for the input graph by propagating information from the generated cell complex back to the original graph.

4. Experimental Settings

4.1. Datasets

We evaluate HOGE on GraphXAI (Agarwal et al., 2023) benchmark datasets. Table 1 briefly describes some properties of the datasets. These datasets are real-world molecular datasets, which were chosen as molecules inherently include higher-order structures like rings.

Benzene. The Benzene (Agarwal et al., 2023) dataset contains 12,000 molecular graphs extracted from the ZINC15 (Sterling and Irwin, 2015) database and labeled into two classes. The task is to identify whether a given molecule contains a benzene ring or not. Ground-truth explanations are based on the presence or absence of benzene rings.

Mutagenicity. The Mutagenicity (Kazius et al., 2005) dataset contains 1,768 graphs, each representing a molecule. The molecules are labeled into two classes according to their mutagenic properties. Ground-truth explanations are based on the presence or absence of chosen toxicophores: NH2, NO2, aliphatic halide, nitroso, and azo-type.

Fluoride Carbonyl. The Fluoride Carbonyl (Agarwal et al., 2023) dataset contains 8,671 molecular graphs. The molecules are labeled into two classes, where a positive sample indicates that the molecule contains a fluoride and a carbonyl functional group. The ground-truth explanations consist of combinations of fluoride atoms and carbonyl functional groups within a given molecule.

4.2. Evaluation Criteria

The task of a graph explainer method can be seen as binary classification, predicting whether an edge in the graph belongs in the ground truth explanation or not.

4.2.1. Graph Explanation Accuracy

We adopt Graph Explanation Accuracy (GEA) from GraphXAI (Agarwal et al., 2023). It measures the correctness of the generated explanation using the Jaccard index between the ground truth and prediction.

$$JAC(Y, \hat{Y}) = \frac{TP(Y, \hat{Y})}{TP(Y, \hat{Y}) + FP(Y, \hat{Y}) + FN(Y, \hat{Y})}$$

where $Y$ is the ground truth binary edge mask and $\hat{Y}$ is the predicted binary edge mask.
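As a worked illustration of the Jaccard index on edge masks, consider the minimal sketch below. The function name, the example masks, and the convention of returning 0.0 when both masks are empty are assumptions made for illustration; this is not the GraphXAI implementation.

```python
def graph_explanation_accuracy(y_true, y_pred):
    """Jaccard index between two binary edge masks of equal length."""
    if len(y_true) != len(y_pred):
        raise ValueError("masks must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = tp + fp + fn
    # Assumption: two all-zero masks score 0.0; the text does not specify
    # this corner case.
    return tp / denom if denom else 0.0


# Hypothetical masks over six edges: TP = 2, FP = 1, FN = 1.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
gea = graph_explanation_accuracy(y_true, y_pred)  # 2 / (2 + 1 + 1) = 0.5
```

True negatives do not enter the score, so an explainer is not rewarded for the many unimportant edges it correctly leaves out.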

4.2.2. Area Under Curve (AUC)

The Area Under Curve (AUC) metric quantifies the performance of a binary classifier by measuring the area under the Receiver Operating Characteristic (ROC) curve. This curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

$$\text{AUC} = \int_{0}^{1} \text{TPR}(t)\,dt$$

where $t$ represents the threshold value.
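In practice, the area is computed from a finite set of edge scores. The sketch below sweeps the threshold over the predicted importance scores, traces the ROC curve (TPR against FPR), and integrates it with the trapezoidal rule. The function name `roc_auc` and the example scores are hypothetical, and the code is an illustration rather than the evaluation code used for our experiments.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve for binary labels and real-valued scores."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Visit scores from high to low; each distinct score acts as a threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all items tied at this threshold before adding a ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid
        prev_tpr, prev_fpr = tpr, fpr
    return area


# Hypothetical edge labels and importance scores.
labels = [1, 1, 0, 0]
edge_scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, edge_scores)  # 3 of 4 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to a ranking no better than chance, which is a useful reference point when reading the AUC values reported for the explainers.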

4.2.3. F1 Score

The F1 Score is a widely used metric for evaluating the accuracy of binary classification models, particularly in situations where class distributions are imbalanced. It is the harmonic mean of precision and recall, providing a balance between the two by penalizing extreme values. The formula for the F1 Score is given by:

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where Precision is defined as the ratio of true positive observations to the total predicted positives, and Recall is the ratio of true positives to the total actual positives.
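The definition above can be sketched directly on binary edge masks. The function name, the example masks, and the choice to return 0.0 when precision or recall is undefined are illustrative assumptions.

```python
def edge_f1(y_true, y_pred):
    """F1 score between a ground-truth and a predicted binary edge mask."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumption: undefined precision or recall is treated as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical masks over six edges: TP = 2, FP = 1, FN = 1.
truth_mask = [1, 1, 1, 0, 0, 0]
pred_mask = [1, 1, 0, 1, 0, 0]
f1 = edge_f1(truth_mask, pred_mask)  # precision = recall = 2/3
```

For a single pair of masks, F1 and the Jaccard index $J$ are related by $F1 = 2J / (1 + J)$, so the two metrics rank individual explanations identically even though their averages over many graphs can differ.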

4.3. Baselines

For our comparative analysis, we select three established graph explanation methods as baselines, each paired with its HOGE counterpart. These explainers are evaluated for their efficacy in extracting meaningful explanations from graph neural networks (GNNs). Below, we detail each baseline and specify the underlying GNN models used in our experiments:

Table 2. Results on graph classification datasets for the HOGE framework. The better result between a baseline explainer and its HOGE version is underlined. We find that HOGE consistently achieves comparable or superior performance across explainers and datasets. *Although AE gets the best results on Fluoride Carbonyl, as we describe in Section 5.1.2, the results are not reliable.

| Dataset | Metric | GNNE | HOGE-GNNE | GME | HOGE-GME | AE | HOGE-AE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Benzene | GEA | 0.124 ± 0.170 | 0.745 ± 0.069 | 0.072 ± 0.021 | 0.113 ± 0.022 | 0.032 ± 0.021 | 0.084 ± 0.031 |
| Benzene | AUC | 0.45 ± 0.119 | 0.884 ± 0.03 | 0.503 ± 0.007 | 0.509 ± 0.006 | 0.374 ± 0.023 | 0.515 ± 0.018 |
| Benzene | F1 | 0.165 ± 0.208 | 0.831 ± 0.057 | 0.101 ± 0.028 | 0.160 ± 0.031 | 0.059 ± 0.038 | 0.146 ± 0.051 |
| Fluoride Carbonyl | GEA | 0.134 ± 0.008 | 0.178 ± 0.046 | 0.045 ± 0.019 | 0.050 ± 0.017 | 0.238 ± 0.128* | 0.012 ± 0.004 |
| Fluoride Carbonyl | AUC | 0.416 ± 0.025 | 0.554 ± 0.03 | 0.504 ± 0.002 | 0.501 ± 0.003 | 0.624 ± 0.091* | 0.483 ± 0.004 |
| Fluoride Carbonyl | F1 | 0.226 ± 0.014 | 0.276 ± 0.073 | 0.072 ± 0.030 | 0.078 ± 0.028 | 0.456 ± 0.105* | 0.018 ± 0.006 |
| Mutagenicity | GEA | 0.234 ± 0.021 | 0.161 ± 0.001 | 0.057 ± 0.009 | 0.069 ± 0.005 | 0.194 ± 0.037 | 0.228 ± 0.065 |
| Mutagenicity | AUC | 0.614 ± 0.023 | 0.503 ± 0.001 | 0.497 ± 0.006 | 0.506 ± 0.002 | 0.617 ± 0.051 | 0.626 ± 0.060 |
| Mutagenicity | F1 | 0.334 ± 0.024 | 0.261 ± 0.007 | 0.092 ± 0.014 | 0.113 ± 0.009 | 0.302 ± 0.048 | 0.338 ± 0.110 |

GNNExplainer (GNNE) (Ying et al., 2019). A method that utilizes the idea of perturbations and aims to identify the most influential subgraph and node features responsible for a model’s prediction. This method optimizes for a mask that minimizes the difference between the predictions of the original and the perturbed graph, thereby highlighting crucial components. In our experiments, we employ GNNE with a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) to elucidate its explanatory capabilities across different graph datasets.

Attention Explainer (AE). Leveraging the intrinsic attention mechanisms of Graph Attention Networks (GAT) (Veličković et al., 2018), this explainer aggregates attention scores across various layers and heads. This approach assumes that higher attention weights correlate with higher relevance to the model’s decision-making process. AE is specifically paired with GAT in our experimental setup to assess how attention-based explanations align with model predictions.

Graph Mask Explainer (GME) (Schlichtkrull et al., 2022). Also a perturbation-based approach, GME focuses on determining the minimal yet most informative subgraph that influences a GNN’s output. By iteratively masking out parts of the input graph, GME identifies critical nodes and edges that significantly affect the prediction accuracy. Like GNNE, GME is applied in conjunction with a GCN to explore the utility and limitations of perturbation-based explanations in our study.

Through our experiments, we seek to understand the comparative effectiveness of these baselines in providing transparent and actionable insights into GNN decisions. We hypothesize that the integration of higher-order explanations in HOGE versions may reveal more nuanced and comprehensive interpretative details, potentially leading to more robust and interpretable machine learning models.

4.4. Implementation Details

To ensure robust and generalizable results, all experiments are conducted across ten different dataset seeds, each featuring distinct train-test-validation splits. We report the average performance metrics for the explanations along with their standard deviations to capture variability and ensure reproducibility.

For the evaluation of explanation methods, we systematically select 500 graphs from each seed within every dataset to form a comprehensive test suite. This structured sampling allows for balanced and fair assessments across varying data distributions.

The experiments leverage two prominent Graph Neural Network (GNN) architectures:

  • Graph Convolutional Network (GCN) (Kipf and Welling, 2017): Employs spectral-based convolution layers that effectively capture graph topology.

  • Graph Attention Network (GAT) (Veličković et al., 2018): Integrates attention mechanisms to weigh the importance of nodes dynamically, based on their neighborhood.

Both architectures are configured with two convolution layers, each followed by a ReLU activation function (Agarap, 2018). The network topology concludes with a global mean pooling layer to aggregate node features and a final linear layer equipped with a Sigmoid activation function (Narayan, 1997) to produce output predictions.

Training is conducted using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.01 and a weight decay parameter set at $5 \times 10^{-4}$. Each model undergoes 100 epochs of training, with the best-performing model selected based on the lowest validation loss.

Our computational environment consists of 4 NVIDIA GeForce RTX 2080 Ti GPUs and 40 Intel Xeon E5-2640 v4 CPU cores, supported by at least 80GB of RAM. This hardware setup ensures sufficient computational power to handle the intensive training and evaluation processes involved in our experiments.

5. Results

5.1. Evaluating HOGE

The core findings of our study are summarized in Table 2, which compares the performance of HOGE against standard baseline explainers across multiple datasets.

5.1.1. Overall Performance

The results in Table 2 show that augmenting explainers with HOGE improves performance in most settings: the HOGE version achieves a higher mean GEA and a higher mean F1 in seven of the nine explainer-dataset pairs. These results support the view that utilizing higher-order structures facilitates better explanations.

5.1.2. Specific Dataset Insights

  • Benzene. HOGE improves performance for all explainer models on the Benzene dataset, most notably for GNNE (roughly a 500% relative increase in GEA). The HOGE-enhanced version of GNNE (HOGE-GNNE) leads in performance, while the HOGE versions of AE and GME, despite their improvements, still lag behind HOGE-GNNE.

  • Fluoride Carbonyl. On this dataset, we generally observe performance improvements with the adoption of HOGE; for example, we see a 32% increase for GNNE. The exception is HOGE-AE, where performance did not improve. Detailed analysis revealed that the underlying GAT model struggled with the dataset, predicting only the majority class and failing to capture meaningful patterns. This, in turn, resulted in AE predicting almost randomly, as reflected in its large standard deviation. HOGE-AE, on the other hand, is stable in its results, with small deviations.

  • Mutagenicity. This dataset, known for its complexity, showed performance gains with HOGE for the AE (17%) and GME (21%) explainers, but not for GNNE. We attribute this to the fact that the underlying GNN with cell complex inputs performed worse than the traditional GNN on the binary classification task, which may in turn have resulted in worse explanations.
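The percentages quoted in the list above are consistent with relative gains in mean GEA computed from Table 2. The short sketch below reproduces that arithmetic; reading the percentages as GEA gains is our interpretation, and the helper name `relative_gain` is hypothetical.

```python
def relative_gain(baseline, hoge):
    """Relative improvement of the HOGE variant over its baseline, in percent."""
    return 100.0 * (hoge - baseline) / baseline


# Mean GEA values copied from Table 2 as (baseline, HOGE version).
gea_pairs = {
    "Benzene / GNNE": (0.124, 0.745),
    "Fluoride Carbonyl / GNNE": (0.134, 0.178),
    "Mutagenicity / AE": (0.194, 0.228),
    "Mutagenicity / GME": (0.057, 0.069),
}

gains = {name: relative_gain(base, hoge) for name, (base, hoge) in gea_pairs.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% relative gain in GEA")
```

Because these are relative gains, a small absolute improvement over a weak baseline can still appear as a large percentage.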

5.1.3. Discussion

The findings from these experiments affirm that incorporating higher-order structures enhances the capability of graph explainers to present human-interpretable substructures. However, we also note that the influence of higher-order structures is dependent on the presence of underlying higher-order structures in the base graph, as HOGE constructs cell complexes directly from the base graph without introducing any external structures.

These results collectively highlight the impacts of higher-order information in enhancing the interpretability and performance of explanation methods in graph neural networks.

5.2. Ablation Studies

Our method relies on higher-order cell complexes that come from lifting the graph. To analyze how important these cell complexes are, we use two approaches. We describe the approaches below. Table 3 displays the results of our study.

5.2.1. Training on Graphs, Explaining on Complexes

We train a model on the original graphs. To generate explanations, we lift the graphs to cell complexes and provide the graph-trained model, together with the cell complexes, as input to the graph explainers. We find that if the model has not seen these higher-order structures during training, it does not benefit from them during explanation. This underlines the importance of training the model on cell complexes.

5.2.2. Training on Complexes, Explaining on Graphs

We train a model on the cell complexes. To generate explanations, we provide only the base graph, along with this model, to the graph explainers. We find that the performance drops drastically, with AUC scores comparable to not using HOGE at all. This shows that our method relies strongly on higher-order relations, which supports our hypothesis that higher-order structures improve explanation accuracy.

5.3. Information Propagation Methods

This subsection elaborates on various methods of information (or importance) propagation used to derive graph explanations from cell complex explanations, as initially outlined in Section 3.4 and Section 3.5. These methods are specifically tested on the Benzene dataset described in Section 4.1.

Information propagation is crucial in our methodology as it determines how significance from complex graph structures (cycles and edges) is relayed to simpler underlying graphs. The following list details the different approaches we employed:

Table 3. AUC Scores for the Ablation Study. The best results are highlighted in bold. We see that the best performance is achieved when both training and explanation are done using cell complexes. A significant drop is seen in the AUC score when either training or explanation is done using just graphs, indicating the importance of cell complexes in the HOGE framework.

| Configuration | Training Data | Explanation Data: Graphs | Explanation Data: Complexes |
| --- | --- | --- | --- |
| Benzene + GNNE | Graphs | 0.6727 | 0.6800 |
| Benzene + GNNE | Complexes | 0.4739 | 0.8906 |
| Benzene + AE | Graphs | 0.3790 | 0.3790 |
| Benzene + AE | Complexes | 0.3730 | 0.5674 |
| Benzene + GME | Graphs | 0.5047 | 0.4909 |
| Benzene + GME | Complexes | 0.5082 | 0.5664 |
| Mutagenicity + GNNE | Graphs | 0.4148 | 0.4148 |
| Mutagenicity + GNNE | Complexes | 0.4739 | 0.598 |
| Mutagenicity + AE | Graphs | 0.5435 | 0.5435 |
| Mutagenicity + AE | Complexes | 0.5955 | 0.6789 |
| Mutagenicity + GME | Graphs | 0.4777 | 0.4804 |
| Mutagenicity + GME | Complexes | 0.5129 | 0.5337 |

  1. 0-skeleton Propagation: In this method, we do not transfer importance from complex structures like cycles and edges, instead only using the 0-skeleton of the cell complex to pass information. This utilizes the idea of Dimension Masking introduced in Section 3.5. Essentially, this approach treats the base graph as isolated from its higher-dimensional complexes, focusing solely on its inherent properties without considering extended topological features.

  2. 1-skeleton Propagation: Here, the importance is transferred solely from the 1-cells and 0-cells down to the base graph, ignoring any contribution from the 2-cells. This method allows for the investigation of the impact of linear connections between nodes while still omitting cyclic structures. Again, we utilize Dimension Masking to achieve this effect.

  3. Hierarchical Propagation: This approach involves transferring importance to the base graph in a hierarchical manner. Specifically, the importance of 2-cells is mapped to their corresponding faces, which are 1-cells, and then further down to the faces of the 1-cells, which are the 0-cells. This is then transferred to the base graph, so that information flows step by step from more complex to simpler structures, tracing the influence of higher-order relations down to the base graph.

  4. Direct Propagation: In contrast to hierarchical propagation, this method transfers the importance of all cells in the cell complex directly to the base graph. If a 1-cell or 2-cell is deemed important, its importance is distributed across all the edges in the base graph that constitute it. This ensures that we maintain the integral value of cyclic structures in the explanation process. This is the method used for the results in the above tables.

Table 4 presents the comparative performance of these methods. Notably, the 0-skeleton Propagation approach yields the least effective results, underscoring the limitations of disregarding higher-order structures in graph explanations. This finding aligns with our ablation analysis in Section 5.2. Furthermore, the Hierarchical Propagation method underperforms relative to the Direct Propagation technique, likely due to the latter’s ability to preserve the holistic influence of cycles, which is critical when cycles contribute significantly to the graph’s characteristics. We also find that 1-skeleton Propagation performs the best. This is likely because the edges present in a cycle by themselves aid the flow of information, whereas spreading importance directly from cycles may over-emphasize them, leading to worse performance.

Table 4. Information propagation method performance results on the Benzene dataset. (1) 0-skeleton Propagation (2) 1-skeleton Propagation (3) Hierarchical Propagation and (4) Direct Propagation. The best results are highlighted in bold.

| Metrics | (1) | (2) | (3) | (4) |
| --- | --- | --- | --- | --- |
| GEA | 0.635 | 0.806 | 0.672 | 0.717 |
| AUC | 0.481 | 0.919 | 0.847 | 0.872 |
| F1 | 0.152 | 0.876 | 0.759 | 0.807 |

5.4. Performance Analysis

Empirically, we analyzed the size difference between a graph and its corresponding cell complex. We present these results in Table 5. We find that for the datasets used here, the cell complexes have on average about 3.2x as many cells as the original graphs have nodes, and about 3.5x as many connections as the original graphs have edges. Notably, the average degree changes only slightly after lifting the graph (from roughly 2.1 to roughly 2.3), indicating that while the cell complexes are larger, they are not substantially denser. This suggests that our implementation of lifting graphs to complexes, described in Section 3.2, does not incur a substantial performance overhead.

Table 5. Increase in size when lifting a graph to a cell complex. $\varnothing\delta$ represents the average degree for both graphs and cell complexes.

| Dataset | Graphs: $\varnothing\lvert V\rvert$ | Graphs: $\varnothing\lvert E\rvert$ | Graphs: $\varnothing\delta$ | Cell Complexes: $\varnothing\lvert V_{\chi}\rvert$ | Cell Complexes: $\varnothing\lvert E_{\chi}\rvert$ | Cell Complexes: $\varnothing\delta$ |
| --- | --- | --- | --- | --- | --- | --- |
| Benzene | 20.58 | 43.63 | 2.12 | 66.47 | 156.04 | 2.34 |
| Mutagenicity | 29.1 | 60.83 | 2.08 | 92.18 | 207.64 | 2.25 |
| Fluoride Carbonyl | 21.42 | 45.44 | 2.12 | 69.04 | 162.48 | 2.32 |
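The size ratios quoted in the text can be reproduced from Table 5 with the short sketch below. Averaging the per-dataset ratios is our assumption about how the figures were obtained, and the variable names are illustrative.

```python
# Rows copied from Table 5: (avg |V|, avg |E|, avg |V_chi|, avg |E_chi|).
table5 = {
    "Benzene": (20.58, 43.63, 66.47, 156.04),
    "Mutagenicity": (29.1, 60.83, 92.18, 207.64),
    "Fluoride Carbonyl": (21.42, 45.44, 69.04, 162.48),
}

# Growth factor per dataset: cells per node and connections per edge.
cell_ratios = [v_chi / v for v, _, v_chi, _ in table5.values()]
conn_ratios = [e_chi / e for _, e, _, e_chi in table5.values()]

# Assumption: the reported figures are unweighted means over the three datasets.
mean_cell_ratio = sum(cell_ratios) / len(cell_ratios)
mean_conn_ratio = sum(conn_ratios) / len(conn_ratios)

print(f"cells per node: {mean_cell_ratio:.2f}x")
print(f"connections per edge: {mean_conn_ratio:.2f}x")
```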

We perform further analyses and report the average time (over 10 runs) for the conversion to cell complexes, GNN training, and prediction explanation for both graphs and cell complexes. We perform these tests on the Benzene dataset with a GCN model and GNNExplainer as the explainer. The GNN is trained for 50 epochs, while GNNExplainer is trained for 200 epochs. Results are detailed in Table 6. The conversion to a cell complex is quick and contributes negligibly to the overhead. The main computational overhead comes from training the GNN on cell complexes, which is expected since they are larger. However, the overhead is relatively small compared to the increase in size. GNNExplainer takes similar amounts of time when run with graphs and complexes, indicating that adding higher-order structures does not substantially slow down the generation of explanations. The slow performance of GNNExplainer can be attributed to the fact that explanations are not generated in batches; instead, it explains one graph at a time.

Table 6. Results for lifting time, training time, and explanation time for graphs and complexes (in seconds). Lifting is performed on all 12000 samples in the Benzene dataset, training is performed on 8400 samples, and explanation on 500 samples.

| Component | Graphs | Cell Complexes |
| --- | --- | --- |
| Lifting Graph | - | 1.89s |
| GNN Training | 20.64s | 47.35s |
| Explanation | 82.58s | 93.36s |
| Total | 103.22s | 142.6s |
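As a quick check of Table 6, the sketch below recomputes the totals and the per-component slowdown of the cell complex pipeline relative to the graph pipeline. The slowdown factors are derived here from the table and are not separately reported measurements; the lifting time for graphs is taken as zero because no lifting is performed.

```python
# Timings in seconds, copied from Table 6.
timings = {
    "graphs": {"lifting": 0.0, "training": 20.64, "explanation": 82.58},
    "complexes": {"lifting": 1.89, "training": 47.35, "explanation": 93.36},
}

totals = {name: sum(parts.values()) for name, parts in timings.items()}

# Slowdown of the complex pipeline relative to the graph pipeline.
training_ratio = timings["complexes"]["training"] / timings["graphs"]["training"]
explanation_ratio = timings["complexes"]["explanation"] / timings["graphs"]["explanation"]
total_ratio = totals["complexes"] / totals["graphs"]

print(f"training: {training_ratio:.2f}x, explanation: {explanation_ratio:.2f}x, "
      f"total: {total_ratio:.2f}x")
```

Training slows down by a smaller factor than the growth in the number of cells and connections reported in Table 5, which is the sense in which the overhead is small relative to the increase in size.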

6. Limitations

While HOGE demonstrates significant advancements in explaining the predictions of Graph Neural Networks by utilizing higher-order structures, some limitations warrant further discussion. Firstly, the computational complexity associated with lifting graphs to higher-dimensional structures can be substantial, especially with large-scale graphs common in domains like social networking and bioinformatics. Because we primarily test on small molecular and synthetic datasets, our experiments show minimal overhead; scalability to even larger graphs remains an open problem.

Moreover, the effectiveness of HOGE is heavily reliant on the inherent properties of the dataset, particularly the presence of meaningful higher-order structures. In datasets where such structures are sparse or irrelevant, the benefits of HOGE may not be as pronounced. This can potentially lead to unnecessary computational expenses without corresponding gains in explainability.

7. Conclusion

HOGE represents a novel approach to enhancing the explainability of Graph Neural Networks through the integration of higher-order structures. Our experiments demonstrate that incorporating cell complexes can lead to more accurate and interpretable explanations of GNN predictions with minimal computational overhead on the tested benchmarks. We also test the importance of higher-order structures using ablation studies and show that the best results are obtained for graph predictions when both GNN training and explaining are done using cell complexes. Looking ahead, there are several promising directions for extending this work. Firstly, expanding the applicability of HOGE to a broader range of GNN architectures and graph explainers could further validate and refine the approach. Additionally, we hope this work motivates investigating methods to reduce the computational complexity associated with higher-dimensional structures, making HOGE more practical for large-scale applications.

References

  • Agarap (2018) Abien Fred Agarap. 2018. Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375 (2018).
  • Agarwal et al. (2023) Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, and Marinka Zitnik. 2023. Evaluating explainability for graph neural networks. Scientific Data 10, 1 (2023), 144.
  • Bodnar et al. (2021) Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yuguang Wang, Pietro Liò, Guido F Montufar, and Michael Bronstein. 2021. Weisfeiler and Lehman Go Cellular: CW Networks. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 2625–2640.
  • Bodnar et al. (2022) Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yu Guang Wang, Pietro Liò, Guido Montúfar, and Michael Bronstein. 2022. Weisfeiler and Lehman Go Cellular: CW Networks. arXiv:2106.12575 [cs.LG]
  • Ebli et al. (2020) Stefania Ebli, Michaël Defferrard, and Gard Spreemann. 2020. Simplicial neural networks. arXiv preprint arXiv:2010.03633 (2020).
  • Gerbner et al. (2018) Dániel Gerbner, Balázs Keszegh, Cory Palmer, and Balázs Patkós. 2018. On the number of cycles in a graph with restricted cycle lengths. SIAM Journal on Discrete Mathematics 32, 1 (2018), 266–279.
  • Huang et al. (2020) Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, Dawei Yin, and Yi Chang. 2020. GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks. arXiv:2001.06216 [cs.LG]
  • Kakkad et al. (2023) Jaykumar Kakkad, Jaspal Jannu, Kartik Sharma, Charu Aggarwal, and Sourav Medya. 2023. A Survey on Explainability of Graph Neural Networks. arXiv:2306.01958 [cs.LG]
  • Kazius et al. (2005) Jeroen Kazius, Ross McGuire, and Roberta Bursi. 2005. Derivation and validation of toxicophores for mutagenicity prediction. Journal of medicinal chemistry 48, 1 (2005), 312–320.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]
  • Luo et al. (2020) Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. 2020. Parameterized Explainer for Graph Neural Network. arXiv:2011.04573 [cs.LG]
  • Narayan (1997) Sridhar Narayan. 1997. The generalized sigmoid activation function: Competitive supervised learning. Information Sciences 99, 1 (1997), 69–82. https://doi.org/10.1016/S0020-0255(96)00200-9
  • Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14). ACM. https://doi.org/10.1145/2623330.2623732
  • Schlichtkrull et al. (2022) Michael Sejr Schlichtkrull, Nicola De Cao, and Ivan Titov. 2022. Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking. arXiv:2010.00577 [cs.CL]
  • Selvaraju et al. (2019) Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2019. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision 128, 2 (Oct. 2019), 336–359. https://doi.org/10.1007/s11263-019-01228-7
  • Shan et al. (2021) Caihua Shan, Yifei Shen, Yao Zhang, Xiang Li, and Dongsheng Li. 2021. Reinforcement Learning Enhanced Explainer for Graph Neural Networks. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 22523–22533. https://proceedings.neurips.cc/paper_files/paper/2021/file/be26abe76fb5c8a4921cf9d3e865b454-Paper.pdf
  • Spinelli et al. (2024) Indro Spinelli, Simone Scardapane, and Aurelio Uncini. 2024. A Meta-Learning Approach for Training Explainable Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 35, 4 (April 2024), 4647–4655. https://doi.org/10.1109/tnnls.2022.3171398
  • Sterling and Irwin (2015) Teague Sterling and John J. Irwin. 2015. ZINC 15 – Ligand Discovery for Everyone. Journal of Chemical Information and Modeling 55, 11 (2015), 2324–2337. https://doi.org/10.1021/acs.jcim.5b00559 PMID: 26479676.
  • Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. arXiv:1710.10903 [stat.ML]
  • Yang et al. (2022) Ruochen Yang, Frederic Sala, and Paul Bogdan. 2022. Efficient Representation Learning for Higher-Order Data with Simplicial Complexes. In Learning on Graphs Conference. PMLR, 13–1.
  • Ying et al. (2019) Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. 2019. GNNExplainer: Generating Explanations for Graph Neural Networks. arXiv:1903.03894 [cs.LG]
  • Yuan et al. (2021) Hao Yuan, Haiyang Yu, Jie Wang, Kang Li, and Shuiwang Ji. 2021. On Explainability of Graph Neural Networks via Subgraph Explorations. arXiv:2102.05152 [cs.LG]