License: CC BY-NC-ND 4.0
arXiv:2312.09802v1 [cs.LG] 15 Dec 2023
AAAI Workshop on AI for Education, 2023

Concept Prerequisite Relation Prediction by Using Permutation-Equivariant Directed Graph Neural Networks

This work was supported in part by the National Natural Science Foundation of China (62272392, U22A2025) and the funding for Teaching & Learning Reform at NPU (2023JGZ14).

Xiran Qu, Xuequn Shang, Yupei Zhang (Corresponding Author)
School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
Big data Storage and Management MIIT Lab, Xi’an 710129, China
Abstract

This paper studies concept prerequisite relation prediction (CPRP), a fundamental task in using AI for education. CPRP is usually formulated as a link-prediction task on a relationship graph of concepts and solved by training a graph neural network (GNN) model. However, current directed GNNs fail to handle graph isomorphism, i.e., they may assign identical representations to non-isomorphic graphs, which reduces the expressivity of the resulting representations. We present a permutation-equivariant directed GNN model by introducing the Weisfeiler-Lehman test into directed GNN learning. Our method is then used for CPRP and evaluated on three public datasets. The experimental results show that our model delivers better prediction performance than the state-of-the-art methods.

keywords:
Concept Prerequisite Relation, permutation-equivariant GNNs, Weisfeiler-Lehman Test, Directed Graph Learning, AI for Education.

1 Introduction

With the continuous advancement of dissemination methods, an increasing number of educational resources are becoming available to learners Fischer et al. (2020). Therefore, finding prerequisite relationships among concepts has become an important issue in the field of AI for education Pan et al. (2017); Roy et al. (2019). Generally, this Concept Prerequisite Relation Prediction task, CPRP, is modeled as a link prediction problem in many studies Sun et al. (2022); Roy et al. (2019). Applications of CPRP include material recommendation Guan et al. (2023), learning path planning Shi et al. (2020), and optimization of problem-solving paths Le et al. (2023).

There are numerous approaches to solving the link prediction problem, including probabilistic models, spectral clustering, evolutionary algorithms, and deep graph learning models Kumar et al. (2020). Currently, graph neural networks (GNNs) have become the standard approach and deliver state-of-the-art performance in link prediction Cai et al. (2021), including many GNN-based solutions to CPRP Roy et al. (2019); Jia et al. (2021); Sun et al. (2022); Mazumder et al. (2023). To improve the capability of GNNs, Long et al. (2022) proposed pre-training the node features via graph reconstruction, achieving improved performance on two biological prediction tasks and effectively reducing training costs. Chamberlain et al. (2022) proposed using subgraph sketches to pass messages in subgraph GNNs, delivering higher accuracy and lower computation costs in link prediction tasks. This model mitigated expressive limitations such as the inability to count triangles and to distinguish automorphic nodes. Persistent homology was also adopted by Yan et al. (2021) to extract topological information from graphs, which was integrated with node features to enhance the expressive power of GNNs for link prediction. Besides, the Weisfeiler-Lehman test was introduced by Morris et al. (2019) to improve the capability of distinguishing non-isomorphic graphs in undirected GNN learning Huang et al. (2022).

However, the CPRP problem is usually formulated as directed-link prediction in a directed graph Sun et al. (2022). Unlike in an undirected graph, the edges indicate the prerequisite relation, describing the flow of information from one node to another. This difference makes undirected GNNs not directly applicable to directed graphs. Hence, Salha et al. (2019) designed a new gravity-inspired decoder to extend graph autoencoders, while Wu et al. (2019) adopted two weight matrices, for forward edges and backward edges respectively, to perform message passing between graph nodes. Nevertheless, there is a lack of studies on improving the expressive power of directed GNNs for CPRP.

In this paper, we extend the Weisfeiler-Lehman test-based GNN model, i.e., SpeqNets recently introduced by Morris et al. (2022), to directed graphs for CPRP. The proposed framework contributes to learning permutation-equivariant directed GNNs and improves the prediction performance of CPRP by distinguishing non-isomorphic graphs. Experimental results on three public datasets show that our method performs better than the state-of-the-art methods Sun et al. (2022); Jia et al. (2021); Roy et al. (2019).

2 CPRP: Concept Prerequisite Relation Prediction

The problem of CPRP refers to predicting prerequisite relations between knowledge concepts involved in learning Sun et al. (2022). For example, one should learn the knowledge concept (KC) of “conditional probability distribution” before learning “Bayesian theory.” As usual, CPRP can be formulated as directed-link prediction in a KC graph.

2.1 Problem formulation

Denote by $G=(V,E)$ a directed KC graph with a vertex set $V=\{v_1,v_2,\ldots,v_N\}$, where $v_i$ represents a KC in our study, and an edge set $E=\{e_{ij}\}_{1\leq i,j\leq N}$, where $e_{ij}$ is the prerequisite relation $v_i\to v_j$, meaning that the KC $v_i$ is a prerequisite of the KC $v_j$. Denote by $A\in R^{N\times N}$ the adjacency matrix of $G$, with each element being 0 or 1. CPRP on $G$ can be written as

$$\mathcal{P}(v_p,v_q)=\mathcal{M}(\mathcal{G}(v_p),\mathcal{G}(v_q)) \tag{1}$$

where $v_p$ and $v_q$ are two KCs; $\mathcal{P}$ is the probability that the relation $e_{pq}$ exists; $\mathcal{G}$ is a representation model that integrates KC information into a vector; and the function $\mathcal{M}$ maps the two representations to the probability that the prerequisite relation $e_{pq}$ exists.

2.2 Previous Methods

Early works on CPRP usually extracted handcrafted features for $\mathcal{G}$, such as contextual and structural features Pan et al. (2017), while recent works focus on designing deep-learning models, such as Siamese networks Roy et al. (2019) and GNNs Sun et al. (2022).

Inspired by their promising performance, we implemented $\mathcal{G}$ by training a GNN model in this study. GNNs aim to learn node representations in a graph by iteratively aggregating the neighborhood features. In each layer, the feature of the node $v$ is updated by merging the information transmitted from its neighbors, expressed as

$$\mathbf{f}^{(t)}_{v}=\mathcal{F}_{mer}^{W_1^{(t)}}\left(\mathbf{f}^{(t-1)}_{v},\ \mathcal{F}_{agg}^{W_2^{(t)}}\left(\{\mathbf{f}^{(t-1)}_{u}\mid u\in\mathcal{N}(v)\}\right)\right) \tag{2}$$

where $\mathbf{f}^{(t)}_{v}$ indicates the feature vector of the node $v$ at the $t$-th layer; $\mathcal{F}_{mer}^{W_1^{(t)}}$ is the merging function with learned parameters $W_1^{(t)}$ and $\mathcal{F}_{agg}^{W_2^{(t)}}$ is the aggregating function with network parameters $W_2^{(t)}$; $\mathcal{N}(v)$ denotes the set of neighbors of node $v$ Wu et al. (2020).
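As a minimal sketch of the update in Eq. (2), the NumPy snippet below assumes mean aggregation for the aggregating function and a ReLU over two linear maps for the merging function. These choices and the names `gnn_layer`, `W1`, and `W2` are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

def gnn_layer(F, neighbors, W1, W2):
    """One message-passing step. F is (N, d); neighbors[v] lists N(v)."""
    out = np.zeros((F.shape[0], W1.shape[1]))
    for v, nbrs in enumerate(neighbors):
        # Aggregate: mean of neighbor features (zero vector if v has no neighbors).
        agg = F[nbrs].mean(axis=0) if nbrs else np.zeros(F.shape[1])
        # Merge: combine the node's own feature with the aggregated message.
        out[v] = np.maximum(0.0, F[v] @ W1 + agg @ W2)
    return out

# Toy graph with three nodes and one-hot input features (hypothetical data).
F = np.eye(3)
neighbors = [[1, 2], [0], []]
W1 = np.ones((3, 2))
W2 = np.ones((3, 2))
H = gnn_layer(F, neighbors, W1, W2)
```

Stacking several such layers lets each node representation depend on a progressively larger neighborhood.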

With the node representations from GNNs, many methods can estimate the prerequisite relations by employing similarity metrics or classical classifiers for $\mathcal{M}$ Liang et al. (2018). However, previous studies fail to consider the graph isomorphism problem of KC graphs, which limits their expressive power for CPRP.

3 Our CPRP Method

To achieve fine representations of KCs, our method adopts the well-known Weisfeiler-Leman test Morris et al. (2021) to guide the GNN training on the KC graph $G$. With the obtained KC representations, a Siamese network computes the link probability, as shown in Fig. 1.

3.1 Weisfeiler-Leman Test

Denote by $S=(s_1,s_2,\ldots,s_k)$, with $s_i\in V$ for $i\in I_k$, a $k$-tuple of vertices in the KC graph $G$, where $I_k$ indicates the first $k$ natural numbers and $s_i$ is the $i$-th element of $S$, equal to some $v_j\in V$. Let $V^k(G)$ be the collection of all $k$-tuples from the graph $G$. The Weisfeiler-Leman (WL) test assigns a label to each tuple in $V^k(G)$ and then iteratively relabels these tuples by merging their neighborhood labels.
Here, the $j$-th neighborhood of the tuple $S$ is obtained by replacing its $j$-th element with every node $v\in V$, i.e., $\mathcal{N}_j^k(S,v)=\{(s_1,s_2,\ldots,s_{j-1},v,s_{j+1},\ldots,s_k)\mid v\in V\}$ Morris et al. (2021). Based on all neighborhoods of $S$, the WL test uses a predefined rule to compute a new label for $S$ in the $i$-th iteration, i.e.,

$$l_i(S)=\{\{(C^k_i(\mathcal{N}_1^k(S,v),\mathcal{N}_2^k(S,v),\ldots,\mathcal{N}_k^k(S,v)))\mid v\in V\}\} \tag{3}$$

where $l_i(S)$ represents the obtained label and $C^k_i$ indicates a predefined function that maps all tuples in $V^k$ to new labels. Then, the iterative labeling for $S$ can be expressed as

$$C^k_i(S)=\mathrm{Relabel}((C^k_{i-1}(S),l_i(S))) \tag{4}$$

where $\mathrm{Relabel}(a,b)$ performs the assignment $a\leftarrow b$.

The $k$-WL test is a powerful tool for distinguishing non-isomorphic graphs. Let $G$ and $H$ be two graphs. If the number of $k$-tuples with a specific label differs between $G$ and $H$ at any iteration, then the two graphs are non-isomorphic. Specifically, 1-WL employs the 1-hop neighbors for $\mathcal{N}$ in Eq. (3). As $k$ increases, the $k$-WL algorithm becomes more capable of distinguishing non-isomorphic graphs.
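To make the test concrete, the following sketch implements the classical 1-WL colour refinement on undirected graphs and compares label histograms between two graphs. It illustrates the general idea only and is not the directed variant proposed in this paper; the helper names `wl_histogram` and `maybe_isomorphic` are hypothetical.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Label histogram after `rounds` of 1-WL refinement on an adjacency dict."""
    colors = {v: 0 for v in adj}  # every node starts with the same label
    for _ in range(rounds):
        # New label = (old label, sorted multiset of neighbour labels).
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

def maybe_isomorphic(adj_g, adj_h, rounds=3):
    """False means certainly non-isomorphic; True means 1-WL cannot tell."""
    return wl_histogram(adj_g, rounds) == wl_histogram(adj_h, rounds)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

separated = maybe_isomorphic(path, star)            # histograms differ
confused = maybe_isomorphic(two_triangles, hexagon)  # both graphs are 2-regular
```

The second pair shows a known limitation of 1-WL: two disjoint triangles and a 6-cycle are non-isomorphic, yet every node receives the same label in both graphs. Cases like this motivate moving to larger $k$.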

3.2 Main Steps

Figure 1: Workflow of the proposed method for CPRP.

3.2.1 KC Graph Construction

The first step of the proposed method is to obtain the node features for the given directed KC graph $G$ using the pre-trained BERT model Devlin et al. (2018) (https://huggingface.co/bert-large-uncased). More specifically, the textual descriptions of KCs were obtained from the datasets or Wikipedia and fed into BERT to extract the KC embeddings for $V$. With the given $E$, we obtained the KC graph $G$.

3.2.2 Weisfeiler-Leman Guided Directed GNNs

On the resulting graph $G$, we propose a directed GNN model guided by the $k$-WL test to obtain KC representations in this step. To account for edge direction, we denote by $\mathcal{N}_{out}(v)$ the set of all out-neighbors of the node $v$. For two connected nodes $v$ and $w$ in a directed graph, if there exists an edge pointing from $v$ to $w$, node $w$ is defined as an out-neighbor of node $v$. In the proposed method, we redefine the set of $k$-tuples as follows,

$$\widehat{V}(G)^k:=\{(s_1,s_2,\ldots,s_k)\mid s_i\in V,\ i\in I_k;\ \exists j\in I_{i-1}: s_i\in\mathcal{N}_{out}(s_j)\cup\{s_j\}\}. \tag{5}$$

The feature representation of the $k$-tuple $(s_1,s_2,\ldots,s_k)$ is given by $[\mathbf{n}_1:\mathbf{n}_2:\ldots:\mathbf{n}_k]$, where $\mathbf{n}_i$ denotes the embedding of node $s_i$ and $[\cdot:\cdot]$ denotes the concatenation of vectors. For all $S\in\widehat{V}(G)^k$, the $j$-th out-neighborhood of $S$ can be cast as:

$$\widehat{\mathcal{N}}^k_j(S):=\{(s_1,s_2,\ldots,s_{j-1},v,s_{j+1},\ldots,s_k)\mid v\in\mathcal{N}_{out}(s_j)\} \tag{6}$$
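The tuple set of Eq. (5) and the out-neighborhood of Eq. (6) can be sketched in a few lines of Python. The brute-force enumeration over all $N^k$ candidates, the 0-based position index, and the function names `directed_tuples` and `out_neighborhood` are assumptions made for illustration on a toy graph; an efficient implementation would grow tuples along edges instead.

```python
from itertools import product

def directed_tuples(out_nbrs, k=2):
    """Tuples satisfying Eq. (5): each s_i (i >= 2) equals, or is an
    out-neighbor of, some earlier element s_j with j < i."""
    nodes = sorted(out_nbrs)
    keep = []
    for S in product(nodes, repeat=k):
        ok = all(
            any(S[i] == S[j] or S[i] in out_nbrs[S[j]] for j in range(i))
            for i in range(1, k)
        )
        if ok:
            keep.append(S)
    return keep

def out_neighborhood(S, j, out_nbrs):
    """Eq. (6): replace the j-th element (0-based here) by each of its out-neighbors."""
    return [S[:j] + (v,) + S[j + 1:] for v in sorted(out_nbrs[S[j]])]

# Toy KC graph with edges 0 -> 1, 0 -> 2, and 1 -> 2 (hypothetical data).
out_nbrs = {0: {1, 2}, 1: {2}, 2: set()}
tuples = directed_tuples(out_nbrs, k=2)
first_position = out_neighborhood((0, 1), 0, out_nbrs)
second_position = out_neighborhood((0, 1), 1, out_nbrs)
```

For $k=2$ the retained tuples are the self-pairs and the directed edges, so the tuple set grows with the number of edges rather than with $N^2$.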

For all $k$-tuples, the neighbor relationships generated on the $j$-th element are used to construct a graph $G_j$. For every graph $G_i$ ($i\in I_k$), each node in $G_i$ represents a $k$-tuple in $\widehat{V}(G)^k$. A separate graph neural network is constructed and trained for each graph $G_i$. After each layer, the features of the same $k$-tuple on the different graphs $G_i$ are fused using a multilayer perceptron (MLP). The expression of the $k$-tuple $S$ at the $t$-th layer is as follows:

$$\widehat{\mathbf{f}}_S=\mathcal{F}_{MLP}(\mathcal{F}_S^{(1)},\mathcal{F}_S^{(2)},\ldots,\mathcal{F}_S^{(k)}) \tag{7}$$

where $\widehat{\mathbf{f}}_S$ is defined as the representation of the $k$-tuple $S$ at layer $t$. For all $i\in I_k$, $\mathcal{F}_S^{(i)}$ denotes the representation of the $k$-tuple $S$ in graph $G_i$ after passing through the GCN Kipf and Welling (2016) designed for graph $G_i$. $\mathcal{F}_{MLP}$ denotes the multilayer perceptron. After several layers of learning, we obtain the representations of all $k$-tuples.

The representations of $k$-tuples obtained through GNN training are finally distributed to the nodes of $G$ using an average allocation scheme, shown as follows:

$$\mathbf{x}_j^{(i)}=\frac{1}{\#\{S\mid S\in\widehat{V}(G)^k,\ s_i=v_j\}}\sum_{\{S\mid S\in\widehat{V}(G)^k,\ s_i=v_j\}}\mathbf{h}_S \tag{8}$$

where $s_i$ represents the $i$-th element in the $k$-tuple $S$. $\mathbf{x}^{(i)}_j$ represents the transmission of the feature expressions of $k$-tuples back to the representation of concept $v_j$, based on the $i$-th element in each $k$-tuple. $\mathbf{h}_S$ denotes the feature of the $k$-tuple $S$. $\#\{\ldots\}$ denotes the number of elements in the set $\{\ldots\}$. For all $i$, the vectors $\mathbf{x}^{(i)}_j$ are concatenated and fed into a multilayer perceptron to obtain the node representations in $G$:

$$\mathbf{f}_j=\mathcal{F}_{MLP}([\mathbf{x}^{(1)}_j:\mathbf{x}^{(2)}_j:\ldots:\mathbf{x}^{(k)}_j]) \tag{9}$$

where $\mathbf{f}_j$ represents the learned representation of concept $v_j$.
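The averaging of Eq. (8) and the concatenation that feeds Eq. (9) can be sketched with NumPy as follows. The function name `pool_tuples_to_nodes` is hypothetical, node indices are assumed to be integers in `range(num_nodes)`, and the final MLP of Eq. (9) is left out so that the example stays self-contained.

```python
import numpy as np

def pool_tuples_to_nodes(tuples, H, num_nodes):
    """For each position i, average the features of all tuples whose i-th
    element is node j (Eq. (8)), then concatenate over positions."""
    k, d = len(tuples[0]), H.shape[1]
    X = np.zeros((k, num_nodes, d))
    counts = np.zeros((k, num_nodes))
    for S, h in zip(tuples, H):
        for i, node in enumerate(S):
            X[i, node] += h
            counts[i, node] += 1
    # Nodes that never occur at position i keep a zero vector.
    X /= np.maximum(counts, 1)[..., None]
    return np.concatenate(list(X), axis=1)  # shape (num_nodes, k * d)

# Three 2-tuples with scalar features (hypothetical values).
tuples = [(0, 0), (0, 1), (1, 1)]
H = np.array([[1.0], [3.0], [5.0]])
pooled = pool_tuples_to_nodes(tuples, H, num_nodes=2)
```

Here node 0 appears first in two tuples (features 1 and 3, mean 2) and second in one tuple (feature 1), so its pooled vector is `[2, 1]`; node 1 receives `[5, 4]` by the same reasoning.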

3.2.3 Prediction Network

After obtaining the KC representations, a Siamese network is employed to predict the probability that the concept $v_p$ is a prerequisite of the concept $v_q$ in Eq. (1). Let $\mathbf{\tilde{e}}_p$ and $\mathbf{\tilde{e}}_q$ be the representations of $v_p$ and $v_q$ from the two feed-forward networks with shared weights in the Siamese network. The probability of the relation between $v_p$ and $v_q$ is obtained via

$$\mathcal{P}(v_p,v_q)=\sigma\left(W\left[\mathbf{\tilde{e}}_p:\mathbf{\tilde{e}}_q:\mathbf{\tilde{e}}_p-\mathbf{\tilde{e}}_q:\mathbf{\tilde{e}}_p\odot\mathbf{\tilde{e}}_q:1\right]\right) \tag{10}$$

where $\sigma$ represents the sigmoid operator and $\odot$ is the Hadamard product. Finally, we used the cross-entropy loss to compute the training loss for the deep framework.
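A small NumPy sketch of the scoring function in Eq. (10) is given below. It takes the two representations as already computed, omits the shared feed-forward networks of the Siamese model, and uses a hand-picked weight vector `W`; all of these are simplifying assumptions for illustration.

```python
import numpy as np

def link_probability(e_p, e_q, W):
    """Sigmoid of a linear map over [e_p : e_q : e_p - e_q : e_p * e_q : 1]."""
    z = np.concatenate([e_p, e_q, e_p - e_q, e_p * e_q, [1.0]])
    return 1.0 / (1.0 + np.exp(-(W @ z)))

e_p = np.array([1.0, 0.0])
e_q = np.array([0.0, 1.0])

W = np.zeros(9)   # 4 * d + 1 weights for d = 2
W[4] = 1.0        # weight on the first entry of the difference term

p_forward = link_probability(e_p, e_q, W)
p_backward = link_probability(e_q, e_p, W)
```

Because the difference term changes sign when the arguments are swapped, the score is direction-aware: `p_forward` and `p_backward` differ, which is what a prerequisite relation requires.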

4 Experiments

We used LectureBank Li et al. (2019), University Courses Liang et al. (2017), and ML of MOOCs Pan et al. (2017) to evaluate the performance of our method and compare it with the state-of-the-art methods, including binary classification models (SVM, LR, NB, RF) Pan et al. (2017), RefD Liang et al. (2015), GAE Li et al. (2019), VGAE Li et al. (2019), PREREQ Roy et al. (2019), CPRL Jia et al. (2021), and ConLearn Sun et al. (2022). We employed precision, recall, and F1-score as evaluation metrics.
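For reference, the three metrics can be computed from binary labels as in the sketch below. The function name and the toy labels are hypothetical; the definitions are the standard ones, with the F1-score being the harmonic mean of precision and recall.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1, with label 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Since the F1-score depends on both quantities, a method with very high recall but low precision can still obtain a modest F1-score.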

For BERT, we leveraged a combination of course lecture information and Wikipedia for concept description extraction in the MOOC and University Courses datasets. For the LectureBank dataset, we utilized the text information from the Wikipedia URLs of each concept in the dataset. When extracting vectors from texts, the max token size of BERT was set to 256 for all three datasets.

For our method, the parameter $k$ of $k$-WL was set to 2. We used Adam as the optimizer with a learning rate of 0.00002 for all experiments. The batch size was set to 256 for the MOOC and LectureBank datasets and 512 for the University Courses dataset. The models were trained for 4000 epochs in all experiments, until the loss stabilized. As for the baseline methods, we used the default parameters of their original implementations.

For all three datasets, we selected 80% of the concept prerequisite pairs as the training set and 20% of the concept prerequisite pairs as the test set. Negative samples were generated by randomly selecting unrelated phrase pairs from the vocabulary, along with the reverse pairs of the original positive samples. The results are recorded in Table 1.
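The sampling and splitting procedure can be sketched as below. The number of random unrelated pairs (here, as many as there are positives), the bounded retry loop, the seeds, and the names `make_negatives` and `split_80_20` are assumptions for illustration, since these details are not specified above.

```python
import random

def make_negatives(positives, concepts, seed=0):
    """Negatives = reversed positive pairs plus random unrelated pairs."""
    rng = random.Random(seed)
    pos = set(positives)
    # A prerequisite relation is directed, so a reversed positive is a negative.
    reverse = {(b, a) for a, b in pos if (b, a) not in pos}
    unrelated = set()
    for _ in range(100 * len(pos)):  # bounded number of attempts
        if len(unrelated) >= len(pos):
            break
        a, b = rng.sample(concepts, 2)
        if (a, b) not in pos and (b, a) not in pos:
            unrelated.add((a, b))
    return sorted(reverse | unrelated)

def split_80_20(pairs, seed=0):
    """Shuffle and split pairs into 80% training and 20% test."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    cut = len(pairs) * 4 // 5
    return pairs[:cut], pairs[cut:]

concepts = ["A", "B", "C", "D", "E"]
positives = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("D", "E")]
negatives = make_negatives(positives, concepts)
train, test = split_80_20(positives)
```

Including the reversed pairs forces the classifier to learn the direction of a relation rather than mere relatedness between two concepts.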

Table 1: Comparison between the results of the baseline model and our model
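| Dataset | Metric | NB | SVM | LR | RF | RefD | GAE | VGAE | PREREQ | CPRL | ConLearn | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MOOCs | Precision | 0.577 | 0.668 | 0.748 | 0.375 | 0.784 | 0.293 | 0.266 | 0.448 | 0.800 | 0.895 | 0.915 |
| MOOCs | Recall | 0.623 | 0.577 | 0.270 | 0.669 | 0.188 | 0.733 | 0.647 | 0.592 | 0.642 | 0.850 | 0.860 |
| MOOCs | F1-score | 0.599 | 0.619 | 0.397 | 0.481 | 0.303 | 0.419 | 0.377 | 0.510 | 0.712 | 0.872 | 0.887 |
| LectureBank | Precision | 0.670 | 0.857 | 0.744 | 0.855 | 0.666 | 0.462 | 0.417 | 0.590 | 0.861 | 0.831 | 0.857 |
| LectureBank | Recall | 0.640 | 0.692 | 0.744 | 0.681 | 0.228 | 0.811 | 0.575 | 0.502 | 0.858 | 0.960 | 0.960 |
| LectureBank | F1-score | 0.655 | 0.766 | 0.744 | 0.758 | 0.339 | 0.589 | 0.484 | 0.543 | 0.860 | 0.891 | 0.906 |
| University Courses | Precision | 0.478 | 0.796 | 0.595 | 0.739 | 0.919 | 0.450 | 0.470 | 0.468 | 0.689 | 0.611 | 0.822 |
| University Courses | Recall | 0.649 | 0.635 | 0.546 | 0.480 | 0.415 | 0.886 | 0.694 | 0.916 | 0.760 | 0.966 | 0.74 |
| University Courses | F1-score | 0.550 | 0.707 | 0.569 | 0.582 | 0.572 | 0.597 | 0.560 | 0.597 | 0.723 | 0.749 | 0.778 |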

From Table 1, we can draw the following observations. First, the four binary classifiers (NB, SVM, LR, and RF), which rely on hand-crafted features, perform weakly on the three datasets, as does RefD. Second, GAE, VGAE, and PREREQ exploit the prerequisite relation information between KCs, which raises recall in several cases (e.g., GAE on all three datasets). Third, CPRL improves further but does not exploit the textual information behind the concepts, so it remains behind the BERT-based methods in F1-score. Finally, both our method and ConLearn extract prior information using the pre-trained language model BERT and achieve the best results. Importantly, compared to ConLearn, our method uses 2-WL to integrate the structural information of the graph more deeply, resulting in the best F1-score on all three datasets. These observations indicate that introducing the WL test into directed GNNs is effective for CPRP.
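Since the comparison above rests on the F1-score, it can help to recompute it from the reported precision and recall. The short check below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary `ours` are illustrative, and the numbers are the "Ours" column of Table 1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for "Ours" in Table 1.
ours = {
    "MOOCs": (0.915, 0.860, 0.887),
    "LectureBank": (0.857, 0.960, 0.906),
    "University Courses": (0.822, 0.74, 0.778),
}

for name, (p, r, reported) in ours.items():
    computed = f1_score(p, r)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Small differences in the last digit are expected, because the precision and recall in the table are themselves rounded.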

5 Conclusion

This paper proposes a directed graph neural network based on the Weisfeiler-Lehman algorithm to address the CPRP problem. Our method leverages BERT for KC text embeddings and redefines the $k$-tuple in the directed KC graph. The 2-WL test is then used to train a permutation-equivariant GNN. Given the KC representations from the GNN, the Siamese network computes the probability of a prerequisite link between two KCs. Experiments on three datasets show that the proposed method achieves higher F1-scores than state-of-the-art CPRP approaches. Our future work will consider more evaluation results and topological information on the graph.

References

  • Cai et al. (2021) Lei Cai, Jundong Li, Jie Wang, and Shuiwang Ji. Line graph neural networks for link prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5103–5113, 2021.
  • Chamberlain et al. (2022) Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M Bronstein, and Max Hansmire. Graph neural networks for link prediction with subgraph sketching. arXiv preprint arXiv:2209.15486, 2022.
  • Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • Fischer et al. (2020) Christian Fischer, Zachary A Pardos, Ryan Shaun Baker, Joseph Jay Williams, Padhraic Smyth, Renzhe Yu, Stefan Slater, Rachel Baker, and Mark Warschauer. Mining big data in education: Affordances and challenges. Review of Research in Education, 44(1):130–160, 2020.
  • Guan et al. (2023) Quanlong Guan, Fang Xiao, Xinghe Cheng, Liangda Fang, Ziliang Chen, Guanliang Chen, and Weiqi Luo. Kg4ex: An explainable knowledge graph-based approach for exercise recommendation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 597–607, 2023.
  • Huang et al. (2022) Zhongyu Huang, Yingheng Wang, Chaozhuo Li, and Huiguang He. Going deeper into permutation-sensitive graph neural networks. In International Conference on Machine Learning, pages 9377–9409. PMLR, 2022.
  • Jia et al. (2021) Chenghao Jia, Yongliang Shen, Yechun Tang, Lu Sun, and Weiming Lu. Heterogeneous graph neural networks for concept prerequisite relation learning in educational data. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2036–2047, 2021.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Kumar et al. (2020) Ajay Kumar, Shashank Sheshar Singh, Kuldeep Singh, and Bhaskar Biswas. Link prediction techniques, applications, and performance: A survey. Physica A: Statistical Mechanics and its Applications, 553:124289, 2020.
  • Le et al. (2023) Thanh Le, Ngoc Huynh, and Bac Le. Knowledge graph embedding by projection and rotation on hyperplanes for link prediction. Applied Intelligence, 53(9):10340–10364, 2023.
  • Li et al. (2019) Irene Li, Alexander R Fabbri, Robert R Tung, and Dragomir R Radev. What should I learn first: Introducing LectureBank for NLP education and prerequisite chain learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6674–6681, 2019.
  • Liang et al. (2015) Chen Liang, Zhaohui Wu, Wenyi Huang, and C Lee Giles. Measuring prerequisite relations among concepts. In Proceedings of the 2015 conference on empirical methods in natural language processing, pages 1668–1674, 2015.
  • Liang et al. (2017) Chen Liang, Jianbo Ye, Zhaohui Wu, Bart Pursel, and C Lee Giles. Recovering concept prerequisite relations from university course dependencies. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
  • Liang et al. (2018) Chen Liang, Jianbo Ye, Shuting Wang, Bart Pursel, and C Lee Giles. Investigating active learning for concept prerequisite learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  • Long et al. (2022) Yahui Long, Min Wu, Yong Liu, Yuan Fang, Chee Keong Kwoh, Jinmiao Chen, Jiawei Luo, and Xiaoli Li. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics, 38(8):2254–2262, 2022.
  • Mazumder et al. (2023) Debjani Mazumder, Jiaul H Paik, and Anupam Basu. A graph neural network model for concept prerequisite relation extraction. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 1787–1796, 2023.
  • Morris et al. (2019) Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.
  • Morris et al. (2021) Christopher Morris, Matthias Fey, and Nils M Kriege. The power of the Weisfeiler-Leman algorithm for machine learning with graphs. arXiv preprint arXiv:2105.05911, 2021.
  • Morris et al. (2022) Christopher Morris, Gaurav Rattan, Sandra Kiefer, and Siamak Ravanbakhsh. SpeqNets: Sparsity-aware permutation-equivariant graph networks. In International Conference on Machine Learning, pages 16017–16042. PMLR, 2022.
  • Pan et al. (2017) Liangming Pan, Chengjiang Li, Juanzi Li, and Jie Tang. Prerequisite relation learning for concepts in MOOCs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1447–1456, 2017.
  • Roy et al. (2019) Sudeshna Roy, Meghana Madhyastha, Sheril Lawrence, and Vaibhav Rajan. Inferring concept prerequisite relations from online educational resources. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 9589–9594, 2019.
  • Salha et al. (2019) Guillaume Salha, Stratis Limnios, Romain Hennequin, Viet-Anh Tran, and Michalis Vazirgiannis. Gravity-inspired graph autoencoders for directed link prediction. In Proceedings of the 28th ACM international conference on information and knowledge management, pages 589–598, 2019.
  • Shi et al. (2020) Daqian Shi, Ting Wang, Hao Xing, and Hao Xu. A learning path recommendation model based on a multidimensional knowledge graph framework for e-learning. Knowledge-Based Systems, 195:105618, 2020.
  • Sun et al. (2022) Hao Sun, Yuntao Li, and Yan Zhang. ConLearn: Contextual-knowledge-aware concept prerequisite relation learning with graph neural network. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), pages 118–126. SIAM, 2022.
  • Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. Session-based recommendation with graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 346–353, 2019.
  • Wu et al. (2020) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2020.
  • Yan et al. (2021) Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, and Chao Chen. Link prediction with persistent homology: An interactive view. In International conference on machine learning, pages 11659–11669. PMLR, 2021.