Volume 1, 2023, AAAI Workshop on AI for Education
Concept Prerequisite Relation Prediction by Using Permutation-Equivariant Directed Graph Neural Networks
This work was supported in part by National Natural Science Foundation of China (62272392, U22A2025) and the funding for Teaching & Learning Reform at NPU (2023JGZ14).
Abstract
This paper studies concept prerequisite relation prediction (CPRP), a fundamental task in using AI for education. CPRP is usually formulated as a link-prediction task on a relationship graph of concepts and solved by training a graph neural network (GNN) model. However, current directed GNNs fail to handle graph isomorphism: they can produce identical representations for non-isomorphic graphs, which reduces the expressivity of the resulting representations. We present a permutation-equivariant directed GNN model by introducing the Weisfeiler-Lehman test into directed GNN learning. Our method is then used for CPRP and evaluated on three public datasets. The experimental results show that our model delivers better prediction performance than the state-of-the-art methods.
keywords:
Concept Prerequisite Relation, Permutation-Equivariant GNNs, Weisfeiler-Lehman Test, Directed Graph Learning, AI for Education.
1 Introduction
With the continuous advancement of dissemination methods, an increasing number of educational resources are becoming available to learners Fischer et al. (2020). Therefore, finding prerequisite relationships among concepts has become an important research issue in the field of AI for education Pan et al. (2017); Roy et al. (2019). Generally, this Concept Prerequisite Relation Prediction (CPRP) task is modeled as a link prediction problem Sun et al. (2022); Roy et al. (2019). Applications of CPRP include material recommendation Guan et al. (2023), learning path planning Shi et al. (2020), and optimization of problem-solving paths Le et al. (2023).
There are numerous approaches to solving the link prediction problem, including probabilistic models, spectral clustering, evolutionary algorithms, and deep graph learning models Kumar et al. (2020). Currently, graph neural networks (GNNs) have become the benchmark method and achieve state-of-the-art performance in link prediction Cai et al. (2021), including many GNN-based solutions to CPRP Roy et al. (2019); Jia et al. (2021); Sun et al. (2022); Mazumder et al. (2023). To improve the capability of GNNs, Long et al. (2022) proposed to pre-train the node features via graph reconstruction, achieving improved performance on two biological prediction tasks and effectively reducing training costs. Chamberlain et al. (2022) proposed to utilize subgraph sketches to pass messages in subgraph GNNs, delivering higher accuracy and lower computation costs in link prediction tasks. This model mitigated expressive limitations such as the inability to count triangles and to distinguish automorphic nodes. Persistent homology was also adopted by Yan et al. (2021) to extract topological information from graphs, which was integrated with node features to enhance the expressive power of GNNs for link prediction. Besides, the Weisfeiler-Lehman test was introduced by Morris et al. (2019) to improve the capability of distinguishing non-isomorphic graphs in undirected GNN learning Huang et al. (2022).
However, the CPRP problem is usually formulated as directed-link prediction in a directed graph Sun et al. (2022). Unlike in an undirected graph, each edge indicates a prerequisite relation, describing the flow of information from one node to another. This difference makes undirected GNNs not directly applicable to directed graphs. Hence, Salha et al. (2019) designed a new gravity-inspired decoder to extend graph autoencoders, while Wu et al. (2019) adopted two weight matrices, for forward edges and backward edges respectively, to perform message passing between graph nodes. Nevertheless, there is a lack of studies on improving the expressive power of directed GNNs for CPRP.
In this paper, we extend the Weisfeiler-Lehman test-based GNN model, i.e., SpeqNets recently introduced by Morris et al. (2022), to directed graphs for CPRP. The proposed framework learns permutation-equivariant directed GNNs and improves the prediction performance of CPRP by distinguishing non-isomorphic graphs. Experimental results on three public datasets show that our method performs better than the state-of-the-art methods Sun et al. (2022); Jia et al. (2021); Roy et al. (2019).
2 CPRP: Concept Prerequisite Relation Prediction
The problem of CPRP refers to predicting prerequisite relations between knowledge concepts involved in learning Sun et al. (2022). For example, one should learn the knowledge concept (KC) of “conditional probability distribution” before learning “Bayesian theory.” As usual, CPRP can be formulated into the directed-link prediction in a KC graph.
2.1 Problem formulation
Denote by $G=(V,E)$ a directed KC graph with a vertex set $V=\{v_1,\dots,v_n\}$, where $v_i$ represents a KC in our study, and an edge set $E=\{e_{ij}\}$, where $e_{ij}$ is the prerequisite relation meaning that the KC $v_i$ is a prerequisite of the KC $v_j$. Denote by $A\in\{0,1\}^{n\times n}$ the adjacency matrix of $G$, with each element being 0 or 1. CPRP on $G$ can be written as
$$p(e_{ij}) = f\big(\Phi(v_i), \Phi(v_j)\big), \tag{1}$$
where $v_i$ and $v_j$ are two KCs; $p(e_{ij})$ is the probability that the relation $e_{ij}$ exists; $\Phi$ is a representation model that integrates KC information into a vector; the function $f$ estimates the probability that the prerequisite relation $e_{ij}$ exists.
2.2 Previous Methods
Early works on CPRP usually extracted handcrafted features for the representation model $\Phi$, such as contextual and structural features Pan et al. (2017), while recent works focus on designing deep-learning models, such as Siamese networks Roy et al. (2019) and GNNs Sun et al. (2022).
Inspired by their promising performance, we implemented the representation model $\Phi$ by training a GNN model in this study. GNNs learn node representations in a graph by iteratively aggregating neighborhood features. In each layer, the feature of a node $v$ is updated by merging the information transmitted from its neighbors, expressed as
$$h_v^{(t)} = f_{\mathrm{merge}}^{W_1}\Big(h_v^{(t-1)},\; f_{\mathrm{agg}}^{W_2}\big(\{h_u^{(t-1)} : u\in N(v)\}\big)\Big), \tag{2}$$
where $h_v^{(t)}$ indicates the feature vector of the node $v$ at the $t$-th layer; $f_{\mathrm{merge}}^{W_1}$ is the merging function with learned parameters $W_1$ and $f_{\mathrm{agg}}^{W_2}$ is the aggregating function with network parameters $W_2$; $N(v)$ delivers the neighbors of node $v$ Wu et al. (2020).
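To make the update rule concrete, the following is a minimal NumPy sketch of a single message-passing layer. It is an illustration only: the mean aggregator, the ReLU merge, and the names `message_passing_layer`, `W_self`, and `W_neigh` are assumptions chosen for this example, not the configuration used in this paper.

```python
import numpy as np

def message_passing_layer(H, A, W_self, W_neigh):
    """One illustrative GNN layer: aggregate neighbor features, then merge.

    H: (n, d) node features from the previous layer.
    A: (n, n) adjacency matrix; A[v, u] = 1 means u is a neighbor of v.
    W_self, W_neigh: (d, d_out) weights (hypothetical names for this sketch).
    """
    deg = A.sum(axis=1, keepdims=True)
    # Aggregation: mean of the neighbors' features (zeros if a node has none).
    agg = (A @ H) / np.maximum(deg, 1.0)
    # Merge: combine the node's own feature with the aggregate, then ReLU.
    return np.maximum(H @ W_self + agg @ W_neigh, 0.0)

# Toy graph with three nodes; node 2 has no neighbors.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
H0 = np.eye(3)
rng = np.random.default_rng(0)
W_self = rng.normal(size=(3, 4))
W_neigh = rng.normal(size=(3, 4))
H1 = message_passing_layer(H0, A, W_self, W_neigh)
```

Stacking several such layers lets each node representation depend on progressively larger neighborhoods.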
With the node representations from GNNs, many methods estimate the prerequisite relations by employing similarity metrics or classical classifiers as the prediction function $f$ Liang et al. (2018). However, previous studies do not consider the graph isomorphism problem on KC graphs, which limits their expressive power for CPRP.
3 Our CPRP Method
To obtain fine-grained representations of KCs, our method adopts the well-known Weisfeiler-Leman test Morris et al. (2021) to guide the GNN training on the KC graph $G$. With the obtained KC representations, a Siamese network computes the link probability, as shown in Fig. 1.
3.1 Weisfeiler-Leman Test
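Denote by $\mathbf{s}=(s_1,\dots,s_k)\in V^k$ a $k$-tuple of vertices in the KC graph $G$, where $[k]=\{1,\dots,k\}$ indicates the first $k$ natural numbers; $s_i$, $i\in[k]$, is the $i$-th element of $\mathbf{s}$, specified to a node $v\in V$. Let $V^k$ be the collection of all $k$-tuples from the graph $G$. The Weisfeiler-Leman (WL) test assigns labels to each tuple in $V^k$ and then iteratively relabels these tuples by merging their neighborhood labels. Here, the $j$-th neighborhood of the tuple $\mathbf{s}$ is yielded by replacing its $j$-th element with every node $w\in V$, i.e., $\phi_j(\mathbf{s},w)=(s_1,\dots,s_{j-1},w,s_{j+1},\dots,s_k)$ Morris et al. (2021). Based on all neighborhoods of $\mathbf{s}$, the WL test uses a predefined rule to compute a merged neighborhood label in the $t$-th iteration, i.e.,
$$M^{(t)}(\mathbf{s}) = \Big(\{\!\{C^{(t)}(\phi_1(\mathbf{s},w)) : w\in V\}\!\},\;\dots,\;\{\!\{C^{(t)}(\phi_k(\mathbf{s},w)) : w\in V\}\!\}\Big), \tag{3}$$
where $M^{(t)}(\mathbf{s})$ represents the obtained label and $C^{(t)}$ indicates a predefined function that maps all tuples to labels. Then, the iterative labeling for $\mathbf{s}$ can be expressed as
$$C^{(t+1)}(\mathbf{s}) = \mathrm{RELABEL}\big(C^{(t)}(\mathbf{s}),\, M^{(t)}(\mathbf{s})\big), \tag{4}$$
where $\mathrm{RELABEL}$ is an injective map that yields the new label $C^{(t+1)}(\mathbf{s})$.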
The $k$-WL test is a powerful tool for distinguishing non-isomorphic graphs. Let $G_1$ and $G_2$ be two graphs. If the number of $k$-tuples with a specific label differs between $G_1$ and $G_2$ at any iteration, then the two graphs are non-isomorphic. Specifically, 1-WL uses the 1-hop neighbors of each node as its neighborhood in Eq. (3). As $k$ increases, the $k$-WL algorithm becomes more capable of distinguishing non-isomorphic graphs.
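As an illustration of the labeling procedure, here is a small sketch of the 1-WL case on undirected toy graphs. It is a simplified example, not the procedure used in our model: the function name `wl_histogram` is hypothetical, and nested tuples stand in for the relabeling map so that labels remain comparable across graphs.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Histogram of 1-WL labels after `rounds` refinement iterations.

    adj: dict mapping each node to the list of its neighbors.
    A node's new label is the pair (own label, sorted multiset of the
    neighbors' labels), which is injective by construction.
    """
    labels = {v: 0 for v in adj}  # uniform initial labels
    for _ in range(rounds):
        labels = {
            v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
            for v in adj
        }
    return Counter(labels.values())

# A path and a star on four nodes: different label histograms.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path_vs_star_differ = wl_histogram(path) != wl_histogram(star)

# A 6-cycle and two disjoint triangles: both 2-regular, so 1-WL
# cannot separate them although they are not isomorphic.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
regular_pair_same = wl_histogram(hexagon) == wl_histogram(triangles)
```

The second pair shows why larger $k$ matters: differing histograms prove non-isomorphism, but equal histograms do not prove isomorphism.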
3.2 Main Steps
![Figure 1: Overview of the proposed CPRP method](x1.png)
3.2.1 KC Graph Construction
The first step of the proposed method is to obtain the node features for the given directed KC graph $G$ using the pre-trained BERT model Devlin et al. (2018) (https://huggingface.co/bert-large-uncased). More specifically, the textual descriptions of KCs from the datasets or Wikipedia were obtained and fed into BERT to extract the KC embedding $x_i$ for each KC $v_i$. With the given adjacency matrix $A$, we obtained the KC graph $G$.
3.2.2 Weisfeiler-Leman Guided Directed GNNs
On the resulting graph $G$, we propose a directed GNN model guided by the $k$-WL test to obtain KC representations in this step. To account for edge directions, we denote by $N^{+}(v)$ the set of all out-neighbors of the node $v$. For two connected nodes $v$ and $u$ in a directed graph, if there exists an edge pointing from $v$ to $u$, node $u$ is defined as an out-neighbor of node $v$. In the proposed method, we redefine the $k$-tuple as follows,
(5) |
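The feature representation of a $k$-tuple $\mathbf{s}=(s_1,\dots,s_k)$ is here given by $h(\mathbf{s}) = [x_{s_1} \,\|\, \cdots \,\|\, x_{s_k}]$, where $x_{s_i}$ denotes the embedding of node $s_i$ and $\|$ denotes the concatenation of vectors. For $j\in[k]$, the $j$-th out-neighbor of $\mathbf{s}$ can be cast as:
$$N_j^{+}(\mathbf{s}) = \big\{(s_1,\dots,s_{j-1},w,s_{j+1},\dots,s_k) : w\in N^{+}(s_j)\big\}. \tag{6}$$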
For all $k$-tuples, the neighbor relationships generated on the $j$-th element are used to construct a graph $G_j$. In every graph $G_j$, $j\in[k]$, each node represents a $k$-tuple in $V^k$. A separate graph neural network is constructed and trained for each graph $G_j$. After each layer, the features of the same $k$-tuple on the different graphs are fused using a multilayer perceptron (MLP). The expression of a $k$-tuple $\mathbf{s}$ at the $t$-th layer is as follows:
$$h^{(t)}(\mathbf{s}) = \mathrm{MLP}\big(g_1^{(t)}(\mathbf{s}),\,\dots,\,g_k^{(t)}(\mathbf{s})\big), \tag{7}$$
where $h^{(t)}(\mathbf{s})$ is defined as the representation of the $k$-tuple $\mathbf{s}$ at layer $t$. For $j\in[k]$, $g_j^{(t)}(\mathbf{s})$ denotes the representation of the $k$-tuple $\mathbf{s}$ in graph $G_j$ after passing through the GCN Kipf and Welling (2016) designed for graph $G_j$. $\mathrm{MLP}$ denotes the multilayer perceptron. After several layers of learning, we obtain the representations of all $k$-tuples.
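The representations of $k$-tuples obtained through GNN training are finally distributed to the nodes of $G$ using an average allocation scheme, shown as follows:
$$z_j(v) = \frac{1}{|S_j(v)|}\sum_{\mathbf{s}\in S_j(v)} h(\mathbf{s}), \qquad S_j(v)=\{\mathbf{s}\in V^k : s_j = v\}, \tag{8}$$
where $s_j$ represents the $j$-th element in the $k$-tuple $\mathbf{s}$. $z_j(v)$ represents the transmission of feature expressions of $k$-tuples back to the representation of concept $v$, based on the $j$-th element in each $k$-tuple. $h(\mathbf{s})$ denotes the feature of the $k$-tuple $\mathbf{s}$. $|S_j(v)|$ denotes the number of elements in the set $S_j(v)$. For $j\in[k]$, all vectors $z_j(v)$ are merged and fed into a multilayer perceptron to obtain the node representations in $G$:
$$z(v) = \mathrm{MLP}\big(z_1(v),\,\dots,\,z_k(v)\big), \tag{9}$$
where $z(v)$ represents the learned concept representations.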
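The tuple-level out-neighbor construction and the average allocation back to nodes can be sketched as follows for $k=2$. This is a simplified illustration under stated assumptions: the helper names `tuple_out_neighbors` and `pool_tuples_to_nodes` are hypothetical, the tuple features are plain concatenations of one-hot node embeddings rather than GNN outputs, and the final MLP is omitted.

```python
import itertools
import numpy as np

def tuple_out_neighbors(s, j, out_nbrs):
    """Replace the j-th element of tuple s by each out-neighbor of s[j]."""
    return [s[:j] + (w,) + s[j + 1:] for w in out_nbrs[s[j]]]

def pool_tuples_to_nodes(tuple_feats, n, k):
    """Average tuple features back to nodes, position by position.

    For every position j, node v receives the mean feature of all tuples
    whose j-th element is v; the k means are concatenated per node.
    """
    dim = len(next(iter(tuple_feats.values())))
    sums = np.zeros((n, k, dim))
    counts = np.zeros((n, k, 1))
    for s, h in tuple_feats.items():
        for j, v in enumerate(s):
            sums[v, j] += h
            counts[v, j] += 1
    return (sums / np.maximum(counts, 1)).reshape(n, k * dim)

# Toy directed chain 0 -> 1 -> 2.
out_nbrs = {0: [1], 1: [2], 2: []}
nbrs_pos0 = tuple_out_neighbors((0, 1), 0, out_nbrs)  # vary first element
nbrs_pos1 = tuple_out_neighbors((0, 1), 1, out_nbrs)  # vary second element

X = np.eye(3)  # stand-in node embeddings
tuple_feats = {
    (a, b): np.concatenate([X[a], X[b]])
    for a, b in itertools.product(range(3), repeat=2)
}
Z = pool_tuples_to_nodes(tuple_feats, n=3, k=2)
```

In the full model, the pooled vectors would then pass through an MLP to give the final concept representations.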
3.2.3 Prediction Network
After obtaining the KC representations, a Siamese network is employed to predict the probability that the concept $v_i$ is a prerequisite of the concept $v_j$ in Eq. (1). Let $r_i$ and $r_j$ be the representations of $v_i$ and $v_j$ from the two feed-forward networks with shared weights in the Siamese network. The probability of a prerequisite relation between $v_i$ and $v_j$ is obtained via
(10) |
where $\sigma$ represents the sigmoid function and $\odot$ is the Hadamard product. Finally, we used the cross-entropy loss as the training loss for the whole framework.
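For intuition, a Siamese scoring head of this kind can be sketched as below. The exact form of Eq. (10) is not reproduced here: combining the Hadamard product with the difference of the two branch outputs, the `tanh` activation, and the names `siamese_score`, `W_shared`, and `w_out` are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def siamese_score(z_i, z_j, W_shared, w_out):
    """Score an ordered concept pair with a shared-weight two-branch head.

    Both inputs go through the same weights (the Siamese part). The pair
    feature concatenates the Hadamard product with the difference; the
    difference is an assumption added here so the score depends on the
    order of the pair.
    """
    r_i = np.tanh(z_i @ W_shared)
    r_j = np.tanh(z_j @ W_shared)
    pair = np.concatenate([r_i * r_j, r_i - r_j])
    return sigmoid(pair @ w_out)

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy loss for one predicted probability p and label y."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

z_a = np.array([1.0, 0.0])
z_b = np.array([0.0, 1.0])
W_shared = np.eye(2)
w_out = np.array([0.0, 0.0, 1.0, -1.0])
p_ab = siamese_score(z_a, z_b, W_shared, w_out)
p_ba = siamese_score(z_b, z_a, W_shared, w_out)
loss_ab = binary_cross_entropy(p_ab, 1)
```

With these toy weights, swapping the pair changes the score, which is the property a directed prerequisite predictor needs.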
4 Experiments
We used LectureBank Li et al. (2019), University Courses Liang et al. (2017), and ML of MOOCs Pan et al. (2017) to evaluate our method and compare it with the state-of-the-art methods, including binary classification models (SVM, LR, NB, RF) Pan et al. (2017), RefD Liang et al. (2015), GAE Li et al. (2019), VGAE Li et al. (2019), PREREQ Roy et al. (2019), CPRL Jia et al. (2021), and ConLearn Sun et al. (2022). We employed precision, recall, and F1-score as evaluation metrics.
For BERT, we leveraged a combination of course lecture information and Wikipedia for concept description extraction in the MOOC and University Courses datasets. For the LectureBank dataset, we utilized the text information from the Wikipedia URLs of each concept in the dataset. When extracting vectors from texts, the max token size of BERT was set to 256 for all three datasets.
For our method, the parameter $k$ of $k$-WL was set to 2. We used Adam as the optimizer with a learning rate of 0.00002 for all experiments. The batch size was set to 256 for the MOOC and LectureBank datasets and 512 for the University Courses dataset. The models were trained for 4000 epochs in all experiments, by which point the loss had stabilized. For the baseline methods, we used the default parameters of their original implementations.
For all three datasets, we selected 80% of the concept prerequisite pairs as the training set and 20% of the concept prerequisite pairs as the test set. Negative samples were generated by randomly selecting unrelated phrase pairs from the vocabulary, along with the reverse pairs of the original positive samples. The results are recorded in Table 1.
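The sampling and splitting procedure can be sketched as follows. This is an illustrative reading of the description above, not our experimental code: the function names, the fixed seed, and the choice to exclude reversed positives from the random candidates are assumptions.

```python
import random

def build_samples(positive_pairs, concepts, n_random, seed=0):
    """Label positives 1; negatives are reversed positives plus random pairs."""
    rng = random.Random(seed)
    pos = set(positive_pairs)
    # Reverse of each prerequisite pair (skipped if it is itself positive).
    negatives = {(b, a) for a, b in pos if (b, a) not in pos}
    # Random unrelated pairs: neither a positive nor a reversed positive.
    candidates = [
        (a, b) for a in concepts for b in concepts
        if a != b and (a, b) not in pos and (a, b) not in negatives
    ]
    rng.shuffle(candidates)
    negatives.update(candidates[:n_random])
    return [(p, 1) for p in sorted(pos)] + [(p, 0) for p in sorted(negatives)]

def train_test_split(samples, train_frac=0.8, seed=0):
    """Shuffle and split the labeled pairs into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

concepts = ["a", "b", "c", "d"]
positives = [("a", "b"), ("b", "c")]
samples = build_samples(positives, concepts, n_random=2)
train, test = train_test_split(samples)
```

Including the reversed pairs as negatives forces the model to learn the direction of a prerequisite relation rather than mere relatedness.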
Datasets | Metric | NB | SVM | LR | RF | RefD | GAE | VGAE | PREREQ | CPRL | ConLearn | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MOOCs | Precision | 0.577 | 0.668 | 0.748 | 0.375 | 0.784 | 0.293 | 0.266 | 0.448 | 0.800 | 0.895 | 0.915 |
MOOCs | Recall | 0.623 | 0.577 | 0.270 | 0.669 | 0.188 | 0.733 | 0.647 | 0.592 | 0.642 | 0.850 | 0.860 |
MOOCs | F1-score | 0.599 | 0.619 | 0.397 | 0.481 | 0.303 | 0.419 | 0.377 | 0.510 | 0.712 | 0.872 | 0.887 |
LectureBank | Precision | 0.670 | 0.857 | 0.744 | 0.855 | 0.666 | 0.462 | 0.417 | 0.590 | 0.861 | 0.831 | 0.857 |
LectureBank | Recall | 0.640 | 0.692 | 0.744 | 0.681 | 0.228 | 0.811 | 0.575 | 0.502 | 0.858 | 0.960 | 0.960 |
LectureBank | F1-score | 0.655 | 0.766 | 0.744 | 0.758 | 0.339 | 0.589 | 0.484 | 0.543 | 0.860 | 0.891 | 0.906 |
University Courses | Precision | 0.478 | 0.796 | 0.595 | 0.739 | 0.919 | 0.450 | 0.470 | 0.468 | 0.689 | 0.611 | 0.822 |
University Courses | Recall | 0.649 | 0.635 | 0.546 | 0.480 | 0.415 | 0.886 | 0.694 | 0.916 | 0.760 | 0.966 | 0.740 |
University Courses | F1-score | 0.550 | 0.707 | 0.569 | 0.582 | 0.572 | 0.597 | 0.560 | 0.597 | 0.723 | 0.749 | 0.778 |
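As a quick check of how the F1-score rows relate to the precision and recall rows, the harmonic-mean formula can be applied to the reported values. The helper name `f1_score` is defined here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall of our method as reported in Table 1.
f1_moocs = f1_score(0.915, 0.860)
f1_lecturebank = f1_score(0.857, 0.960)
print(round(f1_moocs, 3), round(f1_lecturebank, 3))
```

The two values round to 0.887 and 0.906, matching the F1-scores listed for our method on MOOCs and LectureBank.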
From Table 1, we can draw the following observations: the four binary classifiers, i.e., NB, SVM, LR, and RF, perform weakly on the three datasets because they rely on hand-crafted features, as does RefD; GAE, VGAE, and PREREQ exploit the prerequisite relation information between KCs, resulting in improved recall; the CPRL method performs better than these baselines but does not fully exploit the textual information behind the concepts; finally, both our algorithm and ConLearn extract prior information using the large-scale language model BERT and yield the best results. Importantly, compared to ConLearn, our method uses 2-WL to integrate the structural information of the graph more deeply, resulting in the best F1-score on all three datasets. These observations indicate that introducing the WL test into directed GNNs is effective for CPRP.
5 Conclusion
This paper proposes a directed graph neural network based on the Weisfeiler-Leman algorithm to address the CPRP problem. Our method leverages BERT for KC text embeddings and redefines the $k$-tuple in the directed KC graph. Then, the 2-WL test is used to train a permutation-equivariant GNN. With the KC representations from the GNN, a Siamese network computes the prediction probability of a KC link. Experiments on three datasets show that the proposed method achieves higher F1-scores than the state-of-the-art CPRP approaches. Our future work will consider more evaluation results and topological information on the graph.
References
- Cai et al. (2021) Lei Cai, Jundong Li, Jie Wang, and Shuiwang Ji. Line graph neural networks for link prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5103–5113, 2021.
- Chamberlain et al. (2022) Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M Bronstein, and Max Hansmire. Graph neural networks for link prediction with subgraph sketching. arXiv preprint arXiv:2209.15486, 2022.
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- Fischer et al. (2020) Christian Fischer, Zachary A Pardos, Ryan Shaun Baker, Joseph Jay Williams, Padhraic Smyth, Renzhe Yu, Stefan Slater, Rachel Baker, and Mark Warschauer. Mining big data in education: Affordances and challenges. Review of Research in Education, 44(1):130–160, 2020.
- Guan et al. (2023) Quanlong Guan, Fang Xiao, Xinghe Cheng, Liangda Fang, Ziliang Chen, Guanliang Chen, and Weiqi Luo. Kg4ex: An explainable knowledge graph-based approach for exercise recommendation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 597–607, 2023.
- Huang et al. (2022) Zhongyu Huang, Yingheng Wang, Chaozhuo Li, and Huiguang He. Going deeper into permutation-sensitive graph neural networks. In International Conference on Machine Learning, pages 9377–9409. PMLR, 2022.
- Jia et al. (2021) Chenghao Jia, Yongliang Shen, Yechun Tang, Lu Sun, and Weiming Lu. Heterogeneous graph neural networks for concept prerequisite relation learning in educational data. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2036–2047, 2021.
- Kipf and Welling (2016) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- Kumar et al. (2020) Ajay Kumar, Shashank Sheshar Singh, Kuldeep Singh, and Bhaskar Biswas. Link prediction techniques, applications, and performance: A survey. Physica A: Statistical Mechanics and its Applications, 553:124289, 2020.
- Le et al. (2023) Thanh Le, Ngoc Huynh, and Bac Le. Knowledge graph embedding by projection and rotation on hyperplanes for link prediction. Applied Intelligence, 53(9):10340–10364, 2023.
- Li et al. (2019) Irene Li, Alexander R Fabbri, Robert R Tung, and Dragomir R Radev. What should i learn first: Introducing lecturebank for nlp education and prerequisite chain learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6674–6681, 2019.
- Liang et al. (2015) Chen Liang, Zhaohui Wu, Wenyi Huang, and C Lee Giles. Measuring prerequisite relations among concepts. In Proceedings of the 2015 conference on empirical methods in natural language processing, pages 1668–1674, 2015.
- Liang et al. (2017) Chen Liang, Jianbo Ye, Zhaohui Wu, Bart Pursel, and C Giles. Recovering concept prerequisite relations from university course dependencies. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
- Liang et al. (2018) Chen Liang, Jianbo Ye, Shuting Wang, Bart Pursel, and C Lee Giles. Investigating active learning for concept prerequisite learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- Long et al. (2022) Yahui Long, Min Wu, Yong Liu, Yuan Fang, Chee Keong Kwoh, Jinmiao Chen, Jiawei Luo, and Xiaoli Li. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics, 38(8):2254–2262, 2022.
- Mazumder et al. (2023) Debjani Mazumder, Jiaul H Paik, and Anupam Basu. A graph neural network model for concept prerequisite relation extraction. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 1787–1796, 2023.
- Morris et al. (2019) Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4602–4609, 2019.
- Morris et al. (2021) Christopher Morris, Matthias Fey, and Nils M Kriege. The power of the weisfeiler-leman algorithm for machine learning with graphs. arXiv preprint arXiv:2105.05911, 2021.
- Morris et al. (2022) Christopher Morris, Gaurav Rattan, Sandra Kiefer, and Siamak Ravanbakhsh. Speqnets: Sparsity-aware permutation-equivariant graph networks. In International Conference on Machine Learning, pages 16017–16042. PMLR, 2022.
- Pan et al. (2017) Liangming Pan, Chengjiang Li, Juanzi Li, and Jie Tang. Prerequisite relation learning for concepts in moocs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1447–1456, 2017.
- Roy et al. (2019) Sudeshna Roy, Meghana Madhyastha, Sheril Lawrence, and Vaibhav Rajan. Inferring concept prerequisite relations from online educational resources. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 9589–9594, 2019.
- Salha et al. (2019) Guillaume Salha, Stratis Limnios, Romain Hennequin, Viet-Anh Tran, and Michalis Vazirgiannis. Gravity-inspired graph autoencoders for directed link prediction. In Proceedings of the 28th ACM international conference on information and knowledge management, pages 589–598, 2019.
- Shi et al. (2020) Daqian Shi, Ting Wang, Hao Xing, and Hao Xu. A learning path recommendation model based on a multidimensional knowledge graph framework for e-learning. Knowledge-Based Systems, 195:105618, 2020.
- Sun et al. (2022) Hao Sun, Yuntao Li, and Yan Zhang. Conlearn: contextual-knowledge-aware concept prerequisite relation learning with graph neural network. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), pages 118–126. SIAM, 2022.
- Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. Session-based recommendation with graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 346–353, 2019.
- Wu et al. (2020) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2020.
- Yan et al. (2021) Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, and Chao Chen. Link prediction with persistent homology: An interactive view. In International conference on machine learning, pages 11659–11669. PMLR, 2021.