Differentiable Reasoning about Knowledge Graphs
with Region-based Graph Neural Networks
Abstract
Methods for knowledge graph (KG) completion need to capture semantic regularities and use these regularities to infer plausible knowledge that is not explicitly stated. Most embedding-based methods are opaque in the kinds of regularities they can capture, although region-based KG embedding models have emerged as a more transparent alternative. By modeling relations as geometric regions in high-dimensional vector spaces, such models can explicitly capture semantic regularities in terms of the spatial arrangement of these regions. Unfortunately, existing region-based approaches are severely limited in the kinds of rules they can capture. We argue that this limitation arises because the considered regions are defined as the Cartesian product of two-dimensional regions. As an alternative, in this paper, we propose ReshufflE, a simple model based on ordering constraints that can faithfully capture a much larger class of rule bases than existing approaches. Moreover, the embeddings in our framework can be learned by a monotonic Graph Neural Network (GNN), which effectively acts as a differentiable rule base. This approach has the important advantage that embeddings can be easily updated as new knowledge is added to the KG. At the same time, since the resulting representations can be used similarly to standard KG embeddings, our approach is significantly more efficient than existing approaches to differentiable reasoning.
1 Introduction
Knowledge graph (KG) embedding models learn geometric representations of knowledge graphs, with the aim of capturing regularities in the available knowledge. These representations can then be used to infer plausible knowledge that is not explicitly stated in the KG. An important research question concerns the kinds of regularities that can be captured by different kinds of models. While standard approaches are often difficult to analyse from this perspective, region-based embedding models aim to make these regularities more explicit. Essentially, in such approaches, each entity $e$ is represented by an embedding $\mathbf{e} \in \mathbb{R}^n$ and each relation $r$ is represented by a geometric region $X_r \subseteq \mathbb{R}^{2n}$. We say that the triple $(e,r,f)$ is captured by the embedding iff $\mathbf{e} \oplus \mathbf{f} \in X_r$, where we write $\oplus$ for vector concatenation. In this way, we can naturally associate a KG with a given embedding. The advantage of region-based models is that we can similarly also associate a rule base with the embedding, where the rules reflect the spatial configuration of the regions $X_r$. However, not all rule bases can be captured in this way. As a simple example, models based on TransE (?) cannot distinguish between the rules and .
This particular limitation can be avoided by using more sophisticated region-based models (?; ?), but even these models remain limited in terms of which rule bases they can capture. The underlying limitation seems to be related to the fact that these models use regions which are the Cartesian product of two-dimensional regions, i.e. $X_r = X_r^1 \times \ldots \times X_r^n$, with $X_r^i \subseteq \mathbb{R}^2$. To check whether $(e,r,f)$ is captured, we then check whether $(e_i, f_i) \in X_r^i$ for each $i \in \{1,\ldots,n\}$, with $\mathbf{e} = (e_1,\ldots,e_n)$ and $\mathbf{f} = (f_1,\ldots,f_n)$. We will refer to such approaches as coordinate-wise models. Existing models thus primarily differ in how these two-dimensional regions are defined, e.g. ExpressivE (?) uses parallelograms for this purpose, while ? (?) used octagons. While it is, in principle, possible to use more flexible region-based representations, this typically leads to overfitting. In this paper, we go beyond coordinate-wise models but aim to avoid overfitting by otherwise keeping the model as simple as possible: we essentially learn regions $X_r$, which are defined in terms of ordering constraints of the form $e_i \leq f_j$.
Our main contributions are two-fold. First, we show that, despite its simplicity, the proposed model can capture a large class of rule bases, thus overcoming some of the limitations of existing region-based models. In fact, if we only consider consequences that can be inferred using a bounded number of inference steps, our model is capable of faithfully capturing arbitrary sets of closed path rules. Second, we show that knowledge graph embeddings in our framework can be learned using a monotonic Graph Neural Network (GNN) with randomly initialised node embeddings. This GNN effectively serves as a differentiable approximation of a rule base, acting on the initial representations of the entities to ensure that they capture the consequences that can be inferred from the KG. An important practical consequence is that our KG embeddings can be efficiently updated when new knowledge becomes available. Thus, our model is particularly well suited for KG completion in the inductive setting, where we need to predict links between entities that were not seen during training. Moreover, whereas existing inductive KG completion methods tend to be computationally expensive, e.g. by requiring one (?) or even many (?) forward passes of a GNN model for each query, our approach retains the advantage of KG embeddings, where the plausibility of a triple can be checked almost instantaneously.
2 Related Work
Region-based Models
Despite the vast amount of work on KG embedding models in the last decade, the reasoning abilities of most existing models are poorly understood. The main exception comes from a line of work that has focused on region-based representations (?; ?; ?; ?; ?; ?). Essentially, the region-based view makes explicit what triples and rules are captured by a given embedding. This allows us to study what kinds of semantic dependencies a given model is capable of capturing, which is important for ensuring that models have the right inductive bias, especially for settings where reasoning is important. Existing work has uncovered various limitations of existing models. For instance, ? (?) revealed that bilinear models such as RESCAL (?), DistMult (?), TuckER (?) and ComplEx (?) cannot capture relation hierarchies in a faithful way. They furthermore found that models that represent relations using convex regions have inherent limitations when it comes to modelling disjointness. However, such models were found to be capable of modelling arbitrary sets of closed path rules (and even more general classes of rule bases, involving existentials in the head and relations of different arity). In practice, learning arbitrary convex polytopes is not feasible in high-dimensional spaces. Practical region-based embedding models therefore focus on much simpler classes of regions, such as Cartesian products of boxes (?), cones (?; ?), parallelograms (?) and octagons (?). This makes the models easier to learn but limits the kinds of rules that they can capture. While the use of parallelograms and octagons makes it possible to capture arbitrary closed path rules, in practice we want to capture sets of such rules. This is only known to be possible under rather restrictive conditions (see Section 3).
Inductive KG Completion
Standard benchmarks for KG completion can only evaluate the reasoning abilities of models to a limited extent. For instance, BoxE (?) achieves strong results on these benchmarks, despite provably being incapable of modelling simple rules such as . In this paper, we will therefore instead focus on the problem of inductive KG completion (?). In the inductive setting, we need to predict links between entities that are different from those that were seen during training. In particular, there is no overlap between the entities that occur in the KG that was used for training and the one that is used for testing (although the relations are the same in both KGs). To perform this task, models need to learn semantic dependencies between the relations, and then exploit this knowledge when making predictions. This can be achieved in different ways. A natural strategy is to learn rules from the training KG, either explicitly using a model such as AnyBURL (?) or implicitly using differentiable rule learners such as Neural-LP (?) or DRUM (?). The latter essentially approximate rule applications using tensor multiplications. In practice, better results have been obtained using GNNs. For instance, some approaches (?) reduce the problem of link prediction to a graph classification problem. They first construct a subgraph containing paths connecting the head entity with some candidate tail entity, and then use a GNN to predict a score from this subgraph. Such approaches suffer from limited scalability, as answering a link prediction query requires constructing and processing such a subgraph for each candidate tail entity. NBFNet (?) alleviates this limitation, by using a single GNN that processes the entire graph. The resulting node embeddings can then be used to score the different candidate tail entities. 
However, the node embeddings are query-specific, meaning that this model still requires a new forward pass of the GNN for each query, which is considerably less efficient than using KG embeddings.
While we use a GNN for computing entity embeddings, once these embeddings have been learned, we can use them to answer arbitrary link prediction queries. Our method is thus considerably more efficient than the aforementioned GNN-based models for inductive KG completion. ReFactor GNN (?) similarly uses a GNN to learn entity embeddings, by simulating the training dynamic of traditional KG embedding methods such as TransE (?). However, their method has the disadvantage that all embeddings have to be recomputed when new triples are added to the KG. Moreover, their model inherits the limitations of traditional embedding models when it comes to faithfully modelling rules. Conceptually, our method has more in common with differentiable rule learning methods than with subgraph classification strategies. Indeed, each layer of the GNN updates the entity embeddings by essentially simulating the application of rules. Moreover, our model can simulate the deductive chaining of rules, which makes it fundamentally different from Neural-LP and DRUM, which focus on one-off rule application.
3 Problem Setting
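Let $\mathcal{R}$ be a set of relations, $\mathcal{E}$ a set of entities, and $\mathcal{G} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ a knowledge graph. Similar to standard KG embedding models, our aim is to learn a vector space representation $\mathbf{e} \in \mathbb{R}^n$ for every entity $e \in \mathcal{E}$ and a scoring function $s_r : \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ for every relation $r \in \mathcal{R}$ such that $s_r(\mathbf{e}, \mathbf{f})$ reflects the plausibility of the triple $(e,r,f)$. In the case of region-based models, the scoring function is defined in terms of a geometric region $X_r \subseteq \mathbb{R}^{2n}$. Specifically, the triple $(e,r,f)$ is then considered to be captured by the embedding iff $\mathbf{e} \oplus \mathbf{f} \in X_r$, where we write $\oplus$ to denote vector concatenation. Accordingly, the scoring function $s_r$ then reflects how close $\mathbf{e} \oplus \mathbf{f}$ is to the region $X_r$ (which is formalised in different ways by different models).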
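A key advantage of region-based models is that they offer a mechanism for modelling rules. Let us write $\eta$ to denote a given region-based embedding, i.e. $\eta(e)$ denotes the embedding of the entity $e$ and $\eta(r)$ denotes the region representing the relation $r$. Let us consider a rule of the following form:
$$r_1(X_1,X_2) \wedge r_2(X_2,X_3) \wedge \ldots \wedge r_k(X_k,X_{k+1}) \rightarrow r(X_1,X_{k+1}) \tag{1}$$
We say that $\eta$ captures this rule if for all vectors $\mathbf{x}_1,\ldots,\mathbf{x}_{k+1} \in \mathbb{R}^n$ we have:
$$(\mathbf{x}_1 \oplus \mathbf{x}_2 \in \eta(r_1)) \wedge \ldots \wedge (\mathbf{x}_k \oplus \mathbf{x}_{k+1} \in \eta(r_k)) \Rightarrow (\mathbf{x}_1 \oplus \mathbf{x}_{k+1} \in \eta(r)) \tag{2}$$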
Rules of the form (1) are known as closed path rules. Region-based embeddings can similarly capture other kinds of rules, such as intersection rules of the form $r_1(X,Y) \wedge r_2(X,Y) \rightarrow r(X,Y)$. However, we will specifically focus on closed path rules in this paper, due to their importance for KG completion. For instance, most rule-based methods for KG completion focus on learning rules of this type (?). Moreover, existing region-based models have particular limitations when it comes to capturing rules of this kind. Some approaches, such as BoxE (?), are not capable of capturing such rules at all. More recent approaches (?; ?) are capable of capturing individual closed path rules, but they are limited when it comes to jointly capturing a set of such rules.
Specifically, given a set of closed path rules , we ideally want an embedding that captures every rule in while not capturing any rules that are not entailed by . ? (?) showed this to be possible, provided that every rule entailed from is either a trivial rule such as or a rule of the form (1) in which are all distinct relations. For instance, rules of the form were not allowed in their construction. They also provided a counterexample, which shows that without this restriction, it is not always possible to faithfully capture rule bases with octagon embeddings (without also capturing rules that are not entailed by the given rule base). ? (?) did not study the problem of capturing sets of rules, but their model is likely to suffer from similar limitations.
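In the following, we write $\mathcal{K} \cup \mathcal{G} \models (e,r,f)$ to denote that the triple $(e,r,f)$ can be entailed from the rule base $\mathcal{K}$ and the knowledge graph $\mathcal{G}$. More precisely, we have $\mathcal{K} \cup \mathcal{G} \models (e,r,f)$ iff either $(e,r,f) \in \mathcal{G}$ or $\mathcal{K}$ contains a rule of the form (1) such that $\mathcal{K} \cup \mathcal{G} \models (e,r_1,e_2)$, $\mathcal{K} \cup \mathcal{G} \models (e_2,r_2,e_3)$, …, $\mathcal{K} \cup \mathcal{G} \models (e_k,r_k,f)$ for some entities $e_2,\ldots,e_k \in \mathcal{E}$. We furthermore write $\mathcal{K} \models \rho$ for a rule $\rho$ of the form (1) to denote that $\mathcal{K}$ entails $\rho$ w.r.t. the standard notion of entailment from propositional logic (when interpreting rules in terms of material implication). Note that while we consider both a knowledge graph and a rule base in our analysis, in practice only the knowledge graph is given. We study whether our model is capable of capturing the rule base because this is a necessary condition to allow it to learn semantic dependencies in the form of rules.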
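The entailment relation defined above can be computed by naive forward chaining until a fixpoint is reached. The following Python sketch is purely illustrative: the encoding of a closed path rule as a pair `(body, head)`, the function name `entailed_triples`, and the toy relations are assumptions introduced here and are not taken from the paper.

```python
def entailed_triples(kg, rules):
    """Return all triples derivable from `kg` using closed path rules.

    kg:    set of (head_entity, relation, tail_entity) triples
    rules: list of (body, head), where body is a tuple of relation names
           (r_1, ..., r_k) and head is the relation r of rule form (1)
    """
    derived = set(kg)
    changed = True
    while changed:
        changed = False
        # Index the triples derived so far: relation -> entity -> successors.
        succ = {}
        for e, r, f in derived:
            succ.setdefault(r, {}).setdefault(e, set()).add(f)
        for body, head in rules:
            # All (start, end) pairs connected by a path of type `body`.
            frontier = {(e, f)
                        for e, fs in succ.get(body[0], {}).items()
                        for f in fs}
            for r in body[1:]:
                frontier = {(e, g)
                            for e, f in frontier
                            for g in succ.get(r, {}).get(f, ())}
            for e, f in frontier:
                if (e, head, f) not in derived:
                    derived.add((e, head, f))
                    changed = True
    return derived


# Toy example with a hypothetical rule r1(X,Y) ∧ r2(Y,Z) → r1(X,Z).
kg = {("a", "r1", "b"), ("b", "r2", "c"), ("c", "r2", "d")}
rules = [(("r1", "r2"), "r1")]
closure = entailed_triples(kg, rules)
```

In this toy example the triple `("a", "r1", "d")` is only obtained by chaining two rule applications, which is the kind of deductive chaining the analysis is concerned with.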
4 Model Description
Our aim is to develop a model that can capture a larger class of rule bases than existing region-based models. Furthermore, we want the embeddings to be defined such that they can be efficiently updated whenever new knowledge becomes available.
Ordering Constraints
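The central idea is to rely on ordering constraints. Specifically, we model each relation $r$ using a region $X_r$ of the following form: $\mathbf{e} \oplus \mathbf{f} \in X_r$ iff
$$\forall i \in I_r \,.\, e_{\sigma_r(i)} \leq f_i \tag{3}$$
where $I_r \subseteq \{1,\ldots,n\}$, $\sigma_r : I_r \rightarrow \{1,\ldots,n\}$ and we assume $\mathbf{e} = (e_1,\ldots,e_n)$ and $\mathbf{f} = (f_1,\ldots,f_n)$. The following example illustrates why the use of ordering constraints is well-suited for modelling rules.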
Example 1.
Consider a rule of the form . This rule is captured by an embedding of the form (3) if for each we have that , and . Indeed, if these conditions are satisfied and we have and in , then for each we have the following constraint:
Since we assumed it follows that for every and thus that the embedding captures the triple .
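The chaining argument behind Example 1 can be written out as a short derivation. The following is a sketch under stated assumptions rather than a reproduction of the example: it assumes that the constraints in (3) have the form $e_{\sigma_r(i)} \leq f_i$ for $i \in I_r$, that the rule is the two-atom rule $r_1(X,Y) \wedge r_2(Y,Z) \rightarrow r(X,Z)$, and that the three side conditions in the first line hold; the symbols $I_r$ and $\sigma_r$ are notation chosen here.

```latex
% Assumed form of (3): e \oplus f \in X_r  iff  e_{\sigma_r(i)} \le f_i for all i \in I_r.
% Assumed rule: r_1(X,Y) \wedge r_2(Y,Z) \rightarrow r(X,Z).
\begin{align*}
&\text{Assume } I_r \subseteq I_{r_2}, \quad \sigma_{r_2}(i) \in I_{r_1}, \quad
  \sigma_r(i) = \sigma_{r_1}(\sigma_{r_2}(i)) \quad \text{for all } i \in I_r. \\
&(e, r_1, f) \text{ captured} \;\Rightarrow\; e_{\sigma_{r_1}(j)} \le f_j
  \quad \text{for all } j \in I_{r_1}, \\
&(f, r_2, g) \text{ captured} \;\Rightarrow\; f_{\sigma_{r_2}(i)} \le g_i
  \quad \text{for all } i \in I_{r_2}. \\
&\text{Taking } j = \sigma_{r_2}(i) \text{ for } i \in I_r: \quad
  e_{\sigma_r(i)} = e_{\sigma_{r_1}(\sigma_{r_2}(i))} \le f_{\sigma_{r_2}(i)} \le g_i .
\end{align*}
```

The last line is exactly the constraint required for $(e, r, g)$, so under these assumptions the ordering constraints of the body entail those of the head.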
We will come back to the analysis of how rules can be modelled using ordering constraints in the next section. We now turn our focus to how (a differentiable approximation of) the ordering constraints can be learned. Note that we can characterise (3) as follows:
$$\max(\mathbf{B}_r \mathbf{e}, \mathbf{f}) = \mathbf{f} \tag{4}$$
where the maximum is applied component-wise and the matrix $\mathbf{B}_r \in \mathbb{R}^{n \times n}$ is constrained such that (i) all components are either 0 or 1 and (ii) at most one component in each row is non-zero. This characterisation suggests how our embeddings can be learned using a GNN, as we explain next.
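As an illustration of this characterisation, the following NumPy sketch checks whether a pair of entity vectors satisfies the ordering constraints encoded by a binary matrix. It assumes that (4) reads $\max(\mathbf{B}_r \mathbf{e}, \mathbf{f}) = \mathbf{f}$ and that all coordinates are non-negative; the function name `captures` and the example matrix are hypothetical.

```python
import numpy as np


def captures(B_r, e, f):
    """True iff max(B_r e, f) == f component-wise (assumed form of (4))."""
    return bool(np.array_equal(np.maximum(B_r @ e, f), f))


# Hypothetical relation in 3 dimensions: row 1 selects e_3, row 2 selects e_1,
# row 3 is all zeros, i.e. the constraints are e_3 <= f_1 and e_1 <= f_2.
B_r = np.array([[0, 0, 1],
                [1, 0, 0],
                [0, 0, 0]])

e = np.array([1, 0, 0])
f_ok = np.array([0, 1, 1])   # satisfies e_3 <= f_1 and e_1 <= f_2
f_bad = np.array([0, 0, 1])  # violates e_1 <= f_2

ok = captures(B_r, e, f_ok)
bad = captures(B_r, e, f_bad)
```

An all-zero row imposes no constraint as long as coordinates are non-negative, which is why restricting each row to at most one non-zero entry corresponds to at most one ordering constraint per coordinate of the tail entity.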
Learning Embeddings with GNNs
Let us write $\mathbf{e}^{(l)}$ for the representation of entity $e$ in layer $l$ of the GNN. The embeddings $\mathbf{e}^{(0)}$ are initialised randomly, ensuring that all coordinates are non-negative, the coordinates of different entity embeddings are sampled independently, and there are at least two distinct values that have a non-zero probability of being sampled for each coordinate. We use a simple message-passing GNN of the following form:
$$\mathbf{f}^{(l+1)} = \max\big(\{\mathbf{f}^{(l)}\} \cup \{\mathbf{B}_r \mathbf{e}^{(l)} \mid (e,r,f) \in \mathcal{G}\}\big) \tag{5}$$
where the matrices $\mathbf{B}_r$ are constrained as before. Due to this constraint, the GNN converges after a finite number of steps to embeddings satisfying $\max(\mathbf{B}_r \mathbf{e}, \mathbf{f}) = \mathbf{f}$ for each $(e,r,f) \in \mathcal{G}$.
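A minimal sketch of such a monotonic message-passing scheme is given below. It assumes that the update (5) takes the component-wise maximum of an entity's current embedding and the messages $\mathbf{B}_r \mathbf{e}$ received along incoming triples $(e, r, f)$; the function name `run_gnn`, the `max_layers` safeguard, and the toy data are hypothetical.

```python
import numpy as np


def run_gnn(init, kg, B, max_layers=100):
    """Iterate a max-aggregation update until no embedding changes.

    init: dict entity -> non-negative initial vector
    kg:   list of (e, r, f) triples (iterated once per layer)
    B:    dict relation -> 0/1 matrix with at most one 1 per row
    """
    emb = {e: np.asarray(v, dtype=float) for e, v in init.items()}
    for _ in range(max_layers):
        new = {e: v.copy() for e, v in emb.items()}
        for e, r, f in kg:
            # The message from e is combined with f's state by a maximum,
            # so embeddings can only grow from layer to layer.
            new[f] = np.maximum(new[f], B[r] @ emb[e])
        if all(np.array_equal(new[e], emb[e]) for e in emb):
            break  # fixpoint reached
        emb = new
    return emb


# Toy example: relation r encodes the single constraint e_2 <= f_1.
B = {"r": np.array([[0, 1],
                    [0, 0]])}
kg = [("a", "r", "b"), ("b", "r", "c")]
init = {"a": [0, 1], "b": [0, 1], "c": [0, 0]}
final = run_gnn(init, kg, B)
```

Because each row of a relation matrix selects at most one coordinate, every value that appears in any layer is either 0 or one of the initial coordinates; together with monotonicity this is why the iteration stabilises after finitely many layers.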
Representing Entities as Matrices
Since the model relies on randomly initialised entity embeddings, the dimensionality of the entity embeddings needs to be sufficiently high. At the same time, the number of parameters that have to be learned for each relation should be sufficiently low to prevent overfitting. For this reason, we learn matrices of the following form:
(6) |
where we write for the Kronecker product, is the -dimensional identity matrix and is an matrix, with . To make the computation of the GNN updates more efficient, we can then represent each entity using a matrix and compute updates as follows:
(7) |
It is easy to verify that this model is equivalent to (5) when each matrix is constrained to be of the form (6). Specifically, the matrix corresponding to the entity embedding is defined as , with and . Note that a triple is then supported by the embeddings at layer if:
where denotes that .
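The equivalence between the vector and matrix formulations can be checked numerically. The sketch below assumes that (6) has the form $\mathbf{A} \otimes \mathbf{I}_k$ and that an entity matrix is flattened row by row into the entity vector; both are assumptions made here, as are the sizes `m` and `k` and the example matrix `A`.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 3, 4                      # hypothetical sizes, with n = m * k

# Small binary matrix with at most one non-zero entry per row.
A = np.zeros((m, m))
A[0, 2] = 1.0                    # row 0 copies row 2 of the entity matrix
A[1, 0] = 1.0                    # row 1 copies row 0; row 2 stays all-zero

B = np.kron(A, np.eye(k))        # (m*k) x (m*k) matrix, assumed form of (6)

Z = rng.integers(0, 2, size=(m, k)).astype(float)  # matrix view of an entity
e = Z.reshape(-1)                                  # row-wise flattening

vector_message = B @ e                  # message in the vector formulation
matrix_message = (A @ Z).reshape(-1)    # message in the matrix formulation
same = bool(np.allclose(vector_message, matrix_message))
```

Since the maximum is applied component-wise, it commutes with the reshaping, so the two formulations yield the same updates while the matrix form only stores and multiplies the small matrix.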
Model Details
To learn the matrix , we choose each row as the first coordinates of the vector , where are learnable parameters. Note that we need parameters for this softmax operation to allow for the possibility of some rows to be all 0s. Furthermore, note that while we conceptually think of as binary matrices, in practice, we need to approximate such matrices to make learning possible. To initialise the entity embeddings, we set each coordinate to 0 or 1, with 50% probability. To train the model, we use the following scoring function for a given triple :
where denotes the number of GNN layers. Note that reaches its maximal value of 0 iff . For each we add an inverse triple to . For each entity , we also add the triple to , which corresponds to the common practice of adding self-loops to the GNN. Following the literature (?; ?), ReshufflE’s training process uses negative sampling under the partial completeness assumption (PCA) (?), i.e., for each training triple , triples (negative samples) are created by replacing or in by randomly sampled entities . To train ReshufflE, we minimise the margin ranking loss, defined as follows:
(8) |
where is the ith negative sample and is a hyper-parameter, called the margin. At an intuitive level, the margin ranking loss pushes scores of true triples (i.e., those within the training graph) to be larger by at least than the scores of triples that are likely false (i.e., negative samples).
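The two ingredients described above, the softmax-based relaxation of the binary rows and the margin ranking loss, can be sketched as follows. This is an illustration under assumptions, not the reference implementation: it assumes that each row is obtained from one more logit than it has columns, that the last softmax output is discarded, and that the loss sums hinge terms over the negative samples; the function names and example values are hypothetical.

```python
import numpy as np


def soft_relation_matrix(logits):
    """Relax a binary matrix with at most one 1 per row.

    logits: array of shape (m, m + 1) holding the learnable parameters.
    A softmax is taken over each row and the last column is dropped, so a
    row is close to all-zero whenever its extra logit dominates.
    """
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p[:, :-1]


def margin_ranking_loss(pos_score, neg_scores, margin):
    """Sum over negatives of max(0, s(negative) - s(positive) + margin)."""
    neg_scores = np.asarray(neg_scores, dtype=float)
    return float(np.maximum(0.0, neg_scores - pos_score + margin).sum())


# Row 0 is pushed towards selecting column 0, row 1 towards being all-zero.
logits = np.array([[10.0, 0.0, 0.0],
                   [0.0, 0.0, 10.0]])
A_soft = soft_relation_matrix(logits)

# Scores are at most 0; only the second negative violates the margin.
loss = margin_ranking_loss(pos_score=-0.1, neg_scores=[-2.0, -0.5], margin=1.0)
```

Dropping one softmax output keeps every row sum at most 1, which mirrors the constraint that a row of the binary matrix contains at most one non-zero component.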
5 Constructing GNNs from Rule Graphs
Consider a finite set $\mathcal{K}$ of closed path rules of the form (1). We now study the following question: can parameters be found for the proposed GNN model (i.e. the matrices $\mathbf{B}_r$) such that the rules in $\mathcal{K}$ are captured, while no rules are captured that are not entailed by $\mathcal{K}$? Rather than constructing the matrices $\mathbf{B}_r$ directly, we first introduce the notion of a rule graph, which will serve as a convenient abstraction of the considered GNNs. We then explain how we can construct the matrices $\mathbf{B}_r$ from a given rule graph. Throughout this paper, we will assume that $\mathcal{G}$ contains the triple $(e, \textit{eq}, e)$ for every $e \in \mathcal{E}$. We will also assume that the relation eq does not appear in the rule base $\mathcal{K}$.
Rule Graphs
We will encode the rule base as a labelled multi-graph , i.e. a set of triples . Note that this graph is formally equivalent to a knowledge graph, but the nodes in this case do not correspond to entities. A path in from to is a sequence of triples of the form . The type of this path is given by the sequence of relations . The eq-reduced type of the path is obtained by removing all occurrences of the relation eq in . For instance, for a path of type , the eq-reduced type is .
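The notions of path type and eq-reduced type are simple to state operationally. The following sketch represents a path as a list of `(source, label, target)` triples; the helper names and the node and relation identifiers are hypothetical.

```python
def is_path(triples):
    """True iff consecutive triples share their end and start nodes."""
    return all(t1[2] == t2[0] for t1, t2 in zip(triples, triples[1:]))


def path_type(triples):
    """Sequence of edge labels along the path."""
    return [label for (_, label, _) in triples]


def eq_reduced_type(triples):
    """Path type with all occurrences of the relation eq removed."""
    return [label for label in path_type(triples) if label != "eq"]


path = [("n0", "r1", "n1"),
        ("n1", "eq", "n2"),
        ("n2", "eq", "n3"),
        ("n3", "r2", "n4")]

full = path_type(path)
reduced = eq_reduced_type(path)
```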
Definition 1.
A rule graph for a given rule base is a labelled multi-graph, where the labels are taken from , such that the following properties are satisfied:
- (R1) For every relation , there is some edge in labelled with .
- (R2) For every node in and every , it holds that has at most one incoming edge labelled with .
- (R3) Suppose there is an edge in with label from node to node . Suppose furthermore that . Then there is a path in from to whose eq-reduced type is .
- (R4) Suppose for every two nodes connected by an edge with label , there is a path connecting these two nodes whose eq-reduced type belongs to . Then there is some such that .
This definition reflects the fact that a rule is captured when the ordering constraints associated with its body entail the ordering constraints associated with its head, as was illustrated in Example 1. Specifically, this requirement is captured by condition (R3). Condition (R4) is needed to ensure that only the rules in are captured. Conditions (R1) and (R2) are needed because, in the construction we consider below, the nodes of the rule graph will correspond to the rows of the matrices . Condition (R1) will then ensure that contains at least one non-zero component for each relation , while (R2) will ensure that each row of has at most one non-zero component.
Example 2.
Constructing GNNs
Given a rule graph , we define the corresponding parameters of the GNN as follows. Specifically, we need to define the matrix for every . Each node from the rule graph is associated with one row of . Let be an enumeration of the nodes in the rule graph. The corresponding matrix is defined as:
Note that because of condition (R2), there will be at most one non-zero element in each row of , in accordance with the assumptions that we made in Section 4.
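The construction can be sketched in a few lines of NumPy. Since the defining formula is not reproduced here, the sketch relies on an assumption: the entry in row $i$ and column $j$ of the matrix for a label is 1 exactly when the rule graph has an edge with that label from the $j$-th node to the $i$-th node. This reading is chosen because it makes the link between (R2) and the "at most one non-zero element per row" property explicit. The function name and the toy graph are hypothetical.

```python
import numpy as np


def matrices_from_rule_graph(nodes, edges):
    """Build one 0/1 matrix per edge label from a rule graph.

    nodes: list of node identifiers (fixes the enumeration of rows)
    edges: iterable of (source, label, target) triples
    Assumed convention: M[i, j] = 1 iff there is an edge with this label
    from nodes[j] to nodes[i], so rows correspond to target nodes.
    """
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    mats = {}
    for src, label, tgt in edges:
        M = mats.setdefault(label, np.zeros((size, size), dtype=int))
        M[index[tgt], index[src]] = 1
    for label, M in mats.items():
        # (R2): each node has at most one incoming edge per label,
        # hence each row has at most one non-zero entry.
        if (M.sum(axis=1) > 1).any():
            raise ValueError(f"(R2) violated for label {label!r}")
    return mats


# Hypothetical rule graph with three nodes.
nodes = ["n0", "n1", "n2"]
edges = [("n0", "r1", "n1"), ("n1", "r2", "n2"), ("n0", "r3", "n2")]
mats = matrices_from_rule_graph(nodes, edges)
```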
The following result shows that the constructed GNN indeed captures all the rules from . Specifically, we show that the embeddings which are learned by the GNN (upon convergence) capture all triples that are entailed by .
Proposition 1.
Let be a rule base and a knowledge graph. Suppose . Let be a rule graph for and let be the entity representations that are learned by the corresponding GNN. Assume for every entity (). It holds that .
We also need to show that the GNN does not capture rules which are not entailed by . However, for any given triple there is always a chance that it is captured by the learned embeddings, even if , due to the fact that the entity embeddings are initialised randomly. However, by choosing to be sufficiently large, we can make the probability of this happening arbitrarily small.
Proposition 2.
Let be a rule base and a knowledge graph. Let be a rule graph for and let be the entity representations that are learned by the corresponding GNN. For any , there exists some such that, when , for any and such that , we have
6 Constructing Rule Graphs
An important question is whether it is always possible, given a set of closed path rules , to construct a corresponding rule graph satisfying conditions (R1)–(R4). For rule bases where a relation appearing in the head of a rule never appears in the body of any rule, this is clearly the case. The following example illustrates how rule graphs can sometimes be constructed for rule bases which encode cyclic dependencies between the relations.
Example 3.
However, there exist rule bases for which no valid rule graph can be found. This is illustrated in the next example.
Example 4.
Let contain the following rule:
To see why this rule base cannot be modelled using a rule graph, consider the following knowledge graph :
We have that only if the number of repetitions of at the start of the sequence matches the number of repetitions at the end. However, this requirement cannot be encoded using a rule graph.
The argument from the previous example can be formalised as follows. Let be a set of closed path rules. Let be the set of relations from that appear in the head of some rule in . For any , we can consider a context-free grammar with two types of production rules:
- For each rule of the form (1), there is a production rule .
- For each , there is a production rule .
The elements of are viewed as terminal symbols, those in are seen as non-terminal symbols, and is used as the starting symbol. Let us write for the corresponding language.
Proposition 3.
Let be a set of closed path rules and suppose that there exists a rule graph for . Let be the set of relations that appear in the head of some rule in . It holds that the language is regular for every .
This result shows that we cannot capture arbitrary rule bases using rule graphs. For instance, for the rule base from Example 4, we have , where we write for the string that consists of repetitions of . It is well-known that the language is not regular, hence it follows from Proposition 3 that no rule graph exists for this rule base. We address this issue in two different ways. First, in Section 6.1, we introduce a construction for a special class of rule bases, inspired by regular grammars. Second, in Section 6.2, we focus on the practically important setting of bounded inference: since GNNs use a fixed number of layers in practice, what mostly matters is what can be derived in a bounded number of steps. It turns out that if we only care about such inferences, we can capture arbitrary sets of closed path rules.
6.1 Left-Regular Rule Bases
We now introduce the notion of a left-regular rule base, which closely corresponds to the notion of left-regular grammar. As we will see, for left-regular rule bases we can always construct a valid rule graph. This, in turn, means that our model is capable of faithfully capturing such rule bases.
Definition 2.
Let be a rule base. Let be the set of relations that appear in the head of a rule from . We call left-regular if every rule is of the following form:
(9) |
such that .
Note that even though we only consider rules of the form (9) for the purpose of the construction below, rules with more than two atoms can straightforwardly be simulated by introducing fresh relations. Given a left-regular rule base , we construct the corresponding rule graph as follows.
1. We add the node .
2. For each relation , we add a node , and we connect to with an -edge.
3. For each rule of the form (9), we add an -edge from to .
4. For each node with multiple incoming -edges for some , we do the following. Let be the number of incoming -edges for node . Let . We create fresh nodes and add eq-edges from to (), where we define . Let be such that . Let be the nodes with an -link to ; then we have . For each we replace the edge from to by an edge from to .
We now illustrate the construction process with an example.
Example 5.
Let contain the following rules:
The corresponding rule graph is depicted in Figure 3. The nodes and were introduced in step 4 of the construction process. Before this step, there were -edges from to and from to . The node thus had three incoming -edges, which violates condition (R2). This is addressed through the use of eq edges in step 4.
Note that the rule graph may have loops, as illustrated next.
Example 6.
The proposed construction process clearly terminates after a finite number of steps. The following proposition shows that it constructs a valid rule graph for .
Proposition 4.
Let be a left-regular set of closed path rules and let be the graph obtained using the proposed construction method. It holds that satisfies (R1)–(R4).
6.2 Bounded Inference
In practice, the GNN can only carry out a finite number of inference steps. Rather than requiring that the resulting embeddings capture all triples that can be inferred from , it is natural to merely require that the result captures all triples that can be inferred using a bounded number of inference steps. As before, we assume that contains rules of the form (9), but we no longer require that . We know from Proposition 3 that it is then not always possible to construct a valid rule graph. To address this, we will weaken the notion of a rule graph, aiming to capture reasoning up to a fixed number of inference steps.
Let us write to denote that can be derived from in steps. More precisely:
- iff .
- , for , iff or there is a rule in and an entity such that and , with .
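Bounded derivability can be computed level by level. The sketch below rests on an assumed reading of the definition above: a triple is derivable in 0 steps iff it is in the knowledge graph, and derivable in $m > 0$ steps iff it is derivable in $m-1$ steps or it follows from a two-atom rule whose body atoms are derivable in $m_1$ and $m_2$ steps with $m = m_1 + m_2 + 1$. The encoding of rules as `((r1, r2), r)` and all names are hypothetical.

```python
def derivable_within(kg, rules, m):
    """Triples derivable in at most m steps (under the assumed reading).

    kg:    set of (e, r, f) triples
    rules: list of ((r1, r2), r) encoding r1(X,Y) ∧ r2(Y,Z) → r(X,Z)
    """
    levels = [set(kg)]                       # levels[j]: derivable in j steps
    for step in range(1, m + 1):
        current = set(levels[step - 1])      # anything derivable earlier
        for m1 in range(step):
            m2 = step - 1 - m1               # so that step = m1 + m2 + 1
            left, right = levels[m1], levels[m2]
            # Index the second body atom by (middle entity, relation).
            by_start = {}
            for g, r2, f in right:
                by_start.setdefault((g, r2), set()).add(f)
            for (r1, r2), head in rules:
                for e, rel, g in left:
                    if rel == r1:
                        for f in by_start.get((g, r2), ()):
                            current.add((e, head, f))
        levels.append(current)
    return levels[m]


# Toy example: a chain a -> b -> c -> d with a transitivity rule for r.
kg = {("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")}
rules = [(("r", "r"), "r")]
one_step = derivable_within(kg, rules, 1)
two_steps = derivable_within(kg, rules, 2)
```

In the toy example the triple `("a", "r", "d")` needs two steps, because one of its body atoms must itself be derived first.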
Definition 3.
Let . We call an -bounded rule graph for if satisfies conditions (R1)–(R3) as well as the following weakening of (R4):
- (R4m) Suppose for every two nodes connected by an edge with label , there is a path connecting these two nodes whose eq-reduced type belongs to , with . Then there is some such that .
Given an -bounded rule graph, we can construct a corresponding GNN in the same way as in Section 5. Moreover, Proposition 1 remains valid for -bounded rule graphs, as its proof does not depend on (R4). Proposition 2 can be weakened as follows.
Proposition 5.
Let be a rule base and a knowledge graph. Let be an -bounded rule graph for and let be the entity representations that are learned by the corresponding GNN. For any , there exists some such that, when , for any and such that , we have
Given a set of closed path rules we can construct an -bounded rule graph as follows.
1. We add the node .
2. For each relation , we add a node , and we connect to with an -edge.
3. We repeat the following until convergence. Let and assume there is an -edge from to . Let be a rule from and suppose that there is no path connecting and . Suppose furthermore that the edge is on some path from to a node , with whose length is at most . We add a fresh node to the rule graph, an -edge from to , and an -edge from to .
4. For each and -edge such that for some rule from there is no path connecting and , we do the following:
   (a) We add a fresh node , an -edge from to and an -edge from to .
   (b) We repeat the following until convergence. For each -edge from to and each rule from , we add an edge from to and an -loop to (if no such edges/loops exist yet).
   (c) We repeat the following until convergence. For each -edge from to and each rule from , we add an -loop to and an -edge from to (if no such edges/loops exist yet).
   (d) We repeat the following until convergence. For each -loop at , and each rule from , we add an -loop and an -loop to (if no such loops exist yet).
5. For each node with multiple incoming -edges for one or more relations from , we do the following. Let be the number of incoming -edges for node . Let . We create fresh nodes and add eq-edges from to (), where we define . Let be such that . Let be the nodes with an -link to ; then we have . For each we replace the edge from to by an edge from to .
We illustrate the construction process with two examples.
Example 7.
Let us consider the following set of rules:
The corresponding -bounded rule graph is shown in Fig. 5.
Example 8.
Let us consider the following set of rules:
The corresponding -bounded rule graph is shown in Fig. 6. Note how this graph is in fact also a rule graph: due to the fact that there are no cyclic dependencies in the rule base is equivalent with .
The construction process clearly terminates after a finite number of steps. Indeed, only edges that are on a path of length are expanded in step 3, and given that there are only finitely many such paths, step 3 must terminate. It is also straightforward to see that the other steps must terminate. As the following proposition shows, the proposed process indeed constructs an -bounded rule graph.
Proposition 6.
Let be a set of closed path rules and let be the graph obtained using the proposed construction method for -bounded rule graphs. It holds that satisfies (R1)–(R3) and (R4m).
Dataset | Version | Relations (train) | Entities (train) | Triples (train) | Relations (test) | Entities (test) | Triples (test) |
---|---|---|---|---|---|---|---|
FB15k-237 | v1 | 180 | 1594 | 5226 | 142 | 1093 | 2404 |
FB15k-237 | v2 | 200 | 2608 | 12085 | 172 | 1660 | 5092 |
FB15k-237 | v3 | 215 | 3668 | 22394 | 183 | 2501 | 9137 |
FB15k-237 | v4 | 219 | 4707 | 33916 | 200 | 3051 | 14554 |
WN18RR | v1 | 9 | 2746 | 6678 | 8 | 922 | 1991 |
WN18RR | v2 | 10 | 6954 | 18968 | 10 | 2757 | 4863 |
WN18RR | v3 | 11 | 12078 | 32150 | 11 | 5084 | 7470 |
WN18RR | v4 | 9 | 3861 | 9842 | 9 | 7084 | 15157 |
NELL-995 | v1 | 14 | 3103 | 5540 | 14 | 225 | 1034 |
NELL-995 | v2 | 88 | 2564 | 10109 | 79 | 2086 | 5521 |
NELL-995 | v3 | 142 | 4647 | 20117 | 122 | 3566 | 9668 |
NELL-995 | v4 | 76 | 2092 | 9289 | 61 | 2795 | 8520 |
Type | Model | FB15k-237 v1 | FB15k-237 v2 | FB15k-237 v3 | FB15k-237 v4 | WN18RR v1 | WN18RR v2 | WN18RR v3 | WN18RR v4 | NELL-995 v1 | NELL-995 v2 | NELL-995 v3 | NELL-995 v4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GNN | CoMPILE | 0.676 | 0.829 | 0.846 | 0.874 | 0.836 | 0.798 | 0.606 | 0.754 | 0.583 | 0.938 | 0.927 | 0.751 |
GNN | GraIL | 0.642 | 0.818 | 0.828 | 0.893 | 0.825 | 0.787 | 0.584 | 0.734 | 0.595 | 0.933 | 0.914 | 0.732 |
GNN | NBFNet | 0.845 | 0.949 | 0.946 | 0.947 | 0.946 | 0.897 | 0.904 | 0.889 | 0.644 | 0.953 | 0.967 | 0.928 |
Rule | RuleN | 0.498 | 0.778 | 0.877 | 0.856 | 0.809 | 0.782 | 0.534 | 0.716 | 0.535 | 0.818 | 0.773 | 0.614 |
Rule | AnyBURL | 0.604 | 0.823 | 0.847 | 0.849 | 0.867 | 0.828 | 0.656 | 0.796 | 0.683 | 0.835 | 0.798 | 0.652 |
Diff-R | DRUM | 0.529 | 0.587 | 0.529 | 0.559 | 0.744 | 0.689 | 0.462 | 0.671 | 0.194 | 0.786 | 0.827 | 0.806 |
Diff-R | Neural-LP | 0.529 | 0.589 | 0.529 | 0.559 | 0.744 | 0.689 | 0.462 | 0.671 | 0.408 | 0.787 | 0.827 | 0.806 |
 | ReshufflE | 0.747 | 0.885 | 0.903 | 0.918 | 0.710 | 0.729 | 0.602 | 0.694 | 0.638 | 0.861 | 0.882 | 0.812 |
7 Experimental Results
We now empirically evaluate the effectiveness of the proposed model. We focus on inductive KG completion, as the need to capture reasoning patterns is intuitively more important for this setting compared to the traditional (i.e. transductive) setting. Our model has significant practical advantages compared to the state-of-the-art models. For instance, by only comparing the learned embeddings at query time, it is significantly more efficient than approaches that use GNNs for evaluating queries. Moreover, by using a monotonic GNN, our embeddings can straightforwardly be updated when new knowledge becomes available. As such, our main interest is to see whether our model can be competitive in terms of link prediction performance rather than expecting it to improve the state-of-the-art in this respect.
Datasets
We evaluate ReshufflE on the three standard benchmarks for inductive knowledge graph completion (KGC) that were derived by ? (?) from the datasets: FB15k-237, WN18RR, and NELL-995. Each of these inductive benchmarks contains four different dataset variants, named v1 to v4, and each of these variants consists of two graphs (the training and testing graph) that are sampled from the original dataset as follows. The training graph was obtained by randomly sampling different numbers of entities and selecting their -hop neighbourhoods. Next, to construct a disjoint testing graph , the entities of were removed from the initial graph, and the same sampling procedure was repeated. Each of these graphs was split into a train set (), validation set (), and test set (). Thus, the three inductive benchmarks consist in total of twelve datasets: FB15k-237 v1-4, WN18RR v1-4, and NELL-995 v1-4. Furthermore, each of these datasets consists of six graphs: the train, validation, and test splits of and . Table 1 states the entity, relation, and triple counts of each graph. The supplementary materials provide additional information about these benchmarks, such as their origins and licenses.
Experimental Setup
Following ? (?), we train ReshufflE on the train split of , tune our model’s hyper-parameters on the validation split of , and finally evaluate the performance of the best model on the test split of . As discussed by ? (?), some approaches in the literature have been evaluated in different ways, e.g. by tuning hyper-parameters on the validation split of , and their reported results are thus not directly comparable. ReshufflE is trained on an NVIDIA Tesla V100 PCIe 32 GB GPU. We train ReshufflE for up to epochs, minimizing the margin ranking loss (see Equation 8) with the Adam optimiser (?). If the Hits@10 score on the validation split of does not increase by at least within epochs, we stop the training early. To account for small performance fluctuations, we repeat our experiments three times and report ReshufflE’s average performance (results for all seeds and the resulting standard deviations are provided in the supplementary materials). For the final evaluation, we select the hyper-parameter configuration with the highest Hits@10 score on the validation split of . In accordance with ? (?), we evaluate ReshufflE’s test performance on negatively sampled entities per triple of the test split of and report the Hits@10 scores. We list further details about the experimental setup in the supplementary materials. To facilitate ReshufflE’s reuse by the community, we will provide its source code in a public GitHub repository upon acceptance of our paper.
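The early-stopping criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the training loop used for the experiments: the function name `early_stopping_epoch` and the parameters `min_delta` and `patience` are hypothetical stand-ins for the improvement threshold and the epoch window mentioned in the text.

```python
from typing import List


def early_stopping_epoch(val_hits: List[float], min_delta: float, patience: int) -> int:
    """Return the 1-based epoch at which training stops.

    val_hits[i] is the validation Hits@10 after epoch i + 1. Training stops
    once `patience` consecutive epochs pass without the score improving on
    the best value so far by at least `min_delta` (both are assumptions).
    """
    best = float("-inf")
    epochs_without_gain = 0
    for epoch, score in enumerate(val_hits, start=1):
        if score >= best + min_delta:
            # Sufficient improvement: remember it and reset the counter.
            best = score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    # The epoch budget was exhausted without triggering early stopping.
    return len(val_hits)
```

Tracking the best score rather than the previous one avoids restarting the window after a temporary dip in validation performance.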
Baselines
As the analysis in Sections 5 and 6 reveals, our GNN model acts as a kind of differentiable rule base. We therefore compare ReshufflE to existing approaches for differentiable rule learning: Neural-LP (?) and DRUM (?). We also compare our method to two classical rule learning methods: RuleN (?) and AnyBURL (?). Finally, we include a comparison with GNN-based approaches: CoMPILE (?), GraIL (?), and NBFNet (?).
Table 3: Hits@10 of ReshufflE and its ablations on the inductive benchmarks.

Model | FB15k-237 v1 | FB15k-237 v2 | FB15k-237 v3 | FB15k-237 v4 | WN18RR v1 | WN18RR v2 | WN18RR v3 | WN18RR v4 | NELL-995 v1 | NELL-995 v2 | NELL-995 v3 | NELL-995 v4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ReshufflE2 | 0.304 | 0.569 | 0.385 | 0.916 | 0.293 | 0.309 | 0.155 | 0.270 | 0.488 | 0.558 | 0.334 | 0.370 |
ReshufflEnL | 0.744 | 0.890 | 0.903 | 0.917 | 0.698 | 0.685 | 0.618 | 0.682 | 0.627 | 0.738 | 0.886 | 0.815 |
ReshufflE | 0.747 | 0.885 | 0.903 | 0.918 | 0.710 | 0.729 | 0.602 | 0.694 | 0.638 | 0.861 | 0.882 | 0.812 |
Inductive KGC Results
Table 2 reports the performance of ReshufflE on the inductive benchmarks. The results of ReshufflE were obtained by us; AnyBURL and NBFNet results are from ? (?); Neural-LP, DRUM, RuleN, and GraIL results are from ? (?); and CoMPILE results are from ? (?). Table 2 reveals that ReshufflE outperforms the differentiable rule learners DRUM and Neural-LP on every dataset except WN18RR-v1, often by a significant margin. Compared to the traditional rule learners, ReshufflE performs clearly better on FB15k-237 and NELL-995 (apart from v1) but underperforms on WN18RR. ? (?) found that the kinds of rules which are needed for WN18RR are much noisier than those which are needed for FB15k-237 and NELL-995. Our use of ordering constraints may be less suitable in such cases. Finally, compared to the GNN-based methods, ReshufflE outperforms CoMPILE and GraIL on FB15k-237 and NELL-995 v1 and v4 while again (mostly) underperforming on WN18RR. ReshufflE furthermore consistently underperforms the state-of-the-art method NBFNet. Recall, however, that ReshufflE is significantly more efficient than such GNN-based approaches, as ReshufflE can score the plausibility of a given triple almost instantaneously.
Ablation Study
Finally, we empirically investigate ReshufflE’s components. We consider two variants for this study, namely: ReshufflEnL, which does not add a self-loop relation to the KG (i.e. triples of the form ); and ReshufflE2, which allows for more general matrices. In particular, different from ReshufflE, which applies the softmax function on the rows of (see Section 4), ReshufflE2 squares the matrices component-wise, thereby allowing them to contain arbitrary positive values. For a fair comparison, we train each of ReshufflE’s versions with the same hyper-parameter values, experimental setup, and evaluation protocol (see supplementary materials). Table 3 depicts the outcome of this study. It reveals that ReshufflE performs comparably to or better than ReshufflEnL and outperforms ReshufflE2 on all benchmarks, dramatically so on all but FB15k-237 v4. The similar performance of ReshufflE and ReshufflEnL on most datasets suggests that the self-loop relation only matters in specific cases, which may not occur frequently in some datasets. The poor performance of ReshufflE2 is as expected since allowing arbitrary positive parameters makes overfitting the training data more likely.
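To make the difference between the two parameterisations concrete, the following sketch contrasts a row-wise softmax with component-wise squaring. The matrix `raw` and the function names `softmax_rows` and `square_entries` are illustrative assumptions; the actual model layers are not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(raw: Matrix) -> Matrix:
    """Row-wise softmax: every row is positive and sums to 1."""
    out = []
    for row in raw:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def square_entries(raw: Matrix) -> Matrix:
    """Component-wise squaring: entries are non-negative but unbounded."""
    return [[x * x for x in row] for row in raw]


# Hypothetical unconstrained parameters.
raw = [[2.0, -1.0, 0.5],
       [0.0, 3.0, -2.0]]
constrained = softmax_rows(raw)
unconstrained = square_entries(raw)
```

The softmax variant keeps every row on the probability simplex, whereas the squared variant can scale rows arbitrarily, which is the extra freedom the ablation associates with overfitting.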
8 Conclusions
We have proposed a region-based knowledge graph embedding model that can faithfully capture rule bases. Specifically, we have shown that embeddings can be constructed that exactly capture the deductive closure of a rule base, provided that the rules are left-regular, a condition which is inspired by left-regular grammars. Furthermore, we have shown that for arbitrary sets of closed path rules, we can learn embeddings which faithfully capture consequences that can be inferred using a bounded number of steps. In this way, our approach goes significantly beyond existing region-based embedding models. An important design choice is that our entity embeddings are constructed using a monotonic GNN, which essentially acts as a differentiable representation of a rule base. We introduced the notion of the rule graph to make this connection between the GNN model and rule bases explicit. The monotonic nature of the GNN also has practical advantages, in particular, the fact that entity embeddings can easily and efficiently be updated when new knowledge becomes available. However, this approach is perhaps less suitable for cases where we need to weigh different pieces of weak evidence (as illustrated by the disappointing results on WN18RR). In such cases, when further evidence becomes available, we may want to revise earlier assumptions, which is not possible with the proposed model. Developing effective models that can provably simulate non-monotonic (or probabilistic) reasoning thus remains an important challenge for future work.
Appendix A Constructing GNNs from Rule Graphs
Let be a set of closed path rules and let be a corresponding rule graph, satisfying the conditions (R1)–(R4). We also assume that a knowledge graph is given. We show that the GNN, which is constructed based on , correctly simulates the rules from . For the proofs, it will be more convenient to characterise the GNN in terms of operations on the coordinates of entity embeddings. Specifically, let and let be the set of nodes from the rule graph which have an incoming edge labelled with . We define:
Let and let be the unique incoming edge with label . Then we define ():
Now let us define:
where if and otherwise. Let be the entity embedding corresponding to the matrix . In other words, if we write for the components of and for the components of , then we have . For a matrix , let us write for the vector that is obtained by concatenating the rows of . In particular, . The following lemma reveals how the GNN constructed from the rule graph can be characterised in terms of entity embeddings.
Lemma 1.
It holds that .
Proof.
Let us write , and . Let . Let us first assume that does not have any incoming edges in which are labelled with . In that case, row of consists only of 0s and we have . Similarly, we then also have for and thus . Now assume that there is an edge from to which is labelled with . Then we have that row of is a one-hot vector with 1 at position . Accordingly, we have for . We then have and thus . ∎
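The case analysis in this proof relies on a basic property of matrices whose rows are either all zeros or one-hot: multiplying such a matrix with an embedding matrix either zeroes out a row or copies another row into it. The sketch below illustrates only this property. The matrices `selector` and `embedding` are hypothetical, and treating the embedding as a matrix with one row per rule-graph node that is multiplied from the left is an assumption made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x d)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical matrix: row 0 is all zeros (no incoming edge with this label),
# row 1 is one-hot at position 0, row 2 is one-hot at position 1.
selector = [[0.0, 0.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0]]

# Hypothetical embedding, written with one row per rule-graph node.
embedding = [[0.3, 0.7],
             [0.1, 0.9],
             [0.5, 0.2]]

# Row 0 becomes zero, row 1 copies embedding[0], row 2 copies embedding[1].
shifted = matmul(selector, embedding)
```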
For a sequence of relations , we define as follows. We define , where (, ):
Note that if there is an path arriving at node in the rule graph, it has to be unique, given that each node has at most one incoming edge of a given type. In the following, we will also use , defined as follows:
We have the following result.
Lemma 2.
For we have
Proof.
It is sufficient to show
We have , with
We furthermore have with
Taking into account the definition of , we have only if there is an path from some node to the node , in which case we have . In other words, we have:
In other words, we have
We thus have . ∎
We also have the following result.
Lemma 3.
Suppose . There exist paths of type and and … and , all of whose eq-reduced type is , such that for every embedding we have:
Proof.
This follows immediately from the fact that whenever there is an -edge between two nodes and , there must also be a path between these nodes whose eq-reduced type is , because of condition (R3). ∎
The following result shows that the GNN will correctly predict all triples that can be inferred from .
Proposition 7.
Let be a rule base and a knowledge graph. Suppose . Let be a rule graph for and let be the entity representations that are learned by the corresponding GNN. Assume for every entity (). It holds that .
Proof.
Because of Lemma 1, it is sufficient to show that . If contains the triple then the result is trivially satisfied. Otherwise, implies that , for some such that contains triples , for some . Because , by construction, it holds for each that:
Similarly, because , we have and thus
In other words, we have
Continuing in the same way, we find that
Now consider a path of type whose eq-reduced type is . Then we have that contains triples of the form . Indeed, the only triples that need to be considered in addition to the triples are of the form , which we have assumed to belong to for every . For every path of type whose eq-reduced type is , we thus find entirely similarly to before that
Because of Lemma 3, this implies
In particular, we have
and because of the assumption that the GNN has converged after steps, we also have . ∎
For , let be the set of all paths in the knowledge graph which end in . For a path in , we write for the entity where the path starts and for the corresponding sequence of relations. For an entity , we write for its embedding in layer , i.e. . The following observation follows immediately from the construction of the GNN, together with Lemma 2.
Lemma 4.
For any entity it holds that
We will also need the following technical lemma.
Lemma 5.
Suppose . Then there is some such that:
• ; and
• whenever with , it holds that .
Proof.
Let us write . Note that iff node in has an incoming -edge. It thus follows from condition (R1) that . Suppose that for every , there was some with such that . Let us write . We then have that for every -edge in , there is a path connecting the same nodes, with . From Condition (R4), it then follows that , a contradiction. ∎
The following result shows that the GNN is unlikely to predict triples that cannot be inferred from , as long as the embeddings are sufficiently high-dimensional.
Proposition 8.
Let be a rule base and a knowledge graph. Let be a rule graph for and let be the entity representations that are learned by the corresponding GNN. For any , there exists some such that, when , for any and such that , we have
Proof.
First, note that because of Lemma 1, what we need to show is equivalent to:
Let be such that . From Lemma 5, we know that there is some such that and whenever with , it holds that . The following condition is clearly a necessary requirement for :
where we write for . We need in particular also that:
Due to Lemma 4 this is equivalent to requiring that for every we have:
We can view the coordinates of the input embeddings as random variables. The latter condition is thus equivalent to a condition of the following form:
where is the random variable corresponding to the th coordinate of , is the th coordinate of and are the random variables corresponding to the th coordinate of the vectors . By construction, we have that the coordinates of different entity embeddings are sampled independently and that there are at least two distinct values that have a non-zero probability of being sampled for each coordinate. This means that there exists some value such that and for each . Moreover, since we have that whenever with it holds that , it follows that the random variable is not among . We thus have:
The value of is upper bounded by , with the number of nodes in the rule graph. By choosing sufficiently large, we can thus make this probability arbitrarily small. In particular:
∎
Appendix B Constructing Rule Graphs
We write for the set of relations that appear in the head of some rule from the considered rule base, and for the remaining relations.
Proposition 9.
Let be a set of closed path rules and suppose that there exists a rule graph for . Let be the set of relations that appear in the head of some rule in . It holds that the language is regular for every .
Proof.
Let if and otherwise. We clearly have that iff entails the following rule:
Since we have assumed that has a rule graph, thanks to conditions (R3) and (R4), we can check whether this rule is valid by checking whether for each edge labelled with there is a path connecting the same nodes whose eq-reduced type is . Let be an edge labelled with . Then, we can construct a finite state machine (FSM) from by treating as the start node and as the unique final node and interpreting eq edges as ε-transitions (i.e. corresponding to the empty string). Clearly, this FSM will accept the string if there is a path labelled with connecting to . For each edge labelled with , we can construct such an FSM. Let be the languages associated with these FSMs. By construction, is the intersection of . Since are regular, it follows that is regular as well. ∎
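The FSM construction used in this proof can be illustrated with a small sketch. The edge-list encoding of the rule graph, the label `"eq"`, and the functions `eps_closure` and `accepts` are assumptions made for illustration; the example graph is hypothetical and is not derived from any rule base in the paper.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, label, target node)


def eps_closure(nodes: Set[str], edges: List[Edge]) -> Set[str]:
    """All nodes reachable from `nodes` via zero or more eq-edges."""
    closure = set(nodes)
    frontier = list(nodes)
    while frontier:
        current = frontier.pop()
        for src, label, dst in edges:
            if src == current and label == "eq" and dst not in closure:
                closure.add(dst)
                frontier.append(dst)
    return closure


def accepts(edges: List[Edge], start: str, final: str, word: List[str]) -> bool:
    """True iff some path from `start` to `final` has eq-reduced type `word`."""
    current = eps_closure({start}, edges)
    for relation in word:
        # Follow every edge with the current relation label, then eq-edges.
        step = {dst for src, label, dst in edges
                if src in current and label == relation}
        current = eps_closure(step, edges)
    return final in current


# Hypothetical rule graph fragment: n0 -r1-> n1 -eq-> n2 -r2-> n3.
edges = [("n0", "r1", "n1"), ("n1", "eq", "n2"), ("n2", "r2", "n3")]
```

Checking a candidate rule body against every edge with the head label, as in the proof, then amounts to requiring `accepts` to hold for each such pair of endpoints, which mirrors the intersection of the individual languages.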
B.1 Left Regular Rule Bases
We show that the graph resulting from the construction process satisfies the conditions (R1)-(R4). The fact that (R1) is satisfied follows from the following lemma.
Lemma 6.
Let be a left-regular set of closed path rules and let be the graph obtained using the proposed construction method. For every , it holds that contains an outgoing -edge from .
Proof.
Let . The edge from to is added in step 2 of the construction process. This edge may be removed in step 4, but in that case, a new -edge is added from to a fresh node. ∎
The fact that (R2) is satisfied follows immediately from the construction in step 4. We now move to condition (R3).
Lemma 7.
Let be a left-regular set of closed path rules and let be the graph obtained using the proposed construction method. If contains the rule , then whenever two nodes and are connected in by a path whose eq-reduced type is , there is some node such that and are connected by a path whose eq-reduced type is and and are connected by a path whose eq-reduced type is .
Proof.
The stated assertion clearly holds after step 3 of the construction method. Indeed, the only -edge in is from to . Note in particular that no edges can be added in step 3, given our assumption that is left-regular. Finally, it is also easy to see that this property remains satisfied after step 4. ∎
The next lemma shows that (R3) is satisfied.
Lemma 8.
Let be a left-regular set of closed path rules and let be the graph obtained using the proposed construction method. Suppose nodes and are connected with an edge of type and suppose . Then there is a path whose eq-reduced type is from to .
Proof.
Assume . Let and be nodes connected by an edge of type . We show the result by structural induction. First, suppose . In this case, the considered rule is of the form . It then follows from Lemma 7 that there is a path whose eq-reduced type is connecting and . Let us now consider the inductive case. If then is derived from at least two rules in (given that the rules in were restricted to have only two atoms in the body). The last step of the derivation of this rule is done by selecting some rule from such that
If there is a path from to whose eq-reduced type is , we know from Lemma 7 that there must be a path from to with eq-reduced type -edge and a path from to with eq-reduced type , for some node in . By induction, we furthermore know that there must then be a path with eq-reduced type from to and a path with eq-reduced type from to . Thus, we find that there must be a path with eq-reduced type from to . ∎
The fact that (R4) is satisfied follows from the next lemma.
Lemma 9.
Let be a left-regular set of closed path rules and let be the graph obtained using the proposed construction method. Suppose there is a path in from to whose eq-reduced type is . Then it holds that .
Proof.
The result clearly holds after step 2. We show that the result remains valid after each iteration of step 3. Suppose in step 3 we add an -edge between and . This means that:
Let be a path from to . If does not contain the new -edge, then the fact that the result is valid for follows by induction. Now, suppose that contains the new edge. Then is of the form . By induction we have:
Clearly there is a path from to with eq-reduced type . In particular, there is a path from to with eq-reduced type . By induction, we thus have:
Together we find that the stated result is satisfied.
Finally, we need to show that the result remains satisfied after step 4. This is clearly the case, as this step replaces edges of type with paths of type . The eq-reduced types of the paths from to thus remain unchanged after this step. ∎
Proposition 10.
Let be a left-regular set of closed path rules and let be the graph obtained using the proposed construction method. It holds that satisfies (R1)–(R4).
B.2 Bounded Inference
Let be the set of all paths in of length at most that end in .
Lemma 10.
For any entity it holds that
Proof.
This follows immediately from the construction of the GNN. ∎
Lemma 11.
Let be the number of nodes in the given -bounded rule graph. Suppose . Then there is some such that:
• ; and
• whenever with , it holds that .
Proof.
This lemma is shown in exactly the same way as Lemma 5, simply replacing by and replacing Condition (R4) by Condition (R4m). ∎
Proposition 11.
Let be a rule base and a knowledge graph. Let be an -bounded rule graph for and let be the entity representations that are learned by the corresponding GNN. For any , there exists some such that, when , for any and such that , we have
Proof.
Let us now show the correctness of the proposed process for constructing -bounded rule graphs. Conditions (R1) and (R2) are clearly satisfied. Next, we show that condition (R3) is satisfied.
Lemma 12.
Let be a set of closed path rules and let be the resulting -bounded rule graph, constructed using the proposed process. Suppose nodes and are connected with an edge of type and suppose . Then there is a path connecting to , whose eq-reduced type is .
Proof.
First, we show that at the end of step 4, there must be a path of type connecting and . By construction, we immediately have that whenever two nodes are connected with an -edge and contains the rule it holds that there exists some node such that there is an -edge from to and an edge from to . The existence of a path of type then follows in the same way as in the proof of Lemma 8. It remains to be shown that the proposition remains valid after step 5. However, the paths in the final graph are those that can be found in the graph after step 4, with the possible addition of some eq-edges. This means in particular that after step 5, there must still be a path from to whose eq-reduced type is . ∎
Finally, the fact that (R4m) is satisfied follows from the following lemma.
Lemma 13.
Let be a set of closed path rules, and let be the resulting -bounded rule graph, constructed using the process outlined above. Suppose there is a path from to whose eq-reduced type is , with . Then it holds that .
Proof.
We clearly have that the proposition holds after step 3 of the construction method. After step 3, if there is an -link between nodes and and a rule such that and are not connected by an path, it must be the case that any path from to some node which contains the edge must have a length of at least . It follows that any path from to some node which contains an edge that was added during step 4 must have length at least . We thus have in particular that the proposition still holds after step 4. The paths in the final graph are those that can be found in the graph after step 4, with the possible addition of some eq-edges. Since the proposition only depends on the eq-reduced types of the paths, the result still holds after step 5. ∎
Together, we have shown the following result.
Proposition 12.
Let be a set of closed path rules and let be the graph obtained using the proposed construction method for -bounded rule graphs. It holds that satisfies (R1)–(R3) and (R4m).
Appendix C Experimental Details
This section lists additional details about our experiment’s setup, benchmark datasets, and evaluation protocol. Section C.1 lists ReshufflE’s implementation details. The origins and licenses of the standard benchmarks for inductive KGC are discussed in Section C.2. Details on ReshufflE’s hyper-parameter optimisation are discussed in Section C.3. Finally, details about the evaluation protocol, together with the complete evaluation results, are provided in Section C.4.
C.1 Implementation Details
ReshufflE was implemented using the Python library PyKEEN 1.10.1 (?). PyKEEN is released under the MIT license and offers numerous benchmarks for KGC, which facilitates the reuse of ReshufflE’s code in future applications and comparisons. Upon acceptance of our paper, we will provide ReshufflE’s source code in a public GitHub repository to further facilitate its reuse by the community.
C.2 Benchmarks: Origins and Licenses
We did not find a license for any of the three inductive benchmarks nor their corresponding transductive supersets. Furthermore, WN18RR is a subset of the WordNet database (?), which states lexical relations of English words. We also did not find a license for this dataset. FB15k-237 is a subset of FB15k (?), which is a subset of Freebase (?), a collaborative database that contains general knowledge, such as about celebrities and awards, in English. We did not find a license for FB15k-237 but found that FB15k (?) uses the CC BY 2.5 license. Finally, NELL-995 (?) is a subset of NELL (?), a dataset that was extracted from semi-structured and natural-language data on the web and that includes information about e.g., cities, companies, and sports teams. Also for NELL, we did not find any license information.
C.3 Hyper-Parameter Optimisation
Table 4: Best hyper-parameters of ReshufflE on each inductive benchmark.

Dataset | Version | #Layers | Dim. parameter 1 | Dim. parameter 2 | Margin | lr |
---|---|---|---|---|---|---|
FB15k-237 | v1 | 4 | 25 | 80 | 2.0 | 0.005 |
FB15k-237 | v2 | 3 | 30 | 60 | 1.0 | 0.005 |
FB15k-237 | v3 | 5 | 25 | 40 | 0.5 | 0.005 |
FB15k-237 | v4 | 3 | 30 | 80 | 1.0 | 0.01 |
WN18RR | v1 | 3 | 20 | 40 | 1.0 | 0.01 |
WN18RR | v2 | 3 | 20 | 60 | 0.5 | 0.01 |
WN18RR | v3 | 3 | 20 | 40 | 1.0 | 0.01 |
WN18RR | v4 | 3 | 30 | 80 | 1.0 | 0.01 |
NELL-995 | v1 | 3 | 20 | 80 | 2.0 | 0.005 |
NELL-995 | v2 | 4 | 30 | 60 | 2.0 | 0.01 |
NELL-995 | v3 | 4 | 25 | 40 | 0.5 | 0.01 |
NELL-995 | v4 | 4 | 30 | 60 | 1.0 | 0.01 |
Table 5: Hits@10 of ReshufflE for each seed, with means and standard deviations.

Run | FB15k-237 v1 | FB15k-237 v2 | FB15k-237 v3 | FB15k-237 v4 | WN18RR v1 | WN18RR v2 | WN18RR v3 | WN18RR v4 | NELL-995 v1 | NELL-995 v2 | NELL-995 v3 | NELL-995 v4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Seed 1 | 0.751 | 0.879 | 0.905 | 0.918 | 0.713 | 0.727 | 0.614 | 0.693 | 0.630 | 0.874 | 0.871 | 0.816 |
Seed 2 | 0.744 | 0.892 | 0.908 | 0.916 | 0.707 | 0.726 | 0.574 | 0.690 | 0.650 | 0.860 | 0.893 | 0.808 |
Seed 3 | 0.746 | 0.883 | 0.897 | 0.918 | 0.710 | 0.736 | 0.617 | 0.698 | 0.635 | 0.848 | 0.881 | 0.812 |
mean | 0.747 | 0.885 | 0.903 | 0.918 | 0.710 | 0.729 | 0.602 | 0.694 | 0.638 | 0.861 | 0.882 | 0.812 |
stdv | 0.004 | 0.007 | 0.005 | 0.001 | 0.003 | 0.006 | 0.024 | 0.004 | 0.010 | 0.013 | 0.011 | 0.004 |
Following ? (?), we manually tune ReshufflE’s hyper-parameters on the validation split of . We use the following ranges for the hyper-parameters: the number of ReshufflE’s layers , the embedding dimensionality parameters and , the loss margin , and finally the learning rate . We use the same batch and negative sampling size for all runs. In particular, we set the batch size to and the negative sampling size to . Table 4 reports the best hyper-parameters of ReshufflE for each inductive benchmark. Finally, we reuse the same hyper-parameters for each of ReshufflE’s ablations, namely, ReshufflEnL and ReshufflE2.
C.4 Evaluation Protocol and Complete Results
Following the standard evaluation protocol for inductive KGC, introduced by ? (?), we evaluate ReshufflE’s final performance on the test split of the testing graph by measuring the ranking quality of any test triple over randomly sampled entities and : and for all . Following ? (?), we report the Hits@10 metric, i.e., the proportion of true triples (those within the test split of the testing graph) whose rank among the predicted triples is at most 10.
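The ranking-based metric described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the experiments: the function names, the convention that higher scores indicate more plausible triples, and the tie-breaking in favour of the true triple are all assumptions.

```python
from typing import List, Sequence


def rank_of_true(true_score: float, negative_scores: Sequence[float]) -> int:
    """1-based rank of the true triple among its sampled negatives.

    Assumes higher scores mean more plausible triples; ties are resolved
    in favour of the true triple (an assumption, not a documented choice).
    """
    return 1 + sum(score > true_score for score in negative_scores)


def hits_at_k(true_scores: List[float],
              negatives_per_triple: List[Sequence[float]],
              k: int = 10) -> float:
    """Fraction of test triples whose rank is at most k."""
    hits = sum(rank_of_true(t, negs) <= k
               for t, negs in zip(true_scores, negatives_per_triple))
    return hits / len(true_scores)
```

For example, with `k=2`, a true triple that is outscored by one negative still counts as a hit, while one outscored by three negatives does not.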
Table 5 states ReshufflE’s benchmark results over all inductive datasets, as well as their means and standard deviations.
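The summary rows of Table 5 can be recomputed from the per-seed scores. The sketch below does so for two columns using Python's `statistics` module. Using the sample standard deviation (`stdev`) is an assumption; it reproduces the reported values for these two columns, but the estimator is not stated in the text.

```python
from statistics import mean, stdev

# Per-seed Hits@10 scores copied from Table 5.
seed_scores = {
    "FB15k-237 v1": [0.751, 0.744, 0.746],
    "WN18RR v3": [0.614, 0.574, 0.617],
}

# (mean, sample standard deviation), rounded to three decimals as in the table.
summary = {
    name: (round(mean(scores), 3), round(stdev(scores), 3))
    for name, scores in seed_scores.items()
}
```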
References
- 2020 Abboud, R.; Ceylan, İ. İ.; Lukasiewicz, T.; and Salvatori, T. 2020. BoxE: A box embedding model for knowledge base completion. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
- 2021 Ali, M.; Berrendorf, M.; Hoyt, C. T.; Vermue, L.; Sharifzadeh, S.; Tresp, V.; and Lehmann, J. 2021. PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings. Journal of Machine Learning Research 22(82):1–6.
- 2023 Anil, A.; Gutiérrez-Basulto, V.; Ibáñez-García, Y.; and Schockaert, S. 2023. Inductive knowledge graph completion with gnns and rules: An analysis. CoRR abs/2308.07942.
- 2019 Balazevic, I.; Allen, C.; and Hospedales, T. M. 2019. TuckER: Tensor factorization for knowledge graph completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, 5184–5193. Association for Computational Linguistics.
- 2013 Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, 2787–2795.
- 2010 Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Jr., E. R. H.; and Mitchell, T. M. 2010. Toward an architecture for never-ending language learning. In Fox, M., and Poole, D., eds., Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010, 1306–1313. AAAI Press.
- 2024 Charpenay, V., and Schockaert, S. 2024. Capturing knowledge graphs and rules with octagon embeddings. CoRR abs/2401.16270.
- 2022 Chen, Y.; Mishra, P.; Franceschi, L.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2022. Refactor gnns: Revisiting factorisation-based models from a message-passing perspective. In Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; and Oh, A., eds., Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
- 2013 Galárraga, L. A.; Teflioudi, C.; Hose, K.; and Suchanek, F. M. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Schwabe, D.; Almeida, V. A. F.; Glaser, H.; Baeza-Yates, R.; and Moon, S. B., eds., 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, 413–422. International World Wide Web Conferences Steering Committee / ACM.
- 2018 Gutiérrez-Basulto, V., and Schockaert, S. 2018. From knowledge graph embedding to ontology embedding? an analysis of the compatibility between vector space representations and rules. In Thielscher, M.; Toni, F.; and Wolter, F., eds., Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR 2018, Tempe, Arizona, 30 October - 2 November 2018, 379–388. AAAI Press.
- 2015 Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
- 2022 Leemhuis, M.; Özçep, Ö. L.; and Wolter, D. 2022. Learning with cone-based geometric models and orthologics. Ann. Math. Artif. Intell. 90(11-12):1159–1195.
- 2021 Mai, S.; Zheng, S.; Yang, Y.; and Hu, H. 2021. Communicative message passing for inductive relation reasoning. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, 4294–4302. AAAI Press.
- 2018 Meilicke, C.; Fink, M.; Wang, Y.; Ruffinelli, D.; Gemulla, R.; and Stuckenschmidt, H. 2018. Fine-grained evaluation of rule- and embedding-based systems for knowledge graph completion. In Vrandecic, D.; Bontcheva, K.; Suárez-Figueroa, M. C.; Presutti, V.; Celino, I.; Sabou, M.; Kaffee, L.; and Simperl, E., eds., The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I, volume 11136 of Lecture Notes in Computer Science, 3–20. Springer.
- 2019 Meilicke, C.; Chekol, M. W.; Ruffinelli, D.; and Stuckenschmidt, H. 2019. Anytime bottom-up rule learning for knowledge graph completion. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, 3137–3143. ijcai.org.
- 1995 Miller, G. A. 1995. WordNet: A lexical database for English. Commun. ACM 38(11):39–41.
- 2011 Nickel, M.; Tresp, V.; and Kriegel, H. 2011. A three-way model for collective learning on multi-relational data. In Getoor, L., and Scheffer, T., eds., Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, 809–816. Omnipress.
- 2023 Pavlovic, A., and Sallinger, E. 2023. ExpressivE: A spatio-functional embedding for knowledge graph completion. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net.
- 2019 Sadeghian, A.; Armandpour, M.; Ding, P.; and Wang, D. Z. 2019. DRUM: end-to-end differentiable rule mining on knowledge graphs. In Wallach, H. M.; Larochelle, H.; Beygelzimer, A.; d’Alché-Buc, F.; Fox, E. B.; and Garnett, R., eds., Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 15321–15331.
- 2020 Teru, K. K.; Denis, E. G.; and Hamilton, W. L. 2020. Inductive relation prediction by subgraph reasoning. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, 9448–9457. PMLR.
- 2015 Toutanova, K., and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, CVSC 2015, Beijing, China, July 26-31, 2015, 57–66. Association for Computational Linguistics.
- 2016 Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In Balcan, M., and Weinberger, K. Q., eds., Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, 2071–2080. JMLR.org.
- 2017 Xiong, W.; Hoang, T.; and Wang, W. Y. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. In Palmer, M.; Hwa, R.; and Riedel, S., eds., Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, 564–573. Association for Computational Linguistics.
- 2015 Yang, B.; Yih, W.; He, X.; Gao, J.; and Deng, L. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Bengio, Y., and LeCun, Y., eds., 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
- 2017 Yang, F.; Yang, Z.; and Cohen, W. W. 2017. Differentiable learning of logical rules for knowledge base reasoning. In Guyon, I.; von Luxburg, U.; Bengio, S.; Wallach, H. M.; Fergus, R.; Vishwanathan, S. V. N.; and Garnett, R., eds., Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2319–2328.
- 2021 Zhang, Z.; Wang, J.; Chen, J.; Ji, S.; and Wu, F. 2021. ConE: Cone embeddings for multi-hop reasoning over knowledge graphs. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y. N.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 19172–19183.
- 2021 Zhu, Z.; Zhang, Z.; Xhonneux, L. A. C.; and Tang, J. 2021. Neural Bellman-Ford networks: A general graph neural network framework for link prediction. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 29476–29490.