Differentiable Reasoning about Knowledge Graphs
with Region-based Graph Neural Networks

Aleksandar Pavlović¹    Emanuel Sallinger¹    Steven Schockaert²
¹TU Wien, Vienna, Austria
²Cardiff University, Cardiff, United Kingdom
{aleksandar.pavlovic, emanuel.sallinger}@tuwien.ac.at, [email protected]
Abstract

Methods for knowledge graph (KG) completion need to capture semantic regularities and use these regularities to infer plausible knowledge that is not explicitly stated. Most embedding-based methods are opaque in the kinds of regularities they can capture, although region-based KG embedding models have emerged as a more transparent alternative. By modeling relations as geometric regions in high-dimensional vector spaces, such models can explicitly capture semantic regularities in terms of the spatial arrangement of these regions. Unfortunately, existing region-based approaches are severely limited in the kinds of rules they can capture. We argue that this limitation arises because the considered regions are defined as the Cartesian product of two-dimensional regions. As an alternative, in this paper, we propose ReshufflE, a simple model based on ordering constraints that can faithfully capture a much larger class of rule bases than existing approaches. Moreover, the embeddings in our framework can be learned by a monotonic Graph Neural Network (GNN), which effectively acts as a differentiable rule base. This approach has the important advantage that embeddings can be easily updated as new knowledge is added to the KG. At the same time, since the resulting representations can be used similarly to standard KG embeddings, our approach is significantly more efficient than existing approaches to differentiable reasoning.

1 Introduction

Knowledge graph (KG) embedding models learn geometric representations of knowledge graphs, with the aim of capturing regularities in the available knowledge. These representations can then be used to infer plausible knowledge that is not explicitly stated in the KG. An important research question is concerned with the kinds of regularities that can be captured by different kinds of models. While standard approaches are often difficult to analyse from this perspective, region-based embedding models aim to make these regularities more explicit. Essentially, in such approaches, each entity $e$ is represented by an embedding $\mathbf{e}\in\mathbb{R}^{d}$ and each relation $r$ is represented by a geometric region $X_{r}\subseteq\mathbb{R}^{2d}$. We say that the triple $(e,r,f)$ is captured by the embedding iff $\mathbf{e}\oplus\mathbf{f}\in X_{r}$, where we write $\oplus$ for vector concatenation. In this way, we can naturally associate a KG with a given embedding. The advantage of region-based models is that we can similarly also associate a rule base with the embedding, where the rules reflect the spatial configuration of the regions $X_{r}$. However, not all rule bases can be captured in this way. As a simple example, models based on TransE (?) cannot distinguish between the rules $r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r_{3}(X,Z)$ and $r_{2}(X,Y)\wedge r_{1}(Y,Z)\rightarrow r_{3}(X,Z)$.
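One way to see why the two rules collapse for TransE is that vector addition commutes. The sketch below assumes the idealised reading in which a triple $(e,r,f)$ holds exactly when $\mathbf{e}+\mathbf{r}=\mathbf{f}$; this is a simplification made here for illustration, and the helper `translate` and the concrete vectors are hypothetical.

```python
# Idealised TransE (an assumption for illustration): (e, r, f) holds iff e + r == f.
def translate(e, r):
    """Apply the translation vector r to the entity vector e."""
    return tuple(a + b for a, b in zip(e, r))

r1 = (1, 0)
r2 = (0, 2)
r3 = translate(r1, r2)  # chosen so that following r1 and then r2 matches r3

e = (5, 7)
via_r1_then_r2 = translate(translate(e, r1), r2)
via_r2_then_r1 = translate(translate(e, r2), r1)

# Both orders reach e + r3, so an embedding that captures r1 ∧ r2 → r3
# also captures r2 ∧ r1 → r3 under this reading.
print(via_r1_then_r2 == translate(e, r3))
print(via_r2_then_r1 == translate(e, r3))
```

Integer coordinates are used so that the equality checks are exact.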

This particular limitation can be avoided by using more sophisticated region-based models (??), but even these models remain limited in terms of which rule bases they can capture. The underlying limitation seems to be related to the fact that these models use regions which are the Cartesian product of $d$ two-dimensional regions, i.e. $X_{r}=A^{r}_{1}\times\ldots\times A^{r}_{d}$, with $A_{i}^{r}\subseteq\mathbb{R}^{2}$. To check whether $(e,r,f)$ is captured, we then check whether $(e_{i},f_{i})\in A_{i}^{r}$ for each $i\in\{1,\ldots,d\}$, with $\mathbf{e}=(e_{1},\ldots,e_{d})$ and $\mathbf{f}=(f_{1},\ldots,f_{d})$. We will refer to such approaches as coordinate-wise models. Existing models thus primarily differ in how these two-dimensional regions are defined, e.g. ExpressivE (?) uses parallelograms for this purpose, while ? (?) used octagons. While it is, in principle, possible to use more flexible region-based representations, this typically leads to overfitting. In this paper, we go beyond coordinate-wise models but aim to avoid overfitting by otherwise keeping the model as simple as possible: we essentially learn regions $X_{r}$, which are defined in terms of ordering constraints of the form $e_{i}\leq f_{j}$.

Our main contributions are two-fold. First, we show that, despite its simplicity, the proposed model can capture a large class of rule bases, thus overcoming some of the limitations of existing region-based models. In fact, if we only consider consequences that can be inferred using a bounded number of inference steps, our model is capable of faithfully capturing arbitrary sets of closed path rules. Second, we show that knowledge graph embeddings in our framework can be learned using a monotonic Graph Neural Network (GNN) with randomly initialised node embeddings. This GNN effectively serves as a differentiable approximation of a rule base, acting on the initial representations of the entities to ensure that they capture the consequences that can be inferred from the KG. An important practical consequence is that our KG embeddings can be efficiently updated when new knowledge becomes available. Thus, our model is particularly well suited for KG completion in the inductive setting, where we need to predict links between entities that were not seen during training. Moreover, whereas existing inductive KG completion methods tend to be computationally expensive, e.g. by requiring one (?) or even many (?) forward passes of a GNN model for each query, our approach retains the advantage of KG embeddings, where the plausibility of a triple $(e,r,f)$ can be checked almost instantaneously.

2 Related Work

Region-based Models

Despite the vast amount of work on KG embedding models in the last decade, the reasoning abilities of most existing models are poorly understood. The main exception comes from a line of work that has focused on region-based representations (??????). Essentially, the region-based view makes explicit what triples and rules are captured by a given embedding. This allows us to study what kinds of semantic dependencies a given model is capable of capturing, which is important for ensuring that models have the right inductive bias, especially for settings where reasoning is important. Existing work has uncovered various limitations of existing models. For instance, ? (?) revealed that bilinear models such as RESCAL (?), DistMult (?), TuckER (?) and ComplEx (?) cannot capture relation hierarchies in a faithful way. They furthermore found that models that represent relations using convex regions have inherent limitations when it comes to modelling disjointness. However, such models were found to be capable of modelling arbitrary sets of closed path rules (and even more general classes of rule bases, involving existentials in the head and relations of different arity). In practice, learning arbitrary convex polytopes is not feasible in high-dimensional spaces. Practical region-based embedding models therefore focus on much simpler classes of regions, such as Cartesian products of boxes (?), cones (??), parallelograms (?) and octagons (?). This makes the models easier to learn but limits the kinds of rules that they can capture. While the use of parallelograms and octagons makes it possible to capture arbitrary closed path rules, in practice we want to capture sets of such rules. This is only known to be possible under rather restrictive conditions (see Section 3).

Inductive KG Completion

Standard benchmarks for KG completion can only evaluate the reasoning abilities of models to a limited extent. For instance, BoxE (?) achieves strong results on these benchmarks, despite provably being incapable of modelling simple rules such as $r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r_{3}(X,Z)$. In this paper, we will therefore instead focus on the problem of inductive KG completion (?). In the inductive setting, we need to predict links between entities that are different from those that were seen during training. In particular, there is no overlap between the entities that occur in the KG that was used for training and the one that is used for testing (although the relations are the same in both KGs). To perform this task, models need to learn semantic dependencies between the relations, and then exploit this knowledge when making predictions. This can be achieved in different ways. A natural strategy is to learn rules from the training KG, either explicitly using a model such as AnyBURL (?) or implicitly using differentiable rule learners such as Neural-LP (?) or DRUM (?). The latter essentially approximate rule applications using tensor multiplications. In practice, better results have been obtained using GNNs. For instance, some approaches (?) reduce the problem of link prediction to a graph classification problem. They first construct a subgraph containing paths connecting the head entity with some candidate tail entity, and then use a GNN to predict a score from this subgraph. Such approaches suffer from limited scalability, as answering a link prediction query requires constructing and processing such a subgraph for each candidate tail entity. NBFNet (?) alleviates this limitation by using a single GNN that processes the entire graph. The resulting node embeddings can then be used to score the different candidate tail entities. However, the node embeddings are query-specific, meaning that this model still requires a new forward pass of the GNN for each query, which is considerably less efficient than using KG embeddings.

While we use a GNN for computing entity embeddings, once these embeddings have been learned, we can use them to answer arbitrary link prediction queries. Our method is thus considerably more efficient than the aforementioned GNN-based models for inductive KG completion. ReFactor GNN (?) similarly uses a GNN to learn entity embeddings, by simulating the training dynamic of traditional KG embedding methods such as TransE (?). However, their method has the disadvantage that all embeddings have to be recomputed when new triples are added to the KG. Moreover, their model inherits the limitations of traditional embedding models when it comes to faithfully modelling rules. Conceptually, our method has more in common with differentiable rule learning methods than with subgraph classification strategies. Indeed, each layer of the GNN updates the entity embeddings by essentially simulating the application of rules. Moreover, our model can simulate the deductive chaining of rules, which makes it fundamentally different from Neural-LP and DRUM, which focus on one-off rule application.

3 Problem Setting

Let $\mathcal{R}$ be a set of relations, $\mathcal{E}$ a set of entities, and $\mathcal{G}\subseteq\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ a knowledge graph. Similar to standard KG embedding models, our aim is to learn a vector space representation $\mathbf{e}\in\mathbb{R}^{d}$ for every entity $e\in\mathcal{E}$ and a scoring function $s_{r}$ for every relation $r\in\mathcal{R}$ such that $s_{r}(\mathbf{e},\mathbf{f})$ reflects the plausibility of the triple $(e,r,f)$. In the case of region-based models, the scoring function $s_{r}$ is defined in terms of a geometric region $X_{r}\subseteq\mathbb{R}^{2d}$. Specifically, the triple $(e,r,f)$ is then considered to be captured by the embedding iff $\mathbf{e}\oplus\mathbf{f}\in X_{r}$, where we write $\mathbf{e}\oplus\mathbf{f}$ to denote vector concatenation. Accordingly, the scoring function $s_{r}$ then reflects how close $\mathbf{e}\oplus\mathbf{f}$ is to the region $X_{r}$ (which is formalised in different ways by different models).

A key advantage of region-based models is that they offer a mechanism for modelling rules. Let us write $\eta$ to denote a given region-based embedding, i.e. $\eta(e)\in\mathbb{R}^{d}$ denotes the embedding of the entity $e\in\mathcal{E}$ and $\eta(r)\subseteq\mathbb{R}^{2d}$ denotes the region representing the relation $r\in\mathcal{R}$. Let us consider a rule $\rho$ of the following form:

$$r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})\tag{1}$$

We say that $\eta$ captures this rule if for all vectors $\mathbf{x_{1}},\ldots,\mathbf{x_{p+1}}\in\mathbb{R}^{d}$ we have:

$$(\mathbf{x_{1}}\oplus\mathbf{x_{2}}\in\eta(r_{1}))\wedge\ldots\wedge(\mathbf{x_{p}}\oplus\mathbf{x_{p+1}}\in\eta(r_{p}))\Rightarrow(\mathbf{x_{1}}\oplus\mathbf{x_{p+1}}\in\eta(r))\tag{2}$$

Rules of the form (1) are known as closed path rules. Region-based embeddings can similarly capture other kinds of rules, such as intersection rules of the form $r_{1}(X_{1},X_{2})\wedge r_{2}(X_{1},X_{2})\rightarrow r(X_{1},X_{2})$. However, we will specifically focus on closed path rules in this paper, due to their importance for KG completion. For instance, most rule-based methods for KG completion focus on learning rules of this type (?). Moreover, existing region-based models have particular limitations when it comes to capturing this kind of rules. Some approaches, such as BoxE (?) are not capable of capturing such rules at all. More recent approaches (??) are capable of capturing individual closed path rules, but they are limited when it comes to jointly capturing a set of such rules.

Specifically, given a set of closed path rules $\mathcal{P}$, we ideally want an embedding $\eta$ that captures every rule in $\mathcal{P}$ while not capturing any rules that are not entailed by $\mathcal{P}$. ? (?) showed this to be possible, provided that every rule entailed from $\mathcal{P}$ is either a trivial rule such as $r(X_{1},X_{2})\rightarrow r(X_{1},X_{2})$ or a rule of the form (1) in which $r_{1},\ldots,r_{p},r$ are all distinct relations. For instance, rules of the form $r_{1}(X_{1},X_{2})\wedge r_{1}(X_{2},X_{3})\rightarrow r(X_{1},X_{3})$ were not allowed in their construction. They also provided a counterexample, which shows that without this restriction, it is not always possible to faithfully capture rule bases with octagon embeddings (without also capturing rules that are not entailed by the given rule base). ? (?) did not study the problem of capturing sets of rules, but their model is likely to suffer from similar limitations.

In the following, we write $\mathcal{P}\cup\mathcal{G}\models(e,r,f)$ to denote that the triple $(e,r,f)$ can be entailed from the rule base $\mathcal{P}$ and the knowledge graph $\mathcal{G}$. More precisely, we have $\mathcal{P}\cup\mathcal{G}\models(e,r,f)$ iff either $(e,r,f)\in\mathcal{G}$ or $\mathcal{P}$ contains a rule of the form (1) such that $\mathcal{P}\cup\mathcal{G}\models(e,r_{1},e_{2})$, $\mathcal{P}\cup\mathcal{G}\models(e_{2},r_{2},e_{3})$, …, $\mathcal{P}\cup\mathcal{G}\models(e_{p},r_{p},f)$ for some entities $e_{2},\ldots,e_{p}$. We furthermore write $\mathcal{P}\models\rho$ for a rule $\rho$ of the form (1) to denote that $\mathcal{P}$ entails $\rho$ w.r.t. the standard notion of entailment from propositional logic (when interpreting rules in terms of material implication).
Note that while we consider both a knowledge graph 𝒢𝒢\mathcal{G}caligraphic_G and a rule base 𝒫𝒫\mathcal{P}caligraphic_P in our analysis, in practice only the knowledge graph 𝒢𝒢\mathcal{G}caligraphic_G is given. We study whether our model is capable of capturing the rule base because this is a necessary condition to allow it to learn semantic dependencies in the form of rules.
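The recursive definition of entailment for closed path rules can be read as forward chaining to a fixpoint. The following is a minimal sketch of that reading; the function `entailed_triples`, the encoding of a rule as a pair `(body, head_relation)` and the toy family relations are all hypothetical choices made for illustration.

```python
def entailed_triples(graph, rules):
    """Return every triple entailed by a KG and a set of closed path rules.

    graph: set of (head, relation, tail) triples.
    rules: list of (body, head_relation), where body = (r1, ..., rp) stands for
           r1(X1,X2) ∧ ... ∧ rp(Xp,Xp+1) → head_relation(X1,Xp+1).
    """
    facts = set(graph)
    changed = True
    while changed:
        changed = False
        for body, head_rel in rules:
            # Entity pairs connected by a path that matches the body so far.
            pairs = {(h, t) for (h, r, t) in facts if r == body[0]}
            for rel in body[1:]:
                step = {(h, t) for (h, r, t) in facts if r == rel}
                pairs = {(a, c) for (a, b) in pairs for (b2, c) in step if b == b2}
            for a, c in pairs:
                if (a, head_rel, c) not in facts:
                    facts.add((a, head_rel, c))
                    changed = True
    return facts


graph = {("ann", "parent", "bob"), ("bob", "parent", "cat"), ("cat", "parent", "dan")}
rules = [
    (("grandparent", "parent"), "great_grandparent"),
    (("parent", "parent"), "grandparent"),
]
closure = entailed_triples(graph, rules)
print(("ann", "great_grandparent", "dan") in closure)
```

The last triple is only derivable by chaining the two rules, since it relies on a `grandparent` fact that is itself inferred. The loop terminates because the set of possible triples over a finite set of entities and relations is finite.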

4 Model Description

Our aim is to develop a model that can capture a larger class of rule bases than existing region-based models. Furthermore, we want the embeddings to be defined such that they can be efficiently updated whenever new knowledge becomes available.

Ordering Constraints

The central idea is to rely on ordering constraints. Specifically, we model each relation $r$ using a region $X_{r}$ of the following form: $\mathbf{e}\oplus\mathbf{f}\in X_{r}$ iff

$$\forall i\in I_{r}\,.\,e_{\sigma_{r}(i)}\leq f_{i}\tag{3}$$

where $I_{r}\subseteq\{1,\ldots,d\}$, $\sigma_{r}:I_{r}\rightarrow\{1,\ldots,d\}$ and we assume $\mathbf{e}=(e_{1},\ldots,e_{d})$ and $\mathbf{f}=(f_{1},\ldots,f_{d})$. The following example illustrates why the use of ordering constraints is well-suited for modelling rules.
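As a small illustration of (3), membership in a region defined by ordering constraints reduces to a handful of coordinate comparisons. The sketch below uses a hypothetical helper `captures` and stores $\sigma_{r}$ as a dictionary whose keys play the role of $I_{r}$; indices are 0-based here, whereas the text uses $1,\ldots,d$.

```python
def captures(e, f, sigma):
    """Check e ⊕ f ∈ X_r for a region defined by ordering constraints.

    sigma maps each index i in I_r to sigma_r(i); coordinates outside I_r
    are left unconstrained.
    """
    return all(e[j] <= f[i] for i, j in sigma.items())


sigma_r = {0: 2, 1: 0}  # constraints: e[2] <= f[0] and e[0] <= f[1]
e = [0.1, 0.3, 0.3]
f = [0.4, 0.2, 0.0]

print(captures(e, f, sigma_r))  # 0.3 <= 0.4 and 0.1 <= 0.2
print(captures(f, e, sigma_r))  # fails on the first constraint, since 0.0 <= 0.1 but 0.4 > 0.3
```

Note that the relation is not symmetric: swapping the roles of the two entities changes which coordinates are compared.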

Example 1.

Consider a rule of the form $r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r(X,Z)$. This rule is captured by an embedding of the form (3) if for each $i\in I_{r}$ we have that $i\in I_{r_{2}}$, $\sigma_{r_{2}}(i)\in I_{r_{1}}$ and $\sigma_{r_{1}}(\sigma_{r_{2}}(i))=\sigma_{r}(i)$. Indeed, if these conditions are satisfied and we have $(e,r_{1},f)$ and $(f,r_{2},g)$ in $\mathcal{G}$, then for each $i\in I_{r}$ we have the following constraint:

$$e_{\sigma_{r_{1}}(\sigma_{r_{2}}(i))}\leq f_{\sigma_{r_{2}}(i)}\leq g_{i}$$

Since we assumed $\sigma_{r_{1}}(\sigma_{r_{2}}(i))=\sigma_{r}(i)$, it follows that $e_{\sigma_{r}(i)}\leq g_{i}$ for every $i\in I_{r}$ and thus that the embedding captures the triple $(e,r,g)$.
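The condition in Example 1 can be checked mechanically. The sketch below is illustrative only: the helpers `captures` and `example1_condition`, the 0-based indices and the concrete index maps are assumptions, and the exhaustive check covers a small integer grid rather than all real vectors.

```python
from itertools import product


def captures(e, f, sigma):
    """e ⊕ f lies in the region given by sigma iff e[sigma(i)] <= f[i] for all i in I_r."""
    return all(e[j] <= f[i] for i, j in sigma.items())


def example1_condition(sigma_r1, sigma_r2, sigma_r):
    """Sufficient condition from Example 1 for r1(X,Y) ∧ r2(Y,Z) → r(X,Z)."""
    return all(
        i in sigma_r2
        and sigma_r2[i] in sigma_r1
        and sigma_r1[sigma_r2[i]] == sigma_r[i]
        for i in sigma_r
    )


sigma_r1 = {0: 1, 1: 2}
sigma_r2 = {2: 0, 0: 1}
# Compose the two maps so that sigma_r(i) = sigma_r1(sigma_r2(i)).
sigma_r = {i: sigma_r1[sigma_r2[i]] for i in sigma_r2}

# Whenever both premises hold on the grid, the conclusion must hold as well.
grid = list(product(range(3), repeat=3))
rule_holds = all(
    captures(e, g, sigma_r)
    for e, f, g in product(grid, repeat=3)
    if captures(e, f, sigma_r1) and captures(f, g, sigma_r2)
)
print(example1_condition(sigma_r1, sigma_r2, sigma_r), rule_holds)
```

Here the head relation simply reuses the composed index map, which mirrors the chain of inequalities in the example.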

We will come back to the analysis of how rules can be modelled using ordering constraints in the next section. We now turn our focus to how (a differentiable approximation of) the ordering constraints can be learned. Note that we can characterise (3) as follows:

$$\max(\mathbf{A_{r}}\mathbf{e},\mathbf{f})=\mathbf{f}\tag{4}$$

where the maximum is applied component-wise and the matrix $\mathbf{A_{r}}\in\mathbb{R}^{d\times d}$ is constrained such that (i) all components are either 0 or 1 and (ii) at most one component in each row is non-zero. This characterisation suggests how our embeddings can be learned using a GNN, as we explain next.
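To make the link between (3) and (4) concrete, the sketch below builds the 0/1 matrix from an index map and compares both formulations on a small grid. The helper names are hypothetical and indices are 0-based. The comparison is restricted to non-negative coordinates, which matters because rows outside $I_{r}$ are all zero, so (4) additionally requires $f_{i}\geq 0$ for those rows.

```python
from itertools import product


def constraint_matrix(sigma, d):
    """Build the 0/1 matrix A_r with A_r[i][sigma(i)] = 1 for each i in I_r."""
    A = [[0] * d for _ in range(d)]
    for i, j in sigma.items():
        A[i][j] = 1
    return A


def matvec(A, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def satisfies_eq3(sigma, e, f):
    return all(e[j] <= f[i] for i, j in sigma.items())


def satisfies_eq4(A, e, f):
    """Component-wise maximum of A_r e and f equals f."""
    return [max(a, b) for a, b in zip(matvec(A, e), f)] == list(f)


sigma_r = {0: 2, 1: 0}
A_r = constraint_matrix(sigma_r, d=3)

# Compare both formulations on all non-negative integer vectors in {0,1,2}^3.
grid = list(product(range(3), repeat=3))
agree = all(
    satisfies_eq3(sigma_r, e, f) == satisfies_eq4(A_r, e, f)
    for e in grid
    for f in grid
)
print(A_r, agree)
```

Each row of `A_r` either selects a single coordinate of the head entity or is zero, which is exactly the constraint placed on the matrix in the text.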

Learning Embeddings with GNNs

Let us write $\mathbf{e^{(l)}}\in\mathbb{R}^{d}$ for the representation of entity $e$ in layer $l$ of the GNN. The embeddings $\mathbf{e^{(0)}}$ are initialised randomly, ensuring that all coordinates are non-negative, the coordinates of different entity embeddings are sampled independently, and there are at least two distinct values that have a non-zero probability of being sampled for each coordinate. We use a simple message-passing GNN of the following form:

$$\mathbf{f^{(l+1)}}=\max\big(\{\mathbf{f^{(l)}}\}\cup\{\mathbf{A_{r}}\mathbf{e^{(l)}}\,|\,(e,r,f)\in\mathcal{G}\}\big)\tag{5}$$

where the matrices $\mathbf{A_{r}}$ are constrained as before. Due to this constraint, the GNN converges after a finite number of steps $m$ to embeddings $\mathbf{f^{(m)}}$ satisfying $\mathbf{f^{(m)}}=\max(\mathbf{f^{(m)}},\mathbf{A_{r}}\mathbf{e^{(m)}})$ for each $(e,r,f)\in\mathcal{G}$.
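A minimal sketch of update (5), iterated until nothing changes, may help to see the convergence claim on a toy graph. Everything here is illustrative: the matrices are fixed by hand rather than learned, the initial embeddings are chosen manually instead of being sampled, and the names `gnn_layer` and `run_to_fixpoint` are hypothetical.

```python
def matvec(A, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def gnn_layer(emb, graph, A):
    """One application of (5): component-wise max over the node's own
    embedding and the messages A_r e from all incoming edges (e, r, f)."""
    new = {}
    for f, vec in emb.items():
        candidates = [vec] + [matvec(A[r], emb[e]) for (e, r, t) in graph if t == f]
        new[f] = [max(col) for col in zip(*candidates)]
    return new


def run_to_fixpoint(emb, graph, A, max_layers=100):
    """Apply layers until the embeddings stop changing."""
    for layer in range(max_layers):
        new = gnn_layer(emb, graph, A)
        if new == emb:
            return emb, layer
        emb = new
    return emb, max_layers


graph = {("a", "r", "b"), ("b", "r", "c")}
A = {"r": [[1, 0], [0, 0]]}  # single constraint: e[0] <= f[0]
emb0 = {"a": [7, 1], "b": [2, 0], "c": [0, 3]}

final, layers = run_to_fixpoint(emb0, graph, A)
print(final, layers)
```

In this toy run the large first coordinate of `a` is passed to `b` in the first layer and reaches `c` only in the second, which mirrors how a chain of rule applications is simulated layer by layer. Since each coordinate can only grow and only ever takes values copied from the initial embeddings, the loop has to stop.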

Representing Entities as Matrices

Since the model relies on randomly initialised entity embeddings, the dimensionality of the entity embeddings needs to be sufficiently high. At the same time, the number of parameters that have to be learned for each relation should be sufficiently low to prevent overfitting. For this reason, we learn matrices $\mathbf{A}_{r}$ of the following form:

$$\mathbf{A}_{r}=\mathbf{B}_{r}\otimes\mathbf{I}_{k} \tag{6}$$

where we write $\otimes$ for the Kronecker product, $\mathbf{I}_{k}$ is the $k$-dimensional identity matrix and $\mathbf{B}_{r}$ is an $\ell\times\ell$ matrix, with $d=k\ell$. To make the computation of the GNN updates more efficient, we can then represent each entity using a matrix $\mathbf{Z}_{e}^{(l)}\in\mathbb{R}^{\ell\times k}$ and compute updates as follows:

$$\mathbf{Z}_{f}^{(l+1)}=\max\big(\{\mathbf{Z}_{f}^{(l)}\}\cup\{\mathbf{B}_{r}\mathbf{Z}_{e}^{(l)}\,|\,(e,r,f)\in\mathcal{G}\}\big) \tag{7}$$

It is easy to verify that this model is equivalent to (5) when each matrix $\mathbf{A}_{r}$ is constrained to be of the form (6). Specifically, the matrix $\mathbf{Z}_{e}^{(l)}=(z_{ij})$ corresponding to the entity embedding $\mathbf{e}^{(l)}=(e_{1},\ldots,e_{d})$ is defined as $z_{ij}=e_{(i-1)k+j}$, with $i\in\{1,\ldots,\ell\}$ and $j\in\{1,\ldots,k\}$. Note that a triple $(e,r,f)$ is then supported by the embeddings at layer $l$ if:

$$\mathbf{B}_{r}\mathbf{Z}_{e}^{(l)}\preceq\mathbf{Z}_{f}^{(l)}$$

where $\mathbf{X}\preceq\mathbf{Y}$ denotes that $\max(\mathbf{X},\mathbf{Y})=\mathbf{Y}$.
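The equivalence between the vector form (5) and the matrix form (7) can be checked numerically. The sketch below assumes the row-major reshaping $z_{ij}=e_{(i-1)k+j}$ stated above; the sizes, the particular matrix `B` and the helper name `supports` are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
ell, k = 3, 4          # B_r is ell x ell, each entity is an ell x k matrix
d = ell * k

# A hypothetical binary B_r with at most one 1 per row.
B = np.zeros((ell, ell))
B[0, 2] = 1.0
B[2, 1] = 1.0

A = np.kron(B, np.eye(k))                      # A_r = B_r (x) I_k, shape d x d
e = rng.integers(0, 2, size=d).astype(float)   # entity embedding in R^d
Z = e.reshape(ell, k)                          # z_ij = e_{(i-1)k+j} (row-major)

# (B (x) I_k) e, reshaped to ell x k, equals B Z.
same = np.array_equal((A @ e).reshape(ell, k), B @ Z)

def supports(B_r, Z_e, Z_f):
    """Check B_r Z_e <= Z_f coordinate-wise, i.e. max(B_r Z_e, Z_f) = Z_f."""
    return bool(np.all(B_r @ Z_e <= Z_f))
```

Working with the $\ell\times k$ matrices avoids materialising the $d\times d$ matrix $\mathbf{A}_{r}$, which is the efficiency gain the text refers to.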

Model Details

To learn the matrix $\mathbf{B}_{r}$, we choose each row $i$ as the first $\ell$ coordinates of the vector $\mathsf{softmax}(b^{r}_{i,1},\ldots,b^{r}_{i,\ell+1})$, where $b^{r}_{i,1},\ldots,b^{r}_{i,\ell+1}$ are learnable parameters. Note that we need $\ell+1$ parameters for this softmax operation to allow for the possibility of some rows to be all 0s. Furthermore, note that while we conceptually think of $\mathbf{B}_{r}$ as binary matrices, in practice, we need to approximate such matrices to make learning possible. To initialise the entity embeddings, we set each coordinate to 0 or 1, with 50% probability. To train the model, we use the following scoring function for a given triple $(e,r,f)$:

$$s(e,r,f)=-\big\|\mathsf{ReLU}(\mathbf{B}_{r}\,\mathbf{Z}_{e}^{(m)}-\mathbf{Z}_{f}^{(m)})\big\|_{2}$$

where $m$ denotes the number of GNN layers. Note that $s(e,r,f)$ reaches its maximal value of 0 iff $\mathbf{B}_{r}\mathbf{Z}_{e}^{(m)}\preceq\mathbf{Z}_{f}^{(m)}$. For each $(e,r,f)\in\mathcal{G}$ we add an inverse triple $(f,r_{\textit{inv}},e)$ to $\mathcal{G}$. For each entity $e$, we also add the triple $(e,\textit{eq},e)$ to $\mathcal{G}$, which corresponds to the common practice of adding self-loops to the GNN. Following the literature (??), ReshufflE’s training process uses negative sampling under the partial completeness assumption (PCA) (?), i.e., for each training triple $(e,r,f)\in\mathcal{G}$, $N$ triples (negative samples) are created by replacing $e$ or $f$ in $(e,r,f)$ by randomly sampled entities $e^{\prime},f^{\prime}\in\mathcal{E}$. To train ReshufflE, we minimise the margin ranking loss, defined as follows:

$$L(e,r,f)=\sum_{i=1}^{N}\max(0,s(e_{i}^{\prime},r,f_{i}^{\prime})-s(e,r,f)+\lambda) \tag{8}$$

where $(e_{i}^{\prime},r,f_{i}^{\prime})$ is the $i$-th negative sample and $\lambda>0$ is a hyper-parameter, called the margin. At an intuitive level, the margin ranking loss pushes scores of true triples (i.e., those within the training graph) to be larger by at least $\lambda$ than the scores of triples that are likely false (i.e., negative samples).
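The three ingredients of this subsection (the softmax relaxation of $\mathbf{B}_{r}$, the scoring function and the margin ranking loss (8)) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, gradients are not computed, and the norm is assumed to be the Euclidean norm over all matrix entries (NumPy's default Frobenius norm for 2-D arrays).

```python
import numpy as np

def relation_matrix(params):
    """Map an (ell, ell+1) parameter array to a relaxed B_r of shape (ell, ell).

    Each row is a softmax over ell+1 logits; dropping the last column lets a
    row become (close to) all zeros when the extra logit dominates.
    """
    shifted = params - params.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, :-1]

def score(B_r, Z_e, Z_f):
    """s(e, r, f) = -|| ReLU(B_r Z_e - Z_f) ||_2 (assumed Frobenius norm)."""
    return -np.linalg.norm(np.maximum(B_r @ Z_e - Z_f, 0.0))

def margin_ranking_loss(pos_score, neg_scores, margin):
    """Loss (8): sum_i max(0, s(neg_i) - s(pos) + margin)."""
    neg_scores = np.asarray(neg_scores, dtype=float)
    return float(np.sum(np.maximum(0.0, neg_scores - pos_score + margin)))

# Small usage example with made-up values.
B_uniform = relation_matrix(np.zeros((2, 3)))      # every entry equals 1/3
B_binary = np.array([[0.0, 1.0], [0.0, 0.0]])
Z_e = np.array([[1.0, 0.0], [0.0, 1.0]])
s_supported = score(B_binary, Z_e, np.ones((2, 2)))
s_violated = score(B_binary, Z_e, np.zeros((2, 2)))
loss = margin_ranking_loss(0.0, [-2.0, -0.5], margin=1.0)
```

In this example only the second negative sample contributes to the loss, since the first one already scores more than the margin below the positive triple.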

5 Constructing GNNs from Rule Graphs

Consider a finite set $\mathcal{P}$ of closed path rules of the form (1). We now study the following question: can parameters be found for the proposed GNN model (i.e. the matrices $\mathbf{B}_{r}$) such that the rules in $\mathcal{P}$ are captured, while no rules that are not entailed by $\mathcal{P}$ are captured? Rather than constructing the matrices $\mathbf{B}_{r}$ directly, we first introduce the notion of a rule graph, which will serve as a convenient abstraction of the considered GNNs. We then explain how we can construct the matrices $\mathbf{B}_{r}$ from a given rule graph. Throughout this paper, we will assume that $\mathcal{G}$ contains the triple $(e,\textit{eq},e)$ for every $e\in\mathcal{E}$. We will also assume that the relation eq does not appear in the rule base $\mathcal{P}$.

Rule Graphs

We will encode the rule base $\mathcal{P}$ as a labelled multi-graph $\mathcal{H}$, i.e. a set of triples $(n_{1},r,n_{2})$. Note that this graph is formally equivalent to a knowledge graph, but the nodes in this case do not correspond to entities. A path in $\mathcal{H}$ from $n_{1}$ to $n_{p+1}$ is a sequence of triples of the form $(n_{1},r_{1},n_{2}),(n_{2},r_{2},n_{3}),\ldots,(n_{p},r_{p},n_{p+1})$. The type of this path is given by the sequence of relations $r_{1};r_{2};\ldots;r_{p}$. The eq-reduced type of the path is obtained by removing all occurrences of the relation eq in $r_{1};r_{2};\ldots;r_{p}$. For instance, for a path of type $r_{1};\textit{eq};\textit{eq};r_{2};\textit{eq}$, the eq-reduced type is $r_{1};r_{2}$.
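The two notions of path type are easy to state in code. The sketch below represents a path as a list of `(source, relation, target)` triples; the function names and node labels are illustrative.

```python
def path_type(path):
    """Sequence of relation labels along a path of (n_i, r_i, n_{i+1}) triples."""
    return [r for (_, r, _) in path]

def eq_reduced_type(path, eq="eq"):
    """Path type with every occurrence of the eq relation removed."""
    return [r for r in path_type(path) if r != eq]

# The example from the text: type r1; eq; eq; r2; eq.
example_path = [
    ("n1", "r1", "n2"),
    ("n2", "eq", "n3"),
    ("n3", "eq", "n4"),
    ("n4", "r2", "n5"),
    ("n5", "eq", "n6"),
]
reduced = eq_reduced_type(example_path)
```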

Definition 1.

A rule graph $\mathcal{H}$ for a given rule base $\mathcal{P}$ is a labelled multi-graph, where the labels are taken from $\mathcal{R}$, such that the following properties are satisfied:

(R1) For every relation $r\in\mathcal{R}$, there is some edge in $\mathcal{H}$ labelled with $r$.

(R2) For every node $n$ in $\mathcal{H}$ and every $r\in\mathcal{R}$, it holds that $n$ has at most one incoming edge labelled with $r$.

(R3) Suppose there is an edge in $\mathcal{H}$ with label $r$ from node $n_{1}$ to node $n_{2}$. Suppose furthermore that $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$. Then there is a path in $\mathcal{H}$ from $n_{1}$ to $n_{2}$ whose eq-reduced type is $r_{1};\ldots;r_{p}$.

(R4) Suppose for every two nodes connected by an edge with label $r$, there is a path connecting these two nodes whose eq-reduced type belongs to $\{(r_{11};\ldots;r_{1p_{1}}),\ldots,(r_{q1};\ldots;r_{qp_{q}})\}$. Then there is some $i\in\{1,\ldots,q\}$ such that $\mathcal{P}\models r_{i1}(X_{1},X_{2})\wedge\ldots\wedge r_{ip_{i}}(X_{p_{i}},X_{p_{i}+1})\rightarrow r(X_{1},X_{p_{i}+1})$.

This definition reflects the fact that a rule is captured when the ordering constraints associated with its body entail the ordering constraints associated with its head, as was illustrated in Example 1. Specifically, this requirement is captured by condition (R3). Condition (R4) is needed to ensure that only the rules in $\mathcal{P}$ are captured. Conditions (R1) and (R2) are needed because, in the construction we consider below, the nodes of the rule graph will correspond to the rows of the matrices $\mathbf{B}_{r}$. Condition (R1) will then ensure that $\mathbf{B}_{r}$ contains at least one non-zero component for each relation $r$, while (R2) will ensure that each row of $\mathbf{B}_{r}$ has at most one non-zero component.

Example 2.

Let $\mathcal{P}$ contain the following rules:

$$\begin{aligned} r_{1}(X,Y)\wedge r_{2}(Y,Z)&\rightarrow r_{3}(X,Z)\\ r_{4}(X,Y)\wedge r_{5}(Y,Z)&\rightarrow r_{2}(X,Z)\end{aligned}$$

A corresponding rule graph is shown in Figure 1.

[Figure 1 shows a graph with nodes $n_{1},\ldots,n_{5}$ and edges labelled $r_{1}$, $r_{2}$, $r_{3}$, $r_{4}$, $r_{5}$ and eq.]
Figure 1: Rule graph for Example 2.

Constructing GNNs

Given a rule graph $\mathcal{H}$, we define the corresponding parameters of the GNN as follows. Specifically, we need to define the matrix $\mathbf{B}_{r}$ for every $r$. Each node from the rule graph is associated with one row of $\mathbf{B}_{r}$. Let $n_{1},\ldots,n_{\ell}$ be an enumeration of the nodes in the rule graph. The corresponding matrix $\mathbf{B}_{r}=(b_{ij})$ is defined as:

$$b_{ij}=\begin{cases}1&\text{if $\mathcal{H}$ has an $r$-edge from $n_{j}$ to $n_{i}$}\\ 0&\text{otherwise}\end{cases}$$

Note that because of condition (R2), there will be at most one non-zero element in each row of $\mathbf{B}_{r}$, in accordance with the assumptions that we made in Section 4.
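The construction of the matrices from a rule graph can be sketched as follows. The graph used here is a hypothetical three-node graph for the single rule $r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r_{3}(X,Z)$, with eq edges left out for brevity; it is not the graph from Figure 1, and the function names are illustrative.

```python
import numpy as np

def relation_matrices(nodes, edges):
    """Build B_r for each label r: b_ij = 1 iff there is an r-edge from n_j to n_i."""
    index = {n: i for i, n in enumerate(nodes)}
    labels = sorted({r for (_, r, _) in edges})
    B = {r: np.zeros((len(nodes), len(nodes))) for r in labels}
    for src, r, dst in edges:
        B[r][index[dst], index[src]] += 1.0   # row = target node, column = source node
    return B

def satisfies_R2(B):
    """(R2): every node has at most one incoming r-edge, i.e. row sums are <= 1."""
    return all(bool((M.sum(axis=1) <= 1).all()) for M in B.values())

nodes = ["n1", "n2", "n3"]
edges = {("n1", "r3", "n3"), ("n1", "r1", "n2"), ("n2", "r2", "n3")}
B = relation_matrices(nodes, edges)

# Adding a second incoming r3-edge at n3 violates (R2).
B_bad = relation_matrices(nodes, edges | {("n2", "r3", "n3")})
```

Each row of a matrix built this way selects at most one row of the sender's entity matrix, which is why the update can be read as a reshuffling of coordinates.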

The following result shows that the constructed GNN indeed captures all the rules from $\mathcal{P}$. Specifically, we show that the embeddings which are learned by the GNN (upon convergence) capture all triples that are entailed by $\mathcal{P}\cup\mathcal{G}$.

Proposition 1.

Let $\mathcal{P}$ be a rule base and $\mathcal{G}$ a knowledge graph. Suppose $\mathcal{P}\cup\mathcal{G}\models(a,r,b)$. Let $\mathcal{H}$ be a rule graph for $\mathcal{P}$ and let $\mathbf{Z}_{e}^{(l)}$ be the entity representations that are learned by the corresponding GNN. Assume $\mathbf{Z}_{e}^{(m)}=\mathbf{Z}_{e}^{(m+1)}$ for every entity $e$ ($m\in\mathbb{N}$). It holds that $\mathbf{B}_{r}\mathbf{Z}_{a}^{(m)}\preceq\mathbf{Z}_{b}^{(m)}$.
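A toy run can make the statement concrete. The sketch below assumes a rule base with the single rule $r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r_{3}(X,Z)$, a hypothetical three-node rule graph with edges $n_{1}\to n_{2}$ ($r_{1}$), $n_{2}\to n_{3}$ ($r_{2}$) and $n_{1}\to n_{3}$ ($r_{3}$), and a knowledge graph with two triples. Inverse triples and eq self-loops are omitted for brevity, so this is an illustration of the claim and not a reproduction of the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
ell, k = 3, 8   # three rule-graph nodes, k columns per entity

# B_r[i, j] = 1 iff the rule graph has an r-edge from node j to node i (0-indexed).
B = {r: np.zeros((ell, ell)) for r in ("r1", "r2", "r3")}
B["r1"][1, 0] = 1.0   # r1-edge n1 -> n2
B["r2"][2, 1] = 1.0   # r2-edge n2 -> n3
B["r3"][2, 0] = 1.0   # r3-edge n1 -> n3

G = [("a", "r1", "b"), ("b", "r2", "c")]   # entails (a, r3, c)
Z = {x: rng.integers(0, 2, size=(ell, k)).astype(float) for x in "abc"}

def layer(Z):
    """One synchronous application of update (7)."""
    new = {x: M.copy() for x, M in Z.items()}
    for e, r, f in G:
        new[f] = np.maximum(new[f], B[r] @ Z[e])
    return new

# Iterate until Z^(m) = Z^(m+1) for every entity.
while True:
    nxt = layer(Z)
    if all(np.array_equal(nxt[x], Z[x]) for x in Z):
        break
    Z = nxt

# The entailed triple (a, r3, c) is supported at the fixpoint.
captured = bool(np.all(B["r3"] @ Z["a"] <= Z["c"]))
```

At the fixpoint, row 3 of $\mathbf{Z}_{c}$ dominates row 2 of $\mathbf{Z}_{b}$, which in turn dominates row 1 of $\mathbf{Z}_{a}$; this chain of ordering constraints is exactly what the $r_{3}$-edge from $n_{1}$ to $n_{3}$ requires, whatever the random initialisation.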

We also need to show that the GNN does not capture rules which are not entailed by $\mathcal{P}$. Because the entity embeddings are initialised randomly, there is always a chance that a given triple $(e,r,f)$ is captured by the learned embeddings, even if $\mathcal{P}\cup\mathcal{G}\not\models(e,r,f)$. However, by choosing $k$ to be sufficiently large, we can make the probability of this happening arbitrarily small.

Proposition 2.

Let $\mathcal{P}$ be a rule base and $\mathcal{G}$ a knowledge graph. Let $\mathcal{H}$ be a rule graph for $\mathcal{P}$ and let $\mathbf{Z}_{e}^{(l)}$ be the entity representations that are learned by the corresponding GNN. For any $\varepsilon>0$, there exists some $k_{0}\in\mathbb{N}$ such that, when $k\geq k_{0}$, for any $m\in\mathbb{N}$ and $(a,r,b)\in\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ such that $\mathcal{P}\cup\mathcal{G}\not\models(a,r,b)$, we have

$$\textit{Pr}[\mathbf{B}_{r}\mathbf{Z}_{a}^{(m)}\preceq\mathbf{Z}_{b}^{(m)}]\leq\varepsilon$$
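To give a feel for why a larger $k$ helps, consider a deliberately simplified special case, offered as an illustration under stated assumptions and not as the argument behind Proposition 2. Assume $a\neq b$, that $\mathbf{B}_{r}$ has a single non-zero entry $b_{ij}=1$, and that row $j$ of $\mathbf{Z}_{a}^{(m)}$ and row $i$ of $\mathbf{Z}_{b}^{(m)}$ still hold their initial values, with each coordinate independently set to 0 or 1 with probability $1/2$. Writing $z^{a}_{jt}$ and $z^{b}_{it}$ for these coordinates, all other rows of $\mathbf{B}_{r}\mathbf{Z}_{a}^{(m)}$ are zero and all embeddings are non-negative, so the triple is captured exactly when $z^{a}_{jt}\leq z^{b}_{it}$ for every column $t$. Under these assumptions:

$$\textit{Pr}[\mathbf{B}_{r}\mathbf{Z}_{a}^{(m)}\preceq\mathbf{Z}_{b}^{(m)}]=\prod_{t=1}^{k}\Big(1-\textit{Pr}[z^{a}_{jt}=1,\,z^{b}_{it}=0]\Big)=\Big(\frac{3}{4}\Big)^{k}$$

This is at most $\varepsilon$ whenever $k\geq\ln\varepsilon/\ln(3/4)$; for instance, $k=32$ already gives $(3/4)^{32}\approx 1.0\times 10^{-4}$. In the general case the relevant rows may have been updated by the GNN, so the exact probability differs, but the exponential decay in $k$ conveys the intuition.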

6 Constructing Rule Graphs

An important question is whether it is always possible, given a set of closed path rules $\mathcal{P}$, to construct a corresponding rule graph satisfying conditions (R1)–(R4). For rule bases where a relation appearing in the head of a rule never appears in the body of any rule, this is clearly the case. The following example illustrates how rule graphs can sometimes be constructed for rule bases which encode cyclic dependencies between the relations.

Example 3.

Let $\mathcal{P}$ contain the following rules:

$$\begin{aligned} r_{2}(X,Y)\wedge r_{3}(Y,Z)&\rightarrow r_{1}(X,Z)\\ r_{1}(X,Y)\wedge r_{4}(Y,Z)&\rightarrow r_{2}(X,Z)\end{aligned}$$

A corresponding rule graph is shown in Figure 2.

[Figure 2 shows a graph with nodes $n_{1},\ldots,n_{4}$ and edges labelled $r_{1}$, $r_{2}$, $r_{3}$, $r_{4}$ and eq.]
Figure 2: Rule graph for Example 3.

However, there exist rule bases for which no valid rule graph can be found. This is illustrated in the next example.

Example 4.

Let $\mathcal{P}$ contain the following rule:

$$r_{1}(X,Y)\wedge r_{2}(Y,Z)\wedge r_{1}(Z,U)\rightarrow r_{2}(X,U)$$

To see why this rule base cannot be modelled using a rule graph, consider the following knowledge graph $\mathcal{G}$:

$$\begin{aligned}\mathcal{G}=\{&(x_{1},r_{1},x_{2}),(x_{2},r_{1},x_{3}),\ldots,(x_{l-1},r_{1},x_{l}),\\ &(x_{l},r_{2},x_{l+1}),(x_{l+1},r_{1},x_{l+2}),\ldots,(x_{k},r_{1},x_{k+1})\}\end{aligned}$$

We have that $\mathcal{P}\cup\mathcal{G}\models(x_{1},r_{2},x_{k+1})$ only if the number of repetitions of $r_{1}$ at the start of the sequence matches the number of repetitions at the end. However, this requirement cannot be encoded using a rule graph.

The argument from the previous example can be formalised as follows. Let $\mathcal{P}$ be a set of closed path rules. Let $\mathcal{R}_{1}$ be the set of relations from $\mathcal{R}$ that appear in the head of some rule in $\mathcal{P}$. For any $r\in\mathcal{R}_{1}$, we can consider a context-free grammar with two types of production rules:

  • For each rule of the form (1), there is a production rule $r\Rightarrow r_{1}r_{2}\ldots r_{p}$.

  • For each $r\in\mathcal{R}_{1}$, there is a production rule $r\Rightarrow\overline{r}$.

The elements of $(\mathcal{R}\setminus\mathcal{R}_{1})\cup\{\overline{r}\,|\,r\in\mathcal{R}_{1}\}$ are viewed as terminal symbols, those in $\mathcal{R}_{1}$ are seen as non-terminal symbols, and $r$ is used as the starting symbol. Let us write $L_{r}$ for the corresponding language.
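The grammar construction can be sketched directly from the two kinds of production rules. In the sketch below, rules are given as `(body, head)` pairs, the terminal $\overline{r}$ is written as the string `r + "_bar"`, and both function names are hypothetical. The enumeration is depth-bounded so that it also terminates on recursive grammars; the rule base used is the one from Example 2.

```python
def build_grammar(rules):
    """Map each head relation to its list of right-hand sides.

    rules: list of (body, head), where body is a tuple of relation names.
    """
    heads = {head for _, head in rules}
    productions = {h: [] for h in heads}
    for body, head in rules:
        productions[head].append(list(body))      # r => r1 r2 ... rp
    for h in heads:
        productions[h].append([h + "_bar"])       # r => r-bar
    return productions

def strings(symbol, productions, depth):
    """Terminal strings derivable from `symbol` with at most `depth` nested expansions."""
    if symbol not in productions:                 # terminal symbol
        return {(symbol,)}
    if depth == 0:
        return set()
    out = set()
    for rhs in productions[symbol]:
        partial = {()}
        for s in rhs:
            partial = {p + q for p in partial
                       for q in strings(s, productions, depth - 1)}
        out |= partial
    return out

# Rule base of Example 2: r1, r2 -> r3 and r4, r5 -> r2.
rules = [(("r1", "r2"), "r3"), (("r4", "r5"), "r2")]
productions = build_grammar(rules)
L_r3 = strings("r3", productions, depth=3)
```

For this acyclic rule base the language of $r_{3}$ is finite, so a depth of 3 already enumerates it completely.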

Proposition 3.

Let $\mathcal{P}$ be a set of closed path rules and suppose that there exists a rule graph $\mathcal{H}$ for $\mathcal{P}$. Let $\mathcal{R}_{1}$ be the set of relations that appear in the head of some rule in $\mathcal{P}$. It holds that the language $L_{r}$ is regular for every $r\in\mathcal{R}_{1}$.

This result shows that we cannot capture arbitrary rule bases using rule graphs. For instance, for the rule base from Example 4, we have $L_{r_{2}}=\{r_{1}^{(l)}\overline{r}_{2}r_{1}^{(l)}\,|\,l\in\mathbb{N}\setminus\{0\}\}$, where we write $x^{(l)}$ for the string that consists of $l$ repetitions of $x$. It is well-known that the language $L_{r_{2}}$ is not regular, hence it follows from Proposition 3 that no rule graph exists for this rule base. We address this issue in two different ways. First, in Section 6.1, we introduce a construction for a special class of rule bases, inspired by regular grammars. Second, in Section 6.2, we focus on the practically important setting of bounded inference: since GNNs use a fixed number of layers in practice, what mostly matters is what can be derived in a bounded number of steps. It turns out that if we only care about such inferences, we can capture arbitrary sets of closed path rules.

6.1 Left-Regular Rule Bases

We now introduce the notion of a left-regular rule base, which closely corresponds to the notion of left-regular grammar. As we will see, for left-regular rule bases we can always construct a valid rule graph. This, in turn, means that our model is capable of faithfully capturing such rule bases.

Definition 2.

Let $\mathcal{P}$ be a rule base. Let $\mathcal{R}_1$ be the set of relations that appear in the head of a rule from $\mathcal{P}$. We call $\mathcal{P}$ left-regular if every rule is of the following form:

$$r_1(X,Y) \wedge r_2(Y,Z) \rightarrow r_3(X,Z) \tag{9}$$

such that $r_2 \notin \mathcal{R}_1$.
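The side condition of Definition 2 is easy to test mechanically. The sketch below is an illustration only: it assumes that every rule already has the form (9) and encodes it as a hypothetical triple `(r1, r2, r3)`; the function name `is_left_regular` is ours and not part of the paper.

```python
from typing import Iterable, Tuple

# Hypothetical encoding: (r1, r2, r3) stands for r1(X,Y) AND r2(Y,Z) -> r3(X,Z).
Rule = Tuple[str, str, str]


def is_left_regular(rules: Iterable[Rule]) -> bool:
    """Check the side condition of Definition 2 for rules of the form (9)."""
    rules = list(rules)
    heads = {r3 for _, _, r3 in rules}  # the set R_1 of head relations
    # The second body relation r2 of every rule must lie outside R_1.
    return all(r2 not in heads for _, r2, _ in rules)


# Three rules that share the head r3 and the second body atom r2.
shared_head = [("r1", "r2", "r3"), ("r4", "r2", "r3"), ("r5", "r2", "r3")]
# Two mutually dependent rules: r2 is both a head and a second body atom.
cyclic = [("r1", "r2", "r3"), ("r3", "r1", "r2")]

print(is_left_regular(shared_head))  # True
print(is_left_regular(cyclic))       # False
```

The first rule base satisfies the condition because $r_2$ never occurs as a head, whereas the second one violates it.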

Note that even though we only consider rules of the form (9) for the purpose of the construction below, rules with more than two atoms can straightforwardly be simulated by introducing fresh relations. Given a left-regular rule base $\mathcal{P}$, we construct the corresponding rule graph $\mathcal{H}$ as follows.

  1. We add the node $n_0$.

  2. For each relation $r \in \mathcal{R}$, we add a node $n_r$, and we connect $n_0$ to $n_r$ with an $r$-edge.

  3. For each rule of the form (9), we add an $r_2$-edge from $n_{r_1}$ to $n_{r_3}$.

  4. For each node $n$ with multiple incoming $r$-edges for some $r \in \mathcal{R}$, we do the following. Let $\sharp_r$ be the number of incoming $r$-edges for node $n$. Let $p = \max_{r \in \mathcal{R}} \sharp_r$. We create fresh nodes $n_1, \ldots, n_{p-1}$ and add eq-edges from $n_i$ to $n_{i-1}$ ($i \in \{1, \ldots, p-1\}$), where we define $n_0 = n$. Let $r \in \mathcal{R}$ be such that $\sharp_r > 1$. Let $n'_0, \ldots, n'_q$ be the nodes with an $r$-link to $n$; then we have $q \leq p-1$. For each $i \in \{1, \ldots, q\}$ we replace the edge from $n'_i$ to $n$ by an edge from $n'_i$ to $n_i$.
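The four steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: rules are hypothetical triples `(r1, r2, r3)` for rules of the form (9), nodes are plain strings (`"n0"`, `"n_r3"`, and fresh `"aux…"` nodes), and edges are `(source, label, target)` tuples. Any additional nodes required by the general definition of a rule graph are outside the scope of the sketch, and the input is assumed to be left-regular.

```python
from collections import defaultdict


def build_left_regular_rule_graph(relations, rules):
    """Return the edge set produced by steps 1-4 for rules (r1, r2, r3)."""
    edges = set()
    # Steps 1 and 2: root node n0 with an r-edge to a node n_r per relation r.
    for r in relations:
        edges.add(("n0", r, f"n_{r}"))
    # Step 3: an r2-edge from n_{r1} to n_{r3} for every rule of the form (9).
    for r1, r2, r3 in rules:
        edges.add((f"n_{r1}", r2, f"n_{r3}"))
    # Step 4: split nodes that have several incoming edges with the same label.
    original_nodes = ["n0"] + [f"n_{r}" for r in relations]
    fresh = 0
    for n in original_nodes:
        incoming = defaultdict(list)  # label -> sources of edges into n
        for source, label, target in sorted(edges):
            if target == n:
                incoming[label].append(source)
        p = max((len(sources) for sources in incoming.values()), default=0)
        if p <= 1:
            continue
        chain = [n]  # chain[0] = n, chain[i] is the fresh node n_i
        for i in range(1, p):
            fresh += 1
            chain.append(f"aux{fresh}")
            edges.add((chain[i], "eq", chain[i - 1]))  # eq-edge n_i -> n_{i-1}
        for label, sources in incoming.items():
            for i, source in enumerate(sources):
                if i == 0:
                    continue  # the first source keeps its edge into n
                edges.remove((source, label, n))
                edges.add((source, label, chain[i]))
    return edges


rules = [("r1", "r2", "r3"), ("r4", "r2", "r3"), ("r5", "r2", "r3")]
graph = build_left_regular_rule_graph(["r1", "r2", "r3", "r4", "r5"], rules)
for edge in sorted(graph):
    print(edge)
```

For the three rules used here, the node for $r_3$ initially receives three $r_2$-edges; the sketch redirects two of them to fresh nodes that are chained back through eq-edges, so that no node is left with two incoming edges carrying the same label.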

We now illustrate the construction process with an example.

Example 5.

Let $\mathcal{P}$ contain the following rules:

$$\begin{aligned}
r_1(X,Y) \wedge r_2(Y,Z) &\rightarrow r_3(X,Z)\\
r_4(X,Y) \wedge r_2(Y,Z) &\rightarrow r_3(X,Z)\\
r_5(X,Y) \wedge r_2(Y,Z) &\rightarrow r_3(X,Z)
\end{aligned}$$

The corresponding rule graph is depicted in Figure 3. The nodes $n_1$ and $n_2$ were introduced in step 4 of the construction process. Before this step, there were $r_2$-edges from $n_{r_4}$ to $n_{r_3}$ and from $n_{r_5}$ to $n_{r_3}$, in addition to the $r_2$-edge from $n_{r_1}$ to $n_{r_3}$. The node $n_{r_3}$ thus had three incoming $r_2$-edges, which violates condition (R2). This is addressed through the use of eq-edges in step 4.

Figure 3: Rule graph for Example 5.

Note that the rule graph may have loops, as illustrated next.

Example 6.

Let $\mathcal{P}$ contain the following rule:

$$r_1(X,Y) \wedge r_2(Y,Z) \rightarrow r_1(X,Z)$$

The corresponding rule graph is shown in Figure 4.

Figure 4: Rule graph for Example 6.

The proposed construction process clearly terminates after a finite number of steps. The following proposition shows that it constructs a valid rule graph for $\mathcal{P}$.

Proposition 4.

Let $\mathcal{P}$ be a left-regular set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method. It holds that $\mathcal{H}$ satisfies (R1)–(R4).

6.2 Bounded Inference

In practice, the GNN can only carry out a finite number of inference steps. Rather than requiring that the resulting embeddings capture all triples that can be inferred from $\mathcal{P} \cup \mathcal{G}$, it is natural to merely require that the result captures all triples that can be inferred using a bounded number of inference steps. As before, we assume that $\mathcal{P}$ contains rules of the form (9), but we no longer require that $r_2 \notin \mathcal{R}_1$. We know from Proposition 3 that it is then not always possible to construct a valid rule graph. To address this, we will weaken the notion of a rule graph, aiming to capture reasoning up to a fixed number of inference steps.

Let us write $\mathcal{P} \cup \mathcal{G} \models_m (e,r,f)$ to denote that $(e,r,f)$ can be derived from $\mathcal{P} \cup \mathcal{G}$ in $m$ steps. More precisely:

  • $\mathcal{P} \cup \mathcal{G} \models_0 (e,r,f)$ iff $(e,r,f) \in \mathcal{G}$.

  • $\mathcal{P} \cup \mathcal{G} \models_m (e,r,f)$, for $m > 0$, iff $\mathcal{P} \cup \mathcal{G} \models_{m-1} (e,r,f)$ or there is a rule $r_1(X_1,X_2) \wedge r_2(X_2,X_3) \rightarrow r(X_1,X_3)$ in $\mathcal{P}$ and an entity $g \in \mathcal{E}$ such that $\mathcal{P} \cup \mathcal{G} \models_{m_1} (e,r_1,g)$ and $\mathcal{P} \cup \mathcal{G} \models_{m_2} (g,r_2,f)$, with $m = m_1 + m_2 + 1$.
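The recursive definition above amounts to counting rule applications: a triple from $\mathcal{G}$ costs $0$, and a derived triple costs the sum of the costs of its two premises plus $1$. Under that reading, which is our interpretation, the triples derivable in $m$ steps can be computed by a simple fixed-point iteration. The sketch below is illustrative only; the function name `derivable_within`, the triple encoding of rules, and the toy graph are all assumptions introduced here.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (entity, relation, entity)
Rule = Tuple[str, str, str]    # (r1, r2, r) for r1(X,Y) AND r2(Y,Z) -> r(X,Z)


def derivable_within(graph: Iterable[Triple], rules: Iterable[Rule],
                     m: int) -> Dict[Triple, int]:
    """Map each triple derivable in at most m steps to its minimal step count."""
    cost = {t: 0 for t in graph}  # base case: triples of G need 0 steps
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for r1, r2, head in rules:
            # Snapshot the current premises before updating the cost table.
            left = [(e, g, c) for (e, rel, g), c in cost.items() if rel == r1]
            right = [(g, f, c) for (g, rel, f), c in cost.items() if rel == r2]
            for e, g, c1 in left:
                for g2, f, c2 in right:
                    if g != g2:
                        continue
                    total = c1 + c2 + 1  # m = m1 + m2 + 1
                    if total <= m and total < cost.get((e, head, f), m + 1):
                        cost[(e, head, f)] = total
                        changed = True
    return cost


toy_graph = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r1", "d")]
toy_rules = [("r1", "r2", "r3"), ("r3", "r1", "r2")]
print(derivable_within(toy_graph, toy_rules, 1))
print(derivable_within(toy_graph, toy_rules, 2))
```

With the toy data, the triple `("a", "r3", "c")` needs one rule application, while `("a", "r2", "d")` needs two because it builds on the former; it therefore only appears once the bound is raised to $m = 2$.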

Definition 3.

Let $m \in \mathbb{N}$. We call $\mathcal{H}$ an $m$-bounded rule graph for $\mathcal{P}$ if $\mathcal{H}$ satisfies conditions (R1)–(R3) as well as the following weakening of (R4):

(R4m)

Suppose for every two nodes connected by an edge with label $r$, there is a path connecting these two nodes whose eq-reduced type belongs to $\{(r_{11}; \ldots; r_{1p_1}), \ldots, (r_{q1}; \ldots; r_{qp_q})\}$, with $p_1, \ldots, p_q \leq m+1$. Then there is some $i \in \{1, \ldots, q\}$ such that $\mathcal{P} \models_m r_{i1}(X_1,X_2) \wedge \ldots \wedge r_{ip_i}(X_{p_i}, X_{p_i+1}) \rightarrow r(X_1, X_{p_i+1})$.

Given an $m$-bounded rule graph, we can construct a corresponding GNN in the same way as in Section 5. Moreover, Proposition 1 remains valid for $m$-bounded rule graphs, as its proof does not depend on (R4). Proposition 2 can be weakened as follows.

Proposition 5.

Let $\mathcal{P}$ be a rule base and $\mathcal{G}$ a knowledge graph. Let $\mathcal{H}$ be an $m$-bounded rule graph for $\mathcal{P}$ and let $\mathbf{Z_e^{(l)}}$ be the entity representations that are learned by the corresponding GNN. For any $\varepsilon > 0$, there exists some $k_0 \in \mathbb{N}$ such that, when $k \geq k_0$, for any $i \leq m+1$ and $(a,r,b) \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ such that $\mathcal{P} \cup \mathcal{G} \not\models_m (a,r,b)$, we have

$$\textit{Pr}[\mathbf{B_r} \mathbf{Z_a^{(i)}} \preceq \mathbf{Z_b^{(i)}}] \leq \varepsilon$$

Given a set of closed path rules $\mathcal{P}$ we can construct an $m$-bounded rule graph as follows.

  1. We add the node $n_0$.

  2. For each relation $r \in \mathcal{R}$, we add a node $n_r$, and we connect $n_0$ to $n_r$ with an $r$-edge.

  3. We repeat the following until convergence. Let $r \in \mathcal{R}$ and assume there is an $r$-edge from $n$ to $n'$. Let $r_1(X,Y) \wedge r_2(Y,Z) \rightarrow r(X,Z)$ be a rule from $\mathcal{P}$ and suppose that there is no $r_1;r_2$ path connecting $n$ and $n'$. Suppose furthermore that the edge $(n,n')$ is on some path from $n_0$ to a node $n_{r'}$, with $r' \in \mathcal{R}$, whose length is at most $m$. We add a fresh node $n''$ to the rule graph, an $r_1$-edge from $n$ to $n''$, and an $r_2$-edge from $n''$ to $n'$.

  4. For each $r \in \mathcal{R}$ and $r$-edge $(n,n')$ such that for some rule $r_1(X,Y) \wedge r_2(Y,Z) \rightarrow r(X,Z)$ from $\mathcal{P}$ there is no $r_1;r_2$ path connecting $n$ and $n'$, we do the following:

    (a) We add a fresh node $n''$, an $r_1$-edge from $n$ to $n''$ and an $r_2$-edge from $n''$ to $n'$.

    (b) We repeat the following until convergence. For each $r'$-edge from $n$ to $n''$ and each rule $r_1'(X,Y) \wedge r_2'(Y,Z) \rightarrow r'(X,Z)$ from $\mathcal{P}$, we add an $r_1'$-edge from $n$ to $n''$ and an $r_2'$-loop to $n''$ (if no such edges/loops exist yet).

    (c) We repeat the following until convergence. For each $r'$-edge from $n''$ to $n'$ and each rule $r_1'(X,Y) \wedge r_2'(Y,Z) \rightarrow r'(X,Z)$ from $\mathcal{P}$, we add an $r_1'$-loop to $n''$ and an $r_2'$-edge from $n''$ to $n'$ (if no such edges/loops exist yet).

    (d) We repeat the following until convergence. For each $r'$-loop at $n''$, and each rule $r_1'(X,Y) \wedge r_2'(Y,Z) \rightarrow r'(X,Z)$ from $\mathcal{P}$, we add an $r_1'$-loop and an $r_2'$-loop to $n''$ (if no such loops exist yet).

  5. For each node $n$ with multiple incoming $r$-edges for one or more relations from $\mathcal{R}$, we do the following. Let $\sharp_r$ be the number of incoming $r$-edges for node $n$. Let $p = \max_{r \in \mathcal{R}} \sharp_r$. We create fresh nodes $n_1, \ldots, n_{p-1}$ and add eq-edges from $n_i$ to $n_{i-1}$ ($i \in \{1, \ldots, p-1\}$), where we define $n_0 = n$. Let $r \in \mathcal{R}$ be such that $\sharp_r > 1$. Let $n'_0, \ldots, n'_q$ be the nodes with an $r$-link to $n$; then we have $q \leq p-1$. For each $i \in \{1, \ldots, q\}$ we replace the edge from $n'_i$ to $n$ by an edge from $n'_i$ to $n_i$.

We illustrate the construction process with two examples.

Example 7.

Let us consider the following set of rules:

$$\begin{aligned}
r_1(X,Y) \wedge r_2(Y,Z) &\rightarrow r_3(X,Z)\\
r_3(X,Y) \wedge r_1(Y,Z) &\rightarrow r_2(X,Z)
\end{aligned}$$

The corresponding $1$-bounded rule graph is shown in Fig. 5.

Figure 5: Rule graph for Example 7.

Example 8.

Let us consider the following set of rules:

$$\begin{aligned}
r_1(X,Y) \wedge r_2(Y,Z) &\rightarrow r_3(X,Z)\\
r_4(X,Y) \wedge r_5(Y,Z) &\rightarrow r_1(X,Z)\\
r_4(X,Y) \wedge r_5(Y,Z) &\rightarrow r_2(X,Z)
\end{aligned}$$

The corresponding 2222-bounded rule graph is shown in Fig. 6. Note how this graph is in fact also a rule graph: due to the fact that there are no cyclic dependencies in the rule base 𝒫𝒢2(e,r,g)subscriptmodels2𝒫𝒢𝑒𝑟𝑔\mathcal{P}\cup\mathcal{G}\models_{2}(e,r,g)caligraphic_P ∪ caligraphic_G ⊧ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_e , italic_r , italic_g ) is equivalent with 𝒫𝒢(e,r,g)models𝒫𝒢𝑒𝑟𝑔\mathcal{P}\cup\mathcal{G}\models(e,r,g)caligraphic_P ∪ caligraphic_G ⊧ ( italic_e , italic_r , italic_g ).

[Figure: rule graph with nodes $n_{0}$, $n_{1}$, $n_{2}$, $n_{3}$, $n_{4}$, $n_{5}$, $n_{r_{1}}$, $n_{r_{2}}$, $n_{r_{3}}$, $n_{r_{4}}$, $n_{r_{5}}$ and edges labelled with $r_{1}$, $r_{2}$, $r_{3}$, $r_{4}$, $r_{5}$ and eq.]
Figure 6: Rule graph for Example 8.
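The absence of cyclic dependencies in Example 8 can be illustrated with a small forward-chaining sketch. The knowledge graph below is hypothetical, and counting rounds of rule application is used only as an informal stand-in for the bound $m$; it is not claimed to coincide with the formal definition of $\models_{m}$ given earlier in the paper.

```python
from itertools import product

# Closed path rules from Example 8, written as (body_1, body_2, head):
# body_1(X, Y) AND body_2(Y, Z) -> head(X, Z).
RULES = [("r1", "r2", "r3"), ("r4", "r5", "r1"), ("r4", "r5", "r2")]


def apply_rules_once(triples):
    """Return all triples derivable from `triples` by one rule application."""
    derived = set()
    for (x, p, y1), (y2, q, z) in product(triples, repeat=2):
        if y1 != y2:
            continue  # the two body atoms must share the middle entity
        for body_1, body_2, head in RULES:
            if (p, q) == (body_1, body_2):
                derived.add((x, head, z))
    return derived


def closure_by_rounds(kg):
    """Naive forward chaining; returns the triple set after each round."""
    rounds = [set(kg)]
    while True:
        new = apply_rules_once(rounds[-1]) - rounds[-1]
        if not new:
            return rounds
        rounds.append(rounds[-1] | new)


# Hypothetical toy KG: a chain a -r4-> b -r5-> c -r4-> d -r5-> e.
kg = {("a", "r4", "b"), ("b", "r5", "c"), ("c", "r4", "d"), ("d", "r5", "e")}
rounds = closure_by_rounds(kg)
```

On this toy graph the first round derives the $r_{1}$ and $r_{2}$ triples, the second round derives $r_{3}(a,e)$, and no further round adds anything, because $r_{3}$ does not occur in any rule body.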

The construction process clearly terminates after a finite number of steps. Indeed, only edges that are on a path of length $m$ are expanded in step 3, and given that there are only finitely many such paths, step 3 must terminate. It is also straightforward to see that the other steps must terminate. As the following proposition shows, the proposed process indeed constructs an $m$-bounded rule graph.

Proposition 6.

Let $\mathcal{P}$ be a set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method for $m$-bounded rule graphs. It holds that $\mathcal{H}$ satisfies (R1)–(R3) and (R4m).

\begin{tabular}{llrrrrrr}
Dataset & Version & $\mathcal{R}_{\textit{Train}}$ & $\mathcal{E}_{\textit{Train}}$ & $\mathcal{G}_{\textit{Train}}$ & $\mathcal{R}_{\textit{Test}}$ & $\mathcal{E}_{\textit{Test}}$ & $\mathcal{G}_{\textit{Test}}$ \\
\hline
FB15k-237 & v1 & 180 & 1594 & 5226 & 142 & 1093 & 2404 \\
 & v2 & 200 & 2608 & 12085 & 172 & 1660 & 5092 \\
 & v3 & 215 & 3668 & 22394 & 183 & 2501 & 9137 \\
 & v4 & 219 & 4707 & 33916 & 200 & 3051 & 14554 \\
WN18RR & v1 & 9 & 2746 & 6678 & 8 & 922 & 1991 \\
 & v2 & 10 & 6954 & 18968 & 10 & 2757 & 4863 \\
 & v3 & 11 & 12078 & 32150 & 11 & 5084 & 7470 \\
 & v4 & 9 & 3861 & 9842 & 9 & 7084 & 15157 \\
NELL-995 & v1 & 14 & 3103 & 5540 & 14 & 225 & 1034 \\
 & v2 & 88 & 2564 & 10109 & 79 & 2086 & 5521 \\
 & v3 & 142 & 4647 & 20117 & 122 & 3566 & 9668 \\
 & v4 & 76 & 2092 & 9289 & 61 & 2795 & 8520 \\
\end{tabular}
Table 1: Number of relations, entities, and triples of the train, validation, and test splits of the training and testing graphs of the inductive benchmarks, split by benchmark version (v1-4).
\begin{tabular}{llcccccccccccc}
 & & \multicolumn{4}{c}{FB15k-237} & \multicolumn{4}{c}{WN18RR} & \multicolumn{4}{c}{NELL-995} \\
Type & Method & v1 & v2 & v3 & v4 & v1 & v2 & v3 & v4 & v1 & v2 & v3 & v4 \\
\hline
GNN & CoMPILE & 0.676 & 0.829 & 0.846 & 0.874 & 0.836 & 0.798 & 0.606 & 0.754 & 0.583 & 0.938 & 0.927 & 0.751 \\
 & GraIL & 0.642 & 0.818 & 0.828 & 0.893 & 0.825 & 0.787 & 0.584 & 0.734 & 0.595 & 0.933 & 0.914 & 0.732 \\
 & NBFNet & 0.845 & 0.949 & 0.946 & 0.947 & 0.946 & 0.897 & 0.904 & 0.889 & 0.644 & 0.953 & 0.967 & 0.928 \\
Rule & RuleN & 0.498 & 0.778 & 0.877 & 0.856 & 0.809 & 0.782 & 0.534 & 0.716 & 0.535 & 0.818 & 0.773 & 0.614 \\
 & AnyBURL & 0.604 & 0.823 & 0.847 & 0.849 & 0.867 & 0.828 & 0.656 & 0.796 & 0.683 & 0.835 & 0.798 & 0.652 \\
Diff-R & DRUM & 0.529 & 0.587 & 0.529 & 0.559 & 0.744 & 0.689 & 0.462 & 0.671 & 0.194 & 0.786 & 0.827 & 0.806 \\
 & Neural-LP & 0.529 & 0.589 & 0.529 & 0.559 & 0.744 & 0.689 & 0.462 & 0.671 & 0.408 & 0.787 & 0.827 & 0.806 \\
 & ReshufflE & 0.747 & 0.885 & 0.903 & 0.918 & 0.710 & 0.729 & 0.602 & 0.694 & 0.638 & 0.861 & 0.882 & 0.812 \\
\end{tabular}
Table 2: Hits@10 for 50 negative samples on inductive KGC, split by method type (GNN-based vs. rule-based vs. differentiable rule-based).

7 Experimental Results

We now empirically evaluate the effectiveness of the proposed model. We focus on inductive KG completion, as the need to capture reasoning patterns is intuitively more important for this setting compared to the traditional (i.e. transductive) setting. Our model has significant practical advantages compared to the state-of-the-art models. For instance, by only comparing the learned embeddings at query time, it is significantly more efficient than approaches that use GNNs for evaluating queries. Moreover, by using a monotonic GNN, our embeddings can straightforwardly be updated when new knowledge becomes available. As such, our main interest is to see whether our model can be competitive in terms of link prediction performance rather than expecting it to improve the state-of-the-art in this respect.

Datasets

We evaluate ReshufflE on the three standard benchmarks for inductive knowledge graph completion (KGC) that were derived by ? (?) from the datasets: FB15k-237, WN18RR, and NELL-995. Each of these inductive benchmarks contains four different dataset variants, named v1 to v4, and each of these variants consists of two graphs (the training and testing graph) that are sampled from the original dataset as follows. The training graph $\mathcal{G}_{\textit{Train}}$ was obtained by randomly sampling different numbers of entities and selecting their $k$-hop neighbourhoods. Next, to construct a disjoint testing graph $\mathcal{G}_{\textit{Test}}$, the entities of $\mathcal{G}_{\textit{Train}}$ were removed from the initial graph, and the same sampling procedure was repeated. Each of these graphs was split into a train set ($80\%$), validation set ($10\%$), and test set ($10\%$). Thus, the three inductive benchmarks consist in total of twelve datasets: FB15k-237 v1-4, WN18RR v1-4, and NELL-995 v1-4. Furthermore, each of these datasets consists of six graphs: the train, validation, and test splits of $\mathcal{G}_{\textit{Train}}$ and $\mathcal{G}_{\textit{Test}}$. Table 1 states the entity, relation, and triple counts of each graph. The supplementary materials provide additional information about these benchmarks, such as their origins and licenses.

Experimental Setup

Following ? (?), we train ReshufflE on the train split of $\mathcal{G}_{\textit{Train}}$, tune our model’s hyper-parameters on the validation split of $\mathcal{G}_{\textit{Train}}$, and finally evaluate the performance of the best model on the test split of $\mathcal{G}_{\textit{Test}}$. As discussed by ? (?), some approaches in the literature have been evaluated in different ways, e.g. by tuning hyper-parameters on the validation split of $\mathcal{G}_{\textit{Test}}$, and their reported results are thus not directly comparable. ReshufflE is trained on an NVIDIA Tesla V100 PCIe 32 GB GPU. We train ReshufflE for up to $1000$ epochs, minimizing the margin ranking loss (see Equation 8) with the Adam optimiser (?). If the Hits@10 score on the validation split of $\mathcal{G}_{\textit{Train}}$ does not increase by at least $1\%$ within $100$ epochs, we stop the training early. To account for small performance fluctuations, we repeat our experiments three times and report ReshufflE’s average performance. (Results for all seeds and the resulting standard deviations are provided in the supplementary materials.) For the final evaluation, we select the hyper-parameter configuration with the highest Hits@10 score on the validation split of $\mathcal{G}_{\textit{Train}}$. In accordance with ? (?), we evaluate ReshufflE’s test performance on $50$ negatively sampled entities per triple of the test split of $\mathcal{G}_{\textit{Test}}$ and report the Hits@10 scores. We list further details about the experimental setup in the supplementary materials. To facilitate ReshufflE’s reuse by our community, we will provide its source code in a public GitHub repository upon acceptance of our paper.
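The sampled-negative Hits@10 metric described above can be sketched as follows. This is a minimal illustration rather than the evaluation code used for the paper: the function names, the `score_fn` interface, the choice to corrupt only the tail entity, the pessimistic tie-breaking, and the absence of filtering of known true triples are all assumptions made for the example.

```python
import random


def hits_at_k(pos_score, neg_scores, k=10):
    """Return 1.0 if the positive triple ranks within the top k, else 0.0.

    Higher scores are assumed to mean "more plausible". Ties are resolved
    pessimistically: a negative with an equal score counts as ranked above.
    """
    rank = 1 + sum(1 for s in neg_scores if s >= pos_score)
    return 1.0 if rank <= k else 0.0


def mean_hits_at_k(test_triples, entities, score_fn,
                   num_negatives=50, k=10, seed=0):
    """Average Hits@k over test triples, corrupting the tail entity only."""
    rng = random.Random(seed)
    total = 0.0
    for head, rel, tail in test_triples:
        candidates = [e for e in entities if e != tail]
        negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
        neg_scores = [score_fn(head, rel, neg) for neg in negatives]
        total += hits_at_k(score_fn(head, rel, tail), neg_scores, k)
    return total / len(test_triples)
```

With 50 negatives per test triple, a triple counts as a hit when at most nine of the sampled negatives score at least as high as the true triple.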

Baselines

As the analysis in Sections 5 and 6 reveals, our GNN model acts as a kind of differentiable rule base. We therefore compare ReshufflE to existing approaches for differentiable rule learning: Neural-LP (?) and DRUM (?). We also compare our method to two classical rule learning methods: RuleN (?) and AnyBURL (?). Finally, we include a comparison with GNN-based approaches: CoMPILE (?), GraIL (?), and NBFNet (?).

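\begin{tabular}{lcccccccccccc}
 & \multicolumn{4}{c}{FB15k-237} & \multicolumn{4}{c}{WN18RR} & \multicolumn{4}{c}{NELL-995} \\
Method & v1 & v2 & v3 & v4 & v1 & v2 & v3 & v4 & v1 & v2 & v3 & v4 \\
\hline
ReshufflE2 & 0.304 & 0.569 & 0.385 & 0.916 & 0.293 & 0.309 & 0.155 & 0.270 & 0.488 & 0.558 & 0.334 & 0.370 \\
ReshufflEnL & 0.744 & 0.890 & 0.903 & 0.917 & 0.698 & 0.685 & 0.618 & 0.682 & 0.627 & 0.738 & 0.886 & 0.815 \\
ReshufflE & 0.747 & 0.885 & 0.903 & 0.918 & 0.710 & 0.729 & 0.602 & 0.694 & 0.638 & 0.861 & 0.882 & 0.812 \\
\end{tabular}
Table 3: Hits@10 for 50 negative samples on inductive KGC for each ablation of ReshufflE.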

Inductive KGC Results

Table 2 reports the performance of ReshufflE on the inductive benchmarks. The results of ReshufflE were obtained by us; AnyBURL and NBFNet results are from ? (?); Neural-LP, DRUM, RuleN, and GraIL results are from ? (?); and CoMPILE results are from ? (?). Table 2 reveals that ReshufflE consistently outperforms the differentiable rule learners DRUM and Neural-LP, often by a significant margin (with WN18RR v1 being the only exception). Compared to the traditional rule learners, ReshufflE performs clearly better on FB15k-237 and NELL-995 (apart from v1) but underperforms on WN18RR. ? (?) found that the kinds of rules needed for WN18RR are much noisier than those needed for FB15k-237 and NELL-995. Our use of ordering constraints may be less suitable in such cases. Finally, compared to the GNN-based methods, ReshufflE outperforms CoMPILE and GraIL on FB15k-237 and on NELL-995 v1 and v4, while again (mostly) underperforming on WN18RR. ReshufflE furthermore consistently underperforms the state-of-the-art method NBFNet. Recall, however, that ReshufflE is significantly more efficient than such GNN-based approaches, as ReshufflE can score the plausibility of a given triple almost instantaneously.

Ablation Study

Finally, we empirically investigate ReshufflE’s components. We consider two variants for this study, namely: $(i)$ ReshufflEnL, which does not add a self-loop relation to the KG (i.e. triples of the form $(e,\textit{eq},e)$); and $(ii)$ ReshufflE2, which allows for more general $\mathbf{B_{r}}$ matrices. In particular, different from ReshufflE, which applies the softmax function on the rows of $\mathbf{B_{r}}$ (see Section 4), ReshufflE2 squares the $\mathbf{B_{r}}$ matrices component-wise, thereby allowing them to contain arbitrary positive values. For a fair comparison, we train each of ReshufflE’s versions with the same hyper-parameter values, experimental setup, and evaluation protocol (see supplementary materials). Table 3 depicts the outcome of this study. It reveals that ReshufflE performs comparably to or better than ReshufflEnL and dramatically outperforms ReshufflE2 on all benchmarks. The similar performance of ReshufflE and ReshufflEnL on most datasets suggests that the self-loop relation only matters in specific cases, which may not occur frequently in some datasets. The poor performance of ReshufflE2 is as expected since allowing arbitrary positive parameters makes overfitting the training data more likely.
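The difference between the two parametrisations of $\mathbf{B_{r}}$ can be made concrete with a small sketch. Section 4 is not reproduced here, so the code only illustrates the two operations named above (row-wise softmax versus component-wise squaring) on a hypothetical matrix of unconstrained parameters; the function names and values are illustrative assumptions.

```python
import math


def row_softmax(matrix):
    """Apply the softmax function to each row of a matrix (list of lists)."""
    result = []
    for row in matrix:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([v / total for v in exps])
    return result


def square_entries(matrix):
    """Square every entry: values are non-negative, rows are unconstrained."""
    return [[v * v for v in row] for row in matrix]


# Hypothetical unconstrained parameters for one relation.
raw = [[2.0, -1.0, 0.5], [0.0, 3.0, -4.0]]
b_softmax = row_softmax(raw)    # ReshufflE-style: each row sums to 1
b_squared = square_entries(raw)  # ReshufflE2-style: no bound on row sums
```

With the softmax, every row is a set of weights summing to one, whereas squaring leaves the magnitude of each row free, which is consistent with the overfitting explanation given above.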

8 Conclusions

We have proposed a region-based knowledge graph embedding model that can faithfully capture rule bases. Specifically, we have shown that embeddings can be constructed that exactly capture the deductive closure of a rule base, provided that the rules are left-regular, a condition which is inspired by left-regular grammars. Furthermore, we have shown that for arbitrary sets of closed path rules, we can learn embeddings which faithfully capture consequences that can be inferred using a bounded number of steps. In this way, our approach goes significantly beyond existing region-based embedding models. An important design choice is that our entity embeddings are constructed using a monotonic GNN, which essentially acts as a differentiable representation of a rule base. We introduced the notion of the rule graph to make this connection between the GNN model and rule bases explicit. The monotonic nature of the GNN also has practical advantages, in particular, the fact that entity embeddings can easily and efficiently be updated when new knowledge becomes available. However, this approach is perhaps less suitable for cases where we need to weigh different pieces of weak evidence (as illustrated by the disappointing results on WN18RR). In such cases, when further evidence becomes available, we may want to revise earlier assumptions, which is not possible with the proposed model. Developing effective models that can provably simulate non-monotonic (or probabilistic) reasoning thus remains an important challenge for future work.

Appendix A Constructing GNNs from Rule Graphs

Let $\mathcal{P}$ be a set of closed path rules and let $\mathcal{H}$ be a corresponding rule graph, satisfying the conditions (R1)–(R4). We also assume that a knowledge graph $\mathcal{G}$ is given. We show that the GNN, which is constructed based on $\mathcal{H}$, correctly simulates the rules from $\mathcal{P}$. For the proofs, it will be more convenient to characterise the GNN in terms of operations on the coordinates of entity embeddings. Specifically, let $Z_{i}=\{(i-1)k+1,...,(i-1)k+k\}$ and let $N_{r}\subseteq\{n_{1},...,n_{\ell}\}$ be the set of nodes from the rule graph $\mathcal{H}$ which have an incoming edge labelled with $r$. We define:

\[ I_{r}=\bigcup_{n_{i}\in N_{r}}Z_{i} \]

Let $n_{i}\in N_{r}$ and let $(n_{j},n_{i})$ be the unique incoming edge with label $r$. Then we define ($t\in\{1,...,k\}$):

\[ \sigma_{r}((i-1)k+t)=(j-1)k+t \]

Now let us define:

\[ \mu_{r}(e_{1},...,e_{d})=(e_{1}^{\prime},...,e_{d}^{\prime}) \]

where $e_{i}^{\prime}=e_{\sigma_{r}(i)}$ if $i\in I_{r}$ and $e_{i}^{\prime}=0$ otherwise. Let $\mathbf{e}^{(l)}$ be the entity embedding corresponding to the matrix $\mathbf{Z_{e}^{(l)}}$. In other words, if we write $z_{ij}$ for the components of $\mathbf{Z_{e}^{(l)}}$ and $e_{i}$ for the components of $\mathbf{e}^{(l)}$, then we have $z_{ij}=e_{(i-1)k+j}$. For a matrix $\mathbf{X}=(x_{ij})$, let us write $\textit{flatten}(\mathbf{X})$ for the vector that is obtained by concatenating the rows of $\mathbf{X}$. In particular, $\textit{flatten}(\mathbf{Z_{e}^{(l)}})=\mathbf{e}^{(l)}$. The following lemma reveals how the GNN constructed from the rule graph $\mathcal{H}$ can be characterised in terms of entity embeddings.

Lemma 1.

It holds that $\textit{flatten}(\mathbf{B_{r}}\mathbf{Z_{e}^{(l)}})=\mu_{r}(\mathbf{e}^{(l)})$.

Proof.

Let us write $\textit{flatten}(\mathbf{B_{r}}\mathbf{Z_{e}^{(l)}})=(x_{1},...,x_{d})$, $\mu_{r}(\mathbf{e}^{(l)})=(y_{1},...,y_{d})$ and $\mathbf{e}^{(l)}=(e_{1},...,e_{d})$. Let $i\in\{1,...,\ell\}$. Let us first assume that $n_{i}$ does not have any incoming edges in $\mathcal{H}$ which are labelled with $r$. In that case, row $i$ of $\mathbf{B_{r}}$ consists only of 0s and we have $x_{(i-1)k+1}=...=x_{(i-1)k+k}=0$. Similarly, we then also have $(i-1)k+j\notin I_{r}$ for $j\in\{1,...,k\}$ and thus $y_{(i-1)k+1}=...=y_{(i-1)k+k}=0$. Now assume that there is an edge from $n_{j}$ to $n_{i}$ which is labelled with $r$. Then row $i$ of $\mathbf{B_{r}}$ is a one-hot vector with 1 at position $j$. Accordingly, we have $x_{(i-1)k+t}=e_{(j-1)k+t}$ for $t\in\{1,...,k\}$. By the definition of $\sigma_{r}$, we also have $\sigma_{r}((i-1)k+t)=(j-1)k+t$ and thus $y_{(i-1)k+t}=e_{(j-1)k+t}$. ∎
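Lemma 1 can be checked numerically on a toy rule graph. The sketch below is only an illustration under stated assumptions: the graph with $\ell=3$ nodes, the block size $k=2$, the two $r$-labelled edges and all helper names are hypothetical, and $\mathbf{B_{r}}$ is taken to be the 0/1 matrix described in the proof (row $i$ is one-hot at position $j$ when there is an $r$-edge from $n_{j}$ to $n_{i}$).

```python
def build_b(num_nodes, edges):
    """Build B_r from r-labelled edges (source j, target i), 1-indexed."""
    b = [[0] * num_nodes for _ in range(num_nodes)]
    for j, i in edges:
        b[i - 1][j - 1] = 1  # row i is one-hot at position j
    return b


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][m] * b[m][c] for m in range(len(b)))
             for c in range(len(b[0]))] for i in range(len(a))]


def flatten(matrix):
    """Concatenate the rows of a matrix into one vector."""
    return [v for row in matrix for v in row]


def mu(embedding, num_nodes, k, edges):
    """Coordinate-wise mu_r defined via sigma_r (1-indexed coordinates)."""
    sigma = {}
    for j, i in edges:
        for t in range(1, k + 1):
            sigma[(i - 1) * k + t] = (j - 1) * k + t
    # Coordinates outside I_r (not in sigma) are set to 0.
    return [embedding[sigma[c] - 1] if c in sigma else 0
            for c in range(1, num_nodes * k + 1)]


num_nodes, k = 3, 2
edges_r = [(1, 2), (2, 3)]          # r-edges n_1 -> n_2 and n_2 -> n_3
z_e = [[1, 2], [3, 4], [5, 6]]      # rows are the k-dimensional blocks
lhs = flatten(matmul(build_b(num_nodes, edges_r), z_e))
rhs = mu(flatten(z_e), num_nodes, k, edges_r)
```

Both sides evaluate to the same vector here: the block of $n_{1}$ is zeroed out because $n_{1}$ has no incoming $r$-edge, while the blocks of $n_{2}$ and $n_{3}$ receive copies of the blocks of $n_{1}$ and $n_{2}$, respectively.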

For a sequence of relations $r_{1},...,r_{p}$, we define $\mu_{r_{1};...;r_{p}}$ as follows. We define $\mu_{r_{1};...;r_{p}}(x_{1},...,x_{d})=(y_{1},...,y_{d})$, where ($i\in\{1,...,\ell\}$, $t\in\{1,...,k\}$):

\[ y_{(i-1)k+t}=\begin{cases}x_{(j-1)k+t}&\text{if there is an $r_{1};...;r_{p}$ path from $n_{j}$ to $n_{i}$}\\ 0&\text{otherwise}\end{cases} \]

Note that if there is an $r_{1};...;r_{p}$ path arriving at node $n_{i}$ in the rule graph, it has to be unique, given that each node has at most one incoming edge of a given type. In the following, we will also use $I_{r_{1};...;r_{p}}$, defined as follows:

\[ I_{r_{1};...;r_{p}}=\{(i-1)k+t\,|\,\text{there is an $r_{1};...;r_{p}$ path ending in $n_{i}$}\} \]

We have the following result.

Lemma 2.

For $r_{1},...,r_{p}\in\mathcal{R}$ we have

\[ \mu_{r_{1};...;r_{p}}(x_{1},...,x_{d})=\mu_{r_{p}}(...\mu_{r_{1}}(x_{1},...,x_{d})...) \]
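The identity stated in Lemma 2 can be checked on a small example before turning to the proof. The following sketch assumes a hypothetical rule graph with four nodes, block size $k=2$ and two relations; `mu_path` follows the path-based definition of $\mu_{r_{1};...;r_{p}}$, while `compose` applies the single-relation maps one after the other. All names and values are illustrative.

```python
def mu_single(x, k, num_nodes, edges_r):
    """mu_r: block i copies block j if there is an r-edge (n_j, n_i)."""
    source = {i: j for j, i in edges_r}  # at most one incoming r-edge
    y = []
    for i in range(1, num_nodes + 1):
        if i in source:
            j = source[i]
            y.extend(x[(j - 1) * k:(j - 1) * k + k])
        else:
            y.extend([0] * k)
    return y


def path_source(i, relations, edges):
    """Follow the unique r_1;...;r_p path backwards from n_i, if any."""
    node = i
    for r in reversed(relations):
        source = {tgt: src for src, tgt in edges.get(r, [])}
        if node not in source:
            return None
        node = source[node]
    return node


def mu_path(x, k, num_nodes, relations, edges):
    """Path-based definition of mu_{r_1;...;r_p}."""
    y = []
    for i in range(1, num_nodes + 1):
        j = path_source(i, relations, edges)
        y.extend(x[(j - 1) * k:(j - 1) * k + k] if j is not None else [0] * k)
    return y


def compose(x, k, num_nodes, relations, edges):
    """Apply mu_{r_1} first and mu_{r_p} last."""
    for r in relations:
        x = mu_single(x, k, num_nodes, edges.get(r, []))
    return x


# Hypothetical rule graph: n_1 -r1-> n_2 -r2-> n_3 -r1-> n_4.
edges = {"r1": [(1, 2), (3, 4)], "r2": [(2, 3)]}
x = [1, 2, 3, 4, 5, 6, 7, 8]
direct = mu_path(x, 2, 4, ["r1", "r2"], edges)
nested = compose(x, 2, 4, ["r1", "r2"], edges)
```

In this example only the block of $n_{3}$ is non-zero in both results, since $n_{3}$ is the only node at the end of an $r_{1};r_{2}$ path, and it receives the block of $n_{1}$.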
Proof.

It is sufficient to show

\[ \mu_{r_{1};...;r_{p}}(x_{1},...,x_{d})=\mu_{r_{p}}(\mu_{r_{1};...;r_{p-1}}(x_{1},...,x_{d})) \]

We have $\mu_{r_{1};...;r_{p-1}}(x_{1},...,x_{d})=(y_{1},...,y_{d})$, with

\[ y_{(i-1)k+t}=\begin{cases}x_{(j-1)k+t}&\text{if there is an $r_{1};...;r_{p-1}$ path from $n_{j}$ to $n_{i}$}\\ 0&\text{otherwise}\end{cases} \]

We furthermore have $\mu_{r_{p}}(y_{1},...,y_{d})=(z_{1},...,z_{d})$ with

\[ z_{(i-1)k+t}=\begin{cases}y_{(j-1)k+t}&\text{if there is an $r_{p}$-edge from $n_{j}$ to $n_{i}$}\\ 0&\text{otherwise}\end{cases} \]

Taking into account the definition of $(y_{1},\ldots,y_{d})$, we have $y_{(j-1)k+t}\neq 0$ only if there is an $r_{1};\ldots;r_{p-1}$ path from some node $n_{l}$ to the node $n_{j}$, in which case we have $y_{(j-1)k+t}=x_{(l-1)k+t}$. In other words, we have:

\[
z_{(i-1)k+t}=\begin{cases}x_{(l-1)k+t}&\text{if there is an $r_{1};\ldots;r_{p-1}$ path from $n_{l}$ to some $n_{j}$ and an $r_{p}$-edge from $n_{j}$ to $n_{i}$}\\ 0&\text{otherwise}\end{cases}
\]

Since such a path followed by such an edge is exactly an $r_{1};\ldots;r_{p}$ path, this means

\[
z_{(i-1)k+t}=\begin{cases}x_{(l-1)k+t}&\text{if there is an $r_{1};\ldots;r_{p}$ path from $n_{l}$ to $n_{i}$}\\ 0&\text{otherwise}\end{cases}
\]

We thus have $(z_{1},\ldots,z_{d})=\mu_{r_{1};\ldots;r_{p}}(x_{1},\ldots,x_{d})$. ∎
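The composition argument above can be checked on a toy example. The following is a minimal sketch, not the authors' implementation: it assumes non-negative coordinates, assumes that a block with several incoming edges takes the coordinate-wise maximum of its sources, and uses 0-indexed nodes. The helper names `mu` and `compose`, and the toy edge sets `r1` and `r2`, are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (source node j, target node i) in the rule graph


def mu(edges: Set[Edge], x: List[float], ell: int, k: int) -> List[float]:
    """Block-wise map for one relation (or one path type).

    Block i of the result is the coordinate-wise max of the blocks j of x
    with (j, i) in `edges`, and all zeros if there is no such j.
    Assumption: coordinates of x are non-negative.
    """
    z = [0.0] * (ell * k)
    for (j, i) in edges:
        for t in range(k):
            z[i * k + t] = max(z[i * k + t], x[j * k + t])
    return z


def compose(first: Set[Edge], second: Set[Edge]) -> Set[Edge]:
    """Pairs (l, i) joined by a `first` edge followed by a `second` edge."""
    return {(l, i) for (l, j) in first for (j2, i) in second if j == j2}


ELL, K = 3, 2                       # 3 rule-graph nodes, blocks of size 2
r1 = {(0, 1), (2, 1)}               # hypothetical r1-edges
r2 = {(1, 2), (1, 0)}               # hypothetical r2-edges
x = [0.3, 0.9, 0.5, 0.1, 0.7, 0.2]  # one embedding, d = ELL * K

step_by_step = mu(r2, mu(r1, x, ELL, K), ELL, K)  # mu_{r2}(mu_{r1}(x))
in_one_go = mu(compose(r1, r2), x, ELL, K)        # mu_{r1;r2}(x)
assert step_by_step == in_one_go
```

Applying the two maps in sequence and applying the map of the composed path type give the same vector here, which is the statement of the lemma for $p=2$.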

We also have the following result.

Lemma 3.

Suppose $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$. Then there exist paths of type $r^{1}_{1};\ldots;r^{1}_{q_{1}}$ and $r^{2}_{1};\ldots;r^{2}_{q_{2}}$ and … and $r^{l}_{1};\ldots;r^{l}_{q_{l}}$, all of whose eq-reduced type is $r_{1};\ldots;r_{p}$, such that for every embedding $(x_{1},\ldots,x_{d})$ we have:

\[
\mu_{r}(x_{1},\ldots,x_{d})\preccurlyeq\max_{i=1}^{l}\mu_{r^{i}_{1};\ldots;r^{i}_{q_{i}}}(x_{1},\ldots,x_{d})
\]
Proof.

This follows immediately from the fact that whenever there is an $r$-edge between two nodes $n$ and $n^{\prime}$, there must also be a path between these nodes whose eq-reduced type is $r_{1};\ldots;r_{p}$, because of condition (R3). ∎

The following result shows that the GNN will correctly predict all triples that can be inferred from $\mathcal{G}\cup\mathcal{P}$.

Proposition 7.

Let $\mathcal{P}$ be a rule base and $\mathcal{G}$ a knowledge graph. Suppose $\mathcal{P}\cup\mathcal{G}\models(a,r,b)$. Let $\mathcal{H}$ be a rule graph for $\mathcal{P}$ and let $\mathbf{Z_{e}^{(l)}}$ be the entity representations that are learned by the corresponding GNN. Assume that, for some $m\in\mathbb{N}$, we have $\mathbf{Z_{e}^{(m)}}=\mathbf{Z_{e}^{(m+1)}}$ for every entity $e$. Then it holds that $\mathbf{B_{r}}\mathbf{Z_{a}^{(m)}}\preceq\mathbf{Z_{b}^{(m)}}$.

Proof.

Because of Lemma 1, it is sufficient to show that $\mu_{r}(\mathbf{a^{(m)}})\preceq\mathbf{b^{(m)}}$. If $\mathcal{G}$ contains the triple $(a,r,b)$ then the result is trivially satisfied. Otherwise, $\mathcal{P}\cup\mathcal{G}\models r(a,b)$ implies that $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$, for some $r_{1},\ldots,r_{p},r\in\mathcal{R}$ such that $\mathcal{G}$ contains triples $(a,r_{1},a_{2}),(a_{2},r_{2},a_{3}),\ldots,(a_{p},r_{p},b)$, for some $a_{2},\ldots,a_{p}\in\mathcal{E}$. Because $(a,r_{1},a_{2})\in\mathcal{G}$, by construction, it holds for each $i\in\mathbb{N}$ that:

\[
\mu_{r_{1}}(\mathbf{a^{(i)}})\preccurlyeq\mathbf{a_{2}^{(i+1)}}
\]

Similarly, because $(a_{2},r_{2},a_{3})\in\mathcal{G}$, we have $\mu_{r_{2}}(\mathbf{a_{2}^{(i+1)}})\preccurlyeq\mathbf{a_{3}^{(i+2)}}$ and thus, since $\mu_{r_{2}}$ is monotone,

\[
\mu_{r_{2}}(\mu_{r_{1}}(\mathbf{a^{(i)}}))\preccurlyeq\mu_{r_{2}}(\mathbf{a_{2}^{(i+1)}})\preccurlyeq\mathbf{a_{3}^{(i+2)}}
\]

In other words, we have

\[
\mu_{r_{1};r_{2}}(\mathbf{a^{(i)}})\preccurlyeq\mathbf{a_{3}^{(i+2)}}
\]

Continuing in the same way, we find that

\[
\mu_{r_{1};\ldots;r_{p-1};r_{p}}(\mathbf{a^{(i)}})\preccurlyeq\mathbf{b^{(i+p)}}
\]

Now consider a path of type $r^{\prime}_{1};\ldots;r_{q}^{\prime}$ whose eq-reduced type is $r_{1};\ldots;r_{p}$. Then we have that $\mathcal{G}$ contains triples of the form $(a,r^{\prime}_{1},b_{2}),(b_{2},r^{\prime}_{2},b_{3}),\ldots,(b_{q},r^{\prime}_{q},b)$. Indeed, the only triples that need to be considered in addition to the triples $(a,r_{1},a_{2}),(a_{2},r_{2},a_{3}),\ldots,(a_{p},r_{p},b)$ are of the form $(a_{i},\textit{eq},a_{i})$, which we have assumed to belong to $\mathcal{G}$ for every $a_{i}\in\mathcal{E}$. For every path of type $r^{\prime}_{1};\ldots;r_{q}^{\prime}$ whose eq-reduced type is $r_{1};\ldots;r_{p}$, we thus find, by the same argument as before, that

\[
\mu_{r^{\prime}_{1};\ldots;r^{\prime}_{q}}(\mathbf{a^{(i)}})\preccurlyeq\mathbf{b^{(i+p)}}
\]

Because of Lemma 3, this implies

\[
\mu_{r}(\mathbf{a^{(i)}})\preccurlyeq\mathbf{b^{(i+p)}}
\]

In particular, we have

\[
\mu_{r}(\mathbf{a^{(m)}})\preccurlyeq\mathbf{b^{(m+p)}}
\]

and because of the assumption that the GNN has converged after $m$ steps, we also have $\mu_{r}(\mathbf{a^{(m)}})\preccurlyeq\mathbf{b^{(m)}}$. ∎
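The chaining step of this proof can be illustrated by running a small max-aggregation message-passing loop to its fixpoint. This is only a sketch under stated assumptions: the update rule $\mathbf{e}^{(i+1)}=\max(\mathbf{e}^{(i)},\max_{(h,r,e)\in\mathcal{G}}\mu_{r}(\mathbf{h}^{(i)}))$ is assumed rather than taken from the construction itself, coordinates are assumed non-negative, and the names `mu`, `run_gnn`, `leq`, the toy rule-graph edges, and the toy knowledge graph are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]           # (source node j, target node i), 0-indexed
Triple = Tuple[str, str, str]    # (head entity, relation, tail entity)


def mu(edges: Set[Edge], x: List[float], ell: int, k: int) -> List[float]:
    """Block i of the result is the max of the blocks j of x with (j, i) in edges."""
    z = [0.0] * (ell * k)
    for (j, i) in edges:
        for t in range(k):
            z[i * k + t] = max(z[i * k + t], x[j * k + t])
    return z


def leq(x: List[float], y: List[float]) -> bool:
    """Coordinate-wise comparison x <= y."""
    return all(u <= v for u, v in zip(x, y))


def run_gnn(kg: List[Triple], rule_edges: Dict[str, Set[Edge]],
            emb0: Dict[str, List[float]], ell: int, k: int,
            max_layers: int = 50) -> Dict[str, List[float]]:
    """Iterate the assumed max-aggregation update until nothing changes."""
    emb = {e: list(v) for e, v in emb0.items()}
    for _ in range(max_layers):
        new = {e: list(v) for e, v in emb.items()}
        for (h, r, t) in kg:
            msg = mu(rule_edges[r], emb[h], ell, k)   # message from layer i
            new[t] = [max(u, v) for u, v in zip(new[t], msg)]
        if new == emb:                                # Z^(m) = Z^(m+1)
            return emb
        emb = new
    raise RuntimeError("no fixpoint reached within max_layers")


ELL, K = 2, 2
rule_edges = {"r1": {(0, 1)}, "r2": {(1, 0)}}
kg = [("a", "r1", "c"), ("c", "r2", "b")]
emb0 = {"a": [0.8, 0.1, 0.2, 0.6],
        "c": [0.3, 0.4, 0.5, 0.2],
        "b": [0.1, 0.9, 0.4, 0.3]}

final = run_gnn(kg, rule_edges, emb0, ELL, K)

# Every stated triple is captured at the fixpoint ...
for (h, r, t) in kg:
    assert leq(mu(rule_edges[r], final[h], ELL, K), final[t])
# ... and so is the composed path r1;r2 from a to b.
path_image = mu(rule_edges["r2"], mu(rule_edges["r1"], final["a"], ELL, K), ELL, K)
assert leq(path_image, final["b"])
```

The loop terminates because every coordinate is non-decreasing across layers and can only take values that occur in the initial embeddings.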

For $e\in\mathcal{E}$, let $\textit{paths}_{\mathcal{G}}(e)$ be the set of all paths in the knowledge graph $\mathcal{G}$ which end in $e$. For a path $\pi$ in $\textit{paths}_{\mathcal{G}}(e)$, we write $\mathit{head}(\pi)$ for the entity where the path starts and $\mathit{rels}(\pi)$ for the corresponding sequence of relations. For an entity $e$, we write $\textit{emb}_{m}(e)$ for its embedding in layer $m$, i.e. $\textit{emb}_{m}(e)=\mathbf{e^{(m)}}$. The following observation follows immediately from the construction of the GNN, together with Lemma 2.

Lemma 4.

For any entity $e\in\mathcal{E}$ it holds that

\[
\mathbf{e^{(m)}}\preceq\max\Big(\mathbf{e^{(0)}},\max_{\pi\in\textit{paths}_{\mathcal{G}}(e)}\mu_{\mathit{rels}(\pi)}\big(\textit{emb}_{0}(\textit{head}(\pi))\big)\Big)
\]

We will also need the following technical lemma.

Lemma 5.

Suppose $\mathcal{P}\cup\mathcal{G}\not\models(a,r,b)$. Then there is some $i\in\{1,\ldots,\ell\}$ such that:

  • $Z_{i}\subseteq I_{r}$; and

  • whenever $\pi\in\textit{paths}_{\mathcal{G}}(b)$ with $\textit{head}(\pi)=a$, it holds that $I_{\textit{rels}(\pi)}\cap Z_{i}=\emptyset$.

Proof.

Let us write $\mathcal{Z}_{r}=\{i\in\{1,\ldots,\ell\}\,|\,Z_{i}\subseteq I^{1}_{r}\}$. Note that $i\in\mathcal{Z}_{r}$ iff node $n_{i}$ in $\mathcal{H}$ has an incoming $r$-edge. It thus follows from condition (R1) that $\mathcal{Z}_{r}\neq\emptyset$. Suppose, towards a contradiction, that for every $i\in\mathcal{Z}_{r}$ there were some $\pi\in\textit{paths}_{\mathcal{G}}(b)$ with $\textit{head}(\pi)=a$ such that $I_{\textit{rels}(\pi)}\cap Z_{i}\neq\emptyset$. Let us write $X=\{\textit{rels}(\pi)\,|\,\pi\in\textit{paths}_{\mathcal{G}}(b),\textit{head}(\pi)=a,I_{\textit{rels}(\pi)}\cap Z_{i}\neq\emptyset\}$. We then have that for every $r$-edge in $\mathcal{H}$, there is a path $\tau$ connecting the same nodes, with $\textit{rels}(\tau)\in X$. From condition (R4), it then follows that $\mathcal{P}\cup\mathcal{G}\models(a,r,b)$, a contradiction. ∎

The following result shows that the GNN is unlikely to predict triples that cannot be inferred from $\mathcal{G}\cup\mathcal{P}$, as long as the embeddings are sufficiently high-dimensional.

Proposition 8.

Let $\mathcal{P}$ be a rule base and $\mathcal{G}$ a knowledge graph. Let $\mathcal{H}$ be a rule graph for $\mathcal{P}$ and let $\mathbf{Z_{e}^{(l)}}$ be the entity representations that are learned by the corresponding GNN. For any $\varepsilon>0$, there exists some $k_{0}\in\mathbb{N}$ such that, when $k\geq k_{0}$, for any $m\in\mathbb{N}$ and $(a,r,b)\in\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ such that $\mathcal{P}\cup\mathcal{G}\not\models(a,r,b)$, we have

\[
\textit{Pr}[\mathbf{B_{r}}\mathbf{Z_{a}^{(m)}}\preceq\mathbf{Z_{b}^{(m)}}]\leq\varepsilon
\]
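To give some intuition for why the bound improves with the block size $k$, the following Monte Carlo sketch estimates how often one fixed coordinate value stays below the maximum of $p+1$ competing values on all $k$ coordinates of a block. It is an illustration only: it assumes that all values are drawn independently and uniformly from $[0,1]$, which is a simplification and not the sampling scheme of the construction, and the function name `estimate_false_positive_rate` is hypothetical. Under this assumption the exact probability is $\big(\tfrac{p+1}{p+2}\big)^{k}$.

```python
import random


def estimate_false_positive_rate(k: int, p: int,
                                 trials: int = 20000, seed: int = 0) -> float:
    """Estimate Pr[for all k coordinates: A <= max of (p + 1) other values].

    Assumption: every value is an independent Uniform(0, 1) sample.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        all_coordinates_pass = True
        for _ in range(k):
            a = rng.random()
            # One value standing in for B_j plus p values standing in for X_j^t.
            best_other = max(rng.random() for _ in range(p + 1))
            if a > best_other:
                all_coordinates_pass = False
                break
        if all_coordinates_pass:
            hits += 1
    return hits / trials


rate_small_block = estimate_false_positive_rate(k=1, p=1)
rate_large_block = estimate_false_positive_rate(k=8, p=1)
print(rate_small_block, rate_large_block)
```

In this simplified setting the estimate shrinks geometrically as $k$ grows, which mirrors the role of $k_{0}$ in the statement.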
Proof.

First, note that because of Lemma 1, what we need to show is equivalent to:

\[
\textit{Pr}[\mu_{r}(\mathbf{a^{(m)}})\preceq\mathbf{b^{(m)}}]\leq\varepsilon
\]

Let $(a,b)\in\mathcal{E}\times\mathcal{E}$ be such that $\mathcal{P}\cup\mathcal{G}\not\models(a,r,b)$. From Lemma 5, we know that there is some $i\in\{1,\ldots,\ell\}$ such that $Z_{i}\subseteq I^{1}_{r}$ and whenever $\pi\in\textit{paths}_{\mathcal{G}}(b)$ with $\textit{head}(\pi)=a$, it holds that $I_{\textit{rels}(\pi)}\cap Z_{i}=\emptyset$. The following condition is clearly a necessary requirement for $\mu_{r}(\mathbf{a^{(m)}})\preceq\mathbf{b^{(m)}}$:

\[
\forall j\in Z_{i}\,.\,\mu_{r}(\mathbf{a^{(m)}})\preccurlyeq_{j}\mathbf{b^{(m)}}
\]

where we write $(x_{1},\ldots,x_{d})\preccurlyeq_{j}(y_{1},\ldots,y_{d})$ for $x_{j}\leq y_{j}$. In particular, we also need that:

\[
\forall j\in Z_{i}\,.\,\mu_{r}(\mathbf{a^{(0)}})\preccurlyeq_{j}\mathbf{b^{(m)}}
\]

Due to Lemma 4, this is equivalent to requiring that for every $j\in Z_{i}$ we have:

\[
\mu_{r}(\mathbf{a^{(0)}})\preccurlyeq_{j}\max\big(\mathbf{b^{(0)}},\max_{\pi\in\textit{paths}_{\mathcal{G}}(b)}\mu_{\mathit{rels}(\pi)}\big(\textit{emb}_{0}(\textit{head}(\pi))\big)\big)
\]

We can view the coordinates of the input embeddings as random variables. The latter condition is thus equivalent to a condition of the following form:

\[
\forall j\in Z_{i}\,.\,A^{r}_{j}\leq\max(B_{j},X^{1}_{j},\ldots,X^{p}_{j})
\]

where $A^{r}_{j}$ is the random variable corresponding to the $j$th coordinate of $\mu_{r}(\mathbf{a^{(0)}})$, $B_{j}$ is the $j$th coordinate of $\mathbf{b^{(0)}}$ and $X^{1}_{j},\ldots,X^{p}_{j}$ are the random variables corresponding to the $j$th coordinate of the vectors $\mu_{\mathit{rels}(\pi)}\big(\textit{emb}_{0}(\textit{head}(\pi))\big)$. By construction, we have that the coordinates of different entity embeddings are sampled independently and that there are at least two distinct values that have a non-zero probability of being sampled for each coordinate. This means that there exists some value $\lambda>0$ such that $\textit{Pr}[A^{r}_{j}>B_{j}]\geq\lambda$ and $\textit{Pr}[A^{r}_{j}>X_{j}^{t}]\geq\lambda$ for each $t\in\{1,\ldots,p\}$. Moreover, since we have that whenever $\pi\in\textit{paths}_{\mathcal{G}}(b)$ with $\textit{head}(\pi)=a$ it holds that $I_{\textit{rels}(\pi)}\cap Z_{i}=\emptyset$, it follows that the random variable $A^{r}_{j}$ is not among $B_{j},X^{1}_{j},\ldots,X^{p}_{j}$. We thus have:

$$\begin{aligned}
\textit{Pr}[\forall j\in Z_{i}\,.\,A^{r}_{j}\leq\max(B_{j},X^{1}_{j},\ldots,X^{p}_{j})]
&\leq\left(1-\lambda^{p+1}\right)^{|Z_{i}|}\\
&=\left(1-\lambda^{p+1}\right)^{k}\\
&\leq e^{-k\lambda^{p+1}}
\end{aligned}$$

The value of $p$ is upper bounded by $\ell\cdot|\mathcal{E}|$, with $\ell$ the number of nodes in the rule graph. By choosing $k$ sufficiently large, we can thus make this probability arbitrarily small. In particular:

$$e^{-k\lambda^{p+1}}\leq\varepsilon\quad\Leftrightarrow\quad k\geq\frac{1}{\lambda^{p+1}}\log\frac{1}{\varepsilon}$$
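To get a feel for how quickly the required $k$ grows, the final equivalence can be evaluated numerically. The following is a minimal sketch, not part of the original construction: the function names `min_block_size` and `failure_bound` are hypothetical, and the values chosen for $\lambda$, $p$ and $\varepsilon$ are purely illustrative.

```python
import math


def failure_bound(lam: float, p: int, k: int) -> float:
    """Upper bound exp(-k * lam**(p+1)) on the failure probability derived above."""
    return math.exp(-k * lam ** (p + 1))


def min_block_size(lam: float, p: int, eps: float) -> int:
    """Smallest integer k satisfying k >= (1 / lam**(p+1)) * log(1 / eps)."""
    if not (0.0 < lam <= 1.0):
        raise ValueError("lam must lie in (0, 1]")
    if p < 0:
        raise ValueError("p must be non-negative")
    if not (0.0 < eps < 1.0):
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(math.log(1.0 / eps) / lam ** (p + 1))


# Illustrative values only: lam = 0.5, p = 2, eps = 0.01.
k = min_block_size(0.5, 2, 0.01)
print(k, failure_bound(0.5, 2, k))
```

Because the bound depends on $\lambda^{p+1}$, the required $k$ grows exponentially in $p$, which is why the upper bound $p\leq\ell\cdot|\mathcal{E}|$ matters for the argument.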

Appendix B Constructing Rule Graphs

We write $\mathcal{R}_{1}$ for the set of relations that appear in the head of some rule from the considered rule base, and $\mathcal{R}_{2}=\mathcal{R}\setminus\mathcal{R}_{1}$ for the remaining relations.

Proposition 9.

Let $\mathcal{P}$ be a set of closed path rules and suppose that there exists a rule graph $\mathcal{H}$ for $\mathcal{P}$. Let $\mathcal{R}_{1}$ be the set of relations that appear in the head of some rule in $\mathcal{P}$. It holds that the language $L_{r}$ is regular for every $r\in\mathcal{R}_{1}$.

Proof.

Let $\alpha(r_{i})=r_{i}$ if $r_{i}\in\mathcal{R}_{2}$ and $\alpha(r_{i})=\overline{r}_{i}$ otherwise. We clearly have that $\alpha(r_{1})\ldots\alpha(r_{k})\in L_{r}$ iff $\mathcal{P}$ entails the following rule:

$$r_{1}(X_{1},X_{2})\wedge\ldots\wedge r_{k}(X_{k},X_{k+1})\rightarrow r(X_{1},X_{k+1})$$

Since we have assumed that $\mathcal{P}$ has a rule graph, thanks to conditions (R3) and (R4), we can check whether this rule is valid by checking whether for each edge labelled with $r$ there is a path connecting the same nodes whose eq-reduced type is $r_{1};\ldots;r_{k}$. Let $(n_{i},n_{j})$ be an edge labelled with $r$. Then, we can construct a finite state machine (FSM) from $\mathcal{H}$ by treating $n_{i}$ as the start node and $n_{j}$ as the unique final node and interpreting eq edges as $\varepsilon$-transitions (i.e. corresponding to the empty string). Clearly, this FSM will accept the string $r_{1}\ldots r_{k}$ if there is a path labelled with $r_{1};\ldots;r_{k}$ connecting $n_{i}$ to $n_{j}$. For each edge labelled with $r$, we can construct such an FSM. Let $F_{1},\ldots,F_{m}$ be the languages associated with these FSMs. By construction, $L_{r}$ is the intersection of $F_{1},\ldots,F_{m}$. Since $F_{1},\ldots,F_{m}$ are regular and regular languages are closed under intersection, it follows that $L_{r}$ is regular as well. ∎
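The FSM argument in this proof can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the graph is given as a list of `(source, label, target)` edges, the label `"eq"` plays the role of an $\varepsilon$-transition, and the toy graph and node names (`n0`, `n1`, ...) are hypothetical rather than taken from the paper. The renaming $\alpha$ is ignored, so words are plain sequences of relation names.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, label, target node)
EQ = "eq"                    # label interpreted as an epsilon-transition


def eps_closure(nodes: Iterable[str], edges: Sequence[Edge]) -> Set[str]:
    """All nodes reachable from `nodes` using only eq edges."""
    closure = set(nodes)
    stack = list(closure)
    while stack:
        node = stack.pop()
        for src, label, dst in edges:
            if src == node and label == EQ and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(edges: Sequence[Edge], start: str, final: str, word: Sequence[str]) -> bool:
    """Simulate the FSM with start node `start` and unique final node `final`."""
    current = eps_closure({start}, edges)
    for symbol in word:
        step = {dst for src, label, dst in edges if src in current and label == symbol}
        current = eps_closure(step, edges)
        if not current:
            return False
    return final in current


def in_intersection(edges: Sequence[Edge], r: str, word: Sequence[str]) -> bool:
    """Check that `word` is accepted by the FSM of every r-labelled edge."""
    r_edges: List[Tuple[str, str]] = [(s, d) for s, label, d in edges if label == r]
    return bool(r_edges) and all(accepts(edges, s, d, word) for s, d in r_edges)


# Hypothetical toy graph: the r-edge (n0, n3) is mirrored by the path r1; eq; r2.
toy_edges = [
    ("n0", "r", "n3"),
    ("n0", "r1", "n1"),
    ("n1", "eq", "n2"),
    ("n2", "r2", "n3"),
]
print(in_intersection(toy_edges, "r", ["r1", "r2"]))
print(in_intersection(toy_edges, "r", ["r2", "r1"]))
```

Checking each $r$-edge separately and requiring all checks to succeed mirrors the intersection $F_{1}\cap\ldots\cap F_{m}$ used in the proof.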

B.1 Left Regular Rule Bases

We show that the graph resulting from the construction process satisfies the conditions (R1)-(R4). The fact that (R1) is satisfied follows from the following lemma.

Lemma 6.

Let $\mathcal{P}$ be a left-regular set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method. For every $r\in\mathcal{R}$, it holds that $\mathcal{H}$ contains an outgoing $r$-edge from $n_{0}$.

Proof.

Let $r\in\mathcal{R}$. The edge from $n_{0}$ to $n_{r}$ is added in step 2 of the construction process. This edge may be removed in step 4, but in that case, a new $r$-edge is added from $n_{0}$ to a fresh node. ∎

The fact that (R2) is satisfied follows immediately from the construction in step 4. We now move to condition (R3).

Lemma 7.

Let $\mathcal{P}$ be a left-regular set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method. If $\mathcal{P}$ contains the rule $r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\rightarrow r_{3}(X_{1},X_{3})$, then whenever two nodes $n$ and $n^{\prime}$ are connected in $\mathcal{H}$ by a path whose eq-reduced type is $r_{3}$, there is some node $n^{\prime\prime}$ such that $n$ and $n^{\prime\prime}$ are connected by a path whose eq-reduced type is $r_{1}$ and $n^{\prime\prime}$ and $n^{\prime}$ are connected by a path whose eq-reduced type is $r_{2}$.

Proof.

The stated assertion clearly holds after step 3 of the construction method. Indeed, the only $r_{3}$-edge in $\mathcal{H}$ is from $n_{0}$ to $n_{r_{3}}$. Note in particular that no $r_{3}$-edges can be added in step 3, given our assumption that $\mathcal{P}$ is left-regular. Finally, it is also easy to see that this property remains satisfied after step 4. ∎

The next lemma shows that (R3) is satisfied.

Lemma 8.

Let $\mathcal{P}$ be a left-regular set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method. Suppose nodes $n$ and $n^{\prime}$ are connected with an edge of type $r$ and suppose $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$. Then there is a path whose eq-reduced type is $r_{1};\ldots;r_{p}$ from $n$ to $n^{\prime}$.

Proof.

Assume $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$. Let $n$ and $n^{\prime}$ be nodes connected by an edge of type $r$. We show the result by structural induction. First, suppose $p=2$. In this case, the considered rule is of the form $r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\rightarrow r(X_{1},X_{3})$. It then follows from Lemma 7 that there is a path whose eq-reduced type is $r_{1};r_{2}$ connecting $n$ and $n^{\prime}$. Let us now consider the inductive case. If $p>2$ then $r_{1}(X_{1},X_{2})\wedge r_{2}(X_{2},X_{3})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$ is derived from at least two rules in $\mathcal{P}$ (given that the rules in $\mathcal{P}$ were restricted to have only two atoms in the body). The last step of the derivation of this rule is done by selecting some rule $s_{1}(X,Y)\wedge s_{2}(Y,Z)\rightarrow r(X,Z)$ from $\mathcal{P}$ such that

$$\begin{aligned}
\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge\ldots\wedge r_{i-1}(X_{i-1},X_{i})&\rightarrow s_{1}(X_{1},X_{i})\\
\mathcal{P}\models r_{i}(X_{i},X_{i+1})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})&\rightarrow s_{2}(X_{i},X_{p+1})
\end{aligned}$$

If there is a path from $n$ to $n^{\prime}$ whose eq-reduced type is $r$, we know from Lemma 7 that there must be a path from $n$ to $n^{\prime\prime}$ with eq-reduced type $s_{1}$ and a path from $n^{\prime\prime}$ to $n^{\prime}$ with eq-reduced type $s_{2}$, for some node $n^{\prime\prime}$ in $\mathcal{H}$. By induction, we furthermore know that there must then be a path with eq-reduced type $r_{1};\ldots;r_{i-1}$ from $n$ to $n^{\prime\prime}$ and a path with eq-reduced type $r_{i};\ldots;r_{p}$ from $n^{\prime\prime}$ to $n^{\prime}$. Thus, we find that there must be a path with eq-reduced type $r_{1};\ldots;r_{p}$ from $n$ to $n^{\prime}$. ∎

The fact that (R4) is satisfied follows from the next lemma.

Lemma 9.

Let $\mathcal{P}$ be a left-regular set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method. Suppose there is a path in $\mathcal{H}$ from $n_{0}$ to $n_{r}$ whose eq-reduced type is $r_{1};\ldots;r_{p}$. Then it holds that $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge\ldots\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$.

Proof.

The result clearly holds after step 2. We show that the result remains valid after each iteration of step 3. Suppose in step 3 we add an $r_{2}$-edge between $n_{r_{1}}$ and $n_{r_{3}}$. This means that:

$$\mathcal{P}\models r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r_{3}(X,Z)$$

Let $\tau$ be a path from $n_{0}$ to $n_{r}$. If $\tau$ does not contain the new $r_{2}$-edge, then the fact that the result is valid for $\tau$ follows by induction. Now, suppose that $\tau$ contains the new $r_{2}$-edge. Then $\tau$ is of the form $r_{i_{1}};\ldots;r_{i_{s}};r_{2};r_{j_{1}};\ldots;r_{j_{t}}$. By induction we have:

$$\mathcal{P}\models r_{i_{1}}(X_{1},X_{2})\wedge\ldots\wedge r_{i_{s}}(X_{s},X_{s+1})\rightarrow r_{1}(X_{1},X_{s+1})$$

Clearly there is a path from $n_{0}$ to $n_{r_{3}}$ with eq-reduced type $r_{3}$. In particular, there is a path from $n_{0}$ to $n_{r}$ with eq-reduced type $r_{3};r_{j_{1}};\ldots;r_{j_{t}}$. By induction, we thus have:

$$\begin{aligned}
\mathcal{P}&\models r_{3}(X_{0},X_{1})\wedge r_{j_{1}}(X_{1},X_{2})\wedge\ldots\\
&\qquad\wedge r_{j_{t}}(X_{t},X_{t+1})\rightarrow r(X_{0},X_{t+1})
\end{aligned}$$

Together we find that the stated result is satisfied.

Finally, we need to show that the result remains satisfied after step 4. This is clearly the case, as this step replaces edges of type $r$ with paths of type $r;\textit{eq};\ldots;\textit{eq}$. The eq-reduced types of the paths from $n_{0}$ to $n_{r}$ thus remain unchanged after this step. ∎
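The statement of Lemma 9 can be checked on a small example. The sketch below rebuilds only what the proofs above mention about the construction: step 2 adds an $r$-edge from $n_{0}$ to $n_{r}$ for every relation, and step 3 adds an $r_{2}$-edge between $n_{r_{1}}$ and $n_{r_{3}}$ for a rule with body $r_{1},r_{2}$ and head $r_{3}$. The edge direction (from $n_{r_{1}}$ to $n_{r_{3}}$), the omission of step 4, the function names and the two-rule example are all assumptions made for illustration, not the full construction method.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str, str]  # (r1, r2, r3) encodes r1(X,Y) & r2(Y,Z) -> r3(X,Z)
Edge = Tuple[str, str, str]  # (source node, label, target node)


def build_graph(relations: List[str], rules: List[Rule]) -> Set[Edge]:
    """Steps 2 and 3 only, as described in the proofs (step 4 is omitted)."""
    edges: Set[Edge] = set()
    for r in relations:              # step 2: an r-edge from n0 to n_r
        edges.add(("n0", r, f"n_{r}"))
    for r1, r2, r3 in rules:         # step 3: an r2-edge from n_{r1} to n_{r3}
        edges.add((f"n_{r1}", r2, f"n_{r3}"))
    return edges


def path_types(edges: Set[Edge], source: str, target: str, max_len: int) -> Set[Tuple[str, ...]]:
    """Label sequences of all paths from source to target with at most max_len edges."""
    found: Set[Tuple[str, ...]] = set()
    frontier: List[Tuple[str, Tuple[str, ...]]] = [(source, ())]
    for _ in range(max_len):
        next_frontier = []
        for node, labels in frontier:
            for src, label, dst in edges:
                if src == node:
                    extended = labels + (label,)
                    next_frontier.append((dst, extended))
                    if dst == target:
                        found.add(extended)
        frontier = next_frontier
    return found


# Hypothetical rule base: r1 & r2 -> s1 and s1 & r3 -> r.
example_relations = ["r1", "r2", "r3", "s1", "r"]
example_rules = [("r1", "r2", "s1"), ("s1", "r3", "r")]
example_graph = build_graph(example_relations, example_rules)
print(sorted(path_types(example_graph, "n0", "n_r", max_len=3)))
```

In this toy case the paths from `n0` to `n_r` have types `r`, `s1;r3` and `r1;r2;r3`, and each corresponds to a rule with head `r` that is entailed by the two example rules, as the lemma predicts.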

Proposition 10.

Let $\mathcal{P}$ be a left-regular set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method. It holds that $\mathcal{H}$ satisfies (R1)–(R4).

Proof.

The fact that (R1), (R3) and (R4) are satisfied follows immediately from Lemmas 6, 8 and 9. The fact that (R2) is satisfied follows trivially from the construction. ∎

B.2 Bounded Inference

Let $\textit{paths}^{m}_{\mathcal{G}}(b)$ be the set of all paths in $\mathcal{G}$ of length at most $m$ that end in $b$.
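As an illustration of this definition, the set of bounded paths can be enumerated by walking backwards from $b$. The following is a minimal sketch under assumptions: the knowledge graph is a list of `(head, relation, tail)` triples, a path is a tuple of consecutive triples, only paths with at least one edge are listed, and the helper names `bounded_paths`, `head` and `rels` are hypothetical stand-ins for $\textit{paths}^{m}_{\mathcal{G}}$, $\textit{head}(\pi)$ and $\textit{rels}(\pi)$.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (head entity, relation, tail entity)
Path = Tuple[Triple, ...]       # consecutive triples; the last one ends in b


def bounded_paths(triples: List[Triple], b: str, m: int) -> List[Path]:
    """All paths with between 1 and m edges that end in entity b."""
    results: List[Path] = []
    frontier: List[Path] = [(t,) for t in triples if t[2] == b]
    for _ in range(m):
        results.extend(frontier)
        # Extend every path by one edge at the front.
        frontier = [
            (t,) + path
            for path in frontier
            for t in triples
            if t[2] == path[0][0]
        ]
    return results


def head(path: Path) -> str:
    """First entity on the path."""
    return path[0][0]


def rels(path: Path) -> Tuple[str, ...]:
    """Sequence of relations along the path."""
    return tuple(r for _, r, _ in path)


# Hypothetical toy knowledge graph.
toy_kg = [("a", "r1", "c"), ("c", "r2", "b"), ("a", "r", "b")]
for path in bounded_paths(toy_kg, "b", m=2):
    print(head(path), rels(path))
```

Since entities may be revisited, the number of such paths can grow quickly with $m$, which is one reason the bounded setting fixes $m$ in advance.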

Lemma 10.

For any entity $e\in\mathcal{E}$ it holds that

$$\mathbf{e^{(m)}}\preceq\max\Big(\mathbf{e^{(0)}},\max_{\pi\in\textit{paths}^{m}_{\mathcal{G}}(e)}\mu_{\mathit{rels}(\pi)}\big(\textit{emb}_{0}(\textit{head}(\pi))\big)\Big)$$
Proof.

This follows immediately from the construction of the GNN. ∎

Lemma 11.

Let $\ell$ be the number of nodes in the given $m$-bounded rule graph. Suppose $\mathcal{P}\cup\mathcal{G}\not\models_{m}(a,r,b)$. Then there is some $i\in\{1,\ldots,\ell\}$ such that:

  • $Z_{i}\subseteq I_{r}$; and

  • whenever πpaths𝒢m+1(b)𝜋subscriptsuperscriptpaths𝑚1𝒢𝑏\pi\in\textit{paths}^{m+1}_{\mathcal{G}}(b)italic_π ∈ paths start_POSTSUPERSCRIPT italic_m + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT ( italic_b ) with head(π)=ahead𝜋𝑎\textit{head}(\pi)=ahead ( italic_π ) = italic_a, it holds that Irels(π)Zi=subscript𝐼rels𝜋subscript𝑍𝑖I_{\textit{rels}(\pi)}\cap Z_{i}=\emptysetitalic_I start_POSTSUBSCRIPT rels ( italic_π ) end_POSTSUBSCRIPT ∩ italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∅.

Proof.

This lemma is shown in exactly the same way as Lemma 5, simply replacing $\textit{paths}_{\mathcal{G}}(b)$ by $\textit{paths}^{m+1}_{\mathcal{G}}(b)$ and replacing Condition (R4) by Condition (R4m). ∎

Proposition 11.

Let $\mathcal{P}$ be a rule base and $\mathcal{G}$ a knowledge graph. Let $\mathcal{H}$ be an $m$-bounded rule graph for $\mathcal{P}$ and let $\mathbf{Z_{e}^{(l)}}$ be the entity representations that are learned by the corresponding GNN. For any $\varepsilon>0$, there exists some $k_{0}\in\mathbb{N}$ such that, when $k\geq k_{0}$, for any $i\leq m+1$ and $(a,r,b)\in\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ such that $\mathcal{P}\cup\mathcal{G}\not\models_{m}(a,r,b)$, we have

$$\textit{Pr}[\mathbf{B_{r}}\mathbf{Z_{a}^{(i)}}\preceq\mathbf{Z_{b}^{(i)}}]\leq\varepsilon$$
Proof.

This result is shown in the same way as Proposition 2, by relying on Lemma 11 instead of Lemma 5. ∎
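To make the event inside the probability concrete, the following sketch checks whether $\mathbf{B_{r}}\mathbf{Z_{a}}\preceq\mathbf{Z_{b}}$ holds for given matrices. It rests on assumptions that are not restated in this appendix: $\preceq$ is read as the component-wise order, $\mathbf{B_{r}}$ is taken to be a 0/1 matrix acting on the rows of the entity representation, and the helper names are hypothetical. Plain Python lists are used to keep the example self-contained.

```python
def matmul(B, Z):
    """Plain matrix product of B (n x n) and Z (n x d), as nested lists."""
    return [
        [sum(B[i][j] * Z[j][c] for j in range(len(Z))) for c in range(len(Z[0]))]
        for i in range(len(B))
    ]


def precedes(X, Y):
    """Component-wise comparison X <= Y (assumed reading of the order)."""
    return all(x <= y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


def captures(B_r, Z_a, Z_b):
    """True iff B_r Z_a precedes Z_b, i.e. the triple (a, r, b) is captured."""
    return precedes(matmul(B_r, Z_a), Z_b)


# Illustrative values only: B_r copies row 2 of Z_a into row 1 and zeroes row 2.
B_r = [[0, 1], [0, 0]]
Z_a = [[0.2, 0.1], [0.5, 0.3]]
Z_b_ok = [[0.6, 0.4], [0.0, 0.0]]
Z_b_bad = [[0.4, 0.4], [0.0, 0.0]]
```

Here `captures(B_r, Z_a, Z_b_ok)` holds while `captures(B_r, Z_a, Z_b_bad)` does not, because the first component 0.5 exceeds 0.4. Proposition 11 states that, for triples that are not $m$-entailed, the first situation occurs with probability at most $\varepsilon$.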

Let us now show the correctness of the proposed process for constructing $m$-bounded rule graphs. Conditions (R1) and (R2) are clearly satisfied. Next, we show that condition (R3) is satisfied.

Lemma 12.

Let $\mathcal{P}$ be a set of closed path rules and let $\mathcal{H}$ be the resulting $m$-bounded rule graph, constructed using the proposed process. Suppose nodes $n$ and $n^{\prime}$ are connected with an edge of type $r$ and suppose $\mathcal{P}\models r_{i_{1}}(X_{1},X_{2})\wedge r_{i_{2}}(X_{2},X_{3})\wedge...\wedge r_{i_{p}}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$. Then there is a path connecting $n$ to $n^{\prime}$, whose eq-reduced type is $r_{i_{1}};...;r_{i_{p}}$.

Proof.

First, we show that at the end of step 4, there must be a path of type $r_{i_{1}};...;r_{i_{p}}$ connecting $n$ and $n^{\prime}$. By construction, we immediately have that whenever two nodes $(n,n^{\prime})$ are connected with an $r_{i}$-edge and $\mathcal{P}$ contains the rule $r_{j}(X,Y)\wedge r_{l}(Y,Z)\rightarrow r_{i}(X,Z)$, there exists some node $n^{\prime\prime}$ such that there is an $r_{j}$-edge from $n$ to $n^{\prime\prime}$ and an $r_{l}$-edge from $n^{\prime\prime}$ to $n^{\prime}$.
The existence of a path of type $r_{i_{1}};...;r_{i_{p}}$ then follows in the same way as in the proof of Lemma 8. It remains to be shown that the claim remains valid after step 5. However, the paths in the final graph are those that can be found in the graph after step 4, with the possible addition of some eq-edges. This means in particular that after step 5, there must still be a path from $n$ to $n^{\prime}$ whose eq-reduced type is $r_{i_{1}};...;r_{i_{p}}$. ∎
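The last step of this argument relies only on how eq-edges affect the type of a path. As a small illustration, the sketch below assumes that the eq-reduced type of a path is its sequence of edge labels with all eq-labels dropped, in line with the definition in the main text. The function name and the label string `"eq"` are illustrative choices.

```python
def eq_reduced_type(edge_labels, eq_label="eq"):
    """Drop all eq-edges from a path's label sequence (assumed definition)."""
    return tuple(label for label in edge_labels if label != eq_label)


# A path after step 4 and the same path with eq-edges added in step 5.
after_step_4 = ["r1", "r2"]
after_step_5 = ["r1", "eq", "r2", "eq"]
```

Both label sequences reduce to the same type, which is why adding eq-edges in step 5 cannot invalidate the path found after step 4.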

Finally, the fact that (R4m) is satisfied follows from the following lemma.

Lemma 13.

Let $\mathcal{P}$ be a set of closed path rules, and let $\mathcal{H}$ be the resulting $m$-bounded rule graph, constructed using the process outlined above. Suppose there is a path from $n_{0}$ to $n_{r}$ whose eq-reduced type is $r_{1};...;r_{p}$, with $p\leq m+1$. Then it holds that $\mathcal{P}\models r_{1}(X_{1},X_{2})\wedge...\wedge r_{p}(X_{p},X_{p+1})\rightarrow r(X_{1},X_{p+1})$.

Proof.

We clearly have that the proposition holds after step 3 of the construction method. After step 3, if there is an $r$-link between nodes $n$ and $n^{\prime}$ and a rule $r_{1}(X,Y)\wedge r_{2}(Y,Z)\rightarrow r(X,Z)$ such that $n$ and $n^{\prime}$ are not connected by an $r_{1};r_{2}$ path, it must be the case that any path from $n_{0}$ to some node $n_{r}$ which contains the edge $(n,n^{\prime})$ must have a length of at least $m+1$. It follows that any path from $n_{0}$ to some node $n_{r}$ which contains an edge that was added during step 4 must have length at least $m+2$. We thus have in particular that the proposition still holds after step 4. The paths in the final graph are those that can be found in the graph after step 4, with the possible addition of some eq-edges. Since the proposition only depends on the eq-reduced types of the paths, the result still holds after step 5. ∎

Together, we have shown the following result.

Proposition 12.

Let $\mathcal{P}$ be a set of closed path rules and let $\mathcal{H}$ be the graph obtained using the proposed construction method for $m$-bounded rule graphs. It holds that $\mathcal{H}$ satisfies (R1)–(R3) and (R4m).

Appendix C Experimental Details

This section provides additional details about our experimental setup, benchmark datasets, and evaluation protocol. Section C.1 lists ReshufflE’s implementation details. The origins and licenses of the standard benchmarks for inductive KGC are discussed in Section C.2. Details on ReshufflE’s hyper-parameter optimisation are discussed in Section C.3. Finally, details about the evaluation protocol, together with the complete evaluation results, are provided in Section C.4.

C.1 Implementation Details

ReshufflE was implemented using the Python library PyKEEN 1.10.1 (?). PyKEEN is released under the MIT license and offers numerous benchmarks for KGC, which makes it convenient to reuse ReshufflE’s code for future applications and comparisons. Upon acceptance of our paper, we will provide ReshufflE’s source code in a public GitHub repository to further facilitate the reuse of ReshufflE by our community.

C.2 Benchmarks: Origins and Licenses

We did not find a license for any of the three inductive benchmarks or for their corresponding transductive supersets. WN18RR is a subset of the WordNet database (?), which describes lexical relations between English words. We did not find a license for this dataset. FB15k-237 is a subset of FB15k (?), which is in turn a subset of Freebase (?), a collaborative database that contains general knowledge in English, for example about celebrities and awards. We did not find a license for FB15k-237 but found that FB15k (?) uses the CC BY 2.5 license. Finally, NELL-995 (?) is a subset of NELL (?), a dataset that was extracted from semi-structured and natural-language data on the web and that includes information about, e.g., cities, companies, and sports teams. For NELL, we also did not find any license information.

C.3 Hyper-Parameter Optimisation

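| Dataset | Version | #Layers | $l$ | $k$ | $\lambda$ | lr |
|---|---|---|---|---|---|---|
| FB15k-237 | v1 | 4 | 25 | 80 | 2.0 | 0.005 |
| FB15k-237 | v2 | 3 | 30 | 60 | 1.0 | 0.005 |
| FB15k-237 | v3 | 5 | 25 | 40 | 0.5 | 0.005 |
| FB15k-237 | v4 | 3 | 30 | 80 | 1.0 | 0.01 |
| WN18RR | v1 | 3 | 20 | 40 | 1.0 | 0.01 |
| WN18RR | v2 | 3 | 20 | 60 | 0.5 | 0.01 |
| WN18RR | v3 | 3 | 20 | 40 | 1.0 | 0.01 |
| WN18RR | v4 | 3 | 30 | 80 | 1.0 | 0.01 |
| NELL-995 | v1 | 3 | 20 | 80 | 2.0 | 0.005 |
| NELL-995 | v2 | 4 | 30 | 60 | 2.0 | 0.01 |
| NELL-995 | v3 | 4 | 25 | 40 | 0.5 | 0.01 |
| NELL-995 | v4 | 4 | 30 | 60 | 1.0 | 0.01 |

Table 4: ReshufflE’s best-performing hyper-parameters on FB15k-237 v1-4, WN18RR v1-4, and NELL-995 v1-4.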
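
| | FB15k-237 v1 | FB15k-237 v2 | FB15k-237 v3 | FB15k-237 v4 | WN18RR v1 | WN18RR v2 | WN18RR v3 | WN18RR v4 | NELL-995 v1 | NELL-995 v2 | NELL-995 v3 | NELL-995 v4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Seed 1 | 0.751 | 0.879 | 0.905 | 0.918 | 0.713 | 0.727 | 0.614 | 0.693 | 0.630 | 0.874 | 0.871 | 0.816 |
| Seed 2 | 0.744 | 0.892 | 0.908 | 0.916 | 0.707 | 0.726 | 0.574 | 0.690 | 0.650 | 0.860 | 0.893 | 0.808 |
| Seed 3 | 0.746 | 0.883 | 0.897 | 0.918 | 0.710 | 0.736 | 0.617 | 0.698 | 0.635 | 0.848 | 0.881 | 0.812 |
| mean | 0.747 | 0.885 | 0.903 | 0.918 | 0.710 | 0.729 | 0.602 | 0.694 | 0.638 | 0.861 | 0.882 | 0.812 |
| stdv | 0.004 | 0.007 | 0.005 | 0.001 | 0.003 | 0.006 | 0.024 | 0.004 | 0.010 | 0.013 | 0.011 | 0.004 |

Table 5: ReshufflE’s benchmark Hits@10 scores on all seeds together with the mean (mean) and standard deviation (stdv) of Hits@10.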
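The summary rows of Table 5 can be recomputed from the per-seed scores. The short sketch below does this for two of the twelve columns using Python's standard `statistics` module. It assumes that the reported standard deviation is the sample standard deviation (dividing by $n-1$), which is consistent with these two columns but is not stated explicitly in the text.

```python
from statistics import mean, stdev

# Per-seed Hits@10 scores copied from Table 5 (two columns as examples).
hits_at_10 = {
    "FB15k-237 v1": [0.751, 0.744, 0.746],
    "WN18RR v3": [0.614, 0.574, 0.617],
}

# stdev() is the sample standard deviation (n - 1 in the denominator).
summary = {
    name: (round(mean(scores), 3), round(stdev(scores), 3))
    for name, scores in hits_at_10.items()
}
```

For both columns, the rounded values in `summary` agree with the mean and stdv rows of the table.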

Following ? (?), we manually tune ReshufflE’s hyper-parameters on the validation split of $\mathcal{G}_{\textit{Train}}$. We use the following ranges for the hyperparameters: the number of ReshufflE’s layers $\textit{\#Layers}\in\{3,4,5\}$, the embedding dimensionality parameters $l\in\{20,25,30\}$ and $k\in\{40,60,80\}$, the loss margin $\lambda\in\{0.5,1.0,2.0\}$, and finally the learning rate $\textit{lr}\in\{0.005,0.01\}$. We use the same batch and negative sampling size for all runs. In particular, we set the batch size to $1024$ and the negative sampling size to $100$. We report the best hyper-parameters for ReshufflE split by each inductive benchmark in Table 4. Finally, we reuse the same hyper-parameters for each of ReshufflE’s ablations, namely, ReshufflEnL and ReshufflE2.
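The ranges above span a small search space, which can be written down explicitly. The sketch below only enumerates the candidate configurations. It does not reproduce the manual tuning procedure, and the dictionary keys (`num_layers`, `margin`, `num_negatives`, and so on) are illustrative names rather than PyKEEN arguments.

```python
from itertools import product

# Ranges stated in the text; key names are illustrative.
search_space = {
    "num_layers": [3, 4, 5],
    "l": [20, 25, 30],
    "k": [40, 60, 80],
    "margin": [0.5, 1.0, 2.0],
    "lr": [0.005, 0.01],
}

# Settings that are kept fixed for all runs.
fixed = {"batch_size": 1024, "num_negatives": 100}

# One dictionary per candidate configuration.
configs = [
    dict(zip(search_space, values), **fixed)
    for values in product(*search_space.values())
]

# Best configuration for FB15k-237 v1 according to Table 4.
best_fb_v1 = {"num_layers": 4, "l": 25, "k": 80, "margin": 2.0, "lr": 0.005, **fixed}
```

The space contains $3\cdot 3\cdot 3\cdot 3\cdot 2=162$ configurations, and every row of Table 4 corresponds to one of them.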

C.4 Evaluation Protocol and Complete Results

Following the standard evaluation protocol for inductive KGC, introduced by ? (?), we evaluate ReshufflE’s final performance on the test split of the testing graph by measuring the ranking quality of any test triple $r(e,f)$ over $50$ randomly sampled entities $e^{\prime}_{i}\in\mathcal{E}$ and $f^{\prime}_{i}\in\mathcal{E}$: $r(e^{\prime}_{i},f)$ and $r(e,f^{\prime}_{i})$ for all $1\leq i\leq 50$. Following ? (?), we report the Hits@10 metric, i.e., the proportion of true triples (those within the test split of the testing graph) among the predicted triples whose rank is at most $10$.
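The protocol above can be sketched in a few lines. This is a simplified illustration and not the evaluation code used for the reported results: how ties are broken and whether the sampled corruptions are filtered or drawn without replacement are assumptions here, and all function names are hypothetical.

```python
import random


def sample_corruptions(triple, entities, n=50, rng=random):
    """Corrupt the head and the tail of a test triple n times each."""
    e, r, f = triple
    head_corruptions = [(rng.choice(entities), r, f) for _ in range(n)]
    tail_corruptions = [(e, r, rng.choice(entities)) for _ in range(n)]
    return head_corruptions, tail_corruptions


def rank_among_negatives(true_score, negative_scores):
    """Rank of the true triple; rank 1 is best.

    Assumption: only negatives with a strictly higher score are ranked
    ahead of the true triple.
    """
    return 1 + sum(score > true_score for score in negative_scores)


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose rank is at most k."""
    return sum(rank <= k for rank in ranks) / len(ranks)


# Illustrative scores only: 12 of the 50 negatives outscore the true triple.
example_rank = rank_among_negatives(0.5, [0.6] * 12 + [0.1] * 38)
example_hits = hits_at_k([1, example_rank, 10, 11])
```

In this toy example the true triple is ranked 13th, and two of the four ranks fall within the top 10.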

Table 5 states ReshufflE’s benchmark results over all inductive datasets, as well as their means and standard deviations.

References

  • 2020 Abboud, R.; Ceylan, İ. İ.; Lukasiewicz, T.; and Salvatori, T. 2020. BoxE: A box embedding model for knowledge base completion. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
  • 2021 Ali, M.; Berrendorf, M.; Hoyt, C. T.; Vermue, L.; Sharifzadeh, S.; Tresp, V.; and Lehmann, J. 2021. PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings. Journal of Machine Learning Research 22(82):1–6.
  • 2023 Anil, A.; Gutiérrez-Basulto, V.; Ibáñez-García, Y.; and Schockaert, S. 2023. Inductive knowledge graph completion with gnns and rules: An analysis. CoRR abs/2308.07942.
  • 2019 Balazevic, I.; Allen, C.; and Hospedales, T. M. 2019. TuckER: Tensor factorization for knowledge graph completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, 5184–5193. Association for Computational Linguistics.
  • 2013 Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, 2787–2795.
  • 2010 Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Jr., E. R. H.; and Mitchell, T. M. 2010. Toward an architecture for never-ending language learning. In Fox, M., and Poole, D., eds., Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010, 1306–1313. AAAI Press.
  • 2024 Charpenay, V., and Schockaert, S. 2024. Capturing knowledge graphs and rules with octagon embeddings. CoRR abs/2401.16270.
  • 2022 Chen, Y.; Mishra, P.; Franceschi, L.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2022. Refactor gnns: Revisiting factorisation-based models from a message-passing perspective. In Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; and Oh, A., eds., Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  • 2013 Galárraga, L. A.; Teflioudi, C.; Hose, K.; and Suchanek, F. M. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Schwabe, D.; Almeida, V. A. F.; Glaser, H.; Baeza-Yates, R.; and Moon, S. B., eds., 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, 413–422. International World Wide Web Conferences Steering Committee / ACM.
  • 2018 Gutiérrez-Basulto, V., and Schockaert, S. 2018. From knowledge graph embedding to ontology embedding? an analysis of the compatibility between vector space representations and rules. In Thielscher, M.; Toni, F.; and Wolter, F., eds., Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR 2018, Tempe, Arizona, 30 October - 2 November 2018, 379–388. AAAI Press.
  • 2015 Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • 2022 Leemhuis, M.; Özçep, Ö. L.; and Wolter, D. 2022. Learning with cone-based geometric models and orthologics. Ann. Math. Artif. Intell. 90(11-12):1159–1195.
  • 2021 Mai, S.; Zheng, S.; Yang, Y.; and Hu, H. 2021. Communicative message passing for inductive relation reasoning. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, 4294–4302. AAAI Press.
  • 2018 Meilicke, C.; Fink, M.; Wang, Y.; Ruffinelli, D.; Gemulla, R.; and Stuckenschmidt, H. 2018. Fine-grained evaluation of rule- and embedding-based systems for knowledge graph completion. In Vrandecic, D.; Bontcheva, K.; Suárez-Figueroa, M. C.; Presutti, V.; Celino, I.; Sabou, M.; Kaffee, L.; and Simperl, E., eds., The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I, volume 11136 of Lecture Notes in Computer Science, 3–20. Springer.
  • 2019 Meilicke, C.; Chekol, M. W.; Ruffinelli, D.; and Stuckenschmidt, H. 2019. Anytime bottom-up rule learning for knowledge graph completion. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, 3137–3143. ijcai.org.
  • 1995 Miller, G. A. 1995. Wordnet: A lexical database for english. Commun. ACM 38(11):39–41.
  • 2011 Nickel, M.; Tresp, V.; and Kriegel, H. 2011. A three-way model for collective learning on multi-relational data. In Getoor, L., and Scheffer, T., eds., Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, 809–816. Omnipress.
  • 2023 Pavlovic, A., and Sallinger, E. 2023. ExpressivE: A spatio-functional embedding for knowledge graph completion. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net.
  • 2019 Sadeghian, A.; Armandpour, M.; Ding, P.; and Wang, D. Z. 2019. DRUM: end-to-end differentiable rule mining on knowledge graphs. In Wallach, H. M.; Larochelle, H.; Beygelzimer, A.; d’Alché-Buc, F.; Fox, E. B.; and Garnett, R., eds., Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 15321–15331.
  • 2020 Teru, K. K.; Denis, E. G.; and Hamilton, W. L. 2020. Inductive relation prediction by subgraph reasoning. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, 9448–9457. PMLR.
  • 2015 Toutanova, K., and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, CVSC 2015, Beijing, China, July 26-31, 2015, 57–66. Association for Computational Linguistics.
  • 2016 Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In Balcan, M., and Weinberger, K. Q., eds., Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, 2071–2080. JMLR.org.
  • 2017 Xiong, W.; Hoang, T.; and Wang, W. Y. 2017. Deeppath: A reinforcement learning method for knowledge graph reasoning. In Palmer, M.; Hwa, R.; and Riedel, S., eds., Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, 564–573. Association for Computational Linguistics.
  • 2015 Yang, B.; Yih, W.; He, X.; Gao, J.; and Deng, L. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Bengio, Y., and LeCun, Y., eds., 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • 2017 Yang, F.; Yang, Z.; and Cohen, W. W. 2017. Differentiable learning of logical rules for knowledge base reasoning. In Guyon, I.; von Luxburg, U.; Bengio, S.; Wallach, H. M.; Fergus, R.; Vishwanathan, S. V. N.; and Garnett, R., eds., Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2319–2328.
  • 2021 Zhang, Z.; Wang, J.; Chen, J.; Ji, S.; and Wu, F. 2021. Cone: Cone embeddings for multi-hop reasoning over knowledge graphs. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y. N.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 19172–19183.
  • 2021 Zhu, Z.; Zhang, Z.; Xhonneux, L. A. C.; and Tang, J. 2021. Neural bellman-ford networks: A general graph neural network framework for link prediction. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 29476–29490.