P-NAL: an Effective and Interpretable Entity Alignment Method

Chuanhao Xu (Northeastern University, Shenyang, China), **gwei Cheng (Northeastern University, Shenyang, China), and Fu Zhang (Northeastern University, Shenyang, China)
Abstract.

Entity alignment (EA) aims to find equivalent entities between two knowledge graphs (KGs). Existing embedding-based EA methods usually encode entities as embeddings, treat triples as constraints on those embeddings, and learn to align the embeddings. Structural and side information is usually exploited via embedding propagation, aggregation or interaction. However, the underlying logical inference steps of the alignment process are usually omitted, which leaves the inference process inadequately specified. In this paper, we introduce P-NAL, an entity alignment method that captures two types of logical inference paths with Non-Axiomatic Logic (NAL). Type I is the bridge-like inference path between to-be-aligned entity pairs, consisting of two relation/attribute triples and a similarity sentence between the other two entities. Type II links the entity pair by their embeddings. P-NAL iteratively aligns entities and relations by integrating the conclusions of the inference paths. Moreover, our method is logically interpretable and extensible due to the expressiveness of NAL, and it is suitable for various EA settings. Experimental results show that our method outperforms state-of-the-art methods in terms of Hits@1, achieving 0.98+ on all three datasets of DBP15K in both supervised and unsupervised settings. To our knowledge, we present the first in-depth analysis of entity alignment’s basic principles from a unified logical perspective.

1. Introduction

Knowledge graphs (KGs), which store massive numbers of facts about the real world, are a fruitful attempt to enhance semantics-driven information processing. KGs are used in various application domains, such as question answering, recommender systems and language representation learning (knowledge-graph-enhanced language models) (Ji et al., 2021; Logan IV et al., 2019). The information contained in each individual KG project, such as DBpedia (Auer et al., 2007) and YAGO (Suchanek et al., 2007), is limited, so the task of entity alignment (EA) has been proposed to increase KG completeness. The EA task consists of integrating two or more KGs into a single KG by aligning nodes that refer to the same entity.

There are many embedding-based EA methods (Fanourakis et al., 2023) that leverage deep learning techniques to represent entities with low-dimensional embeddings and align entities with a similarity function on the embedding space. The triples and seed alignments of the KGs are usually treated as constraints on the embeddings during the training of such an embedding model. The structural and side information of KGs is usually exploited via embedding propagation, aggregation or interaction. Generally speaking, embedding-based EA methods have some crucial shortcomings: 1. They lack complex reasoning capability. Some of them are enhanced by paths (Cai et al., 2022); however, due to the nature of vector representation, it is not easy to perform or approximate symbolic reasoning on such paths. 2. Their models lack interpretability, so they have to rely solely on numerical evaluation metrics to assess performance. Thus the pros and cons of their model design may not be properly evaluated. 3. The absence of a unified framework explaining the mechanism of embedding learning and processing leaves their semantic and structural learning capability poorly understood.

Apart from embedding-based methods, there exists a group of methods that directly estimate entity similarities from the contextual data (paths) available in the two input KGs. We refer to them as “path-based” methods, and we refer to the estimation of entity similarities by processing and aggregating the paths as “similarity inference”. A potential advantage is that path-based methods can capture fine-grained matches of neighbors, which traditional embedding-based methods cannot. There are also emerging methods that combine the ideas of embedding learning and path reasoning. We coin the term “embedding-path” to refer to such methods. More recently, path-based methods (such as PARIS+ (Leone et al., 2022)) and embedding-path methods (such as BERT-INT (Tang et al., 2020) and FGWEA (Tang et al., 2023)) have started to surpass the performance of traditional embedding-based methods. However, they do not fully handle the similarity inference appropriately, possibly due to the lack of a proper formalization of the inference paths and steps.

To address the aforementioned issues of embedding-based and embedding-path methods, we carefully examine the similarity inference of EA from a logical perspective and propose a path-based EA method, P-NAL, where P stands for PARIS (Suchanek et al., 2011) and NAL for Non-Axiomatic Logic (Wang, 2013). PARIS is an unsupervised non-neural EA method with competitive performance on benchmark datasets (Leone et al., 2022). NAL is a term logic with a specific semantic theory, and its design suits KG tasks (see Section 2.3). P-NAL reinterprets and extends the traditional EA system PARIS with the help of NAL. We formalize the similarity inference as using NAL’s revision inference rule to aggregate the evidence of two types of inference paths. Type I is the bridge-like inference path between to-be-aligned entity pairs, consisting of two relation/attribute triples and a similarity between the other two entities. It can be seen as the fundamental alignment evidence (signal), and we coin the term “align-bridge” to refer to this type. Similarity inference over an align-bridge consists of clarifying premises, performing several inference steps and obtaining conclusions. Type II is the direct path linking the to-be-aligned entity pairs by the representations of their names or descriptions. We obtain such representations (embeddings) through the deep language model BERT. There are two further types of inference paths (which align relations), III and IV, that are not performed by our method but are depicted for theoretical purposes.

P-NAL adopts an iterative aligning strategy: in each iteration it first performs similarity inference and then uses a matching technique (a modified rBMat algorithm, see Section 3.4) to obtain EA results. It also infers the matching of relations in each iteration. Although P-NAL is path-based, it does use embedding techniques minimally, only to embed the literals in the KGs (entity names/descriptions and attribute values). The name/description embeddings are utilized by path type II and the attribute value embeddings are utilized by path type I.

P-NAL is simple, highly interpretable and self-explanatory. Its design rests on a simple intuition: although the overall implementation may seem a little complicated, P-NAL avoids unnecessarily complicated mathematical objects and each of its steps can be easily understood. It is interpretable in the sense that its similarity inference and relation inference share the same logical foundation and use interpretable logical inference steps. It is self-explanatory in the sense that it generates a log file of evidence for the alignments after each iteration, which can then be inspected. This feature made troubleshooting easier during the development of P-NAL; for example, inspecting the faulty alignments in the evidence file inspired many design choices in this paper. P-NAL is also extensible because NAL can express and process many different reasoning patterns and logical structures, so P-NAL can be extended to tackle other challenges in the EA process in future research.

Experiments on the cross-lingual EA dataset DBP15K demonstrate that P-NAL outperforms 12 existing EA methods, including both supervised and unsupervised state-of-the-art approaches, in 5 different configuration groups. An ablation study shows that our design choices jointly boost the overall performance of our model. Given its competitive EA performance, we conjecture that P-NAL’s similarity inference captures the essence of the current EA task (with attribute triples), which means that aligning by such paths is both intuitive and effective.

To our knowledge, we present the first in-depth analysis of entity alignment’s basic principles from a unified logical perspective. Moreover, P-NAL is the first to integrate NAL in the EA task. P-NAL might also help explain the mechanism of other EA methods, as discussed in Section 4.

Outline. Section 2 gives a formal definition of EA and overviews the related works and relevant background knowledge of NAL. Section 3 first sketches our overall EA framework, then elaborates our design, including inference paths, literal utilization, matching algorithm, etc. Section 3.5 elaborates the overall structure of P-NAL. Section 4 discusses the relation with other methods. Our experimental results are presented in Section 5. We conclude in Section 6.

2. Preliminaries

2.1. Knowledge Graph and Entity Alignment

Knowledge graphs (KGs) are knowledge bases that store knowledge in the form of so-called facts or triples. We refer to (head, relation, tail) and (head, attribute, literal) as relation and attribute triples, respectively. Examples of both triple types are (New_Zealand, capital, Wellington) and (New_Zealand, establishedDate, “1947-11-25”), respectively. The arguments head and tail represent entities, relation is a relationship that holds between two entities, and attribute is a special type of relation that holds between an entity and a literal. Entities can be seen as graph nodes and they usually denote real-world objects, while literals are used to identify values for strings, numbers or dates. To summarize, a KG is characterized with a number of relation triples from $\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ and a number of attribute triples from $\mathcal{E}\times\mathcal{A}\times\mathcal{L}$, where $\mathcal{E}$, $\mathcal{R}$, $\mathcal{A}$, and $\mathcal{L}$ indicate the set of entities, relations, attributes and literals, respectively.
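The definition above can be made concrete with a small data structure. This is only a minimal sketch: the class name `KnowledgeGraph`, the tuple-based triple representation and the derived properties are hypothetical choices made for illustration, not the storage format used by P-NAL.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical aliases; the paper does not prescribe a storage format.
RelationTriple = Tuple[str, str, str]   # (head, relation, tail)
AttributeTriple = Tuple[str, str, str]  # (head, attribute, literal)


@dataclass
class KnowledgeGraph:
    relation_triples: Set[RelationTriple] = field(default_factory=set)
    attribute_triples: Set[AttributeTriple] = field(default_factory=set)

    @property
    def entities(self) -> Set[str]:
        # Entities appear as heads or tails of relation triples
        # and as heads of attribute triples.
        from_relations = {e for h, _, t in self.relation_triples for e in (h, t)}
        return from_relations | {h for h, _, _ in self.attribute_triples}

    @property
    def relations(self) -> Set[str]:
        return {r for _, r, _ in self.relation_triples}

    @property
    def attributes(self) -> Set[str]:
        return {a for _, a, _ in self.attribute_triples}

    @property
    def literals(self) -> Set[str]:
        return {lit for _, _, lit in self.attribute_triples}


# The two example triples from the text.
kg = KnowledgeGraph(
    relation_triples={("New_Zealand", "capital", "Wellington")},
    attribute_triples={("New_Zealand", "establishedDate", "1947-11-25")},
)
```

Here `kg.entities` contains both `New_Zealand` and `Wellington`, while the date string only shows up in `kg.literals`, mirroring the distinction between graph nodes and literal values.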

The entity alignment (EA) problem is typically defined between two KGs, $\mathcal{KG}_1$ and $\mathcal{KG}_2$, where the task consists of finding equivalences (so-called alignment) between the set of entities $\mathcal{E}_1$ and $\mathcal{E}_2$ of the two KGs. Sometimes there exists a set of given equivalences that can be used as supervision. This set $\mathcal{S}$ is known as seed alignment set. The supervised EA methods are allowed to utilize the information of $\mathcal{S}$ to infer the equivalences of other entities. We assume that there exists a ground truth set $\mathcal{G}=\{(e,e')\in\mathcal{E}_1\times\mathcal{E}_2 \mid e\equiv e'\}$ that includes all known equivalences between pairs of entities. The ground truth set $\mathcal{G}$ is usually used to evaluate the performance of an EA method, by comparing it with the output alignment set of the method.
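The comparison between an output alignment set and the ground truth set can be sketched as follows. This illustration assumes the common reading of Hits@1 (the metric named in the abstract) as the fraction of ground-truth pairs whose source entity is mapped to the correct target entity; the function name and the dictionary representation of the output are hypothetical.

```python
from typing import Dict, Set, Tuple


def hits_at_1(predicted: Dict[str, str], ground_truth: Set[Tuple[str, str]]) -> float:
    """Fraction of ground-truth pairs (e, e') for which predicted[e] == e'.

    `predicted` maps each entity of the first KG to its top-ranked
    candidate in the second KG (an assumed output format).
    """
    if not ground_truth:
        raise ValueError("ground truth set must not be empty")
    correct = sum(1 for e, e2 in ground_truth if predicted.get(e) == e2)
    return correct / len(ground_truth)


# Toy example: two of the four ground-truth pairs are recovered.
ground_truth = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
predicted = {"a": "x", "b": "y", "c": "w"}
score = hits_at_1(predicted, ground_truth)
```

Entities without any prediction (such as `d` here) simply count as misses.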

2.2. Related Work

Generally speaking, there are three families of EA methods: embedding-based, path-based and embedding-path methods, as elaborated in this section.

In recent years, embedding-based methods have become mainstream for addressing the EA task (Tang et al., 2023; Fanourakis et al., 2023). Their main idea is to embed the nodes (entities) and edges (relations or attributes) of a KG into a low-dimensional vector space that preserves their similarities in the original KG. Embedding-based EA methods usually consist of three parts: the embedding module, the alignment module and the matching module. For the embedding module, translational methods and graph neural network (GNN) methods are the most popular. Translational methods, such as MTransE (Chen et al., 2016), usually optimize a margin-based loss function to learn the structural information (relation triples) of a KG. GNN methods, on the other hand, recursively aggregate the representations of neighboring nodes with graph convolutional networks (GCNs) or graph attention networks (GATs). Representative examples are RDGCN (Wu et al., 2019) and RREA (Mao et al., 2020b), respectively. The alignment module maps the entity embeddings of different KGs into a unified space. There are generally three techniques (Fanourakis et al., 2023) for this module: 1. Sharing the embedding space by using a margin-based loss to force the embeddings of seed alignment entities from different KGs to be close. 2. Swapping the triples of seed alignment entities. 3. Mapping the entity vectors from one embedding space to the other using a transformation matrix. The matching module generates the final alignment result. Common practice is to use the cosine similarity, the Manhattan distance or the Euclidean distance between entity embeddings to measure their similarities and then to perform a specific matching algorithm based on the similarity scores.
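To illustrate the matching module, the sketch below scores entity pairs by cosine similarity and then picks, for each entity of the first KG, its most similar entity in the second KG. Greedy nearest-neighbor selection is used here only because it is the simplest possible matching step; it is an assumption for illustration and not the matching algorithm of any particular method discussed in this paper. All names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_match(emb1: Dict[str, List[float]],
                 emb2: Dict[str, List[float]]) -> Dict[str, str]:
    """Map each entity of KG1 to its highest-scoring entity of KG2."""
    return {
        e: max(emb2, key=lambda e2: cosine(vec, emb2[e2]))
        for e, vec in emb1.items()
    }


# Toy embeddings in a shared two-dimensional space.
emb1 = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
emb2 = {"x": [0.9, 0.1], "y": [0.2, 0.8]}
alignment = greedy_match(emb1, emb2)
```

Greedy selection can assign the same target to several sources, which is one reason why dedicated matching algorithms are studied in the literature cited later in this section.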

Apart from embedding-based methods, there exists a group of methods that directly estimate entity similarities from the contextual data (paths) available in the two input KGs. We refer to them as “path-based” methods. A potential advantage is that path-based methods can capture fine-grained matches of neighbors, which traditional embedding-based methods cannot. Embedding-based methods may suffer from the negative influence of dissimilar neighbors, according to (Tang et al., 2020). The distinction between embedding-based and path-based methods is sometimes blurry.

There are also emerging “embedding-path” methods that combine the ideas of embedding learning and path reasoning. More recently, path-based and embedding-path methods have started to surpass the performance of traditional embedding-based methods. Our proposed method P-NAL is a path-based method that uses embeddings minimally, so in the following we introduce several advanced path-based and embedding-path methods. These methods are also selected for comparison with P-NAL in our experiments.

PARIS (Suchanek et al., 2011) is a classic unsupervised non-neural EA method with competitive performance on benchmark datasets (Leone et al., 2022). It is purely path-based. PARIS introduces the concept “functionality” into the field of EA to enhance the validity of similarity inference paths. Functionality generally corresponds to the uniqueness of related things, for example a man can only have one father but multiple friends, so $fun(father)$ is close to 1 and $fun(friend)$ is relatively lower, where $fun()$ represents functionality of a relation or attribute. See (Suchanek et al., 2011) for more details about functionality. With functionality, PARIS constructs a probabilistic model that estimates the probabilities of an entity $x$ in $\mathcal{KG}_1$ being equivalent to another entity $x'$ in $\mathcal{KG}_2$. Here is the formula for $Pr(x \equiv x')$:

$$1-\prod_{r(x,y),\,r'(x',y')}\left(1-Pr\left(r\subset r'\right)\times fun(r^{-1})\times Pr\left(y\equiv y'\right)\right)$$
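A single evaluation of this product formula can be sketched in code. The sketch makes several assumptions that go beyond the text: triples are plain tuples, the probabilities of the previous iteration are given as dictionaries, and $fun(r^{-1})$ is estimated as the number of distinct objects of $r$ divided by the number of facts of $r$, which is how we recall the global functionality estimate of (Suchanek et al., 2011). All function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def inverse_functionality(triples: List[Triple]) -> Dict[str, float]:
    """Assumed estimate of fun(r^-1): distinct objects of r / facts of r."""
    facts = Counter(r for _, r, _ in triples)
    distinct_objects = Counter(r for r, _ in {(r, y) for _, r, y in triples})
    return {r: distinct_objects[r] / facts[r] for r in facts}


def equivalence_probability(
    x: str,
    x2: str,
    kg1: List[Triple],
    kg2: List[Triple],
    subrel: Dict[Tuple[str, str], float],  # Pr(r subset r')
    inv_fun: Dict[str, float],             # fun(r^-1)
    equiv: Dict[Tuple[str, str], float],   # Pr(y == y') from the last iteration
) -> float:
    """1 - prod over r(x,y), r'(x',y') of (1 - Pr(r c r') * fun(r^-1) * Pr(y == y'))."""
    product = 1.0
    for h, r, y in kg1:
        if h != x:
            continue
        for h2, r2, y2 in kg2:
            if h2 != x2:
                continue
            p = (subrel.get((r, r2), 0.0)
                 * inv_fun.get(r, 0.0)
                 * equiv.get((y, y2), 0.0))
            product *= 1.0 - p
    return 1.0 - product


# Toy example with two supporting paths between NZ and NZ_fr.
kg1 = [("NZ", "capital", "Wellington"), ("NZ", "language", "English")]
kg2 = [("NZ_fr", "capitale", "Wellington_fr"), ("NZ_fr", "langue", "Anglais")]
subrel = {("capital", "capitale"): 1.0, ("language", "langue"): 1.0}
equiv = {("Wellington", "Wellington_fr"): 0.9, ("English", "Anglais"): 0.5}
p = equivalence_probability("NZ", "NZ_fr", kg1, kg2, subrel,
                            inverse_functionality(kg1), equiv)
```

In this toy case the two paths contribute factors $(1-0.9)$ and $(1-0.5)$, so the result is approximately $1 - 0.05 = 0.95$; each additional supporting path can only increase the estimate.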

As depicted in the above formula, PARIS estimates the equivalence probabilities by integrating the paths that connect corresponding entities. It also finds subrelations between the two ontologies of the KGs with the following equation. A subrelation, such as $r\subset r'$, intuitively means a correspondence between two relations of different KGs such that a relational fact of $r$ in $\mathcal{KG}_1$ implies the existence of a corresponding relational fact of $r'$ in $\mathcal{KG}_2$. Here is the formula for $Pr\left(r\subset r'\right)$:

$$\frac{\sum_{r(x,y)}\left(1-\prod_{r'(x',y')}\left(1-Pr\left(x\equiv x'\right)\times Pr\left(y\equiv y'\right)\right)\right)}{\sum_{r(x,y)}\left(1-\prod_{x',y'}\left(1-Pr\left(x\equiv x'\right)\times Pr\left(y\equiv y'\right)\right)\right)}$$
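The ratio above can be sketched as follows. The sketch assumes that equivalence probabilities are stored sparsely in a dictionary, so the product over all $x', y'$ in the denominator only needs to visit candidates with non-zero probability; this representation and all names are hypothetical and not taken from PARIS or P-NAL.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def _one_minus_product(probs: Iterable[float]) -> float:
    """1 - prod(1 - p): probability that at least one match holds."""
    product = 1.0
    for p in probs:
        product *= 1.0 - p
    return 1.0 - product


def subrelation_probability(
    r: str,
    r2: str,
    kg1: List[Triple],
    kg2: List[Triple],
    equiv: Dict[Tuple[str, str], float],  # sparse Pr(e == e')
) -> float:
    facts1 = [(x, y) for x, rel, y in kg1 if rel == r]
    facts2 = {(x2, y2) for x2, rel, y2 in kg2 if rel == r2}

    # Index candidates: entity of KG1 -> {entity of KG2: probability}.
    candidates: Dict[str, Dict[str, float]] = {}
    for (e, e2), p in equiv.items():
        candidates.setdefault(e, {})[e2] = p

    numerator = denominator = 0.0
    for x, y in facts1:
        cx, cy = candidates.get(x, {}), candidates.get(y, {})
        # Numerator: only pairs (x', y') that are actually linked by r'.
        numerator += _one_minus_product(
            cx.get(x2, 0.0) * cy.get(y2, 0.0) for x2, y2 in facts2)
        # Denominator: every candidate pair (x', y'), linked or not.
        denominator += _one_minus_product(
            px * py for px in cx.values() for py in cy.values())
    return numerator / denominator if denominator > 0.0 else 0.0


# Two facts of r in KG1, only one of which has a counterpart under r' in KG2.
kg1 = [("a", "r", "b"), ("c", "r", "d")]
kg2 = [("a2", "r2", "b2")]
equiv = {("a", "a2"): 1.0, ("b", "b2"): 1.0, ("c", "c2"): 1.0, ("d", "d2"): 1.0}
p_sub = subrelation_probability("r", "r2", kg1, kg2, equiv)
```

In the toy example both facts of `r` have fully matched arguments, but only one of them is mirrored by a fact of `r2`, so half of the matched evidence supports the subrelation.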

With the help of subrelations’ measurement, PARIS generalizes the equation of $Pr\left(x\equiv x'\right)$ to the case where the two ontologies do not share common relations. Therefore, PARIS recursively aligns the entities and the equivalence probability of $x\equiv x'$ depends recursively on other equivalence probabilities. In each iteration, the probabilities are re-calculated based on the equivalences and subrelations of the previous iteration. Initial equivalences are computed between attribute literals based on a certain string distance measurement.

PARIS+ (Leone et al., 2022) is a variant of PARIS that makes a simple refinement and works in the absence of attribute triples. It processes the seed alignment information to generate synthetic attribute triples. That is, for every pair of seed alignments ($x$, $x'$), it creates the attribute triples ($x$, EA:label, string($x$)) and ($x'$, EA:label, string($x$)), where EA:label is a synthetic relation. Thus, the reverse of the relation EA:label is designed to be highly functional in order to let the model match the seed alignments easily. P-NAL adopts the same refinement as PARIS+.
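The generation of synthetic attribute triples described above can be sketched as follows. Using Python’s `str()` for string($x$) and returning two separate lists of triples are assumptions made for illustration; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

AttributeTriple = Tuple[str, str, str]  # (entity, attribute, literal)


def seed_label_triples(
    seeds: Iterable[Tuple[str, str]],
    label: str = "EA:label",
) -> Tuple[List[AttributeTriple], List[AttributeTriple]]:
    """Create one synthetic attribute triple per KG for every seed pair.

    Both entities of a seed pair (x, x') receive the same literal string(x),
    so the literal is shared by exactly the two aligned entities.
    """
    triples_kg1: List[AttributeTriple] = []
    triples_kg2: List[AttributeTriple] = []
    for x, x2 in seeds:
        literal = str(x)  # assumed reading of string(x)
        triples_kg1.append((x, label, literal))
        triples_kg2.append((x2, label, literal))
    return triples_kg1, triples_kg2


extra1, extra2 = seed_label_triples([("New_Zealand", "Nouvelle-Zelande")])
```

Because each generated literal occurs with only one entity per KG, the inverse of `EA:label` is highly functional, which is what lets a PARIS-style model recover the seed pairs.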

BERT-INT (Tang et al., 2020), an embedding-path EA method, uses the well-known transformer model BERT to embed the entities and literals. It calculates the cosine similarity of the entity name/description embeddings. It then proposes an interaction model that compares each pair of neighbors or attributes (which forms a path from the source entity to the target entity) to obtain the neighbor/attribute similarity score. The name/description similarity vector, neighbor similarity vector and attribute similarity vector are concatenated and fed to an MLP layer to obtain the final similarity score.

FGWEA  (Tang et al., 2023) is a three-step progressive optimization algorithm for EA and it can be classified as an embedding-path EA method. First, the entity names and concatenated attribute triples are used for semantic embedding matching to obtain initial anchors. Then in order to approximate GWD (Gromov-Wasserstein Distance  (Peyré et al., 2016)), FGWEA computes cross-KG structural and relational similarities, which are then used for iterative multi-view optimal transport alignment. Finally, the Bregman Proximal Gradient algorithm  (Xu et al., 2019) is employed to refine the GWD’s coupling matrix.

There are also a few works that focus on the interpretability or explainability of EA, such as LightEA (Mao et al., 2022) and ExEA (Tian et al., 2023). LightEA is an interpretable non-neural EA method. It is inspired by a classical graph algorithm, label propagation (Zhu and Zoubin, 2002). First, it generates a random orthogonal label for each seed alignment entity pair. Then, the labels of entities and relations are propagated according to the three views of the adjacency tensor. Finally, LightEA utilizes sparse Sinkhorn iteration to address the assignment problem of alignment results.

The ExEA framework, proposed by  (Tian et al., 2023), aims to explain the results of embedding-based EA. It generates semantic matching subgraphs as explanation by matching semantically consistent triples around the two aligned entities. ExEA devises an alignment dependency graph structure to gain deeper insights into the explanation.

The recent literature on EA is abundant, focusing on many different aspects or procedures of entity alignment apart from the aforementioned ones, such as utilizing attribute triples (Liu et al., 2020; Sun et al., 2017), utilizing literals (Gesese et al., 2021; Chen et al., 2018), sample mining (Liu et al., 2022; Mao et al., 2021a), reinforcement learning (Guo et al., 2022), matching algorithms (Lin et al., 2023; Dao et al., 2023; Mao et al., 2021b; Xu et al., 2020; Zeng et al., 2020), iterative strategies (Liu et al., 2023; Mao et al., 2020a) and unsupervised learning (Jiang et al., 2023b, a; Liu et al., 2022; Luo and Yu, 2022; Zhao et al., 2022). There are also some surveys of EA (Fanourakis et al., 2023; Zeng et al., 2021; Sun et al., 2020; Mao et al., 2022). Besides graph structural, attribute and literal information, the EA community has studied other forms of information, such as temporal, spatial and graphical information; however, these topics are beyond the scope of this paper.

2.3. A Brief Introduction to NAL

NAL (Non-Axiomatic Logic) (Wang, 2013) is a logic designed for the creation of general-purpose AI systems, formulating the fundamental regularities of human thinking at a general level. It can be used as the logical foundation of a (non-axiomatic) inference system. Traditional inference systems are usually based on model-theoretic semantics, whereas NAL, under the assumption of insufficient knowledge and resources, is a term logic based on experience-grounded semantics (Wang, 2005). The meaning of a term in NAL, to the inference system, is determined by its role in the experience (which will be explained later), that is, how it has been related to other terms in the past. The truth-value of a statement in NAL is determined by how it has been supported or refuted by other statements in the past.

In this paper we only utilize a fraction of NAL’s syntax and inference capability (for EA). We now introduce the relevant parts of its syntax. A term in NAL can be either atomic or compound. An atomic term is a word (string) or a variable term. An independent variable, such as “$\#x$”, represents any unspecified term under a given restriction, and intuitively corresponds to a universally quantified variable in first-order predicate logic. A dependent variable, such as “$\$y$”, represents a certain unspecified term under a given restriction, and intuitively corresponds to an existentially quantified variable. A compound term consists of a term connector and components (which are themselves terms). A basic statement has the form “subject copula predicate”, where subject and predicate are terms. There are multiple types of copula and each type has a corresponding statement type, including: 1. Inheritance (“$A\rightarrow B$”, where $A$ and $B$ are terms), which intuitively means “B is a general case of A”; 2. Similarity (“$A\leftrightarrow B$”), which intuitively means “A is similar to B”; 3. Implication (“$P\Rightarrow Q$”, where $P$ and $Q$ are statements), a higher-order copula which intuitively means “P implies Q”. Unlike “material implication”, implication in NAL requires $P$ to be related to $Q$ in content, because NAL is a term logic that uses syllogistic inference rules and only derives conclusions that are related in content. A sentence is a statement together with its truth-value. An intensional set with only one component, for example “$[red]$”, intuitively means “red things”. The term connector “$*$” (product) combines multiple component terms into an ordered compound term such as $\left(*,A,B\right)$, which intuitively means “an anonymous relation between A and B”. Compound terms are usually written in prefix format, that is, the term connector is written first. The statement connector “$\wedge$” can be seen as the conjunction operator of propositional logic.

NAL is “non-axiomatic” in the sense that the truth-value of a conclusion in the inference system does not indicate how much the conclusion agrees with the “state of affairs” in the world, or with a constant set of assumptions (the axioms), but how much it is supported by the evidence provided by the past experience of the system. Experience means the inference system’s history of interaction with the environment or, equivalently, its input sentences. The acquisition of experience may involve sensorimotor mechanisms and sensation-perception processes, which are beyond our scope. The information source of a sentence is characterized as its evidence. The inference rules of NAL coherently pass on the evidential information from the premises to the conclusion, so the premises can be seen as the evidence of the conclusion. The input sentences can be seen as a synthesis of virtual positive and negative evidence. Assume the available amounts of positive and negative evidence of a statement are written as $w^{+}$ and $w^{-}$, respectively; then the total amount of evidence is $w=w^{+}+w^{-}$. The frequency of the statement is $f=w^{+}/w$, and the confidence of the statement is $c=w/(w+k)$, where $k$ is a positive constant representing the “evidential horizon”. We take $k=1$ in our implementation. Frequency intuitively means “the degree of truth” and confidence intuitively represents “the total amount of evidence”: the more evidence the statement has taken into account, the higher its confidence. The truth-value attached to the statement is the ordered pair $\langle f,c\rangle$, and it is often written right after the statement.
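The definitions of frequency and confidence can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the inverse mapping from confidence back to the amount of evidence is not stated in the text but follows from solving $c=w/(w+k)$ for $w$.

```python
from typing import Tuple


def truth_value(w_plus: float, w_minus: float, k: float = 1.0) -> Tuple[float, float]:
    """Return (frequency, confidence) from positive and negative evidence."""
    w = w_plus + w_minus
    if w <= 0.0:
        raise ValueError("total evidence must be positive")
    frequency = w_plus / w
    confidence = w / (w + k)
    return frequency, confidence


def evidence_amount(confidence: float, k: float = 1.0) -> float:
    """Total evidence w for a given confidence, from c = w / (w + k)."""
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must lie in [0, 1)")
    return k * confidence / (1.0 - confidence)


# Three pieces of positive and one piece of negative evidence, with k = 1.
f, c = truth_value(3.0, 1.0)
```

With $w^{+}=3$ and $w^{-}=1$ this gives $f=0.75$ and $c=0.8$. Confidence approaches but never reaches 1 as evidence accumulates, which is why `evidence_amount` rejects a confidence of exactly 1.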

NAL uses syllogistic (rather than truth-functional) inference rules. Among them the revision rule merges evidences for the same statement collected from different sources together, so it can settle inconsistency among the system’s sentences. It is very useful in our approach. The relevant rules with corresponding truth functions are all listed in Table 1. Note that the inference rules are not domain-specific. There are three extended boolean operators  (Wang, 2013) in the calculation of truth functions:

$$\left\{\begin{aligned} and(x_{1},\dots,x_{n})&=\prod_{i=1}^{n}x_{i}\\ or(x_{1},\dots,x_{n})&=1-\prod_{i=1}^{n}(1-x_{i})\\ not(x)&=1-x\end{aligned}\right.$$
where $x_{i}\in[0,1]$.
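The three operators translate directly into code. The sketch below also applies them to the deduction and analogy truth functions listed in Table 1; the trailing underscores in the function names are only there to avoid Python keywords, and representing a truth-value as an `(f, c)` tuple is an assumption made for illustration.

```python
import math
from typing import Tuple

Truth = Tuple[float, float]  # (frequency, confidence), both in [0, 1]


def and_(*xs: float) -> float:
    """Extended conjunction: product of the arguments."""
    return math.prod(xs)


def or_(*xs: float) -> float:
    """Extended disjunction: 1 - prod(1 - x_i)."""
    return 1.0 - math.prod(1.0 - x for x in xs)


def not_(x: float) -> float:
    """Extended negation: 1 - x."""
    return 1.0 - x


def deduction(t1: Truth, t2: Truth) -> Truth:
    """Truth function of the deduction rule in Table 1."""
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f1, f2, c1, c2)


def analogy(t1: Truth, t2: Truth) -> Truth:
    """Truth function of the analogy rule in Table 1."""
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f2, c1, c2)


# Chaining two fairly certain premises lowers the confidence of the conclusion.
f_ded, c_ded = deduction((1.0, 0.9), (1.0, 0.9))
```

Note that `or_` has the same shape, one minus a product of complements, as the PARIS equivalence formula in Section 2.2.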

Table 1. Truth functions of the relevant inference rules.
| Inference rule | Premises | Conclusion |
| --- | --- | --- |
| Deduction | $A \rightarrow B\ \langle f_1, c_1\rangle,\ B \rightarrow C\ \langle f_2, c_2\rangle$ | $A \rightarrow C\ \langle f = and(f_1, f_2),\ c = and(f_1, f_2, c_1, c_2)\rangle$ |
| Analogy | $A \rightarrow B\ \langle f_1, c_1\rangle,\ A \leftrightarrow C\ \langle f_2, c_2\rangle$ | $C \rightarrow B\ \langle f = and(f_1, f_2),\ c = and(f_2, c_1, c_2)\rangle$ |
| Conditional Deduction | $(P \wedge Q) \Rightarrow R\ \langle f_1, c_1\rangle,\ Q\ \langle f_2, c_2\rangle$ | $P \Rightarrow R\ \langle f = and(f_1, f_2),\ c = and(f_1, f_2, c_1, c_2)\rangle$ |
| Induction | $A \rightarrow B\ \langle f_1, c_1\rangle,\ A \rightarrow C\ \langle f_2, c_2\rangle$ | $C \rightarrow B\ \langle w^{+} = and(f_2, c_2, f_1, c_1),\ w^{-} = and(f_2, c_2, not(f_1), c_1)\rangle$ |
| Revision | $P\ \langle f_1, c_1\rangle,\ P\ \langle f_2, c_2\rangle$ | $P\ \langle w^{+} = w^{+}_1 + w^{+}_2,\ w = w_1 + w_2\rangle$ |
| Probabilistic Revision | $P\ \langle f_1, c_1\rangle,\ P\ \langle f_2, c_2\rangle$ | $P\ \langle f = or(f_1, f_2),\ w = w_1 + w_2\rangle$ |
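The truth functions of Table 1 can be sketched in Python as follows. This is an illustrative sketch rather than the P-NAL implementation. It assumes the standard NAL relations $f = w^{+}/w$ and $c = w/(w+k)$ between evidence amounts and truth-values, with the evidential horizon $k = 1$; the helper names `to_fc` and `to_w`, and the placeholder frequency returned when there is no evidence, are assumptions made here.

```python
from math import prod

K = 1.0  # evidential horizon k; the value 1 is an assumption of this sketch


def and_(*xs):
    return prod(xs)


def or_(*xs):
    return 1.0 - prod(1.0 - x for x in xs)


def to_fc(w_plus, w, k=K):
    """Evidence amounts (w+, w) -> (frequency, confidence)."""
    if w == 0:
        return 0.5, 0.0  # no evidence: frequency is undefined, 0.5 is a placeholder
    return w_plus / w, w / (w + k)


def to_w(f, c, k=K):
    """(frequency, confidence) -> (w+, w); requires c < 1."""
    w = k * c / (1.0 - c)
    return f * w, w


def deduction(t1, t2):
    """Also serves as the conditional deduction truth function of Table 1."""
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f1, f2, c1, c2)


def analogy(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f2, c1, c2)


def induction(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    w_plus = and_(f2, c2, f1, c1)
    w_minus = and_(f2, c2, 1.0 - f1, c1)
    return to_fc(w_plus, w_plus + w_minus)


def revision(t1, t2):
    (wp1, w1), (wp2, w2) = to_w(*t1), to_w(*t2)
    return to_fc(wp1 + wp2, w1 + w2)


def probabilistic_revision(t1, t2):
    (f1, _), (f2, _) = t1, t2
    w = to_w(*t1)[1] + to_w(*t2)[1]
    return or_(f1, f2), w / (w + K)
```

Each truth-value is a pair `(f, c)`. The two revision rules convert confidences back to evidence amounts, so their premises must have $c < 1$.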

2.4. Why NAL

Many different logical systems may be capable of representing the similarity inference process of EA. However, we believe that the non-axiomatic nature of NAL suits the knowledge graph domain better than axiomatic logical systems do, because real-world KGs must cope with open domains and with alterable, incomplete, or conflicting facts. Fundamentally, knowledge graph tasks such as EA fit well with the assumption of insufficient knowledge and resources  (Wang, 2013), which is the basic assumption of NAL.

Technically speaking, NAL can represent entities, relations and relational triples, which are essential for EA. It can also perform formal reasoning and evidence aggregation, both of which are useful for aligning entities. The frequency/confidence measurement of truth-value is suitable for representing fuzziness and lack of knowledge in the similarity inference process. The high expressiveness of NAL makes our approach extensible, which may benefit subsequent studies.

3. Our Approach

The main idea of our EA approach stems from the well-known EA method PARIS. P-NAL adopts an iterative alignment strategy: in each iteration it first performs similarity inference and then applies a matching technique (the rBMat algorithm, with the modification described in Section 3.4) to obtain EA results. We formalize similarity inference as the aggregation of two types of inference paths by NAL’s revision inference rule. P-NAL also infers the matching of relations in each iteration. We explore and implement several further design choices and tricks to complete the alignment framework, as described in the following subsections.

Figure 1. An illustration of align-bridge, omitting irrelevant triples.

3.1. PARIS Reinterpreted in the Formal Language of NAL

The key point of the alignment process in PARIS is finding instances of a bridge-like inference path between to-be-aligned entity pairs. We coin the term “align-bridge” for this type of path. Valid align-bridges are retrieved from the KGs in a depth-first manner. As shown in Figure 1, entities $y_1$ and $y_2$ belong to different KGs (subscripts denote the KG) and $(y_1, y_2)$ forms a to-be-aligned entity pair. Triple $(x_1, r_1, y_1)$ is a relational or attribute triple in $\mathcal{KG}_1$, where $x_1$ is an entity or a literal, respectively. Note that PARIS (and P-NAL) automatically duplicates every original KG triple $(a, b, c)$ as a reversed triple $(c, b^{-1}, a)$ upon KG loading, so an attribute triple $(x_1, r_1, y_1)$ with a literal $x_1$ is a reversed attribute triple. Similarly, triple $(x_2, r_2, y_2)$ is a relational or attribute triple in $\mathcal{KG}_2$.
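The triple duplication performed at loading time can be sketched as follows. The tuple representation and the `^-1` suffix used to name the inverse relation are hypothetical choices made for this illustration; the actual data structures of PARIS and P-NAL are not specified here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def add_reversed(triples: List[Triple]) -> List[Triple]:
    """Return the input triples plus one reversed copy (c, b^-1, a) per (a, b, c)."""
    out = list(triples)
    out.extend((c, b + "^-1", a) for a, b, c in triples)
    return out
```

With this duplication, an entity that appears only as the head of a triple can still occupy the $y$ position of an align-bridge.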

In this paper, every entity, literal or relation is regarded as an atomic term in NAL. Triple $(x_1, r_1, y_1)$ is reinterpreted as the inheritance statement $(*, x_1, y_1) \rightarrow r_1$. Its intuitive meaning is “The relation between $x_1$ and $y_1$ is a specialization of relational term $r_1$”. The triples (or “facts”) of the KGs can be regarded as absolutely true to some extent, so the truth-value attached to the statement is $\langle 1, 1\rangle$. All triples of the two KGs are taken as input sentences, which form the experience of the inference system. In PARIS, the equality score between $x_1$ and $x_2$ is retrieved to measure their similarity. We interpret the equality as a similarity statement $x_1 \leftrightarrow x_2$, with the score reflected in the truth-value of the statement. Note that for an entity pair, the similarity comes from either the seed alignments or the alignments of the previous iteration. We omit any entity similarity statement whose $f$ or $c$ is less than $theta$, a hyper-parameter. For a literal pair, the similarity comes from literal comparison (for example, string identity comparison).

At the same time, PARIS retrieves the sub-relation probability score $Pr(r_1 \subseteq r_2)$ (from the computation result of the previous iteration, or a default value $iota$, which is a hyper-parameter), which we interpret as an inheritance statement $r_1 \rightarrow r_2$ with a truth-value. Its intuitive meaning is “The relational term $r_1$ is a specialization of relational term $r_2$”. PARIS evaluates the degree of functionality of relation $r_2$ using the precomputed functionality of each relation. We interpret it as an inheritance statement $r_2 \rightarrow [fun]$ with the degree reflected in the truth-value. The statement intuitively means “$r_2$ has the functional property (to some extent)”.
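The precomputation of functionalities can be sketched as follows, assuming the ratio-based global functionality used by PARIS: the number of distinct first arguments of a relation divided by the number of its distinct argument pairs. How this degree is mapped onto the truth-value of $r \rightarrow [fun]$ is not covered by this sketch, and the function name is illustrative.

```python
from collections import defaultdict


def functionality(triples):
    """Map each relation to (#distinct subjects) / (#distinct (subject, object) pairs)."""
    subjects = defaultdict(set)
    pairs = defaultdict(set)
    for s, r, o in triples:
        subjects[r].add(s)
        pairs[r].add((s, o))
    return {r: len(subjects[r]) / len(pairs[r]) for r in pairs}
```

A relation in which every subject has exactly one object obtains degree 1, and the degree decreases as subjects accumulate more objects.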

Then the validity of align-bridge is interpreted in terms of inference steps in NAL as follows:

(type I path) premises:

(1) $(*, x_1, y_1) \rightarrow r_1\ \ \langle 1, 1\rangle.$
(2) $(*, x_2, y_2) \rightarrow r_2\ \ \langle 1, 1\rangle.$
(3) $r_1 \rightarrow r_2\ \ \langle f_3, c_3\rangle.$
(4) $r_2 \rightarrow r_1\ \ \langle f_4, c_4\rangle.$
(5) $x_1 \leftrightarrow x_2\ \ \langle f_5, c_5\rangle.$
(6) $r_1 \rightarrow [fun]\ \ \langle f_6, c_6\rangle.$
(7) $r_2 \rightarrow [fun]\ \ \langle f_7, c_7\rangle.$
(8) $((*, \$a, \#b_1) \rightarrow \$r\ \wedge\ (*, \$a, \#b_2) \rightarrow \$r\ \wedge\ \$r \rightarrow [fun])\ \Rightarrow\ \#b_1 \leftrightarrow \#b_2\ \ \langle f_8, c_8\rangle.$

inference steps (path) & conclusion:

(9) from (1) and (3), Deduction: $(*, x_1, y_1) \rightarrow r_2\ \ \langle f_9, c_9\rangle.$
(10) from (2) and (5), Analogy: $(*, x_1, y_2) \rightarrow r_2\ \ \langle f_{10}, c_{10}\rangle.$
(11) from (8), (9), (10), and (7), Conditional deduction $\times 3$: $y_1 \leftrightarrow y_2\ \ \langle f_{11}, c_{11}\rangle.$

The idea behind the inference steps is briefly as follows. The first two steps match up the two triples (1) and (2): the first step exchanges $r_1$ in triple (1) for $r_2$, and the second step exchanges $x_2$ in triple (2) for $x_1$. The third step is a type of inference similar to the resolution rule for Horn clauses, with (8) corresponding to a Horn rule.

In the path listed above, we omit two auxiliary inference steps immediately before conclusion (10); they perform a structural transformation that extracts $x_2$ from the product without modifying the truth-value. The last conditional deduction of (11) degenerates into a case without conjunction in its premises (similar to Modus Ponens), and its truth function remains the same. Statement (11) is the conclusion of the above inference steps, and the steps as a whole act as a summarizing or validation process for the align-bridge. Note that an align-bridge admits four distinct inference paths (including the path above) that share the same premise set and the same conclusion (with slightly different truth-values) but differ in the details of their inference steps. In our implementation, we aggregate two of them (one for each direction of relational inheritance) by the probabilistic revision rule (similar to PARIS).
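To make the propagation of truth-values along a type I path concrete, the following sketch chains the truth functions of Table 1 in the order of steps (9) to (11). It is an illustration under stated assumptions, not the P-NAL implementation: the function name `type1_path` is hypothetical, the truth-value of implication (8) defaults to $\langle 1, 1\rangle$ as an assumption, the three conditional deductions are applied in the order (9), (10), (7), and the two omitted structural transformations are skipped because they leave the truth-value unchanged.

```python
from math import prod


def and_(*xs):
    return prod(xs)


def deduction(t1, t2):
    """Truth function shared by deduction and conditional deduction (Table 1)."""
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f1, f2, c1, c2)


def analogy(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f2, c1, c2)


def type1_path(t3, t5, t7, t8=(1.0, 1.0)):
    """Truth-value of conclusion (11), given premises (3), (5), (7) and (8)."""
    t1 = t2 = (1.0, 1.0)           # premises (1) and (2): KG triples
    t9 = deduction(t1, t3)         # step (9)
    t10 = analogy(t2, t5)          # step (10)
    t11 = t8                       # start from implication (8)
    for premise in (t9, t10, t7):  # step (11): conditional deduction x 3
        t11 = deduction(t11, premise)
    return t11
```

For example, `type1_path((0.5, 1.0), (1.0, 1.0), (1.0, 1.0))` yields a conclusion with frequency 0.5; uncertainty in any single premise lowers both the frequency and the confidence of $y_1 \leftrightarrow y_2$.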

Implication statement (8) is regarded as a definition, or at least an essential part, of the concept “functionality”. The functionality of relations seems to reflect a widespread orderliness of reality or of human cognition, and PARIS leverages this orderliness.

The conclusions with the same statement but obtained from different align-bridges are merged by the probabilistic revision rule because of the probabilistic nature of functionality. For example, the functionality of relation $children$ is 0.68, which means that the majority of the population have approximately one to three children. Entity $Lynne\_Cheney$ has two children; however, when the alignment system aligns the two $Mary\_Cheney$ entities (Mary Cheney being a child of $Lynne\_Cheney$) from different KGs, it does not know how many children $Lynne\_Cheney$ has. The conclusion of the align-bridge over $Mary\_Cheney(en)$, $Lynne\_Cheney(en)$, $Lynne\_Cheney(zh)$ and $Mary\_Cheney(zh)$ is

$$Mary\_Cheney(en) \leftrightarrow Mary\_Cheney(zh)\ \ \langle f, c\rangle.$$

It has a probabilistic nature because we do not know whether $Mary\_Cheney(en)$ and $Mary\_Cheney(zh)$ are the same child of $Lynne\_Cheney$. Thus the probabilistic revision rule is used to aggregate the conclusions of multiple align-bridges. This rule is similar to the continued multiplication in PARIS’s probability formula for $Pr(x \equiv x^{\prime})$ (given in Section 2.2), except for the introduction and calculation of confidence.
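The aggregation over several align-bridges can be sketched as an n-ary form of the probabilistic revision rule of Table 1. The sketch assumes the standard NAL relation $c = w/(w+k)$ with $k = 1$ for converting between confidence and evidence amount; the function name `aggregate` is illustrative.

```python
from math import prod


def aggregate(conclusions, k=1.0):
    """Merge (f, c) conclusions of the same statement; every c must be < 1."""
    # Frequency: or(f_1, ..., f_n), mirroring the continued multiplication of PARIS.
    f = 1.0 - prod(1.0 - fi for fi, _ in conclusions)
    # Evidence amounts add up: w = w_1 + ... + w_n, with w_i = k * c_i / (1 - c_i).
    w = sum(k * ci / (1.0 - ci) for _, ci in conclusions)
    return f, w / (w + k)
```

Two align-bridges that each conclude $\langle 0.5, 0.5\rangle$ are merged into a statement with frequency 0.75 and confidence $2/3$: each additional bridge raises both the frequency and the confidence of the alignment.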

3.2. Aligning Relations

Apart from aligning entities, we also align relations using an adapted version of PARIS’s probabilistic relation alignment method. The relation alignment formula of PARIS is reinterpreted as directly estimating the amount of evidence for the conclusion. The truth-value of the conclusion statement $r_1 \rightarrow r_2$ (the inheritance statement between two relations $r_1$ and $r_2$) is computed as follows:

$$w^{+}=\sum_{(x_{1},r_{1},y_{1})}\left(1-\prod_{(x_{2},r_{2},y_{2})}\left(1-expt\left(x_{1}\leftrightarrow x_{2}\right)\times expt\left(y_{1}\leftrightarrow y_{2}\right)\right)\right)$$
$$w=\sum_{(x_{1},r_{1},y_{1})}\left(1-\prod_{x_{2},y_{2}}\left(1-expt\left(x_{1}\leftrightarrow x_{2}\right)\times expt\left(y_{1}\leftrightarrow y_{2}\right)\right)\right)$$

where $expt$ is the $expectation$ of a truth-value, a combined measurement of $f$ and $c$ defined as $expectation = f \times c$.
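The two sums can be sketched directly in Python. This is a naive illustration, not the P-NAL implementation: it assumes that each KG is a list of `(head, relation, tail)` tuples, that `sim` maps a cross-KG entity pair to its truth-value `(f, c)`, that missing pairs have expectation 0, and that $f = w^{+}/w$ and $c = w/(w+k)$ with $k = 1$. The inner product for $w$ ranges over all entity pairs of the second KG, so the sketch is quadratic in the number of entities.

```python
from math import prod


def expt(sim, a, b):
    """Expectation f * c of the similarity statement a <-> b (0 if absent)."""
    f, c = sim.get((a, b), (0.0, 0.0))
    return f * c


def relation_truth(r1, r2, kg1, kg2, sim, k=1.0):
    """Truth-value (f, c) of r1 -> r2 from the evidence amounts w+ and w."""
    pairs_r2 = {(x2, y2) for x2, r, y2 in kg2 if r == r2}
    ents2 = {e for x2, _, y2 in kg2 for e in (x2, y2)}
    w_plus = w = 0.0
    for x1, r, y1 in kg1:
        if r != r1:
            continue

        def match(x2, y2):
            return expt(sim, x1, x2) * expt(sim, y1, y2)

        # Positive evidence: some r2 triple matches (x1, r1, y1).
        w_plus += 1.0 - prod(1.0 - match(x2, y2) for x2, y2 in pairs_r2)
        # Total evidence: (x1, y1) has some counterpart pair at all.
        w += 1.0 - prod(1.0 - match(x2, y2) for x2 in ents2 for y2 in ents2)
    if w == 0:
        return 0.5, 0.0  # no evidence: frequency is undefined, 0.5 is a placeholder
    return w_plus / w, w / (w + k)
```

A practical implementation would restrict the inner products to candidate pairs with non-zero expectation, since all other factors equal 1.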

Alternatively, the truth-value of statement $r_1 \rightarrow r_2$ can be obtained via inference on two types of paths:

(type III path) premises:

(12) $(*, x_1, y_1) \rightarrow r_1\ \ \langle 1, 1\rangle.$
(13) $(*, x_2, y_2) \rightarrow r_2\ \ \langle 1, 1\rangle.$
(14) $x_1 \leftrightarrow x_2\ \ \langle f_{14}, c_{14}\rangle.$
(15) $y_1 \leftrightarrow y_2\ \ \langle f_{15}, c_{15}\rangle.$

inference steps (path) & conclusion:

(16) from (12) and (14), Analogy: $(*, x_2, y_1) \rightarrow r_1\ \ \langle f_{16}, c_{16}\rangle.$
(17) from (16) and (15), Analogy: $(*, x_2, y_2) \rightarrow r_1\ \ \langle f_{17}, c_{17}\rangle.$
(18) from (13) and (17), Induction: $r_1 \rightarrow r_2\ \ \langle 1, c_{18}\rangle.$

(type IV path) premises:

(19) $(*, x_1, y_1) \rightarrow r_1\ \ \langle 1, 1\rangle.$
(20) $(*, x_2, y_2) \rightarrow r_2\ \ \langle 0, 1\rangle.$
(21) $x_1 \leftrightarrow x_2\ \ \langle f_{21}, c_{21}\rangle.$
(22) $y_1 \leftrightarrow y_2\ \ \langle f_{22}, c_{22}\rangle.$

inference steps (path) & conclusion:

(23) from (19) and (21), Analogy: $(*, x_2, y_1) \rightarrow r_1\ \ \langle f_{23}, c_{23}\rangle.$
(24) from (23) and (22), Analogy: $(*, x_2, y_2) \rightarrow r_1\ \ \langle f_{24}, c_{24}\rangle.$
(25) from (20) and (24), Induction: $r_1 \rightarrow r_2\ \ \langle 0, c_{25}\rangle.$

Premise (20) has a frequency value of 0, so it is not a triple in the KG but represents the absence of that triple. Note that induction is a weak inference rule, so the upper bound of its conclusion’s confidence is lower than that of the strong inference rules (such as deduction and analogy). Because of the truth-function of the induction rule, a type III path only generates positive evidence for the conclusion and a type IV path only generates negative evidence. The conclusions of the two types of path are supposed to be merged by the revision rule. In the implementation, for simplicity, we directly estimate the evidence amount instead of performing inference for relation alignment, although the latter approach may provide a more theoretical view. The two approaches share the same idea, but their calculation processes are not identical.
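As a rough illustration of why the two path types contribute evidence of opposite sign, the sketch below applies the induction truth-function in its standard NAL form. The function name, the argument order, the example truth-values and the evidential horizon k = 1 are assumptions made for illustration only; as stated above, the actual implementation estimates evidence amounts directly.

```python
def induction(f_mp, c_mp, f_ms, c_ms, k=1.0):
    """Induction in standard NAL form: from M -> P <f_mp, c_mp> and
    M -> S <f_ms, c_ms>, derive S -> P <f, c>.

    Assumption: k is the evidential horizon (k = 1 here).
    """
    w = f_ms * c_ms * c_mp              # total evidence contributed by M
    w_plus = f_mp * w                   # positive part of that evidence
    f = w_plus / w if w > 0 else 0.5    # frequency is undefined without evidence
    c = w / (w + k)                     # since w <= 1, c never exceeds 1 / (1 + k)
    return f, c


# Type III: (*, x2, y2) -> r2 <1, 1> together with (*, x2, y2) -> r1 <0.9, 0.8>
type_iii = induction(1.0, 1.0, 0.9, 0.8)
# Type IV: (*, x2, y2) -> r2 <0, 1> together with the same second premise
type_iv = induction(0.0, 1.0, 0.9, 0.8)
```

With these inputs the type III conclusion has frequency 1 and the type IV conclusion has frequency 0, while both share the same confidence, which stays below 1 / (1 + k).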

Due to the incompleteness of the KGs in the relevant datasets, the positive evidence for the inheritance statement $r_1 \rightarrow r_2$ is usually inadequate, because either premise (12) or (13) may be missing from the KG although it is actually true. To address this issue, we force the frequency of every inheritance statement to increase by a proportion at the end of each iteration, that is, $f := f + (1 - f) \times inc\_relation\_f$, where $:=$ represents assignment and $inc\_relation\_f$ is a hyper-parameter.
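The update rule above can be sketched in a few lines. The function name and the value 0.1 for $inc\_relation\_f$ are illustrative assumptions, and the loop ignores the re-estimation of $f$ that happens inside each real iteration.

```python
def boost_frequency(f, inc_relation_f):
    """Move f towards 1 by the proportion inc_relation_f of the remaining gap."""
    return f + (1.0 - f) * inc_relation_f


f = 0.5
for _ in range(3):                  # three end-of-iteration updates
    f = boost_frequency(f, 0.1)     # 0.1 is an illustrative value only
```

Because the increment is proportional to $1 - f$, the frequency can approach 1 but never exceed it.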

3.3. Utilizing Literal Values

Literal values in real-world KGs act as entity names, entity descriptions, relation/attribute names or attribute values, and carry an enormous amount of information. Literal values include texts (strings), numerical values and dates. Literals lack outer semantic structure (triples), in contrast to their abundant internal semantics. However, symbolic reasoning languages (systems) like NAL currently cannot effectively handle the subtle semantics in texts, for the following reasons: semantic parsing/understanding requires the capacity to process complex logical forms efficiently, as well as automatic learning capacity; KGs with complex logical forms are lacking; and KGs with detailed and comprehensive common sense knowledge are lacking. From a certain perspective, the literal values in real-world KGs are not really “literal” but rather under-characterized entities, concepts, triples, common sense knowledge and/or statements with complex logical forms. A real-world KG project may not have enough information or an adequate paradigm to deal with them. For example, the literal value of the attribute triple (John Lennon, deathPlace, “Manhattan, New York City, United States”@en) refers to the entities “Manhattan”, “New York” and “United States”, and its form indicates a specific relation between these places.

Deep neural network language models provide an interim solution to the literal value understanding problem. For example, BERT-INT (Tang et al., 2020) utilizes BERT to embed names/descriptions and values into a vector space, and then uses the similarities between the feature vectors for alignment.

P-NAL adopts the same embedding method as BERT-INT and fuses it into the similarity inference system. First, the basic BERT unit is finetuned on the names/descriptions of the seed alignment entity pairs. Then we use the finetuned model to compute feature vectors for all entities (name/description). Moreover, we compute the feature vectors for attribute values. The cosine similarity of the entity name/description features is converted directly into the statement

$y_1 \leftrightarrow y_2 \ \langle sim(y_1, y_2),\ C_{name} \rangle$.

where $sim$ is cosine similarity and $C_{name}$ is a hyper-parameter. The direct path linking the to-be-aligned entities by their name/description similarity is the type II path of similarity inference mentioned before. The statement is treated as a piece of evidence and fused with other evidence (such as that from align-bridges) by the revision rule.

This path seems straightforward, but it admits a deeper understanding. The language models used for the embedding process of EA are information sources distinct from the KG itself. A deep language model that is able to align or translate entity names can be seen as a generalized alignment model that aligns morphemes, words, entities and concepts. Its pretraining corpus consists of sentences; although the sentences do not possess explicit structures, the model can understand them by transforming them into complex logical forms. However, such a transformation (if it exists) and the logical forms are implicitly expressed in the model parameters. To summarize, the type II path of our similarity inference can be seen as the aggregation of multiple (virtual) complex logical paths. The aggregation result is represented in the vector space by the language model.
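To make the type II path concrete, the following sketch turns a name similarity into a statement $\langle sim, C_{name} \rangle$ and fuses it with an align-bridge conclusion using the revision rule in its standard NAL form. The function names, the clamping of negative cosine values, the horizon k = 1 and all numeric values are assumptions for illustration, not details taken from the P-NAL implementation.

```python
def name_statement(sim, c_name):
    """Type II evidence: frequency is the cosine similarity, confidence is C_name.

    Assumption: negative cosine values are clamped to 0 so that f stays in [0, 1].
    """
    return max(0.0, sim), c_name


def revise(tv1, tv2, k=1.0):
    """Revision in standard NAL form; both confidences must be below 1."""
    (f1, c1), (f2, c2) = tv1, tv2
    w1 = k * c1 / (1.0 - c1)    # evidence amount behind the first statement
    w2 = k * c2 / (1.0 - c2)    # evidence amount behind the second statement
    w = w1 + w2
    if w == 0.0:
        return 0.5, 0.0         # no evidence on either side
    f = (w1 * f1 + w2 * f2) / w
    c = w / (w + k)
    return f, c


bridge = (0.9, 0.5)                        # hypothetical align-bridge conclusion
name = name_statement(0.7, c_name=0.5)     # hypothetical similarity and C_name
fused = revise(bridge, name)
```

In this example the fused frequency lies between the two inputs, and the fused confidence is higher than either input confidence because the evidence amounts are added.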

If $x_1$ and $x_2$ are attribute values, the cosine similarity of their features is converted into the truth-value of premise statement (5). The corresponding truth-value is

$\langle f = sim(x_1, x_2),\ c = sim(x_1, x_2) \rangle$

The idea is that a deep learning model’s result with higher similarity is usually more verifiable. There are thousands of distinct attribute values in a KG, so for each attribute value we only consider the $K_{value}$ most similar (but not identical) values in the other KG, to prevent an explosion in the number of value similarities. $K_{value}$ is a hyper-parameter, and in the implementation we set $K_{value}$ to 1.
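The selection of value similarities described above might look as follows. This is a minimal sketch: the dictionary-based input format, the function names, the reading of “identical” as equal literal strings, and the clamping of negative similarities are all assumptions made here for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def value_similarity_premises(values1, values2, k_value=1):
    """values1 / values2 map each literal of a KG to its feature vector.

    Returns (v1, v2, f, c) tuples with f = c = sim, keeping only the
    k_value most similar, non-identical literals of the other KG.
    """
    premises = []
    for v1, vec1 in values1.items():
        scored = [(cosine(vec1, vec2), v2)
                  for v2, vec2 in values2.items() if v2 != v1]
        for sim, v2 in heapq.nlargest(k_value, scored):
            sim = max(0.0, sim)     # assumption: clamp negative similarities
            premises.append((v1, v2, sim, sim))
    return premises


premises = value_similarity_premises(
    {"New York City": [1.0, 0.0]},
    {"New York": [0.9, 0.1], "Paris": [0.0, 1.0], "New York City": [1.0, 0.0]},
)
```

Here the identical literal is skipped and only the single closest remaining value is kept, which mirrors the setting $K_{value} = 1$.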

3.4. Matching Algorithm

input : An array of linked lists of similarity sentences $KG1\_to\_KG2$, each linked list storing the top-k similarity sentences of an entity in descending order.
output : Optimized 1-to-1 similarity sentences (alignment results)
populate $KG2\_to\_KG1$ with all of the sentences in $KG1\_to\_KG2$;
/* $KG2\_to\_KG1$ is another array of linked lists, arranging the similarity sentences in the other direction */
for $e_1$ in $\mathcal{E}_1$ do
      recursively_delete($e_1$, $null$);
Algorithm 1 recursive bidirectional matching
input : Entity $e_1$, entity $e_{prev}$.
/* $e_1$ is the entity to be processed and we assume that $e_1$ belongs to the left graph; the other case is handled similarly. Entity $e_{prev}$ represents the previous entity, that is, the entity processed by the recursion parent. */
output : Entity $e_{return}$, which represents the final alignment for $e_1$
for $sentence$ in $KG1\_to\_KG2(e_1)$ do
      $e_2 \leftarrow predicate\_term$ of $sentence$;
      /* $predicate\_term$ means the other entity of the similarity sentence */
      if $e_2$ == $e_{prev}$ then
            $e_{return} \leftarrow e_{prev}$;
            break;
      else
            $e_3 \leftarrow$ recursively_delete($e_2$, $e_1$);
            if $e_3$ == $e_1$ then
                  $e_{return} \leftarrow e_2$;
                  break;
for $sentence$ in $KG1\_to\_KG2(e_1)$ except the first node do
      /* now that the first sentence for $e_1$ is bidirectionally matched, we delete the other sentences */
      remove $sentence$ from the linked list;
      remove $sentence$’s counterpart in $KG2\_to\_KG1$, which expresses the same similarity in the other direction;
return $e_{return}$;
Algorithm 2 recursively delete

Some EA datasets (such as $DBP15K$) carry a 1-to-1 assumption, which is useful information for alignment. Formally, we define the 1-to-1 assumption as follows. First, there is a range of alignable entities $A_1 \subset \mathcal{E}_1$ and $A_2 \subset \mathcal{E}_2$ (for $DBP15K$, $A_1 \subsetneqq \mathcal{E}_1$). Second, the equivalence between $A_1$ and $A_2$ is a bijection. Note that the assumption imposes no alignment regularity on entities outside the range, except that they cannot be aligned with entities inside the range. For example, the $ZH$ graph of $DBP15K_{ZH\_EN}$ has an entity set $A_1$ of size 15,000 and the $EN$ graph has an entity set $A_2$ of size 15,000. It is known that every entity in $A_1$ has a unique entity in $A_2$ as its alignment counterpart. Many ranking-based EA methods leverage the 1-to-1 range assumption, whereas PARIS does not.
Therefore, in order to leverage the range assumption in the implementation, we take the sets $A_1$ and $A_2$ as input and filter out any alignment sentence that aligns an entity of $A_1$ to one of $\mathcal{E}_2 \setminus A_2$, or one of $\mathcal{E}_1 \setminus A_1$ to one of $A_2$.

The similarities from align-bridges (type I path) are naturally sparse, because they only cover entity pairs that are effectively linked by the logical path. The similarities of entity names/descriptions (type II path) are dense but noisy, and most of them are useless. P-NAL’s overall algorithm (described in the next section) exhaustively searches for and stores the two types of similarity sentences between a specific to-be-aligned entity and any entity in the other KG. Then, because informative similarity signals are sparse, the similarity sentences are rearranged into ordered linked lists, one list per to-be-aligned entity. The sentences are sorted in descending order of their $expectation$ values. We only store the top $K_{sim}$ similarity sentences in each linked list, where $K_{sim}$ is a model hyper-parameter.
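The ordering and truncation step can be sketched as below, assuming the standard NAL definition of expectation, $c \times (f - 0.5) + 0.5$. A plain Python list stands in for the linked list, and the tuple layout and function names are illustrative assumptions.

```python
import heapq


def expectation(f, c):
    """Expectation of a truth-value <f, c> in its standard NAL form."""
    return c * (f - 0.5) + 0.5


def top_k_sentences(sentences, k_sim):
    """sentences: iterable of (other_entity, f, c).

    Returns the k_sim sentences with the highest expectation, best first.
    """
    return heapq.nlargest(k_sim, sentences,
                          key=lambda s: expectation(s[1], s[2]))


candidates = [("z", 0.2, 0.9), ("y", 1.0, 0.5), ("x", 0.9, 0.9)]
kept = top_k_sentences(candidates, k_sim=2)
```

Note that a sentence with frequency 1 but low confidence ("y") ranks below one with slightly lower frequency and high confidence ("x"), since expectation combines both components.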

Then we perform a recursive bidirectional matching algorithm (rBMat), which shares the idea of BMat (Tang et al., 2020) but differs in implementation. See Algorithm 1 and Algorithm 2 for details. The main idea is to recursively delete the similarity sentences that do not conform to the 1-to-1 assumption. Taking the sorting cost into account, rBMat has $O(kn^2)$ time complexity and $O(kn)$ space complexity, where $k$ represents $K_{sim}$ and $k \ll n$.
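One possible reading of Algorithm 1 and Algorithm 2 is sketched below. It is not the authors' code: Python lists replace the linked lists, the input is assumed to be a dictionary from entity pairs to expectation values, and ties are broken by the pair itself so that both directions agree on the order, an assumption this sketch relies on to guarantee termination.

```python
def recursive_bidirectional_matching(scores):
    """scores maps (e1, e2) to the expectation of the sentence e1 <-> e2."""
    cand = {}   # node -> candidate partners, best first, for both directions
    for (a, b) in scores:
        cand.setdefault(("L", a), []).append(("R", b))
        cand.setdefault(("R", b), []).append(("L", a))

    def rank(u, v):
        left, right = (u, v) if u[0] == "L" else (v, u)
        pair = (left[1], right[1])
        # Descending expectation; ties broken by the pair in both directions.
        return (-scores[pair], pair)

    for u, neighbours in cand.items():
        neighbours.sort(key=lambda v, u=u: rank(u, v))

    def recursively_delete(e1, e_prev):
        e_return = None
        while cand[e1]:
            e2 = cand[e1][0]
            if e2 == e_prev:
                e_return = e_prev
                break
            e3 = recursively_delete(e2, e1)
            if e3 == e1:
                e_return = e2
                break
            # Otherwise e2 kept another partner and has already removed its
            # sentence with e1, so the next candidate is now first.
        for other in cand[e1][1:]:
            cand[other].remove(e1)      # counterpart in the other direction
        del cand[e1][1:]
        return e_return

    alignment = {}
    for node in [n for n in cand if n[0] == "L"]:
        partner = recursively_delete(node, None)
        if partner is not None:
            alignment[node[1]] = partner[1]
    return alignment


result = recursive_bidirectional_matching(
    {("a", "x"): 0.7, ("b", "x"): 0.9, ("a", "y"): 0.6}
)
```

In this example "a" first tries "x", finds that "x" prefers "b", and falls back to "y". The recursion depth grows with the length of such preference chains, so a production version on large KGs would likely need an explicit stack.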

We found that some mismatches remain after performing the rBMat algorithm, and most of them share the same pattern. For example, $e_1 \leftrightarrow e_2$ and $e_3 \leftrightarrow e_4$ are two ground truth pairs, but rBMat’s result is $e_1 \leftrightarrow e_3$ and $e_2 \leftrightarrow e_4$. We implement a simple “modify matches” algorithm to handle this. We exhaustively search for the cases in which $expt(e_1 \leftrightarrow e_2) + expt(e_3 \leftrightarrow e_4)$ is greater than $expt(e_1 \leftrightarrow e_3) + expt(e_2 \leftrightarrow e_4)$ and modify them (swap the alignment), where $expt$ is the $expectation$ of the truth-value.
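A minimal sketch of such a swap pass is given below. The dictionary-based inputs, the single pass over all pairs of matches, and the choice to score a missing similarity sentence as 0 are assumptions made for illustration.

```python
from itertools import combinations


def modify_matches(match, expt):
    """match: dict left_entity -> right_entity produced by the matching step.
    expt: dict (left_entity, right_entity) -> expectation of that sentence.

    Swaps the partners of two matches whenever the swapped pairs have a
    larger total expectation than the current pairs.
    """
    match = dict(match)                       # work on a copy
    for a, c in combinations(list(match), 2):
        b, d = match[a], match[c]
        current = expt.get((a, b), 0.0) + expt.get((c, d), 0.0)
        swapped = expt.get((a, d), 0.0) + expt.get((c, b), 0.0)
        if swapped > current:
            match[a], match[c] = d, b
    return match


fixed = modify_matches(
    {"a": "y", "b": "x"},
    {("a", "y"): 0.55, ("b", "x"): 0.6, ("a", "x"): 0.9, ("b", "y"): 0.8},
)
```

The check over all pairs of matches is quadratic in the number of alignments, which is consistent with the exhaustive search described above.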

3.5. Overall structure of P-NAL

Generally speaking, our method adopts the same optimization method and iteration strategy as PARIS’s implementation, with some minor differences. The overall structure of the supervised P-NAL algorithm is elaborated in Algorithm 3. Lines 1-3 finetune the BERT unit. Lines 6-13 perform similarity inference and relational alignment. Lines 14-19 perform the proposed matching algorithm to obtain the alignment results of an iteration. The loop bound $end\_iteration$ is a hyper-parameter. Note that the inference within each iteration benefits from the alignment results (both entities and relations) of the previous iteration. Registering evidential information (line 10 in Algorithm 3) means memorizing which premises form the specific align-bridge; this information is used to generate the evidence log file.

input : Two knowledge graphs $\mathcal{KG}_1$ and $\mathcal{KG}_2$.
output : Alignment result and other information.
1 run finetuning for BERT unit;
2 compute entity/value embeddings with the BERT unit;
3 generate synthetic attribute triples for seed alignments (for supervision);
4 load the knowledge graphs;
5 for $iteration \leftarrow 0$ to $end\_iteration$ do
6       for $y_1$ in $\mathcal{E}_1$ do
             /* aligning for different entities of $\mathcal{E}_1$ is divided into multiple parallel threads */
7             for $x_1$, $x_2$, $y_2$ that form a sound align-bridge path with $y_1$ do
                   /* Type I path. The paths are searched in a depth-first manner */
8                   update the estimations of $w$ and $w^{+}$ for relational inheritance;
9                   perform inference with the inference steps and inference rules in Section 3.1;
10                  register evidential information of the path;
11            for $y_2$ in $\mathcal{E}_2$ do
                   /* Type II path. */
12                  retrieve embedding similarity for $y_1 \leftrightarrow y_2$;
13                  integrate the similarity with prior conclusions of align-bridge by revision rule;
14            filter the similarity sentences with 1-to-1 range assumption;
15            insert the sentences into a top-k ordered linked list;
16      dump the similarity sentences;
17      perform recursive bidirectional matching;
18      modify matches;
19      dump alignment results and evidences (log file);
20      increase frequency of relation inheritance statement;
Algorithm 3 P-NAL(supervised)

3.6. Unsupervised Learning

The seed alignment set is not always available in EA tasks or real-world EA applications, so an unsupervised scenario is sometimes adopted to evaluate the industrial applicability of EA methods. We adapt our method to the unsupervised scenario, that is, without using seed alignments. Since the BERT embedding model needs to be finetuned on seed alignments, we adopt a bootstrapping strategy. First, a P-NAL instance performs alignment on the dataset with 0% seeds and no literal embedding information. Then, we filter the initial alignment results by their $expectation$ with a threshold $\theta_{filter}$ and use the filtered results as the training set of BERT. Next, another P-NAL instance with 0% seeds performs alignment with the help of BERT’s literal embedding information to obtain the final result.
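The three bootstrapping steps can be outlined as follows. Both callables are hypothetical placeholders supplied by the caller, not functions of the P-NAL code base, and the use of “greater than or equal to” for the threshold test is an assumption.

```python
def bootstrap_unsupervised(run_pnal, finetune_bert, theta_filter):
    """Sketch of the unsupervised pipeline; no seed alignments are used.

    run_pnal(embedder) -> list of (e1, e2, expectation); passing None
        disables literal embedding information (hypothetical callable).
    finetune_bert(pairs) -> embedder trained on the given entity pairs
        (hypothetical callable).
    """
    initial = run_pnal(None)                              # step 1: structure only
    pseudo_seeds = [(e1, e2) for e1, e2, expt in initial
                    if expt >= theta_filter]              # step 2: keep confident pairs
    embedder = finetune_bert(pseudo_seeds)
    return run_pnal(embedder)                             # step 3: final alignment
```

Passing the two stages in as callables keeps the sketch independent of any particular BERT or P-NAL implementation.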

3.7. Interpretability

Following  (Rudin, 2019; Marcinkevičs and Vogt, 2020), interpretable ML (machine learning) focuses on designing models that are inherently interpretable, while explainable ML tries to provide post hoc explanations for existing black box models. P-NAL is highly interpretable and self-explanatory. It is arguably more interpretable than PARIS for the following two reasons. First, with the introduction of evidence amount (confidence) and logical inference rules, P-NAL processes data with more information and generates a more informative explanation. Second, P-NAL manages value similarity and name similarity in a unified logical framework, while PARIS doesn’t leverage such information.

P-NAL is self-explanatory in the sense that it generates a log file of evidence for the alignments, which can be inspected after an iteration. This feature helped us troubleshoot during the development of P-NAL. For example, inspecting the faulty alignments in the evidence file inspired many of the design decisions in this paper. The generated evidence is displayed in our GitHub repository.

Using the neural BERT model does not weaken the interpretability of type I similarity inference path because utilizing literal similarity does not affect the interpretable inference steps. Moreover, as we only keep the attribute value similarities with a score above the threshold, these similarities are easily understood and self-explanatory, except the wrong ones. Our method tolerates faulty attribute value similarity because the align-bridge needs a conjunction of all premises, while faulty similarities usually can’t form a complete premise set.

4. Relation with Other Methods

In this section, we discuss the relation between our proposed method and methods of other forms. We propose some preliminary explanations of certain translational embedding methods and embedding-path EA methods from a theoretical perspective.

The way NAL models KG information and the inference process is partly similar to “uncertainty estimation” (Hu et al., 2023) in the natural language processing domain. The truth-value of alignments shares some similarity with the distributive view of facts or beliefs, which views facts as probability distributions of random variables. Also, the concept of confidence is shared with some information extraction systems such as the Markov logic network (Jiang et al., 2012), which assigns confidence to extracted facts or logical formulas in some intermediate steps.

4.1. Relation with Translational Embedding Methods

The well-known KG embedding model TransE (Bordes et al., 2013) was initially proposed for link prediction tasks. It may be partially explained from the logical perspective of NAL (or, equivalently, another logic with similar expressive power). Consider a specific type of Horn clause, $((*, A, B) \rightarrow R_1 \wedge (*, B, C) \rightarrow R_2) \Rightarrow (*, A, C) \rightarrow R_3 \ \langle f_1, c_1 \rangle$. The following three triples

$(Martin\_Luther\_King\_Jr,\ birthPlace,\ Georgia\_(U.S.\_state))$
$(Georgia\_(U.S.\_state),\ country,\ United\_States)$
$(Martin\_Luther\_King\_Jr,\ citizenship,\ United\_States)$

together form a piece of positive evidence for an instantiated Horn clause, in which $R_1$, $R_2$ and $R_3$ are replaced by $birthPlace$, $country$ and $citizenship$, respectively. We conjecture that the gradient descent optimization process of TransE implicitly performs approximate logical inference and evidence aggregation. In the above example, for each of the three triples, $||\mathbf{h}+\mathbf{r}-\mathbf{t}||$ (where bold type denotes a vector) is minimized once per epoch (ignoring the margin-based criterion), leading to $\mathbf{birthPlace}+\mathbf{country}\approx\mathbf{citizenship}$. Thus, the instantiated Horn clause together with its truth-value may be represented by the correlation of the vector representations, and the truth-value may be reflected in the distance $||\mathbf{birthPlace}+\mathbf{country}-\mathbf{citizenship}||$. Note that these three relations may appear in more than one Horn clause, so the gradients from the evidence of one Horn clause may interfere (or conflict) with those from another Horn clause, for example $\mathbf{manufacturer}+\mathbf{country}\approx\mathbf{madeInCountry}$. The training process may force the vector $\mathbf{birthPlace}$ to be nearly perpendicular to $\mathbf{manufacturer}$; otherwise, there may be hallucination in link prediction or EA results. A similar explanation of hallucination may apply to LLMs. A similar analysis applies to the vector representations of two relations that frequently appear on the same head entity (or tail entity). It is arguable that the test-set link prediction process of TransE mainly relies on Horn clauses, because from a logical perspective there is no other information. In this paper, Horn clauses are not extracted or managed; this is left for further research.
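The conjecture above can be probed with a toy experiment. The following is a minimal sketch under strong simplifications, not the TransE training procedure itself: it uses a squared-distance loss without margin, negative sampling or norm constraints, and the intermediate entity `Atlanta` as well as all function names are hypothetical choices made for illustration.

```python
import random

# Toy setting: three triples forming one piece of evidence for the Horn clause.
# "Atlanta" is a hypothetical intermediate entity chosen for illustration.
TRIPLES = [
    ("Martin_Luther_King_Jr", "birthPlace", "Atlanta"),
    ("Atlanta", "country", "United_States"),
    ("Martin_Luther_King_Jr", "citizenship", "United_States"),
]


def train_toy_transe(dim=4, steps=500, lr=0.1, seed=0):
    """Full-batch gradient descent on sum of 0.5 * ||h + r - t||^2 (no margin, no negatives)."""
    rng = random.Random(seed)
    names = sorted({x for triple in TRIPLES for x in triple})
    vec = {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in names}
    for _ in range(steps):
        grad = {n: [0.0] * dim for n in names}
        for h, r, t in TRIPLES:
            for i in range(dim):
                res = vec[h][i] + vec[r][i] - vec[t][i]
                grad[h][i] += res  # derivative w.r.t. the head vector
                grad[r][i] += res  # derivative w.r.t. the relation vector
                grad[t][i] -= res  # derivative w.r.t. the tail vector
        for n in names:
            for i in range(dim):
                vec[n][i] -= lr * grad[n][i]
    return vec


def clause_residual(vec):
    """||birthPlace + country - citizenship||; small when the clause is reflected in the vectors."""
    return sum(
        (a + b - c) ** 2
        for a, b, c in zip(vec["birthPlace"], vec["country"], vec["citizenship"])
    ) ** 0.5


if __name__ == "__main__":
    print(clause_residual(train_toy_transe(steps=0)))  # residual at random initialisation
    print(clause_residual(train_toy_transe()))         # residual after training
```

In this isolated setting the residual shrinks towards zero because the three constraints are jointly satisfiable; with many competing clauses sharing the same relations, as discussed above, this need not hold.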

MTransE (Chen et al., 2016) is a translational embedding-based EA method. It encodes the relational triples of the two KGs separately with the TransE loss criterion $S_K=\Sigma_{(h,r,t)}||\mathbf{h}+\mathbf{r}-\mathbf{t}||$. It proposes a “distance-based axis calibration” alignment model to make the vectors of counterpart entities/relations coincide. The corresponding loss is $S_{a_2}=\Sigma||\mathbf{e}-\mathbf{e}'||+||\mathbf{r}-\mathbf{r}'||$ ($S_{a_2}$ only has the first term if no seed relation alignment is available). The seed and derived alignments are assumed to satisfy $\mathbf{e}\approx\mathbf{e}'$, which we see as the embedding representation of the similarity statement $e\leftrightarrow e'$, with its truth-value somehow represented by the distance $||\mathbf{e}-\mathbf{e}'||$. Theoretically, the distance cannot represent frequency and confidence simultaneously by itself; more likely it reflects their combined effect.
We argue that MTransE performs approximate inference that is similar to the type III path, because if the learned embedding constraints of the four premises are considered simultaneously, we obtain $\mathbf{r}\approx\mathbf{r}'$, which we interpret as $r\leftrightarrow r'$. Similarly, MTransE performs approximate inference of the type I path (with functionality omitted and $r\rightarrow r'$ replaced by $r\leftrightarrow r'$) to obtain derived alignment results.
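The step from the four embedding constraints to $\mathbf{r}\approx\mathbf{r}'$ can be written out as follows. This is a sketch that assumes the four premises are the two relation triples $(h,r,t)$ and $(h',r',t')$ together with the two entity similarities $h\leftrightarrow h'$ and $t\leftrightarrow t'$; that reading of the premises is an assumption of this illustration.

```latex
\begin{align*}
\mathbf{r}-\mathbf{r}'
  &= (\mathbf{h}+\mathbf{r}-\mathbf{t})
   - (\mathbf{h}'+\mathbf{r}'-\mathbf{t}')
   - (\mathbf{h}-\mathbf{h}')
   + (\mathbf{t}-\mathbf{t}'), \\
||\mathbf{r}-\mathbf{r}'||
  &\le ||\mathbf{h}+\mathbf{r}-\mathbf{t}||
   + ||\mathbf{h}'+\mathbf{r}'-\mathbf{t}'||
   + ||\mathbf{h}-\mathbf{h}'||
   + ||\mathbf{t}-\mathbf{t}'||.
\end{align*}
```

The first line is an identity and the second follows from the triangle inequality, so when all four terms on the right are driven towards zero by $S_K$ and $S_{a_2}$, the distance $||\mathbf{r}-\mathbf{r}'||$ is small as well.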

4.2. Relation with Embedding-path EA Methods

Here we propose some preliminary explanations of the similarity inference aspect of some embedding-path EA methods from a theoretical perspective.

The first method to be discussed is BERT-INT. It generates entity embeddings from the name/description information with a BERT unit, and the embedding is $C(e)=MLP(CLS(e))$. It uses a pairwise margin loss to approximately enforce $C(e)\approx C(e')$. Unlike MTransE, which performs path inference implicitly through the gradient optimization of its loss criteria, BERT-INT explicitly performs path inference with its proposed interaction model. Every element of the neighbor-view interaction matrix represents an inference process of a type I path. Its path omits functionality and relation alignment (because BERT-INT fails to utilize its proposed relation mask matrix). Because relation types are ignored, its premises (1) and (2) have the forms $(*, x_1, y_1)\rightarrow \$r$ and $(*, x_2, y_2)\rightarrow \$r$, which represent “There exists an unspecified relation between $x_1$/$y_1$, and (another) unspecified relation between $x_2$/$y_2$”. Moreover, its premise (5) fails to utilize derived alignments, because BERT-INT is not iterative. With such premises, the effectiveness of BERT-INT’s type I path inference is expected to be lower than that of P-NAL.
Similarly, every element of the attribute-view interaction matrix represents a type I path that has attribute triples as premises (1) and (2). BERT-INT’s evidence aggregation method differs from that of P-NAL, which uses the probabilistic revision and revision rules.
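To make the reading of the interaction matrix concrete, the following minimal sketch builds a neighbor-view similarity matrix from embedding vectors. It is an illustration only: the function names are hypothetical, cosine similarity is assumed as the similarity measure, and the aggregation that BERT-INT applies on top of the matrix is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbor_interaction(neighbors_e, neighbors_e_prime):
    """Rows index the neighbors of e, columns the neighbors of e'.

    Entry (i, j) is the embedding similarity between the i-th neighbor of e
    and the j-th neighbor of e'. The relation types linking e and e' to these
    neighbors are not consulted, which matches the 'unspecified relation'
    reading of premises (1) and (2) above.
    """
    return [[cosine(u, v) for v in neighbors_e_prime] for u in neighbors_e]


if __name__ == "__main__":
    matrix = neighbor_interaction(
        [[1.0, 0.0], [0.0, 1.0]],  # embeddings of two neighbors of e
        [[1.0, 0.0], [0.0, 2.0]],  # embeddings of two neighbors of e'
    )
    print(matrix)
```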

The second method to be discussed is FGWEA. Its multi-view Optimal Transport (OT) alignment step combines four cost matrices for the OT problem, that is, $C_{sum}=C_{stru}+C_{rel}+C_{name}+C_{attr}$. Obtaining the cost matrices corresponds to the similarity inference process, and different matrices correspond to different groups of inference paths. Among them, $C_{rel}$ corresponds to a degenerate type I path inference in which relation alignment is obtained from relation names and functionality is not considered. $C_{stru}$ corresponds to a further degenerate type I path inference (similar to BERT-INT’s neighbor-view interaction). $C_{name}$ corresponds to type II path inference. $C_{attr}$ fails to model the (fine-grained) attributive type I path because it uses the concatenation of all attribute triples of an entity.

In this paper, BERT-INT and FGWEA are classified as embedding-path EA methods because their embedding modules are coupled with the path inference to some extent. In contrast, P-NAL, which we classify as path-based, performs inference wherever it can and uses embeddings minimally.

Table 2. Dataset statistics. $|\mathcal{E}|$, $|\mathcal{R}|$, $|\mathcal{T_R}|$ and $|\mathcal{T_A}|$ represent the number of entities, relation types, relation triples and attribute triples in each KG, respectively. Each dataset occupies two rows, one per KG.
Dataset             $|\mathcal{E}|$   $|\mathcal{R}|$   $|\mathcal{T_R}|$   $|\mathcal{T_A}|$
$DBP15K_{ZH\_EN}$   19,388            1,701             70,414              379,684
                    19,572            1,323             95,142              567,755
$DBP15K_{JA\_EN}$   19,814            1,299             77,214              354,619
                    19,780            1,153             93,484              497,230
$DBP15K_{FR\_EN}$   19,661            903               105,998             528,665
                    19,993            1,208             115,722             576,543
D-W-15K-V2          15,000            167               73,983              66,813
                    15,000            121               83,365              175,686

5. Experiments and Results

5.1. Datasets and Settings

We evaluate our model on two EA datasets: the widely used cross-lingual dataset $DBP15K$ (see (Sun et al., 2017) for details) and a monolingual multi-source dataset D-W-15K-V2 (Sun et al., 2020). $DBP15K$ consists of three subsets of cross-lingual KG pairs extracted from DBpedia: $DBP15K_{ZH\_EN}$ (Chinese to English), $DBP15K_{JA\_EN}$ (Japanese to English), and $DBP15K_{FR\_EN}$ (French to English). Each KG pair contains 15,000 seed alignments. D-W-15K-V2 consists of two English KGs extracted from DBpedia and Wikidata, respectively, and also contains 15,000 seed alignments. The statistics of the datasets are listed in Table 2.

The configuration of our main results on $DBP15K$ (Table 3) consists of five settings: $Attr.$, $Name$, $Trans.$, $Desc.$ and $Seed$, explained as follows. $Attr.$ indicates that attribute triples are utilized. $Name$ indicates that entity name information is utilized. $Trans.$ indicates that translators, such as the Google translator, are utilized. $Desc.$ indicates that entity description information is utilized. $Seed$ is the percentage of seed alignments: 30% for the conventional supervised scenario and 0% for the unsupervised scenario.

We categorize baselines into five configuration groups and run P-NAL using the configurations for each group. Group 1 is the supervised scenario with attribute triples. Group 2 is the unsupervised scenario with attribute triples. Group 3 is the supervised or unsupervised scenario with entity name information and translator. Group 4 is the supervised scenario with entity name and description information, which is the same scenario as BERT-INT. Group 5 is the supervised or unsupervised scenario with attribute triples and entity name information.

Most hyper-parameters of our model remain the same across datasets and configurations, except for group 3, which is discussed later. The hyper-parameters are selected manually. We set $iota$ = 0.5, $theta$ = 0.1, $inc\_relation\_f$ = 0.5 and $end\_iteration$ = 21. We set $C_{name}$ = 0.8 for FR_EN and 0.6 otherwise. $K_{sim}$ is set to 80 and $\theta_{filter}$ is set to 0.6. The BERT unit is finetuned for 15 epochs. The dimension of the BERT CLS embedding is 768 and the dimension of the BERT unit’s embedding output is 300. P-NAL does not perform data cleaning. Roughly speaking, P-NAL has more hyper-parameters than many other EA methods, which may be a drawback. The automatic setting of hyper-parameters is left for further research. The entity name translations are obtained with the Google translator, which is consistent with many other studies.
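For reference, the reported values can be gathered in one place. The snippet below is only an illustrative way of recording them; the dictionary layout, the key names for the BERT-related entries, and the helper `c_name_for` are assumptions of this sketch and are not taken from the P-NAL implementation.

```python
# Values as reported in the text; the layout and the BERT-related key names are illustrative.
HYPER_PARAMS = {
    "iota": 0.5,
    "theta": 0.1,
    "inc_relation_f": 0.5,
    "end_iteration": 21,
    "K_sim": 80,            # set to 400 in configuration group 3
    "theta_filter": 0.6,
    "bert_finetune_epochs": 15,
    "bert_cls_dim": 768,
    "bert_unit_output_dim": 300,
}


def c_name_for(dataset):
    """Default C_name: 0.8 for FR_EN and 0.6 for the other datasets."""
    return 0.8 if dataset == "FR_EN" else 0.6
```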

Our P-NAL model is implemented in Java and the BERT unit is implemented in Python with PyTorch. All experiments are performed on a Linux server with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz, 251 GB of RAM and an NVIDIA GeForce RTX 3090 GPU.

Table 3. Evaluation results of all compared EA methods on $DBP15K$ in five different configuration groups. Methods marked with * use the additional information of relation names.
Config Model Settings ZH_EN JA_EN FR_EN
Attr. Name Trans. Desc. Seed Hits@1 Hits@1 Hits@1
1 JAPE 30% 0.412 0.363 0.324
GCNAlign 30% 0.413 0.399 0.373
PARIS+ 30% 0.904 0.874 0.928
P-NAL 30% 0.984 0.970 0.990
2 PARIS 0% 0.777 0.785 0.793
FGWEA* 0% 0.929 0.922 0.967
P-NAL 0% 0.978 0.967 0.988
3 RDGCN 30% 0.708 0.767 0.886
CUEA 30% 0.921 0.946 0.956
UPL-EA 30% 0.949 0.970 0.995
SE-UEA 0% 0.935 0.951 0.957
LightEA 0% 0.952 0.981 0.995
FGWEA* 0% 0.959 0.982 0.994
P-NAL 0% 0.938 0.988 0.996
4 BERT-INT 30% 0.968 0.964 0.995
P-NAL 30% 0.998 0.995 0.999
5 TEA 30% 0.941 0.941 0.979
FGWEA* 0% 0.976 0.978 0.997
P-NAL 0% 0.989 0.984 0.998

5.2. Evaluation Metric

We use Hits@1 (which is the same metric as recall for EA) as the sole evaluation metric for our main results on $DBP15K$ for the following reasons. Mean Reciprocal Rank (MRR) is unavailable for P-NAL because it does not provide an alignment ranking for the test entities. There exists a non-negligible number of equivalent entity pairs that are not in the ground truth of $DBP15K$, so precision and F1 score cannot be measured properly. For the dataset D-W-15K-V2, we use precision (P), recall (R) and F1 score.
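The relationship between these metrics can be illustrated with a short sketch. It assumes that a method outputs at most one counterpart per source entity and that alignments are stored as dictionaries; the function names and the data layout are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """predicted, gold: dicts mapping a source entity to its (single) target entity."""
    correct = sum(1 for s, t in predicted.items() if gold.get(s) == t)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hits_at_1(predicted, gold):
    """Fraction of gold pairs whose top (here: only) prediction is correct."""
    return sum(1 for s, t in gold.items() if predicted.get(s) == t) / len(gold)


if __name__ == "__main__":
    gold = {"a": "x", "b": "y", "c": "z", "d": "w"}
    predicted = {"a": "x", "b": "y", "c": "q"}  # "d" is left unaligned
    print(precision_recall_f1(predicted, gold))
    print(hits_at_1(predicted, gold))
```

With one prediction per entity, `hits_at_1` and recall count the same correct pairs over the same denominator, which is why the two coincide, while precision additionally depends on how many entities the method chooses to align.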

5.3. Main Results

We compare P-NAL with the following methods, most of which are recent and well-performing: JAPE (Sun et al., 2017), GCNAlign (Wang et al., 2018), PARIS+ (Leone et al., 2022), PARIS (Suchanek et al., 2011), FGWEA (Tang et al., 2023), RDGCN (Wu et al., 2019), CUEA (Zhao et al., 2022), UPL-EA (Ding et al., 2023), SE-UEA (Jiang et al., 2023a), LightEA (Mao et al., 2022), BERT-INT (Tang et al., 2020) and TEA (Zhao et al., 2023). Their results are taken from their original papers where possible, with their settings carefully examined.

The experimental settings and results of P-NAL and all compared baselines on $DBP15K$ are shown in Table 3. As observed, P-NAL achieves the best performance in terms of Hits@1 in all five groups except group 3. P-NAL outperforms BERT-INT significantly with an identical setting and the same embedding method, verifying the effectiveness of our similarity inference combined with the matching algorithm. P-NAL outperforms FGWEA in groups 2 and 5, indicating that it successfully utilizes the information of attribute triples. In group 1, the two classic EA models JAPE and GCNAlign are outperformed by the newer approaches (PARIS+ and P-NAL) by a significant margin, indicating the effective innovation of EA approaches in recent years. The performance of P-NAL in unsupervised group 2 approaches its performance in supervised group 1 with only a minor gap, indicating that our proposed bootstrapping strategy effectively adapts to the unsupervised setting (with the help of attribute information).

As for configuration group 3, the attribute information is unavailable and we have to rely on the name and translation information to bootstrap the alignment process. We use two BERT units instead of one to separately embed the original entity names and the translated entity names. The BERT units are finetuned separately. We adapt the bootstrapping strategy in Section 3.6 into three steps. In each step, we perform alignment with a P-NAL instance and filter the alignment results to form the training set of the BERT units for the subsequent step. The first step uses unfinetuned BERT units and only considers translated names, because the unfinetuned embeddings of original entity names have relatively poor quality. We adjust the hyper-parameters accordingly: the evidence amount $w$ of the confidence $C_{name}$ is halved compared with other settings because there are two embeddings for one entity. We also decrease the confidence $C_{name}$ by 0.2 and 0.1 for the first two steps, matching the quality of the embeddings. $K_{sim}$ is set to 400 and the other hyper-parameters are unchanged. P-NAL outperforms the other methods in configuration group 3 on JA_EN and FR_EN, including the three supervised ones. However, on ZH_EN, the unsupervised FGWEA and LightEA yield better performance. This is possibly due to differences in the matching algorithm; for example, both of them involve the Sinkhorn algorithm. The error accumulation effect of P-NAL’s strategy in group 3 is left for further study.
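Halving the evidence amount behind a confidence value can be sketched as follows. The sketch assumes the standard NAL relation between confidence and evidence amount, $c = w/(w+k)$, with evidential horizon $k = 1$; the function names are hypothetical, and how this adjustment is ordered relative to the per-step decrease of $C_{name}$ is not specified here.

```python
def confidence_to_weight(c, k=1.0):
    """Evidence amount w behind confidence c, assuming c = w / (w + k) and 0 <= c < 1."""
    return k * c / (1.0 - c)


def weight_to_confidence(w, k=1.0):
    """Confidence c for evidence amount w, assuming c = w / (w + k)."""
    return w / (w + k)


def halve_evidence(c, k=1.0):
    """Confidence obtained when the evidence amount behind c is halved."""
    return weight_to_confidence(confidence_to_weight(c, k) / 2.0, k)


if __name__ == "__main__":
    # For c = 0.6: w = 1.5, halved to 0.75, giving c = 0.75 / 1.75.
    print(halve_evidence(0.6))
```

Under these assumptions each of the two name embeddings carries half of the evidence that a single embedding would carry, so combining both does not count the name information twice.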

Figure 2. Influence of $C_{name}$.

5.4. Influence of Confidence Hyper-parameter

The experimental results in Figure 2 show how the confidence $C_{name}$ of the entity name/description embedding similarity affects Hits@1. These experiments are performed on configuration group 4 without using attribute value embedding information. We adjust $C_{name}$ with all other conditions unchanged. The Hits@1 curve is approximately concave and, for $ZH\_EN$, $JA\_EN$ and $FR\_EN$ respectively, it reaches its maximum at 0.6, 0.55 and 0.8. This shows that the informative embedding similarity enhances the performance to different extents. French is often regarded as more closely related to English than Chinese or Japanese are, so the BERT unit learns representations more easily and thus produces more confident embedding similarity. The pretraining corpus of the BERT unit may include relevant triples (in the form of natural language sentences) that share an informational origin with DBpedia, so the evidence of the embedding similarity may overlap with the evidence of the align-bridge. The revision rule is only appropriate when the two premises do not share the same evidence (or, equivalently, when their evidential bases do not overlap). The appropriate confidence value therefore needs to be lower than the confidence of the BERT output (if it provides such information) in order to exclude the overlap. The best-performing confidence for each dataset is conjectured to reflect the combined influence of the embedding quality of the BERT unit and the evidence overlapping effect.
Alternatively, the confidence value $C_{name}$ can be set equal to the cosine similarity of the embeddings, which results in slightly decreased performance. This is a reasonable choice when hyper-parameter tuning is to be avoided.
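The concern about overlapping evidence can be illustrated with a small sketch of the revision rule. It assumes the standard NAL definitions (frequency $f = w^{+}/w$, confidence $c = w/(w+k)$, evidential horizon $k = 1$) and truth-values given as `(frequency, confidence)` pairs with $0 < c < 1$; the function name is hypothetical and the probabilistic revision rule is not covered.

```python
def revise(tv1, tv2, k=1.0):
    """Merge two truth-values by adding their evidence amounts (NAL revision).

    Only valid when the two premises rest on disjoint evidence; shared
    evidence would be counted twice. Assumes 0 < confidence < 1.
    """
    (f1, c1), (f2, c2) = tv1, tv2
    w1 = k * c1 / (1.0 - c1)  # total evidence behind premise 1
    w2 = k * c2 / (1.0 - c2)  # total evidence behind premise 2
    w = w1 + w2
    frequency = (f1 * w1 + f2 * w2) / w  # pooled positive evidence over pooled total
    confidence = w / (w + k)
    return frequency, confidence


if __name__ == "__main__":
    # Two fully positive premises, each with confidence 0.6 (w = 1.5 each):
    # pooled w = 3.0, so the revised confidence is 3.0 / 4.0.
    print(revise((1.0, 0.6), (1.0, 0.6)))
```

If the two premises in this example actually drew on the same source, the revised confidence would overstate the available evidence, which is the motivation given above for choosing $C_{name}$ lower than the raw confidence of the embedding similarity.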

5.5. Ablation study

To validate the effectiveness of each component in P-NAL, we compare it with several ablations. The results are shown in Table 4, where w/o stands for “without” and $E_{value}$ represents attribute value embedding information. $all\_revision$ represents replacing the probabilistic revision rule with the revision rule, and $all\_prob\_revision$ is the opposite. $1v1\_range$ is the 1-to-1 matching range information utilized in Section 3.4, $modify\_matches$ is an algorithm proposed in Section 3.4, and $inc\_relation\_f$ represents the increment of the inheritance statement’s frequency.

P-NAL approximately performs the best compared with these variants. However, $all\_prob\_revision$ obtains slightly better results on ZH_EN and JA_EN than P-NAL, which needs further research to explain. We insist on retaining the revision rule because it can deal with negative evidence for similarity sentences, while the probabilistic revision rule cannot. The ablation results together with the main results show that P-NAL seems to have good monotonicity in Hits@1 performance, in the sense that when extra information or an extra procedure (component) is added to the model, Hits@1 increases. Arguably, this is because introducing two-dimensional truth-values in every inference step separates confidence from truth degree (frequency) in every statement, so that the information about the relative reliability level is stored for further use. P-NAL also achieves competitive performance compared with FGWEA on D-W-15K-V2 with attribute triples (as FGWEA uses attribute information for semantic comparison).

Table 4. Ablation study of P-NAL. Hits@1 is reported for ZH_EN and JA_EN; P, R and $F_1$ are reported for D-W-15K-V2.
Model                         ZH_EN Hits@1   JA_EN Hits@1   D-W-15K-V2 P   D-W-15K-V2 R   D-W-15K-V2 $F_1$
P-NAL                         0.989          0.984          0.917          0.906          0.912
- w/o $E_{value}$             0.985          0.980          -              -              -
- $all\_revision$             0.903          0.912          0.857          0.814          0.835
- $all\_prob\_revision$       0.991          0.987          -              -              -
- w/o $1v1\_range$            0.985          0.978          -              -              -
- w/o $modify\_matches$       0.987          0.982          0.912          0.901          0.907
- w/o $inc\_relation\_f$      0.978          0.975          0.918          0.906          0.912
FGWEA                         0.976          0.978          0.952          0.903          0.927

6. Conclusion and Future Work

In this paper, we propose an entity alignment method named P-NAL, which tackles the EA problem by modeling inference processes (similarity inference) that obtain similarity through paths connecting the entities. P-NAL leverages two types of paths, exploiting both structural and side information of KGs. Using the similarities, P-NAL matches the entities with the proposed rBMat algorithm with modification. P-NAL is also successfully adapted to the unsupervised scenario and to a scenario without attribute triples. Compared with up-to-date EA methods, P-NAL attains competitive results on the dataset D-W-15K-V2 and in various settings of $DBP15K$, indicating that it successfully handles the most effective part of similarity inference.

We take a step towards re-evaluating the design choices of different EA models by providing some insights into (explanations of) different methods, together with results that are competitive with them. Hopefully, our approach may broaden the view and deepen the understanding of the EA research community. How to combine embedding models with path inference and realize the full potential of embedding models is a research question to be studied further.

Thanks to the expressiveness of the logic NAL, P-NAL can be extended with other capabilities in future work, such as integrating ontological information and processing paths for negative similarity evidence.

Acknowledgements.
This work was supported by the National Natural Science Foundation of China (62276057).

References

  • Auer et al. (2007) Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In international semantic web conference. Springer, 722–735.
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol. 2. 2787–2795.
  • Cai et al. (2022) Weishan Cai, Wenjun Ma, Jieyu Zhan, and Yuncheng Jiang. 2022. Entity alignment with reliable path reasoning and relation-aware heterogeneous graph transformer. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). 1930–1937.
  • Chen et al. (2018) Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo. 2018. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 3998–4004.
  • Chen et al. (2016) Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2016. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1511–1517.
  • Dao et al. (2023) Nhat-Minh Dao, Thai V Hoang, and Zonghua Zhang. 2023. A Benchmarking Study of Matching Algorithms for Knowledge Graph Entity Alignment. arXiv preprint arXiv:2308.03961 (2023).
  • Ding et al. (2023) Qijie Ding, Jie Yin, Daokun Zhang, and Junbin Gao. 2023. Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment. arXiv preprint arXiv:2307.02075 (2023).
  • Fanourakis et al. (2023) Nikolaos Fanourakis, Vasilis Efthymiou, Dimitris Kotzinos, and Vassilis Christophides. 2023. Knowledge graph embedding methods for entity alignment: experimental review. Data Mining and Knowledge Discovery 37, 5 (2023), 2070–2137.
  • Gesese et al. (2021) Genet Asefa Gesese, Russa Biswas, Mehwish Alam, and Harald Sack. 2021. A survey on knowledge graph embeddings with literals: Which model links better literal-ly? Semantic Web 12, 4 (2021), 617–647.
  • Guo et al. (2022) Lingbing Guo, Yuqiang Han, Qiang Zhang, and Huajun Chen. 2022. Deep reinforcement learning for entity alignment. In Findings of the Association for Computational Linguistics: ACL 2022. 2754–2765.
  • Hu et al. (2023) Mengting Hu, Zhen Zhang, Shiwan Zhao, Minlie Huang, and Bingzhe Wu. 2023. Uncertainty in Natural Language Processing: Sources, Quantification, and Applications. arXiv preprint arXiv:2306.04459 (2023).
  • Ji et al. (2021) Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and Philip S. Yu. 2021. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE transactions on neural networks and learning systems 33, 2 (2021), 494–514.
  • Jiang et al. (2023b) Chuanyu Jiang, Yiming Qian, Lijun Chen, Yang Gu, and Xia Xie. 2023b. Unsupervised Deep Cross-Language Entity Alignment. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 3–19.
  • Jiang et al. (2012) Shangpu Jiang, Daniel Lowd, and Dejing Dou. 2012. Learning to refine an automatically extracted knowledge base using markov logic. In 2012 IEEE 12th International Conference on Data Mining. IEEE, 912–917.
  • Jiang et al. (2023a) Tingting Jiang, Chenyang Bu, Yi Zhu, and Xindong Wu. 2023a. Integrating symbol similarities with knowledge graph embedding for entity alignment: an unsupervised framework. Intelligent Computing 2 (2023), 0021.
  • Leone et al. (2022) Manuel Leone, Stefano Huber, Akhil Arora, Alberto García-Durán, and Robert West. 2022. A critical re-evaluation of neural methods for entity alignment. In Proceedings of the VLDB Endowment, Vol. 15. 1712–1725. https://doi.org/10.14778/3529337.3529355
  • Lin et al. (2023) Lin Lin, Lizheng Zu, Feng Guo, Song Fu, Yancheng Lv, Hao Guo, and Jie Liu. 2023. Using combinatorial optimization to solve entity alignment: An efficient unsupervised model. Neurocomputing 558 (2023), 126802.
  • Liu et al. (2023) Bing Liu, Tiancheng Lan, Wen Hua, and Guido Zuccon. 2023. Dependency-aware Self-training for Entity Alignment. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 796–804.
  • Liu et al. (2022) Xiao Liu, Haoyun Hong, Xinghao Wang, Zeyi Chen, Evgeny Kharlamov, Yuxiao Dong, and Jie Tang. 2022. Selfkg: Self-supervised entity alignment in knowledge graphs. In Proceedings of the ACM Web Conference 2022. 860–870.
  • Liu et al. (2020) Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, and Tat-Seng Chua. 2020. Exploring and evaluating attributes, values, and structures for entity alignment. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 6355–6364.
  • Logan IV et al. (2019) Robert L Logan IV, Nelson F Liu, Matthew E Peters, Matt Gardner, and Sameer Singh. 2019. Barack’s wife hillary: Using knowledge-graphs for fact-aware language modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5962–5971.
  • Luo and Yu (2022) Shengxuan Luo and Sheng Yu. 2022. An accurate unsupervised method for joint entity alignment and dangling entity detection. In Findings of the Association for Computational Linguistics: ACL 2022. 2330–2339.
  • Mao et al. (2021a) Xin Mao, Wenting Wang, Yuanbin Wu, and Man Lan. 2021a. Boosting the Speed of Entity Alignment 10 ×: Dual Attention Matching Network with Normalized Hard Sample Mining. In Proceedings of the Web Conference 2021. 821–832. https://doi.org/10.1145/3442381.3449897
  • Mao et al. (2021b) Xin Mao, Wenting Wang, Yuanbin Wu, and Man Lan. 2021b. From alignment to assignment: Frustratingly simple unsupervised entity alignment. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2843–2853.
  • Mao et al. (2022) Xin Mao, Wenting Wang, Yuanbin Wu, and Man Lan. 2022. LightEA: A Scalable, Robust, and Interpretable Entity Alignment Framework via Three-view Label Propagation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 825–838.
  • Mao et al. (2020a) Xin Mao, Wenting Wang, Huimin Xu, Man Lan, and Yuanbin Wu. 2020a. MRAEA: an efficient and robust entity alignment approach for cross-lingual knowledge graph. In Proceedings of the 13th International Conference on Web Search and Data Mining. 420–428. https://doi.org/10.1145/3336191.3371804
  • Mao et al. (2020b) Xin Mao, Wenting Wang, Huimin Xu, Yuanbin Wu, and Man Lan. 2020b. Relational reflection entity alignment. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1095–1104.
  • Marcinkevičs and Vogt (2020) Ričards Marcinkevičs and Julia E Vogt. 2020. Interpretability and explainability: A machine learning zoo mini-tour. arXiv preprint arXiv:2012.01805 (2020).
  • Peyré et al. (2016) Gabriel Peyré, Marco Cuturi, and Justin Solomon. 2016. Gromov-Wasserstein averaging of kernel and distance matrices. In International Conference on Machine Learning. PMLR, 2664–2672.
  • Rudin (2019) Cynthia Rudin. 2019. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature Machine Intelligence 1, 5 (2019), 206–215. https://doi.org/10.1038/s42256-019-0048-x
  • Suchanek et al. (2011) Fabian M Suchanek, Serge Abiteboul, and Pierre Senellart. 2011. PARIS: Probabilistic Alignment of Relations, Instances, and Schema. In Proceedings of the VLDB Endowment, Vol. 5. 157–168.
  • Suchanek et al. (2007) Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web. 697–706.
  • Sun et al. (2017) Zequn Sun, Wei Hu, and Chengkai Li. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In The Semantic Web–ISWC 2017: 16th International Semantic Web Conference. Springer, 628–644.
  • Sun et al. (2020) Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, and Chengkai Li. 2020. A benchmarking study of embedding-based entity alignment for knowledge graphs. In Proceedings of the VLDB Endowment, Vol. 13. 2326–2340.
  • Tang et al. (2023) Jianheng Tang, Kangfei Zhao, and Jia Li. 2023. A Fused Gromov-Wasserstein Framework for Unsupervised Knowledge Graph Entity Alignment. arXiv preprint arXiv:2305.06574 (2023).
  • Tang et al. (2020) Xiaobin Tang, Jing Zhang, Bo Chen, Yang Yang, Hong Chen, and Cuiping Li. 2020. BERT-INT: a BERT-based interaction model for knowledge graph alignment. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. 3174–3180.
  • Tian et al. (2023) Xiaobin Tian, Zequn Sun, and Wei Hu. 2023. Generating Explanations to Understand and Repair Embedding-based Entity Alignment. arXiv preprint arXiv:2312.04877 (2023).
  • Wang (2005) Pei Wang. 2005. Experience-grounded semantics: a theory for intelligent systems. Cognitive Systems Research 6, 4 (2005), 282–302.
  • Wang (2013) Pei Wang. 2013. Non-axiomatic logic: A model of intelligent reasoning. World Scientific.
  • Wang et al. (2018) Zhichun Wang, Qingsong Lv, Xiaohan Lan, and Yu Zhang. 2018. Cross-lingual knowledge graph alignment via graph convolutional networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 349–357.
  • Wu et al. (2019) Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, and Dongyan Zhao. 2019. Relation-aware entity alignment for heterogeneous knowledge graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 5278–5284.
  • Xu et al. (2019) Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin. 2019. Gromov-Wasserstein learning for graph matching and node embedding. In International Conference on Machine Learning. PMLR, 6932–6941.
  • Xu et al. (2020) Kun Xu, Linfeng Song, Yansong Feng, Yan Song, and Dong Yu. 2020. Coordinated reasoning for cross-lingual knowledge graph alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 9354–9361.
  • Zeng et al. (2021) Kaisheng Zeng, Chengjiang Li, Lei Hou, Juanzi Li, and Ling Feng. 2021. A comprehensive survey of entity alignment for knowledge graphs. AI Open 2 (2021), 1–13. https://doi.org/10.1016/j.aiopen.2021.02.002
  • Zeng et al. (2020) Weixin Zeng, Xiang Zhao, Jiuyang Tang, and Xuemin Lin. 2020. Collective entity alignment via adaptive features. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1870–1873.
  • Zhao et al. (2022) Xiang Zhao, Weixin Zeng, Jiuyang Tang, Xinyi Li, Minnan Luo, and Qinghua Zheng. 2022. Toward entity alignment in the open world: an unsupervised approach with confidence modeling. Data Science and Engineering 7, 1 (2022), 16–29.
  • Zhao et al. (2023) Yu Zhao, Yike Wu, Xiangrui Cai, Ying Zhang, Haiwei Zhang, and Xiaojie Yuan. 2023. From Alignment to Entailment: A Unified Textual Entailment Framework for Entity Alignment. arXiv preprint arXiv:2305.11501 (2023).
  • Zhu and Zoubin (2002) Xiaojin Zhu and Zoubin Ghahramani. 2002. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002).