P-NAL: an Effective and Interpretable Entity Alignment Method

Chuanhao Xu (Northeastern University, Shenyang, China), **gwei Cheng (Northeastern University, Shenyang, China) and Fu Zhang (Northeastern University, Shenyang, China)

Abstract.

Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, treat triples as constraints on the embeddings, and learn to align the embeddings. The structural and side information is usually utilized via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference steps in the alignment process are usually omitted, resulting in an inadequate inference process. In this paper, we introduce P-NAL, an entity alignment method that captures two types of logical inference paths with Non-Axiomatic Logic (NAL). Type I is the bridge-like inference path between to-be-aligned entity pairs, consisting of two relation/attribute triples and a similarity sentence between the other two entities. Type II links the entity pair by their embeddings. P-NAL iteratively aligns entities and relations by integrating the conclusions of the inference paths. Moreover, our method is logically interpretable and extensible due to the expressiveness of NAL. Our proposed method is suitable for various EA settings. Experimental results show that our method outperforms state-of-the-art methods in terms of Hits@1, achieving 0.98+ on all three datasets of $DBP15K$ in both supervised and unsupervised settings. To our knowledge, we present the first in-depth analysis of entity alignment’s basic principles from a unified logical perspective.

1. Introduction

Knowledge graphs (KGs), which store massive numbers of facts about the real world, are a fruitful attempt to enhance semantics-driven information processing. KGs can be used in various application domains, such as question answering, recommender systems and language representation learning (knowledge graph enhanced language model) (Ji et al., 2021; Logan IV et al., 2019). The information contained in each individual KG project, such as DBpedia (Auer et al., 2007) and YAGO (Suchanek et al., 2007), is limited, so the task of entity alignment (EA) was proposed to increase KG completeness. The EA task consists of integrating two or more KGs into a single KG by aligning nodes that refer to the same entity.

There are many embedding-based EA methods (Fanourakis et al., 2023) that leverage deep learning techniques to represent entities with low-dimensional embeddings, and align entities with a similarity function on the embedding space. The KGs’ triples and seed alignments are usually treated as constraints on the embeddings during the training of such an embedding model. The structural and side information of KGs is usually utilized via embedding propagation, aggregation or interaction. Generally speaking, embedding-based EA methods have some crucial shortcomings: 1. They lack complex reasoning capability. Some of them are enhanced by paths (Cai et al., 2022); however, due to the nature of vector representation, it is not easy to perform or approximate symbolic reasoning on such paths. 2. Their models lack interpretability, so they have to rely solely on numerical evaluation metrics to evaluate their performance. Thus the pros and cons of their model design may not be properly evaluated. 3. The absence of a unified framework explaining the mechanism of embedding learning and processing renders their semantic or structural learning capability quite mysterious.

Apart from embedding-based methods, there exists a group of methods that directly estimate entity similarities from the contextual data (paths) available in the two input KGs. We refer to them as “path-based” methods. Also, we refer to the estimation of entity similarities by processing and aggregating the paths as “similarity inference”. A potential advantage of path-based methods is that they can capture fine-grained matches of neighbors, while traditional embedding-based methods cannot. There are also emerging methods that combine the ideas of embedding learning and path reasoning. We coin the term “embedding-path” to refer to such methods. More recently, path-based methods (such as PARIS+ (Leone et al., 2022)) and embedding-path methods (such as BERT-INT (Tang et al., 2020) and FGWEA (Tang et al., 2023)) have started to surpass the performance of traditional embedding-based methods. However, to some extent they fail to handle the similarity inference appropriately, possibly due to the lack of a proper formalization of the inference paths and steps.

To address the aforementioned issues of embedding-based and embedding-path methods, we carefully examine the similarity inference of EA from the logical perspective. We thus propose a path-based EA method, P-NAL, where P stands for PARIS (Suchanek et al., 2011) and NAL for Non-Axiomatic Logic (Wang, 2013). PARIS is an unsupervised non-neural EA method with competitive performance on benchmark datasets (Leone et al., 2022). NAL is a term logic with a specific semantic theory and its design suits KG tasks (see Section 2.3). P-NAL reinterprets and extends the traditional EA system PARIS with the help of NAL. We formalize the similarity inference as using NAL’s revision inference rule to aggregate the evidence of two types of inference paths. Type I is the bridge-like inference path between to-be-aligned entity pairs, consisting of two relation/attribute triples and a similarity between the other two entities. It can be seen as the fundamental alignment evidence (signal), and we coin the term “align-bridge” to refer to this type. Similarity inference on an align-bridge consists of clarifying premises, performing several inference steps and obtaining conclusions. Type II is the direct path linking the to-be-aligned entity pairs by the representations of their names or descriptions. We obtain such representations (embeddings) through the deep language model BERT. There are two other types of inference paths (which align relations), III and IV, that are not performed by our method but are depicted for theoretical purposes.

P-NAL adopts an iterative aligning strategy: in each iteration it first performs similarity inference, then uses a matching technique (the rBMat algorithm with modification, see Section 3.4) to obtain EA results. It also infers the matching of relations in each iteration. Although P-NAL is path-based, it does use embedding techniques minimally, just to embed the literals in KGs (entity names/descriptions and attribute values). The name/description embeddings are utilized by path type II and the attribute value embeddings are utilized by path type I.

P-NAL is simple, highly interpretable and self-explanatory. The design of P-NAL follows a simple intuition. Although the overall implementation may seem a little complicated, P-NAL avoids using unnecessarily complicated mathematical objects and each step in it can be easily understood. It is interpretable in the sense that the similarity inference and relation inference of P-NAL share the same logical foundation and use interpretable logical inference steps. P-NAL is self-explanatory in the sense that it generates a log file of evidence for the alignments after each iteration, so we can inspect the log files. This feature improved our ability to troubleshoot during the development of P-NAL. For example, inspecting the faulty alignments in the evidence file inspired many design choices in this paper. Also, P-NAL is extensible because NAL can express and process many different reasoning patterns and logical structures, so P-NAL can be extended to tackle other challenges in the EA process in future research.

Experiments on the cross-lingual EA dataset $DBP15K$ demonstrate that P-NAL outperforms 12 existing EA methods, including both supervised and unsupervised state-of-the-art approaches, in 5 different configuration groups. An ablation study shows that our design choices jointly boost the overall performance of our model. Given its competitive EA performance, we conjecture that P-NAL’s similarity inference captures the essence of the current EA task (with attribute triples), which means that aligning by such paths is both intuitive and effective.

To our knowledge, we present the first in-depth analysis of entity alignment’s basic principles from a unified logical perspective. Moreover, P-NAL is the first to integrate NAL in the EA task. P-NAL might also help explain the mechanism of other EA methods, as discussed in Section 4.

Outline. Section 2 gives a formal definition of EA and overviews the related works and relevant background knowledge of NAL. Section 3 first sketches our overall EA framework, then elaborates our design, including inference paths, literal utilization, matching algorithm, etc. Section 3.5 elaborates the overall structure of P-NAL. Section 4 discusses the relation with other methods. Our experimental results are presented in Section 5. We conclude in Section 6.

2. Preliminaries

2.1. Knowledge Graph and Entity Alignment

Knowledge graphs (KGs) are knowledge bases that store knowledge in the form of so-called facts or triples. We refer to (head, relation, tail) and (head, attribute, literal) as relation and attribute triples, respectively. Examples of both triple types are (New_Zealand, capital, Wellington) and (New_Zealand, establishedDate, “1947-11-25”), respectively. The arguments head and tail represent entities, relation is a relationship that holds between two entities, and attribute is a special type of relation that holds between an entity and a literal. Entities can be seen as graph nodes and they usually denote real-world objects, while literals are used to identify values for strings, numbers or dates. To summarize, a KG is characterized with a number of relation triples from $\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ and a number of attribute triples from $\mathcal{E}\times\mathcal{A}\times\mathcal{L}$ , where $\mathcal{E},\mathcal{R},\mathcal{A}$ , and $\mathcal{L}$ indicate the set of entities, relations, attributes and literals, respectively.
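As a minimal illustration of this definition, a KG can be held as two sets of triples. The class and method names below are hypothetical and are not taken from the P-NAL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy container: relation triples in E x R x E, attribute triples in E x A x L."""
    relation_triples: set = field(default_factory=set)
    attribute_triples: set = field(default_factory=set)

    def entities(self):
        # Heads of all triples plus tails of relation triples.
        heads = {h for h, _, _ in self.relation_triples | self.attribute_triples}
        tails = {t for _, _, t in self.relation_triples}
        return heads | tails

    def literals(self):
        # Tails of attribute triples are literals, not entities.
        return {v for _, _, v in self.attribute_triples}


kg = KnowledgeGraph()
kg.relation_triples.add(("New_Zealand", "capital", "Wellington"))
kg.attribute_triples.add(("New_Zealand", "establishedDate", "1947-11-25"))
```

Keeping the two triple sets separate mirrors the distinction between $\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ and $\mathcal{E}\times\mathcal{A}\times\mathcal{L}$ in the text.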

The entity alignment (EA) problem is typically defined between two KGs, $\mathcal{KG}_{1}$ and $\mathcal{KG}_{2}$ , where the task consists of finding equivalences (so-called alignments) between the sets of entities $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$ of the two KGs. Sometimes there exists a set of given equivalences that can be used as supervision. This set $\mathcal{S}$ is known as the seed alignment set. Supervised EA methods are allowed to utilize the information in $\mathcal{S}$ to infer the equivalences of other entities. We assume that there exists a ground truth set $\mathcal{G}=\{(e,e^{\prime})\in\mathcal{E}_{1}\times\mathcal{E}_{2}|\ e\equiv e^{\prime}\}$ that includes all known equivalences between pairs of entities. The ground truth set $\mathcal{G}$ is usually used to evaluate the performance of an EA method, by comparing it with the output alignment set of the method.
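The comparison against $\mathcal{G}$ can be sketched as follows. This assumes that the method outputs at most one candidate per source entity, in which case the fraction of ground-truth pairs recovered coincides with Hits@1; the function name is hypothetical.

```python
def hits_at_1(predicted, ground_truth):
    """Fraction of ground-truth pairs (e, e') that appear in the predicted alignment set."""
    if not ground_truth:
        return 0.0
    return len(set(predicted) & set(ground_truth)) / len(ground_truth)


ground_truth = {("a", "a'"), ("b", "b'"), ("c", "c'"), ("d", "d'")}
predicted = {("a", "a'"), ("b", "c'"), ("c", "b'"), ("d", "d'")}
score = hits_at_1(predicted, ground_truth)
```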

2.2. Related Work

Generally speaking, there are three families of EA methods: embedding-based, path-based and embedding-path methods, as elaborated in this section.

In recent years, embedding-based methods have become mainstream for addressing the EA task (Tang et al., 2023; Fanourakis et al., 2023). Their main idea is to embed the nodes (entities) and edges (relations or attributes) of a KG into a low-dimensional vector space that preserves their similarities in the original KG. Embedding-based EA methods usually consist of three parts: the embedding module, the alignment module and the matching module. For the embedding module, translational methods and graph neural network (GNN) methods are the most popular. Translational methods, such as MTransE (Chen et al., 2016), usually optimize a margin-based loss function to learn the structural information (relation triples) of a KG. On the other hand, GNN methods recursively aggregate the representations of neighboring nodes with graph convolutional networks (GCNs) or graph attention networks (GATs). The representative ones are RDGCN (Wu et al., 2019) and RREA (Mao et al., 2020b), respectively. The alignment module maps the entity embeddings of different KGs into a unified space. There are generally three techniques (Fanourakis et al., 2023) for this module: 1. Sharing the embedding space by using the margin-based loss to enforce the seed alignment entities’ embeddings from different KGs to be close. 2. Swapping the triples of seed alignment entities. 3. Mapping the entity vectors from one embedding space to the other using a transformation matrix. The matching module generates the final alignment result. Common practice is to use the cosine similarity, the Manhattan distance, or the Euclidean distance between entity embeddings to measure their similarities and then perform a specific matching algorithm based on the similarity scores.
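A minimal sketch of such a matching module is given below, assuming cosine similarity and a plain greedy nearest-neighbor choice. Both function names are hypothetical, and the greedy choice stands in for the more elaborate matching algorithms used in practice.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match(emb1, emb2):
    """For each entity in emb1, pick the most similar entity in emb2 (not one-to-one)."""
    return {
        e1: max(emb2, key=lambda e2: cosine(v1, emb2[e2]))
        for e1, v1 in emb1.items()
    }


emb1 = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
emb2 = {"x'": [0.9, 0.1], "y'": [0.1, 0.9]}
alignment = greedy_match(emb1, emb2)
```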

Apart from embedding-based methods, there exists a group of methods that directly estimate entity similarities from the contextual data (paths) available in the two input KGs. We refer to them as “path-based” methods. A potential advantage of path-based methods is that they can capture fine-grained matches of neighbors, while traditional embedding-based methods cannot. Embedding-based methods may suffer from the negative influence of dissimilar neighbors, according to (Tang et al., 2020). The distinction between embedding-based and path-based methods is sometimes obscure.

There are also emerging “embedding-path” methods that combine the ideas of embedding learning and path reasoning. More recently, path-based and embedding-path methods have started to surpass the performance of traditional embedding-based methods. Our proposed method P-NAL is a path-based method that uses embeddings minimally, so in the following part we introduce several advanced path-based and embedding-path methods. These methods are also the ones selected for comparison with P-NAL in our experiments.

PARIS (Suchanek et al., 2011) is a classic unsupervised non-neural EA method with competitive performance on benchmark datasets (Leone et al., 2022). It is purely path-based. PARIS introduces the concept “functionality” into the field of EA to enhance the validity of similarity inference paths. Functionality generally corresponds to the uniqueness of related things, for example a man can only have one father but multiple friends, so $fun(father)$ is close to 1 and $fun(friend)$ is relatively lower, where $fun()$ represents functionality of a relation or attribute. See (Suchanek et al., 2011) for more details about functionality. With functionality, PARIS constructs a probabilistic model that estimates the probabilities of an entity $x$ in $\mathcal{KG}_{1}$ being equivalent to another entity $x^{\prime}$ in $\mathcal{KG}_{2}$ . Here is the formula for $Pr\left(x\equiv x^{\prime}\right)$ :

$1-\prod_{r(x,y),r^{\prime}(x^{\prime},y^{\prime})}\left(1-Pr\left(r\subset r^{\prime}\right)\times fun(r^{-1})\times Pr\left(y\equiv y^{\prime}\right)\right)$

As depicted in the above formula, PARIS estimates the equivalence probabilities by integrating paths that connect corresponding entities. It also finds subrelations between the two ontologies of the KGs with the following equation. A subrelation, such as $r\subset r^{\prime}$ , intuitively means a correspondence between two relations of different KGs such that a relational fact of $r$ in $\mathcal{KG}_{1}$ implies the existence of a corresponding relational fact of $r^{\prime}$ in $\mathcal{KG}_{2}$ . Here is the formula for $Pr\left(r\subset r^{\prime}\right)$ :

$\frac{\Sigma_{r(x,y)}\left(1-\prod_{r^{\prime}(x^{\prime},y^{\prime})}\left(1-Pr\left(x\equiv x^{\prime}\right)\times Pr\left(y\equiv y^{\prime}\right)\right)\right)}{\Sigma_{r(x,y)}\left(1-\prod_{x^{\prime},y^{\prime}}\left(1-Pr\left(x\equiv x^{\prime}\right)\times Pr\left(y\equiv y^{\prime}\right)\right)\right)}$

With the help of subrelations’ measurement, PARIS generalizes the equation of $Pr\left(x\equiv x^{\prime}\right)$ to the case where the two ontologies do not share common relations. Therefore, PARIS recursively aligns the entities and the equivalence probability of $x\equiv x^{\prime}$ depends recursively on other equivalence probabilities. In each iteration, the probabilities are re-calculated based on the equivalences and subrelations of the previous iteration. Initial equivalences are computed between attribute literals based on a certain string distance measurement.
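The entity equivalence formula of PARIS quoted above can be sketched in a few lines. The dictionary layout and all parameter names are assumptions made for illustration; the sketch evaluates one update of $Pr\left(x\equiv x^{\prime}\right)$ from the estimates of the previous iteration and does not reproduce PARIS's actual implementation.

```python
def equivalence_probability(x, x2, facts1, facts2, sub_prob, inv_fun, eq_prob):
    """
    One update of Pr(x ≡ x2) following the PARIS formula quoted above.

    facts1[x]         : list of (r, y) such that r(x, y) holds in KG1
    facts2[x2]        : list of (r2, y2) such that r2(x2, y2) holds in KG2
    sub_prob[(r, r2)] : current estimate of Pr(r ⊂ r2)
    inv_fun[r]        : fun(r^-1), functionality of the inverse relation
    eq_prob[(y, y2)]  : estimate of Pr(y ≡ y2) from the previous iteration
    Missing dictionary entries are treated as 0 (an assumption of this sketch).
    """
    product = 1.0
    for r, y in facts1.get(x, []):
        for r2, y2 in facts2.get(x2, []):
            support = (
                sub_prob.get((r, r2), 0.0)
                * inv_fun.get(r, 0.0)
                * eq_prob.get((y, y2), 0.0)
            )
            product *= 1.0 - support
    return 1.0 - product


facts1 = {"x": [("capital", "y")]}
facts2 = {"x'": [("capitale", "y'")]}
p = equivalence_probability(
    "x", "x'", facts1, facts2,
    sub_prob={("capital", "capitale"): 0.5},
    inv_fun={"capital": 0.8},
    eq_prob={("y", "y'"): 1.0},
)
```

With a single pair of facts the product has one factor, so the result is simply $0.5\times 0.8\times 1.0=0.4$; each additional supporting pair of facts can only increase the probability.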

PARIS+ (Leone et al., 2022) is a variant of PARIS that makes a simple refinement and works in the absence of attribute triples. It processes the seed alignment information to generate synthetic attribute triples. That is, for every pair of seed alignments ( $x$ , $x^{\prime}$ ), it creates the attribute triples ( $x$ , EA:label, string( $x$ )) and ( $x^{\prime}$ , EA:label, string( $x$ )), where EA:label is a synthetic relation. Thus, the reverse of the relation EA:label is designed to be highly functional in order to let the model match the seed alignments easily. P-NAL adopts the same refinement as PARIS+.
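The refinement described above amounts to a small preprocessing step, sketched here. The function name is hypothetical and `str(x)` stands in for string( $x$ ).

```python
def seed_to_attribute_triples(seeds):
    """Turn seed pairs (x, x') into synthetic EA:label attribute triples for both KGs."""
    triples1, triples2 = [], []
    for x, x2 in seeds:
        label = str(x)  # the same literal is attached to both entities
        triples1.append((x, "EA:label", label))
        triples2.append((x2, "EA:label", label))
    return triples1, triples2


t1, t2 = seed_to_attribute_triples([("dbp:Paris", "fr:Paris")])
```

Because both entities of a seed pair receive the identical literal, and no other entity does, the reverse of EA:label maps each literal to exactly one entity per KG.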

BERT-INT (Tang et al., 2020), an embedding-path EA method, uses the well-known transformer model BERT to embed the entities and literals. It calculates the cosine similarity of the entity name/description embeddings. Then it proposes an interaction model that compares each pair of neighbors or attributes (which forms a path from the source entity to the target entity) to obtain the neighbor/attribute similarity score. The name/description similarity vector, neighbor similarity vector and attribute similarity vector are concatenated and fed to an MLP layer to get the final similarity score.

FGWEA (Tang et al., 2023) is a three-step progressive optimization algorithm for EA and it can be classified as an embedding-path EA method. First, the entity names and concatenated attribute triples are used for semantic embedding matching to obtain initial anchors. Then in order to approximate GWD (Gromov-Wasserstein Distance (Peyré et al., 2016)), FGWEA computes cross-KG structural and relational similarities, which are then used for iterative multi-view optimal transport alignment. Finally, the Bregman Proximal Gradient algorithm (Xu et al., 2019) is employed to refine the GWD’s coupling matrix.

There are also a few works that focus on the interpretability or explainability of EA, such as LightEA (Mao et al., 2022) and ExEA (Tian et al., 2023). LightEA is an interpretable non-neural EA method. It is inspired by a classical graph algorithm, label propagation (Zhu and Zoubin, 2002). First, it generates a random orthogonal label for each seed alignment entity pair. Then, the labels of entities and relations are propagated according to the three views of the adjacency tensor. Finally, LightEA utilizes sparse Sinkhorn iteration to address the assignment problem of alignment results.

The ExEA framework, proposed by (Tian et al., 2023), aims to explain the results of embedding-based EA. It generates semantic matching subgraphs as explanation by matching semantically consistent triples around the two aligned entities. ExEA devises an alignment dependency graph structure to gain deeper insights into the explanation.

The recent literature on EA is abundant, focusing on many different aspects or procedures of entity alignment apart from the aforementioned ones, such as utilizing attribute triples (Liu et al., 2020; Sun et al., 2017), utilizing literals (Gesese et al., 2021; Chen et al., 2018), sample mining (Liu et al., 2022; Mao et al., 2021a), reinforcement learning (Guo et al., 2022), matching algorithm (Lin et al., 2023; Dao et al., 2023; Mao et al., 2021b; Xu et al., 2020; Zeng et al., 2020), iterative strategy (Liu et al., 2023; Mao et al., 2020a) and unsupervised learning (Jiang et al., 2023b, a; Liu et al., 2022; Luo and Yu, 2022; Zhao et al., 2022). There are also some surveys for EA (Fanourakis et al., 2023; Zeng et al., 2021; Sun et al., 2020; Mao et al., 2022). Besides graph structural, attribute and literal information, there are other forms of information researched by the EA community, such as temporal, spatial and graphical information; however, these topics are beyond the scope of this paper.

2.3. A Brief Introduction to NAL

NAL (Non-Axiomatic Logic) (Wang, 2013) is a logic designed for the creation of general-purpose AI systems, by formulating the fundamental regularities of human thinking at a general level. It can be used as the logical foundation of a (non-axiomatic) inference system. Traditional inference systems are usually based on model-theoretic semantics, while NAL, under the assumption of insufficient knowledge and resources, is a term logic based on experience-grounded semantics (Wang, 2005). The meaning of a term in NAL, to the inference system, is determined by its role in the experience (which will be explained later), that is, how it has been related to other terms in the past. The truth-value of a statement in NAL is determined by how it has been supported or refuted by other statements in the past.

In this paper we only utilize a fraction of NAL’s syntax and inference capability (for EA). We will now introduce the relevant parts of its syntax. A term in NAL can either be atomic or compound. An atomic term is a word (string) or a variable term. An independent variable, such as “ $\#x$ ”, represents any unspecified term under a given restriction, and intuitively corresponds to a universally quantified variable in first-order predicate logic. A dependent variable, such as “ $\$y$ ”, represents a certain unspecified term under a given restriction, and intuitively corresponds to an existentially quantified variable. A compound term consists of a term connector and components (which are themselves terms). A basic statement has the form “subject copula predicate”, where subject and predicate are terms. There are multiple types of copula and each type has a corresponding statement type, including: 1. Inheritance (“ $A\rightarrow B$ ”, where $A$ and $B$ are terms), which intuitively means “B is a general case of A”; 2. Similarity (“ $A\leftrightarrow B$ ”), which intuitively means “A is similar to B”; 3. Implication (“ $P\Rightarrow Q$ ”, where $P$ and $Q$ are statements), a higher-order copula which intuitively means “P implies Q” (unlike “material implication”, it requires $P$ to be related to $Q$ in content, because NAL is a term logic that uses syllogistic inference rules and only derives conclusions that are related in content). A sentence is a statement together with its truth-value. An intensional set with only one component, for example “ $[red]$ ”, intuitively means “red things”. The term connector “ $*$ ” (product) combines multiple component terms into an ordered compound term such as $\left(*,A,B\right)$ , which intuitively means “an anonymous relation between A and B”. Compound terms are usually written in prefix format, that is, the term connector is written first. The statement connector “ $\wedge$ ” can be seen as the conjunction operator of propositional logic.

NAL is “non-axiomatic” in the sense that the truth-value of a conclusion in the inference system does not indicate how much the conclusion agrees with the “state of affairs” in the world, or with a constant set of assumptions (the axioms), but how much it is supported by the evidence provided by the past experience of the system. Experience means the inference system’s history of interaction with the environment or, equivalently, the input sentences. The acquisition of experience may involve sensorimotor mechanisms and sensation-perception processes, which are beyond our scope. The information source of a sentence is characterized as its evidence. The inference rules of NAL coherently pass on the evidential information from the premises to the conclusion, so the premises can be seen as the evidence of the conclusion. The input sentences can be seen as a synthesis of virtual positive and negative evidence. Assume the available amounts of positive and negative evidence of a statement are written as $w^{+}$ and $w^{-}$ , respectively; then the total amount of evidence is $w=w^{+}+w^{-}$ . The frequency of the statement is $f=w^{+}/w$ , and the confidence of the statement is $c=w/(w+k)$ , where $k$ is a positive constant representing the “evidential horizon”. We take $k$ = 1 in our implementation. Frequency intuitively means “the degree of truth” and confidence intuitively represents “the total amount of evidence”. The more evidence the statement has taken into account, the higher its confidence value. The truth-value attached to the statement is the ordered pair $\langle f,c\rangle$ and it is often written right after the statement.
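The relation between evidence amounts and truth-values can be sketched as follows. The function names are hypothetical; the inverse mapping follows from solving $c=w/(w+k)$ for $w$ and is only defined for $c<1$.

```python
K = 1.0  # evidential horizon k; the text takes k = 1


def truth_from_evidence(w_pos, w_neg, k=K):
    """Map amounts of positive/negative evidence to a truth-value (f, c)."""
    w = w_pos + w_neg
    if w <= 0:
        raise ValueError("frequency is undefined without evidence")
    return w_pos / w, w / (w + k)


def evidence_from_truth(f, c, k=K):
    """Inverse mapping: recover (w_pos, w_neg) from (f, c), assuming 0 <= c < 1."""
    if not 0.0 <= c < 1.0:
        raise ValueError("confidence must lie in [0, 1)")
    w = k * c / (1.0 - c)
    return f * w, (1.0 - f) * w


f, c = truth_from_evidence(3.0, 1.0)
```

Three pieces of positive and one piece of negative evidence give $f=3/4$ and $c=4/5$; confidence approaches 1 only as the amount of evidence grows without bound.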

NAL uses syllogistic (rather than truth-functional) inference rules. Among them the revision rule merges evidences for the same statement collected from different sources together, so it can settle inconsistency among the system’s sentences. It is very useful in our approach. The relevant rules with corresponding truth functions are all listed in Table 1. Note that the inference rules are not domain-specific. There are three extended boolean operators (Wang, 2013) in the calculation of truth functions:

$\left\{\begin{aligned} and(x_{1},...,x_{n})&=\prod_{i=1}^{n}x_{i}\\ or(x_{1},...,x_{n})&=1-\prod_{i=1}^{n}(1-x_{i})\\ not(x)&=1-x\end{aligned}\right.$ , where $x_{i}\in[0,1]$ .

Table 1. The table of relevant truth functions.

Inference rule	Premises	Conclusion
`Deduction`	$A\rightarrow B\ \left\langle f_{1},c_{1}\right\rangle,\ B\rightarrow C\ \left\langle f_{2},c_{2}\right\rangle$	$A\rightarrow C\ \left\langle f=and(f_{1},f_{2}),c=and(f_{1},f_{2},c_{1},c_{2})\right\rangle$
`Analogy`	$A\rightarrow B\ \left\langle f_{1},c_{1}\right\rangle,\ A\leftrightarrow C\ \left\langle f_{2},c_{2}\right\rangle$	$C\rightarrow B\ \left\langle f=and(f_{1},f_{2}),c=and(f_{2},c_{1},c_{2})\right\rangle$
`Conditional Deduction`	$(P\wedge\ Q)\ \Rightarrow R\ \left\langle f_{1},c_{1}\right\rangle,\ Q\ \left\langle f_{2},c_{2}\right\rangle$	$P\Rightarrow R\ \left\langle f=and(f_{1},f_{2}),c=and(f_{1},f_{2},c_{1},c_{2})\right\rangle$
`Induction`	$A\rightarrow B\ \left\langle f_{1},c_{1}\right\rangle,\ A\rightarrow C\ \left\langle f_{2},c_{2}\right\rangle$	$C\rightarrow B\ \left\langle w^{+}=and(f_{2},c_{2},f_{1},c_{1}),w^{-}=and(f_{2},c_{2},not(f_{1}),c_{1})\right\rangle$
`Revision`	$P\ \left\langle f_{1},c_{1}\right\rangle,\ P\ \left\langle f_{2},c_{2}\right\rangle$	$P\ \left\langle w^{+}=w^{+}_{1}+w^{+}_{2},w=w_{1}+w_{2}\right\rangle$
`Probabilistic Revision`	$P\ \left\langle f_{1},c_{1}\right\rangle,\ P\ \left\langle f_{2},c_{2}\right\rangle$	$P\ \left\langle f=or(f_{1},f_{2}),w=w_{1}+w_{2}\right\rangle$
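The truth functions of Table 1 translate directly into code. The sketch below represents a truth-value as a pair `(f, c)`; the function names, the handling of the zero-evidence case in induction, and the requirement $c<1$ for the two revision rules are assumptions of this sketch rather than details stated in the text.

```python
from math import prod

K = 1.0  # evidential horizon k (the text uses k = 1)


def and_(*xs):
    return prod(xs)


def or_(*xs):
    return 1.0 - prod(1.0 - x for x in xs)


def not_(x):
    return 1.0 - x


def c_to_w(c, k=K):
    # Inverse of c = w / (w + k); requires c < 1.
    return k * c / (1.0 - c)


def w_to_c(w, k=K):
    return w / (w + k)


def deduction(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f1, f2, c1, c2)


conditional_deduction = deduction  # Table 1 lists the same truth function


def analogy(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    return and_(f1, f2), and_(f2, c1, c2)


def induction(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    w_pos = and_(f2, c2, f1, c1)
    w_neg = and_(f2, c2, not_(f1), c1)
    w = w_pos + w_neg
    if w == 0:
        return 0.0, 0.0  # no evidence: frequency is undefined, 0.0 is a placeholder
    return w_pos / w, w_to_c(w)


def revision(t1, t2):
    # Pools evidence: w+ = w1+ + w2+, w = w1 + w2 (both confidences in (0, 1)).
    (f1, c1), (f2, c2) = t1, t2
    w1, w2 = c_to_w(c1), c_to_w(c2)
    w = w1 + w2
    return (f1 * w1 + f2 * w2) / w, w_to_c(w)


def probabilistic_revision(t1, t2):
    # f = or(f1, f2), w = w1 + w2 (both confidences below 1).
    (f1, c1), (f2, c2) = t1, t2
    return or_(f1, f2), w_to_c(c_to_w(c1) + c_to_w(c2))
```

For example, revising $\left\langle 1,0.5\right\rangle$ with $\left\langle 0,0.5\right\rangle$ pools one unit of positive and one unit of negative evidence, giving $f=0.5$ and $c=2/3$, whereas probabilistic revision of two $\left\langle 0.5,0.5\right\rangle$ sentences raises the frequency to $0.75$.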

2.4. Why NAL

There might be many different logical systems that are qualified to represent the similarity inference process of EA. However, we believe that the non-axiomatic nature of NAL fits the domain of knowledge graphs better than axiomatic logical systems do, because real-world KGs need to deal with open domains and with alterable, incomplete or conflicting facts. Fundamentally, knowledge graph tasks (such as EA) fit well with the assumption of insufficient knowledge and resources (Wang, 2013), which is the basic assumption of NAL.

Technically speaking, NAL can represent entities, relations and relational triples, which are essential for EA. It can also perform formal reasoning and evidence aggregation, which is useful to align entities. The frequency/confidence measurement of truth-value is suitable to represent fuzziness and unknownness in the similarity inference process. The high expressiveness of NAL makes our approach extensible, which may benefit subsequent studies.

3. Our Approach

Our EA approach’s main idea stems from the well-known EA method PARIS. The overall structure of P-NAL adopts an iterative aligning strategy, and for each iteration it first performs similarity inference, then it uses a matching technique (rBMat algorithm with modification in Section 3.4) to obtain EA results. We formalize the similarity inference as using NAL’s revision inference rule to aggregate two types of inference paths. P-NAL also infers the matching of relations in each iteration. We explore and implement some other design choices and tricks to complete the alignment framework, as illustrated in the following subsections.

Figure 1. An illustration of align-bridge, omitting irrelevant triples.

3.1. PARIS Reinterpreted in the Formal Language of NAL

The key point of the alignment process in PARIS is finding instances of bridge-like inference paths between to-be-aligned entity pairs. We coin the term “align-bridge” to refer to this type of path. Valid align-bridges are retrieved from the KGs in a depth-first manner. As shown in Figure 1, entity $y_{1}$ and entity $y_{2}$ belong to different KGs (where subscripts represent different KGs) and ( $y_{1}$ , $y_{2}$ ) forms a to-be-aligned entity pair. Triple ( $x_{1}$ , $r_{1}$ , $y_{1}$ ) is a relational or attribute triple in $\mathcal{KG}_{1}$ , where $x_{1}$ is an entity or a literal, respectively. Note that PARIS (and P-NAL) automatically duplicates every original KG triple ( $a$ , $b$ , $c$ ) with a reversed triple ( $c$ , $b^{-1}$ , $a$ ) upon KG loading, so the attribute triple ( $x_{1}$ , $r_{1}$ , $y_{1}$ ) with a literal $x_{1}$ is a reversed attribute triple. Similarly, triple ( $x_{2}$ , $r_{2}$ , $y_{2}$ ) is a relational or attribute triple in $\mathcal{KG}_{2}$ .
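The duplication of triples upon KG loading can be sketched as below. The function name and the `^-1` suffix used to name the inverse relation are hypothetical conventions of this sketch.

```python
def add_reversed(triples):
    """Return the triples plus a reversed copy (c, b^-1, a) of every triple (a, b, c)."""
    out = list(triples)
    out.extend((c, b + "^-1", a) for a, b, c in triples)
    return out


loaded = add_reversed([("New_Zealand", "establishedDate", "1947-11-25")])
```

After loading, the literal "1947-11-25" appears in head position of the reversed attribute triple, which is what allows a literal to play the role of $x_{1}$ in an align-bridge.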

In this paper, every entity, literal or relation is regarded as an atomic term in NAL. Triple ( $x_{1}$ , $r_{1}$ , $y_{1}$ ) is reinterpreted as the inheritance statement (*, $x_{1}$ , $y_{1}$ ) $\rightarrow$ $r_{1}$ . Its intuitive meaning is “The relation between $x_{1}$ and $y_{1}$ is a specialization of relational term $r_{1}$ ”. The triples (or “facts”) of the KGs can be seen as absolutely true to some extent, so the truth-value attached to the statement is $\left\langle 1,1\right\rangle$ . All of the triples of the two KGs are taken as input sentences, which form the experience of the inference system. In PARIS, the equality score between $x_{1}$ and $x_{2}$ is retrieved to measure the similarity between them. We interpret the equality as a similarity statement $x_{1}$ $\leftrightarrow$ $x_{2}$ and the score is reflected in the truth-value of the statement. Note that in the case of an entity pair, the similarity comes from either seed alignments or alignments of the previous iteration. We omit entity similarity statements whose $f$ or $c$ is less than $theta$ , a hyper-parameter. In the case of a literal pair, the similarity comes from literal comparison (for example, string identity comparison).

At the same time, PARIS retrieves the sub-relation probability score Pr( $r_{1}$ $\subseteq$ $r_{2}$ ) (from the computation result of the previous iteration or a default value $iota$ , which is a hyper-parameter) which we interpret as an inheritance statement $r_{1}$ $\rightarrow$ $r_{2}$ with a truth-value. Its intuitive meaning is “The relational term $r_{1}$ is a specialization of relational term $r_{2}$ ”. PARIS evaluates the degree of functionality of relation $r_{2}$ with precomputed functionalities of each relation. We interpret it as an inheritance statement $r_{2}$ $\rightarrow$ [ $fun$ ] with the degree reflected in the truth-value. The statement intuitively means “ $r_{2}$ has the functional property (to some extent)”.

Then the validity of align-bridge is interpreted in terms of inference steps in NAL as follows:

$(type\ I\ path)\ premises:$

(1)		$\displaystyle\left(*,x_{1},y_{1}\right)\rightarrow r_{1}\ \ \left\langle 1,1\right\rangle.$
(2)		$\displaystyle\left(*,x_{2},y_{2}\right)\rightarrow r_{2}\ \ \left\langle 1,1\right\rangle.$
(3)		$\displaystyle r_{1}\rightarrow r_{2}\ \ \left\langle f_{3},c_{3}\right\rangle.$
(4)		$\displaystyle r_{2}\rightarrow r_{1}\ \ \left\langle f_{4},c_{4}\right\rangle.$
(5)		$\displaystyle x_{1}\leftrightarrow x_{2}\ \ \left\langle f_{5},c_{5}\right\rangle.$
(6)		$\displaystyle r_{1}\rightarrow[fun]\ \ \left\langle f_{6},c_{6}\right\rangle.$
(7)		$\displaystyle r_{2}\rightarrow[fun]\ \ \left\langle f_{7},c_{7}\right\rangle.$
	$\displaystyle(\left(,\$a,\#b_{1}\right)\rightarrow\$r\ \wedge\ \left(,\$a,\#% b_{2}\right)\rightarrow\$r\$
(8)		$\displaystyle\wedge\ \$r\rightarrow[fun])\ \Rightarrow\#b_{1}\leftrightarrow\#% b_{2}\ \ \left\langle f_{8},c_{8}\right\rangle.$

$inference\ steps\ (path)\ \&\ conclusion:$

(9)		$\displaystyle from\ (1)\ and\ (3),\ Deduction:\ \left(*,x_{1},y_{1}\right)\rightarrow r_{2}\ \ \left\langle f_{9},c_{9}\right\rangle.$
(10)		$\displaystyle from\ (2)\ and\ (5),\ Analogy:\ \left(*,x_{1},y_{2}\right)\rightarrow r_{2}\ \ \left\langle f_{10},c_{10}\right\rangle.$
(11)		$\displaystyle from\ (8),\ (9),\ (10)\ and\ (7),\ Conditional\ deduction\times 3:\ \ y_{1}\leftrightarrow y_{2}\ \ \left\langle f_{11},c_{11}\right\rangle.$

The idea behind the inference steps is as follows. The first two steps match up the two triples (1) and (2): the first step exchanges $r_{1}$ in triple (1) for $r_{2}$ , and the second step exchanges $x_{2}$ in triple (2) for $x_{1}$ . The third step is a type of inference similar to the resolution rule for Horn clauses, and (8) corresponds to a Horn rule.

In the path listed above, we omit two auxiliary inference steps right before conclusion (10); they perform a structural transformation to dismount $x_{2}$ from the product without modifying the truth-value. The last conditional deduction of (11) degenerates into a case without conjunction in its premises (similar to Modus Ponens), and its truth function remains the same. Statement (11) is the conclusion of the above inference steps, and the steps as a whole act as a summarizing or validation process for the align-bridge. Note that an align-bridge admits four distinct inference paths (including the path above) that share the same premise set and the same conclusion (with slightly different truth-values) but differ in the details of the inference steps. In our implementation, we aggregate two of them (one for each direction of relational inheritance) with the probabilistic revision rule (similar to PARIS).
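To make the flow of truth-values along a type I path concrete, the following is a minimal sketch of steps (9) to (11). It assumes the textbook NAL truth functions for deduction ( $f=f_{1}f_{2}$ , $c=f_{1}f_{2}c_{1}c_{2}$ ) and analogy ( $f=f_{1}f_{2}$ , $c=f_{2}c_{1}c_{2}$ , with $f_{2}$ the frequency of the similarity premise); P-NAL's actual implementation may use adapted functions. The names `Truth`, `deduction`, `analogy` and `type_one_path` are illustrative only.

```python
from typing import NamedTuple


class Truth(NamedTuple):
    f: float  # frequency
    c: float  # confidence


def deduction(a: Truth, b: Truth) -> Truth:
    # Assumed textbook NAL deduction: f = f1*f2, c = f1*f2*c1*c2.
    f = a.f * b.f
    return Truth(f, f * a.c * b.c)


def analogy(stmt: Truth, sim: Truth) -> Truth:
    # Assumed textbook NAL analogy; `sim` is the similarity premise.
    return Truth(stmt.f * sim.f, sim.f * stmt.c * sim.c)


def type_one_path(t3: Truth, t5: Truth, t7: Truth, t8: Truth) -> Truth:
    """Truth-value of y1 <-> y2 along one type I path (steps 9-11)."""
    fact = Truth(1.0, 1.0)         # premises (1) and (2) are KG triples
    t9 = deduction(fact, t3)       # (9): swap r1 for r2 using r1 -> r2
    t10 = analogy(fact, t5)        # (10): swap x2 for x1 using x1 <-> x2
    t11 = t8                       # start from implication (8)
    for premise in (t9, t10, t7):  # (11): conditional deduction, three times
        t11 = deduction(t11, premise)
    return t11


conclusion = type_one_path(
    t3=Truth(1.0, 1.0), t5=Truth(0.9, 0.9), t7=Truth(1.0, 1.0), t8=Truth(1.0, 1.0)
)
```

In this sketch an uncertain similarity premise (5) lowers both the frequency and the confidence of the conclusion, and each additional inference step lowers the confidence further.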

Implication statement (8) is regarded as a definition or a piece of essence of the concept “functionality”. Relations’ functionality seems to reflect a widespread orderliness of reality or human cognition and PARIS leverages such orderliness.

Conclusions with the same statement but obtained from different align-bridges are merged by the probabilistic revision rule because of the probabilistic nature of functionality. For example, the functionality of the relation $children$ is 0.68, which means that most people have approximately one to three children. The entity $Lynne\_Cheney$ has two children; however, when the alignment system aligns the two $Mary\_Cheney$ entities (a child of $Lynne\_Cheney$ ) of different KGs, it does not know how many children $Lynne\_Cheney$ has. The conclusion of the align-bridge $Mary\_Cheney(en)$ – $Lynne\_Cheney(en)$ – $Lynne\_Cheney(zh)$ – $Mary\_Cheney(zh)$ is

$Mary\_Cheney(en)\leftrightarrow Mary\_Cheney(zh)\ \ \left\langle f,c\right\rangle.$

It has a probabilistic nature because we do not know whether $Mary\_Cheney(en)$ and $Mary\_Cheney(zh)$ are the same child of $Lynne\_Cheney$ . Thus the probabilistic revision rule is used to aggregate the conclusions of multiple align-bridges. This rule is similar to the continued multiplication in PARIS’s probability formula for $Pr\left(x\equiv x^{\prime}\right)$ (given in Section 2.2), except for the introduction and calculation of confidence.

3.2. Aligning Relations

Apart from aligning entities, we also align relations using an adapted version of PARIS’s probabilistic relation aligning method. The relation aligning formula of PARIS is reinterpreted as directly estimating the evidence amount for the conclusion. The truth-value of the conclusion statement $r_{1}\rightarrow r_{2}$ (the inheritance statement between two relations $r_{1}$ and $r_{2}$ ) is computed as follows:

	$\displaystyle w^{+}=\Sigma_{(x_{1},r_{1},y_{1})}\left(1-\prod_{(x_{2},r_{2},y_{2})}\left(1-expt\left(x_{1}\leftrightarrow x_{2}\right)\times expt\left(y_{1}\leftrightarrow y_{2}\right)\right)\right)$
	$\displaystyle w=\Sigma_{(x_{1},r_{1},y_{1})}\left(1-\prod_{x_{2},y_{2}}\left(1-expt\left(x_{1}\leftrightarrow x_{2}\right)\times expt\left(y_{1}\leftrightarrow y_{2}\right)\right)\right)$

where $expt$ is the $expectation$ of a truth-value, and $expectation$ is a combined measurement of $f$ and $c$ , defined as $expectation=f\times c$ .
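The two sums above can be sketched in a few lines. The conversion from evidence amounts to a truth-value is an assumption here: it uses the standard NAL definitions $f=w^{+}/w$ and $c=w/(w+k)$ with evidential horizon $k=1$ , which this section does not spell out. The function name `relation_inheritance` and the dictionary-based `expt` lookup (missing pairs count as 0) are illustrative, and the brute-force loop over all entity pairs of the second KG is for clarity only.

```python
from math import prod


def relation_inheritance(triples1, triples2, r1, r2, expt, k=1.0):
    """Estimate <f, c> for r1 -> r2 from the evidence amounts w+ and w.

    triples1, triples2: lists of distinct (head, relation, tail) triples.
    expt: dict mapping (e1, e2) to the expectation of e1 <-> e2.
    k: assumed NAL evidential horizon.
    """
    def sim(a, b):
        return expt.get((a, b), 0.0)

    entities2 = {e for h, _, t in triples2 for e in (h, t)}
    pairs_r2 = [(h, t) for h, r, t in triples2 if r == r2]
    w_pos = w = 0.0
    for x1, r, y1 in triples1:
        if r != r1:
            continue
        # Positive evidence: some r2 triple links counterparts of x1 and y1.
        w_pos += 1 - prod(1 - sim(x1, x2) * sim(y1, y2) for x2, y2 in pairs_r2)
        # Total evidence: x1 and y1 both have counterparts at all.
        w += 1 - prod(1 - sim(x1, x2) * sim(y1, y2)
                      for x2 in entities2 for y2 in entities2)
    if w == 0:
        return None  # no evidence, no conclusion
    return (w_pos / w, w / (w + k))


kg1 = [("a", "r1", "b"), ("c", "r1", "d")]
kg2 = [("a2", "r2", "b2"), ("c2", "q", "d2")]
expt = {("a", "a2"): 0.9, ("b", "b2"): 0.8, ("c", "c2"): 0.5, ("d", "d2"): 0.5}
truth = relation_inheritance(kg1, kg2, "r1", "r2", expt)
```

In this toy input the first triple contributes 0.72 to both $w^{+}$ and $w$ , while the second contributes 0.25 to $w$ only, because its counterparts are linked by a different relation.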

Alternatively, the truth-value of statement $r_{1}\rightarrow r_{2}$ can be obtained via inference on two types of paths:

$(type\ III\ path)\ premises:$

(12)		$\displaystyle\left(*,x_{1},y_{1}\right)\rightarrow r_{1}\ \ \left\langle 1,1\right\rangle.$
(13)		$\displaystyle\left(*,x_{2},y_{2}\right)\rightarrow r_{2}\ \ \left\langle 1,1\right\rangle.$
(14)		$\displaystyle x_{1}\leftrightarrow x_{2}\ \ \left\langle f_{14},c_{14}\right\rangle.$
(15)		$\displaystyle y_{1}\leftrightarrow y_{2}\ \ \left\langle f_{15},c_{15}\right\rangle.$

$inference\ steps\ (path)\ \&\ conclusion:$

(16)		$\displaystyle from\ (12)\ and\ (14),\ Analogy:\ \left(*,x_{2},y_{1}\right)\rightarrow r_{1}\ \ \left\langle f_{16},c_{16}\right\rangle.$
(17)		$\displaystyle from\ (16)\ and\ (15),\ Analogy:\ \left(*,x_{2},y_{2}\right)\rightarrow r_{1}\ \ \left\langle f_{17},c_{17}\right\rangle.$
(18)		$\displaystyle from\ (13)\ and\ (17),\ Induction:\ r_{1}\rightarrow r_{2}\ \ \left\langle 1,c_{18}\right\rangle.$

$(type\ IV\ path)\ premises:$

(19)		$\displaystyle\left(*,x_{1},y_{1}\right)\rightarrow r_{1}\ \ \left\langle 1,1\right\rangle.$
(20)		$\displaystyle\left(*,x_{2},y_{2}\right)\rightarrow r_{2}\ \ \left\langle 0,1\right\rangle.$
(21)		$\displaystyle x_{1}\leftrightarrow x_{2}\ \ \left\langle f_{21},c_{21}\right\rangle.$
(22)		$\displaystyle y_{1}\leftrightarrow y_{2}\ \ \left\langle f_{22},c_{22}\right\rangle.$

$inference\ steps\ (path)\ \&\ conclusion:$

(23)		$\displaystyle from\ (19)\ and\ (21),\ Analogy:\ \left(*,x_{2},y_{1}\right)\rightarrow r_{1}\ \ \left\langle f_{23},c_{23}\right\rangle.$
(24)		$\displaystyle from\ (23)\ and\ (22),\ Analogy:\ \left(*,x_{2},y_{2}\right)\rightarrow r_{1}\ \ \left\langle f_{24},c_{24}\right\rangle.$
(25)		$\displaystyle from\ (20)\ and\ (24),\ Induction:\ r_{1}\rightarrow r_{2}\ \ \left\langle 0,c_{25}\right\rangle.$

Premise (20) has a frequency value of 0, so it is not a triple in the KG but represents the absence of the triple. Note that induction is a weak inference rule, so the upper bound of its conclusion’s confidence is lower than that of strong inference rules (such as deduction and analogy). Because of the truth function of the induction rule, a type III path only generates positive evidence for the conclusion and a type IV path only generates negative evidence. The conclusions of the two types of path are supposed to be merged by the revision rule. In our implementation, for simplicity, we directly estimate the evidence amount instead of performing inference for relation alignment; the inference-based approach, however, may provide a more theoretical view. The two approaches share the same idea, but their calculation processes are not identical.

Due to the incompleteness of the KGs in the relevant datasets, the positive evidence for the inheritance statement $r_{1}\rightarrow r_{2}$ is usually inadequate, because either premise (12) or (13) may be missing from the KG although it is actually true. To address this issue, we force the frequency of every inheritance statement to increase by a proportion (at the end of each iteration), that is $f:=f+\left(1-f\right)\times inc\_relation\_f$ , where $:=$ represents assignment and $inc\_relation\_f$ is a hyper-parameter.
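The update rule above moves $f$ a fixed fraction of the remaining distance towards 1. A minimal sketch follows; the function name `boost_frequency` is illustrative.

```python
def boost_frequency(f: float, inc_relation_f: float) -> float:
    """Apply f := f + (1 - f) * inc_relation_f once."""
    return f + (1.0 - f) * inc_relation_f


# With inc_relation_f = 0.5, a frequency of 0.4 closes half of its gap to 1.
boosted = boost_frequency(0.4, 0.5)

# Applying the rule again closes half of the remaining gap, and so on.
twice = boost_frequency(boosted, 0.5)
```

Because the gap $1-f$ is multiplied by $1-inc\_relation\_f$ on every application, the update never pushes $f$ above 1.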

3.3. Utilizing Literal Values

Literal values in real-world KGs act as entity names, entity descriptions, relation/attribute names or attribute values, and carry a large amount of information. Literal values include texts (strings), numerical values and dates. A literal has little outer semantic structure (triples) but abundant internal semantics. However, symbolic reasoning languages (systems) like NAL currently cannot effectively handle the subtle semantics in texts, for the following reasons: semantic parsing/understanding requires the capacity to process complex logical forms efficiently, as well as automatic learning capacity; KGs with complex logical forms are lacking; and KGs with detailed and comprehensive common sense knowledge are lacking. From a certain perspective, the literal values in real-world KGs are not really “literal” but rather under-characterized entities, concepts, triples, common sense knowledge and/or statements with complex logical forms. A real-world KG project may not have enough information or an adequate paradigm to deal with them. For example, the literal value of the attribute triple $($ John Lennon, deathPlace, “Manhattan, New York City, United States”@en $)$ refers to the entities “Manhattan”, “New York” and “United States”, and its form indicates a specific relation between these places.

Deep neural network language models provide an interim solution to the literal value understanding problem. For example, BERT-INT (Tang et al., 2020) utilizes BERT to embed names/descriptions and values into a vector space, and then uses the similarities between the feature vectors for alignment.

P-NAL adopts the same embedding method as BERT-INT and fuses it into the similarity inference system. First, the basic BERT unit is finetuned on the names/descriptions of seed alignment entity pairs. Then we use the finetuned model to compute feature vectors for all entities (name/description). Moreover, we compute feature vectors for attribute values. The cosine similarity of entity name/description features is converted directly into the statement

$y_{1}\leftrightarrow y_{2}\ \ \left\langle sim(y_{1},y_{2}),\ C_{name}\right\rangle.$

where $sim$ is cosine similarity and $C_{name}$ is a hyper-parameter. The direct path linking the to-be-aligned entities through their name/description similarity is the type II path of similarity inference mentioned before. The statement is seen as a piece of evidence and fused with other evidence (such as that from align-bridges) by the revision rule.

This path seems straightforward, but it admits a deeper interpretation. The language models used for the embedding process of EA are information sources distinct from the KG itself. A deep language model that is able to align or translate entity names can be seen as a generalized alignment model that aligns morphemes, words, entities and concepts. Its pretraining corpus consists of sentences; although the sentences do not possess explicit structures, the model can understand them by transforming them into complex logical forms. However, such a transformation (if it exists) and the logical forms are only implicitly expressed in the model parameters. To summarize, the type II path of our similarity inference can be seen as the aggregation of multiple (virtual) complex logical paths. The aggregation result is represented in the vector space by the language model.

The cosine similarity of attribute value features is converted to the truth-value of premise statement (5) if $x_{1}$ and $x_{2}$ are attribute values. The corresponding truth-value is

$\left\langle f=sim(x_{1},x_{2}),\ c=sim(x_{1},x_{2})\right\rangle$

The idea is that the deep learning model’s result which has higher similarity is usually more verifiable. There are thousands of distinct attribute values in a KG, so for an attribute value we only consider the $K_{value}$ most similar (but not identical) values in the other KG to prevent an explosive number of value similarities. $K_{value}$ is a hyper-parameter and in implementation we set $K_{value}$ to 1.
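The two conversions from embedding similarity to truth-values can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, embeddings are plain Python lists, and the candidate values of the other KG are passed in as a dictionary from value string to feature vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def name_statement(vec1, vec2, c_name):
    """Type II path: truth-value <sim, C_name> for y1 <-> y2."""
    return (cosine(vec1, vec2), c_name)


def value_statements(value, vec, other_values, k_value=1):
    """Premise (5) for attribute values: <sim, sim> for the k_value most
    similar, non-identical values of the other KG."""
    scored = [(cosine(vec, other_vec), other)
              for other, other_vec in other_values.items() if other != value]
    scored.sort(reverse=True)
    return [(other, (s, s)) for s, other in scored[:k_value]]


name_truth = name_statement([1.0, 0.0], [1.0, 0.0], c_name=0.6)
candidates = {"Paris": [1.0, 0.0], "Paris, FR": [3.0, 4.0], "Lyon": [0.0, 1.0]}
value_truths = value_statements("Paris", [1.0, 0.0], candidates, k_value=1)
```

Identical values are skipped in `value_statements` because, as described in Section 3.1, they are already covered by direct literal comparison.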

3.4. Matching Algorithm

Algorithm 1 recursive bidirectional matching

input : An array of linked lists of similarity sentences $KG1\_to\_KG2$ , with each linked list storing the top-k similarity sentences of an entity in descending order.
output : Optimized 1-to-1 similarity sentences (alignment results)

populate $KG2\_to\_KG1$ with all of the sentences in $KG1\_to\_KG2$ ; /* $KG2\_to\_KG1$ is another array of linked lists, arranging the similarity sentences in the other direction */
for $e_{1}$ in $\mathcal{E}_{1}$ do
    recursively_delete( $e_{1},null$ );

Algorithm 2 recursively delete

input : Entity $e_{1}$ , entity $e_{prev}$ /* $e_{1}$ is the entity to be processed and we assume that $e_{1}$ belongs to the left graph, similarly otherwise. Entity $e_{prev}$ represents the previous entity, that is, the processed entity of the recursion parent. */
output : Entity $e_{return}$ , which represents the final alignment for $e_{1}$

for $sentence$ in $KG1\_to\_KG2(e_{1})$ do
    $e_{2}\leftarrow predicate\_term(sentence)$ ; /* $predicate\_term$ means the other entity of the similarity sentence */
    if $e_{2}$ == $e_{prev}$ then
        $e_{return}\leftarrow e_{prev}$ ;
        break;
    else
        $e_{3}\leftarrow$ recursively_delete( $e_{2},e_{1}$ );
        if $e_{3}$ == $e_{1}$ then
            $e_{return}\leftarrow e_{2}$ ;
            break;
for $sentence$ in $KG1\_to\_KG2(e_{1})$ except the first node do /* now that the first sentence for $e_{1}$ is bidirectionally matched, we delete other sentences */
    remove $sentence$ from the linked list;
    remove the counterpart of $sentence$ in $KG2\_to\_KG1$ which expresses the same similarity in the other direction;
return $e_{return}$ ;

Some EA datasets (such as $DBP15K$ ) come with a 1-to-1 assumption, which is useful information for alignment. Formally, we define the 1-to-1 assumption as follows: first, there is a range of alignable entities $A_{1}\subset\mathcal{E}_{1}$ and $A_{2}\subset\mathcal{E}_{2}$ (for $DBP15K$ , $A_{1}\subsetneqq\mathcal{E}_{1}$ ). Second, the equivalence between $A_{1}$ and $A_{2}$ is a bijection. Note that the assumption imposes no aligning regularity on entities outside the range, except that they cannot be aligned with entities inside the range. For example, the $ZH$ graph of $DBP15K_{ZH\_EN}$ has an entity set $A_{1}$ of size 15,000 and the $EN$ graph has an entity set $A_{2}$ of size 15,000. Every entity in $A_{1}$ is known to have a unique entity in $A_{2}$ as its alignment counterpart. Many ranking-based EA methods leverage the 1-to-1 range assumption; PARIS, however, does not. Therefore, to leverage the range assumption in our implementation, we take the sets $A_{1}$ and $A_{2}$ as input and filter out any alignment sentence that aligns $A_{1}$ to $\mathcal{E}_{2}\setminus A_{2}$ or $\mathcal{E}_{1}\setminus A_{1}$ to $A_{2}$ .
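The range filter described above amounts to keeping a sentence only when both of its entities are inside the alignable range or both are outside it. A minimal sketch, assuming sentences are `(e1, e2, truth)` tuples and `a1`, `a2` are Python sets (all names are illustrative):

```python
def filter_by_range(sentences, a1, a2):
    """Drop sentences that pair an entity inside the alignable range
    with an entity outside it (A1 to E2\\A2, or E1\\A1 to A2)."""
    return [(e1, e2, truth) for e1, e2, truth in sentences
            if (e1 in a1) == (e2 in a2)]


sentences = [("a", "x", (0.9, 0.9)), ("a", "q", (0.8, 0.9)),
             ("p", "x", (0.7, 0.9)), ("p", "q", (0.6, 0.9))]
kept = filter_by_range(sentences, a1={"a"}, a2={"x"})
```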

The similarities from align-bridges (type I path) are naturally sparse, because they only cover the entity pairs that are effectively linked by the logical path. The similarities from entity names/descriptions (type II path) are dense but noisy, and most of them are useless. P-NAL’s overall algorithm (described in Section 3.5) exhaustively searches for and stores the two types of similarity sentences between a specific to-be-aligned entity and any entity in the other KG. Then, because informative similarity signals are sparse, the similarity sentences are rearranged into ordered linked lists, one list per to-be-aligned entity. The sentences are sorted in descending order of their $expectation$ values. We only store the top $K_{sim}$ similarity sentences in each linked list, where $K_{sim}$ is a model hyper-parameter.
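The per-entity top- $K_{sim}$ selection can be sketched with the standard library. Plain Python lists stand in for the linked lists here, and the function name `top_k_sentences` and the `(e1, e2, (f, c))` sentence layout are assumptions made for illustration.

```python
import heapq


def top_k_sentences(sentences, k_sim):
    """Group sentences by the to-be-aligned entity and keep, for each one,
    the k_sim candidates with the highest expectation f * c (descending)."""
    per_entity = {}
    for e1, e2, (f, c) in sentences:
        per_entity.setdefault(e1, []).append((f * c, e2))
    return {e1: heapq.nlargest(k_sim, candidates)
            for e1, candidates in per_entity.items()}


sentences = [("a", "x", (0.9, 0.9)), ("a", "y", (1.0, 0.5)),
             ("a", "z", (0.2, 0.9)), ("b", "x", (0.5, 0.5))]
ranked = top_k_sentences(sentences, k_sim=2)
```

`heapq.nlargest` returns its result already sorted from largest to smallest, so each list can be consumed in rank order by the matching step.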

Then we perform a recursive bidirectional matching algorithm (rBMat), which is similar in idea to BMat (Tang et al., 2020) but differs in implementation. See Algorithm 1 and Algorithm 2 for details. The main idea is to recursively delete the similarity sentences that do not conform to the 1-to-1 assumption. Taking the sorting cost into account, our rBMat has $O(kn^{2})$ time complexity and $O(kn)$ space complexity, where $k$ represents $K_{sim}$ and $k\ll n$ .
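A compact sketch of Algorithms 1 and 2 is given below, with Python lists standing in for linked lists. It is an illustration rather than the authors' Java implementation. Two details are our own assumptions: sentences are `(left, right, score)` tuples, and all candidate lists are built from one global ordering of sentences so that ties in score cannot cause cyclic recursion. Long preference chains may also require raising Python's recursion limit.

```python
def build_lists(sentences, k):
    """Return [KG1_to_KG2, KG2_to_KG1] as dicts of candidate lists, each in
    descending order, keeping the top-k sentences per left entity."""
    left, right = {}, {}
    for l, r, _ in sorted(sentences, key=lambda s: (-s[2], s[0], s[1])):
        if len(left.setdefault(l, [])) < k:
            left[l].append(r)
            right.setdefault(r, []).append(l)
    return [left, right]


def recursively_delete(lists, side, e1, e_prev):
    """Algorithm 2: settle the match of e1 (an entity on `side`) and delete
    its other sentences in both directions."""
    own, other = lists[side], lists[1 - side]
    candidates = own.get(e1, [])
    e_return = None
    while candidates:
        e2 = candidates[0]
        if e2 == e_prev:
            e_return = e_prev
            break
        e3 = recursively_delete(lists, 1 - side, e2, e1)
        if e3 == e1:
            e_return = e2
            break
        # e2 matched elsewhere and has normally removed itself already.
        if candidates and candidates[0] == e2:
            candidates.pop(0)
            other[e2].remove(e1)
    for e in candidates[1:]:      # delete all but the matched sentence
        other[e].remove(e1)
    del candidates[1:]
    return e_return


def rbmat(sentences, k):
    """Algorithm 1: 1-to-1 matching of left entities to right entities."""
    lists = build_lists(sentences, k)
    result = {}
    for e1 in sorted(lists[0]):
        match = recursively_delete(lists, 0, e1, None)
        if match is not None:
            result[e1] = match
    return result


# "a" prefers "x", but "x" and "b" prefer each other, so "a" falls back to "y".
matches = rbmat([("a", "x", 0.8), ("b", "x", 0.9), ("a", "y", 0.5)], k=2)
```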

We found that some mismatches remain after performing the rBMat algorithm, and most of them share the same pattern. For example, $e_{1}\leftrightarrow e_{2}$ and $e_{3}\leftrightarrow e_{4}$ are two ground truth pairs, but rBMat’s result is $e_{1}\leftrightarrow e_{3}$ and $e_{2}\leftrightarrow e_{4}$ . We implement a simple “modify matches” algorithm to handle this. We exhaustively search for the cases in which $expt(e_{1}\leftrightarrow e_{2})+expt(e_{3}\leftrightarrow e_{4})$ is greater than $expt(e_{1}\leftrightarrow e_{3})+expt(e_{2}\leftrightarrow e_{4})$ and modify them by swapping the alignment, where $expt$ is the $expectation$ of the truth-value.
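The swap test can be sketched as a single pass over all pairs of matched entities. This is a minimal sketch under our own assumptions: matches are stored as a dictionary from left entity to right entity, expectations are looked up in a dictionary keyed by `(left, right)` with missing pairs counted as 0, and one pass is made rather than iterating to a fixed point.

```python
from itertools import combinations


def modify_matches(matches, expt):
    """Swap the partners of two matched pairs whenever the swapped pairs
    have a larger summed expectation than the current ones."""
    def score(left, right):
        return expt.get((left, right), 0.0)

    fixed = dict(matches)
    for l1, l2 in combinations(list(fixed), 2):
        r1, r2 = fixed[l1], fixed[l2]
        if score(l1, r2) + score(l2, r1) > score(l1, r1) + score(l2, r2):
            fixed[l1], fixed[l2] = r2, r1
    return fixed


expt = {("a", "x"): 0.9, ("b", "y"): 0.8, ("a", "y"): 0.5, ("b", "x"): 0.6}
repaired = modify_matches({"a": "y", "b": "x"}, expt)
```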

3.5. Overall structure of P-NAL

Generally speaking, our method adopts the same optimization method and iteration strategy as PARIS’s implementation, with some minor differences. The overall structure of the supervised P-NAL algorithm is elaborated in Algorithm 3. Lines 1-3 finetune the BERT unit. Lines 6-13 perform similarity inference and relational alignment. Lines 14-19 perform the proposed matching algorithm to obtain the alignment results of an iteration. $End\_iteration$ is a hyper-parameter. Note that the inference within each iteration benefits from the alignment results (both entities and relations) of the previous iteration. Registering evidential information (line 10 in Algorithm 3) means memorizing which premises form a specific align-bridge; this information is later used to generate the evidence log file.

Algorithm 3 P-NAL (supervised)

input : Two knowledge graphs $\mathcal{KG}_{1}$ and $\mathcal{KG}_{2}$
output : Alignment result and other information.

2 run finetuning for BERT unit;
3 compute entity/value embeddings with the BERT unit;
4 generate synthetic attribute triples for seed alignments (for supervision);
5 load the knowledge graphs;
6 for $iteration\leftarrow 0$ to $end\_iteration$ do
7     for $y_{1}$ in $\mathcal{E}_{1}$ do /* aligning for different entities of $\mathcal{E}_{1}$ is divided into multiple parallel threads */
8         for $x_{1}$ , $x_{2}$ , $y_{2}$ that form a sound align-bridge path with $y_{1}$ do /* Type I path. The paths are searched in a depth-first manner */
9             update the estimations of $w$ and $w^{+}$ for relational inheritance;
10            perform inference with the inference steps and inference rules in Section 3.1;
11            register evidential information of the path;
13        for $y_{2}$ in $\mathcal{E}_{2}$ do /* Type II path. */
14            retrieve embedding similarity for $y_{1}\leftrightarrow y_{2}$ ;
15            integrate the similarity with prior conclusions of align-bridge by revision rule;
17        filter the similarity sentences with 1-to-1 range assumption;
18        insert the sentences into a top-k ordered linked list;
20    dump the similarity sentences;
21    perform recursive bidirectional matching;
22    modify matches;
23    dump alignment results and evidences (log file);
24    increase frequency of relation inheritance statement;

3.6. Unsupervised Learning

A seed alignment set is not always available for EA tasks or real-world EA applications, so an unsupervised scenario is sometimes adopted to evaluate the industrial applicability of EA methods. We adapt our method to the unsupervised scenario, that is, without using seed alignments. The BERT embedding model needs to be finetuned on seed alignments, so we adopt a bootstrapping strategy. First, a P-NAL instance performs alignment on the dataset with 0% seed and no literal embedding information. Then, we filter the initial alignment results by their $expectation$ with a threshold $\theta_{filter}$ and use the filtered result as the training set of BERT. Next, another 0% seed P-NAL instance performs alignment with the help of BERT’s literal embedding information to obtain the final result.
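The filtering step of this bootstrapping strategy can be sketched as follows. The function name `select_pseudo_seeds`, the `(e1, e2, (f, c))` result layout, and the choice of a non-strict comparison against the threshold are assumptions for illustration only.

```python
def select_pseudo_seeds(alignments, theta_filter=0.6):
    """Keep first-pass alignments whose expectation f * c reaches the
    threshold; the kept pairs serve as the training set for BERT."""
    return [(e1, e2) for e1, e2, (f, c) in alignments
            if f * c >= theta_filter]


first_pass = [("a", "x", (0.95, 0.9)), ("b", "y", (0.7, 0.6)), ("c", "z", (1.0, 0.5))]
pseudo_seeds = select_pseudo_seeds(first_pass)
```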

3.7. Interpretability

Following (Rudin, 2019; Marcinkevičs and Vogt, 2020), interpretable ML (machine learning) focuses on designing models that are inherently interpretable, while explainable ML tries to provide post hoc explanations for existing black box models. P-NAL is highly interpretable and self-explanatory. It is arguably more interpretable than PARIS for the following two reasons. First, with the introduction of evidence amount (confidence) and logical inference rules, P-NAL processes data with more information and generates a more informative explanation. Second, P-NAL manages value similarity and name similarity in a unified logical framework, while PARIS doesn’t leverage such information.

P-NAL is self-explanatory in the sense that it generates a log file of evidence for the alignments, which we can inspect after each iteration. This feature made troubleshooting easier during the development of P-NAL. For example, inspecting the faulty alignments in the evidence file inspired many of the design decisions in this paper. The generated evidence is displayed in our GitHub repository.

Using the neural BERT model does not weaken the interpretability of type I similarity inference path because utilizing literal similarity does not affect the interpretable inference steps. Moreover, as we only keep the attribute value similarities with a score above the threshold, these similarities are easily understood and self-explanatory, except the wrong ones. Our method tolerates faulty attribute value similarity because the align-bridge needs a conjunction of all premises, while faulty similarities usually can’t form a complete premise set.

4. Relation with Other Methods

In this section, we will discuss the relation between our proposed method and methods with other forms. We will propose some preliminary explanations of certain translational embedding methods and embedding-path EA methods from a theoretical perspective.

The way NAL models KG information and the inference process has something in common with “uncertainty estimation” (Hu et al., 2023) in the natural language processing domain. The truth-value of alignments shares some similarity with the distributive view of facts or beliefs, which views facts as probability distributions of random variables. Also, the concept of confidence is shared with some information extraction systems such as Markov logic networks (Jiang et al., 2012), which assign confidence to extracted facts or logical formulas in some intermediate steps.

4.1. Relation with Translational Embedding Methods

The well-known KG embedding model TransE (Bordes et al., 2013) was initially proposed for link prediction tasks. It may be partially explained from the logical perspective of NAL (or, equivalently, another logic with similar expressive power). Consider a specific type of Horn clause, $((*,A,B)\rightarrow R_{1}\wedge\ (*,B,C)\rightarrow R_{2})\ \Rightarrow(*,A,C)\rightarrow R_{3}\ \left\langle f_{1},c_{1}\right\rangle$ . The following three triples

	$\displaystyle(Martin\_Luther\_King\_Jr,\ birthPlace,\ Georgia\_(U.S.\_state))$
	$\displaystyle(Georgia\_(U.S.\_state),\ country,\ United\_States)$
	$\displaystyle(Martin\_Luther\_King\_Jr,\ citizenship,\ United\_States)$

together form a piece of positive evidence for an instantiated Horn clause, in which $R_{1}$ , $R_{2}$ and $R_{3}$ are replaced by $birthPlace$ , $country$ and $citizenship$ respectively. We conjecture that the gradient descent optimization process of TransE implicitly performs approximate logical inference and evidence aggregation. In the above example, for each of the three triples, $||\textbf{h}+\textbf{r}-\textbf{t}||$ (where bold format represents a vector) is minimized once per epoch (ignoring the margin-based criterion), leading to $\textbf{birthPlace}+\textbf{country}\approx\textbf{citizenship}$ . Thus, the instantiated Horn clause together with its truth-value may be represented by the correlation of the vector representations, and the truth-value may be reflected in the distance $||\textbf{birthPlace}+\textbf{country}-\textbf{citizenship}||$ . Note that these three relations may appear in more than one Horn clause, so the gradients from the evidence of one Horn clause may be confused with (or conflict with) those from another Horn clause, for example $\textbf{manufacturer}+\textbf{country}\approx\textbf{madeInCountry}$ . The training process may force the vector birthPlace to be nearly perpendicular to manufacturer; otherwise, there may be hallucination in link prediction or EA results. A similar explanation of hallucination may apply to LLMs. A similar analysis applies to the vector representations of two relations which frequently appear on the same head entity (or tail entity). It is arguable that the test set link prediction process of TransE mainly relies on Horn clauses, because from a logical perspective there is no other information. In this paper, Horn clauses are not extracted and managed; this is left for further research.

MTransE (Chen et al., 2016) is a translational embedding-based EA method. It encodes the two KGs’ relational triples separately with the TransE loss criterion $S_{K}=\Sigma_{(h,r,t)}||\textbf{h}+\textbf{r}-\textbf{t}||$ . It proposes a “distance-based axis calibration” alignment model to make the vectors of counterpart entities/relations coincide. The corresponding loss is $S_{a_{2}}=\Sigma||\textbf{e}-\textbf{e}^{\prime}||+||\textbf{r}-\textbf{r}^{\prime}||$ ( $S_{a_{2}}$ only has the first term if there is no available seed relation alignment). The seed and derived alignments are assumed to have $\textbf{e}\approx\textbf{e}^{\prime}$ , and we see this as the embedding representation of the similarity statement $e\leftrightarrow e^{\prime}$ , with its truth-value somehow represented by the distance $||\textbf{e}-\textbf{e}^{\prime}||$ . Theoretically, the distance cannot simultaneously represent frequency and confidence by itself; more likely it represents a combined effect. We argue that MTransE performs approximate inference that is similar to the type III path, because if the learned embedding constraints of the four premises are considered simultaneously, we get $\textbf{r}\approx\textbf{r}^{\prime}$ , which we interpret as $r\leftrightarrow r^{\prime}$ . Similarly, MTransE performs approximate inference of the type I path (with functionality omitted and $r\rightarrow r^{\prime}$ replaced by $r\leftrightarrow r^{\prime}$ ) to obtain derived alignment results.
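As a worked illustration of the relation-vector identity in the TransE example above, suppose (as an idealization that ignores the margin and negative sampling) that all three triples are fit exactly, and write $\textbf{m}$ , $\textbf{g}$ and $\textbf{u}$ (our own shorthand) for the embeddings of $Martin\_Luther\_King\_Jr$ , $Georgia\_(U.S.\_state)$ and $United\_States$ . Then

	$\displaystyle\textbf{m}+\textbf{birthPlace}\approx\textbf{g},\quad\textbf{g}+\textbf{country}\approx\textbf{u},\quad\textbf{m}+\textbf{citizenship}\approx\textbf{u}$
	$\displaystyle\Rightarrow\ \textbf{m}+\textbf{birthPlace}+\textbf{country}\approx\textbf{u}\approx\textbf{m}+\textbf{citizenship}$
	$\displaystyle\Rightarrow\ \textbf{birthPlace}+\textbf{country}\approx\textbf{citizenship}$

The entity vectors cancel, so the constraint that remains involves only the three relation vectors, which is why every additional instance of the same Horn clause pushes the relation embeddings in the same direction.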

4.2. Relation with Embedding-path EA Methods

Here we propose some preliminary explanations of the similarity inference aspect of some embedding-path EA methods from a theoretical perspective.

The first method to be discussed is BERT-INT. It generates entity embeddings from the name/description information with the BERT unit, and the embedding is $C(e)=MLP(CLS(e))$ . It uses a pairwise margin loss to approximately enforce $C(e)\approx C(e^{\prime})$ . Unlike MTransE, which performs path inference implicitly through the gradient optimization of loss criteria, BERT-INT explicitly performs path inference with its proposed interaction model. Every element of the neighbor-view interaction matrix represents an inference process of a type I path. Its path omits functionality and relation alignment (because BERT-INT fails to utilize its proposed relation mask matrix). Because relation types are ignored, its premises (1) and (2) have the form $\left(*,x_{1},y_{1}\right)\rightarrow\$r$ and $\left(*,x_{2},y_{2}\right)\rightarrow\$r$ , which represents “There exists an unspecified relation between $x_{1}$ / $y_{1}$ , and (another) unspecified relation between $x_{2}$ / $y_{2}$ ”. Moreover, its premise (5) fails to utilize derived alignments, because BERT-INT is not iterative. With such premises, the type I path inference of BERT-INT is expected to be less effective than that of P-NAL. Similarly, every element of the attribute-view interaction matrix represents a type I path which has attribute triples as premises (1) and (2). BERT-INT’s evidence aggregation method differs from that of P-NAL, which uses the probabilistic revision and revision rules.

The second method to be discussed is FGWEA. Its multi-view Optimal Transport (OT) alignment step combines four cost matrices for the OT problem, that is, $C_{sum}=C_{stru}+C_{rel}+C_{name}+C_{attr}$ . Obtaining the cost matrices corresponds to the similarity inference process, and different matrices correspond to different groups of inference paths. Among them, $C_{rel}$ corresponds to a degenerated type I path inference in which relation alignment is obtained from relation names and functionality is not considered. $C_{stru}$ corresponds to a further degenerated type I path inference (similar to BERT-INT’s neighbor-view interaction). $C_{name}$ corresponds to type II path inference. $C_{attr}$ fails to model the (fine-grained) attributive type I path because it uses the concatenation of all attribute triples of an entity.

In this paper, BERT-INT and FGWEA are classified as embedding-path EA methods because their embedding module couples with the path inference to some extent. In contrast, P-NAL, which we classify as path-based, performs inference wherever it can and uses embeddings minimally.

Table 2. Dataset statistics. $|\mathcal{E}|$ , $|\mathcal{R}|$ , $|\mathcal{T_{R}}|$ and $|\mathcal{T_{A}}|$ represent the number of entities, relation types, relation triples and attribute triples in each KG, respectively.

Dataset	KG	$|\mathcal{E}|$	$|\mathcal{R}|$	$|\mathcal{T_{R}}|$	$|\mathcal{T_{A}}|$
$DBP15K_{ZH\_EN}$	ZH	19,388	1,701	70,414	379,684
$DBP15K_{ZH\_EN}$	EN	19,572	1,323	95,142	567,755
$DBP15K_{JA\_EN}$	JA	19,814	1,299	77,214	354,619
$DBP15K_{JA\_EN}$	EN	19,780	1,153	93,484	497,230
$DBP15K_{FR\_EN}$	FR	19,661	903	105,998	528,665
$DBP15K_{FR\_EN}$	EN	19,993	1,208	115,722	576,543
D-W-15K-V2	DBpedia	15,000	167	73,983	66,813
D-W-15K-V2	WikiData	15,000	121	83,365	175,686

5. Experiments and Results

5.1. Datasets and Settings

We evaluate our model on two EA datasets: the widely used cross-lingual dataset $DBP15K$ (see (Sun et al., 2017) for details) and a monolingual multi-source dataset D-W-15K-V2 (Sun et al., 2020). $DBP15K$ consists of three subsets of cross-lingual KG pairs extracted from DBpedia: $DBP15K_{ZH\_EN}$ (Chinese to English), $DBP15K_{JA\_EN}$ (Japanese to English), and $DBP15K_{FR\_EN}$ (French to English). Each KG pair contains 15,000 seed alignments. D-W-15K-V2 consists of two English KGs extracted from DBpedia and WikiData, respectively, and there are 15,000 seed alignments. The statistics of the datasets are listed in Table 2.

The configuration of our main results on $DBP15K$ (Table 3) consists of five settings: $Attr.,Name,Trans.,Desc.$ and $Seed$ , explained as follows. $Attr.$ is for utilizing the attribute triples. $Name$ is for utilizing the entity name information. $Trans.$ is for utilizing translators, such as the Google translator. $Desc.$ is for utilizing the information of entity description. $Seed$ is for the percentage of seed alignments, 30% for the conventional supervised scenario and 0% for the unsupervised scenario.

We categorize baselines into five configuration groups and run P-NAL using the configurations for each group. Group 1 is the supervised scenario with attribute triples. Group 2 is the unsupervised scenario with attribute triples. Group 3 is the supervised or unsupervised scenario with entity name information and translator. Group 4 is the supervised scenario with entity name and description information, which is the same scenario as BERT-INT. Group 5 is the supervised or unsupervised scenario with attribute triples and entity name information.

Most hyper-parameters of our model remain the same across different datasets and configurations, except for group 3, which is discussed later. The hyper-parameters are selected manually. We set $iota$ = 0.5, $theta$ = 0.1, $inc\_relation\_f$ = 0.5 and $end\_iteration$ = 21. We set $C_{name}$ = 0.8 for FR_EN and 0.6 otherwise. $K_{sim}$ is set to 80 and $\theta_{filter}$ is set to 0.6. The BERT unit is finetuned for 15 epochs. The dimension of the BERT CLS embedding is 768 and the dimension of the BERT unit’s embedding output is 300. P-NAL does not perform data cleaning. P-NAL has more hyper-parameters than many other EA methods, which may be a drawback. The automatic setting of hyper-parameters is left for further research. The entity name translations are obtained with Google translator, which is consistent with many other studies.

Our P-NAL model is implemented in Java and the BERT unit is implemented in Python with PyTorch. All experiments are performed on a Linux server with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz, 251 GB of RAM and an NVIDIA GeForce RTX 3090 GPU.

Table 3. Evaluation results of all compared EA methods on $DBP15K$ in five different configuration groups. Methods marked with * use the additional information of relation names.

Config	Model	Settings					ZH_EN	JA_EN	FR_EN
Config	Model	Attr.	Name	Trans.	Desc.	Seed	Hits@1	Hits@1	Hits@1
1	`JAPE`	✓				30%	0.412	0.363	0.324
	`GCNAlign`	✓				30%	0.413	0.399	0.373
	`PARIS+`	✓				30%	0.904	0.874	0.928
	`P-NAL`	✓				30%	0.984	0.970	0.990
2	`PARIS`	✓				0%	0.777	0.785	0.793
	`FGWEA*`	✓				0%	0.929	0.922	0.967
	`P-NAL`	✓				0%	0.978	0.967	0.988
3	`RDGCN`		✓	✓		30%	0.708	0.767	0.886
	`CUEA`		✓	✓		30%	0.921	0.946	0.956
	`UPL-EA`		✓	✓		30%	0.949	0.970	0.995
	`SE-UEA`		✓	✓		0%	0.935	0.951	0.957
	`LightEA`		✓	✓		0%	0.952	0.981	0.995
	`FGWEA*`		✓	✓		0%	0.959	0.982	0.994
	`P-NAL`		✓	✓		0%	0.938	0.988	0.996
4	`BERT-INT`	✓	✓		✓	30%	0.968	0.964	0.995
	`P-NAL`	✓	✓		✓	30%	0.998	0.995	0.999
5	`TEA`	✓	✓			30%	0.941	0.941	0.979
	`FGWEA*`	✓	✓			0%	0.976	0.978	0.997
	`P-NAL`	✓	✓			0%	0.989	0.984	0.998

5.2. Evaluation Metric

We use Hits@1 (which is the same metric as recall for EA) as the sole evaluation metric for our main results on $DBP15K$, for the following reasons. Mean Reciprocal Rank (MRR) is unavailable for P-NAL because it does not produce an alignment ranking for the test entities. In addition, a non-negligible number of equivalent entity pairs are missing from the ground truth of $DBP15K$, so precision and F1-score cannot be measured properly. For dataset D-W-15K-V2, we use precision (P), recall (R), and F1-score.
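
The relation between these metrics can be illustrated with a minimal sketch. It assumes that a method outputs at most one predicted target per source entity, stored in a dictionary; the function names and the toy entity identifiers are hypothetical and are not taken from the P-NAL implementation.

```python
def hits_at_1(predicted, gold):
    """Fraction of gold source entities whose single predicted target is correct.

    predicted: dict mapping a source entity to its one predicted target entity
    gold:      dict mapping a source entity to its true target entity (test set)
    """
    if not gold:
        return 0.0
    correct = sum(1 for src, tgt in gold.items() if predicted.get(src) == tgt)
    return correct / len(gold)


def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of the predicted pairs against the gold pairs."""
    pred_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (hypothetical identifiers): "d" is left unaligned, "c" is wrong.
gold = {"a": "x", "b": "y", "c": "z", "d": "w"}
predicted = {"a": "x", "b": "y", "c": "q"}

h1 = hits_at_1(predicted, gold)
p, r, f1 = precision_recall_f1(predicted, gold)
```

In this toy case Hits@1 and recall coincide, because both divide the number of correct pairs by the number of gold pairs. Precision differs as soon as some source entities are left unaligned or are aligned to targets outside the gold set, which is why an incomplete ground truth distorts precision and F1 but not Hits@1.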

5.3. Main Results

We compare P-NAL with the following methods, most of which are recent and perform well: JAPE (Sun et al., 2017), GCNAlign (Wang et al., 2018), PARIS+ (Leone et al., 2022), PARIS (Suchanek et al., 2011), FGWEA (Tang et al., 2023), RDGCN (Wu et al., 2019), CUEA (Zhao et al., 2022), UPL-EA (Ding et al., 2023), SE-UEA (Jiang et al., 2023a), LightEA (Mao et al., 2022), BERT-INT (Tang et al., 2020), and TEA (Zhao et al., 2023). Their results are taken from the original papers where possible, with their settings carefully examined.

The experimental settings and results of P-NAL and all compared baselines on $DBP15K$ are shown in Table 3. As observed, P-NAL achieves the best performance in terms of Hits@1 in all five groups except group 3. P-NAL significantly outperforms BERT-INT under an identical setting and with the same embedding method, verifying the effectiveness of our similarity inference combined with the matching algorithm. P-NAL outperforms FGWEA in groups 2 and 5, indicating that it successfully utilizes the information of attribute triples. In group 1, the two classic EA models JAPE and GCNAlign are outperformed by the newer approaches (PARIS+ and P-NAL) by a significant margin, indicating the effectiveness of the innovations in recent EA approaches. The performance of P-NAL in unsupervised group 2 approaches its performance in supervised group 1 with only a minor gap, indicating that our proposed bootstrapping strategy effectively adapts to the unsupervised setting (with the help of attribute information).

As for configuration group 3, attribute information is unavailable, so we have to rely on the name and translation information to bootstrap the alignment process. We use two BERT units instead of one to separately embed the original entity names and the translated entity names, and the two units are finetuned separately. We adapt the bootstrapping strategy in Section 3.6 into three steps. In each step, we perform alignment with a P-NAL instance and filter the alignment results to form the training set for the BERT units of the subsequent step. The first step uses unfinetuned BERT units and only considers translated names, because the unfinetuned embeddings of original entity names have relatively poor quality. We adjust the hyper-parameters accordingly: the evidence amount $w$ of the confidence $C_{name}$ is halved compared with other settings because there are two embeddings for one entity. We also decrease the confidence $C_{name}$ by 0.2 and 0.1 for the first two steps, respectively, to match the quality of the embeddings. $K_{sim}$ is set to 400 and other hyper-parameters are unchanged. P-NAL outperforms the other methods in configuration group 3 on JA_EN and FR_EN, including the three supervised ones. However, on ZH_EN, the unsupervised FGWEA and LightEA yield better performance. This is possibly due to differences in the matching algorithm; for example, both of them involve the Sinkhorn algorithm. The error accumulation effect of P-NAL’s strategy in group 3 is left for further study.
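
Halving the evidence amount of a confidence value can be sketched as follows. The sketch uses the standard NAL relation $c = w/(w+k)$ between confidence $c$ and evidence amount $w$ (Wang, 2013); the evidential horizon $k = 1$ and the function name are assumptions of this illustration, and the order in which the halving and the per-step decrements of $C_{name}$ are combined is not specified here.

```python
def halve_evidence(confidence, k=1.0):
    """Return the confidence whose evidence amount w is half that of `confidence`.

    Uses c = w / (w + k), hence w = k * c / (1 - c).
    k is the evidential horizon; k = 1 is an assumption of this sketch.
    """
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must lie in [0, 1)")
    w = k * confidence / (1.0 - confidence)  # evidence amount behind the confidence
    w_half = w / 2.0                         # two embeddings share the evidence
    return w_half / (w_half + k)             # back to a confidence value


# Default confidences from Section 5.1 (0.6 in general, 0.8 for FR_EN).
c_default_halved = halve_evidence(0.6)
c_fr_en_halved = halve_evidence(0.8)
```

With $k = 1$ the mapping simplifies to $c' = c/(2-c)$, so halving the evidence lowers the confidence by less than a factor of two when $c$ is large.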

5.4. Influence of Confidence Hyper-parameter

The experimental results in Figure 2 show how the entity name/description embedding similarity confidence $C_{name}$ affects Hits@1. These experiments are performed on configuration group 4 without using attribute value embedding information. We vary $C_{name}$ with all other conditions unchanged. The Hits@1 curve is approximately concave and reaches its maximum at 0.6, 0.55 and 0.8 for $ZH\_EN$, $JA\_EN$ and $FR\_EN$, respectively. This shows that the embedding similarity enhances the performance to different extents on different datasets. French is often regarded as more closely related to English than Chinese or Japanese, so the BERT unit learns representations more easily and thus produces more confident embedding similarity. The pretraining corpus of the BERT unit may include relevant triples (in the form of natural language sentences) that share an informational origin with DBpedia, so the evidence of the embedding similarity may partly overlap with the evidence of the align-bridge. The revision rule is only appropriate when the two premises do not share evidence (equivalently, when their evidential bases do not overlap). The appropriate confidence value therefore needs to be lower than the confidence of the BERT output (if it provides such information) in order to exclude the overlap. The best-performing confidence of each dataset is conjectured to reflect the combined influence of the embedding quality of the BERT unit and the evidence overlapping effect. Alternatively, the $C_{name}$ confidence value can be set equal to the cosine similarity of the embeddings, which results in slightly decreased performance. This is a reasonable choice when hyper-parameter tuning is to be avoided.
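
The effect of $C_{name}$ on a revised conclusion can be illustrated with a minimal sketch of the NAL revision rule in its standard form (Wang, 2013), which pools the evidence of two premises. The evidential horizon $k = 1$, the function names, and the example truth-values (a bridge conclusion with frequency 1.0 and confidence 0.5, and an embedding similarity with frequency 0.9) are assumptions made for illustration only.

```python
K = 1.0  # evidential horizon; k = 1 is an assumption of this sketch


def to_evidence(f, c, k=K):
    """Convert a truth-value (frequency f, confidence c < 1) to (w_plus, w)."""
    w = k * c / (1.0 - c)   # total amount of evidence
    return f * w, w         # positive evidence, total evidence


def from_evidence(w_plus, w, k=K):
    """Convert evidence amounts (w > 0) back to (frequency, confidence)."""
    return w_plus / w, w / (w + k)


def revise(t1, t2, k=K):
    """NAL revision: add up the evidence of two premises with disjoint evidence."""
    wp1, w1 = to_evidence(*t1, k=k)
    wp2, w2 = to_evidence(*t2, k=k)
    return from_evidence(wp1 + wp2, w1 + w2, k=k)


bridge = (1.0, 0.5)                    # hypothetical align-bridge conclusion
high = revise(bridge, (0.9, 0.8))      # embedding similarity with C_name = 0.8
low = revise(bridge, (0.9, 0.6))       # embedding similarity with C_name = 0.6
```

Lowering $C_{name}$ reduces the amount of evidence contributed by the embedding premise, so the revised frequency stays closer to the bridge conclusion and the revised confidence is lower. This is how a reduced $C_{name}$ can compensate for evidence that the two premises are suspected to share.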

5.5. Ablation Study

To validate the effectiveness of each component in P-NAL, we compare it with several ablations. The results are reported in Table 4, where w/o stands for “without” and $E_{value}$ represents attribute value embedding information. $all\_revision$ represents replacing the probabilistic revision rule with the revision rule, and $all\_prob\_revision$ is the opposite. $1v1\_range$ is the 1-to-1 matching range information utilized in Section 3.4, $modify\_matches$ is the algorithm proposed in Section 3.4, and $inc\_relation\_f$ represents the increment of the inheritance statement’s frequency.

P-NAL performs approximately the best among these variants. However, $all\_prob\_revision$ obtains a slightly better result than P-NAL on ZH_EN and JA_EN, which needs further research to explain. We insist on retaining the revision rule because it can deal with negative evidence for similarity sentences, while the probabilistic revision rule cannot. The ablation results together with the main results suggest that P-NAL has good monotonicity in Hits@1 performance, in the sense that adding extra information or an extra procedure (component) to the model increases Hits@1. Arguably, this is because introducing two-dimensional truth-values in every inference step separates confidence from truth degree (frequency) in every statement, so that the information about the relative reliability level is stored for further usage. P-NAL also achieves competitive performance compared with FGWEA on D-W-15K-V2 with attribute triples (as FGWEA uses attribute information for semantic comparison).

Table 4. Ablation study of P-NAL.

Model	ZH_EN	JA_EN	D-W-15K-V2
Model	Hits@1	Hits@1	P	R	$F_{1}$
P-NAL	0.989	0.984	0.917	0.906	0.912
- w/o $E_{value}$	0.985	0.980	-	-	-
- $all\_revision$	0.903	0.912	0.857	0.814	0.835
- $all\_prob\_revision$	0.991	0.987	-	-	-
- w/o $1v1\_range$	0.985	0.978	-	-	-
- w/o $modify\_matches$	0.987	0.982	0.912	0.901	0.907
- w/o $inc\_relation\_f$	0.978	0.975	0.918	0.906	0.912
FGWEA	0.976	0.978	0.952	0.903	0.927

6. Conclusion and Future Work

In this paper, we propose an entity alignment method named P-NAL, which tackles the EA problem by modeling inference processes (similarity inference) that obtain similarity through paths connecting the entities. P-NAL leverages two types of paths, exploiting both structural and side information of KGs. Using the similarities, P-NAL matches the entities with the proposed rBMat algorithm with modification. P-NAL is also successfully adapted to the unsupervised scenario and to a scenario without attribute triples. Compared with up-to-date EA methods, P-NAL attains competitive results on dataset D-W-15K-V2 and under various settings of $DBP15K$, indicating that it successfully handles the most effective part of similarity inference.

We take a step toward re-evaluating the design choices of different EA models by providing some insights into (explanations of) different methods and competitive results compared with them. We hope that our approach broadens the view and deepens the understanding of the EA research community. How to combine embedding models with path inference and realize the full potential of embedding models is a research question for further study.

Thanks to the expressiveness of the logic NAL, P-NAL can be extended with other capabilities in future work, such as integrating ontological information and processing paths for negative similarity evidence.

Acknowledgements.

This work was supported by the National Natural Science Foundation of China (62276057).

References

Auer et al. (2007) Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In international semantic web conference. Springer, 722–735.
Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol. 2. 2787–2795.
Cai et al. (2022) Weishan Cai, Wenjun Ma, Jieyu Zhan, and Yuncheng Jiang. 2022. Entity alignment with reliable path reasoning and relation-aware heterogeneous graph transformer. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). 1930–1937.
Chen et al. (2018) Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo. 2018. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 3998–4004.
Chen et al. (2016) Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2016. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1511–1517.
Dao et al. (2023) Nhat-Minh Dao, Thai V Hoang, and Zonghua Zhang. 2023. A Benchmarking Study of Matching Algorithms for Knowledge Graph Entity Alignment. arXiv preprint arXiv:2308.03961 (2023).
Ding et al. (2023) Qijie Ding, Jie Yin, Daokun Zhang, and Junbin Gao. 2023. Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment. arXiv preprint arXiv:2307.02075 (2023).
Fanourakis et al. (2023) Nikolaos Fanourakis, Vasilis Efthymiou, Dimitris Kotzinos, and Vassilis Christophides. 2023. Knowledge graph embedding methods for entity alignment: experimental review. Data Mining and Knowledge Discovery 37, 5 (2023), 2070–2137.
Gesese et al. (2021) Genet Asefa Gesese, Russa Biswas, Mehwish Alam, and Harald Sack. 2021. A survey on knowledge graph embeddings with literals: Which model links better literal-ly? Semantic Web 12, 4 (2021), 617–647.
Guo et al. (2022) Lingbing Guo, Yuqiang Han, Qiang Zhang, and Huajun Chen. 2022. Deep reinforcement learning for entity alignment. In Findings of the Association for Computational Linguistics: ACL 2022. 2754–2765.
Hu et al. (2023) Mengting Hu, Zhen Zhang, Shiwan Zhao, Minlie Huang, and Bingzhe Wu. 2023. Uncertainty in Natural Language Processing: Sources, Quantification, and Applications. arXiv preprint arXiv:2306.04459 (2023).
Ji et al. (2021) Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and S Yu Philip. 2021. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE transactions on neural networks and learning systems 33, 2 (2021), 494–514.
Jiang et al. (2023b) Chuanyu Jiang, Yiming Qian, Lijun Chen, Yang Gu, and Xia Xie. 2023b. Unsupervised Deep Cross-Language Entity Alignment. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 3–19.
Jiang et al. (2012) Shangpu Jiang, Daniel Lowd, and Dejing Dou. 2012. Learning to refine an automatically extracted knowledge base using markov logic. In 2012 IEEE 12th International Conference on Data Mining. IEEE, 912–917.
Jiang et al. (2023a) Tingting Jiang, Chenyang Bu, Yi Zhu, and Xindong Wu. 2023a. Integrating symbol similarities with knowledge graph embedding for entity alignment: an unsupervised framework. Intelligent Computing 2 (2023), 0021.
Leone et al. (2022) Manuel Leone, Stefano Huber, Akhil Arora, Alberto García-Durán, and Robert West. 2022. A critical re-evaluation of neural methods for entity alignment. In Proceedings of the VLDB Endowment, Vol. 15. 1712–1725. https://doi.org/10.14778/3529337.3529355
Lin et al. (2023) Lin Lin, Lizheng Zu, Feng Guo, Song Fu, Yancheng Lv, Hao Guo, and Jie Liu. 2023. Using combinatorial optimization to solve entity alignment: An efficient unsupervised model. Neurocomputing 558 (2023), 126802.
Liu et al. (2023) Bing Liu, Tiancheng Lan, Wen Hua, and Guido Zuccon. 2023. Dependency-aware Self-training for Entity Alignment. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 796–804.
Liu et al. (2022) Xiao Liu, Haoyun Hong, Xinghao Wang, Zeyi Chen, Evgeny Kharlamov, Yuxiao Dong, and Jie Tang. 2022. Selfkg: Self-supervised entity alignment in knowledge graphs. In Proceedings of the ACM Web Conference 2022. 860–870.
Liu et al. (2020) Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, and Tat-Seng Chua. 2020. Exploring and evaluating attributes, values, and structures for entity alignment. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 6355–6364.
Logan IV et al. (2019) Robert L Logan IV, Nelson F Liu, Matthew E Peters, Matt Gardner, and Sameer Singh. 2019. Barack’s wife hillary: Using knowledge-graphs for fact-aware language modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5962–5971.
Luo and Yu (2022) Shengxuan Luo and Sheng Yu. 2022. An accurate unsupervised method for joint entity alignment and dangling entity detection. In Findings of the Association for Computational Linguistics: ACL 2022. 2330–2339.
Mao et al. (2021a) Xin Mao, Wenting Wang, Yuanbin Wu, and Man Lan. 2021a. Boosting the Speed of Entity Alignment 10 ×: Dual Attention Matching Network with Normalized Hard Sample Mining. In Proceedings of the Web Conference 2021. 821–832. https://doi.org/10.1145/3442381.3449897
Mao et al. (2021b) Xin Mao, Wenting Wang, Yuanbin Wu, and Man Lan. 2021b. From alignment to assignment: Frustratingly simple unsupervised entity alignment. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2843–2853.
Mao et al. (2022) Xin Mao, Wenting Wang, Yuanbin Wu, and Man Lan. 2022. LightEA: A Scalable, Robust, and Interpretable Entity Alignment Framework via Three-view Label Propagation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 825–838.
Mao et al. (2020a) Xin Mao, Wenting Wang, Huimin Xu, Man Lan, and Yuanbin Wu. 2020a. MRAEA: an efficient and robust entity alignment approach for cross-lingual knowledge graph. In Proceedings of the 13th International Conference on Web Search and Data Mining. 420–428. https://doi.org/10.1145/3336191.3371804
Mao et al. (2020b) Xin Mao, Wenting Wang, Huimin Xu, Yuanbin Wu, and Man Lan. 2020b. Relational reflection entity alignment. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1095–1104.
Marcinkevičs and Vogt (2020) Ričards Marcinkevičs and Julia E Vogt. 2020. Interpretability and explainability: A machine learning zoo mini-tour. arXiv preprint arXiv:2012.01805 (2020).
Peyré et al. (2016) Gabriel Peyré, Marco Cuturi, and Justin Solomon. 2016. Gromov-wasserstein averaging of kernel and distance matrices. In International conference on machine learning. PMLR, 2664–2672.
Rudin (2019) C. Rudin. 2019. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature machine intelligence 1, 5 (2019), 206–215. https://doi.org/10.1038/s42256-019-0048-x
Suchanek et al. (2011) Fabian M Suchanek, Serge Abiteboul, and Pierre Senellart. 2011. PARIS: Probabilistic Alignment of Relations, Instances, and Schema. In Proceedings of the VLDB Endowment, Vol. 5. 157–168.
Suchanek et al. (2007) Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web. 697–706.
Sun et al. (2017) Zequn Sun, Wei Hu, and Chengkai Li. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In The Semantic Web–ISWC 2017: 16th International Semantic Web Conference. Springer, 628–644.
Sun et al. (2020) Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, and Chengkai Li. 2020. A benchmarking study of embedding-based entity alignment for knowledge graphs. In Proceedings of the VLDB Endowment, Vol. 13. 2326–2340.
Tang et al. (2023) Jianheng Tang, Kangfei Zhao, and Jia Li. 2023. A Fused Gromov-Wasserstein Framework for Unsupervised Knowledge Graph Entity Alignment. arXiv preprint arXiv:2305.06574 (2023).
Tang et al. (2020) Xiaobin Tang, Jing Zhang, Bo Chen, Yang Yang, Hong Chen, and Cuiping Li. 2020. BERT-INT: a BERT-based interaction model for knowledge graph alignment. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 3174–3180.
Tian et al. (2023) Xiaobin Tian, Zequn Sun, and Wei Hu. 2023. Generating Explanations to Understand and Repair Embedding-based Entity Alignment. arXiv preprint arXiv:2312.04877 (2023).
Wang (2005) Pei Wang. 2005. Experience-grounded semantics: a theory for intelligent systems. Cognitive Systems Research 6, 4 (2005), 282–302.
Wang (2013) Pei Wang. 2013. Non-axiomatic logic: A model of intelligent reasoning. World Scientific.
Wang et al. (2018) Zhichun Wang, Qingsong Lv, Xiaohan Lan, and Yu Zhang. 2018. Cross-lingual knowledge graph alignment via graph convolutional networks. In Proceedings of the 2018 conference on empirical methods in natural language processing. 349–357.
Wu et al. (2019) Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, and Dongyan Zhao. 2019. Relation-aware entity alignment for heterogeneous knowledge graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 5278–5284.
Xu et al. (2019) Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin Duke. 2019. Gromov-wasserstein learning for graph matching and node embedding. In International conference on machine learning. PMLR, 6932–6941.
Xu et al. (2020) Kun Xu, Linfeng Song, Yansong Feng, Yan Song, and Dong Yu. 2020. Coordinated reasoning for cross-lingual knowledge graph alignment. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 9354–9361.
Zeng et al. (2021) Kaisheng Zeng, Chengjiang Li, Lei Hou, Juanzi Li, and Ling Feng. 2021. A comprehensive survey of entity alignment for knowledge graphs. AI Open 2 (2021), 1–13. https://doi.org/10.1016/j.aiopen.2021.02.002
Zeng et al. (2020) Weixin Zeng, Xiang Zhao, Jiuyang Tang, and Xuemin Lin. 2020. Collective entity alignment via adaptive features. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1870–1873.
Zhao et al. (2022) Xiang Zhao, Weixin Zeng, Jiuyang Tang, Xinyi Li, Minnan Luo, and Qinghua Zheng. 2022. Toward entity alignment in the open world: an unsupervised approach with confidence modeling. Data Science and Engineering 7, 1 (2022), 16–29.
Zhao et al. (2023) Yu Zhao, Yike Wu, Xiangrui Cai, Ying Zhang, Haiwei Zhang, and Xiaojie Yuan. 2023. From Alignment to Entailment: A Unified Textual Entailment Framework for Entity Alignment. arXiv preprint arXiv:2305.11501 (2023).
Zhu and Zoubin (2002) Xiaojin Zhu and Zoubin Ghahramani. 2002. Learning from labeled and unlabeled data with label propagation. ProQuest number: information to all users (2002).