-
Entity Type Prediction Leveraging Graph Walks and Entity Descriptions
Authors:
Russa Biswas,
Jan Portisch,
Heiko Paulheim,
Harald Sack,
Mehwish Alam
Abstract:
The entity type information in Knowledge Graphs (KGs) such as DBpedia, Freebase, etc. is often incomplete due to automated generation or human curation. Entity typing is the task of assigning or inferring the semantic type of an entity in a KG. This paper presents GRAND, a novel approach for entity typing leveraging different graph walk strategies in RDF2vec together with textual entity descriptions. RDF2vec first generates graph walks and then uses a language model to obtain embeddings for each node in the graph. This study shows that the walk generation strategy and the embedding model have a significant effect on the performance of the entity typing task. The proposed approach outperforms the baseline approaches on the benchmark datasets DBpedia and FIGER for entity typing in KGs for both fine-grained and coarse-grained classes. The results show that the combination of order-aware RDF2vec variants together with the contextual embeddings of the textual entity descriptions achieves the best results.
Submitted 29 July, 2022; v1 submitted 28 July, 2022;
originally announced July 2022.
-
The DLCC Node Classification Benchmark for Analyzing Knowledge Graph Embeddings
Authors:
Jan Portisch,
Heiko Paulheim
Abstract:
Knowledge graph embedding is a representation learning technique that projects entities and relations in a knowledge graph to continuous vector spaces. Embeddings have gained a lot of uptake and have been heavily used in link prediction and other downstream prediction tasks. Most approaches are evaluated on a single task or a single group of tasks to determine their overall performance. The evaluation is then assessed in terms of how well the embedding approach performs on the task at hand. Still, it is hardly evaluated (and often not even deeply understood) what information the embedding approaches are actually learning to represent.
To fill this gap, we present the DLCC (Description Logic Class Constructors) benchmark, a resource to analyze embedding approaches in terms of which kinds of classes they can represent. Two gold standards are presented, one based on the real-world knowledge graph DBpedia and one synthetic gold standard. In addition, an evaluation framework is provided that implements an experiment protocol so that researchers can directly use the gold standard. To demonstrate the use of DLCC, we compare multiple embedding approaches using the gold standards. We find that many DL constructors on DBpedia are actually learned by recognizing different correlated patterns than those defined in the gold standard and that specific DL constructors, such as cardinality constraints, are particularly hard to be learned for most embedding approaches.
Submitted 13 July, 2022;
originally announced July 2022.
-
KERMIT -- A Transformer-Based Approach for Knowledge Graph Matching
Authors:
Sven Hertling,
Jan Portisch,
Heiko Paulheim
Abstract:
One of the strongest signals for automated matching of knowledge graphs and ontologies is textual concept descriptions. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) is available to researchers. However, performing pairwise comparisons of all textual descriptions of concepts in two knowledge graphs is expensive and scales quadratically (or even worse if concepts have more than one description). To overcome this problem, we follow a two-step approach: we first generate matching candidates using a pre-trained sentence transformer (a so-called bi-encoder). In a second step, we use fine-tuned transformer cross-encoders to generate the best candidates. We evaluate our approach on multiple datasets and show that it is feasible and produces competitive results.
Submitted 29 April, 2022;
originally announced April 2022.
-
Ontology Matching Through Absolute Orientation of Embedding Spaces
Authors:
Jan Portisch,
Guilherme Costa,
Karolin Stefani,
Katharina Kreplin,
Michael Hladik,
Heiko Paulheim
Abstract:
Ontology matching is a core task when creating interoperable and linked open datasets. In this paper, we explore a novel structure-based mapping approach which is based on knowledge graph embeddings: The ontologies to be matched are embedded, and an approach known as absolute orientation is used to align the two embedding spaces. Next to the approach, the paper presents a first, preliminary evaluation using synthetic and real-world datasets. We find in experiments with synthetic data that the approach works very well on similarly structured graphs; it handles alignment noise better than size and structural differences in the ontologies.
Submitted 8 April, 2022;
originally announced April 2022.
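The alignment step named in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a two-dimensional toy case with known anchor pairs, where the best rotation has a closed form, whereas real embedding spaces have many more dimensions and would typically need an SVD-based solution. The function name `absolute_orientation_2d` and the sample points are hypothetical.

```python
import math

def absolute_orientation_2d(source, target):
    """Least-squares rotation + translation mapping paired 2D points
    from `source` onto `target` (no scaling, no reflection)."""
    n = len(source)
    # Centroids of both point sets
    cs = (sum(x for x, _ in source) / n, sum(y for _, y in source) / n)
    ct = (sum(x for x, _ in target) / n, sum(y for _, y in target) / n)
    dot = cross = 0.0
    for (sx, sy), (qx, qy) in zip(source, target):
        ax, ay = sx - cs[0], sy - cs[1]   # centered source point
        bx, by = qx - ct[0], qy - ct[1]   # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Optimal rotation angle for the centered point sets
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid
    shift_x = ct[0] - (c * cs[0] - s * cs[1])
    shift_y = ct[1] - (s * cs[0] + c * cs[1])

    def transform(point):
        x, y = point
        return (c * x - s * y + shift_x, s * x + c * y + shift_y)

    return theta, transform

# Hypothetical anchor pairs: target is source rotated by 90 degrees, then shifted by (1, 2)
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
target = [(1.0, 2.0), (1.0, 3.0), (-1.0, 2.0)]
theta, transform = absolute_orientation_2d(source, target)
```

Once `transform` is estimated from the anchor pairs, it can be applied to any other vector of the source space, and matches could then be proposed by nearest-neighbour search in the target space.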
-
Walk this Way! Entity Walks and Property Walks for RDF2vec
Authors:
Jan Portisch,
Heiko Paulheim
Abstract:
RDF2vec is a knowledge graph embedding mechanism which first extracts sequences from knowledge graphs by performing random walks, then feeds those into the word embedding algorithm word2vec for computing vector representations for entities. In this poster, we introduce two new flavors of walk extraction coined e-walks and p-walks, which put an emphasis on the structure or the neighborhood of an entity respectively, and thereby allow for creating embeddings which focus on similarity or relatedness. By combining the walk strategies with order-aware and classic RDF2vec, as well as CBOW and skip-gram word2vec embeddings, we conduct a preliminary evaluation with a total of 12 RDF2vec variants.
Submitted 5 April, 2022;
originally announced April 2022.
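The first RDF2vec stage described in the abstract above, extracting sequences by random walks, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function name `random_walks`, the toy triples, and the parameter names are hypothetical, and only classic walks are shown (not the e-walk or p-walk variants). The resulting sequences would then be passed to a word2vec implementation as if they were sentences.

```python
import random

def random_walks(triples, start, depth, n_walks, seed=0):
    """Return n_walks random walks of at most `depth` hops starting at `start`.
    Each walk alternates nodes and edge labels: [node, edge, node, ...]."""
    rng = random.Random(seed)
    # Index outgoing edges by subject for fast lookup
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = out_edges.get(node)
            if not edges:
                break  # dead end: no outgoing edges from this node
            p, o = rng.choice(edges)
            walk.extend([p, o])
            node = o
        walks.append(walk)
    return walks

# Hypothetical toy graph given as (subject, predicate, object) triples
triples = [
    ("Mannheim", "locatedIn", "Germany"),
    ("Mannheim", "type", "City"),
    ("Germany", "capital", "Berlin"),
]
walks = random_walks(triples, "Mannheim", depth=2, n_walks=5)
```

Each walk is a list of tokens, so the full set of walks has the same shape as a tokenized text corpus, which is what makes the reuse of a word embedding algorithm possible.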
-
Matching with Transformers in MELT
Authors:
Sven Hertling,
Jan Portisch,
Heiko Paulheim
Abstract:
One of the strongest signals for automated matching of ontologies and knowledge graphs is the textual descriptions of the concepts. The methods that are typically applied (such as character- or token-based comparisons) are relatively simple, and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) is possible. In this paper, we model the ontology matching task as a classification problem and present approaches based on transformer models. We further provide an easy-to-use implementation in the MELT framework which is suited for ontology and knowledge graph matching. We show that a transformer-based filter helps to choose the correct correspondences given a high-recall alignment and already achieves a good result with simple alignment post-processing methods.
Submitted 15 September, 2021;
originally announced September 2021.
-
Putting RDF2vec in Order
Authors:
Jan Portisch,
Heiko Paulheim
Abstract:
The RDF2vec method for creating node embeddings on knowledge graphs is based on word2vec, which, in turn, is agnostic towards the position of context words. In this paper, we argue that this might be a shortcoming when training RDF2vec, and show that using a word2vec variant which respects order yields considerable performance gains especially on tasks where entities of different classes are involved.
Submitted 11 August, 2021;
originally announced August 2021.
-
Background Knowledge in Schema Matching: Strategy vs. Data
Authors:
Jan Portisch,
Michael Hladik,
Heiko Paulheim
Abstract:
The use of external background knowledge can be beneficial for the task of matching schemas or ontologies automatically. In this paper, we exploit six general-purpose knowledge graphs as sources of background knowledge for the matching task. The background sources are evaluated by applying three different exploitation strategies. We find that explicit strategies still outperform latent ones and that the choice of the strategy has a greater impact on the final alignment than the actual background dataset on which the strategy is applied. While we could not identify a universally superior resource, BabelNet achieved consistently good results. Our best matcher configuration with BabelNet performs very competitively when compared to other matching systems even though no dataset-specific optimizations were made.
Submitted 29 June, 2021;
originally announced July 2021.
-
FinMatcher at FinSim-2: Hypernym Detection in the Financial Services Domain using Knowledge Graphs
Authors:
Jan Portisch,
Michael Hladik,
Heiko Paulheim
Abstract:
This paper presents the FinMatcher system and its results for the FinSim 2021 shared task which is co-located with the Workshop on Financial Technology on the Web (FinWeb) in conjunction with The Web Conference. The FinSim-2 shared task consists of a set of concept labels from the financial services domain. The goal is to find the most relevant top-level concept from a given set of concepts. The FinMatcher system exploits three publicly available knowledge graphs, namely WordNet, Wikidata, and WebIsALOD. The graphs are used to generate explicit features as well as latent features which are fed into a neural classifier to predict the closest hypernym.
Submitted 2 March, 2021;
originally announced March 2021.
-
Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019
Authors:
Nacira Abbas,
Kholoud Alghamdi,
Mortaza Alinam,
Francesca Alloatti,
Glenda Amaral,
Claudia d'Amato,
Luigi Asprino,
Martin Beno,
Felix Bensmann,
Russa Biswas,
Ling Cai,
Riley Capshaw,
Valentina Anita Carriero,
Irene Celino,
Amine Dadoun,
Stefano De Giorgis,
Harm Delva,
John Domingue,
Michel Dumontier,
Vincent Emonet,
Marieke van Erp,
Paola Espinoza Arias,
Omaima Fallatah,
Sebastián Ferrada,
Marc Gallofré Ocaña
, et al. (49 additional authors not shown)
Abstract:
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a: "Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of "everything" ranging from common sense concepts to location based entities. This knowledge graph should be "open to the public" in a FAIR manner democratizing this mass amount of knowledge." Although linked open data (LOD) is one knowledge graph, it is the closest realisation (and probably the only one) to a public FAIR Knowledge Graph (KG) of everything. Surely, LOD provides a unique testbed for experimenting and evaluating research hypotheses on open and FAIR KG. One of the most neglected FAIR issues about KGs is their ongoing evolution and long term preservation. We want to investigate this problem, that is to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports a collaborative effort performed by 9 teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective to the problem of knowledge graph evolution substantiated by a set of research questions as the main subject of their investigation. In addition, they provide their working definition for KG preservation and evolution.
Submitted 22 December, 2020;
originally announced December 2020.
-
Supervised Ontology and Instance Matching with MELT
Authors:
Sven Hertling,
Jan Portisch,
Heiko Paulheim
Abstract:
In this paper, we present MELT-ML, a machine learning extension to the Matching and EvaLuation Toolkit (MELT) which facilitates the application of supervised learning for ontology and instance matching. Our contributions are twofold: We present an open source machine learning extension to the matching toolkit as well as two supervised learning use cases demonstrating the capabilities of the new extension.
Submitted 20 September, 2020;
originally announced September 2020.
-
RDF2Vec Light -- A Lightweight Approach for Knowledge Graph Embeddings
Authors:
Jan Portisch,
Michael Hladik,
Heiko Paulheim
Abstract:
Knowledge graph embedding approaches represent nodes and edges of graphs as mathematical vectors. Current approaches focus on embedding complete knowledge graphs, i.e. all nodes and edges. This leads to very high computational requirements on large graphs such as DBpedia or Wikidata. However, for most downstream application scenarios, only a small subset of concepts is of actual interest. In this paper, we present RDF2Vec Light, a lightweight embedding approach based on RDF2Vec which generates vectors for only a subset of entities. To that end, RDF2Vec Light only traverses and processes a subgraph of the knowledge graph. Our method allows the application of embeddings of very large knowledge graphs in scenarios where such embeddings were not possible before due to a significantly lower runtime and significantly reduced hardware requirements.
Submitted 17 September, 2020; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Challenges of Linking Organizational Information in Open Government Data to Knowledge Graphs
Authors:
Jan Portisch,
Omaima Fallatah,
Sebastian Neumaier,
Mohamad Yaser Jaradeh,
Axel Polleres
Abstract:
Open Government Data (OGD) is being published by various public administration organizations around the globe. Within the metadata of OGD data catalogs, the publishing organizations (1) are not uniquely and unambiguously identifiable and, even worse, (2) change over time, by public administration units being merged or restructured. In order to enable fine-grained analyses or searches on Open Government Data on the level of publishing organizations, linking those from OGD portals to publicly available knowledge graphs (KGs) such as Wikidata and DBpedia seems like an obvious solution. Still, as we show in this position paper, organization linking faces significant challenges, both in terms of available (portal) metadata and KGs in terms of data quality and completeness. We herein specifically highlight five main challenges, namely regarding (1) temporal changes in organizations and in the portal metadata, (2) lack of a base ontology for describing organizational structures and changes in public knowledge graphs, (3) metadata and KG data quality, (4) multilinguality, and (5) disambiguating public sector organizations. Based on available OGD portal metadata from the Open Data Portal Watch, we provide an in-depth analysis of these issues, make suggestions for concrete starting points on how to tackle them along with a call to the community to jointly work on these open challenges.
Submitted 14 August, 2020;
originally announced August 2020.
-
Visual Analysis of Ontology Matching Results with the MELT Dashboard
Authors:
Jan Portisch,
Sven Hertling,
Heiko Paulheim
Abstract:
In this demo, we introduce MELT Dashboard, an interactive Web user interface for ontology alignment evaluation which is created with the existing Matching EvaLuation Toolkit (MELT). Compared to existing, static evaluation interfaces in the ontology matching domain, our dashboard allows for interactive self-service analyses such as a drill down into the matcher performance for data type properties or into the performance of matchers within a certain confidence threshold. In addition, the dashboard offers detailed group evaluation capabilities that allow for the application in broad evaluation campaigns such as the Ontology Alignment Evaluation Initiative (OAEI).
Submitted 27 April, 2020;
originally announced April 2020.
-
KGvec2go -- Knowledge Graph Embeddings as a Service
Authors:
Jan Portisch,
Michael Hladik,
Heiko Paulheim
Abstract:
In this paper, we present KGvec2go, a Web API for accessing and consuming graph embeddings in a light-weight fashion in downstream applications. Currently, we serve pre-trained embeddings for four knowledge graphs. We introduce the service and its usage, and we show further that the trained models have semantic value by evaluating them on multiple semantic benchmarks. The evaluation also reveals that the combination of multiple models can lead to a better outcome than the best individual model.
Submitted 9 March, 2020;
originally announced March 2020.