
arXiv:2005.03640

Where is Linked Data in Question Answering over Linked Data?

Authors: Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Moussallem, Gustavo Publio, Muhammad Saleem

Abstract: We argue that "Question Answering with Knowledge Base" and "Question Answering over Linked Data" are currently two instances of the same problem, even though the latter explicitly declares that it deals with Linked Data. We point out the lack of existing methods to evaluate question answering on datasets that exploit external links to the rest of the cloud or share a common schema. To this end, we propose the creation of new evaluation settings that leverage the advantages of the Semantic Web to achieve AI-complete question answering.

Submitted 7 May, 2020; originally announced May 2020.

Comments: Position paper, THE Workshop @ ISWC 2018

MSC Class: 68T99 ACM Class: I.2.7

arXiv:1806.10478

Neural Machine Translation for Query Construction and Composition

Authors: Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Esteves, Diego Moussallem, Gustavo Publio

Abstract: Research on question answering with knowledge bases has recently seen an increasing use of deep architectures. In this extended abstract, we study the application of the neural machine translation paradigm to question parsing. We employ a sequence-to-sequence model to learn graph patterns in the SPARQL graph query language and their compositions. Instead of inducing the programs through question-answer pairs, we envisage a semi-supervised approach, where alignments between questions and queries are built through templates. We argue that the coverage of language utterances can be expanded using recent notable works in natural language generation.

Submitted 9 July, 2018; v1 submitted 27 June, 2018; originally announced June 2018.

Comments: ICML workshop on Neural Abstract Machines & Program Induction v2 (NAMPI), extended abstract

MSC Class: 68T99 ACM Class: I.2.6; I.2.7
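The template-based alignment between questions and queries described in "Neural Machine Translation for Query Construction and Composition" can be illustrated with a minimal Python sketch. It assumes a single hypothetical template in which a placeholder `<A>` is shared by a question pattern and a SPARQL pattern; the template, the `instantiate` helper, and the entity list are illustrative assumptions, not material from the paper.

```python
# Hypothetical template: a question pattern and a SPARQL pattern that share
# the placeholder <A>. The dbr:/dbo: prefixes follow DBpedia conventions.
TEMPLATE = (
    "where is <A> located?",
    "SELECT ?x WHERE { <A> dbo:location ?x }",
)

# Hypothetical (label, resource) pairs used to fill the placeholder.
ENTITIES = [
    ("the Eiffel Tower", "dbr:Eiffel_Tower"),
    ("the Colosseum", "dbr:Colosseum"),
]


def instantiate(template, entities):
    """Return aligned (question, query) pairs by filling the shared placeholder."""
    question_pattern, query_pattern = template
    pairs = []
    for label, resource in entities:
        # The natural-language label goes into the question,
        # the resource identifier goes into the query.
        pairs.append((
            question_pattern.replace("<A>", label),
            query_pattern.replace("<A>", resource),
        ))
    return pairs


if __name__ == "__main__":
    for question, query in instantiate(TEMPLATE, ENTITIES):
        print(question, "=>", query)
```

Each generated pair could then serve as one source/target example for a sequence-to-sequence model; covering graph pattern composition would require additional templates that combine several patterns.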

arXiv:1803.07828

Expeditious Generation of Knowledge Graph Embeddings

Authors: Tommaso Soru, Stefano Ruberto, Diego Moussallem, André Valdestilhas, Alexander Bigerl, Edgard Marx, Diego Esteves

Abstract: Knowledge Graph Embedding methods aim at representing entities and relations in a knowledge base as points or vectors in a continuous vector space. Several approaches using embeddings have shown promising results on tasks such as link prediction, entity recommendation, question answering, and triplet classification. However, only a few methods can compute low-dimensional embeddings of very large knowledge bases without needing state-of-the-art computational resources. In this paper, we propose KG2Vec, a simple and fast approach to Knowledge Graph Embedding based on the skip-gram model. Instead of using a predefined scoring function, we learn it using Long Short-Term Memory networks. We show that our embeddings achieve results comparable with the most scalable approaches on knowledge graph completion as well as on a new metric. Moreover, KG2Vec can embed large graphs in less time, processing more than 250 million triples in less than 7 hours on common hardware.

Submitted 9 November, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

Comments: Submitted to the Archives of Data Science, Series A; 14 pages

ACM Class: I.2.4; I.2.6
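One plausible way to apply a skip-gram model to a knowledge graph, as KG2Vec does, is to treat every (subject, predicate, object) triple as a three-token sentence. The Python sketch below is an assumption for illustration, not KG2Vec's published code: it only generates the (center, context) training pairs that a standard skip-gram trainer would consume, and it does not cover the LSTM-based scoring function. The function name `skipgram_pairs` and the example triple are hypothetical.

```python
def skipgram_pairs(triples, window=2):
    """Emit (center, context) pairs, treating each triple as a 3-token sentence."""
    pairs = []
    for triple in triples:
        tokens = list(triple)  # [subject, predicate, object]
        for i, center in enumerate(tokens):
            # Context positions lie within `window` tokens of the center.
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical example triple.
    example = [("dbr:Berlin", "dbo:country", "dbr:Germany")]
    for center, context in skipgram_pairs(example, window=1):
        print(center, "->", context)
```

With `window=2` every token of a triple is a context for every other token, so subjects and objects end up co-occurring directly; with `window=1` they are linked only through the predicate.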

arXiv:1802.03638

Beyond Markov Logic: Efficient Mining of Prediction Rules in Large Graphs

Authors: Tommaso Soru, André Valdestilhas, Edgard Marx, Axel-Cyrille Ngonga Ngomo

Abstract: Graph representations of large knowledge bases may comprise billions of edges. Although usually built upon human-generated ontologies, several knowledge bases do not feature declared ontological rules and are far from complete. Current rule mining approaches rely on schemata or store the graph in memory, which can be infeasible for large graphs. In this paper, we introduce HornConcerto, an algorithm to discover Horn clauses in large graphs without the need for a schema. Using a standard fact-based confidence score, we can mine closed Horn rules with an arbitrary body size. We show that our method can outperform existing approaches in terms of runtime and memory consumption and mine high-quality rules for the link prediction task, achieving state-of-the-art results on a widely used benchmark. Moreover, we find that rules alone can perform inference significantly faster than embedding-based methods and achieve link prediction accuracies comparable to those of resource-demanding approaches such as Markov Logic Networks.

Submitted 13 February, 2018; v1 submitted 10 February, 2018; originally announced February 2018.

Comments: 13 pages, 4 figures

ACM Class: G.3.8; E.1.3
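The standard confidence score mentioned in the HornConcerto abstract is the number of (x, y) pairs that satisfy both the body and the head of a rule, divided by the number of pairs that satisfy the body. The Python sketch below assumes closed path rules of the form p1(x, z1) ∧ … ∧ pn(zn-1, y) ⇒ q(x, y) and computes the score in memory on a toy graph; it is an illustration of the metric, not the paper's algorithm, which is designed to avoid holding large graphs in memory. The helper names and the example data are hypothetical.

```python
from collections import defaultdict


def pairs_of(triples, predicate):
    """All (subject, object) pairs connected by `predicate`."""
    return {(s, o) for s, p, o in triples if p == predicate}


def path_pairs(triples, body):
    """(x, y) pairs satisfying the chain body[0](x, z1), ..., body[-1](zn, y)."""
    pairs = pairs_of(triples, body[0])
    for predicate in body[1:]:
        # Index the next hop by its subject so each step is a join on z.
        successors = defaultdict(set)
        for s, o in pairs_of(triples, predicate):
            successors[s].add(o)
        pairs = {(x, y) for x, z in pairs for y in successors[z]}
    return pairs


def standard_confidence(triples, body, head):
    """support(body => head) / number of (x, y) pairs matching the body."""
    body_pairs = path_pairs(triples, body)
    if not body_pairs:
        return 0.0
    support = len(body_pairs & pairs_of(triples, head))
    return support / len(body_pairs)


if __name__ == "__main__":
    # Hypothetical toy graph.
    kb = [
        ("alice", "marriedTo", "bob"), ("bob", "livesIn", "rome"),
        ("alice", "livesIn", "rome"), ("carol", "marriedTo", "dave"),
        ("dave", "livesIn", "paris"), ("carol", "livesIn", "london"),
    ]
    # marriedTo(x, z) AND livesIn(z, y) => livesIn(x, y)
    print(standard_confidence(kb, ["marriedTo", "livesIn"], "livesIn"))
```

In the toy graph the body matches (alice, rome) and (carol, paris), but only the first is confirmed by a `livesIn` fact, so the rule's confidence is 0.5.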

arXiv:1712.08352

Triple Scoring Using a Hybrid Fact Validation Approach - The Catsear Triple Scorer at WSDM Cup 2017

Authors: Edgard Marx, Tommaso Soru, André Valdestilhas

Abstract: With the continuous increase in the data published daily in knowledge bases across the Web, one of the main issues is information relevance. In most knowledge bases, a triple (i.e., a statement composed of a subject, a predicate, and an object) can only be true or false. However, triples can be assigned a score so that information can be sorted by relevance. In this work, we describe the participation of the Catsear team in the Triple Scoring Challenge at the WSDM Cup 2017. The Catsear approach scores triples by combining the answers coming from three different sources using a linear regression classifier. We show how our approach achieved an Accuracy2 value of 79.58%, placing 4th overall.

Submitted 22 December, 2017; originally announced December 2017.

Comments: Triple Scorer at WSDM Cup 2017, see arXiv:1712.08081

ACM Class: H.3
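To make the Catsear setup concrete, the Python sketch below assumes that, as in the WSDM Cup 2017 Triple Scoring task, relevance scores are integers from 0 to 7 and that Accuracy2 counts a prediction as correct when it lies within 2 of the gold score. The hypothetical `combine` helper illustrates a linear combination of three per-source scores; its weights and bias are placeholders for values a regression would learn, not Catsear's actual parameters.

```python
def combine(source_scores, weights, bias):
    """Linear combination of per-source scores, rounded and clipped to 0..7."""
    raw = bias + sum(w * s for w, s in zip(weights, source_scores))
    return min(7, max(0, round(raw)))


def accuracy_within(predicted, gold, delta=2):
    """Fraction of predictions that lie within `delta` of the gold score."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if abs(p - g) <= delta)
    return hits / len(gold)


if __name__ == "__main__":
    # Placeholder weights: a fitted regression would supply real values.
    weights, bias = [1.0, 1.0, 1.0], 0.0
    predicted = [combine(s, weights, bias) for s in ([1, 2, 3], [0, 1, 1])]
    print(predicted, accuracy_within(predicted, [5, 4]))
```

Because Accuracy2 tolerates an error of up to two points, a prediction of 7 against a gold score of 5 still counts as a hit.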

arXiv:1708.07624

SPARQL as a Foreign Language

Authors: Tommaso Soru, Edgard Marx, Diego Moussallem, Gustavo Publio, André Valdestilhas, Diego Esteves, Ciro Baron Neto

Abstract: In recent years, the Linked Data Cloud has grown to more than 100 billion facts pertaining to a multitude of domains. However, accessing this information has been significantly challenging for lay users. Approaches to problems such as Question Answering on Linked Data and Link Discovery have notably played a role in increasing information access. These approaches are often based on handcrafted and/or statistical models derived from data observation. Recently, Deep Learning architectures based on neural networks, called seq2seq, have been shown to achieve state-of-the-art results at translating sequences into sequences. In this direction, we propose Neural SPARQL Machines, end-to-end deep architectures to translate any natural language expression into sentences encoding SPARQL queries. Our preliminary results, restricted to selected DBpedia classes, show that Neural SPARQL Machines are a promising approach for Question Answering on Linked Data, as they can deal with known problems such as vocabulary mismatch and perform graph pattern composition.

Submitted 5 May, 2020; v1 submitted 25 August, 2017; originally announced August 2017.

Comments: SEMANTiCS 2017; 13th International Conference on Semantic Systems, 2017

MSC Class: 68T99 ACM Class: I.2.6; I.2.7

Journal ref: SEMANTiCS CEUR Workshop Proceedings 2044 (2017) Paper 14
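Because seq2seq models operate on token sequences, a SPARQL query has to be turned into a word-like target sentence and decoded back afterwards. The Python sketch below assumes whitespace-separated queries and uses a small hypothetical vocabulary (`brack_open`, `brack_close`, `sep_dot`, and a `var_` prefix for variables); it illustrates the idea of an invertible query encoding and is not claimed to match the Neural SPARQL Machines implementation.

```python
# Hypothetical vocabulary mapping SPARQL symbols to word-like tokens.
SYMBOLS = {"{": "brack_open", "}": "brack_close", ".": "sep_dot"}
INVERSE = {token: symbol for symbol, token in SYMBOLS.items()}


def encode(query: str) -> str:
    """Turn a whitespace-separated SPARQL query into a word-like sequence."""
    tokens = []
    for tok in query.split():
        if tok in SYMBOLS:
            tokens.append(SYMBOLS[tok])
        elif tok.startswith("?"):
            tokens.append("var_" + tok[1:])  # ?x -> var_x
        else:
            tokens.append(tok)  # keywords and prefixed names pass through
    return " ".join(tokens)


def decode(sequence: str) -> str:
    """Invert `encode`, restoring SPARQL symbols and variables."""
    tokens = []
    for tok in sequence.split():
        if tok in INVERSE:
            tokens.append(INVERSE[tok])
        elif tok.startswith("var_"):
            tokens.append("?" + tok[len("var_"):])
        else:
            tokens.append(tok)
    return " ".join(tokens)


if __name__ == "__main__":
    q = "SELECT ?x WHERE { dbr:Eiffel_Tower dbo:location ?x . }"
    print(encode(q))
    print(decode(encode(q)))
```

Keeping the mapping invertible matters: the decoder's output sequence is only useful if it can be turned back into an executable query.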

Showing 1–6 of 6 results for author: Valdestilhas, A