Search | arXiv e-print repository

Expressive Reasoning Graph Store: A Unified Framework for Managing RDF and Property Graph Databases

Authors: Sumit Neelam, Udit Sharma, Sumit Bhatia, Hima Karanam, Ankita Likhyani, Ibrahim Abdelaziz, Achille Fokoue, L. V. Subramaniam

Abstract: Resource Description Framework (RDF) and Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present Expressive Reasoning Graph Store (ERGS) -- a graph store built on top of JanusGraph (a Property Graph store) that also allows storing and querying of RDF datasets. First, we describe how RDF data can be translated into a Property Gra… ▽ More Resource Description Framework (RDF) and Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present Expressive Reasoning Graph Store (ERGS) -- a graph store built on top of JanusGraph (a Property Graph store) that also allows storing and querying of RDF datasets. First, we describe how RDF data can be translated into a Property Graph representation and then describe a query translation module that converts SPARQL queries into a series of Gremlin traversals. The converters and translators thus developed can allow any Apache Tinkerpop compliant graph database to store and query RDF datasets. We demonstrate the effectiveness of our proposed approach using JanusGraph as the base Property Graph store and compare its performance with standard RDF systems. △ Less

Submitted 13 September, 2022; originally announced September 2022.

Comments: 16 pages, 3 figures, 9 tables

arXiv:2203.11054 [pdf, other]

Targeted Extraction of Temporal Facts from Textual Resources for Improved Temporal Question Answering over Knowledge Bases

Authors: Nithish Kannen, Udit Sharma, Sumit Neelam, Dinesh Khandelwal, Shajith Ikbal, Hima Karanam, L Venkata Subramaniam

Abstract: Knowledge Base Question Answering (KBQA) systems have the goal of answering complex natural language questions by reasoning over relevant facts retrieved from Knowledge Bases (KB). One of the major challenges faced by these systems is their inability to retrieve all relevant facts due to factors such as incomplete KB and entity/relation linking errors. In this paper, we address this particular cha… ▽ More Knowledge Base Question Answering (KBQA) systems have the goal of answering complex natural language questions by reasoning over relevant facts retrieved from Knowledge Bases (KB). One of the major challenges faced by these systems is their inability to retrieve all relevant facts due to factors such as incomplete KB and entity/relation linking errors. In this paper, we address this particular challenge for systems handling a specific category of questions called temporal questions, where answer derivation involve reasoning over facts asserting point/intervals of time for various events. We propose a novel approach where a targeted temporal fact extraction technique is used to assist KBQA whenever it fails to retrieve temporal facts from the KB. We use $λ$-expressions of the questions to logically represent the component facts and the reasoning steps needed to derive the answer. This allows us to spot those facts that failed to get retrieved from the KB and generate textual queries to extract them from the textual resources in an open-domain question answering fashion. We evaluated our approach on a benchmark temporal question answering dataset considering Wikidata and Wikipedia respectively as the KB and textual resource. Experimental results show a significant $\sim$30\% relative improvement in answer accuracy, demonstrating the effectiveness of our approach. △ Less

Submitted 21 March, 2022; originally announced March 2022.

ACM Class: I.2.7; I.2.4

arXiv:2201.05793 [pdf, other]

A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases

Authors: Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-Suk Lee, Santosh Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G P Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander Gray, Guilherme Lima, Ryan Riegel, Francois Luus, L Venkata Subramaniam

Abstract: Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a benchmark dataset for temporal reasoning, TempQA-… ▽ More Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a benchmark dataset for temporal reasoning, TempQA-WD, to encourage research in extending the present approaches to target a more challenging set of complex reasoning tasks. Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate sparql queries to facilitate the evaluation of semantic parsing based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata. The TempQA-WD dataset is available at https://github.com/IBM/tempqa-wd. △ Less

Submitted 15 January, 2022; originally announced January 2022.

Comments: 7 pages, 2 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2109.13430

arXiv:2109.13430 [pdf, other]

SYGMA: System for Generalizable Modular Question Answering OverKnowledge Bases

Authors: Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-Suk Lee, Santosh Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G P Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander Gray, Guilherme LimaRyan Riegel, Francois Luus, L Venkata Subramaniam

Abstract: Knowledge Base Question Answering (KBQA) tasks that in-volve complex reasoning are emerging as an important re-search direction. However, most KBQA systems struggle withgeneralizability, particularly on two dimensions: (a) acrossmultiple reasoning types where both datasets and systems haveprimarily focused on multi-hop reasoning, and (b) across mul-tiple knowledge bases, where KBQA approaches are… ▽ More Knowledge Base Question Answering (KBQA) tasks that in-volve complex reasoning are emerging as an important re-search direction. However, most KBQA systems struggle withgeneralizability, particularly on two dimensions: (a) acrossmultiple reasoning types where both datasets and systems haveprimarily focused on multi-hop reasoning, and (b) across mul-tiple knowledge bases, where KBQA approaches are specif-ically tuned to a single knowledge base. In this paper, wepresent SYGMA, a modular approach facilitating general-izability across multiple knowledge bases and multiple rea-soning types. Specifically, SYGMA contains three high levelmodules: 1) KB-agnostic question understanding module thatis common across KBs 2) Rules to support additional reason-ing types and 3) KB-specific question map** and answeringmodule to address the KB-specific aspects of the answer ex-traction. We demonstrate effectiveness of our system by evalu-ating on datasets belonging to two distinct knowledge bases,DBpedia and Wikidata. In addition, to demonstrate extensi-bility to additional reasoning types we evaluate on multi-hopreasoning datasets and a new Temporal KBQA benchmarkdataset on Wikidata, namedTempQA-WD1, introduced in thispaper. We show that our generalizable approach has bettercompetetive performance on multiple datasets on DBpediaand Wikidata that requires both multi-hop and temporal rea-soning △ Less

Submitted 27 September, 2021; originally announced September 2021.

arXiv:1712.01562 [pdf, other]

EmTaggeR: A Word Embedding Based Novel Method for Hashtag Recommendation on Twitter

Authors: Kuntal Dey, Ritvik Shrivastava, Saroj Kaushik, L. Venkata Subramaniam

Abstract: The hashtag recommendation problem addresses recommending (suggesting) one or more hashtags to explicitly tag a post made on a given social network platform, based upon the content and context of the post. In this work, we propose a novel methodology for hashtag recommendation for microblog posts, specifically Twitter. The methodology, EmTaggeR, is built upon a training-testing framework that buil… ▽ More The hashtag recommendation problem addresses recommending (suggesting) one or more hashtags to explicitly tag a post made on a given social network platform, based upon the content and context of the post. In this work, we propose a novel methodology for hashtag recommendation for microblog posts, specifically Twitter. The methodology, EmTaggeR, is built upon a training-testing framework that builds on the top of the concept of word embedding. The training phase comprises of learning word vectors associated with each hashtag, and deriving a word embedding for each hashtag. We provide two training procedures, one in which each hashtag is trained with a separate word embedding model applicable in the context of that hashtag, and another in which each hashtag obtains its embedding from a global context. The testing phase constitutes computing the average word embedding of the test post, and finding the similarity of this embedding with the known embeddings of the hashtags. The tweets that contain the most-similar hashtag are extracted, and all the hashtags that appear in these tweets are ranked in terms of embedding similarity scores. The top-K hashtags that appear in this ranked list, are recommended for the given test post. Our system produces F1 score of 50.83%, improving over the LDA baseline by around 6.53 times, outperforming the best-performing system known in the literature that provides a lift of 6.42 times. EmTaggeR is a fast, scalable and lightweight system, which makes it practical to deploy in real-life applications. △ Less

Submitted 5 December, 2017; originally announced December 2017.

Comments: Accepted at the IEEE International Conference on Data Mining (ICDM) 2017 ACUMEN Workshop

arXiv:1706.00921 [pdf, ps, other]

Literature Survey on Interplay of Topics, Information Diffusion and Connections on Social Networks

Authors: Kuntal Dey, Saroj Kaushik, L. Venkata Subramaniam

Abstract: Researchers have attempted to model information diffusion and topic trends and lifecycle on online social networks. They have investigated the role of content, social connections and communities, familiarity and behavioral similarity in this context. The current article presents a survey of representative models that perform topic analysis, capture information diffusion, and explore the properties… ▽ More Researchers have attempted to model information diffusion and topic trends and lifecycle on online social networks. They have investigated the role of content, social connections and communities, familiarity and behavioral similarity in this context. The current article presents a survey of representative models that perform topic analysis, capture information diffusion, and explore the properties of social connections in the context of online social networks. The article concludes with a set of outlines of open problems and possible directions of future research interest. This article is intended for researchers to identify the current literature, and explore possibilities to improve the art. △ Less

Submitted 3 June, 2017; originally announced June 2017.

arXiv:1405.1392 [pdf, other]

From Tweets to Events: Exploring a Scalable Solution for Twitter Streams

Authors: Shamanth Kumar, Huan Liu, Sameep Mehta, L. Venkata Subramaniam

Abstract: The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However,… ▽ More The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However, current solutions have focused on static Twitter data. The necessity to detect events in a streaming environment during fast paced events such as a crisis presents new opportunities and challenges. In this paper, we investigate event detection in the context of real-time Twitter streams as observed in real-world crises. We highlight the key challenges in this problem: the informal nature of text, and the high volume and high velocity characteristics of Twitter streams. We present a novel approach to address these challenges using single-pass clustering and the compression distance to efficiently detect events in Twitter streams. Through experiments on large Twitter datasets, we demonstrate that the proposed framework is able to detect events in near real-time and can scale to large and noisy Twitter streams. △ Less

Submitted 6 May, 2014; originally announced May 2014.

Showing 1–7 of 7 results for author: Subramaniam, L V