-
TempTabQA: Temporal Question Answering for Semi-Structured Tables
Authors:
Vivek Gupta,
Pranshu Kandoi,
Mahek Bhavesh Vora,
Shuo Zhang,
Yujie He,
Ridho Reinanda,
Vivek Srikumar
Abstract:
Semi-structured data, such as Infobox tables, often include temporal information about entities, either implicitly or explicitly. Can current NLP systems reason about such information in semi-structured tables? To tackle this question, we introduce the task of temporal question answering on semi-structured tables. We present a dataset, TempTabQA, which comprises 11,454 question-answer pairs extracted from 1,208 Wikipedia Infobox tables spanning more than 90 distinct domains. Using this dataset, we evaluate several state-of-the-art models for temporal reasoning. We observe that even the top-performing LLMs lag behind human performance by more than 13.5 F1 points. Given these results, our dataset has the potential to serve as a challenging benchmark to improve the temporal reasoning capabilities of NLP models.
Submitted 14 November, 2023;
originally announced November 2023.
-
Novel Entity Discovery from Web Tables
Authors:
Shuo Zhang,
Edgar Meij,
Krisztian Balog,
Ridho Reinanda
Abstract:
When working with any sort of knowledge base (KB), one has to make sure it is as complete and as up-to-date as possible. Both tasks are non-trivial as they require recall-oriented efforts to determine which entities and relationships are missing from the KB. As such they require a significant amount of labor. Tables on the Web, on the other hand, are abundant and have the distinct potential to assist with these tasks. In particular, we can leverage the content in such tables to discover new entities, properties, and relationships. Because web tables typically only contain raw textual content, we first need to determine which cells refer to which known entities, a task we dub table-to-KB matching. This first task aims to infer table semantics by linking table cells and heading columns to elements of a KB. The second task builds upon these linked entities and properties to not only identify novel ones in the same table but also to bootstrap their type and additional relationships. We refer to this process as novel entity discovery and, to the best of our knowledge, it is the first endeavor on mining the unlinked cells in web tables. Our method identifies not only out-of-KB ("novel") information but also novel aliases for in-KB ("known") entities. When evaluated using three purpose-built test collections, we find that our proposed approaches obtain a marked improvement in terms of precision over our baselines whilst keeping recall stable.
Submitted 1 February, 2020;
originally announced February 2020.
-
Weakly-supervised Contextualization of Knowledge Graph Facts
Authors:
Nikos Voskarides,
Edgar Meij,
Ridho Reinanda,
Abhinav Khaitan,
Miles Osborne,
Giorgio Stefanoni,
Prabhanjan Kambadur,
Maarten de Rijke
Abstract:
Knowledge graphs (KGs) model facts about the world; they consist of nodes (entities such as companies and people) that are connected by edges (relations such as founderOf). Facts encoded in KGs are frequently used by search applications to augment result pages. When presenting a KG fact to the user, providing other facts that are pertinent to that main fact can enrich the user experience and support exploratory information needs. KG fact contextualization is the task of augmenting a given KG fact with additional and useful KG facts. The task is challenging because of the large size of KGs: discovering other relevant facts even in a small neighborhood of the given fact results in an enormous number of candidates. We introduce a neural fact contextualization method (NFCM) to address the KG fact contextualization task. NFCM first generates a set of candidate facts in the neighborhood of a given fact and then ranks the candidate facts using a supervised learning-to-rank model. The ranking model combines features that we automatically learn from data and that represent the query-candidate facts with a set of hand-crafted features we devised or adjusted for this task. In order to obtain the annotations required to train the learning-to-rank model at scale, we generate training data automatically using distant supervision on a large entity-tagged text corpus. We show that ranking functions learned on this data are effective at contextualizing KG facts. Evaluation using human assessors shows that NFCM significantly outperforms several competitive baselines.
Submitted 8 July, 2018; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Document Filtering for Long-tail Entities
Authors:
Ridho Reinanda,
Edgar Meij,
Maarten de Rijke
Abstract:
Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering for popular entities are entity-dependent: they rely on, and are trained on, the differentiating features of each specific entity. Moreover, these approaches tend to use so-called extrinsic information, such as Wikipedia page views and related entities, which is typically available only for popular head entities. Entity-dependent approaches based on such signals are therefore ill-suited as filtering methods for long-tail entities. In this paper we propose a document filtering method for long-tail entities that is entity-independent and thus also generalizes to unseen or rarely seen entities. It is based on intrinsic features, i.e., features that are derived from the documents in which the entities are mentioned. We propose a set of features that capture informativeness, entity-saliency, and timeliness. In particular, we introduce features based on entity aspect similarities, relation patterns, and temporal expressions and combine these with standard features for document filtering. Experiments following the TREC KBA 2014 setup on a publicly available dataset show that our model is able to improve the filtering performance for long-tail entities over several baselines. Results of applying the model to unseen entities are promising, indicating that the model is able to learn the general characteristics of a vital document. The overall performance across all entities (i.e., not just long-tail entities) improves upon the state-of-the-art without depending on any entity-specific training data.
Submitted 14 September, 2016;
originally announced September 2016.
-
Dynamics of Media Attention
Authors:
V. A. Traag,
R. Reinanda,
J. Hicks,
G. van Klinken
Abstract:
Studies of human attention dynamics analyse how attention is focused on specific topics, issues or people. In online social media, there are clear signs of exogenous shocks, bursty dynamics, and an exponential or power-law lifetime distribution. We here analyse the attention dynamics of traditional media, focussing on co-occurrence of people in newspaper articles. The results are quite different from online social networks and attention. Different regimes seem to be operating at two different time scales. At short time scales we see evidence of bursty dynamics and fast decaying edge lifetimes and attention. This behaviour disappears for longer time scales, and in that regime we find Poissonian dynamics and slower decaying lifetimes. We propose that a cascading Poisson process may take place, with issues arising at a constant rate over a long time scale, and faster dynamics at a shorter time scale.
Submitted 10 September, 2014;
originally announced September 2014.
-
Structure of a media co-occurrence network
Authors:
V. A. Traag,
R. Reinanda,
G. van Klinken
Abstract:
Social networks have been of much interest in recent years. We here focus on a network structure derived from co-occurrences of people in traditional newspaper media. We find three clear deviations from what can be expected in a random graph. First, the average degree in the empirical network is much lower than expected, and the average weight of a link much higher than expected. Secondly, high degree nodes attract a disproportionately large share of the weight. Thirdly, a relatively large share of the weight seems to concentrate between high degree nodes. We believe this can be explained by the fact that most people tend to co-occur repeatedly with the same people. We create a model that replicates these observations qualitatively based on two self-reinforcing processes: (1) more frequently occurring persons are more likely to occur again; and (2) if two people co-occur frequently, they are more likely to co-occur again. This suggests that the media tend to focus on people who are already in the news, and that they reinforce existing co-occurrences.
Submitted 13 July, 2016; v1 submitted 5 September, 2014;
originally announced September 2014.