Showing 1–45 of 45 results for author: Razniewski, S

  1. arXiv:2405.02732  [pdf, other]

    cs.CL cs.IR

    Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents

    Authors: Sneha Singhania, Simon Razniewski, Gerhard Weikum

    Abstract: Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X met…

    Submitted 4 May, 2024; originally announced May 2024.

  2. arXiv:2402.10689  [pdf, other]

    cs.CL

    Multi-Cultural Commonsense Knowledge Distillation

    Authors: Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum

    Abstract: Despite recent progress, large language models (LLMs) still face the challenge of appropriately reacting to the intricacies of social and cultural conventions. This paper presents MANGO, a methodology for distilling high-accuracy, high-recall assertions of cultural knowledge. We judiciously and iteratively prompt LLMs for this purpose from two entry points, concepts and cultures. Outputs are conso…

    Submitted 17 April, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 20 pages, 5 figures, 13 tables

  3. arXiv:2312.06338  [pdf, other]

    cs.CL cs.AI

    BoschAI @ Causal News Corpus 2023: Robust Cause-Effect Span Extraction using Multi-Layer Sequence Tagging and Data Augmentation

    Authors: Timo Pierre Schrader, Simon Razniewski, Lukas Lange, Annemarie Friedrich

    Abstract: Understanding causality is a core aspect of intelligence. The Event Causality Identification with Causal News Corpus Shared Task addresses two aspects of this challenge: Subtask 1 aims at detecting causal relationships in texts, and Subtask 2 requires identifying signal words and the spans that refer to the cause or effect, respectively. Our system, which is based on pre-trained transformers, stac…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 6 pages, 6 tables, 1 figure, published in "Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text"

  4. arXiv:2311.01907  [pdf, other]

    cs.CL

    BoschAI @ PLABA 2023: Leveraging Edit Operations in End-to-End Neural Sentence Simplification

    Authors: Valentin Knappich, Simon Razniewski, Annemarie Friedrich

    Abstract: Automatic simplification can help laypeople to comprehend complex scientific text. Language models are frequently applied to this task by translating from complex to simple language. In this paper, we describe our system based on Llama 2, which ranked first in the PLABA shared task addressing the simplification of biomedical text. We find that the large portion of shared tokens between input and o…

    Submitted 6 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

  5. arXiv:2310.14771  [pdf, other]

    cs.CL cs.AI

    Evaluating the Knowledge Base Completion Potential of GPT

    Authors: Blerta Veseli, Simon Razniewski, Jan-Christoph Kalo, Gerhard Weikum

    Abstract: Structured knowledge bases (KBs) are an asset for search engines and other applications, but are inevitably incomplete. Language models (LMs) have been proposed for unsupervised knowledge base completion (KBC), yet, their ability to do this at scale and with high accuracy remains an open question. Prior experimental studies mostly fall short because they only evaluate on popular subjects, or sampl…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 12 pages 4 tables

    Journal ref: Findings of EMNLP 2023

  6. arXiv:2308.06374  [pdf, other]

    cs.AI cs.CL

    Large Language Models and Knowledge Graphs: Opportunities and Challenges

    Authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, Damien Graux

    Abstract: Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 30 pages

  7. arXiv:2307.03122  [pdf, other]

    cs.CL

    Extracting Multi-valued Relations from Language Models

    Authors: Sneha Singhania, Simon Razniewski, Gerhard Weikum

    Abstract: The widespread usage of latent language representations via pre-trained language models (LMs) suggests that they are a promising source of structured knowledge. However, existing methods focus only on a single object per subject-relation pair, even though often multiple objects are correct. To overcome this limitation, we analyze these representations for their potential to yield materialized mult…

    Submitted 7 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted to Repl4NLP Workshop at ACL 2023

  8. arXiv:2306.17472  [pdf, other]

    cs.CL

    Knowledge Base Completion for Long-Tail Entities

    Authors: Lihu Chen, Simon Razniewski, Gerhard Weikum

    Abstract: Despite their impressive scale, knowledge bases (KBs), such as Wikidata, still contain significant gaps. Language models (LMs) have been proposed as a source for filling these gaps. However, prior works have focused on prominent entities with rich coverage by LMs, neglecting the crucial case of long-tail entities. In this paper, we present a novel method for LM-based-KB completion that is specific…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: In ACL23 (MATCHING workshop)

  9. arXiv:2306.12766  [pdf, other]

    cs.CL

    Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation

    Authors: Julien Romero, Simon Razniewski

    Abstract: Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce structure from a text. However, although it allows high recall, the extracted knowledge tends to inherit noise from the sources and the OpenIE algorithm. Beside…

    Submitted 22 June, 2023; originally announced June 2023.

  10. arXiv:2305.16755  [pdf, other]

    cs.CL cs.AI

    Can large language models generate salient negative statements?

    Authors: Hiba Arnaout, Simon Razniewski

    Abstract: We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities; an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as…

    Submitted 21 September, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: For data, see https://www.mpi-inf.mpg.de/fileadmin/inf/d5/research/negation_in_KBs/data.csv

  11. arXiv:2305.05403  [pdf, other]

    cs.AI cs.CL cs.DB cs.DL

    Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey

    Authors: Simon Razniewski, Hiba Arnaout, Shrestha Ghosh, Fabian Suchanek

    Abstract: General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many of them are constructed pragmatically from Web sources, and are thus far from complete. This poses challenges for the consumption as well as the curation of their content. While several surveys target the problem of completing incomplete KBs, the first problem is arguably to know whether and where the KB is incom…

    Submitted 6 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 42 pages, 8 figures, 5 tables

    Journal ref: Under review, 2022

  12. arXiv:2303.11082  [pdf, other]

    cs.CL cs.AI

    Evaluating Language Models for Knowledge Base Completion

    Authors: Blerta Veseli, Sneha Singhania, Simon Razniewski, Gerhard Weikum

    Abstract: Structured knowledge bases (KBs) are a foundation of many intelligent applications, yet are notoriously incomplete. Language models (LMs) have recently been proposed for unsupervised knowledge base completion (KBC), yet, despite encouraging initial results, questions regarding their suitability remain open. Existing evaluations often fall short because they only evaluate on popular subjects, or sa…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Data and code available at https://github.com/bveseli/LMsForKBC

    Journal ref: ESWC 2023

  13. arXiv:2303.09189  [pdf, other]

    cs.SI cs.CY

    Wiki-based Communities of Interest: Demographics and Outliers

    Authors: Hiba Arnaout, Simon Razniewski, Jeff Z. Pan

    Abstract: In this paper, we release data about demographic information and outliers of communities of interest. Identified from Wiki-based sources, mainly Wikidata, the data covers 7.5k communities, such as members of the White House Coronavirus Task Force, and 345k subjects, e.g., Deborah Birx. We describe the statistical inference methodology adopted to mine such data. We release subject-centric and group…

    Submitted 17 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted to ICWSM 2023. For demo, see https://wikiknowledge.onrender.com/demographics/ and for dataset see https://doi.org/10.5281/zenodo.7410436

  14. arXiv:2303.04532  [pdf, ps, other]

    cs.IR cs.AI

    Class Cardinality Comparison as a Fermi Problem

    Authors: Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

    Abstract: Questions on class cardinality comparisons are quite tricky to answer and come with their own challenges. They require some kind of reasoning since web documents and knowledge bases, indispensable sources of information, rarely store direct answers to questions, such as, "Are there more astronauts or Physics Nobel Laureates?" We tackle questions on class cardinality comparison by tapping into thre…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted to the Web Conference 2023

  15. arXiv:2211.00989  [pdf, other]

    cs.AI cs.DB

    How Stable is Knowledge Base Knowledge?

    Authors: Suhas Shrinivasan, Simon Razniewski

    Abstract: Knowledge Bases (KBs) provide structured representation of the real-world in the form of extensive collections of facts about real-world entities, their properties and relationships. They are ubiquitous in large-scale intelligent systems that exploit structured information such as in tasks like structured search, question answering and reasoning, and hence their data quality becomes paramount. The…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Incomplete draft. 12 pages

  16. Extracting Cultural Commonsense Knowledge at Scale

    Authors: Tuan-Phong Nguyen, Simon Razniewski, Aparna Varde, Gerhard Weikum

    Abstract: Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents CANDLE, an end-to-end methodology for extracting hi…

    Submitted 10 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 11 pages, 6 figures, 10 tables

    Journal ref: ACM Web Conference 2023

  17. arXiv:2210.04530  [pdf, other]

    cs.CL cs.AI

    Do Children Texts Hold The Key To Commonsense Knowledge?

    Authors: Julien Romero, Simon Razniewski

    Abstract: Compiling comprehensive repositories of commonsense knowledge is a long-standing problem in AI. Many concerns revolve around the issue of reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth. This paper explores whether children's texts hold the key to commonsense knowledge compilation, based on the hypothesis that such content makes fewer assumptions on…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 6 pages, 10 tables

    Journal ref: EMNLP 2022

  18. Answering Count Questions with Structured Answers from Text

    Authors: Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

    Abstract: In this work we address the challenging case of answering count queries in web search, such as "number of songs by John Lennon". Prior methods merely answer these with a single, and sometimes puzzling number or return a ranked list of text snippets with different numbers. This paper proposes a methodology for answering count queries with inference, contextualization and explanatory evidence. Unl…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2204.05039

  19. UnCommonSense: Informative Negative Knowledge about Everyday Concepts

    Authors: Hiba Arnaout, Simon Razniewski, Gerhard Weikum, Jeff Z. Pan

    Abstract: Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, si…

    Submitted 5 September, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

  20. Answering Count Queries with Explanatory Evidence

    Authors: Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

    Abstract: A challenging case in web search and question answering are count queries, such as "number of songs by John Lennon". Prior methods merely answer these with a single, and sometimes puzzling number or return a ranked list of text snippets with different numbers. This paper proposes a methodology for answering count queries with inference, contextualization and explanatory evidence. Unlike p…

    Submitted 30 August, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Version published at SIGIR 2022

  21. Materialized Knowledge Bases from Commonsense Transformers

    Authors: Tuan-Phong Nguyen, Simon Razniewski

    Abstract: Starting from the COMET methodology by Bosselut et al. (2019), generating commonsense knowledge directly from pre-trained language models has recently received significant attention. Surprisingly, up to now no materialized resource of commonsense knowledge generated this way is publicly available. This paper fills this gap, and uses the materialized resources to perform a detailed analysis of the…

    Submitted 14 April, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

    Comments: 7 pages, accepted to CSRR workshop @ ACL 2022

    Journal ref: Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)

  22. Refined Commonsense Knowledge from Large-Scale Web Contents

    Authors: Tuan-Phong Nguyen, Simon Razniewski, Julien Romero, Gerhard Weikum

    Abstract: Commonsense knowledge (CSK) about concepts and their properties is helpful for AI applications. Prior works, such as ConceptNet, have compiled large CSK collections. However, they are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and strings for P and O. This paper presents a method called ASCENT++ to automatically build a large-scale knowl…

    Submitted 23 June, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: This is a substantial extension of the previous WWW paper: arXiv:2011.00905

    Journal ref: IEEE Transactions on Knowledge and Data Engineering, 2022

  23. arXiv:2111.13611  [pdf, other]

    cs.CL cs.AI

    Predicting Document Coverage for Relation Extraction

    Authors: Sneha Singhania, Simon Razniewski, Gerhard Weikum

    Abstract: This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze t…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: To appear in TACL. The arXiv version is a pre-MIT Press publication version

  24. arXiv:2110.04888  [pdf, other]

    cs.CL cs.AI cs.DB

    Language Models As or For Knowledge Bases

    Authors: Simon Razniewski, Andrew Yates, Nora Kassner, Gerhard Weikum

    Abstract: Pre-trained language models (LMs) have recently gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs). In this position paper, we examine this hypothesis, identify strengths and limitations of both LMs and KBs, and discuss the complementary nature of the two paradigms. In particular, we offer qualitative arguments that latent LMs are not suitable a…

    Submitted 10 October, 2021; originally announced October 2021.

    Journal ref: DL4KG 2021

  25. Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering

    Authors: Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum

    Abstract: ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents (Nguyen et al., WWW 2021). It advances traditional triple-based commonsense knowledge representation by capturing semantic facets like locations and purposes, and composite concepts, i.e., subgroups and related aspects of subjects. In this demo, we present a web portal that allows user…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Demo website: https://ascent.mpi-inf.mpg.de; introductory video: https://youtu.be/qMkJXqu_Yd4

    Journal ref: ACL 2021 system demonstration

  26. arXiv:2105.01925  [pdf, ps, other]

    cs.AI cs.CL cs.DB

    Commonsense Knowledge Base Construction in the Age of Big Data

    Authors: Simon Razniewski

    Abstract: Compiling commonsense knowledge is traditionally an AI topic approached by manual labor. Recent advances in web data processing have enabled automated approaches. In this demonstration we will showcase three systems for automated commonsense knowledge base construction, highlighting each time one aspect of specific interest to the data management community. (i) We use Quasimodo to illustrate knowl…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: Manuscript for the cancelled BTW 2021 demo track

  27. Advanced Semantics for Commonsense Knowledge Extraction

    Authors: Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum

    Abstract: Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precisio…

    Submitted 25 October, 2022; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: 12 pages, 3 figures, 11 tables

    Journal ref: Proceedings of the Web Conference 2021 (WWW '21)

  28. arXiv:2009.11564  [pdf, other]

    cs.AI cs.DB

    Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

    Authors: Gerhard Weikum, Luna Dong, Simon Razniewski, Fabian Suchanek

    Abstract: Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpre…

    Submitted 22 March, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

    Comments: Submitted to Foundations and Trends in Databases

    Journal ref: Foundations and Trends in Databases, 2021

  29. arXiv:2009.09049  [pdf, other]

    cs.HC cs.CY cs.DL

    Examining the Impact of Algorithm Awareness on Wikidata's Recommender System Recoin

    Authors: Jesse Josua Benjamin, Claudia Müller-Birn, Simon Razniewski

    Abstract: The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populated the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on a scale of global politics and during mundane shoppi…

    Submitted 18 September, 2020; originally announced September 2020.

    Comments: 10 pages, 7 figures

  30. arXiv:2005.05886  [pdf, ps, other]

    cs.DB cs.AI

    Counting Query Answers over a DL-Lite Knowledge Base (extended version)

    Authors: Diego Calvanese, Julien Corman, Davide Lanti, Simon Razniewski

    Abstract: Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA…

    Submitted 17 July, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: Extended version of an article published at IJCAI 2020

  31. arXiv:2005.03529  [pdf, other]

    cs.IR cs.AI cs.DB

    CounQER: A System for Discovering and Linking Count Information in Knowledge Bases

    Authors: Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

    Abstract: Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata, DBpedia and Freebase are often limited to subproperty, domain and range constraints. In this demo we showcase CounQER, a system that illustrates the alignment of counting predicates, like staffSize, and enumerating predicates, like workInstitution^{-1}. In the demonstration session, attendees can inspect these alignment…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: Accepted at ESWC 2020

  32. arXiv:2003.03155  [pdf, other]

    cs.DB cs.IR

    Uncovering Hidden Semantics of Set Information in Knowledge Bases

    Authors: Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

    Abstract: Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, that store aggregated integers, and (ii) via enumerating predicat…

    Submitted 26 March, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: This work is under review in the Journal of Web Semantics, Special Issue on Language Technology and Knowledge Graphs. This is a revision draft

  33. arXiv:2001.04425  [pdf, other]

    cs.IR cs.AI cs.CL cs.DB

    Negative Statements Considered Useful

    Authors: Hiba Arnaout, Simon Razniewski, Gerhard Weikum, Jeff Z. Pan

    Abstract: Knowledge bases (KBs) about notable entities and their properties are an important asset in applications such as search, question answering and dialogue. All popular KBs capture virtually only positive statements, and abstain from taking any stance on statements not stored in the KB. This paper makes the case for explicitly stating salient statements that do not hold. Negative statements are usefu…

    Submitted 25 September, 2021; v1 submitted 13 January, 2020; originally announced January 2020.

    Journal ref: Journal of Web Semantics (JWS), Volume 71, 2021

  34. arXiv:2001.04170  [pdf, other]

    cs.CL cs.AI cs.IR

    Joint Reasoning for Multi-Faceted Commonsense Knowledge

    Authors: Yohan Chalier, Simon Razniewski, Gerhard Weikum

    Abstract: Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative meas…

    Submitted 4 May, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: 11 pages

    Journal ref: AKBC 2020

  35. arXiv:1909.00692  [pdf, other]

    cs.CL

    Story-oriented Image Selection and Placement

    Authors: Sreyasi Nag Chowdhury, Simon Razniewski, Gerhard Weikum

    Abstract: Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection - either author's personal photo album, or web repositories - and the…

    Submitted 2 September, 2019; originally announced September 2019.

  36. arXiv:1905.10989  [pdf, other]

    cs.CL cs.AI cs.DB

    Commonsense Properties from Query Logs and Question Answering Forums

    Authors: Julien Romero, Simon Razniewski, Koninika Pal, Jeff Z. Pan, Archit Sakhadeo, Gerhard Weikum

    Abstract: Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping int…

    Submitted 10 February, 2021; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Updated appendix reporting on Quasimodo v4.3 (2/2021)

    Journal ref: CIKM 2019

  37. arXiv:1901.10263  [pdf, other]

    cs.CL cs.AI cs.IR

    TiFi: Taxonomy Induction for Fictional Domains [Extended version]

    Authors: Cuong Xuan Chu, Simon Razniewski, Gerhard Weikum

    Abstract: Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipe…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: Extended version of The Web Conference 2019 paper

  38. arXiv:1807.03656  [pdf, other]

    cs.CL

    Enriching Knowledge Bases with Counting Quantifiers

    Authors: Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum

    Abstract: Information extraction traditionally focuses on extracting relations between identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts often also contain counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can he…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: 16 pages, The 17th International Semantic Web Conference (ISWC 2018)

  39. arXiv:1709.06907  [pdf, other]

    cs.IR cs.AI cs.CL cs.DB

    Doctoral Advisor or Medical Condition: Towards Entity-specific Rankings of Knowledge Base Properties [Extended Version]

    Authors: Simon Razniewski, Vevake Balaraman, Werner Nutt

    Abstract: In knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language gener…

    Submitted 20 September, 2017; originally announced September 2017.

    Comments: Extended version of an ADMA 2017 conference paper

  40. arXiv:1704.04455  [pdf, ps, other]

    cs.CL

    Cardinal Virtues: Extracting Relation Cardinalities from Text

    Authors: Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum

    Abstract: Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting car…

    Submitted 26 May, 2017; v1 submitted 14 April, 2017; originally announced April 2017.

    Comments: 5 pages, ACL 2017 (short paper)

  41. Predicting Completeness in Knowledge Bases

    Authors: Luis Galárraga, Simon Razniewski, Antoine Amarilli, Fabian M. Suchanek

    Abstract: Knowledge bases such as Wikidata, DBpedia, or YAGO contain millions of entities and facts. In some knowledge bases, the correctness of these facts has been evaluated. However, much less is known about their completeness, i.e., the proportion of real facts that the knowledge bases cover. In this work, we investigate different signals to identify the areas where a knowledge base is complete. We show…

    Submitted 17 December, 2016; originally announced December 2016.

    Comments: 21 pages, 19 references, 1 figure, 5 tables. Complete version of the article accepted at WSDM'17

  42. arXiv:1604.08377  [pdf, other]

    cs.DB

    Enabling Fine-grained RDF Data Completeness Assessment

    Authors: Fariz Darari, Simon Razniewski, Radityo Eko Prasojo, Werner Nutt

    Abstract: Nowadays, more and more RDF data is becoming available on the Semantic Web. While the Semantic Web is generally incomplete by nature, on certain topics, it already contains complete information and thus, queries may return all answers that exist in reality. In this paper we develop a technique to check query completeness based on RDF data annotated with completeness information, taking into accoun…

    Submitted 28 April, 2016; originally announced April 2016.

    Comments: This is a preprint version of a paper published in the Proceedings of the 16th International Conference on Web Engineering (ICWE 2016)

  43. arXiv:1411.2855  [pdf, other]

    cs.DB

    Query-driven Data Completeness Management (PhD Thesis)

    Authors: Simon Razniewski

    Abstract: Knowledge about data completeness is essential in data-supported decision making. In this thesis we present a framework for metadata-based assessment of database completeness. We discuss how to express information about data completeness and how to use such information to draw conclusions about the completeness of query answers. In particular, we introduce formalisms for stating completeness for…

    Submitted 1 April, 2015; v1 submitted 11 November, 2014; originally announced November 2014.

    Comments: Change to previous version: Fixed the date on the title page

  44. arXiv:1408.6395  [pdf, ps, other]

    cs.DB

    Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements [Extended Version]

    Authors: Fariz Darari, Simon Razniewski, Werner Nutt

    Abstract: RDF data is often treated as incomplete, following the Open-World Assumption. On the other hand, SPARQL, the standard query language over RDF, usually follows the Closed-World Assumption, assuming RDF data to be complete. This gives rise to a semantic gap between RDF and SPARQL. In this paper, we address how to close the semantic gap between RDF and SPARQL in terms of certain answers and possible…

    Submitted 3 September, 2014; v1 submitted 27 August, 2014; originally announced August 2014.

    Comments: This paper is an extended version with proofs of a poster paper at ISWC 2014

  45. arXiv:1306.1689  [pdf, other]

    cs.DB

    Verification of Query Completeness over Processes [Extended Version]

    Authors: Simon Razniewski, Marco Montali, Werner Nutt

    Abstract: Data completeness is an essential aspect of data quality, and has in turn a huge impact on the effective management of companies. For example, statistics are computed and audits are conducted in companies by implicitly placing the strong assumption that the analysed data are complete. In this work, we are interested in studying the problem of completeness of data produced by business processes, to…

    Submitted 7 June, 2013; originally announced June 2013.

    Comments: Extended version of a paper that was submitted to BPM 2013