Search | arXiv e-print repository

arXiv:2407.02045 [pdf, ps, other]

Online Unbounded Knapsack

Authors: Hans-Joachim Böckenhauer, Matthias Gehnen, Juraj Hromkovič, Ralf Klasing, Dennis Komm, Henri Lotze, Daniel Mock, Peter Rossmanith, Moritz Stocker

Abstract: We analyze the competitive ratio and the advice complexity of the online unbounded knapsack problem. An instance is given as a sequence of n items with a size and a value each, and an algorithm has to decide how often to pack each item into a knapsack of bounded capacity. The items are given online and the total size of the packed items must not exceed the knapsack's capacity, while the objective is to maximize the total value of the packed items. While each item can only be packed once in the classical 0-1 knapsack problem, the unbounded version allows for items to be packed multiple times. We show that the simple unbounded knapsack problem, where the size of each item is equal to its value, allows for a competitive ratio of 2. We also analyze randomized algorithms and show that, in contrast to the 0-1 knapsack problem, one uniformly random bit cannot improve an algorithm's performance. More randomness lowers the competitive ratio to less than 1.736, but it can never be below 1.693. In the advice complexity setting, we measure how many bits of information the algorithm has to know to achieve some desired solution quality. For the simple unbounded knapsack problem, one advice bit lowers the competitive ratio to 3/2. While this cannot be improved with fewer than log(n) advice bits for instances of length n, a competitive ratio of 1+epsilon can be achieved with O(log(n/epsilon)/epsilon) advice bits for any epsilon>0. We further show that no amount of advice bounded by a function f(n) allows an algorithm to be optimal.
We also study the online general unbounded knapsack problem and show that it does not allow for any bounded competitive ratio for deterministic and randomized algorithms, as well as for algorithms using fewer than log(n) advice bits. We also provide an algorithm that uses O(log(n/epsilon)/epsilon) advice bits to achieve a competitive ratio of 1+epsilon for any epsilon>0.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.18151 [pdf, other]

Tree Coloring: Random Order and Predictions

Authors: Fabian Frei, Matthias Gehnen, Dennis Komm, Rastislav Královič, Richard Královič, Peter Rossmanith, Moritz Stocker

Abstract: Coloring is a notoriously hard problem, and even more so in the online setting, where each arriving vertex has to be colored immediately and irrevocably. Already on trees, which are trivially two-colorable, it is impossible to achieve anything better than a logarithmic competitive ratio. We show how to undercut this bound by a double-logarithmic factor in the slightly relaxed online model where the vertices arrive in random order. We then also analyze algorithms with predictions, showing how well we can color trees with machine-learned advice of varying reliability. We further extend our analysis to all two-colorable graphs and provide matching lower bounds in both cases. Finally, we demonstrate how the two mentioned approaches, both of which diminish the often unjustified pessimism of the classical online model, can be combined to yield even better results.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.13129 [pdf, other]

Rethinking the production and publication of machine-reusable expressions of research findings

Authors: Markus Stocker, Lauren Snyder, Matthew Anfuso, Oliver Ludwig, Freya Thießen, Kheir Eddine Farfar, Muhammad Haris, Allard Oelen, Mohamad Yaser Jaradeh

Abstract: Literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine reusable. To facilitate knowledge reuse, e.g. for synthesis research, scientific knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually have driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born reusable, i.e. produced in a machine-reusable format during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. We test the approach with three use cases, and discuss the role of publishers and editors in scaling the approach. Our results suggest that the proposed approach is superior to classical manual and semi-automated post-publication extraction techniques in terms of knowledge richness and accuracy as well as technological simplicity.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2312.01065 [pdf, other]

Scholarly Knowledge Graph Construction from Published Software Packages

Authors: Muhammad Haris, Sören Auer, Markus Stocker

Abstract: The value of structured scholarly knowledge for research and society at large is well understood, but producing scholarly knowledge (i.e., knowledge traditionally published in articles) in structured form remains a challenge. We propose an approach for automatically extracting scholarly knowledge from published software packages by static analysis of their metadata and contents (scripts and data) and populating a scholarly knowledge graph with the extracted knowledge. Our approach is based on mining scientific software packages linked to article publications by extracting metadata and analyzing the Abstract Syntax Tree (AST) of the source code to obtain information about the used and produced data as well as operations performed on data. The resulting knowledge graph includes articles, software packages metadata, and computational techniques applied to input data utilized as materials in research work. The knowledge graph also includes the results reported as scholarly knowledge in articles.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 10 pages, 5 figures. arXiv admin note: text overlap with arXiv:2212.07921

arXiv:2212.07921 [pdf, other]

Scholarly Knowledge Extraction from Published Software Packages

Authors: Muhammad Haris, Markus Stocker, Sören Auer

Abstract: A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2212.05429 [pdf, other]

doi 10.1007/978-3-031-21756-2_23

MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles

Authors: Mohamad Yaser Jaradeh, Markus Stocker, Sören Auer

Abstract: Information extraction from scholarly articles is a challenging task due to the sizable document length and implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles. Our approach condenses the article's full-text to property-value pairs as a segmented text snippet called structured summary. We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles, which we openly publish as a resource for the research community. Our results show that structured summarization is a suitable approach for targeted information extraction that complements other commonly used methods such as question answering and named entity recognition.

Submitted 11 December, 2022; originally announced December 2022.

Comments: Published as a short paper in ICADL 2022

arXiv:2209.08789 [pdf]

doi 10.5281/zenodo.6912480

Persistent Identification and Interlinking of FAIR Scholarly Knowledge

Authors: Muhammad Haris, Markus Stocker, Sören Auer

Abstract: We leverage the Open Research Knowledge Graph - a scholarly infrastructure that supports the creation, curation, and reuse of structured, semantic scholarly knowledge - and present an approach for persistent identification of FAIR scholarly knowledge. We propose a DOI-based persistent identification of ORKG Papers, which are machine-actionable descriptions of the essential information published in scholarly articles. This enables the citability of FAIR scholarly knowledge and its discovery in global scholarly communication infrastructures (e.g., DataCite, OpenAIRE, and ORCID). While publishing, the state of the ORKG Paper is saved and cannot be further edited. To allow for updating published versions, ORKG supports creating new versions, which are linked in provenance chains. We demonstrate the linking of FAIR scholarly knowledge with digital artefacts (articles), agents (researchers) and other objects (organizations). We persistently identify FAIR scholarly knowledge (namely, ORKG Papers and ORKG Comparisons as collections of ORKG Papers) by leveraging DataCite services. Given the existing interoperability between DataCite, Crossref, OpenAIRE and ORCID, sharing metadata with DataCite ensures global findability of FAIR scholarly knowledge in scholarly communication infrastructures.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2206.01442 [pdf, other]

doi 10.1145/3442442.3458603

Plumber: A Modular Framework to Create Information Extraction Pipelines

Authors: Mohamad Yaser Jaradeh, Kuldeep Singh, Markus Stocker, Sören Auer

Abstract: Information Extraction (IE) tasks are commonly studied topics in various domains of research. Hence, the community continuously produces multiple techniques, solutions, and tools to perform such tasks. However, running those tools and integrating them within existing infrastructure requires time, expertise, and resources. One pertinent task here is triples extraction and linking, where structured triples are extracted from a text and aligned to an existing Knowledge Graph (KG). In this paper, we present PLUMBER, the first framework that allows users to manually and automatically create suitable IE pipelines from a community-created pool of tools to perform triple extraction and alignment on unstructured text. Our approach provides an interactive medium to alter the pipelines and perform IE tasks. A short video to show the working of the framework for different use-cases is available online under: https://www.youtube.com/watch?v=XC9rJNIUv8g

Submitted 3 June, 2022; originally announced June 2022.

Comments: pre-print for WWW'21 demo of ICWE PLUMBER publication

arXiv:2206.01439 [pdf, other]

doi 10.1007/978-3-030-30760-8_31

Open Research Knowledge Graph: A System Walkthrough

Authors: Mohamad Yaser Jaradeh, Allard Oelen, Manuel Prinz, Markus Stocker, Sören Auer

Abstract: Despite improved digital access to scholarly literature in the last decades, the fundamental principles of scholarly communication remain unchanged and continue to be largely document-based. Scholarly knowledge remains locked in representations that are inadequate for machine processing. The Open Research Knowledge Graph (ORKG) is an infrastructure for representing, curating and exploring scholarly knowledge in a machine actionable manner. We demonstrate the core functionality of ORKG for representing research contributions published in scholarly articles. A video of the demonstration and the system are available online.

Submitted 3 June, 2022; originally announced June 2022.

Comments: Pre-print for TPDL 2019 demo

arXiv:2205.04504 [pdf, other]

doi 10.1145/3529372.3533285

TinyGenius: Intertwining Natural Language Processing with Microtask Crowdsourcing for Scholarly Knowledge Graph Creation

Authors: Allard Oelen, Markus Stocker, Sören Auer

Abstract: As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be more efficiently discovered and used. Natural Language Processing (NLP) techniques are able to autonomously process scholarly articles at scale and to create machine readable representations of the article content. However, autonomous NLP methods are by far not sufficiently accurate to create a high-quality knowledge graph. Yet quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology to validate NLP-extracted scholarly knowledge statements using microtasks performed with crowdsourcing. The scholarly context in which the crowd workers operate has multiple challenges. The explainability of the employed NLP methods is crucial to provide context in order to support the decision process of crowd workers. We employed TinyGenius to populate a paper-centric knowledge graph, using five distinct NLP methods. In the end, the resulting knowledge graph serves as a digital library for scholarly articles.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2203.14617 [pdf, other]

Enriching Scholarly Knowledge with Context

Authors: Muhammad Haris, Markus Stocker, Sören Auer

Abstract: Leveraging a GraphQL-based federated query service that integrates multiple scholarly communication infrastructures (specifically, DataCite, ORCID, ROR, OpenAIRE, Semantic Scholar, Wikidata and Altmetric), we develop a novel web widget based approach for the presentation of scholarly knowledge with rich contextual information. We implement the proposed approach in the Open Research Knowledge Graph (ORKG) and showcase it on three kinds of widgets. First, we devise a widget for the ORKG paper view that presents contextual information about related datasets, software, project information, topics, and metrics. Second, we extend the ORKG contributor profile view with contextual information including authored articles, developed software, linked projects, and research interests. Third, we advance ORKG comparison faceted search by introducing contextual facets (e.g. citations). As a result, the devised approach enables presenting ORKG scholarly knowledge flexibly enriched with contextual information sourced in a federated manner from numerous technologically heterogeneous scholarly communication infrastructures.

Submitted 28 March, 2022; originally announced March 2022.

arXiv:2203.14574 [pdf, other]

The Digitalization of Bioassays in the Open Research Knowledge Graph

Authors: Jennifer D'Souza, Anita Monteverdi, Muhammad Haris, Marco Anteghini, Kheir Eddine Farfar, Markus Stocker, Vitor A. P. Martins dos Santos, Sören Auer

Abstract: Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG) https://www.orkg.org/ represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained, machine-readable data. There is a need, however, to engender change in traditional community practices of recording contributions as unstructured, non-machine-readable text. For this in turn, there is a strong need for AI tools designed for scientists that permit easy and accurate semantification of their scholarly contributions. We present one such tool, ORKG-assays. Implementation: ORKG-assays is a freely available AI micro-service in ORKG, written in Python and designed to assist scientists in obtaining semantified bioassays as a set of triples. It uses an AI-based clustering algorithm which, on gold-standard evaluations over 900 bioassays with 5,514 unique property-value pairs for 103 predicates, shows competitive performance. Results and Discussion: As a result, semantified assay collections can be surveyed on the ORKG platform via tabulation or chart-based visualizations of key property values of the chemicals and compounds, offering smart knowledge access to biochemists and pharmaceutical researchers in the advancement of drug development.

Submitted 28 March, 2022; originally announced March 2022.

Comments: 12 pages, 5 figures, In Review at DeXa 2022 https://www.dexa.org/dexa2022

arXiv:2111.15342 [pdf, other]

doi 10.1007/978-3-030-91669-5_9

SmartReviews: Towards Human- and Machine-actionable Representation of Review Articles

Authors: Allard Oelen, Markus Stocker, Sören Auer

Abstract: Review articles are a means to structure state-of-the-art literature and to organize the growing number of scholarly publications. However, review articles are suffering from numerous limitations, weakening the impact the articles could potentially have. A key limitation is the inability of machines to access and process knowledge presented within review articles. In this work, we present SmartReviews, a review authoring and publishing tool, specifically addressing the limitations of review articles. The tool enables community-based authoring of living articles, leveraging a scholarly knowledge graph to provide machine-actionable knowledge. We evaluate the approach and tool by means of a SmartReview use case. The results indicate that the evaluated article is successfully addressing the weaknesses of the current review practices.

Submitted 30 November, 2021; originally announced November 2021.

arXiv:2111.11845 [pdf, other]

doi 10.1145/3460210.3493582

Triple Classification for Scholarly Knowledge Graph Completion

Authors: Mohamad Yaser Jaradeh, Kuldeep Singh, Markus Stocker, Sören Auer

Abstract: Scholarly Knowledge Graphs (KGs) provide a rich source of structured information representing knowledge encoded in scientific publications. With the sheer volume of published scientific literature comprising a plethora of inhomogeneous entities and relations to describe scientific concepts, these KGs are inherently incomplete. We present exBERT, a method for leveraging pre-trained transformer language models to perform scholarly knowledge graph completion. We model triples of a knowledge graph as text and perform triple classification (i.e., belongs to KG or not). The evaluation shows that exBERT outperforms other baselines on three scholarly KG completion datasets in the tasks of triple classification, link prediction, and relation prediction. Furthermore, we present two scholarly datasets as resources for the research community, collected from public KGs and online resources.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2109.05857 [pdf, other]

Federating Scholarly Infrastructures with GraphQL

Authors: Muhammad Haris, Kheir Eddine Farfar, Markus Stocker, Sören Auer

Abstract: A plethora of scholarly knowledge is being published on distributed scholarly infrastructures. Querying a single infrastructure is no longer sufficient for researchers to satisfy information needs. We present a GraphQL-based federated query service for executing distributed queries on numerous, heterogeneous scholarly infrastructures (currently, ORKG, DataCite and GeoNames), thus enabling the integrated retrieval of scholarly content from these infrastructures. Furthermore, we present the methods that enable cross-walks between artefact metadata and artefact content across scholarly infrastructures, specifically DOI-based persistent identification of ORKG artefacts (e.g., ORKG comparisons) and linking ORKG content to third-party semantic resources (e.g., taxonomies, thesauri, ontologies). This type of linking increases interoperability, facilitates the reuse of scholarly knowledge, and enables finding machine actionable scholarly knowledge published by ORKG in global scholarly infrastructures. In summary, we suggest applying the established linked data principles to scholarly knowledge to improve its findability, interoperability, and ultimately reusability, i.e., improve scholarly knowledge FAIR-ness.

Submitted 13 September, 2021; originally announced September 2021.

arXiv:2107.05738 [pdf, other]

doi 10.1145/3442442.3458605

Demonstration of Faceted Search on Scholarly Knowledge Graphs

Authors: Golsa Heidari, Ahmad Ramadan, Markus Stocker, Sören Auer

Abstract: Scientists always look for the most accurate and relevant answer to their queries on the scholarly literature. Traditional scholarly search systems list documents instead of providing direct answers to the search queries. As data in knowledge graphs are not acquainted semantically, they are not machine-readable. Therefore, a search on scholarly knowledge graphs ends up in a full-text search, not a search in the content of scholarly literature. In this demo, we present a faceted search system that retrieves data from a scholarly knowledge graph, which can be compared and filtered to better satisfy user information needs. Our practice's novelty is that we use dynamic facets, which means facets are not fixed and will change according to the content of a comparison.

Submitted 5 July, 2021; originally announced July 2021.

Comments: 2 pages, 1 figure, WWW 2021 Demo. arXiv admin note: substantial text overlap with arXiv:2107.05447

arXiv:2107.05447 [pdf, other]

Leveraging a Federation of Knowledge Graphs to Improve Faceted Search in Digital Libraries

Authors: Golsa Heidari, Ahmad Ramadan, Markus Stocker, Sören Auer

Abstract: Scientists always look for the most accurate and relevant answers to their queries in the literature. Traditional scholarly digital libraries list documents in search results, and therefore are unable to provide precise answers to search queries. In other words, search in digital libraries is metadata search and, if available, full-text search. We present a methodology for improving a faceted search system on structured content by leveraging a federation of scholarly knowledge graphs. We implemented the methodology on top of a scholarly knowledge graph. This search system can leverage content from third-party knowledge graphs to improve the exploration of scholarly content. A novelty of our approach is that we use dynamic facets on diverse data types, meaning that facets can change according to the user query. The user can also adjust the granularity of dynamic facets. An additional novelty is that we leverage third-party knowledge graphs to improve exploring scholarly knowledge.

Submitted 5 July, 2021; originally announced July 2021.

Comments: 12 pages, 4 figures, TPDL 2021 conference

arXiv:2107.03816 [pdf, other]

SmartReviews: Towards Human- and Machine-actionable Reviews

Authors: Allard Oelen, Markus Stocker, Sören Auer

Abstract: Review articles summarize state-of-the-art work and provide a means to organize the growing number of scholarly publications. However, the current review method and publication mechanisms hinder the impact review articles can potentially have. Among other limitations, reviews only provide a snapshot of the current literature and are generally not readable by machines. In this work, we identify the weaknesses of the current review method. Afterwards, we present the SmartReview approach addressing those weaknesses. The approach pushes towards semantic community-maintained review articles. At the core of our approach, knowledge graphs are employed to make articles more machine-actionable and maintainable.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 1 figure

arXiv:2102.10966 [pdf, other]

Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines

Authors: Mohamad Yaser Jaradeh, Kuldeep Singh, Markus Stocker, Andreas Both, Sören Auer

Abstract: In the last decade, a large number of Knowledge Graph (KG) information extraction approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG information extraction (IE) have not been studied in the literature. We propose Plumber, the first framework that brings together the research community's disjoint IE efforts. The Plumber architecture comprises 33 reusable components for various KG information extraction subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable information extraction pipelines and offers overall 264 distinct pipelines. We study the optimization problem of choosing suitable pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over two KGs: DBpedia, and Open Research Knowledge Graph (ORKG). Our results demonstrate the effectiveness of Plumber in dynamically generating KG information extraction pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components, and discuss their limitations.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted in ICWE 2021

arXiv:2102.06021 [pdf, other]

Analysing the Requirements for an Open Research Knowledge Graph: Use Cases, Quality Requirements and Construction Strategies

Authors: Arthur Brack, Anett Hoppe, Markus Stocker, Sören Auer, Ralph Ewerth

Abstract: Current science communication has a number of drawbacks and bottlenecks which have been subject of discussion lately: Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a certain field, or reproducibility is hampered by fixed-length, document-based publications which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KG) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.

Submitted 11 February, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:2005.10334

arXiv:2012.00456 [pdf, other]

doi 10.1007/978-3-030-64452-9_35

Creating a Scholarly Knowledge Graph from Survey Article Tables

Authors: Allard Oelen, Markus Stocker, Sören Auer

Abstract: Due to the lack of structure, scholarly knowledge remains hardly accessible for machines. Scholarly knowledge graphs have been proposed as a solution. Creating such a knowledge graph requires manual effort and domain experts, and is therefore time-consuming and cumbersome. In this work, we present a human-in-the-loop methodology used to build a scholarly knowledge graph leveraging literature survey articles. Survey articles often contain manually curated and high-quality tabular information that summarizes findings published in the scientific literature. Consequently, survey articles are an excellent resource for generating a scholarly knowledge graph. The presented methodology consists of five steps, in which tables and references are extracted from PDF articles, tables are formatted and finally ingested into the knowledge graph. To evaluate the methodology, 92 survey articles, containing 160 survey tables, have been imported in the graph. In total, 2,626 papers have been added to the knowledge graph using the presented methodology. The results demonstrate the feasibility of our approach, but also indicate that manual effort is required and thus underscore the important role of human experts.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2006.13733 [pdf, ps, other]

doi 10.1007/978-3-030-52200-1_32

Operational Research Literature as a Use Case for the Open Research Knowledge Graph

Authors: Mila Runnwerth, Markus Stocker, Sören Auer

Abstract: The Open Research Knowledge Graph (ORKG) provides machine-actionable access to scholarly literature that habitually is written in prose. Following the FAIR principles, the ORKG makes traditional, human-coded knowledge findable, accessible, interoperable, and reusable in a structured manner in accordance with the Linked Open Data paradigm. At the moment, in ORKG, papers are described manually, but in the long run the semantic depth of the literature at scale needs automation. Operational Research is a suitable test case for this vision because the mathematical field and, hence, its publication habits are highly structured: A mundane problem is formulated as a mathematical model, solved or approximated numerically, and evaluated systematically. We study the existing literature with respect to the Assembly Line Balancing Problem and derive a semantic description in accordance with the ORKG. Eventually, selected papers are ingested to test the semantic description and refine it further.

Submitted 23 June, 2020; originally announced June 2020.

Comments: International Congress on Mathematical Software (ICMS) 2020

MSC Class: 68T30; 90C10 ACM Class: H.3.7; I.2.4

arXiv:2006.01747 [pdf, other]

doi 10.1145/3383583.3398520

Generate FAIR Literature Surveys with Scholarly Knowledge Graphs

Authors: A. Oelen, M. Y. Jaradeh, M. Stocker, S. Auer

Abstract: Reviewing scientific literature is a cumbersome, time-consuming but crucial activity in research. Leveraging a scholarly knowledge graph, we present a methodology and a system for comparing scholarly literature, in particular research contributions describing the addressed problem, utilized materials, employed methods and yielded results. The system can be used by researchers to quickly get familiar with existing work in a specific research domain (e.g., a concrete research question or hypothesis). Additionally, it can be used to publish literature surveys following the FAIR Data Principles. The methodology to create a research contribution comparison consists of multiple tasks, specifically: (a) finding similar contributions, (b) aligning contribution descriptions, (c) visualizing and finally (d) publishing the comparison. The methodology is implemented within the Open Research Knowledge Graph (ORKG), a scholarly infrastructure that enables researchers to collaboratively describe, find and compare research contributions. We evaluate the implementation using data extracted from published review articles. The evaluation also addresses the FAIRness of comparisons published with the ORKG.

Submitted 2 June, 2020; originally announced June 2020.

arXiv:2006.01527 [pdf, other]

Question Answering on Scholarly Knowledge Graphs

Authors: Mohamad Yaser Jaradeh, Markus Stocker, Sören Auer

Abstract: Answering questions on scholarly knowledge comprising text and other artifacts is a vital part of any research life cycle. Querying scholarly knowledge and retrieving suitable answers is currently hardly possible due to the following primary reason: machine inactionable, ambiguous and unstructured content in publications. We present JarvisQA, a BERT-based system to answer questions on tabular views of scholarly knowledge graphs. Such tables can be found in a variety of shapes in the scholarly literature (e.g., surveys, comparisons or results). Our system can retrieve direct answers to a variety of different questions asked on tabular data in articles. Furthermore, we present a preliminary dataset of related tables and a corresponding set of natural language questions. This dataset is used as a benchmark for our system and can be reused by others. Additionally, JarvisQA is evaluated on two datasets against other baselines and shows a two- to three-fold improvement in performance compared to related methods.

Submitted 2 June, 2020; originally announced June 2020.

Comments: Pre-print for TPDL2020 accepted full paper, 14 pages

arXiv:2005.10334 [pdf, other]

doi 10.1007/978-3-030-54956-5_1

Requirements Analysis for an Open Research Knowledge Graph

Authors: Arthur Brack, Anett Hoppe, Markus Stocker, Sören Auer, Ralph Ewerth

Abstract: Current science communication has a number of drawbacks and bottlenecks which have been subject of discussion lately: Among others, the rising number of published articles makes it nearly impossible to get an overview of the state of the art in a certain field, or reproducibility is hampered by fixed-length, document-based publications which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications and outline possible solutions.

Submitted 20 May, 2020; originally announced May 2020.

Comments: Accepted for publishing in 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020

Journal ref: Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham

arXiv:2003.12958 [pdf]

Persistent Identification Of Instruments

Authors: Markus Stocker, Louise Darroch, Rolf Krahl, Ted Habermann, Anusuriya Devaraju, Ulrich Schwardmann, Claudio D'Onofrio, Ingemar Häggström

Abstract: Instruments play an essential role in creating research data. Given the importance of instruments and associated metadata to the assessment of data quality and data reuse, globally unique, persistent and resolvable identification of instruments is crucial. The Research Data Alliance Working Group Persistent Identification of Instruments (PIDINST) developed a community-driven solution for persistent identification of instruments which we present and discuss in this paper. Based on an analysis of 10 use cases, PIDINST developed a metadata schema and prototyped schema implementation with DataCite and ePIC as representative persistent identifier infrastructures and with HZB (Helmholtz-Zentrum Berlin für Materialien und Energie) and BODC (British Oceanographic Data Centre) as representative institutional instrument providers. These implementations demonstrate the viability of the proposed solution in practice. Moving forward, PIDINST will further catalyse adoption and consolidate the schema by addressing new stakeholder requirements.

Submitted 29 March, 2020; originally announced March 2020.

arXiv:1901.10816 [pdf, other]

Open Research Knowledge Graph: Next Generation Infrastructure for Semantic Scholarly Knowledge

Authors: Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D'Souza, Gábor Kismihók, Markus Stocker, Sören Auer

Abstract: Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based. In this form, scholarly knowledge is hard to process automatically. In this paper, we present the first steps towards a knowledge graph based infrastructure that acquires scholarly knowledge in machine actionable form thus enabling new possibilities for scholarly knowledge curation, publication and processing. The primary contribution is to present, evaluate and discuss multi-modal scholarly knowledge acquisition, combining crowdsourced and automated techniques. We present the results of the first user evaluation of the infrastructure with the participants of a recent international conference. Results suggest that users were intrigued by the novelty of the proposed infrastructure and by the possibilities for innovative scholarly knowledge processing it could enable.

Submitted 1 August, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: 8 pages

Showing 1–27 of 27 results for author: Stocker, M