Search | arXiv e-print repository

Rethinking the production and publication of machine-reusable expressions of research findings

Authors: Markus Stocker, Lauren Snyder, Matthew Anfuso, Oliver Ludwig, Freya Thießen, Kheir Eddine Farfar, Muhammad Haris, Allard Oelen, Mohamad Yaser Jaradeh

Abstract: Literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine reusable. To facilitate knowledge reuse, e.g. for synthesis research, scientific knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccurac… ▽ More Literature is the primary expression of scientific knowledge and an important source of research data. However, scientific knowledge expressed in narrative text documents is not inherently machine reusable. To facilitate knowledge reuse, e.g. for synthesis research, scientific knowledge must be extracted from articles and organized into databases post-publication. The high time costs and inaccuracies associated with completing these activities manually has driven the development of techniques that automate knowledge extraction. Tackling the problem with a different mindset, we propose a pre-publication approach, known as reborn, that ensures scientific knowledge is born reusable, i.e. produced in a machine-reusable format during knowledge production. We implement the approach using the Open Research Knowledge Graph infrastructure for FAIR scientific knowledge organization. We test the approach with three use cases, and discuss the role of publishers and editors in scaling the approach. Our results suggest that the proposed approach is superior compared to classical manual and semi-automated post-publication extraction techniques in terms of knowledge richness and accuracy as well as technological simplicity. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2303.15113 [pdf, other]

Describing and Organizing Semantic Web and Machine Learning Systems in the SWeMLS-KG

Authors: Fajar J. Ekaputra, Majlinda Llugiqi, Marta Sabou, Andreas Ekelhart, Heiko Paulheim, Anna Breit, Artem Revenko, Laura Waltersdorfer, Kheir Eddine Farfar, Sören Auer

Abstract: In line with the general trend in artificial intelligence research to create intelligent systems that combine learning and symbolic components, a new sub-area has emerged that focuses on combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community - Semantic Web Machine Learning (SWeML for short). Due to its rapid growth and impact on several communities… ▽ More In line with the general trend in artificial intelligence research to create intelligent systems that combine learning and symbolic components, a new sub-area has emerged that focuses on combining machine learning (ML) components with techniques developed by the Semantic Web (SW) community - Semantic Web Machine Learning (SWeML for short). Due to its rapid growth and impact on several communities in the last two decades, there is a need to better understand the space of these SWeML Systems, their characteristics, and trends. Yet, surveys that adopt principled and unbiased approaches are missing. To fill this gap, we performed a systematic study and analyzed nearly 500 papers published in the last decade in this area, where we focused on evaluating architectural, and application-specific features. Our analysis identified a rapidly growing interest in SWeML Systems, with a high impact on several application domains and tasks. Catalysts for this rapid growth are the increased application of deep learning and knowledge graph technologies. By leveraging the in-depth understanding of this area acquired through this study, a further key contribution of this paper is a classification system for SWeML Systems which we publish as ontology. △ Less

Submitted 27 March, 2023; originally announced March 2023.

Comments: Preprint of a paper in the resource track of the 20th Extended Semantic Web Conference (ESWC'23)

arXiv:2203.14574 [pdf, other]

The Digitalization of Bioassays in the Open Research Knowledge Graph

Authors: Jennifer D'Souza, Anita Monteverdi, Muhammad Haris, Marco Anteghini, Kheir Eddine Farfar, Markus Stocker, Vitor A. P. Martins dos Santos, Sören Auer

Abstract: Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG) https://www.orkg.org/ represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained, machine-readable data. There is a need, however,… ▽ More Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG) https://www.orkg.org/ represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained, machine-readable data. There is a need, however, to engender change in traditional community practices of recording contributions as unstructured, non-machine-readable text. For this in turn, there is a strong need for AI tools designed for scientists that permit easy and accurate semantification of their scholarly contributions. We present one such tool, ORKG-assays. Implementation: ORKG-assays is a freely available AI micro-service in ORKG written in Python designed to assist scientists obtain semantified bioassays as a set of triples. It uses an AI-based clustering algorithm which on gold-standard evaluations over 900 bioassays with 5,514 unique property-value pairs for 103 predicates shows competitive performance. Results and Discussion: As a result, semantified assay collections can be surveyed on the ORKG platform via tabulation or chart-based visualizations of key property values of the chemicals and compounds offering smart knowledge access to biochemists and pharmaceutical researchers in the advancement of drug development. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: 12 pages, 5 figures, In Review at DeXa 2022 https://www.dexa.org/dexa2022

arXiv:2109.05857 [pdf, other]

Federating Scholarly Infrastructures with GraphQL

Authors: Muhammad Haris, Kheir Eddine Farfar, Markus Stocker, Sören Auer

Abstract: A plethora of scholarly knowledge is being published on distributed scholarly infrastructures. Querying a single infrastructure is no longer sufficient for researchers to satisfy information needs. We present a GraphQL-based federated query service for executing distributed queries on numerous, heterogeneous scholarly infrastructures (currently, ORKG, DataCite and GeoNames), thus enabling the inte… ▽ More A plethora of scholarly knowledge is being published on distributed scholarly infrastructures. Querying a single infrastructure is no longer sufficient for researchers to satisfy information needs. We present a GraphQL-based federated query service for executing distributed queries on numerous, heterogeneous scholarly infrastructures (currently, ORKG, DataCite and GeoNames), thus enabling the integrated retrieval of scholarly content from these infrastructures. Furthermore, we present the methods that enable cross-walks between artefact metadata and artefact content across scholarly infrastructures, specifically DOI-based persistent identification of ORKG artefacts (e.g., ORKG comparisons) and linking ORKG content to third-party semantic resources (e.g., taxonomies, thesauri, ontologies). This type of linking increases interoperability, facilitates the reuse of scholarly knowledge, and enables finding machine actionable scholarly knowledge published by ORKG in global scholarly infrastructures. In summary, we suggest applying the established linked data principles to scholarly knowledge to improve its findability, interoperability, and ultimately reusability, i.e., improve scholarly knowledge FAIR-ness. △ Less

Submitted 13 September, 2021; originally announced September 2021.

arXiv:1901.10816 [pdf, other]

Open Research Knowledge Graph: Next Generation Infrastructure for Semantic Scholarly Knowledge

Authors: Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D'Souza, Gábor Kismihók, Markus Stocker, Sören Auer

Abstract: Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based. In this form, scholarly knowledge is hard to process automatically. In this paper, we present the first steps towards a knowledge graph based infrastructure that acquires scholarly knowledge in machine actionable form thus enabling new possibilities for scholarly kn… ▽ More Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based. In this form, scholarly knowledge is hard to process automatically. In this paper, we present the first steps towards a knowledge graph based infrastructure that acquires scholarly knowledge in machine actionable form thus enabling new possibilities for scholarly knowledge curation, publication and processing. The primary contribution is to present, evaluate and discuss multi-modal scholarly knowledge acquisition, combining crowdsourced and automated techniques. We present the results of the first user evaluation of the infrastructure with the participants of a recent international conference. Results suggest that users were intrigued by the novelty of the proposed infrastructure and by the possibilities for innovative scholarly knowledge processing it could enable. △ Less

Submitted 1 August, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: 8 pages

Showing 1–5 of 5 results for author: Farfar, K E