Showing 1–39 of 39 results for author: D'Souza, J

  1. arXiv:2407.02977  [pdf, other]

    cs.CL cs.AI cs.IT

    Large Language Models as Evaluators for Scientific Synthesis

    Authors: Julia Evans, Jennifer D'Souza, Sören Auer

    Abstract: Our study explores how well the state-of-the-art Large Language Models (LLMs), like GPT-4 and Mistral, can assess the quality of scientific summaries or, more fittingly, scientific syntheses, comparing their evaluations to those of human annotators. We used a dataset of 100 research questions and their syntheses made by GPT-4 from abstracts of five related papers, checked against human quality rat…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 4 pages, forthcoming as part of the KONVENS 2024 proceedings https://konvens-2024.univie.ac.at/

  2. arXiv:2407.02409  [pdf, other]

    cs.CL

    Effective Context Selection in LLM-based Leaderboard Generation: An Empirical Study

    Authors: Salomon Kabongo, Jennifer D'Souza, Sören Auer

    Abstract: This paper explores the impact of context selection on the efficiency of Large Language Models (LLMs) in generating Artificial Intelligence (AI) research leaderboards, a task defined as the extraction of (Task, Dataset, Metric, Score) quadruples from scholarly articles. By framing this challenge as a text generation objective and employing instruction finetuning with the FLAN-T5 collection, we int…

    Submitted 6 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.04383

  3. arXiv:2406.07257  [pdf, other]

    cs.CL cs.AI

    Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

    Authors: Hamed Babaei Giglou, Tilahun Abedissa Taffa, Rana Abdullah, Aida Usmanova, Ricardo Usbeck, Jennifer D'Souza, Sören Auer

    Abstract: This paper introduces a scholarly Question Answering (QA) system on top of the NFDI4DataScience Gateway, employing a Retrieval Augmented Generation-based (RAG) approach. The NFDI4DS Gateway, as a foundational framework, offers a unified and intuitive interface for querying various scientific databases using federated search. The RAG-based scholarly QA, powered by a Large Language Model (LLM), faci…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages main content, 16 pages overall, 3 Figures, accepted for publication at NSLP 2024 workshop at ESWC 2024

  4. arXiv:2406.04383  [pdf, other]

    cs.CL cs.AI

    Exploring the Latest LLMs for Leaderboard Extraction

    Authors: Salomon Kabongo, Jennifer D'Souza, Sören Auer

    Abstract: The rapid advancements in Large Language Models (LLMs) have opened new avenues for automating complex tasks in AI research. This paper investigates the efficacy of different LLMs (Mistral 7B, Llama-2, GPT-4-Turbo, and GPT-4.o) in extracting leaderboard information from empirical AI research articles. We explore three types of contextual inputs to the models: DocTAET (Document Title, Abstract, Experim…

    Submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2405.14601  [pdf, other]

    cs.CL cs.AI

    A FAIR and Free Prompt-based Research Assistant

    Authors: Mahsa Shamsabadi, Jennifer D'Souza

    Abstract: This demo will present the Research Assistant (RA) tool developed to assist with six main types of research tasks defined as standardized instruction templates, instantiated with user input, and finally applied as prompts to AI tools well known for their sophisticated natural language processing abilities, such as ChatGPT (https://chat.openai.com/) and Gemini (https://gemini.google.com/app). The six…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 6 pages, 2 figures, accepted to the Demo track of NLDB 2024 (https://nldb2024.di.unito.it/)

  6. arXiv:2405.02602  [pdf, other]

    cs.CL cs.AI cs.IT

    Astro-NER -- Astronomy Named Entity Recognition: Is GPT a Good Domain Expert Annotator?

    Authors: Julia Evans, Sameer Sadruddin, Jennifer D'Souza

    Abstract: In this study, we address one of the challenges of developing NER models for scholarly domains, namely the scarcity of suitable labeled data. We experiment with an approach using predictions from a fine-tuned LLM to aid non-domain experts in annotating scientific entities within astronomy literature, with the goal of uncovering whether such a collaborative process can approximate domain expe…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 9 pages

  7. arXiv:2405.02105  [pdf, other]

    cs.AI cs.CL cs.IT

    Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph

    Authors: Vladyslav Nechakhin, Jennifer D'Souza, Steffen Eger

    Abstract: Structured science summaries or research contributions using properties or dimensions beyond traditional keywords enhance science findability. Current methods, such as those used by the Open Research Knowledge Graph (ORKG), involve manually curating properties to describe research papers' contributions in a structured manner, but this is labor-intensive and inconsistent between the domain expert…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 22 pages, 11 figures. In review at https://www.mdpi.com/journal/information/special_issues/WYS02U2GTD

  8. arXiv:2404.10317  [pdf, other]

    cs.AI

    LLMs4OM: Matching Ontologies with Large Language Models

    Authors: Hamed Babaei Giglou, Jennifer D'Souza, Felix Engel, Sören Auer

    Abstract: Ontology Matching (OM) is a critical task in knowledge integration, where aligning heterogeneous ontologies facilitates data interoperability and knowledge sharing. Traditional OM systems often rely on expert knowledge or predictive models, with limited exploration of the potential of Large Language Models (LLMs). We present the LLMs4OM framework, a novel approach to evaluate the effectiveness of…

    Submitted 23 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, accepted to ESWC 2024 Special Track on LLMs for Knowledge Engineering (https://2024.eswc-conferences.org/call-for-papers-llms/)

  9. arXiv:2404.08443  [pdf, other]

    cs.DL cs.IR

    Toward FAIR Semantic Publishing of Research Dataset Metadata in the Open Research Knowledge Graph

    Authors: Raia Abu Ahmad, Jennifer D'Souza, Matthäus Zloch, Wolfgang Otto, Georg Rehm, Allard Oelen, Stefan Dietze, Sören Auer

    Abstract: Search engines these days can serve datasets as search results. Datasets get picked up by search technologies based on structured descriptions on their official web pages, informed by metadata ontologies such as the Dataset content type of schema.org. Despite this promotion of the content type dataset as a first-class citizen of search results, a vast proportion of datasets, particularly research…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, published in the Joint Proceedings of the Onto4FAIR 2023 Workshops

    Journal ref: In Joint Proceedings of the Onto4FAIR 2023 Workshops: Collocated with FOIS 2023 and SEMANTICS 2023. pp.23-31. https://hal.science/hal-04312604

  10. arXiv:2402.14622  [pdf, other]

    cs.IR cs.AI cs.CL cs.DL

    From Keywords to Structured Summaries: Streamlining Scholarly Knowledge Access

    Authors: Mahsa Shamsabadi, Jennifer D'Souza

    Abstract: This short paper highlights the growing importance of information retrieval (IR) engines in the scientific community, addressing the inefficiency of traditional keyword-based search engines due to the rising volume of publications. The proposed solution involves structured records, underpinning advanced information technology (IT) tools, including visualization dashboards, to revolutionize how res…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 6 pages, 1 figure

  11. arXiv:2401.10040  [pdf, other]

    cs.CL cs.AI cs.DL cs.IT

    Large Language Models for Scientific Information Extraction: An Empirical Study for Virology

    Authors: Mahsa Shamsabadi, Jennifer D'Souza, Sören Auer

    Abstract: In this paper, we champion the use of structured and semantic content representation of discourse-based scholarly communication, inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions. These representations provide users with a concise overview, aiding scientists in navigating the dense academic landscape. Our novel automated approach leverages the robust text generat…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 8 pages, 6 figures, Accepted as Findings of the ACL: EACL 2024

  12. arXiv:2310.06517  [pdf, other]

    cs.DL cs.CL cs.IT

    Toward Semantic Publishing in Non-Invasive Brain Stimulation: A Comprehensive Analysis of rTMS Studies

    Authors: Swathi Anil, Jennifer D'Souza

    Abstract: Noninvasive brain stimulation (NIBS) encompasses transcranial stimulation techniques that can influence brain excitability. These techniques have the potential to treat conditions like depression, anxiety, and chronic pain, and to provide insights into brain function. However, a lack of standardized reporting practices limits its reproducibility and full clinical potential. This paper aims to fost…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 8 pages, 2 figures. Accepted as a Practice Paper at The 25th International Conference on Asia-Pacific Digital Libraries (ICADL 2023) https://icadl.net/icadl2023/index.html#accepted

  13. arXiv:2310.03376  [pdf, other]

    cs.CL cs.AI cs.IT

    Procedural Text Mining with Large Language Models

    Authors: Anisa Rula, Jennifer D'Souza

    Abstract: Recent advancements in the field of Natural Language Processing, particularly the development of large-scale language models that are pretrained on vast amounts of knowledge, are creating novel opportunities within the realm of Knowledge Engineering. In this paper, we investigate the usage of large language models (LLMs) in both zero-shot and in-context learning settings to tackle the problem of e…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures, Accepted to The Twelfth International Conference on Knowledge Capture (K-Cap 2023)

  14. arXiv:2307.16648  [pdf, other]

    cs.AI cs.CL cs.IT cs.LG

    LLMs4OL: Large Language Models for Ontology Learning

    Authors: Hamed Babaei Giglou, Jennifer D'Souza, Sören Auer

    Abstract: We propose the LLMs4OL approach, which utilizes Large Language Models (LLMs) for Ontology Learning (OL). LLMs have shown significant advancements in natural language processing, demonstrating their ability to capture complex language patterns in different knowledge domains. Our LLMs4OL paradigm investigates the following hypothesis: Can LLMs effectively apply their language pattern capturi…

    Submitted 2 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: 15 pages main content, 27 pages overall, 2 Figures, accepted for publication at ISWC 2023 research track

  15. arXiv:2305.12900  [pdf, other]

    cs.CL cs.AI cs.DL cs.IT

    Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph

    Authors: Jennifer D'Souza, Moussab Hrou, Sören Auer

    Abstract: There have been many recent investigations into prompt-based training of transformer language models for new text genres in low-resource settings. The prompt-based training approach has been found to be effective in generalizing pre-trained or fine-tuned models for transfer to resource-scarce settings. This work, for the first time, reports results on adopting prompt-based training of transformers…

    Submitted 11 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 14 pages, 1 figure, accepted for publication as a short paper at DEXA 2023 (https://www.dexa.org/dexa2023)

  16. arXiv:2305.11068  [pdf, other]

    cs.CL cs.AI

    ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph

    Authors: Salomon Kabongo, Jennifer D'Souza, Sören Auer

    Abstract: The purpose of this work is to describe the Orkg-Leaderboard software designed to extract leaderboards defined as Task-Dataset-Metric tuples automatically from large collections of empirical research papers in Artificial Intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the Op…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2109.13089

  17. Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections

    Authors: Ming Jiang, Jennifer D'Souza, Sören Auer, J. Stephen Downie

    Abstract: The rapid growth of research publications has placed great demands on digital libraries (DL) for advanced information management technologies. To cater to these demands, techniques relying on knowledge-graph structures are being advocated. In such graph-based pipelines, inferring semantic relations between related scientific concepts is a crucial step. Recently, BERT-based pre-trained models have…

    Submitted 3 May, 2023; originally announced May 2023.

    Journal ref: International Journal on Digital Libraries (2022)

  18. arXiv:2303.16835  [pdf, other]

    cs.CL cs.AI cs.LG

    Zero-shot Entailment of Leaderboards for Empirical AI Research

    Authors: Salomon Kabongo, Jennifer D'Souza, Sören Auer

    Abstract: We present a large-scale empirical investigation of the zero-shot learning phenomena in a specific recognizing textual entailment (RTE) task category, i.e. the automated mining of leaderboards for Empirical AI Research. The prior reported state-of-the-art models for leaderboards extraction formulated as an RTE task, in a non-zero-shot setting, are promising with above 90% reported performances. Ho…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: 5 pages, 1 figure. Accepted for publication at JCDL 2023 - Late Breaking Results and Datasets track (https://2023.jcdl.org/calls/papers/#paper_types), official citation forthcoming

  19. arXiv:2211.13727  [pdf, other]

    cs.CL cs.AI cs.LG

    Question-type Identification for Academic Questions in Online Learning Platform

    Authors: Azam Rabiee, Alok Goel, Johnson D'Souza, Saurabh Khanwalkar

    Abstract: Online learning platforms provide learning materials and answers to students' academic questions by experts, peers, or systems. This paper explores question-type identification as a step in content understanding for an online learning platform. The aim of the question-type identifier is to categorize question types based on their structure and complexity, using the question text, subject, and stru…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: 18 pages, 6 figures, 4th International Conference on Semantic & Natural Language Processing (SNLP 2023)

  20. arXiv:2210.02034  [pdf, other]

    cs.DL cs.AI

    Clustering Semantic Predicates in the Open Research Knowledge Graph

    Authors: Omar Arab Oghli, Jennifer D'Souza, Sören Auer

    Abstract: When semantically describing knowledge graphs (KGs), users have to make a critical choice of a vocabulary (i.e. predicates and resources). The success of KG building is determined by the convergence of shared vocabularies so that meaning can be established. The typical lifecycle for a new KG construction can be defined as follows: nascent phases of graph construction experience terminology diverge…

    Submitted 5 October, 2022; originally announced October 2022.

  21. arXiv:2205.11863  [pdf, other]

    cs.CL cs.AI cs.IT

    Overview of STEM Science as Process, Method, Material, and Data Named Entities

    Authors: Jennifer D'Souza

    Abstract: We are faced with an unprecedented production in scholarly publications worldwide. Stakeholders in the digital libraries posit that the document-based publishing paradigm has reached the limits of adequacy. Instead, structured, machine-interpretable, fine-grained scholarly knowledge publishing as Knowledge Graphs (KG) is strongly advocated. In this work, we develop and analyze a large-scale struct…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 9 pages, 17 figures, In Review submission at Queer in AI @ NAACL 2022 Research Symposia (https://sites.google.com/view/queer-in-ai/naacl-2022?authuser=0)

  22. arXiv:2203.14579  [pdf, ps, other]

    cs.CL cs.DL cs.IR cs.LG

    Computer Science Named Entity Recognition in the Open Research Knowledge Graph

    Authors: Jennifer D'Souza, Sören Auer

    Abstract: Domain-specific named entity recognition (NER) on Computer Science (CS) scholarly articles is an information extraction task that is arguably more challenging for the various annotation aims that can beset the task and has been less studied than NER in the general domain. Given that significant progress has been made on NER, we believe that scholarly domain-specific NER will receive increasing att…

    Submitted 14 November, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: 15 pages, Accepted for publication as a short paper in 24th International Conference on Asia-Pacific Digital Libraries (ICADL 2022, https://icadl.net/icadl2022/)

  23. arXiv:2203.14574  [pdf, other]

    cs.DL cs.AI cs.IR

    The Digitalization of Bioassays in the Open Research Knowledge Graph

    Authors: Jennifer D'Souza, Anita Monteverdi, Muhammad Haris, Marco Anteghini, Kheir Eddine Farfar, Markus Stocker, Vitor A. P. Martins dos Santos, Sören Auer

    Abstract: Background: Recent years are seeing a growing impetus in the semantification of scholarly knowledge at the fine-grained level of scientific entities in knowledge graphs. The Open Research Knowledge Graph (ORKG) https://www.orkg.org/ represents an important step in this direction, with thousands of scholarly contributions as structured, fine-grained, machine-readable data. There is a need, however,…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 12 pages, 5 figures, In Review at DeXa 2022 https://www.dexa.org/dexa2022

  24. arXiv:2111.15182  [pdf, other]

    cs.AI cs.CL cs.DL cs.LG

    Easy Semantification of Bioassays

    Authors: Marco Anteghini, Jennifer D'Souza, Vitor A. P. Martins dos Santos, Sören Auer

    Abstract: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two methods are on opposite ends of the method complex…

    Submitted 2 December, 2021; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: 12 pages, 5 figures, Accepted for Publication in AIxIA 2021 (https://aixia2021.disco.unimib.it/home-page)

  25. arXiv:2110.09036  [pdf]

    cs.CL cs.AI cs.IR cs.SC

    Ranking Facts for Explaining Answers to Elementary Science Questions

    Authors: Jennifer D'Souza, Isaiah Onando Mulang', Soeren Auer

    Abstract: In multiple-choice exams, students select one answer from among typically four choices and can explain why they made that particular choice. Students are good at understanding natural language questions and based on their domain knowledge can easily infer the question's answer by 'connecting the dots' across various pertinent facts. Considering automated reasoning for elementary science question…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: 25 pages, 5 figures, accepted for publication in NLE

  26. arXiv:2109.13089  [pdf, other]

    cs.CL cs.AI cs.DL

    Automated Mining of Leaderboards for Empirical AI Research

    Authors: Salomon Kabongo, Jennifer D'Souza, Sören Auer

    Abstract: With the rapid growth of research publications, empowering scientists to keep track of scientific progress is of paramount importance. In this regard, the Leaderboards facet of information organization provides an overview of the state of the art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts like PapersWithCode among o…

    Submitted 31 August, 2021; originally announced September 2021.

  27. arXiv:2109.00199  [pdf, ps, other]

    cs.IR cs.CL cs.DL

    Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles

    Authors: Jennifer D'Souza, Soeren Auer

    Abstract: We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article's contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic pat…

    Submitted 17 September, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: 8 pages, Accepted for publication in ICADL 2021 as a short paper

  28. arXiv:2106.07385  [pdf, other]

    cs.CL cs.AI cs.DL cs.IR cs.LG

    SemEval-2021 Task 11: NLPContributionGraph -- Structuring Scholarly NLP Contributions for a Research Knowledge Graph

    Authors: Jennifer D'Souza, Sören Auer, Ted Pedersen

    Abstract: There is currently a gap between the natural language expression of scholarly publications and their structured semantic content modeling to enable intelligent content search. With the volume of research growing exponentially every year, a search feature operating over semantically structured content is compelling. The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. 'the NCG task') tasks par…

    Submitted 15 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: 13 pages, 5 figures, 8 tables

    Journal ref: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), (pp. 364-376), ACL

  29. Eigenfactor

    Authors: Grischa Fraumann, Jennifer D'Souza, Kim Holmberg

    Abstract: The Eigenfactor is a journal metric, which was developed by Bergstrom and his colleagues at the University of Washington. They invented the Eigenfactor as a response to the criticism against the use of simple citation counts. The Eigenfactor makes use of the network structure of citations, i.e. citations between journals, and establishes the importance, influence or impact of a journal based on it…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: In book: Handbook Bibliometrics | Edition: De Gruyter Reference | Chapter: 4.7 | Publisher: De Gruyter Saur

  30. arXiv:2104.00563  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG cs.MA

    Latent Variable Sequential Set Transformers For Joint Multi-Agent Motion Prediction

    Authors: Roger Girgis, Florian Golemo, Felipe Codevilla, Martin Weiss, Jim Aldon D'Souza, Samira Ebrahimi Kahou, Felix Heide, Christopher Pal

    Abstract: Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A major challenge is to efficiently learn a representation that approximates the true joint distribution of contextual, social, and temporal information to enable planning. We propose Latent Variable Sequential Set Transformers which are encoder-decoder architectures that generate scene-consistent multi-…

    Submitted 10 February, 2022; v1 submitted 19 February, 2021; originally announced April 2021.

    Comments: 26 pages, 17 figures, 8 tables

  31. arXiv:2010.04388  [pdf]

    cs.CL cs.DL cs.IR

    Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset

    Authors: Jennifer D'Souza, Sören Auer

    Abstract: Purpose: The aim of this work is to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) pilot stage - to define the scheme (described in prior work); and 2) adjudication stage - to normalize the graphi…

    Submitted 7 May, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: 22 pages, 9 figures, 4 tables

    Journal ref: Journal of Data and Information Science, 6(3) (2021)

  32. arXiv:2009.08801  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    SciBERT-based Semantification of Bioassays in the Open Research Knowledge Graph

    Authors: Marco Anteghini, Jennifer D'Souza, Vitor A. P. Martins dos Santos, Sören Auer

    Abstract: As a novel contribution to the problem of semantifying biological assays, in this paper, we propose a neural-network-based approach to automatically semantify, thereby structure, unstructured bioassay text descriptions. Experimental evaluations, to this end, show promise as the neural-based semantification significantly outperforms a naive frequency-based baseline approach. Specifically, the neura…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: In proceedings of the '22nd International Conference on Knowledge Engineering and Knowledge Management' 'Demo and Poster section'

  33. arXiv:2009.07642  [pdf, other]

    cs.DL

    Representing Semantified Biological Assays in the Open Research Knowledge Graph

    Authors: Marco Anteghini, Jennifer D'Souza, Vitor A. P. Martins dos Santos, Sören Auer

    Abstract: In the biotechnology and biomedical domains, recent text mining efforts advocate for machine-interpretable, and preferably, semantified, documentation formats of laboratory processes. This includes wet-lab protocols, (in)organic materials synthesis reactions, genetic manipulations and procedures for faster computer-mediated analysis and predictions. Herein, we present our work on the representatio…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: In Proceedings of 'The 22nd International Conference on Asia-Pacific Digital Libraries'

  34. arXiv:2006.12870  [pdf, other]

    cs.CL cs.DL cs.IR

    NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

    Authors: Jennifer D'Souza, Sören Auer

    Abstract: We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly, for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction ta…

    Submitted 3 September, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: In Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2020) co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020), Virtual Event, China, August 1. http://ceur-ws.org/Vol-2658/

  35. arXiv:2004.06153  [pdf, other]

    cs.DL cs.CL

    Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification

    Authors: Ming Jiang, Jennifer D'Souza, Sören Auer, J. Stephen Downie

    Abstract: With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on la…

    Submitted 13 July, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

  36. arXiv:2003.01006  [pdf, other]

    cs.IR cs.AI cs.CL cs.DL

    The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources

    Authors: Jennifer D'Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, Ralph Ewerth

    Abstract: We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts in 10 STEM di…

    Submitted 28 July, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Published in LREC 2020. Publication URL https://www.aclweb.org/anthology/2020.lrec-1.268/; Dataset DOI https://doi.org/10.25835/0017546

  37. Domain-independent Extraction of Scientific Concepts from Research Articles

    Authors: Arthur Brack, Jennifer D'Souza, Anett Hoppe, Sören Auer, Ralph Ewerth

    Abstract: We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: Accepted for publishing in 42nd European Conference on IR Research, ECIR 2020

    Journal ref: Advances in Information Retrieval. 2020

  38. arXiv:1901.10816  [pdf, other]

    cs.DL cs.IR

    Open Research Knowledge Graph: Next Generation Infrastructure for Semantic Scholarly Knowledge

    Authors: Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D'Souza, Gábor Kismihók, Markus Stocker, Sören Auer

    Abstract: Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based. In this form, scholarly knowledge is hard to process automatically. In this paper, we present the first steps towards a knowledge graph based infrastructure that acquires scholarly knowledge in machine actionable form thus enabling new possibilities for scholarly kn…

    Submitted 1 August, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: 8 pages

  39. arXiv:1204.4765  [pdf, other]

    cs.DS

    String Trees

    Authors: Julius D'souza

    Abstract: A string-like compact data structure for unlabelled rooted trees is given using 2n bits.

    Submitted 20 April, 2012; originally announced April 2012.

    Comments: 5 pages

    ACM Class: E.1; G.2.2