Showing 1–2 of 2 results for author: Hildebrand, F

Search v0.5.6 released 2020-02-24

arXiv:2310.15569

cs.CL

MuLMS: A Multi-Layer Annotated Text Corpus for Information Extraction in the Materials Science Domain

Authors: Timo Pierre Schrader, Matteo Finco, Stefan Grünewald, Felix Hildebrand, Annemarie Friedrich

Abstract: Keeping track of all relevant recent publications and experimental results for a research area is a challenging task. Prior work has demonstrated the efficacy of information extraction models in various scientific areas. Recently, several datasets have been released for the yet understudied materials science domain. However, these datasets focus on sub-problems such as parsing synthesis procedures or on sub-domains, e.g., solid oxide fuel cells. In this resource paper, we present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science. The corpus has been annotated by domain experts with several layers ranging from named entities over relations to frame structures. We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 17 pages, 2 figures, 28 tables, to be published in "Proceedings of the second Workshop on Information Extraction from Scientific Publications"

arXiv:2307.02340

cs.CL

MuLMS-AZ: An Argumentative Zoning Dataset for the Materials Science Domain

Authors: Timo Pierre Schrader, Teresa Bürkle, Sophie Henning, Sherry Tan, Matteo Finco, Stefan Grünewald, Maira Indrikova, Felix Hildebrand, Annemarie Friedrich

Abstract: Scientific publications follow conventionalized rhetorical structures. Classifying the Argumentative Zone (AZ), e.g., identifying whether a sentence states a Motivation, a Result or Background information, has been proposed to improve processing of scholarly documents. In this work, we adapt and extend this idea to the domain of materials science research. We present and release a new dataset of 50 manually annotated research articles. The dataset spans seven sub-topics and is annotated with a materials-science focused multi-label annotation scheme for AZ. We detail corpus statistics and demonstrate high inter-annotator agreement. Our computational experiments show that using domain-specific pre-trained transformer-based text encoders is key to high classification performance. We also find that AZ categories from existing datasets in other domains are transferable to varying degrees.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 15 pages, 2 figures, 14 tables, to be published in "Proceedings of the 4th Workshop on Computational Approaches to Discourse"
