The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

Friedrich, Annemarie; Adel, Heike; Tomazic, Federico; Hingerl, Johannes; Benteau, Renou; Maruscyk, Anika; Lange, Lukas

arXiv:2006.03039 (cs)

[Submitted on 4 Jun 2020]

Abstract: This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.

Comments:	Accepted for publication at ACL 2020
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2006.03039 [cs.CL]
	(or arXiv:2006.03039v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2006.03039

Submission history

From: Annemarie Friedrich
[v1] Thu, 4 Jun 2020 17:49:34 UTC (299 KB)
