Shared Metadata for Data-Centric Materials Science
Authors:
Luca M. Ghiringhelli,
Carsten Baldauf,
Tristan Bereau,
Sandor Brockhauser,
Christian Carbogno,
Javad Chamanara,
Stefano Cozzini,
Stefano Curtarolo,
Claudia Draxl,
Shyam Dwaraknath,
Ádám Fekete,
James Kermode,
Christoph T. Koch,
Markus Kühbach,
Alvin Noe Ladines,
Patrick Lambrix,
Maja-Olivia Lenz-Himmer,
Sergey Levchenko,
Micael Oliveira,
Adam Michalchuk,
Ron Miller,
Berk Onat,
Pasquale Pavone,
Giovanni Pizzi,
Benjamin Regler,
et al. (10 additional authors not shown)
Abstract:
The expansive production of data in materials science, together with their widespread sharing and repurposing, requires educated support and stewardship. To ensure that this need helps rather than hinders scientific work, the implementation of the FAIR-data principles (Findable, Accessible, Interoperable, and Reusable) must not be too narrow. In addition, the wider materials-science community ought to agree on the strategies to tackle the challenges that are specific to its data, both from computations and experiments. In this paper, we present the result of the discussions held at the workshop on "Shared Metadata and Data Formats for Big-Data Driven Materials Science". We start from an operative definition of metadata and of the features a FAIR-compliant metadata schema should have. We mainly focus on computational materials-science data and propose a constructive approach for the FAIRification of the (meta)data related to ground-state and excited-state calculations, potential-energy sampling, and generalized workflows. Finally, challenges with the FAIRification of experimental (meta)data and materials-science ontologies are presented, together with an outlook on how to meet them.
Submitted 23 August, 2023; v1 submitted 29 May, 2022;
originally announced May 2022.
Better force fields start with better data -- A data set of cation dipeptide interactions
Authors:
Xiaojuan Hu,
Maja-Olivia Lenz-Himmer,
Carsten Baldauf
Abstract:
We present a data set from a first-principles study of amino-methylated and acetylated (capped) dipeptides of the 20 proteinogenic amino acids, including alternative possible side-chain protonation states, and their interactions with selected divalent cations (Ca$^{2+}$, Mg$^{2+}$ and Ba$^{2+}$). The data covers 21,909 stationary points on the respective potential-energy surfaces in a wide relative energy range of up to 4 eV (390 kJ/mol). Relevant properties of interest, like partial charges, were derived for the conformers. The motivation was to provide a solid data basis for force field parameterization and further applications like machine learning or benchmarking. In particular, creating all this data on the same first-principles footing, i.e., density-functional theory calculations employing the generalized gradient approximation with a van der Waals correction, makes it suitable for data-driven force field development. To make the data accessible across domain borders and to machines, we formalized the metadata in an ontology.
Submitted 19 July, 2021;
originally announced July 2021.