-
How big is Big Data?
Authors:
Daniel T. Speckhard,
Tim Bechtel,
Luca M. Ghiringhelli,
Martin Kuban,
Santiago Rigamonti,
Claudia Draxl
Abstract:
Big data has ushered in a new wave of predictive power using machine learning models. In this work, we assess what "big" means in the context of typical materials-science machine-learning problems. This concerns not only data volume, but also data quality and veracity as well as infrastructure issues. With selected examples, we ask (i) how models generalize to similar datasets, (ii) how high-quality datasets can be gathered from heterogeneous sources, (iii) how the feature set and complexity of a model can affect expressivity, and (iv) what infrastructure is required to create larger datasets and train models on them. In sum, we find that big data present unique challenges along very different aspects that should serve to motivate further work.
Submitted 18 May, 2024;
originally announced May 2024.
-
Extrapolation to complete basis-set limit in density-functional theory by quantile random-forest models
Authors:
Daniel T. Speckhard,
Christian Carbogno,
Luca Ghiringhelli,
Sven Lubeck,
Matthias Scheffler,
Claudia Draxl
Abstract:
The numerical precision of density-functional-theory (DFT) calculations depends on a variety of computational parameters, one of the most critical being the basis-set size. The ultimate precision is reached with an infinitely large basis set, i.e., in the limit of a complete basis set (CBS). Our aim in this work is to find a machine-learning model that extrapolates finite basis-size calculations to the CBS limit. We start with a data set of 63 binary solids investigated with two all-electron DFT codes, exciting and FHI-aims, which employ very different types of basis sets. A quantile-random-forest model is used to estimate the total-energy correction with respect to a fully converged calculation as a function of the basis-set size. The random-forest model achieves a symmetric mean absolute percentage error below 25% for both codes and outperforms previous approaches in the literature. Our approach also provides prediction intervals, which quantify the uncertainty of the models' predictions.
Submitted 1 June, 2023; v1 submitted 26 March, 2023;
originally announced March 2023.
-
OPTIMADE, an API for exchanging materials data
Authors:
Casper W. Andersen,
Rickard Armiento,
Evgeny Blokhin,
Gareth J. Conduit,
Shyam Dwaraknath,
Matthew L. Evans,
Ádám Fekete,
Abhijith Gopakumar,
Saulius Gražulis,
Andrius Merkys,
Fawzi Mohamed,
Corey Oses,
Giovanni Pizzi,
Gian-Marco Rignanese,
Markus Scheidgen,
Leopold Talirz,
Cormac Toher,
Donald Winston,
Rossella Aversa,
Kamal Choudhary,
Pauline Colinet,
Stefano Curtarolo,
Davide Di Stefano,
Claudia Draxl,
Suleyman Er
, et al. (31 additional authors not shown)
Abstract:
The Open Databases Integration for Materials Design (OPTIMADE) consortium has designed a universal application programming interface (API) to make materials databases accessible and interoperable. We outline the first stable release of the specification, v1.0, which is already supported by many leading databases and several software packages. We illustrate the advantages of the OPTIMADE API through worked examples on each of the public materials databases that support the full API specification.
Submitted 25 August, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.