-
Synthetic pre-training for neural-network interatomic potentials
Authors:
John L. A. Gardner,
Kathryn T. Baker,
Volker L. Deringer
Abstract:
Machine learning (ML) based interatomic potentials have transformed the field of atomistic materials modelling. However, ML potentials depend critically on the quality and quantity of quantum-mechanical reference data with which they are trained, and therefore developing datasets and training pipelines is becoming an increasingly central challenge. Leveraging the idea of "synthetic" (artificial) data that is common in other areas of ML research, we here show that synthetic atomistic data, themselves obtained at scale with an existing ML potential, constitute a useful pre-training task for neural-network interatomic potential models. Once pre-trained with a large synthetic dataset, these models can be fine-tuned on a much smaller, quantum-mechanical one, improving numerical accuracy and stability in computational practice. We demonstrate feasibility for a series of equivariant graph-neural-network potentials for carbon, and we carry out initial experiments to test the limits of the approach.
Submitted 24 July, 2023;
originally announced July 2023.
-
Coarse-grained versus fully atomistic machine learning for zeolitic imidazolate frameworks
Authors:
Zoé Faure Beaulieu,
Thomas C. Nicholas,
John L. A. Gardner,
Andrew L. Goodwin,
Volker L. Deringer
Abstract:
Zeolitic imidazolate frameworks are widely thought of as being analogous to inorganic AB$_{2}$ phases. We test the validity of this assumption by comparing simplified and fully atomistic machine-learning models for local environments in ZIFs. Our work addresses the central question to what extent chemical information can be "coarse-grained" in hybrid framework materials.
Submitted 9 May, 2023;
originally announced May 2023.
-
Synthetic data enable experiments in atomistic machine learning
Authors:
John L. A. Gardner,
Zoé Faure Beaulieu,
Volker L. Deringer
Abstract:
Machine-learning models are increasingly used to predict properties of atoms in chemical systems. There have been major advances in developing descriptors and regression frameworks for this task, typically starting from (relatively) small sets of quantum-mechanical reference data. Larger datasets of this kind are becoming available, but remain expensive to generate. Here we demonstrate the use of a large dataset that we have "synthetically" labelled with per-atom energies from an existing ML potential model. The cheapness of this process, compared to the quantum-mechanical ground truth, allows us to generate millions of datapoints, in turn enabling rapid experimentation with atomistic ML models from the small- to the large-data regime. This approach allows us here to compare regression frameworks in depth, and to explore visualisation based on learned representations. We also show that learning synthetic data labels can be a useful pre-training task for subsequent fine-tuning on small datasets. In the future, we expect that our open-sourced dataset, and similar ones, will be useful in rapidly exploring deep-learning models in the limit of abundant chemical data.
Submitted 29 November, 2022;
originally announced November 2022.
-
How to validate machine-learned interatomic potentials
Authors:
Joe D. Morrow,
John L. A. Gardner,
Volker L. Deringer
Abstract:
Machine learning (ML) approaches enable large-scale atomistic simulations with near-quantum-mechanical accuracy. With the growing availability of these methods there arises a need for careful validation, particularly for physically agnostic models - that is, for potentials which extract the nature of atomic interactions from reference data. Here, we review the basic principles behind ML potentials and their validation for atomic-scale materials modeling. We discuss best practice in defining error metrics based on numerical performance as well as physically guided validation. We give specific recommendations that we hope will be useful for the wider community, including those researchers who intend to use ML potentials for materials "off the shelf".
Submitted 28 November, 2022; v1 submitted 22 November, 2022;
originally announced November 2022.