-
Guest Editorial: Special Topic on Software for Atomistic Machine Learning
Authors:
Matthias Rupp,
Emine Küçükbenli,
Gábor Csányi
Abstract:
A survey of the contributions to the Journal of Chemical Physics' Special Topic on Software for Atomistic Machine Learning.
Submitted 28 June, 2024;
originally announced June 2024.
-
Self-consistent Coulomb interactions for machine learning interatomic potentials
Authors:
Jack Thomas,
William J. Baldwin,
Gábor Csányi,
Christoph Ortner
Abstract:
A ubiquitous approach to obtain transferable machine learning-based models of potential energy surfaces for atomistic systems is to decompose the total energy into a sum of local atom-centred contributions. However, in many systems non-negligible long-range electrostatic effects must be taken into account as well. We introduce a general mathematical framework to study how such long-range effects can be included in a way that (i) allows charge equilibration and (ii) retains the locality of the learnable atom-centred contributions to ensure transferability. Our results give partial explanations for the success of existing machine learned potentials that include equilibration and provide perspectives on how to design such schemes in a systematic way. To complement the rigorous theoretical results, we describe a practical scheme for fitting the energy and electron density of water clusters.
Submitted 16 June, 2024;
originally announced June 2024.
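The local decomposition described in the abstract above can be illustrated with a minimal sketch. This is a toy model, not the framework from the paper: the pair term, the cutoff value, and the function names `site_energy` and `total_energy` are assumptions chosen only to show what "local atom-centred contributions" means, namely that each site energy depends only on neighbours inside a cutoff.

```python
import math


def site_energy(i, positions, cutoff=3.0):
    """Toy atom-centred energy: half of a pair term summed over neighbours
    within the cutoff (an illustrative functional form, not a fitted model)."""
    e = 0.0
    for j, pj in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pj)
        if r < cutoff:
            # Pair term that goes to zero at the cutoff radius.
            e += 0.5 * (1.0 / r - 1.0 / cutoff) ** 2
    return e


def total_energy(positions, cutoff=3.0):
    """Total energy as a plain sum of local site energies."""
    return sum(site_energy(i, positions, cutoff) for i in range(len(positions)))


# Two nearby atoms plus one atom far outside the cutoff.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (12.0, 0.0, 0.0)]

# Locality: moving the distant atom leaves the site energy of atom 0 unchanged.
unchanged = site_energy(0, base) == site_energy(0, moved)
print(unchanged, total_energy(base))
```

In such a purely local model the distant atom contributes nothing to its neighbours' energies, which is exactly the limitation that long-range electrostatic terms are meant to address.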
-
Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies
Authors:
Harveen Kaur,
Flaviano Della Pia,
Ilyes Batatia,
Xavier R. Advincula,
Benjamin X. Shi,
**ggang Lan,
Gábor Csányi,
Angelos Michaelides,
Venkat Kapil
Abstract:
Calculating sublimation enthalpies of molecular crystal polymorphs is relevant to a wide range of technological applications. However, predicting these quantities at first-principles accuracy -- even with the aid of machine learning potentials -- is a challenge that requires sub-kJ/mol accuracy in the potential energy surface and finite-temperature sampling. We present an accurate and data-efficient protocol based on fine-tuning of the foundational MACE-MP-0 model and showcase its capabilities on sublimation enthalpies and physical properties of ice polymorphs. Our approach requires only a few tens of training structures to achieve sub-kJ/mol accuracy in the sublimation enthalpies and sub-1% error in densities for polymorphs at finite temperature and pressure. Exploiting this data efficiency, we explore simulations of hexagonal ice at the random phase approximation level of theory at experimental temperatures and pressures, calculating its physical properties, such as the pair correlation function and density, with good agreement with experiments. Our approach provides a way forward for predicting the stability of molecular crystals at finite thermodynamic conditions with the accuracy of correlated electronic structure theory.
Submitted 30 May, 2024;
originally announced May 2024.
-
Computing hydration free energies of small molecules with first principles accuracy
Authors:
J. Harry Moore,
Daniel J. Cole,
Gabor Csanyi
Abstract:
Free energies play a central role in characterising the behaviour of chemical systems and are among the most important quantities that can be calculated by molecular dynamics simulations. The free energy of hydration in particular is a well-studied physicochemical property of drug-like molecules and is commonly used to assess and optimise the accuracy of nonbonded parameters in empirical forcefields, and as a fast-to-compute surrogate of performance for protein-ligand binding free energy estimation. Machine learned potentials (MLPs) show great promise as more accurate alternatives to empirical forcefields, but are not readily decomposed into physically motivated functional forms, which has thus far rendered them incompatible with standard alchemical free energy methods that manipulate individual pairwise interaction terms. However, since the accuracy of free energy calculations is highly sensitive to the forcefield, this is a key area in which MLPs have the potential to address the shortcomings of empirical forcefields. In this work, we introduce an efficient alchemical free energy method compatible with MLPs, enabling, for the first time, calculations of biomolecular free energy with ab initio accuracy. Using a pretrained, transferable, alchemically equipped MACE model, we demonstrate sub-chemical accuracy for the hydration free energies of organic molecules.
Submitted 21 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Benchmarking of machine learning interatomic potentials for reactive hydrogen dynamics at metal surfaces
Authors:
Wojciech G. Stark,
Cas van der Oord,
Ilyes Batatia,
Yaolong Zhang,
Bin Jiang,
Gábor Csányi,
Reinhard J. Maurer
Abstract:
Simulations of chemical reaction probabilities in gas-surface dynamics require the calculation of ensemble averages over many tens of thousands of reaction events to predict dynamical observables that can be compared to experiments. At the same time, the energy landscapes need to be accurately mapped, as small errors in barriers can lead to large deviations in reaction probabilities. This brings a particularly interesting challenge for machine learning interatomic potentials, which are becoming well-established tools to accelerate molecular dynamics simulations. We compare state-of-the-art machine learning interatomic potentials with a particular focus on their inference performance on CPUs and suitability for high throughput simulation of reactive chemistry at surfaces. The considered models include polarizable atom interaction neural networks (PaiNN), recursively embedded atom neural networks (REANN), the MACE equivariant graph neural network, and atomic cluster expansion potentials (ACE). The models are applied to a dataset on reactive molecular hydrogen scattering on low-index surface facets of copper. All models are assessed for their accuracy, time-to-solution, and ability to simulate reactive sticking probabilities as a function of the rovibrational initial state and kinetic incidence energy of the molecule. REANN and MACE models provide the best balance between accuracy and time-to-solution and can be considered the current state-of-the-art in gas-surface dynamics. PaiNN models require many features for the best accuracy, which causes significant losses in computational efficiency. ACE models provide the fastest time-to-solution; however, models trained on the existing dataset were not able to achieve sufficiently accurate predictions in all cases.
Submitted 22 March, 2024;
originally announced March 2024.
-
Accurate Crystal Structure Prediction of New 2D Hybrid Organic Inorganic Perovskites
Authors:
Nima Karimitari,
William J. Baldwin,
Evan W. Muller,
Zachary J. L. Bare,
W. Joshua Kennedy,
Gábor Csányi,
Christopher Sutton
Abstract:
Low dimensional hybrid organic-inorganic perovskites (HOIPs) represent a promising class of electronically active materials for both light absorption and emission. The design space of HOIPs is extremely large, since a diverse space of organic cations can be combined with different inorganic frameworks. This immense design space allows for tunable electronic and mechanical properties, but also necessitates the development of new tools for in silico high throughput analysis of candidate structures. In this work, we present an accurate, efficient, transferable and widely applicable machine learning interatomic potential (MLIP) for predicting the structure of new 2D HOIPs. Using the MACE architecture, an MLIP is trained on 86 diverse experimentally reported HOIP structures. The model is tested on 73 unseen perovskite compositions, and achieves chemical accuracy with respect to the reference electronic structure method. Our model is then combined with a simple random structure search algorithm to predict the structure of hypothetical HOIPs given only the proposed composition. Success is demonstrated by correctly and reliably recovering the crystal structure of a set of experimentally known 2D perovskites. Such a random structure search is impossible with ab initio methods due to the associated computational cost, but is relatively inexpensive with the MACE potential. Finally, the procedure is used to predict the structure formed by a new organic cation with no previously known corresponding perovskite. Laboratory synthesis of the new hybrid perovskite confirms the accuracy of our prediction. This capability, applied at scale, enables efficient screening of thousands of combinations of organic cations and inorganic layers.
Submitted 11 March, 2024;
originally announced March 2024.
-
Zero Shot Molecular Generation via Similarity Kernels
Authors:
Rokas Elijošius,
Fabian Zills,
Ilyes Batatia,
Sam Walton Norwood,
Dávid Péter Kovács,
Christian Holm,
Gábor Csányi
Abstract:
Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, allowing the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during the generation the score resembles a restorative potential initially and a quantum-mechanical force at the end. In between the two endpoints, it exhibits special properties that enable the building of large molecules. Using insights from the trained model, we present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation. SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules without any further training. Our approach allows full control over the molecular shape through point cloud priors and supports conditional generation. We also release an interactive web tool that allows users to generate structures with SiMGen online (https://zndraw.icp.uni-stuttgart.de).
Submitted 13 February, 2024;
originally announced February 2024.
-
Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials
Authors:
Ivan Grega,
Ilyes Batatia,
Gábor Csányi,
Sri Karlapati,
Vikram S. Deshpande
Abstract:
Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a large dataset of structure-property relationships for strut-based lattices. The dataset is made available to the community, which can fuel the development of methods anchored in physical principles for the fitting of fourth-order tensors. In addition, we present a higher-order GNN model trained on this dataset. The key features of the model are (i) SE(3) equivariance, and (ii) consistency with the thermodynamic law of conservation of energy. We compare the model to non-equivariant models based on a number of error metrics and demonstrate its benefits in terms of predictive performance and reduced training requirements. Finally, we demonstrate an example application of the model to an architected material design task. The methods we developed are applicable to fourth-order tensors beyond elasticity, such as the piezo-optical tensor.
Submitted 20 March, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
A foundation model for atomistic materials chemistry
Authors:
Ilyes Batatia,
Philipp Benner,
Yuan Chiang,
Alin M. Elena,
Dávid P. Kovács,
Janosh Riebesell,
Xavier R. Advincula,
Mark Asta,
Matthew Avaylon,
William J. Baldwin,
Fabian Berger,
Noam Bernstein,
Arghya Bhowmik,
Samuel M. Blau,
Vlad Cărare,
James P. Darby,
Sandip De,
Flaviano Della Pia,
Volker L. Deringer,
Rokas Elijošius,
Zakariya El-Machachi,
Fabio Falcioni,
Edvin Fako,
Andrea C. Ferrari,
Annalena Genreith-Schriever
, et al. (51 additional authors not shown)
Abstract:
Machine-learned force fields have transformed the atomistic modelling of materials by enabling simulations of ab initio quality on unprecedented time and length scales. However, they are currently limited by: (i) the significant computational and human effort that must go into development and validation of potentials for each particular system of interest; and (ii) a general lack of transferability from one chemical system to the next. Here, using the state-of-the-art MACE architecture we introduce a single general-purpose ML model, trained on a public database of 150k inorganic crystals, that is capable of running stable molecular dynamics on molecules and materials. We demonstrate the power of the MACE-MP-0 model - and its qualitative and at times quantitative accuracy - on a diverse set of problems in the physical sciences, including the properties of solids, liquids, gases, chemical reactions, interfaces and even the dynamics of a small protein. The model can be applied out of the box and as a starting or "foundation model" for any atomistic system of interest and is thus a step towards democratising the revolution of ML force fields by lowering the barriers to entry.
Submitted 1 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
MACE-OFF23: Transferable Machine Learning Force Fields for Organic Molecules
Authors:
Dávid Péter Kovács,
J. Harry Moore,
Nicholas J. Browning,
Ilyes Batatia,
Joshua T. Horton,
Venkat Kapil,
William C. Witt,
Ioan-Bogdan Magdău,
Daniel J. Cole,
Gábor Csányi
Abstract:
Classical empirical force fields have dominated biomolecular simulation for over 50 years. Although widely used in drug discovery, crystal structure prediction, and biomolecular dynamics, they generally lack the accuracy and transferability required for predictive modelling. In this paper, we introduce MACE-OFF23, a transferable force field for organic molecules created using state-of-the-art machine learning technology and first-principles reference data computed with a high level of quantum mechanical theory. MACE-OFF23 demonstrates the remarkable capabilities of local, short-range models by accurately predicting a wide variety of gas and condensed phase properties of molecular systems. It produces accurate, easy-to-converge dihedral torsion scans of unseen molecules, as well as reliable descriptions of molecular crystals and liquids, including quantum nuclear effects. We further demonstrate the capabilities of MACE-OFF23 by determining free energy surfaces in explicit solvent, as well as the folding dynamics of peptides. Finally, we simulate a fully solvated small protein, observing accurate secondary structure and vibrational spectrum. These developments enable first-principles simulations of molecular systems for the broader chemistry community at high accuracy and low computational cost.
Submitted 29 December, 2023; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Equivariant Matrix Function Neural Networks
Authors:
Ilyes Batatia,
Lars L. Schaaf,
Huajie Chen,
Gábor Csányi,
Christoph Ortner,
Felix A. Faber
Abstract:
Graph Neural Networks (GNNs), especially message-passing neural networks (MPNNs), have emerged as powerful architectures for learning on graphs in diverse applications. However, MPNNs face challenges when modeling non-local interactions in graphs such as large conjugated molecules and social networks, due to oversmoothing and oversquashing. Although Spectral GNNs and traditional neural networks such as recurrent neural networks and transformers mitigate these challenges, they often lack generalizability, or fail to capture detailed structural relationships or symmetries in the data. To address these concerns, we introduce Matrix Function Neural Networks (MFNs), a novel architecture that parameterizes non-local interactions through analytic matrix equivariant functions. Employing resolvent expansions offers a straightforward implementation and the potential for linear scaling with system size. The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
Submitted 30 January, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Gaussian Approximation Potentials: theory, software implementation and application examples
Authors:
Sascha Klawohn,
Gábor Csányi,
James P. Darby,
James R. Kermode,
Miguel A. Caro,
Albert P. Bartók
Abstract:
Gaussian Approximation Potentials are a class of Machine Learned Interatomic Potentials routinely used to model materials and molecular systems on the atomic scale. The software implementation provides the means for both fitting models using ab initio data and using the resulting potentials in atomic simulations. Details of the GAP theory, algorithms and software are presented, together with detailed usage examples to help new and existing users. We review some recent developments to the GAP framework, including MPI parallelisation of the fitting code enabling its use on thousands of CPU cores and compression of descriptors to eliminate the poor scaling with the number of different chemical elements.
Submitted 5 October, 2023;
originally announced October 2023.
-
ACEpotentials.jl: A Julia Implementation of the Atomic Cluster Expansion
Authors:
William C. Witt,
Cas van der Oord,
Elena Gelžinytė,
Teemu Järvinen,
Andres Ross,
James P. Darby,
Cheuk Hin Ho,
William J. Baldwin,
Matthias Sachs,
James Kermode,
Noam Bernstein,
Gábor Csányi,
Christoph Ortner
Abstract:
We introduce ACEpotentials.jl, a Julia-language software package that constructs interatomic potentials from quantum mechanical reference data using the Atomic Cluster Expansion (Drautz, 2019). As the latter provides a complete description of atomic environments, including invariance to overall translation and rotation as well as permutation of like atoms, the resulting potentials are systematically improvable and data efficient. Furthermore, the descriptor's expressiveness enables use of a linear model, facilitating rapid evaluation and straightforward application of Bayesian techniques for active learning. We summarize the capabilities of ACEpotentials.jl and demonstrate its strengths (simplicity, interpretability, robustness, performance) on a selection of prototypical atomistic modelling workflows.
Submitted 7 September, 2023; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Machine learning of microscopic structure-dynamics relationships in complex molecular systems
Authors:
Martina Crippa,
Annalisa Cardellini,
Matteo Cioni,
Gábor Csányi,
Giovanni M. Pavan
Abstract:
In many complex molecular systems, the macroscopic ensemble's properties are controlled by microscopic dynamic events (or fluctuations) that are often difficult to detect via pattern-recognition approaches. Discovering the relationships between local structural environments and the dynamical events originating from them would allow unveiling microscopic level structure-dynamics relationships fundamental to understanding the macroscopic behavior of complex systems. Here we show that, by coupling advanced structural descriptors (e.g., Smooth Overlap of Atomic Positions, SOAP) with local dynamical descriptors (e.g., Local Environment and Neighbor Shuffling, LENS) in a unique dataset, it is possible to improve both individual SOAP- and LENS-based analyses, obtaining a more complete characterization of the system under study. As representative examples, we use various molecular systems with diverse internal structural dynamics. On the one hand, we demonstrate how the combination of structural and dynamical descriptors facilitates decoupling relevant dynamical fluctuations from noise, overcoming the intrinsic limits of the individual analyses. Furthermore, machine learning approaches also allow extracting from such a combined structural/dynamical dataset useful microscopic-level relationships, relating key local dynamical events (e.g., LENS fluctuations) occurring in the systems to the local structural (SOAP) environments they originate from. Given its abstract nature, we believe that such an approach will be useful in revealing hidden microscopic structure-dynamics relationships fundamental to rationalizing the behavior of a variety of complex systems, not necessarily limited to the atomistic and molecular scales.
Submitted 31 August, 2023;
originally announced August 2023.
-
Efficiency, Accuracy, and Transferability of Machine Learning Potentials: Application to Dislocations and Cracks in Iron
Authors:
Lei Zhang,
Gábor Csányi,
Erik van der Giessen,
Francesco Maresca
Abstract:
Machine learning interatomic potentials (ML-IAPs) enable quantum-accurate, classical molecular dynamics simulations of large systems, beyond the reach of density functional theory (DFT). Yet, their efficiency and ability to predict systems larger than DFT supercells are not fully explored, posing a question regarding transferability to large-scale simulations with defects (e.g. dislocations, cracks). Here, we apply a three-step validation approach to body-centered-cubic iron. First, accuracy and efficiency are assessed by optimizing ML-IAPs based on four state-of-the-art ML packages. The Pareto front of computational speed versus testing root-mean-square-error (RMSE) is computed. Second, benchmark properties relevant to plasticity and fracture are evaluated. Their average relative error Q with respect to DFT is found to correlate with RMSE. Third, transferability of ML-IAPs to dislocations and cracks is investigated by using per-atom model uncertainty quantification. The core structures and Peierls barriers of screw, M111 and three edge dislocations are compared with DFT. The traction-separation curve and critical stress intensity factor (K_Ic) are also predicted. Cleavage on the pre-existing crack plane is found to be the zero-temperature atomistic fracture mechanism of pure body-centered-cubic iron under mode-I loading, independent of ML package and training database. Quantitative predictions of dislocation glide paths and K_Ic can be sensitive to the database, ML package, and cutoff radius, and are limited by DFT accuracy. Our results highlight the importance of validating ML-IAPs by using indicators beyond RMSE. Moreover, significant computational speed-ups can be achieved by using the most efficient ML-IAP package, yet the assessment of the accuracy and transferability should be performed with care.
Submitted 6 November, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
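The Pareto front of computational speed versus test RMSE mentioned in the abstract above can be illustrated with a minimal sketch. The model names, costs, and errors below are invented placeholders and are not results from the paper; the `Model` type and `pareto_front` function are hypothetical helpers. A model is kept if no other model is at least as cheap and at least as accurate while being strictly better in one of the two.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float  # hypothetical cost per atom per MD step (lower is better)
    rmse: float  # hypothetical test RMSE (lower is better)


def pareto_front(models):
    """Return the models that are not dominated in (cost, rmse), sorted by cost."""
    front = []
    for m in models:
        dominated = any(
            o.cost <= m.cost
            and o.rmse <= m.rmse
            and (o.cost < m.cost or o.rmse < m.rmse)
            for o in models
        )
        if not dominated:
            front.append(m)
    return sorted(front, key=lambda m: m.cost)


# Invented placeholder values, not taken from the paper.
candidates = [
    Model("A", cost=1.0, rmse=40.0),
    Model("B", cost=5.0, rmse=20.0),
    Model("C", cost=6.0, rmse=25.0),  # slower and less accurate than B
    Model("D", cost=50.0, rmse=8.0),
]

front_names = [m.name for m in pareto_front(candidates)]
print(front_names)
```

Here model C is dropped because B is both cheaper and more accurate; the remaining models each represent a different trade-off between speed and error.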
-
wfl Python Toolkit for Creating Machine Learning Interatomic Potentials and Related Atomistic Simulation Workflows
Authors:
Elena Gelžinytė,
Simon Wengert,
Tamás K. Stenczel,
Hendrik H. Heenen,
Karsten Reuter,
Gábor Csányi,
Noam Bernstein
Abstract:
Predictive atomistic simulations are increasingly employed for data intensive high throughput studies that take advantage of constantly growing computational resources. To handle the sheer number of individual calculations that are needed in such studies, workflow management packages for atomistic simulations have been developed for a rapidly growing user base. These packages are predominantly designed to handle computationally heavy ab initio calculations, usually with a focus on data provenance and reproducibility. However, in related simulation communities, e.g. the developers of machine learning interatomic potentials (MLIPs), the computational requirements are somewhat different: the types, sizes, and numbers of computational tasks are more diverse, and therefore require additional ways of parallelization and local or remote execution for optimal efficiency. In this work, we present the atomistic simulation and MLIP fitting workflow management package wfl and Python remote execution package ExPyRe to meet these requirements. With wfl and ExPyRe, versatile Atomic Simulation Environment based workflows that perform diverse procedures can be written. This capability is based on a low-level developer-oriented framework, which can be utilized to construct high level functionality for user-friendly programs. Such high level capabilities to automate machine learning interatomic potential fitting procedures are already incorporated in wfl, which we use to showcase its capabilities in this work. We believe that wfl fills an important niche in several growing simulation communities and will aid the development of efficient custom computational tasks.
Submitted 1 August, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Evaluation of the MACE Force Field Architecture: from Medicinal Chemistry to Materials Science
Authors:
David Peter Kovacs,
Ilyes Batatia,
Eszter Sara Arany,
Gabor Csanyi
Abstract:
The MACE architecture represents the state of the art in the field of machine learning force fields for a variety of in-domain, extrapolation and low-data regime tasks. In this paper, we further evaluate MACE by fitting models for published benchmark datasets. We show that MACE generally outperforms alternatives for a wide range of systems from amorphous carbon, universal materials modelling, and general small molecule organic chemistry to large molecules and liquid water. We demonstrate the capabilities of the model on tasks ranging from constrained geometry optimisation to molecular dynamics simulations and find excellent performance across all tested domains. We show that MACE is very data efficient, and can reproduce experimental molecular vibrational spectra when trained on as few as 50 randomly selected reference configurations. We further demonstrate that the strictly local atom-centered model is sufficient for such tasks even in the case of large molecules and weakly interacting molecular assemblies.
Submitted 2 July, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Structural Dynamics Descriptors for Metal Halide Perovskites
Authors:
Xia Liang,
Johan Klarbring,
William Baldwin,
Zhenzhu Li,
Gábor Csányi,
Aron Walsh
Abstract:
Metal halide perovskites have shown extraordinary performance in solar energy conversion technologies. They have been classified as "soft semiconductors" due to their flexible corner-sharing octahedral networks and polymorphous nature. Understanding the local and average structures continues to be challenging for both modelling and experiments. Here, we report the quantitative analysis of structural dynamics in time and space from molecular dynamics simulations of perovskite crystals. The compact descriptors provided cover a wide variety of structural properties, including octahedral tilting and distortion, local lattice parameters, molecular orientations, as well as their spatial correlation. To validate our methods, we have trained a machine learning force field (MLFF) for methylammonium lead bromide (CH$_3$NH$_3$PbBr$_3$) using an on-the-fly training approach with Gaussian process regression. The known stable phases are reproduced and we find an additional symmetry-breaking effect in the cubic and tetragonal phases close to the phase transition temperature. To test the implementation for large trajectories, we also apply it to 69,120 atom simulations for CsPbI$_3$ based on an MLFF developed using the atomic cluster expansion formalism. The structural dynamics descriptors and Python toolkit are general to perovskites and readily transferable to more complex compositions.
Submitted 23 July, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Dynamic Local Structure in Caesium Lead Iodide: Spatial Correlation and Transient Domains
Authors:
William Baldwin,
Xia Liang,
Johan Klarbring,
Milos Dubajic,
David Dell'Angelo,
Christopher Sutton,
Claudia Caddeo,
Samuel D. Stranks,
Alessandro Mattoni,
Aron Walsh,
Gábor Csányi
Abstract:
Metal halide perovskites are multifunctional semiconductors with tunable structures and properties. They are highly dynamic crystals with complex octahedral tilting patterns and strongly anharmonic atomic behaviour. In the higher temperature, higher symmetry phases of these materials, several complex structural features have been observed. The local structure can differ greatly from the average structure and there is evidence that dynamic two-dimensional structures of correlated octahedral motion form. An understanding of the underlying complex atomistic dynamics is, however, still lacking. In this work, the local structure of the inorganic perovskite CsPbI$_3$ is investigated using a new machine learning force field based on the atomic cluster expansion framework. Through analysis of the temporal and spatial correlation observed during large-scale simulations, we reveal that the low frequency motion of octahedral tilts implies a double-well effective potential landscape, even well into the cubic phase. Moreover, dynamic local regions of lower symmetry are present within both higher symmetry phases. These regions are planar and we report the length and timescales of the motion. Finally, we investigate and visualise the spatial arrangement of these features and their interactions, providing a comprehensive picture of local structure in the higher symmetry phases.
Submitted 11 April, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Accurate Energy Barriers for Catalytic Reaction Pathways: An Automatic Training Protocol for Machine Learning Force Fields
Authors:
Lars Schaaf,
Edvin Fako,
Sandip De,
Ansgar Schäfer,
Gábor Csányi
Abstract:
In this study, we introduce a training protocol for developing machine learning force fields (MLFFs), capable of accurately determining energy barriers in catalytic reaction pathways. The protocol is validated on the extensively explored hydrogenation of carbon dioxide to methanol over indium oxide. With the help of active learning, the final force field obtains energy barriers within 0.05 eV of Density Functional Theory. Thanks to the computational speedup, not only do we reduce the cost of routine in-silico catalytic tasks, but also find a 40% reduction in the previously established rate-limiting step. Furthermore, we illustrate the importance of finite-temperature effects and compute free energy barriers. The transferability of the protocol is demonstrated on the experimentally relevant, yet unexplored, top-layer reduced indium oxide surface. The ability of MLFFs to enhance our understanding of extensively studied catalysts underscores the need for fast and accurate alternatives to direct ab initio simulations.
Submitted 22 August, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
Hyperactive Learning (HAL) for Data-Driven Interatomic Potentials
Authors:
Cas van der Oord,
Matthias Sachs,
Dávid Péter Kovács,
Christoph Ortner,
Gábor Csányi
Abstract:
Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab initio potential energy surfaces that are able to reliably predict macroscopic properties with experimental accuracy. In generating accurate and transferable potentials the most time-consuming and arguably most important task is generating the training set, which still requires significant expert user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework for formulating an accelerated sampling algorithm specifically for the task of training database generation. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics) and add a biasing term that drives the system towards high uncertainty and thus to unseen training configurations. Building on this framework, general protocols for building training databases for alloys and polymers leveraging the HAL framework will be presented. For alloys, ACE potentials for AlSi10 are created by fitting to a minimal HAL-generated database containing 88 configurations (32 atoms each) with fast evaluation times of <100 microsecond/atom/cpu-core. These potentials are demonstrated to predict the melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE, able to determine the density of a long polyethylene glycol (PEG) polymer formed of 200 monomer units with experimental accuracy by only fitting to small isolated PEG polymers with sizes ranging from 2 to 32.
Submitted 7 November, 2022; v1 submitted 9 October, 2022;
originally announced October 2022.
-
Tensor-reduced atomic density representations
Authors:
James P. Darby,
Dávid P. Kovács,
Ilyes Batatia,
Miguel A. Caro,
Gus L. W. Hart,
Christoph Ortner,
Gábor Csányi
Abstract:
Density based representations of atomic environments that are invariant under Euclidean symmetries have become a widely used tool in the machine learning of interatomic potentials, broader data-driven atomistic modelling and the visualisation and analysis of materials datasets. The standard mechanism used to incorporate chemical element information is to create separate densities for each element and form tensor products between them. This leads to a steep scaling in the size of the representation as the number of elements increases. Graph neural networks, which do not explicitly use density representations, escape this scaling by mapping the chemical element information into a fixed dimensional space in a learnable way. We recast this approach as tensor factorisation by exploiting the tensor structure of standard neighbour density based descriptors. In doing so, we form compact tensor-reduced representations whose size does not depend on the number of chemical elements, but remain systematically convergeable and are therefore applicable to a wide range of data analysis and regression tasks.
Submitted 6 December, 2022; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Atomistic fracture in bcc iron revealed by active learning of Gaussian approximation potential
Authors:
Lei Zhang,
Gábor Csányi,
Erik van der Giessen,
Francesco Maresca
Abstract:
The prediction of atomistic fracture mechanisms in body-centred cubic (bcc) iron is essential for understanding its semi-brittle nature. Existing atomistic simulations of the crack-tip deformation mechanisms under mode-I loading based on classical interatomic potentials yield contradicting predictions. To enable fracture prediction with quantum accuracy, we develop a Gaussian approximation potential (GAP) using an active learning strategy by extending a density functional theory (DFT) database of ferromagnetic bcc iron. We apply the active learning algorithm and obtain a Fe GAP model with a maximum predicted error of 8 meV/atom over a broad range of stress intensity factors (SIFs) and for four crack systems. The learning efficiency of the approach is analysed, and the predicted critical SIFs are compared with Griffith and Rice theories. The simulations reveal that cleavage along the original crack plane is the crack tip mechanism for {100} and {110} crack planes at T=0K, thus settling a long-standing dispute. Our work also highlights the need for a multiscale approach to predicting fracture and intrinsic ductility, whereby finite temperature, finite loading rate effects and pre-existing defects (e.g. nanovoids, dislocations) should be taken explicitly into account.
Submitted 14 September, 2022; v1 submitted 11 August, 2022;
originally announced August 2022.
-
MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields
Authors:
Ilyes Batatia,
Dávid Péter Kovács,
Gregor N. C. Simm,
Christoph Ortner,
Gábor Csányi
Abstract:
Creating fast and accurate force fields is a long-standing challenge in computational chemistry and materials science. Recently, several equivariant message passing neural networks (MPNNs) have been shown to outperform models built using other approaches in terms of accuracy. However, most MPNNs suffer from high computational cost and poor scalability. We propose that these limitations arise because MPNNs only pass two-body messages leading to a direct relationship between the number of layers and the expressivity of the network. In this work, we introduce MACE, a new equivariant MPNN model that uses higher body order messages. In particular, we show that using four-body messages reduces the required number of message passing iterations to just two, resulting in a fast and highly parallelizable model, reaching or exceeding state-of-the-art accuracy on the rMD17, 3BPA, and AcAc benchmark tasks. We also demonstrate that using higher order messages leads to an improved steepness of the learning curves.
Submitted 26 January, 2023; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Nested sampling for physical scientists
Authors:
Greg Ashton,
Noam Bernstein,
Johannes Buchner,
Xi Chen,
Gábor Csányi,
Andrew Fowlie,
Farhan Feroz,
Matthew Griffiths,
Will Handley,
Michael Habeck,
Edward Higson,
Michael Hobson,
Anthony Lasenby,
David Parkinson,
Livia B. Pártay,
Matthew Pitkin,
Doris Schneider,
Joshua S. Speagle,
Leah South,
John Veitch,
Philipp Wacker,
David J. Wales,
David Yallup
Abstract:
We review Skilling's nested sampling (NS) algorithm for Bayesian inference and more broadly multi-dimensional integration. After recapitulating the principles of NS, we survey developments in implementing efficient NS algorithms in practice in high-dimensions, including methods for sampling from the so-called constrained prior. We outline the ways in which NS may be applied and describe the application of NS in three scientific fields in which the algorithm has proved to be useful: cosmology, gravitational-wave astronomy, and materials science. We close by making recommendations for best practice when using NS and by summarizing potential limitations and optimizations of NS.
Submitted 31 May, 2022;
originally announced May 2022.
-
Multilayer atomic cluster expansion for semi-local interactions
Authors:
Anton Bochkarev,
Yury Lysogorskiy,
Christoph Ortner,
Gábor Csányi,
Ralf Drautz
Abstract:
Traditionally, interatomic potentials assume local bond formation supplemented by long-range electrostatic interactions when necessary. This ignores intermediate range multi-atom interactions that arise from the relaxation of the electronic structure. Here, we present the multilayer atomic cluster expansion (ml-ACE) that includes collective, semi-local multi-atom interactions naturally within its remit. We demonstrate that ml-ACE significantly improves fit accuracy compared to a local expansion on selected examples and provide physical intuition to understand this improvement.
Submitted 17 May, 2022;
originally announced May 2022.
-
The Design Space of E(3)-Equivariant Atom-Centered Interatomic Potentials
Authors:
Ilyes Batatia,
Simon Batzner,
Dávid Péter Kovács,
Albert Musaelian,
Gregor N. C. Simm,
Ralf Drautz,
Christoph Ortner,
Boris Kozinsky,
Gábor Csányi
Abstract:
The rapid progress of machine learning interatomic potentials over the past couple of years produced a number of new architectures. Particularly notable among these are the Atomic Cluster Expansion (ACE), which unified many of the earlier ideas around atom density-based descriptors, and Neural Equivariant Interatomic Potentials (NequIP), a message passing neural network with equivariant features that showed state of the art accuracy. In this work, we construct a mathematical framework that unifies these models: ACE is generalised so that it can be recast as one layer of a multi-layer architecture. From another point of view, the linearised version of NequIP is understood as a particular sparsification of a much larger polynomial model. Our framework also provides a practical tool for systematically probing different choices in the unified design space. We demonstrate this by an ablation study of NequIP via a set of experiments looking at in- and out-of-domain accuracy and smooth extrapolation very far from the training data, and shed some light on which design choices are critical for achieving high accuracy. Finally, we present BOTNet (Body-Ordered-Tensor-Network), a much-simplified version of NequIP, which has an interpretable architecture and maintains accuracy on benchmark datasets.
Submitted 24 November, 2022; v1 submitted 13 May, 2022;
originally announced May 2022.
-
Comment on "Manifolds of quasi-constant SOAP and ACSF fingerprints and the resulting failure to machine learn four body interactions"
Authors:
Sergey N. Pozdnyakov,
Michael J. Willatt,
Albert P. Bartók,
Christoph Ortner,
Gábor Csányi,
Michele Ceriotti
Abstract:
The "quasi-constant" SOAP and ACSF fingerprint manifolds recently discovered by Parsaeifard and Goedecker are a direct consequence of the presence of degenerate pairs of configurations, a known shortcoming of all low-body-order atom-density correlation representations of molecular structures. Contrary to the configurations that are rigorously singular -- that we demonstrate can only occur in finite, discrete sets -- the continuous "quasi-constant" manifolds exhibit low, but non-zero, sensitivity to atomic displacements. Thus, it is possible to build interpolative machine-learning models of high-order interactions along the manifold, even though the numerical instabilities associated with proximity to the exact singularities affect the accuracy and transferability of such models, to an extent that depends on numerical details of the implementation.
Submitted 2 May, 2022;
originally announced May 2022.
-
Compressing local atomic neighbourhood descriptors
Authors:
James P. Darby,
James R. Kermode,
Gábor Csányi
Abstract:
Many atomic descriptors are currently limited by their unfavourable scaling with the number of chemical elements $S$; e.g., the length of body-ordered descriptors, such as the Smooth Overlap of Atomic Positions (SOAP) power spectrum (3-body) and the Atomic Cluster Expansion (ACE) (multiple body-orders), scales as $(NS)^ν$ where $ν+1$ is the body-order and $N$ is the number of radial basis functions used in the density expansion. We introduce two distinct approaches which can be used to overcome this scaling for the SOAP power spectrum. Firstly, we show that the power spectrum is amenable to lossless compression with respect to both $S$ and $N$, so that the descriptor length can be reduced from $\mathcal{O}(N^2S^2)$ to $\mathcal{O}\left(NS\right)$. Secondly, we introduce a generalized SOAP kernel, where compression is achieved through the use of the total, element agnostic density, in combination with radial projection. The ideas used in the generalized kernel are equally applicable to any other body-ordered descriptors and we demonstrate this for the Atom Centered Symmetry Functions (ACSF). Finally, both compression approaches are shown to offer comparable performance to the original descriptor across a variety of numerical tests.
Submitted 24 December, 2021;
originally announced December 2021.
-
Local invertibility and sensitivity of atomic structure-feature mappings
Authors:
Sergey N. Pozdnyakov,
Liwei Zhang,
Christoph Ortner,
Gábor Csányi,
Michele Ceriotti
Abstract:
The increasingly common applications of machine-learning schemes to atomic-scale simulations have triggered efforts to better understand the mathematical properties of the mapping between the Cartesian coordinates of the atoms and the variety of representations that can be used to convert them into a finite set of symmetric descriptors or features. Here, we analyze the sensitivity of the mapping to atomic displacements, showing that the combination of symmetry and smoothness leads to mappings that have singular points at which the Jacobian has one or more null singular values (besides those corresponding to infinitesimal translations and rotations). This is in fact desirable, because it enforces physical symmetry constraints on the values predicted by regression models constructed using such representations. However, besides these symmetry-induced singularities, there are also spurious singular points, that we find to be linked to the incompleteness of the mapping, i.e. the fact that, for certain classes of representations, structurally distinct configurations are not guaranteed to be mapped onto different feature vectors. Additional singularities can be introduced by a too aggressive truncation of the infinite basis set that is used to discretize the representations.
Submitted 23 September, 2021;
originally announced September 2021.
-
A Gaussian Approximation Potential for Amorphous Si:H
Authors:
Davis Unruh,
Reza Vatan Meidanshahi,
Stephen M. Goodnick,
Gábor Csányi,
Gergely T. Zimányi
Abstract:
Hydrogenation of amorphous silicon (a-Si:H) is critical for reducing defect densities, passivating mid-gap states and surfaces, and improving photoconductivity in silicon-based electro-optical devices. Modelling the atomic scale structure of this material is critical to understanding these processes, which in turn is needed to describe c-Si/a-Si:H heterojunctions that are at the heart of the modern solar cells with world record efficiency. Density functional theory (DFT) studies achieve the required high accuracy but are limited to moderate system sizes of a hundred atoms or so by their high computational cost. Simulations of amorphous materials in particular have been hindered by this high cost because large structural models are required to capture the medium range order that is characteristic of such materials. Empirical potential models are much faster, but their accuracy is not sufficient to correctly describe the frustrated local structure. Data driven, "machine learned" interatomic potentials have broken this impasse, and have been highly successful in describing a variety of amorphous materials in their elemental phase. Here we extend the Gaussian approximation potential (GAP) for silicon by incorporating the interaction with hydrogen, thereby significantly improving the degree of realism with which amorphous silicon can be modelled. We show that our Si:H GAP enables the simulation of hydrogenated silicon with an accuracy very close to DFT, but with computational expense and run times reduced by several orders of magnitude for large structures. We demonstrate the capabilities of the Si:H GAP by creating models of hydrogenated liquid and amorphous silicon, and showing that their energies, forces and stresses are in excellent agreement with DFT results, and their structure as captured by bond and angle distributions, with both DFT and experiments.
Submitted 5 January, 2022; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Machine learning force fields based on local parametrization of dispersion interactions: Application to the phase diagram of C$_{60}$
Authors:
Heikki Muhli,
Xi Chen,
Albert P. Bartók,
Patricia Hernández-León,
Gábor Csányi,
Tapio Ala-Nissila,
Miguel A. Caro
Abstract:
We present a comprehensive methodology to enable addition of van der Waals (vdW) corrections to machine learning (ML) atomistic force fields. Using a Gaussian approximation potential (GAP) [Bartók et al., Phys. Rev. Lett. 104, 136403 (2010)] as baseline, we accurately machine learn a local model of atomic polarizabilities based on Hirshfeld volume partitioning of the charge density [Tkatchenko and Scheffler, Phys. Rev. Lett. 102, 073005 (2009)]. These environment-dependent polarizabilities are then used to parametrize a screened London-dispersion approximation to the vdW interactions. Our ML vdW model only needs to learn the charge density partitioning implicitly, by learning the reference Hirshfeld volumes from density functional theory (DFT). In practice, we can predict accurate Hirshfeld volumes from the knowledge of the local atomic environment (atomic positions) alone, making the model highly computationally efficient. For additional efficiency, our ML model of atomic polarizabilities reuses the same many-body atomic descriptors used for the underlying GAP learning of bonded interatomic interactions. We also show how the method enables straightforward computation of gradients of the observables, even when these remain challenging for the reference method (e.g., calculating gradients of the Hirshfeld volumes in DFT). Finally, we demonstrate the approach by studying the phase diagram of C$_{60}$, where vdW effects are important. The need for a highly accurate vdW-inclusive reactive force field is highlighted by modeling the decomposition of the C$_{60}$ molecules taking place at high pressures and temperatures.
Submitted 10 August, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Ranking the information content of distance measures
Authors:
Aldo Glielmo,
Claudio Zeni,
Bingqing Cheng,
Gabor Csanyi,
Alessandro Laio
Abstract:
Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative representations of atomic structures, but its potential applications are wide ranging in many branches of science.
Submitted 25 May, 2022; v1 submitted 30 April, 2021;
originally announced April 2021.
-
Performant implementation of the atomic cluster expansion (PACE): Application to copper and silicon
Authors:
Yury Lysogorskiy,
Cas van der Oord,
Anton Bochkarev,
Sarath Menon,
Matteo Rinaldi,
Thomas Hammerschmidt,
Matous Mrovec,
Aidan Thompson,
Gábor Csányi,
Christoph Ortner,
Ralf Drautz
Abstract:
The atomic cluster expansion is a general polynomial expansion of the atomic energy in multi-atom basis functions. Here we implement the atomic cluster expansion in the performant C++ code PACE that is suitable for use in large scale atomistic simulations. We briefly review the atomic cluster expansion and give detailed expressions for energies and forces as well as efficient algorithms for their evaluation. We demonstrate that the atomic cluster expansion as implemented in PACE shifts a previously established Pareto front for machine learning interatomic potentials towards faster and more accurate calculations. Moreover, general purpose parameterizations are presented for copper and silicon and evaluated in detail. We show that the new Cu and Si potentials significantly improve on the best available potentials for highly accurate large-scale atomistic simulations.
△ Less
Submitted 1 March, 2021;
originally announced March 2021.
-
Physics-inspired structural representations for molecules and materials
Authors:
Felix Musil,
Andrea Grisafi,
Albert P. Bartók,
Christoph Ortner,
Gábor Csányi,
Michele Ceriotti
Abstract:
The first step in the construction of a regression model or a data-driven analysis, aiming to predict or elucidate the relationship between the atomic scale structure of matter and its properties, involves transforming the Cartesian coordinates of the atoms into a suitable representation. The development of atomic-scale representations has played, and continues to play, a central role in the success of machine-learning methods for chemistry and materials science. This review summarizes the current understanding of the nature and characteristics of the most commonly used structural and chemical descriptions of atomistic structures, highlighting the deep underlying connections between different frameworks, and the ideas that lead to computationally efficient and universally applicable models. It emphasizes the link between properties, structures, their physical chemistry and their mathematical description, provides examples of recent applications to a diverse set of chemical and materials science problems, and outlines the open questions and the most promising research directions in the field.
Submitted 4 August, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
-
Predicting polarizabilities of silicon clusters using local chemical environments
Authors:
Mario G. Zauchner,
Stefano Dal Forno,
Gábor Csányi,
Andrew Horsfield,
Johannes Lischner
Abstract:
Calculating polarizabilities of large clusters with first-principles techniques is challenging because of the unfavorable scaling of computational cost with cluster size. To address this challenge, we demonstrate that polarizabilities of large hydrogenated silicon clusters containing thousands of atoms can be efficiently calculated with machine learning methods. Specifically, we construct machine learning models based on the smooth overlap of atomic positions (SOAP) descriptor and train the models using a database of calculated random-phase approximation polarizabilities for clusters containing up to 110 silicon atoms. We first demonstrate the ability of the machine learning models to fit the data and then assess their ability to predict cluster polarizabilities using k-fold cross validation. Finally, we study the machine learning predictions for clusters that are too large for explicit first-principles calculations and find that they accurately describe the dependence of the polarizabilities on the ratio of hydrogen to silicon atoms and also predict a bulk limit that is in good agreement with previous studies.
Submitted 25 August, 2021; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Symmetry-Aware Actor-Critic for 3D Molecular Design
Authors:
Gregor N. C. Simm,
Robert Pinsler,
Gábor Csányi,
José Miguel Hernández-Lobato
Abstract:
Automating molecular design using deep reinforcement learning (RL) has the potential to greatly accelerate the search for novel materials. Despite recent progress on leveraging graph representations to design molecules, such methods are fundamentally limited by the lack of three-dimensional (3D) information. In light of this, we propose a novel actor-critic architecture for 3D molecular design that can generate molecular structures unattainable with previous approaches. This is achieved by exploiting the symmetries of the design process through a rotationally covariant state-action representation based on a spherical harmonics series expansion. We demonstrate the benefits of our approach on several 3D molecular design tasks, where we find that building in such symmetries significantly improves generalization and the quality of generated molecules.
Submitted 25 November, 2020;
originally announced November 2020.
-
Atomic Permutationally Invariant Polynomials for Fitting Molecular Force Fields
Authors:
Alice Allen,
Gábor Csányi,
Geneviève Dusson,
Christoph Ortner
Abstract:
We introduce and explore an approach for constructing force fields for small molecules, which combines intuitive low body order empirical force field terms with the concepts of data driven statistical fits of recent machine learned potentials. We bring these two key ideas together to bridge the gap between established empirical force fields that have a high degree of transferability on the one hand, and the machine learned potentials that are systematically improvable and can converge to very high accuracy, on the other. Our framework extends the atomic Permutationally Invariant Polynomials (aPIP) developed for elemental materials in [Mach. Learn.: Sci. Technol. 2019 1 015004] to molecular systems. The body order decomposition allows us to keep the dimensionality of each term low, while the use of an iterative fitting scheme as well as regularisation procedures improve the extrapolation outside the training set. We investigate aPIP force fields with up to generalised 4-body terms, and examine the performance on a set of small organic molecules. We achieve a high level of accuracy when fitting individual molecules, comparable to those of the many-body machine learned force fields. Fitted to a combined training set of short linear alkanes, the accuracy of the aPIP force field still significantly exceeds what can be expected from classical empirical force fields, while retaining reasonable transferability to both configurations far from the training set and to new molecules.
Submitted 23 October, 2020;
originally announced October 2020.
-
An Experimentally Driven Automated Machine Learned Inter-Atomic Potential for a Refractory Oxide
Authors:
Ganesh Sivaraman,
Leighanne Gallington,
Anand Narayanan Krishnamoorthy,
Marius Stan,
Gabor Csanyi,
Alvaro Vazquez-Mayagoitia,
Chris J. Benmore
Abstract:
Understanding the structure and properties of refractory oxides is critical for high temperature applications. In this work, a combined experimental and simulation approach uses an automated closed loop via an active-learner, which is initialized by X-ray and neutron diffraction measurements, and sequentially improves a machine-learning model until the experimentally predetermined phase space is covered. A multi-phase potential is generated for a canonical example of the archetypal refractory oxide, HfO2, by drawing a minimum number of training configurations from room temperature to the liquid state at ~2900 °C. The method significantly reduces model development time and human effort.
Submitted 8 September, 2020;
originally announced September 2020.
-
An Accurate and Transferable Machine Learning Potential for Carbon
Authors:
Patrick Rowe,
Volker L Deringer,
Piero Gasparotto,
Gábor Csányi,
Angelos Michaelides
Abstract:
We present an accurate machine learning (ML) model for atomistic simulations of carbon, constructed using the Gaussian approximation potential (GAP) methodology. The potential, named GAP-20, describes the properties of the bulk crystalline and amorphous phases, crystal surfaces and defect structures with an accuracy approaching that of direct ab initio simulation, but at a significantly reduced cost. We combine structural databases for amorphous carbon and graphene, which we extend substantially by adding suitable configurations, for example, for defects in graphene and other nanostructures. The final potential is fitted to reference data computed using the optB88-vdW density functional theory (DFT) functional. Dispersion interactions, which are crucial to describe multilayer carbonaceous materials, are therefore implicitly included. We additionally account for long-range dispersion interactions using a semianalytical two-body term and show that an improved model can be obtained through an optimisation of the many-body smooth overlap of atomic positions (SOAP) descriptor. We rigorously test the potential on lattice parameters, bond lengths, formation energies and phonon dispersions of numerous carbon allotropes. We compare the formation energies of an extensive set of defect structures, surfaces and surface reconstructions to DFT reference calculations. The present work demonstrates the ability to combine, in the same ML model, the previously attained flexibility required for amorphous carbon [Phys. Rev. B, 95, 094203, (2017)] with the high numerical accuracy necessary for crystalline graphene [Phys. Rev. B, 97, 054303, (2018)], thereby providing an interatomic potential that will be applicable to a wide range of applications concerning diverse forms of bulk and nanostructured carbon.
Submitted 24 June, 2020;
originally announced June 2020.
-
Learning the electronic density of states in condensed matter
Authors:
Chiheb Ben Mahmoud,
Andrea Anelli,
Gábor Csányi,
Michele Ceriotti
Abstract:
The electronic density of states (DOS) quantifies the distribution of the energy levels that can be occupied by electrons in a quasiparticle picture, and is central to modern electronic structure theory. It also underpins the computation and interpretation of experimentally observable material properties such as optical absorption and electrical conductivity. We discuss the challenges inherent in the construction of a machine-learning (ML) framework aimed at predicting the DOS as a combination of local contributions that depend in turn on the geometric configuration of neighbours around each atom, using quasiparticle energy levels from density functional theory as training data. We present a challenging case study that includes configurations of silicon spanning a broad set of thermodynamic conditions, ranging from bulk structures to clusters, and from semiconducting to metallic behavior. We compare different approaches to represent the DOS, and the accuracy of predicting quantities such as the Fermi level, the DOS at the Fermi level, or the band energy, either directly or as a side-product of the evaluation of the DOS. The performance of the model depends crucially on the smoothening of the DOS, and there is a tradeoff to be made between the systematic error associated with the smoothening and the error in the ML model for a specific structure. We demonstrate the usefulness of this approach by computing the density of states of a large amorphous silicon sample, for which it would be prohibitively expensive to compute the DOS by direct electronic structure calculations, and show how the atom-centred decomposition of the DOS that is obtained through our model can be used to extract physical insights into the connections between structural and electronic features.
Submitted 12 November, 2020; v1 submitted 21 June, 2020;
originally announced June 2020.
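The smoothening trade-off mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain Gaussian broadening of a handful of made-up energy levels; the function name `smoothed_dos`, the level values, and the two widths are hypothetical and are not taken from the paper.

```python
import math


def smoothed_dos(levels, grid, width):
    """Gaussian-broadened density of states on an energy grid.

    Each discrete level contributes a normalised Gaussian of standard
    deviation `width`, so the DOS integrates to the number of levels.
    """
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    dos = []
    for energy in grid:
        total = 0.0
        for eps in levels:
            total += norm * math.exp(-0.5 * ((energy - eps) / width) ** 2)
        dos.append(total)
    return dos


# Hypothetical energy levels (arbitrary units), for illustration only.
levels = [-1.0, -0.5, 0.0, 0.7, 1.2]
step = 0.01
grid = [-4.0 + step * i for i in range(801)]  # covers -4 to 4

narrow = smoothed_dos(levels, grid, 0.05)  # sharp peaks, hard to learn
broad = smoothed_dos(levels, grid, 0.5)    # smooth curve, less detail

# Both curves integrate to the number of levels; only the shape differs.
count_narrow = sum(narrow) * step
count_broad = sum(broad) * step
```

A small width keeps individual levels resolved but gives a sharply peaked target, while a large width gives a smoother curve at the cost of a systematic broadening error.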
-
Machine learning driven simulated deposition of carbon films: from low-density to diamondlike amorphous carbon
Authors:
Miguel A. Caro,
Gábor Csányi,
Tomi Laurila,
Volker L. Deringer
Abstract:
Amorphous carbon (a-C) materials have diverse interesting and useful properties, but the understanding of their atomic-scale structures is still incomplete. Here, we report on extensive atomistic simulations of the deposition and growth of a-C films, describing interatomic interactions using a machine learning (ML) based Gaussian Approximation Potential (GAP) model. We expand widely on our initial work [Phys. Rev. Lett. 120, 166101 (2018)] by now considering a broad range of incident ion energies, thus modeling samples that span the entire range from low-density ($sp^{2}$-rich) to high-density ($sp^{3}$-rich, "diamond-like") amorphous forms of carbon. Two different mechanisms are observed in these simulations, depending on the impact energy: low-energy impacts induce $sp$- and $sp^{2}$-dominated growth directly around the impact site, whereas high-energy impacts induce peening. Furthermore, we propose and apply a scheme for computing the anisotropic elastic properties of the a-C films. Our work provides fundamental insight into this intriguing class of disordered solids, as well as a conceptual and methodological blueprint for simulating the atomic-scale deposition of other materials with ML-driven molecular dynamics.
Submitted 4 November, 2020; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Combining phonon accuracy with high transferability in Gaussian approximation potential models
Authors:
Janine George,
Geoffroy Hautier,
Albert P. Bartók,
Gábor Csányi,
Volker L. Deringer
Abstract:
Machine learning driven interatomic potentials, including Gaussian approximation potential (GAP) models, are emerging tools for atomistic simulations. Here, we address the methodological question of how one can fit GAP models that accurately predict vibrational properties in specific regions of configuration space, whilst retaining flexibility and transferability to others. We use an adaptive regularization of the GAP fit that scales with the absolute force magnitude on any given atom, thereby exploring the Bayesian interpretation of GAP regularization as an "expected error", and its impact on the prediction of physical properties for a material of interest. The approach enables excellent predictions of phonon modes (to within 0.1-0.2 THz) for structurally diverse silicon allotropes, and it can be coupled with existing fitting databases for high transferability. These findings and workflows are expected to be useful for GAP-driven materials modeling more generally.
Submitted 14 May, 2020;
originally announced May 2020.
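The idea of a regularization that scales with the force magnitude on each atom, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the linear scaling, the floor value, and the names `adaptive_force_sigma`, `factor`, and `sigma_min` are hypothetical and do not reproduce the paper's actual settings or the GAP fitting code's interface.

```python
import math


def adaptive_force_sigma(forces, factor=0.1, sigma_min=0.01):
    """Per-atom force regularization ("expected error") values.

    Assumed rule: sigma_i = max(sigma_min, factor * |F_i|), so atoms with
    small forces (near equilibrium) are fitted more tightly than atoms
    with large forces.
    """
    sigmas = []
    for fx, fy, fz in forces:
        magnitude = math.sqrt(fx * fx + fy * fy + fz * fz)
        sigmas.append(max(sigma_min, factor * magnitude))
    return sigmas


# Hypothetical force vectors: one near-equilibrium atom, one strongly
# displaced atom.
forces = [(0.0, 0.0, 0.02), (3.0, 4.0, 0.0)]
sigmas = adaptive_force_sigma(forces)
```

The floor `sigma_min` is an assumption added here so that atoms with vanishing forces do not receive a zero regularization value.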
-
Gaussian Process States: A data-driven representation of quantum many-body physics
Authors:
Aldo Glielmo,
Yannic Rath,
Gabor Csanyi,
Alessandro De Vita,
George H. Booth
Abstract:
We present a novel, non-parametric form for compactly representing entangled many-body quantum states, which we call a 'Gaussian Process State'. In contrast to other approaches, we define this state explicitly in terms of a configurational data set, with the probability amplitudes statistically inferred from this data according to Bayesian statistics. In this way the non-local physical correlated features of the state can be analytically resummed, allowing for exponential complexity to underpin the ansatz, but efficiently represented in a small data set. The state is found to be highly compact, systematically improvable and efficient to sample, representing a large number of known variational states within its span. It is also proven to be a 'universal approximator' for quantum states, able to capture any entangled many-body state with increasing data set size. We develop two numerical approaches which can learn this form directly: a fragmentation approach, and direct variational optimization, and apply these schemes to the Fermionic Hubbard model. We find competitive or superior descriptions of correlated quantum problems compared to existing state-of-the-art variational ansatzes, as well as other numerical methods.
Submitted 17 September, 2020; v1 submitted 27 February, 2020;
originally announced February 2020.
-
On the Completeness of Atomic Structure Representations
Authors:
Sergey N. Pozdnyakov,
Michael J. Willatt,
Albert P. Bartók,
Christoph Ortner,
Gábor Csányi,
Michele Ceriotti
Abstract:
Many-body descriptors are widely used to represent atomic environments in the construction of machine learned interatomic potentials and more broadly for fitting, classification and embedding tasks on atomic structures. It was generally believed that 3-body descriptors uniquely specify the environment of an atom, up to a rotation and permutation of like atoms. We produce several counterexamples to this belief, with the consequence that any classifier, regression or embedding model for atom-centred properties that uses 3 (or 4)-body features will incorrectly give identical results for different configurations. Writing global properties (such as total energies) as a sum of many atom-centred contributions mitigates, but does not eliminate, the impact of this fundamental deficiency -- explaining the success of current "machine-learning" force fields. We anticipate the issues that will arise as the desired accuracy increases, and suggest potential solutions.
Submitted 5 June, 2020; v1 submitted 31 January, 2020;
originally announced January 2020.
-
Structural transitions in dense disordered silicon from quantum-accurate ultra-large-scale simulations
Authors:
Volker L. Deringer,
Noam Bernstein,
Gábor Csányi,
Mark Wilson,
David A. Drabold,
Stephen R. Elliott
Abstract:
Structurally disordered materials continue to pose fundamental questions, including that of how different disordered phases ("polyamorphs") can coexist and transform from one to another. As a widely studied case, amorphous silicon (a-Si) forms a fourfold-coordinated, covalent random network at ambient conditions, but much higher-coordinated, metallic-like phases under pressure. However, a detailed mechanistic understanding of the liquid-amorphous and amorphous-amorphous transitions in silicon has been lacking, due to intrinsic limitations of even the most advanced experimental and computational techniques. Here, we show how machine-learning (ML)-driven simulations can break through this long-standing barrier, affording a comprehensive, quantum-accurate, and fully atomistic description of all relevant liquid and amorphous phases of silicon. Combining a model system size of 100,000 atoms (ten-nanometre length scale) with a prediction accuracy of a few meV per atom, our simulations reveal a remarkable, three-step transformation sequence for a-Si under increasing external pressure. First, up to 10-11 GPa, polyamorphic low- and high-density amorphous (LDA and HDA) regions are found to coexist, rather than appearing sequentially. Then, we observe a structural collapse into a distinct, very-high-density amorphous (VHDA) phase at 12-13 GPa, reminiscent of the dense liquid but being formed at a much lower temperature. Finally, our simulations indicate the transient nature of this VHDA phase: it rapidly nucleates crystallites at 13-16 GPa, ultimately leading to the formation of a poly-crystalline, simple-hexagonal structure, consistent with experiments but not seen in earlier simulations.
Submitted 16 December, 2019;
originally announced December 2019.
-
Atomic Cluster Expansion: Completeness, Efficiency and Stability
Authors:
Genevieve Dusson,
Markus Bachmayr,
Gabor Csanyi,
Ralf Drautz,
Simon Etter,
Cas van der Oord,
Christoph Ortner
Abstract:
The Atomic Cluster Expansion (Drautz, Phys. Rev. B 99, 2019) provides a framework to systematically derive polynomial basis functions for approximating isometry and permutation invariant functions, particularly with an eye to modelling properties of atomistic systems. Our presentation extends the derivation by proposing a precomputation algorithm that yields immediate guarantees that a complete basis is obtained. We provide a fast recursive algorithm for efficient evaluation and illustrate its performance in numerical tests. Finally, we discuss generalisations and open challenges, particularly from a numerical stability perspective, around basis optimisation and parameter estimation, paving the way towards a comprehensive analysis of the convergence to a high-fidelity reference model.
Submitted 12 May, 2021; v1 submitted 8 November, 2019;
originally announced November 2019.
-
Machine Learning Inter-Atomic Potentials Generation Driven by Active Learning: A Case Study for Amorphous and Liquid Hafnium dioxide
Authors:
Ganesh Sivaraman,
Anand Narayanan Krishnamoorthy,
Matthias Baur,
Christian Holm,
Marius Stan,
Gabor Csányi,
Chris Benmore,
Álvaro Vázquez-Mayagoitia
Abstract:
We propose a novel active learning scheme for automatically sampling a minimum number of uncorrelated configurations for fitting the Gaussian Approximation Potential (GAP). Our active learning scheme consists of an unsupervised machine learning (ML) scheme coupled to a Bayesian optimization technique that evaluates the GAP model. We apply this scheme to a Hafnium dioxide (HfO2) dataset generated from a melt-quench ab initio molecular dynamics (AIMD) protocol. Our results show that the active learning scheme, with no prior knowledge of the dataset, is able to extract a configuration that reaches the required energy fit tolerance. Further, molecular dynamics (MD) simulations performed using this active learned GAP model on 6144-atom systems in the amorphous and liquid states elucidate the structural properties of HfO2 with near ab initio precision and quench rates (i.e. 1.0 K/ps) not accessible via AIMD. The melt and amorphous X-ray structure factors generated from our simulation are in good agreement with experiment. Additionally, the calculated diffusion constants are in good agreement with previous ab initio studies.
Submitted 22 October, 2019;
originally announced October 2019.
-
Regularised Atomic Body-Ordered Permutation-Invariant Polynomials for the Construction of Interatomic Potentials
Authors:
Cas van der Oord,
Geneviève Dusson,
Gabor Csanyi,
Christoph Ortner
Abstract:
We investigate the use of invariant polynomials in the construction of data-driven interatomic potentials for material systems. The "atomic body-ordered permutation-invariant polynomials" (aPIPs) comprise a systematic basis and are constructed to preserve the symmetry of the potential energy function with respect to rotations and permutations. In contrast to kernel based and artificial neural network models, the explicit decomposition of the total energy as a sum of atomic body-ordered terms allows the dimensionality of the fit to be kept reasonably low, up to just 10 for the 5-body terms. The explainability of the potential is aided by this decomposition, as the low body-order components can be studied and interpreted independently. Moreover, although polynomial basis functions are thought to extrapolate poorly, we show that the low dimensionality combined with careful regularisation actually leads to better transferability than the high dimensional, kernel based Gaussian Approximation Potential.
Submitted 14 October, 2019;
originally announced October 2019.
-
A Performance and Cost Assessment of Machine Learning Interatomic Potentials
Authors:
Yunxing Zuo,
Chi Chen,
Xiangguo Li,
Zhi Deng,
Yiming Chen,
Jörg Behler,
Gábor Csányi,
Alexander V. Shapeev,
Aidan P. Thompson,
Mitchell A. Wood,
Shyue Ping Ong
Abstract:
Machine learning of the quantitative relationship between local environment descriptors and the potential energy surface of a system of atoms has emerged as a new frontier in the development of interatomic potentials (IAPs). Here, we present a comprehensive evaluation of ML-IAPs based on four local environment descriptors --- Behler-Parrinello symmetry functions, smooth overlap of atomic positions (SOAP), the Spectral Neighbor Analysis Potential (SNAP) bispectrum components, and moment tensors --- using a diverse data set generated using high-throughput density functional theory (DFT) calculations. The data set comprising bcc (Li, Mo) and fcc (Cu, Ni) metals and diamond group IV semiconductors (Si, Ge) is chosen to span a range of crystal structures and bonding. All descriptors studied show excellent performance in predicting energies and forces far surpassing that of classical IAPs, as well as predicting properties such as elastic constants and phonon dispersion curves. We observe a general trade-off between accuracy and the degrees of freedom of each model, and consequently computational cost. We will discuss these trade-offs in the context of model selection for molecular dynamics and other applications.
Submitted 24 July, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.