-
Boltzmann Generators and the New Frontier of Computational Sampling in Many-Body Systems
Authors:
Alessandro Coretti,
Sebastian Falkner,
Jan Weinreich,
Christoph Dellago,
O. Anatole von Lilienfeld
Abstract:
The paper by Noé et al. [F. Noé, S. Olsson, J. Köhler and H. Wu, Science, 365:6457 (2019)] introduced the concept of Boltzmann Generators (BGs), a deep generative model that can produce unbiased independent samples of many-body systems. They can generate equilibrium configurations from different metastable states, compute relative stabilities between different structures of proteins or other organic molecules, and discover new states. In this commentary, we motivate the necessity for a new generation of sampling methods beyond molecular dynamics, explain the methodology, and give our perspective on the future role of BGs.
Submitted 25 April, 2024;
originally announced April 2024.
-
Understanding Representations by Exploring Galaxies in Chemical Space
Authors:
Jan Weinreich,
Konstantin Karandashev,
Guido Falk von Rudorff
Abstract:
We present a Monte Carlo approach for studying chemical feature distributions of molecules without training a machine learning model or performing exhaustive enumeration. The algorithm generates molecules with predefined similarity to a given one for any representation. It serves as a diagnostic tool to understand which molecules are grouped in feature space and to identify shortcomings of representations and embeddings from unsupervised learning. In this work, we first study clusters surrounding chosen molecules and demonstrate that common representations do not yield a constant density of molecules in feature space, with possible implications for learning behavior. Next, we observe a connection between representations and properties: a linear correlation between the property value of a central molecule and the average radial slope of that property in chemical space. Molecules with extremal property values have the largest property derivative values in chemical space, which provides a route to improve the data efficiency of a representation by tailoring it towards a given property. Finally, we demonstrate applications for sampling molecules with specified metric-dependent distributions to generate molecules biased toward graph spaces of interest.
Submitted 17 September, 2023;
originally announced September 2023.
-
Evolutionary Monte Carlo of QM properties in chemical space: Electrolyte design
Authors:
Konstantin Karandashev,
Jan Weinreich,
Stefan Heinen,
Daniel Jose Arismendi Arrieta,
Guido Falk von Rudorff,
Kersti Hermansson,
O. Anatole von Lilienfeld
Abstract:
Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science, but also a very difficult one due to the vast number of possible molecular systems. We propose an Evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favourable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was done over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) datasets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to $10^{6}$ QM calculations and were thus only feasible thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
Submitted 14 November, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Encrypted machine learning of molecular quantum properties
Authors:
Jan Weinreich,
Guido Falk von Rudorff,
O. Anatole von Lilienfeld
Abstract:
Large machine learning models with improved predictions have become widely available in the chemical sciences. Unfortunately, these models do not protect the privacy necessary within commercial settings, prohibiting the use of potentially extremely valuable data by others. Encrypting the prediction process can solve this problem through double-blind model evaluation, and it prohibits the extraction of training or query data. However, contemporary ML models based on fully homomorphic encryption or federated learning are either too expensive for practical use or have to trade higher speed for weaker security. We have implemented secure and computationally feasible encrypted machine learning models using oblivious transfer, enabling secure predictions of molecular quantum properties across chemical compound space. However, we find that encrypted predictions using kernel ridge regression models are a million times more expensive than without encryption. This demonstrates a dire need for a compact machine learning model architecture, including molecular representation and kernel matrix size, that minimizes model evaluation costs.
Submitted 22 December, 2022; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Ab initio machine learning of phase space averages
Authors:
Jan Weinreich,
Dominik Lemm,
Guido Falk von Rudorff,
O. Anatole von Lilienfeld
Abstract:
Equilibrium structures determine material properties and biochemical functions. We propose to machine learn phase-space averages, conventionally obtained by {\em ab initio} or force-field based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to {\em ab initio} molecular dynamics (AIMD), our {\em ab initio} machine learning (AIML) model does not require bond topologies and therefore enables a general machine learning pathway to ensemble properties throughout chemical compound space (CCS). We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data, reaching competitive prediction errors (MAE $\sim$ 0.8 kcal/mol) for out-of-sample molecules within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns throughout CCS at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
Submitted 30 May, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Properties of α-Brass Nanoparticles II: Structure and Composition
Authors:
Jan Weinreich,
Martín Leandro Paleico,
Jörg Behler
Abstract:
Nanoparticles have become increasingly interesting for a wide range of applications, because in principle it is possible to tailor their properties by controlling size, shape and composition. One of these applications is heterogeneous catalysis, and a fundamental understanding of the structural details of the nanoparticles is essential for any knowledge-based improvement of reactivity and selectivity. In this work we investigate the atomic structure of brass nanoparticles containing up to 5000 atoms as a typical example of a binary alloy consisting of Cu and Zn. As systems of this size are too large for electronic structure calculations, in our simulations we use a recently parametrized machine learning potential providing close to density functional theory accuracy. This potential is employed for a structural characterization as a function of chemical composition by various types of simulations, such as Monte Carlo in the Semi-Grand Canonical Ensemble and simulated annealing molecular dynamics. Our analysis reveals that the distribution of both elements in the nanoparticles is inhomogeneous, and zinc accumulates in the outermost layer, while the first subsurface layer shows an enrichment of copper. Only for high zinc concentrations can alloying be found in the interior of the nanoparticles, and regular patterns corresponding to crystalline bulk phases of $α$-brass can then be observed. The surfaces of the investigated clusters exhibit well-ordered single-crystal facets, which can give rise to grain boundaries inside the clusters. The melting temperature of the nanoparticles is found to decrease with increasing zinc-atom fraction, a trend which is also well known from the bulk phase diagram of brass.
Submitted 21 June, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Machine Learning of Free Energies in Chemical Compound Space Using Ensemble Representations: Reaching Experimental Uncertainty for Solvation
Authors:
Jan Weinreich,
Nicholas J. Browning,
O. Anatole von Lilienfeld
Abstract:
Free energies govern the behavior of soft and liquid matter, and improving their predictions could have a large impact on the development of drugs, electrolytes or homogeneous catalysts. Unfortunately, it is challenging to devise an accurate description of effects governing solvation such as hydrogen-bonding, van der Waals interactions, or conformational sampling. We present a Free energy Machine Learning (FML) model applicable throughout chemical compound space and based on a representation that employs Boltzmann averages to account for an approximated sampling of configurational space. Using the FreeSolv database, FML's out-of-sample prediction errors of experimental hydration free energies decay systematically with training set size, and experimental uncertainty (0.6 kcal/mol) is reached after training on 490 molecules (80\% of FreeSolv). Corresponding FML model errors are also on par with state-of-the-art physics-based approaches. To generate the input representation for a new query compound, FML requires approximate and short molecular dynamics runs. We showcase its usefulness through analysis of FML solvation free energies for 116k organic molecules (all force-field compatible molecules in the QM9 database), identifying the most and least solvated systems, and rediscovering quasi-linear structure-property relationships in terms of simple descriptors such as hydrogen-bond donors, number of NH or OH groups, number of oxygen atoms in hydrocarbons, and number of heavy atoms. FML's accuracy is maximal when the temperature used for the molecular dynamics simulation to generate averaged input representation samples in training is the same as for the query compounds. The sampling time for the representation converges rapidly with respect to the prediction error.
Submitted 24 March, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Properties of $α$-Brass Nanoparticles I: Neural Network Potential Energy Surface
Authors:
Jan Weinreich,
Anton Römer,
Martín Leandro Paleico,
Jörg Behler
Abstract:
Binary metal clusters are of high interest for applications in heterogeneous catalysis and have received much attention in recent years. To gain insights into their structure and composition at the atomic scale, computer simulations can provide valuable information if reliable interatomic potentials are available. In this paper we describe the construction of a high-dimensional neural network potential (HDNNP) intended for simulations of large brass nanoparticles with thousands of atoms, which is also applicable to bulk $α$-brass and its surfaces. The HDNNP, which is based on reference data obtained from density-functional theory calculations, is very accurate, with a root mean square error of 1.7 meV/atom for total energies and 39 meV/Å for the forces of structures not included in the training set. The potential has been thoroughly validated for a wide range of energetic and structural properties of bulk $α$-brass, its surfaces, and clusters of different size and composition, demonstrating its suitability for large-scale molecular dynamics and Monte Carlo simulations with first-principles accuracy.
Submitted 9 April, 2020; v1 submitted 29 January, 2020;
originally announced January 2020.