-
Fourier series for singular measures in higher dimensions
Authors:
Chad Berner,
John E. Herr,
Palle E. T. Jorgensen,
Eric S. Weber
Abstract:
For multi-variable finite measure spaces, we present in this paper a new framework for non-orthogonal $L^2$ Fourier expansions. Our results hold for probability measures $μ$ with finite support in $\mathbb{R}^d$ that satisfy a certain disintegration condition that we refer to as ``slice-singular''. In this general framework, we present explicit $L^{2}(μ)$-Fourier expansions, with Fourier exponentials having positive Fourier frequencies in each of the $d$ coordinates. Our Fourier representations apply to every $f \in L^2(μ)$, are based on an extended Kaczmarz algorithm, and use a new recursive $μ$ Rokhlin disintegration representation. In detail, our Fourier series expansion for $f$ is in terms of the multivariate Fourier exponentials $\{e_n\}$, but the associated Fourier coefficients for $f$ are now computed from a Kaczmarz system $\{g_n\}$ in $L^{2}(μ)$ which is dual to the Fourier exponentials. The $\{g_n\}$ system is shown to be a Parseval frame for $L^{2}(μ)$. Explicit computations for our new Fourier expansions entail a detailed analysis of subspaces of the Hardy space on the polydisk, dual to $L^{2}(μ)$, and an associated $d$-variable Normalized Cauchy Transform. Our results extend earlier work for measures $μ$ in one and two dimensions, i.e., $d=1$ ($μ$ singular) and $d=2$ ($μ$ assumed slice-singular). Here our focus is the extension to the cases of measures $μ$ in dimensions $d > 2$. Our results are illustrated with the use of explicit iterated function systems (IFSs), including the IFS-generated Menger sponge for $d=3$.
Submitted 24 February, 2024;
originally announced February 2024.
-
SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials
Authors:
Peter Eastman,
Pavan Kumar Behara,
David L. Dotson,
Raimondas Galvelis,
John E. Herr,
Josh T. Horton,
Yuezhi Mao,
John D. Chodera,
Benjamin P. Pritchard,
Yuanqing Wang,
Gianni De Fabritiis,
Thomas E. Markland
Abstract:
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.
Submitted 23 November, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
End-to-End Differentiable Molecular Mechanics Force Field Construction
Authors:
Yuanqing Wang,
Josh Fass,
Benjamin Kaminow,
John E. Herr,
Dominic Rufa,
Ivy Zhang,
Iván Pulido,
Mike Henry,
John D. Chodera
Abstract:
Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with higher fidelity than traditional atom or parameter typing schemes. When trained on the same quantum chemical small molecule dataset used to parameterize the openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in computing relative alchemical free energy calculations for a popular benchmark set.
Submitted 18 April, 2022; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Identifying Complex Hadamard Submatrices of the Fourier Matrices via Primitive Sets
Authors:
John E. Herr,
Troy M. Wiegand
Abstract:
For a given selection of rows and columns from a Fourier matrix, we give a number of tests for whether the resulting submatrix is Hadamard based on the primitive sets of those rows and columns. In particular, we demonstrate that whether a given selection of rows and columns of a Fourier matrix forms a Hadamard submatrix is exactly determined by whether the primitive sets of those rows and columns are compatible with respect to the size of the Fourier matrix. This allows the partitioning of all submatrices into equivalence classes that will consist entirely of Hadamard or entirely of non-Hadamard submatrices and motivates the creation of compatibility graphs that represent this structure. We conclude with some results that facilitate the construction of these graphs for submatrix sizes 2 and 3.
Submitted 2 February, 2021; v1 submitted 28 September, 2019;
originally announced September 2019.
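The property studied in this abstract can be checked numerically by brute force. The following is a minimal sketch, not the primitive-set test from the paper: it builds the $N \times N$ Fourier matrix with entries $e^{2πi jk/N}$, extracts a square submatrix, and tests the complex Hadamard condition $HH^* = mI$. The function names `fourier_matrix` and `is_hadamard_submatrix` and the tolerance are assumptions introduced here for illustration.

```python
import numpy as np


def fourier_matrix(n):
    """Return the n x n Fourier matrix with entries exp(2*pi*i*j*k/n)."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n)


def is_hadamard_submatrix(n, rows, cols, tol=1e-9):
    """Brute-force check that the chosen rows/cols form a complex Hadamard matrix.

    Entries of a Fourier matrix are always unimodular, so the only condition
    left to verify is that the rows are mutually orthogonal: H H* = m I.
    """
    rows, cols = list(rows), list(cols)
    if len(rows) != len(cols):
        return False  # a Hadamard matrix must be square
    sub = fourier_matrix(n)[np.ix_(rows, cols)]
    m = len(rows)
    gram = sub @ sub.conj().T
    return bool(np.allclose(gram, m * np.eye(m), atol=tol))


if __name__ == "__main__":
    # Rows {0, 2} and columns {0, 1} of the 4 x 4 Fourier matrix give [[1, 1], [1, -1]].
    print(is_hadamard_submatrix(4, [0, 2], [0, 1]))
    # Rows {0, 1} and columns {0, 1} give [[1, 1], [1, i]], whose rows are not orthogonal.
    print(is_hadamard_submatrix(4, [0, 1], [0, 1]))
```

A direct check like this costs a matrix product per candidate, which is the kind of repeated work that a classification by equivalence classes is meant to avoid.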
-
Compressing physical properties of atomic species for improving predictive chemistry
Authors:
John E. Herr,
Kevin Koh,
Kun Yao,
John Parkhill
Abstract:
The answers to many unsolved problems lie in the intractable chemical space of molecules and materials. Machine learning techniques are rapidly growing in popularity as a way to compress and explore chemical space efficiently. One of the most important aspects of machine learning techniques is representation through the feature vector, which should contain the most important descriptors necessary to make accurate predictions, not least of which is the atomic species in the molecule or material. In this work we introduce a compressed representation of physical properties for atomic species we call the elemental modes. The elemental modes provide an excellent representation by capturing many of the nuances of the periodic table and the similarity of atomic species. We apply the elemental modes to several different tasks for machine learning algorithms and show that they enable us to make improvements to these tasks even beyond simply achieving higher accuracy predictions.
Submitted 31 October, 2018;
originally announced November 2018.
-
A Characterization of Boundary Representations of Positive Matrices in the Hardy Space via the Abel Product
Authors:
John E. Herr,
Palle E. T. Jorgensen,
Eric S. Weber
Abstract:
Spectral measures give rise to a natural harmonic analysis on the unit disc via a boundary representation of a positive matrix arising from a spectrum of the measure. We consider in this paper the reverse: for a positive matrix in the Hardy space of the unit disc we consider which measures, if any, yield a boundary representation of the positive matrix. We prove a characterization of those representing measures via a matrix identity by introducing a new operator product called the Abel Product.
Submitted 28 February, 2018;
originally announced March 2018.
-
Metadynamics for Training Neural Network Model Chemistries: a Competitive Assessment
Authors:
John E. Herr,
Kun Yao,
Ryker McIntyre,
David Toth,
John Parkhill
Abstract:
Neural network (NN) model chemistries (MCs) promise to facilitate the accurate exploration of chemical space and simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and `test data' chosen randomly from some sampling techniques can provide poor information about generality. If the sampling method is narrow, `test error' can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling (NMS), and one uncommon alternative, Metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show MetaMD is easily implemented in any NNMC software package with cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space, while remaining relevant to chemistry near $k_bT$. It is one cheap tool to address the issue of generalization.
Submitted 19 December, 2017;
originally announced December 2017.
-
The TensorMol-0.1 Model Chemistry: a Neural Network Augmented with Long-Range Physics
Authors:
Kun Yao,
John E. Herr,
David W. Toth,
Ryker Mcintyre,
John Parkhill
Abstract:
Traditional force-fields cannot model chemical reactivity, and suffer from low generality without re-fitting. Neural network potentials promise to address these problems, offering energies and forces with near ab-initio accuracy at low cost. However, a data-driven approach is naturally inefficient for long-range interatomic forces that have simple physical formulas. In this manuscript we construct a hybrid model chemistry consisting of a nearsighted neural network potential with screened long-range electrostatic and van der Waals physics. This trained potential, simply dubbed "TensorMol-0.1", is offered in an open-source Python package capable of many of the simulation types commonly used to study chemistry: geometry optimizations, harmonic spectra, open or periodic molecular dynamics, Monte Carlo, and nudged elastic band calculations. We describe the robustness and speed of the package, demonstrating millihartree accuracy and scalability to tens of thousands of atoms on ordinary laptops. We demonstrate the performance of the model by reproducing vibrational spectra, and simulating molecular dynamics of a protein. Our comparisons with electronic structure theory and experiment demonstrate that neural network molecular dynamics is poised to become an important tool for molecular simulation, lowering the resource barrier to simulate chemistry.
Submitted 20 November, 2017; v1 submitted 16 November, 2017;
originally announced November 2017.
-
A matrix characterization of boundary representations of positive matrices in the Hardy space
Authors:
John E. Herr,
Palle E. T. Jorgensen,
Eric S. Weber
Abstract:
Spectral measures give rise to a natural harmonic analysis on the unit disc via a boundary representation of a positive matrix arising from a spectrum of the measure. We consider in this paper the reverse: for a positive matrix in the Hardy space of the unit disc we consider which measures, if any, yield a boundary representation of the positive matrix. We introduce a potential characterization of those measures via a matrix identity and show that the characterization holds in several important special cases.
Submitted 9 May, 2017;
originally announced May 2017.
-
The Many-Body Expansion Combined with Neural Networks
Authors:
Kun Yao,
John E. Herr,
John Parkhill
Abstract:
Fragmentation methods such as the many-body expansion (MBE) are a common strategy to model large systems by partitioning energies into a hierarchy of decreasingly significant contributions. The number of fragments required for chemical accuracy is still prohibitively expensive for ab-initio MBE to compete with force field approximations for applications beyond single-point energies. Alongside the MBE, empirical models of ab-initio potential energy surfaces have improved, especially non-linear models based on neural networks (NNs), which can reproduce ab-initio potential energy surfaces rapidly and accurately. Although they are fast, NNs suffer from their own curse of dimensionality; they must be trained on a representative sample of chemical space. In this paper we examine the synergy of the MBE and NNs, and explore their complementarity. The MBE offers a systematic way to treat systems of arbitrary size and intelligently sample chemical space. NNs reduce the computational overhead of the MBE by a factor in excess of $10^6$ and reproduce the accuracy of ab-initio calculations without specialized force fields. We show they are remarkably general, providing comparable accuracy with drastically different chemical embeddings. To assess this, we test a new chemical embedding which can be inverted to predict molecules with desired properties.
Submitted 22 September, 2016;
originally announced September 2016.
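The hierarchy of contributions described in this abstract can be sketched generically. The following is a minimal illustration of a truncated many-body expansion, assuming a user-supplied `energy_fn` that returns the energy of any list of fragments; in the paper's setting that role would be played by an ab-initio calculation or a trained neural network, but the function names and the toy energy below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def many_body_expansion(fragments, energy_fn, max_order=2):
    """Sum the many-body increments of all fragment subsets up to max_order.

    The increment of a subset S is E(S) minus the increments of all of its
    proper non-empty subsets, so one-body terms are fragment energies,
    two-body terms are pair interaction energies, and so on.
    """
    increments = {}
    total = 0.0
    for order in range(1, max_order + 1):
        for subset in combinations(range(len(fragments)), order):
            value = energy_fn([fragments[i] for i in subset])
            # Remove lower-order contributions already accounted for.
            for k in range(1, order):
                for smaller in combinations(subset, k):
                    value -= increments[smaller]
            increments[subset] = value
            total += value
    return total


def toy_energy(frags):
    """Hypothetical energy: a self term per fragment plus a pairwise coupling."""
    one_body = sum(x * x for x in frags)
    two_body = sum(a * b for a, b in combinations(frags, 2))
    return one_body + two_body


if __name__ == "__main__":
    frags = [1.0, 2.0, 3.0]
    print(many_body_expansion(frags, toy_energy, max_order=1))
    print(many_body_expansion(frags, toy_energy, max_order=2))
    print(toy_energy(frags))
```

Because the toy energy contains only one- and two-body terms, truncating at second order recovers the full energy; for real systems the truncation error depends on how quickly the higher-order increments decay.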
-
Positive Matrices in the Hardy Space with Prescribed Boundary Representations via the Kaczmarz Algorithm
Authors:
John E. Herr,
Palle E. T. Jorgensen,
Eric S. Weber
Abstract:
For a singular probability measure $μ$ on the circle, we show the existence of positive matrices on the unit disc which admit a boundary representation on the unit circle with respect to $μ$. These positive matrices are constructed in several different ways using the Kaczmarz algorithm. Some of these positive matrices correspond to the projection of the Szegő kernel on the disc to certain subspaces of the Hardy space corresponding to the normalized Cauchy transform of $μ$. Other positive matrices are obtained which correspond to subspaces of the Hardy space after a renormalization, and so are not projections of the Szegő kernel. We show that these positive matrices are a generalization of a spectrum or Fourier frame for $μ$, and the existence of such a positive matrix does not require $μ$ to be spectral.
Submitted 29 March, 2016;
originally announced March 2016.
-
Fourier Series for Singular Measures
Authors:
John E. Herr,
Eric S. Weber
Abstract:
Using the Kaczmarz algorithm, we prove that for any singular Borel probability measure $μ$ on $[0,1)$, every $f\in L^2(μ)$ possesses a Fourier series of the form $f(x)=\sum_{n=0}^{\infty}c_ne^{2πinx}$. We show that the coefficients $c_{n}$ can be computed in terms of the quantities $\hat{f}(n) = \int_{0}^{1} f(x) e^{-2πi n x} d μ(x)$. We also demonstrate a Shannon-type sampling theorem for functions that are in a sense $μ$-bandlimited.
Submitted 1 May, 2016; v1 submitted 16 March, 2015;
originally announced March 2015.
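The Kaczmarz construction named in this abstract can be illustrated on the simplest kind of singular measure, a finite sum of weighted point masses. The sketch below is an assumption-laden toy, not the paper's proof or code: it takes $μ = \sum_j w_j δ_{x_j}$ with $\sum_j w_j = 1$, runs the Kaczmarz iteration $f_n = f_{n-1} + \langle f - f_{n-1}, e_n \rangle e_n$ with $e_n(x) = e^{2πinx}$, and records the resulting coefficients $c_n$ and the $L^2(μ)$ residuals. The function name `kaczmarz_fourier` and its parameters are hypothetical.

```python
import numpy as np


def kaczmarz_fourier(f_vals, atoms, weights, n_terms):
    """Kaczmarz iteration with exponentials e_n(x) = exp(2*pi*i*n*x), n >= 0.

    f_vals  : values of f at the atoms of a discrete probability measure
    atoms   : atom locations in [0, 1)
    weights : atom masses (assumed to sum to 1, so each e_n has unit norm)
    Returns the coefficients c_n and the L^2(mu) residual after each step.
    """
    f_vals = np.asarray(f_vals, dtype=complex)
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    approx = np.zeros_like(f_vals)
    coeffs, residuals = [], []
    for n in range(n_terms):
        e_n = np.exp(2j * np.pi * n * atoms)
        # Inner product <f - approx, e_n> in L^2(mu).
        c_n = np.sum((f_vals - approx) * np.conj(e_n) * weights)
        approx = approx + c_n * e_n
        coeffs.append(c_n)
        residuals.append(np.sqrt(np.sum(np.abs(f_vals - approx) ** 2 * weights)))
    return np.array(coeffs), np.array(residuals)


if __name__ == "__main__":
    # Unevenly spaced atoms with unequal masses (illustrative values only).
    atoms = np.array([0.1, 0.35, 0.8])
    weights = np.array([0.5, 0.3, 0.2])
    f_vals = np.array([1.0, -2.0, 0.5])
    coeffs, residuals = kaczmarz_fourier(f_vals, atoms, weights, n_terms=50)
    print(residuals[0], residuals[-1])
```

Each step projects the current residual onto the orthogonal complement of a unit vector, so the residual norm never increases; the content of the theorem is that for singular $μ$ it tends to zero.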