-
ColabFit Exchange: open-access datasets for data-driven interatomic potentials
Authors:
Joshua A. Vita,
Eric G. Fuemmeler,
Amit Gupta,
Gregory P. Wolfe,
Alexander Quanming Tao,
Ryan S. Elliott,
Stefano Martiniani,
Ellad B. Tadmor
Abstract:
Data-driven (DD) interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with IP development. This deficiency precludes the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and also limits the development of more general, or even universal, IPs. To address this issue, we introduce the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is especially designed for IP development. The ColabFit Exchange is publicly available at https://colabfit.org/, providing a web-based interface for exploring, downloading, and contributing datasets. Composed of data collected from the literature or provided by community researchers, the ColabFit Exchange currently (September 2023) consists of 139 datasets spanning nearly 70,000 unique chemistries, and is intended to continuously grow. In addition to outlining the software framework used for constructing and accessing the ColabFit Exchange, we also provide analyses of the data, quantifying the diversity of the database and proposing metrics for assessing the relative diversity of multiple datasets. Finally, we demonstrate an end-to-end IP development pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the KLIFF software package, and validation tests provided by the OpenKIM framework.
Submitted 6 September, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Inverse molecular design from first principles: tailoring organic chromophore spectra for optoelectronic applications
Authors:
James David Green,
Eric Gabriel Fuemmeler,
Timothy J. H. Hele
Abstract:
The discovery of molecules with tailored optoelectronic properties such as specific frequency and intensity of absorption or emission is a major challenge in creating next-generation organic light-emitting diodes (OLEDs) and photovoltaics. This raises the question: how can we predict a potential chemical structure from these properties? Approaches that attempt to tackle this inverse design problem include virtual screening, active machine learning and genetic algorithms. However, these approaches rely on a molecular database or many electronic structure calculations, and significant computational savings could be achieved if there was prior knowledge of (i) whether the optoelectronic properties of a parent molecule could easily be improved and (ii) what morphing operations on a parent molecule could improve these properties. In this perspective we address both of these challenges from first principles. We firstly adapt the Thomas-Reiche-Kuhn sum rule to organic chromophores and show how this indicates how easily the absorption and emission of a molecule can be improved. We then show how by combining electronic structure theory and intensity borrowing perturbation theory we can predict whether or not the proposed morphing operations will achieve the desired spectral alteration, and thereby derive widely-applicable design rules. We go on to provide proof-of-concept illustrations of this approach to optimizing the visible absorption of acenes and the emission of radical OLEDs. We believe this approach can be integrated into genetic algorithms by biasing morphing operations in favour of those which are likely to be successful, leading to faster molecular discovery and greener chemistry.
Submitted 26 April, 2022;
originally announced April 2022.
-
Selected Columns of the Density Matrix in an Atomic Orbital Basis I: An Intrinsic and Non-Iterative Orbital Localization Scheme for the Occupied Space
Authors:
Eric G. Fuemmeler,
Anil Damle,
Robert A. DiStasio Jr
Abstract:
We extend the selected columns of the density matrix (SCDM) methodology [J. Chem. Theory Comput. 2015, 11, 1463-1469], a non-iterative procedure for generating localized occupied orbitals for condensed-phase systems, to the construction of local molecular orbitals (LMOs) in systems described using non-orthogonal atomic orbital (AO) basis sets. In particular, we introduce three different variants of SCDM (referred to as SCDM-M, SCDM-L, and SCDM-G) that can be used in conjunction with the standard AO basis sets. The SCDM-M and SCDM-L variants are based on the Mulliken and Löwdin representations of the density matrix, and are tantamount to selecting a well-conditioned set of projected atomic orbitals (PAOs) and projected (symmetrically-) orthogonalized atomic orbitals (POAOs), respectively, as proto-LMOs. The SCDM-G variant leverages a real-space (grid) representation of the wavefunction to select a set of well-conditioned proto-LMOs. A detailed comparative analysis reveals that the LMOs generated by these three SCDM variants are robust, comparable in orbital locality to those produced with the iterative Boys or Pipek-Mezey (PM) localization schemes, and are agnostic towards any single orbital locality metric. Although all three SCDM variants are based on the density matrix, we find that the character of the generated LMOs can differ significantly between SCDM-M, SCDM-L, and SCDM-G. In this regard, only the grid-based SCDM-G procedure (like PM) generates LMOs that qualitatively preserve σ-π symmetry and are well-aligned with chemical intuition. While the direct and standalone use of SCDM-generated LMOs should suffice for most applications, our findings also suggest that the use of these orbitals as an unbiased and cost-effective (initial) guess also has the potential to improve the convergence of iterative orbital localization schemes.
Submitted 13 August, 2021;
originally announced August 2021.
-
Anticipating acene-based chromophore spectra with molecular orbital arguments
Authors:
Timothy J. H. Hele,
Eric G. Fuemmeler,
Samuel N. Sanders,
Elango Kumarasamy,
Matthew Y. Sfeir,
Luis M. Campos,
Nandini Ananth
Abstract:
Recent synthetic studies on the organic molecules tetracene and pentacene have found certain dimers and oligomers to exhibit an intense absorption in the visible region of the spectrum which is not present in the monomer or many previously-studied dimers. In this article we combine experimental synthesis with electronic structure theory and spectral computation to show that this absorption arises from an otherwise dark charge-transfer excitation 'borrowing intensity' from an intense UV excitation. Further, by characterizing the role of relevant monomer molecular orbitals, we arrive at a design principle that allows us to predict the presence or absence of an additional absorption based on the bonding geometry of the dimer. We find this rule correctly explains the spectra of a wide range of acene derivatives and solves an unexplained structure-spectrum phenomenon first observed seventy years ago. These results pave the way for the design of highly absorbent chromophores with applications ranging from photovoltaics to liquid crystals.
Submitted 18 December, 2018;
originally announced December 2018.