AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion

Abstract: Determining the optimal configuration of adsorbates on a slab (adslab) is pivotal in the exploration of novel catalysts across diverse applications. Traditionally, the quest for the lowest energy adslab configuration involves placing the adsorbate onto the slab followed by an optimization process. Prior methodologies have relied on heuristics, problem-specific intuitions, or brute-force approaches to guide adsorbate placement. In this work, we propose a novel framework for adsorbate placement using denoising diffusion. The model is designed to predict the optimal adsorbate site and orientation corresponding to the lowest energy configuration. Further, we have an end-to-end evaluation framework where diffusion-predicted adslab configuration is optimized with a pretrained machine learning force field and finally evaluated with Density Functional Theory (DFT). Our findings demonstrate an acceleration of up to 5x or 3.5x improvement in accuracy compared to the previous best approach. Given the novelty of this framework and application, we provide insights into the impact of pre-training, model architectures, and conduct extensive experiments to underscore the significance of this approach.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 8 pages, 7 figures, ICML 2024

arXiv:2405.02078

CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks

Authors: Brook Wander, Muhammed Shuaibi, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick

Abstract: Direct access to transition state energies at low computational cost unlocks the possibility of accelerating catalyst discovery. We show that the top performing graph neural network potential trained on the OC20 dataset, a related but different task, is able to find transition states energetically similar (within 0.1 eV) to density functional theory (DFT) 91% of the time with a 28x speedup. This speaks to the generalizability of the models, having never been explicitly trained on reactions, the machine learned potential approximates the potential energy surface well enough to be performant for this auxiliary task. We introduce the Open Catalyst 2020 Nudged Elastic Band (OC20NEB) dataset, which is made of 932 DFT nudged elastic band calculations, to benchmark machine learned model performance on transition state energies. To demonstrate the efficacy of this approach, we replicated a well-known, large reaction network with 61 intermediates and 174 dissociation reactions at DFT resolution (40 meV). In this case of dense NEB enumeration, we realize even more computational cost savings and used just 12 GPU days of compute, where DFT would have taken 52 GPU years, a 1500x speedup. Similar searches for complete reaction networks could become routine using the approach presented here. Finally, we replicated an ammonia synthesis activity volcano and systematically found lower energy configurations of the transition states and intermediates on six stepped unary surfaces. This scalable approach offers a more complete treatment of configurational space to improve and accelerate catalyst discovery.
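
As a rough sanity check on the speedup quoted in this abstract, the ratio of 52 GPU years to 12 GPU days can be computed directly. This is a minimal sketch that assumes a 365-day year; the function name `speedup` is illustrative and does not come from the paper.

```python
def speedup(baseline_gpu_years: float, accelerated_gpu_days: float,
            days_per_year: float = 365.0) -> float:
    """Ratio of baseline cost to accelerated cost, both converted to GPU days."""
    return baseline_gpu_years * days_per_year / accelerated_gpu_days


# Figures quoted in the abstract: 52 GPU years (DFT) versus 12 GPU days (ML).
ratio = speedup(52, 12)
print(f"{ratio:.0f}x")  # prints "1582x"
```

Under that assumption the ratio comes out near 1580x, which is consistent with the rounded 1500x figure in the abstract.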

Submitted 11 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Comments: 50 pages, 15 figures, submitted to Nature Catalysis

arXiv:2311.01987

Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials

Authors: Xiaoxiao Wang, Joseph Musielewicz, Richard Tran, Sudheesh Kumar Ethirajan, Xiaoyan Fu, Hilda Mera, John R. Kitchin, Rachel C. Kurchin, Zachary W. Ulissi

Abstract: Although density functional theory (DFT) has aided in accelerating the discovery of new materials, such calculations are computationally expensive, especially for high-throughput efforts. This has prompted an explosion in exploration of machine learning assisted techniques to improve the computational efficiency of DFT. In this study, we present a comprehensive investigation of the broader application of Finetuna, an active learning framework to accelerate structural relaxation in DFT with prior information from Open Catalyst Project pretrained graph neural networks. We explore the challenges associated with out-of-domain systems: alcohol ($C_{>2}$) on metal surfaces as larger adsorbates, metal-oxides with spin polarization, and three-dimensional (3D) structures like zeolites and metal-organic-frameworks. By pre-training machine learning models on large datasets and fine-tuning the model along the simulation, we demonstrate the framework's ability to conduct relaxations with fewer DFT calculations. Depending on the similarity of the test systems to the training systems, a more conservative querying strategy is applied. Our best-performing Finetuna strategy reduces the number of DFT single-point calculations by 80% for alcohols and 3D structures, and 42% for oxide systems.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.16802

From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction

Authors: Nima Shoghi, Adeesh Kolluru, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick, Brandon M. Wood

Abstract: Foundation models have been transformational in machine learning fields such as natural language processing and computer vision. Similar success in atomic property prediction has been limited due to the challenges of training effective models across multiple chemical domains. To address this, we introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that simultaneously trains on multiple datasets from different chemical domains, treating each dataset as a unique pre-training task within a multi-task framework. Our combined training dataset consists of $\sim$120M systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and generalization by fine-tuning over a diverse set of downstream tasks and datasets including: QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP demonstrates an average improvement of 59% over training from scratch, and matches or sets state-of-the-art on 34 out of 40 tasks. Our work highlights the potential of pre-training strategies that utilize diverse data to advance property prediction across chemical domains, especially for low-data tasks. Please visit https://nima.sh/jmp for further information.

Submitted 6 May, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.04811

Chemical Properties from Graph Neural Network-Predicted Electron Densities

Authors: Ethan M. Sunshine, Muhammed Shuaibi, Zachary W. Ulissi, John R. Kitchin

Abstract: According to density functional theory, any chemical property can be inferred from the electron density, making it the most informative attribute of an atomic structure. In this work, we demonstrate the use of established physical methods to obtain important chemical properties from model-predicted electron densities. We introduce graph neural network architectural choices that provide physically relevant and useful electron density predictions. Despite not training to predict atomic charges, the model is able to predict atomic charges with an order of magnitude lower error than a sum of atomic charge densities. Similarly, the model predicts dipole moments with half the error of the sum of atomic charge densities method. We demonstrate that larger data sets lead to more useful predictions in these tasks. These results pave the way for an alternative path in atomistic machine learning, where data-driven approaches and existing physical methods are used in tandem to obtain a variety of chemical properties in an explainable and self-consistent manner.
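
To illustrate how a property such as the dipole moment can be read off an electron density, the sketch below applies the textbook definition: nuclear charges times positions, minus the density-weighted integral of position over a grid. The abstract does not say which procedure the authors use, so the function name, the grid representation, and the toy numbers are all assumptions made for illustration (atomic units throughout).

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dipole_moment(nuclei: Sequence[Tuple[float, Vec3]],
                  grid_points: Sequence[Vec3],
                  densities: Sequence[float],
                  voxel_volume: float) -> Vec3:
    """Dipole = sum_i Z_i * R_i - sum_g rho_g * dV * r_g (atomic units)."""
    mu: List[float] = [0.0, 0.0, 0.0]
    # Positive contribution from point-like nuclei.
    for charge, position in nuclei:
        for k in range(3):
            mu[k] += charge * position[k]
    # Negative contribution from the electron density, integrated on the grid.
    for point, rho in zip(grid_points, densities):
        electrons_in_voxel = rho * voxel_volume
        for k in range(3):
            mu[k] -= electrons_in_voxel * point[k]
    return (mu[0], mu[1], mu[2])


# Toy system (hypothetical): one +1 nucleus at the origin, one electron on the grid.
nucleus = [(1.0, (0.0, 0.0, 0.0))]

# Density split evenly on either side of the nucleus: no net dipole.
symmetric = dipole_moment(nucleus, [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [0.5, 0.5], 1.0)

# All density displaced to x = 0.5: dipole points along -x.
shifted = dipole_moment(nucleus, [(0.5, 0.0, 0.0)], [1.0], 1.0)
```

In practice the density would come from a model prediction on a dense 3D grid, and the same sum would run over every voxel.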

Submitted 9 September, 2023; originally announced September 2023.

arXiv:2302.14103

WhereWulff: A semi-autonomous workflow for systematic catalyst surface reactivity under reaction conditions

Authors: Rohan Yuri Sanspeur, Javier Heras-Domingo, John R. Kitchin, Zachary Ulissi

Abstract: This paper introduces WhereWulff, a semi-autonomous workflow for modeling the reactivity of catalyst surfaces. The workflow begins with a bulk optimization task that takes an initial bulk structure, and returns the optimized bulk geometry and magnetic state, including stability under reaction conditions. The stable bulk structure is the input to a surface chemistry task that enumerates surfaces up to a user-specified maximum Miller index, computes relaxed surface energies for those surfaces, and then prioritizes those for subsequent adsorption energy calculations based on their contribution to the Wulff construction shape. The workflow handles computational resource constraints such as limited wall-time as well as automated job submission and analysis. We illustrate the workflow for oxygen evolution (OER) intermediates on two double perovskites. WhereWulff nearly halved the number of Density Functional Theory (DFT) calculations from ~ 240 to ~ 132 by prioritizing terminations, up to a maximum Miller index of 1, based on surface stability. Additionally, it automatically handled the 180 additional re-submission jobs required to successfully converge 120+ atoms systems under a 48-hour wall-time cluster constraint. There are four main use cases that we envision for WhereWulff: (1) as a first-principles source of truth to validate and update a closed-loop self-sustaining materials discovery pipeline, (2) as a data generation tool, (3) as an educational tool, allowing users (e.g. experimentalists) unfamiliar with OER modeling to probe materials they might be interested in before doing further in-domain analyses, (4) and finally as a starting point for users to extend with reactions other than OER, as part of a collaborative software community.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2206.02005

Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery

Authors: Adeesh Kolluru, Muhammed Shuaibi, Aini Palizhati, Nima Shoghi, Abhishek Das, Brandon Wood, C. Lawrence Zitnick, John R Kitchin, Zachary W Ulissi

Abstract: The development of machine learned potentials for catalyst discovery has predominantly been focused on very specific chemistries and material compositions. While effective in interpolating between available materials, these approaches struggle to generalize across chemical space. The recent curation of large-scale catalyst datasets has offered the opportunity to build a universal machine learning potential, spanning chemical and composition space. If accomplished, said potential could accelerate the catalyst discovery process across a variety of applications (CO2 reduction, NH3 production, etc.) without additional specialized training efforts that are currently required. The release of the Open Catalyst 2020 (OC20) has begun just that, pushing the heterogeneous catalysis and machine learning communities towards building more accurate and robust models. In this perspective, we discuss some of the challenges and findings of recent developments on OC20. We examine the performance of current models across different materials and adsorbates to identify notably underperforming subsets. We then discuss some of the modeling efforts surrounding energy-conservation, approaches to finding and evaluating the local minima, and augmentation of off-equilibrium data. To complement the community's ongoing developments, we end with an outlook to some of the important challenges that have yet to be thoroughly explored for large-scale catalyst discovery.

Submitted 13 June, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: submitted to ACS Catalysis

arXiv:2101.06307

doi 10.1063/5.0049665

Machine-learning accelerated geometry optimization in molecular simulation

Authors: Yilin Yang, Omar A. Jimenez-Negron, John R. Kitchin

Abstract: Geometry optimization is an important part of both computational materials and surface science because it is the path to finding ground state atomic structures and reaction pathways. These properties are used in the estimation of thermodynamic and kinetic properties of molecular and crystal structures. This process is slow at the quantum level of theory because it involves an iterative calculation of forces using quantum chemical codes such as density functional theory (DFT), which are computationally expensive, and which limit the speed of the optimization algorithms. It would be highly advantageous to accelerate this process because then one could either do the same amount of work in less time, or more work in the same time. In this work, we provide a neural network (NN) ensemble based active learning method to accelerate the local geometry optimization for multiple configurations simultaneously. We illustrate the acceleration on several case studies including bare metal surfaces, surfaces with adsorbates, and nudged elastic band (NEB) for two reactions. In all cases the accelerated method requires fewer DFT calculations than the standard method. In addition, we provide an ASE-optimizer Python package to make the usage of the NN ensemble active learning for geometry optimization easier.
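
The idea of an ensemble deciding when an expensive reference calculation is needed can be sketched on a one-dimensional toy problem. The example below assumes that disagreement between ensemble members (their standard deviation) is the uncertainty signal, which is a common choice but is not confirmed by the abstract. It shows only the uncertainty-gated query step: retraining the ensemble on new reference data is omitted, and `relax`, `reference_force`, and the toy models are hypothetical stand-ins rather than the authors' package.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple

ForceModel = Callable[[float], float]


def ensemble_force(models: Sequence[ForceModel], x: float) -> Tuple[float, float]:
    """Mean prediction and spread (population std) across ensemble members."""
    predictions = [model(x) for model in models]
    return mean(predictions), pstdev(predictions)


def relax(models: Sequence[ForceModel], reference_force: ForceModel, x0: float,
          threshold: float = 0.05, step: float = 0.1, fmax: float = 0.01,
          max_steps: int = 200) -> Tuple[float, int]:
    """Steepest descent that calls the reference only when the ensemble disagrees."""
    x, reference_calls = x0, 0
    for _ in range(max_steps):
        force, spread = ensemble_force(models, x)
        if spread > threshold:
            # Ensemble is uncertain here: fall back on the expensive reference.
            force = reference_force(x)
            reference_calls += 1
        if abs(force) < fmax:
            break
        x += step * force  # move along the force
    return x, reference_calls


def reference_force(x: float) -> float:
    """Toy stand-in for DFT: harmonic well with its minimum at x = 1."""
    return -2.0 * (x - 1.0)


def make_model(cubic_error: float) -> ForceModel:
    """Toy surrogate: exact near the minimum, increasingly wrong far from it."""
    return lambda x: reference_force(x) + cubic_error * (x - 1.0) ** 3


ensemble = [make_model(c) for c in (0.0, 0.3, 0.6)]
x_final, n_calls = relax(ensemble, reference_force, x0=3.0)
print(f"x = {x_final:.3f}, reference calls = {n_calls}")
```

Far from the minimum the toy surrogates disagree and the reference is queried; close to it they agree and the remaining steps use the ensemble mean alone, so the number of reference calls is smaller than the number of optimization steps.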

Submitted 18 April, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Showing 1–8 of 8 results for author: Kitchin, J R