-
Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing
Authors:
Pranav Shetty,
Aishat Adeboye,
Sonakshi Gupta,
Chao Zhang,
Rampi Ramprasad
Abstract:
We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs, using data extracted by a natural language processing pipeline from literature spanning $\sim$20 years. While data-driven methods are well established as a way to discover novel materials faster than Edisonian trial-and-error approaches, their benefits have not been quantified for material discovery problems that can take decades. Our approach demonstrates a potential reduction in discovery time of approximately 75 %, equivalent to a 15-year acceleration in material innovation. Our pipeline enables us to extract data from more than 3300 papers, a data set $\sim$5 times larger, and therefore more diverse, than similar data sets reported by others. We also trained machine learning models to predict the power conversion efficiency and used them to identify promising donor-acceptor combinations that are as yet unreported. We thus demonstrate a pipeline that goes from published literature to extracted material property data, which in turn is used to obtain data-driven insights. These insights include active learning strategies that either train strong predictive models of material properties or are robust to the choice of initial material system. This work provides a valuable framework for data-driven research in materials science.
Submitted 21 June, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
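The active learning simulation described in the first abstract replays historical data as if candidates were being measured one at a time, and counts how long each strategy takes to reach the best pair. A minimal sketch of that idea, assuming a one-dimensional feature, a toy 1-nearest-neighbour surrogate, and "time to discovery" measured as the number of acquisitions before the best item is found; all of these are hypothetical choices, not the authors' models or metric:

```python
import random

def nn_predict(x, labeled):
    """1-nearest-neighbour surrogate: predict the target of the closest measured point."""
    nearest = min(labeled, key=lambda p: abs(p[0] - x))
    return nearest[1]

def steps_to_find_best(pool, strategy, n_init=3, seed=0):
    """Count acquisitions needed before the best item in the pool has been measured.

    pool: list of (feature, target) pairs; a target counts as known only once acquired.
    strategy: "random" (uniform sampling) or "greedy" (highest surrogate prediction).
    """
    rng = random.Random(seed)
    best = max(pool, key=lambda p: p[1])
    remaining = list(pool)
    rng.shuffle(remaining)
    # Seed the simulated "historical record" with a few random measurements.
    labeled = [remaining.pop() for _ in range(n_init)]
    steps = 0
    while best not in labeled:
        if strategy == "random":
            choice = remaining.pop(rng.randrange(len(remaining)))
        elif strategy == "greedy":
            choice = max(remaining, key=lambda p: nn_predict(p[0], labeled))
            remaining.remove(choice)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        labeled.append(choice)
        steps += 1
    return steps

# Toy pool with a single optimum at x = 30.
pool = [(x, -(x - 30) ** 2) for x in range(50)]
for strategy in ("random", "greedy"):
    print(strategy, steps_to_find_best(pool, strategy, seed=1))
```

Averaging the step counts over many seeds for each strategy is a miniature version of the comparison the abstract reports; whether the greedy strategy wins depends heavily on the surrogate and the data.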
-
May the Force be with You: Unified Force-Centric Pre-Training for 3D Molecular Conformations
Authors:
Rui Feng,
Qi Zhu,
Huan Tran,
Binghong Chen,
Aubrey Toland,
Rampi Ramprasad,
Chao Zhang
Abstract:
Recent works have shown the promise of pre-trained models for 3D molecular representation. However, existing pre-training models focus predominantly on equilibrium data and largely overlook off-equilibrium conformations. Extending these methods to off-equilibrium data is challenging because their training objective assumes that conformations are local energy minima. We address this gap by proposing a force-centric pre-training model for 3D molecular conformations covering both equilibrium and off-equilibrium data. For off-equilibrium data, our model learns directly from their atomic forces. For equilibrium data, we introduce zero-force regularization and force-based denoising techniques to approximate near-equilibrium forces. We obtain a unified pre-trained model for 3D molecular representation, trained on over 15 million diverse conformations. Experiments show that, with our pre-training objective, force accuracy improves by around 3 times compared to the un-pre-trained Equivariant Transformer model. By incorporating regularizations on equilibrium data, we solved the problem of unstable MD simulations in vanilla Equivariant Transformers, achieving state-of-the-art simulation performance with 2.45 times faster inference than NequIP. As a powerful molecular encoder, our pre-trained model performs on par with state-of-the-art methods on property prediction tasks.
Submitted 23 August, 2023;
originally announced August 2023.
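The force-centric objective in the abstract above uses reference forces where they exist and, for equilibrium structures, the physical fact that forces vanish at a local energy minimum. A minimal sketch of those two loss terms, assuming per-atom force vectors as plain tuples and a hypothetical weight `lam`; it omits the force-based denoising term and is not the authors' implementation:

```python
def force_centric_loss(pred_forces, true_forces, is_equilibrium, lam=1.0):
    """Loss for one conformation, averaged over atoms and Cartesian components.

    pred_forces, true_forces: sequences of (fx, fy, fz), one per atom.
    is_equilibrium: if True, reference forces are ignored and the predicted
        forces are pushed toward zero (zero-force regularization).
    lam: hypothetical weight on the zero-force term.
    """
    n_terms = 3 * len(pred_forces)
    if is_equilibrium:
        # At a local energy minimum every atomic force should vanish.
        return lam * sum(f * f for atom in pred_forces for f in atom) / n_terms
    # Off-equilibrium: supervise directly on the reference atomic forces.
    return sum(
        (p - t) ** 2
        for p_atom, t_atom in zip(pred_forces, true_forces)
        for p, t in zip(p_atom, t_atom)
    ) / n_terms

print(force_centric_loss([(2.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
                         [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)], is_equilibrium=False))
```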
-
Polymer Informatics Beyond Homopolymers
Authors:
Shivank S. Shukla,
Christopher Kuenneth,
Rampi Ramprasad
Abstract:
Polymers are diverse and versatile materials that have met a wide range of material application demands. They come in several flavors and architectures, e.g., homopolymers, copolymers, polymer blends, and polymers with additives. Searching this enormous space for suitable materials with a specific set of property/performance targets is thus non-trivial, painstaking, and expensive. Such a search process can be made effective by the creation of rapid and accurate property predictors. In this work, we present a machine-learning framework to predict the thermal properties of homopolymers, copolymers, and polymer blends. A universal fingerprinting scheme capable of handling this entire polymer chemical class has been developed, and a multi-task deep learning algorithm is trained simultaneously on a large dataset of glass transition, melting, and degradation temperatures. The developed models are accurate, fast, flexible, and scalable to other properties when suitable data become available.
Submitted 22 March, 2023;
originally announced March 2023.
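The abstract above does not spell out how the universal fingerprint handles copolymers and blends. One simple scheme, assumed here purely for illustration, is to average the fingerprint vectors of the constituent repeat units (or blend components), weighted by their molar or mass fractions, so that a homopolymer reduces to its own fingerprint:

```python
def composite_fingerprint(fingerprints, fractions):
    """Fraction-weighted average of component fingerprint vectors (an assumed scheme).

    fingerprints: list of equal-length numeric vectors, one per component.
    fractions: list of non-negative weights (e.g. molar fractions), one per component.
    """
    if len(fingerprints) != len(fractions):
        raise ValueError("need exactly one fraction per component")
    total = sum(fractions)
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    dim = len(fingerprints[0])
    # Normalize by the total so fractions need not sum exactly to 1.
    return [sum(w * fp[i] for fp, w in zip(fingerprints, fractions)) / total
            for i in range(dim)]

# A 25:75 copolymer of two hypothetical repeat units with 2-component fingerprints.
print(composite_fingerprint([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75]))  # [0.25, 0.75]
```

A fixed-length vector of this kind lets the same downstream multi-task model consume homopolymers, copolymers, and blends without architectural changes.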
-
Informatics-Driven Selection of Polymers for Fuel-Cell Applications
Authors:
Huan Tran,
Kuan-Hsuan Shen,
Shivank Shukla,
Ha-Kyung Kwon,
Rampi Ramprasad
Abstract:
Modern fuel cell technologies use Nafion as the material of choice for the proton exchange membrane (PEM) and as the binding material (ionomer) used to assemble the catalyst layers of the anode and cathode. These applications demand high proton conductivity, among other requirements. For example, the PEM is expected to block electrons, oxygen, and hydrogen from penetrating and diffusing, while the anode/cathode ionomer should allow hydrogen/oxygen to move easily so that they can reach the catalyst nanoparticles. Given some of the well-known limits of Nafion, such as its low glass-transition temperature, the community is in the midst of an active search for Nafion replacements. In this work, we present an informatics-based scheme to search large polymer chemical spaces, which includes establishing a list of properties needed for the targeted applications, developing predictive machine-learning models for these properties, defining a search space, and using the developed models to screen the search space. Using the scheme, we have identified 60 new polymer candidates for PEM, anode ionomer, and cathode ionomer that we hope will be advanced to the next step, i.e., validating the designs through synthesis and testing. The proposed informatics scheme is generic and can be used to select polymers for multiple applications in the future.
Submitted 26 December, 2022;
originally announced December 2022.
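The last step of the scheme in the abstract above, screening a search space with the trained models, reduces to checking each candidate's predicted properties against target windows. A minimal sketch; the property names, units, and thresholds in the usage example are hypothetical placeholders, not the criteria used in the paper:

```python
def screen(candidates, criteria):
    """Return the names of candidates whose predicted properties all meet the targets.

    candidates: dict mapping candidate name -> dict of predicted property values.
    criteria: dict mapping property name -> (low, high); None leaves a bound open.
    A candidate missing any required prediction is rejected.
    """
    def passes(props):
        for prop, (low, high) in criteria.items():
            value = props.get(prop)
            if value is None:
                return False
            if low is not None and value < low:
                return False
            if high is not None and value > high:
                return False
        return True

    return [name for name, props in candidates.items() if passes(props)]

# Hypothetical PEM-style targets: high glass-transition temperature (K) and
# low H2 permeability (arbitrary units).
pem_criteria = {"Tg_K": (400.0, None), "H2_permeability": (None, 1.0)}
predicted = {
    "poly-A": {"Tg_K": 430.0, "H2_permeability": 0.4},
    "poly-B": {"Tg_K": 380.0, "H2_permeability": 0.2},
    "poly-C": {"Tg_K": 450.0},
}
print(screen(predicted, pem_criteria))  # ['poly-A']
```

Different applications (PEM, anode ionomer, cathode ionomer) would simply use different `criteria` dictionaries over the same set of model predictions.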
-
Computational framework for polymer synthesis to study dielectric properties using polarizable reactive molecular dynamics
Authors:
Ankit Mishra,
Lihua Chen,
ZongZe Li,
Ken-ichi Nomura,
Aravind Krishnamoorthy,
Shogo Fukushima,
Subodh C. Tiwari,
Rajiv K. Kalia,
Aiichiro Nakano,
Rampi Ramprasad,
Greg Sotzing,
Yang Cao,
Fuyuki Shimojo,
Priya Vashishta
Abstract:
The increased energy and power density required in modern electronics poses a challenge for designing new dielectric polymer materials with high energy density while maintaining low loss at high applied electric fields. Recently, an advanced computational screening method coupled with hierarchical modelling has accelerated the identification of promising high energy density materials. It is well known that the dielectric response of polymeric materials is largely influenced by their phases and local heterogeneous structures, as well as by the operating temperature. Such inputs are crucial to accelerate the design and discovery of potential polymer candidates. However, an efficient computational framework to probe the temperature dependence of the dielectric properties of polymers, while incorporating effects controlled by their morphology, is still lacking. In this paper, we propose a scalable computational framework based on reactive molecular dynamics with a valence-state aware polarizable charge model, which is capable of handling practically relevant polymer morphologies while simultaneously providing near-quantum accuracy in estimating the dielectric properties of various polymer systems. We demonstrate the predictive power of our framework on high energy density polymer systems recently identified through rational experimental-theoretical co-design. Our scalable and automated framework may be used for high-throughput theoretical screening of large combinatorial design spaces to identify next-generation high energy density polymer materials.
Submitted 18 November, 2020;
originally announced November 2020.
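For reference, in the linear-response regime the relative permittivity follows from the polarization that a small applied field induces. The relation below is the standard textbook one, written for a field $E$ applied along $z$ to a simulation cell of volume $V$ containing (polarizable) charges $q_i$ at positions $\mathbf{r}_i = (x_i, y_i, z_i)$. The abstract does not state which estimator the framework uses, so treat this as background rather than the authors' method:

$$
M_z = \sum_i q_i z_i, \qquad
P_z = \frac{\langle M_z \rangle_{E} - \langle M_z \rangle_{0}}{V}, \qquad
\varepsilon_r = 1 + \frac{P_z}{\varepsilon_0 E}
$$

Here $\langle \cdot \rangle_{E}$ and $\langle \cdot \rangle_{0}$ denote time averages with and without the applied field, and $\varepsilon_0$ is the vacuum permittivity. Repeating the calculation at several temperatures gives the temperature dependence that the abstract identifies as missing from earlier frameworks.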
-
Polymer Informatics with Multi-Task Learning
Authors:
Christopher Künneth,
Arunkumar Chitteth Rajan,
Huan Tran,
Lihua Chen,
Chiho Kim,
Rampi Ramprasad
Abstract:
Modern data-driven tools are transforming application-specific polymer development cycles. Surrogate models that can be trained to predict the properties of new polymers are becoming commonplace. Nevertheless, these models do not utilize the full breadth of the knowledge available in datasets, which are oftentimes sparse; inherent correlations between different property datasets are disregarded. Here, we demonstrate the potency of multi-task learning approaches that exploit such inherent correlations effectively, particularly when some property dataset sizes are small. Data pertaining to 36 different properties of over $13,000$ polymers (corresponding to over $23,000$ data points) are coalesced and supplied to deep-learning multi-task architectures. Compared to conventional single-task learning models (which are trained on individual property datasets independently), the multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available. Moreover, these models are interpretable. Chemical rules that explain how certain features control trends in specific property values emerge from the present work, paving the way for the rational design of application-specific polymers meeting desired property or performance objectives.
Submitted 28 October, 2020;
originally announced October 2020.
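With 36 properties but far fewer measurements per polymer, most entries of the polymer-by-property target matrix in the abstract above are empty. A common way to train a multi-task model on such sparse data, assumed here since the abstract does not give the loss, is to compute the error only over measured entries. A minimal sketch, with `None` marking a missing measurement:

```python
def masked_multitask_mse(predictions, targets):
    """Mean squared error over only the (polymer, property) entries that were measured.

    predictions: list of rows, one per polymer, each with one prediction per property.
    targets: same shape as predictions; None marks a missing measurement.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for p, t in zip(pred_row, target_row):
            if t is None:
                continue  # no label: this task contributes no gradient for this polymer
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0

preds = [[1.0, 2.0], [3.0, 4.0]]
labels = [[1.0, None], [None, 6.0]]
print(masked_multitask_mse(preds, labels))  # 2.0
```

Because every property shares the same model, a polymer measured for only one property still updates the shared layers, which is how correlations between properties can be exploited.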
-
EZFF: Python Library for Multi-Objective Parameterization and Uncertainty Quantification of Interatomic Forcefields for Molecular Dynamics
Authors:
Aravind Krishnamoorthy,
Ankit Mishra,
Deepak Kamal,
Sungwook Hong,
Ken-ichi Nomura,
Subodh Tiwari,
Aiichiro Nakano,
Rajiv Kalia,
Rampi Ramprasad,
Priya Vashishta
Abstract:
Parameterization of interatomic forcefields is a necessary first step in performing molecular dynamics simulations. This is a non-trivial global optimization problem involving quantification of multiple empirical variables against one or more properties. We present EZFF, a lightweight Python library for parameterization of several types of interatomic forcefields implemented in several molecular dynamics engines against multiple objectives using genetic-algorithm-based global optimization methods. The EZFF scheme provides unique functionality such as the parameterization of hybrid forcefields composed of multiple forcefield interactions as well as built-in quantification of uncertainty in forcefield parameters and can be easily extended to other forcefield functional forms as well as MD engines.
Submitted 30 September, 2020;
originally announced September 2020.
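Multi-objective parameterization, as described in the abstract above, returns a set of trade-off solutions rather than a single best fit. The selection step at the heart of such genetic-algorithm searches keeps the parameter sets whose error vectors are not dominated by any other. A minimal sketch of that step alone, with hypothetical parameter labels and objective errors; it is not EZFF's API:

```python
def dominates(a, b):
    """True if error vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the (params, errors) pairs whose errors no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates if other is not c)
    ]

# Hypothetical forcefield parameter sets with errors against two objectives
# (for example, energies and lattice constants); lower is better for both.
population = [
    ("params-1", (1.0, 5.0)),
    ("params-2", (2.0, 2.0)),
    ("params-3", (3.0, 3.0)),  # worse than params-2 on both objectives
    ("params-4", (5.0, 1.0)),
]
print([name for name, _ in pareto_front(population)])  # ['params-1', 'params-2', 'params-4']
```

In a full genetic algorithm, the surviving front would be mutated and recombined to form the next generation; running the MD engine to turn a parameter set into an error vector is the expensive part and is omitted here.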
-
Screening of Therapeutic Agents for COVID-19 using Machine Learning and Ensemble Docking Simulations
Authors:
Rohit Batra,
Henry Chan,
Ganesh Kamath,
Rampi Ramprasad,
Mathew J. Cherukara,
Subramanian Sankaranarayanan
Abstract:
The world has witnessed unprecedented human and economic loss from the COVID-19 disease, caused by the novel coronavirus SARS-CoV-2. Extensive research is being conducted across the globe to identify therapeutic agents against SARS-CoV-2. Here, we use a powerful and efficient computational strategy that combines machine learning (ML) based models and high-fidelity ensemble docking simulations to enable rapid screening of possible therapeutic molecules (or ligands). Our screening is based on the binding affinity to either the isolated SARS-CoV-2 S-protein at its host receptor region or the S-protein/human ACE2 interface complex, thereby potentially limiting and/or disrupting the host-virus interactions. We first apply our screening strategy to two drug datasets (CureFFI and DrugCentral) to identify hundreds of ligands that bind strongly to the aforementioned two systems. Candidate ligands were then validated by all-atom docking simulations. The validated ML models were subsequently used to screen a large bio-molecule dataset (with nearly a million entries) to provide a rank-ordered list of ~19,000 potentially useful compounds for further validation. Overall, this work not only expands our knowledge of small-molecule treatment against COVID-19, but also provides an efficient pathway to perform high-throughput computational drug screening by combining quick ML surrogate models with expensive high-fidelity simulations, for accelerating the therapeutic cure of diseases.
Submitted 7 April, 2020;
originally announced April 2020.
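The screening strategy in the abstract above pairs a cheap ML surrogate with expensive docking, so that the expensive step only sees a shortlist. A minimal sketch of that two-stage funnel, assuming both scorers return higher-is-better values (raw docking scores often use the opposite sign convention) and using hypothetical function and ligand names:

```python
def funnel_screen(ligands, surrogate_score, expensive_score, top_k, threshold):
    """Two-stage screen: rank all ligands with the cheap surrogate, then validate only the top_k.

    surrogate_score, expensive_score: callables mapping a ligand to a
        higher-is-better binding score (an assumed convention).
    Returns the shortlisted ligands whose expensive score meets the threshold.
    """
    ranked = sorted(ligands, key=surrogate_score, reverse=True)
    shortlist = ranked[:top_k]
    return [lig for lig in shortlist if expensive_score(lig) >= threshold]

# Toy stand-ins for an ML surrogate and a docking calculation.
surrogate = {"lig-a": 0.9, "lig-b": 0.1, "lig-c": 0.7, "lig-d": 0.5}
docking_calls = []

def fake_docking(lig):
    docking_calls.append(lig)  # record how often the "expensive" step runs
    return {"lig-a": 8.0, "lig-b": 9.0, "lig-c": 5.0, "lig-d": 7.0}[lig]

hits = funnel_screen(surrogate, surrogate.get, fake_docking, top_k=2, threshold=6.0)
print(hits, len(docking_calls))  # ['lig-a'] 2
```

The toy data also shows the trade-off: `lig-b` would have docked best but was never validated because the surrogate ranked it last, which is why surrogate accuracy matters for a funnel of this kind.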
-
Machine Learning Models for the Lattice Thermal Conductivity Prediction of Inorganic Materials
Authors:
Lihua Chen,
Huan Tran,
Rohit Batra,
Chiho Kim,
Rampi Ramprasad
Abstract:
The lattice thermal conductivity ($κ_{\rm L}$) is a critical property of thermoelectrics, thermal barrier coating materials, and semiconductors. Because accurate experimental measurements of $κ_{\rm L}$ are extremely challenging, it is usually approximated through computational approaches, such as semi-empirical models, the Green-Kubo formalism coupled with molecular dynamics simulations, and first-principles based methods. However, these theoretical methods are not only limited in accuracy but can also become computationally intractable owing to their cost. In this work, we therefore build a machine learning (ML) model to accurately and instantly predict the $κ_{\rm L}$ of inorganic materials, using a benchmark data set of experimentally measured $κ_{\rm L}$ values for about 100 inorganic solids. We use advanced and universal feature engineering techniques along with the Gaussian process regression algorithm, and compare the performance of our ML model with past theoretical works. The trained ML model is helpful for the rational design and screening of novel materials, and it also allows us to identify key features governing thermal transport in non-metals.
Submitted 4 August, 2019; v1 submitted 14 June, 2019;
originally announced June 2019.
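Gaussian process regression, named in the abstract above, predicts a new material's $κ_{\rm L}$ as a kernel-weighted combination of the training targets and also returns a predictive variance. A minimal pure-Python sketch with a squared-exponential (RBF) kernel and fixed, hypothetical hyperparameters; practical implementations (for example scikit-learn's `GaussianProcessRegressor`) also fit the hyperparameters, and the inputs here are placeholders rather than the paper's engineered features:

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward_solve(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, X_new, noise=1e-6, length_scale=1.0, variance=1.0):
    """Posterior (mean, variance) at each point of X_new for a GP centred on mean(y)."""
    n = len(X)
    y_mean = sum(y) / n
    K = [[rbf(X[i], X[j], length_scale, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, [v - y_mean for v in y]))  # K^-1 (y - mean)
    out = []
    for x in X_new:
        k_star = [rbf(x, xi, length_scale, variance) for xi in X]
        mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
        v = forward_solve(L, k_star)
        out.append((mean, variance - sum(t * t for t in v)))
    return out

X, y = [[0.0], [1.0], [2.0]], [0.0, 1.0, 4.0]
for mean, var in gp_predict(X, y, [[1.0], [10.0]]):
    print(mean, var)
```

Near a training point the variance collapses toward the noise level; far from all training data the prediction reverts to the mean of the targets with the full prior variance, which is one way such a model can flag materials outside its domain.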
-
Accelerated materials property predictions and design using motif-based fingerprints
Authors:
Tran Doan Huan,
Arun Mannodi-Kanakkithodi,
Rampi Ramprasad
Abstract:
Data-driven approaches are particularly useful for computational materials discovery and design, as they can be used to rapidly screen a very large number of materials, thus suggesting lead candidates for further in-depth investigations. A central challenge of such approaches is to develop a numerical representation, often referred to as a fingerprint, of the materials. Inspired by recent developments in cheminformatics, we propose a class of hierarchical motif-based topological fingerprints for materials composed of elements such as C, O, H, N, F, etc., whose coordination preferences are well understood. We show that these fingerprints, when representing either molecules or crystals, may be effectively mapped onto a variety of properties using a similarity-based learning model, and hence can be used to predict relevant properties of a material, provided that its fingerprint can be defined. Two simple procedures are introduced to demonstrate that the learning model can be inverted, first to identify the desired fingerprints and then to reconstruct molecules that possess a set of targeted properties.
Submitted 28 March, 2015; v1 submitted 25 March, 2015;
originally announced March 2015.
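The motif-based fingerprint in the abstract above counts local bonding environments, which a similarity-based model then maps onto properties. A minimal sketch under simplifying assumptions: only the first level of the hierarchy (an atom plus the elements it is bonded to) is counted, and a Gaussian-kernel-weighted average stands in for the paper's learning model. Function names are hypothetical:

```python
import math
from collections import Counter

def motif_fingerprint(elements, bonds):
    """Count atom-centred motifs: (element, sorted tuple of bonded-neighbour elements).

    elements: list of element symbols, indexed by atom.
    bonds: list of (i, j) index pairs.
    """
    neighbours = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbours[i].append(elements[j])
        neighbours[j].append(elements[i])
    return Counter((elements[i], tuple(sorted(nbrs))) for i, nbrs in neighbours.items())

def fingerprint_distance(a, b):
    """Euclidean distance between two motif-count vectors (missing motifs count as 0)."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))

def similarity_predict(query, training, width=1.0):
    """Gaussian-kernel-weighted average of training targets (a stand-in similarity model)."""
    weights = [math.exp(-fingerprint_distance(query, fp) ** 2 / (2 * width ** 2))
               for fp, _ in training]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too dissimilar from every training example")
    return sum(w * y for w, (_, y) in zip(weights, training)) / total

# Methanol, CH3OH: atom 0 is C, atom 1 is O, atoms 2-5 are H.
methanol = motif_fingerprint(["C", "O", "H", "H", "H", "H"],
                             [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)])
print(methanol[("H", ("C",))])  # 3 hydrogens bonded to carbon
```

Deeper levels of a hierarchical scheme would extend each motif to second neighbours; the counting and similarity steps stay the same.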