-
Composition and Structure Based GGA Bandgap Prediction Using Machine Learning Approach
Authors:
Mukesh K. Choudhary,
Amal Raj V,
Gowri Sankar S,
P. Ravindran
Abstract:
This study focuses on developing precise machine learning (ML) regression models for predicting energy bandgap values based on chemical compositions and crystal structures. The primary aim is to match the accuracy of predictions derived from GGA-PBE calculations and validate them through density functional theory (DFT)-based band structure calculations. We assessed eight standalone ML regression models, including AdaBoost, Bagging, CatBoost, LGBM, RF, DT, GB, and XGB. These models were analyzed for their ability to predict GGA-PBE bandgap values across diverse material structures and compositions, using a dataset containing bandgap values for 106,113 compounds. Additionally, we constructed four ensemble models using the stacking method and seven using the bagging method. These ensemble models incorporated RidgeCV and LassoCV to explore if ensemble techniques could enhance prediction accuracy. The dataset was divided into subsets of varying sizes: 10,000, 25,000, 50,000, and 100,000 entries. We determined feature importance through permutation techniques and established a correlation coefficient matrix using the Pearson correlation method. The Random Forest (RF) model emerged as the top performer among standalone models, achieving an R2 value of 0.943 and an RMSE value of 0.504 eV. Bagging regression demonstrated improved performance across different dataset sizes with streamlined feature selection. Ensemble models, particularly bagging, consistently outperformed standalone models, achieving the best R2 value of 0.948 and an RMSE value of 0.479 eV in the test dataset. Using the best-performing model, we predicted bandgap values for new half-Heusler compounds with 18 valence electron counts. These predictions were successfully validated using accurate DFT calculations. DFT calculations indicated that the newly predicted compounds are narrow bandgap semiconductors with dynamic stability.
Submitted 14 September, 2023;
originally announced September 2023.
-
Accelerating Defect Predictions in Semiconductors Using Graph Neural Networks
Authors:
Md Habibur Rahman,
Prince Gollapalli,
Panayotis Manganaris,
Satyesh Kumar Yadav,
Ghanshyam Pilania,
Brian DeCost,
Kamal Choudhary,
Arun Mannodi-Kanakkithodi
Abstract:
Here, we develop a framework for the prediction and screening of native defects and functional impurities in a chemical space of Group IV, III-V, and II-VI zinc blende (ZB) semiconductors, powered by crystal Graph-based Neural Networks (GNNs) trained on high-throughput density functional theory (DFT) data. Using an innovative approach of sampling partially optimized defect configurations from DFT calculations, we generate one of the largest computational defect datasets to date, containing many types of vacancies, self-interstitials, anti-site substitutions, impurity interstitials and substitutions, as well as some defect complexes. We applied three types of established GNN techniques, namely Crystal Graph Convolutional Neural Network (CGCNN), Materials Graph Network (MEGNET), and Atomistic Line Graph Neural Network (ALIGNN), to rigorously train models for predicting defect formation energy (DFE) in multiple charge states and chemical potential conditions. We find that ALIGNN yields the best DFE predictions with root mean square errors around 0.3 eV, which represents a prediction accuracy of 98% given the range of values within the dataset, improving significantly on the state-of-the-art. Models are tested for different defect types as well as for defect charge transition levels. We further show that GNN-based defective structure optimization can take us close to DFT-optimized geometries at a fraction of the cost of full DFT. DFT-GNN models enable prediction and screening across thousands of hypothetical defects based on both unoptimized and partially-optimized defective structures, helping identify electronically active defects in technologically-important semiconductors.
Submitted 13 September, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Turbulence dynamics and flow speeds in the inner solar corona: Results from radio-sounding experiments by the Akatsuki spacecraft
Authors:
Richa N. Jain,
R. K. Choudhary,
Anil Bhardwaj,
T. Imamura,
Anshuman Sharma,
Umang M. Parikh
Abstract:
The solar inner corona is a region that plays a critical role in energizing the solar wind and propelling it to supersonic and supra-Alfvenic velocities. Despite its importance, this region remains poorly understood because of being least explored due to observational limitations. The coronal radio sounding technique in this context becomes useful as it helps in providing information in parts of this least explored region. To shed light on the dynamics of the solar wind in the inner corona, we conducted a study using data obtained from coronal radio-sounding experiments carried out by the Akatsuki spacecraft during the 2021 Venus-solar conjunction event. By analyzing X-band radio signals recorded at two ground stations (IDSN in Bangalore and UDSC in Japan), we investigated plasma turbulence characteristics and estimated flow speed measurements based on isotropic quasi-static turbulence models. Our analysis revealed that the speed of the solar wind in the inner corona (at heliocentric distances from 5 to 13 solar radii), ranging from 220-550 km/sec, was higher than the expected average flow speeds in this region. By integrating our radio-sounding results with EUV images of the solar disk, we gained a unique perspective on the properties and energization of high-velocity plasma streams originating from coronal holes. We tracked the evolution of fast solar wind streams emanating from an extended coronal hole as they propagated to increasing heliocentric distances. Our study provides unique insights into the least-explored inner coronal region by corroborating radio sounding results with EUV observations of the corona.
Submitted 24 August, 2023;
originally announced August 2023.
-
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Authors:
Kevin Maik Jablonka,
Qianxiang Ai,
Alexander Al-Feghali,
Shruti Badhwar,
Joshua D. Bocarsly,
Andres M Bran,
Stefan Bringuier,
L. Catherine Brinson,
Kamal Choudhary,
Defne Circi,
Sam Cox,
Wibe A. de Jong,
Matthew L. Evans,
Nicolas Gastellu,
Jerome Genzling,
María Victoria Gil,
Ankur K. Gupta,
Zhi Hong,
Alishba Imran,
Sabine Kruschwitz,
Anne Labarre,
Jakub Lála,
Tao Liu,
Steven Ma,
Sauradeep Majumdar
, et al. (28 additional authors not shown)
Abstract:
Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon.
This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.
Submitted 14 July, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data
Authors:
Kamal Choudhary,
Mathew L. Kelley
Abstract:
In this work, we present the ChemNLP library that can be used for 1) curating open access datasets for materials and chemistry literature, developing and comparing traditional machine learning, transformers and graph neural network models for 2) classifying and clustering texts, 3) named entity recognition for large-scale text-mining, 4) abstractive summarization for generating titles of articles from abstracts, 5) text generation for suggesting abstracts from titles, 6) integration with density functional theory dataset for identifying potential candidate materials such as superconductors, and 7) web-interface development for text and reference query. We primarily use the publicly available arXiv and Pubchem datasets but the tools can be used for other datasets as well. Moreover, as new models are developed, they can be easily integrated in the library. ChemNLP is available at the websites: https://github.com/usnistgov/chemnlp and https://jarvis.nist.gov/jarvischemnlp.
Submitted 15 August, 2023; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Reproducible Sorbent Materials Foundry for Carbon Capture at Scale
Authors:
Austin McDannald,
Howie Joress,
Brian DeCost,
Avery E. Baumann,
A. Gilad Kusne,
Kamal Choudhary,
Taner Yildirim,
Daniel W. Siderius,
Winnie Wong-Ng,
Andrew J. Allen,
Christopher M. Stafford,
Diana Ortiz-Montalvo
Abstract:
We envision an autonomous sorbent materials foundry (SMF) for rapidly evaluating materials for direct air capture of carbon dioxide (CO2), specifically targeting novel metal organic framework materials. Our proposed SMF is hierarchical, simultaneously addressing the most critical gaps in the inter-related space of sorbent material synthesis, processing, properties, and performance. The ability to collect these critical data streams in an agile, coordinated, and automated fashion will enable efficient end-to-end sorbent materials design through a machine-learning-driven research framework.
Submitted 25 July, 2022;
originally announced July 2022.
-
Prediction of the electron density of states for crystalline compounds with Atomistic Line Graph Neural Networks (ALIGNN)
Authors:
Prathik R Kaundinya,
Kamal Choudhary,
Surya R. Kalidindi
Abstract:
Machine learning (ML) based models have greatly enhanced the traditional materials discovery and design pipeline. Specifically, in recent years, surrogate ML models for material property prediction have demonstrated success in predicting discrete scalar-valued target properties to within reasonable accuracy of their DFT-computed values. However, accurate prediction of spectral targets such as the electron Density of States (DOS) poses a much more challenging problem due to the complexity of the target, and the limited amount of available training data. In this study, we present an extension of the recently developed Atomistic Line Graph Neural Network (ALIGNN) to accurately predict DOS of a large set of material unit cell structures, trained to the publicly available JARVIS-DFT dataset. Furthermore, we evaluate two methods of representation of the target quantity - a direct discretized spectrum, and a compressed low-dimensional representation obtained using an autoencoder. Through this work, we demonstrate the utility of graph-based featurization and modeling methods in the prediction of complex targets that depend on both chemistry and directional characteristics of material structures.
Submitted 20 January, 2022;
originally announced January 2022.
-
Graph Neural Network Predictions of Metal Organic Framework CO2 Adsorption Properties
Authors:
Kamal Choudhary,
Taner Yildirim,
Daniel Siderius,
Aaron Gilad Kusne,
Austin McDannald,
Diana Ortiz-Montalvo
Abstract:
The increasing CO2 level is a critical concern and suitable materials are needed to capture such gases from the environment. While experimental and conventional computational methods are useful in finding such materials, they are usually slow and there is a need to expedite such processes. We use Atomistic Line Graph Neural Network (ALIGNN) method to predict CO2 adsorption in metal organic frameworks (MOF), which are known for their high functional tunability. We train ALIGNN models for hypothetical MOF (hMOF) database with 137953 MOFs with grand canonical Monte Carlo (GCMC) based CO2 adsorption isotherms. We develop high accuracy and fast models for pre-screening applications. We apply the trained model on CoREMOF database and computationally rank them for experimental synthesis. In addition to the CO2 adsorption isotherm, we also train models for electronic bandgaps, surface area, void fraction, lowest cavity diameter, and pore limiting diameter, and illustrate the strength and limitation of such graph neural network models. For a few candidate MOFs we carry out GCMC calculations to evaluate the deep-learning (DL) predictions.
Submitted 19 December, 2021;
originally announced December 2021.
-
Recent Advances and Applications of Deep Learning Methods in Materials Science
Authors:
Kamal Choudhary,
Brian DeCost,
Chi Chen,
Anubhav Jain,
Francesca Tavazza,
Ryan Cohn,
Cheol Woo Park,
Alok Choudhary,
Ankit Agrawal,
Simon J. L. Billinge,
Elizabeth Holm,
Shyue Ping Ong,
Chris Wolverton
Abstract:
Deep learning (DL) is one of the fastest growing topics in materials data science, with rapidly emerging applications spanning atomistic, image-based, spectral, and textual data modalities. DL allows analysis of unstructured data and automated identification of features. Recent development of large materials databases has fueled the application of DL methods in atomistic prediction in particular. In contrast, advances in image and spectral data have largely leveraged synthetic data enabled by high quality forward models as well as by generative unsupervised DL methods. In this article, we present a high-level overview of deep-learning methods followed by a detailed discussion of recent developments of deep learning in atomistic simulation, materials imaging, spectral analysis, and natural language processing. For each modality we discuss applications involving both theoretical and experimental data, typical modeling approaches with their strengths and limitations, and relevant publicly available software and datasets. We conclude the review with a discussion of recent cross-cutting work related to uncertainty quantification in this field and a brief perspective on limitations, challenges, and potential growth areas for DL methods in materials science. The application of DL methods in materials science presents an exciting avenue for future materials discovery and design.
Submitted 27 October, 2021;
originally announced October 2021.
-
The Joint Automated Repository for Various Integrated Simulations (JARVIS) for data-driven materials design
Authors:
Kamal Choudhary,
Kevin F. Garrity,
Andrew C. E. Reid,
Brian DeCost,
Adam J. Biacchi,
Angela R. Hight Walker,
Zachary Trautt,
Jason Hattrick-Simpers,
A. Gilad Kusne,
Andrea Centrone,
Albert Davydov,
Jie Jiang,
Ruth Pachter,
Gowoon Cheon,
Evan Reed,
Ankit Agrawal,
Xiaofeng Qian,
Vinit Sharma,
Houlong Zhuang,
Sergei V. Kalinin,
Bobby G. Sumpter,
Ghanshyam Pilania,
Pinar Acar,
Subhasish Mandal,
Kristjan Haule
, et al. (3 additional authors not shown)
Abstract:
The Joint Automated Repository for Various Integrated Simulations (JARVIS) is an integrated infrastructure to accelerate materials discovery and design using density functional theory (DFT), classical force-fields (FF), and machine learning (ML) techniques. JARVIS is motivated by the Materials Genome Initiative (MGI) principles of developing open-access databases and tools to reduce the cost and development time of materials discovery, optimization, and deployment. The major features of JARVIS are: JARVIS-DFT, JARVIS-FF, JARVIS-ML, and JARVIS-Tools. To date, JARVIS consists of 40,000 materials and 1 million calculated properties in JARVIS-DFT, 1,500 materials and 110 force-fields in JARVIS-FF, and 25 ML models for material-property predictions in JARVIS-ML, all of which are continuously expanding. JARVIS-Tools provides scripts and workflows for running and analyzing various simulations. We compare our computational data to experiments or high-fidelity computational methods wherever applicable to evaluate error/uncertainty in predictions. In addition to the existing workflows, the infrastructure can support a wide variety of other technologically important applications as part of the data-driven materials design paradigm. The JARVIS datasets and tools are publicly available at the website: https://jarvis.nist.gov.
Submitted 11 July, 2021; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Re-enterant efficiency of phototaxis in Chlamydomonas reinhardtii cells
Authors:
Sujeet Kumar Choudhary,
Aparna Baskaran,
Prerna Sharma
Abstract:
Phototaxis is one of the most fundamental stimulus-response behaviors in biology wherein motile micro-organisms sense light gradients to swim towards the light source. Apart from single cell survival and growth, it plays a major role at the global scale of aquatic ecosystems and bio-reactors. We study phototaxis of the single-celled alga Chlamydomonas reinhardtii as a function of cell number density and light stimulus using high spatio-temporal video microscopy. Surprisingly, the phototactic efficiency has a minimum at a well-defined number density, for a given light gradient, above which the phototaxis behaviour of a collection of cells can even exceed the performance obtainable from single isolated cells. We show that the origin of enhancement of performance above the critical concentration lies in the slowing down of the cells which enables them to sense light more effectively. We also show that this steady state phenomenology is well captured by modelling the phototactic response as a density dependent torque acting on an active Brownian particle.
Submitted 19 April, 2019;
originally announced April 2019.
-
Can a short intervention focused on gravitational waves and quantum physics improve students' understanding and attitude?
Authors:
Rahul K. Choudhary,
Alexander Foppoli,
Tejinder Kaur,
David G. Blair,
Marjan Zadnik,
Richard Meagher
Abstract:
The decline in student interest in science and technology is a major concern in the western world. One approach to reversing this decline is to introduce modern physics concepts much earlier in the school curriculum. We have used the context of the recent discoveries of gravitational waves to test benefits of one-day interventions, in which students are introduced to the ongoing nature of scientific discovery, as well as the fundamental concepts of quantum physics and gravitation, which underpin these discoveries. Our innovative approach combines role-playing, model demonstrations, single photon interference and gravitational wave detection, plus simple experiments designed to emphasize the quantum interpretation of interference. We compare understanding and attitudes through pre and post testing on four age groups (school years 7, 8, 9 and 10), and compare results with those of longer interventions with Year 9. Results indicate that neither prior knowledge nor age are significant factors in student understanding of the core concepts of Einsteinian physics. However we find that the short interventions are insufficient to enable students to comprehend more derived concepts.
Submitted 9 July, 2018;
originally announced July 2018.
-
Gender response to Einsteinian physics interventions in School
Authors:
Tejinder Kaur,
David Blair,
Rahul Kumar Choudhary,
Yohanes Sudarmo Dua,
Alexander Foppoli,
Marjan Zadnik
Abstract:
There is growing interest in the introduction of Einsteinian concepts of space, time, light and gravity across the entire school curriculum. We have developed intervention programs and measured their effectiveness in terms of student attitudes to physics and ability to understand the concepts with classes from Years 6 to 10. In all cases we observe significant levels of conceptual understanding and improvement in student attitudes, although the magnitude of the improvement depends on age group and program duration. This paper reports an unexpected outcome in regard to gender effects. We have compared male and female outcomes. In most cases, independent of age group, academic stream and culture (including one intervention in Indonesia), we find that females enter our programmes with substantially lower attitude scores than males, while on completion their attitudes are comparable to the boys. This provides a strong case for widespread implementation of Einsteinian conceptual learning across the school curriculum. We discuss possible reasons for this effect.
Submitted 24 October, 2019; v1 submitted 18 December, 2017;
originally announced December 2017.
-
An Exploration of OpenCL for a Numerical Relativity Application
Authors:
Niket K. Choudhary,
Rakesh Ginjupalli,
Sandeep Navada,
Gaurav Khanna
Abstract:
Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs) for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver using finite-differencing). OpenCL is the only vendor-agnostic and multi-platform parallel computing framework that has been adopted by all major processor vendors. Therefore, it allows us to write portable source-code and run it on a wide variety of compute hardware and perform meaningful comparisons. The outcome of our experimentation suggests that it is relatively straightforward to obtain order-of-magnitude gains in overall application performance by making use of many-core GPUs over multi-core CPUs and this fact is largely independent of the specific hardware architecture and vendor. We also observe that a single high-end GPU can match the performance of a small-sized, message-passing based CPU cluster.
Submitted 3 October, 2011; v1 submitted 19 October, 2010;
originally announced October 2010.
-
Research News -- Observation of oscillation phenomena in heavy meson systems
Authors:
B. Ananthanarayan,
Keshav Choudhary,
Lishibanya Mohapatra,
Indrajeet Patil,
Avinash Rustagi,
K. Shivaraj
Abstract:
We review the recent discoveries of rare oscillation phenomena in certain heavy neutral meson systems.
Submitted 28 July, 2007;
originally announced July 2007.
-
Research News -- Observation of Exotic Heavy Baryons
Authors:
B. Ananthanarayan,
Keshav Choudhary,
Lishibanya Mohapatra,
Indrajeet Patil,
Avinash Rustagi,
K. Shivaraj
Abstract:
We review the recent discoveries of exotic heavy baryons at the Fermi National Accelerator Laboratory.
Submitted 28 July, 2007;
originally announced July 2007.