-
Impact of Data Bias on Machine Learning for Crystal Compound Synthesizability Predictions
Authors:
Ali Davariashtiyani,
Busheng Wang,
Samad Hajinazar,
Eva Zurek,
Sara Kadkhodaei
Abstract:
Machine learning models are susceptible to being misled by biases in training data that emphasize incidental correlations over the intended learning task. In this study, we demonstrate the impact of data bias on the performance of a machine learning model designed to predict the synthesizability likelihood of crystal compounds. The model performs a binary classification on labeled crystal samples. Despite using the same architecture for the machine learning model, we showcase how the model's learning and prediction behavior differs once trained on distinct data. We use two data sets for illustration: a mixed-source data set that integrates experimental and computational crystal samples and a single-source data set consisting of data exclusively from one computational database. We present simple procedures to detect data bias and to evaluate its effect on the model's performance and generalization. This study reveals how inconsistent, unbalanced data can propagate bias, undermining real-world applicability even for advanced machine learning techniques.
Submitted 25 June, 2024;
originally announced June 2024.
-
Enhancing ab initio diffusion calculations in materials through Gaussian process regression
Authors:
Seyyedfaridoddin Fattahpour,
Sara Kadkhodaei
Abstract:
Saddle point search schemes are widely used to identify the transition state of different processes, such as chemical reactions, surface and bulk diffusion, surface adsorption, and many more. In solid-state materials with relatively large numbers of atoms, minimum-mode-following schemes such as the dimer method are commonly used because they avoid calculating the Hessian on the high-dimensional potential energy surface. Here, we show that the dimer search can be further accelerated by leveraging Gaussian process regression (GPR). The GPR serves as a surrogate model to feed the dimer with the required energy and force input. We test the GPR-accelerated dimer method for predicting the diffusion coefficient of vacancy-mediated self-diffusion in bcc molybdenum and sulfur diffusion in hexagonal molybdenum disulfide. We use a multi-task learning approach that utilizes a shared covariance function between energy and force input, and we show that the multi-task learning significantly improves the performance of the GPR surrogate model compared to previously used learning approaches. Additionally, we demonstrate that a translation-hop sampling approach is necessary to avoid over-fitting the GPR surrogate model to the minimum-mode-following pathway and thus to succeed in locating the saddle point. We show that our method reduces the number of evaluations to a fraction of what a conventional dimer requires.
Submitted 3 July, 2023;
originally announced July 2023.
-
Computational Design of Corrosion-resistant and Wear-resistant Titanium Alloys for Orthopedic Implants
Authors:
Noel Siony,
Long Vuong,
Otgonsuren Lundaajamts,
Sara Kadkhodaei
Abstract:
Titanium alloys are promising candidates for orthopedic implants due to their mechanical resilience and biocompatibility. Current titanium alloys in orthopedic implants still suffer from low wear and corrosion resistance. Here, we present a computational method for optimizing the composition of titanium alloys for enhanced corrosion and wear resistance without compromising on other aspects such as phase stability, biocompatibility, and strength. We use the cohesive energy, oxide formation energy, surface work function, and the elastic shear modulus of pure elements as proxy descriptors to guide us towards alloys with enhanced wear and corrosion resistance. For the best-selected candidates, we then use the CALPHAD approach, as implemented in the Thermo-Calc software, to calculate the phase diagram, yield strength, hardness, Pourbaix diagram, and the Pilling-Bedworth (PB) ratio. These calculations are used to assess the thermodynamic stability, biocompatibility, corrosion resistance, and wear resistance of the selected alloys. Additionally, we provide insights into the role of silicon in improving the corrosion and wear resistance of alloys.
Submitted 22 September, 2022;
originally announced October 2022.
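The Pilling-Bedworth ratio mentioned in the abstract above compares the volume of oxide formed to the volume of metal consumed. The following is a minimal sketch of the textbook definition only, not the Thermo-Calc workflow the authors used; the function names, the Ti/TiO2 input values (approximate handbook figures), and the rule-of-thumb thresholds are assumptions added for illustration.

```python
def pilling_bedworth_ratio(oxide_molar_mass, oxide_density,
                           metal_molar_mass, metal_density,
                           metal_atoms_per_formula):
    """Textbook PB ratio: R = (M_oxide * rho_metal) / (n * M_metal * rho_oxide).

    Molar masses in g/mol, densities in g/cm^3, n = metal atoms per oxide formula unit.
    """
    return (oxide_molar_mass * metal_density) / (
        metal_atoms_per_formula * metal_molar_mass * oxide_density
    )


def describe_scale(ratio):
    """Common rule of thumb (an assumption here, not a result from the paper)."""
    if ratio < 1.0:
        return "oxide too thin to cover the metal"
    if ratio <= 2.0:
        return "oxide likely protective"
    return "oxide may crack or spall"


# Approximate handbook values for Ti -> TiO2 (rutile); illustrative only.
pbr_ti = pilling_bedworth_ratio(
    oxide_molar_mass=79.87, oxide_density=4.23,
    metal_molar_mass=47.87, metal_density=4.51,
    metal_atoms_per_formula=1,
)
print(round(pbr_ti, 2), describe_scale(pbr_ti))
```

A ratio between 1 and 2 is usually read as a sign that the oxide can form a continuous, adherent layer, which is why the quantity is a convenient screening metric for corrosion resistance.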
-
Understanding the role of anharmonic phonons in diffusion of bcc metal
Authors:
Seyyedfaridodin Fattahpour,
Ali Davariashtiyani,
Sara Kadkhodaei
Abstract:
Diffusion in the high-temperature bcc phase of IIIB-IVB metals such as Zr, Ti, and their alloys is observed to be orders of magnitude higher than in bcc metals of group VB-VIB, including Cr, Mo, and W. The underlying reason for this higher diffusion is still poorly understood. To explain this observation, we compare the first-principles-calculated parameters of monovacancy-mediated diffusion between bcc Ti, Zr, and dilute Zr-Sn alloys and bcc Cr, Mo, and W. Our results indicate that strongly anharmonic vibrations promote both the vacancy concentration and the diffusive jump rate in bcc IVB metals and can explain their markedly faster diffusion compared to bcc VIB metals. Additionally, we provide an efficient approach to calculate diffusive jump rates according to the transition state theory (TST). The use of standard harmonic TST is impractical in bcc IIIB/IVB metals due to the existence of ill-defined harmonic phonons, and most studies use classical or ab initio molecular dynamics for direct simulation of diffusive jumps. Here, instead, we use a stochastically-sampled temperature-dependent phonon analysis within the transition state theory to study diffusive jumps without the need for direct molecular dynamics simulations. We validate our first-principles diffusion coefficient predictions with available experimental measurements and explain the underlying reasons for the promotion of diffusion in bcc IVB metals/alloys compared to bcc VIB metals.
Submitted 20 December, 2021;
originally announced December 2021.
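The abstract above attributes fast diffusion to two factors: the vacancy concentration and the diffusive jump rate. As a minimal sketch of how these combine in the standard monovacancy model for a bcc lattice, D = f a² c_v Γ, the code below uses simple Arrhenius forms for both factors. It is not the authors' temperature-dependent phonon method; the function name and all numerical inputs are hypothetical placeholders.

```python
import math

K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K
BCC_CORRELATION_FACTOR = 0.7272  # tracer correlation factor for vacancy jumps in bcc


def vacancy_diffusivity_bcc(lattice_parameter_m, attempt_frequency_hz,
                            formation_free_energy_ev, migration_free_energy_ev,
                            temperature_k):
    """D = f * a^2 * c_v * Gamma for monovacancy-mediated self-diffusion in bcc.

    c_v   = exp(-G_f / kT)      equilibrium vacancy concentration
    Gamma = nu * exp(-G_m / kT) jump rate of an atom into a neighboring vacancy
    """
    kt = K_B_EV_PER_K * temperature_k
    vacancy_concentration = math.exp(-formation_free_energy_ev / kt)
    jump_rate = attempt_frequency_hz * math.exp(-migration_free_energy_ev / kt)
    return (BCC_CORRELATION_FACTOR * lattice_parameter_m ** 2
            * vacancy_concentration * jump_rate)


# Hypothetical inputs chosen only to exercise the function (not data from the paper).
d_low = vacancy_diffusivity_bcc(3.3e-10, 1.0e13, 2.0, 0.5, 1200.0)
d_high = vacancy_diffusivity_bcc(3.3e-10, 1.0e13, 2.0, 0.5, 1500.0)
print(f"D(1200 K) = {d_low:.3e} m^2/s, D(1500 K) = {d_high:.3e} m^2/s")
```

Because both exponentials depend on temperature, lowering either free energy (as anharmonic vibrations are reported to do in bcc IVB metals) raises the diffusivity by orders of magnitude.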
-
Phonon-assisted diffusion in bcc phase of titanium and zirconium from first-principles
Authors:
Sara Kadkhodaei,
Ali Davariashtiyani
Abstract:
Diffusion is the underlying mechanism for many complicated materials phenomena, and understanding it is basic to the discovery of novel materials with desired physical and mechanical properties. Certain groups of solid phases, such as the bcc phase of IIIB and IVB metals and their alloys, which are only stable when they reach high enough temperatures and experience anharmonic vibration entropic effects, exhibit "anomalously fast diffusion". However, the underlying reason for the observed extraordinarily fast diffusion is poorly understood, and due to the existence of harmonic vibration instabilities in these phases the standard models fail to predict their diffusivity. Here, we show that the anharmonic phonon-phonon coupling effects can accurately describe the anomalously large macroscopic diffusion coefficients in the bcc phase of IVB metals, and therefore yield a new understanding of the underlying mechanism for diffusion in these phases. We utilize temperature-dependent phonon analysis by combining ab initio molecular dynamics with lattice dynamics calculations to provide a new approach to use the transition state theory beyond the harmonic approximation. We validate the diffusivity predictions for the bcc phase of titanium and zirconium with available experimental measurements, while we show that predictions based on harmonic transition state theory severely underestimate diffusivity in these phases.
Submitted 24 April, 2020; v1 submitted 13 October, 2019;
originally announced October 2019.
-
A simple local expression for the prefactor in transition state theory
Authors:
Sara Kadkhodaei,
Axel van de Walle
Abstract:
We present a simple and accurate computational technique to determine the frequency prefactor in harmonic transition state theory without necessitating full phonon density of states (DOS) calculations. The atoms in the system are partitioned into an "active region", where the kinetic process takes place, and an "environment" surrounding the active region. It is shown that the prefactor can be obtained by a partial phonon DOS calculation of the active region with a simple correction term accounting for the environment, under reasonable assumptions regarding atomic interactions. Convergence with respect to the size of the active region is investigated for different systems, as well as the reduction in computational costs when compared to full phonon DOS calculation. Additionally, we provide an open source implementation of the algorithm that can be added as an extension to LAMMPS software.
Submitted 8 April, 2019; v1 submitted 22 December, 2018;
originally announced December 2018.
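For context, the frequency prefactor in harmonic transition state theory is conventionally given by the Vineyard formula: the product of the normal-mode frequencies at the minimum divided by the product of the real normal-mode frequencies at the saddle point. The sketch below evaluates that full-spectrum formula only; it does not implement the active-region partitioning or the environment correction proposed in the abstract above, and the function name and toy frequencies are assumptions for illustration.

```python
import math


def vineyard_prefactor(minimum_freqs, saddle_freqs):
    """Vineyard prefactor: prod(nu_i at minimum) / prod(nu_j at saddle).

    The saddle point has one fewer real mode than the minimum because the
    unstable (imaginary) mode is excluded. Frequencies must be positive and
    share the same unit; the result carries that unit.
    """
    if len(minimum_freqs) != len(saddle_freqs) + 1:
        raise ValueError("saddle point must have exactly one fewer real mode")
    if any(f <= 0 for f in minimum_freqs) or any(f <= 0 for f in saddle_freqs):
        raise ValueError("all frequencies must be positive")
    # Sum logarithms instead of multiplying to avoid overflow for large systems.
    log_ratio = (sum(math.log(f) for f in minimum_freqs)
                 - sum(math.log(f) for f in saddle_freqs))
    return math.exp(log_ratio)


# Toy frequencies in THz (hypothetical): two modes cancel, leaving 2.0 THz.
prefactor_thz = vineyard_prefactor([2.0, 3.0, 4.0], [3.0, 4.0])
print(prefactor_thz)
```

Working in log space matters in practice because a supercell with hundreds of atoms has on the order of a thousand modes, and the raw products overflow double precision.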
-
Free energy calculation of mechanically unstable but dynamically stabilized bcc titanium
Authors:
Sara Kadkhodaei,
Qi-Jun Hong,
Axel van de Walle
Abstract:
The phase diagram of numerous materials of technological importance features high-symmetry high-temperature phases that exhibit phonon instabilities. Leading examples include shape-memory alloys, as well as ferroelectric, refractory, and structural materials. The thermodynamics of these phases have proven challenging to handle by atomistic computational thermodynamic techniques, due to the occurrence of constant anharmonicity-driven hopping between local low-symmetry distortions, while maintaining a high-symmetry time-averaged structure. To compute the free energy in such phases, we propose to explore the system's potential-energy surface by discrete sampling of local minima by means of a lattice gas Monte Carlo approach and by continuous sampling by means of a lattice dynamics approach in the vicinity of each local minimum. Given the proximity of the local minima, it is necessary to carefully partition phase space by using a Voronoi tessellation to constrain the domain of integration of the partition function, in order to avoid double-counting artifacts and enable an accurate harmonic treatment near each local minimum. We consider the bcc phase of titanium as a prototypical example to illustrate our approach.
Submitted 3 February, 2017;
originally announced February 2017.
-
An epicycle method for elasticity limit calculations
Authors:
Axel van de Walle,
Sara Kadkhodaei,
Ruoshi Sun,
Qi-Jun Hong
Abstract:
The task of finding the smallest energy needed to bring a solid to its onset of mechanical instability arises in many problems in materials science, from the determination of the elasticity limit to the consistent assignment of free energies to mechanically unstable phases. However, unless the space of possible deformations is low-dimensional and a priori known, this problem is numerically difficult, as it involves minimizing a function under a constraint on its Hessian, which is computationally prohibitive to obtain in low-symmetry systems, especially if electronic structure calculations are used. We propose a method that is inspired by the well-known dimer method for saddle point searches but that adds the necessary ingredients to solve for the lowest onset of mechanical instability. The method consists of two nested optimization problems. The inner one involves a dimer-like construction to find the direction of smallest curvature as well as the gradient of this curvature function. The outer optimization then minimizes energy using the result of the inner optimization problem to constrain the search to the hypersurface enclosing all points of zero minimum curvature. Example applications to both model systems and electronic structure calculations are given.
Submitted 12 January, 2017;
originally announced January 2017.
-
BioInfoBase : A Bioinformatics Resourceome
Authors:
Saeid Kadkhodaei,
Fatemeh Barantalab,
Sima Taheri,
Majid Foroughi,
Farahnaz Golestan Hashemi,
Mahmood Reza Shabanimofrad,
Hossein Hosseinimonfared,
Morvarid Akhavan Rezaei,
Ali Ranjbarfard,
Mahbod Sahebi,
Parisa Azizi,
Maryam Dadar,
Rambod Abiri,
Mohammad Fazel Harighi,
Nahid Kalhori,
Mohammad Reza Etemadi,
Ali Baradaran,
Mahmoud Danaee,
Iman Zare,
Ahmad Ghafarpour,
Zahra Azhdari,
Hamid Rajabi Memari,
Vajiheh Safavi,
Naser Tajabadi,
Faruku Bande
Abstract:
Over the past decade there has been significant growth in bioinformatics databases, tools and resources. Although bioinformatics is becoming more specific, the increasing number of bioinformatics-wares has made it difficult for researchers to find the most appropriate databases, tools or methods that match their needs. Our coordinated effort has been planned to establish a reference website in Bioinformatics as a public repository of tools, databases, directories and resources annotated with contextual information and organized by functional relevance. Within the first phase of BioInfoBase development, 22 experts in different fields of molecular biology contributed and more than 2500 records were registered, a number that is increasing daily. For each record submitted to the website's database, almost all related data (40 features) has been extracted. These range from the biological category and subcategory to the scientific article and developer information. Searching the query keyword(s) returns links containing the entered keyword(s) found within the different features of the records, with more weight on the title, abstract and application fields. The search results provide the users with the most informative features of the records so that they can select the most suitable ones. The usefulness of the returned results is ranked according to the matching score based on the Term Frequency-Inverse Document Frequency (TF-IDF) method. Therefore, this search engine will screen a comprehensive index of bioinformatics tools, databases and resources and provide the records (links) best suited to the researchers' needs. The BioInfoBase resource is available at www.bioinfobase.info.
Submitted 20 November, 2016; v1 submitted 11 July, 2016;
originally announced July 2016.
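The abstract above describes ranking records by a TF-IDF matching score with extra weight on the title, abstract and application fields. The following is a minimal sketch of that idea, not the BioInfoBase implementation: the field weights, the smoothed IDF form, the whitespace tokenizer, and the example records are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract says only that these fields count more.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "application": 2.0}
DEFAULT_WEIGHT = 1.0


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def rank_records(records, query):
    """Return indices of matching records, best match first."""
    per_record = [{field: Counter(tokenize(text)) for field, text in rec.items()}
                  for rec in records]
    # Document frequency: number of records containing the term in any field.
    doc_freq = Counter()
    for fields in per_record:
        terms = set()
        for counts in fields.values():
            terms.update(counts)
        doc_freq.update(terms)

    scored = []
    for index, fields in enumerate(per_record):
        score = 0.0
        for term in tokenize(query):
            if doc_freq[term] == 0:
                continue  # term appears nowhere in the collection
            idf = math.log(len(records) / doc_freq[term]) + 1.0
            for field, counts in fields.items():
                total = sum(counts.values())
                if total:
                    weight = FIELD_WEIGHTS.get(field, DEFAULT_WEIGHT)
                    score += weight * (counts[term] / total) * idf
        if score > 0:
            scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


# Invented example records, not entries from the real database.
example_records = [
    {"title": "BLAST sequence alignment", "abstract": "local alignment search tool",
     "category": "sequence analysis"},
    {"title": "Protein structure viewer", "abstract": "visualize structure files",
     "category": "alignment utilities"},
    {"title": "Gene expression atlas", "abstract": "expression data browser",
     "category": "transcriptomics"},
]
print(rank_records(example_records, "alignment"))
```

Here the first record ranks above the second because its matches fall in the heavily weighted title and abstract fields, while the second matches only in a default-weight field; the third record has no match and is omitted.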