Condensed Matter > Materials Science
[Submitted on 19 Jun 2024]
Title:Optimal pre-train/fine-tune strategies for accurate material property predictions
View PDF HTML (experimental)Abstract:Overcoming the challenge of limited data availability within materials science is crucial for the broad-based applicability of machine learning within materials science. One pathway to overcome this limited data availability is to use the framework of transfer learning (TL), where a pre-trained (PT) machine learning model (on a larger dataset) can be fine-tuned (FT) on a target (typically smaller) dataset. Our study systematically explores the effectiveness of various PT/FT strategies to learn and predict material properties with limited data. Specifically, we leverage graph neural networks (GNNs) to PT/FT on seven diverse curated materials datasets, encompassing sizes ranging from 941 to 132,752 datapoints. We consider datasets that cover a spectrum of material properties, ranging from band gaps (electronic) to formation energies (thermodynamic) and shear moduli (mechanical). We study the influence of PT and FT dataset sizes, strategies that can be employed for FT, and other hyperparameters on pair-wise TL among the datasets considered. We find our pair-wise PT-FT models to consistently outperform models trained from scratch on the target datasets. Importantly, we develop a GNN framework that is simultaneously PT on multiple properties (MPT), enabling the construction of generalized GNN models. Our MPT models outperform pair-wise PT-FT models on several datasets considered, and more significantly, on a 2D material band gap dataset that is completely out-of-distribution from the PT datasets. Finally, we expect our PT/FT and MPT frameworks to be generalizable to other GNNs and materials properties, which can accelerate materials design and discovery for various applications.
Submission history
From: Gopalakrishnan Sai Gautam [view email][v1] Wed, 19 Jun 2024 01:25:26 UTC (452 KB)
Current browse context:
cond-mat.mtrl-sci
Change to browse by:
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
IArxiv Recommender
(What is IArxiv?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.