-
Probabilistic Phase Labeling and Lattice Refinement for Autonomous Material Research
Authors:
Ming-Chiang Chang,
Sebastian Ament,
Maximilian Amsler,
Duncan R. Sutherland,
Lan Zhou,
John M. Gregoire,
Carla P. Gomes,
R. Bruce van Dover,
Michael O. Thompson
Abstract:
X-ray diffraction (XRD) is an essential technique to determine a material's crystal structure in high-throughput experimentation, and has recently been incorporated in artificially intelligent agents in autonomous scientific discovery processes. However, rapid, automated and reliable analysis method of XRD data matching the incoming data rate remains a major challenge. To address these issues, we…
▽ More
X-ray diffraction (XRD) is an essential technique to determine a material's crystal structure in high-throughput experimentation, and has recently been incorporated in artificially intelligent agents in autonomous scientific discovery processes. However, rapid, automated and reliable analysis method of XRD data matching the incoming data rate remains a major challenge. To address these issues, we present CrystalShift, an efficient algorithm for probabilistic XRD phase labeling that employs symmetry-constrained pseudo-refinement optimization, best-first tree search, and Bayesian model comparison to estimate probabilities for phase combinations without requiring phase space information or training. We demonstrate that CrystalShift provides robust probability estimates, outperforming existing methods on synthetic and experimental datasets, and can be readily integrated into high-throughput experimental workflows. In addition to efficient phase-map**, CrystalShift offers quantitative insights into materials' structural parameters, which facilitate both expert evaluation and AI-based modeling of the phase space, ultimately accelerating materials identification and discovery.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
-
M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery
Authors:
Yuanqi Du,
Yingheng Wang,
Yining Huang,
Jianan Canal Li,
Yanqiao Zhu,
Tian Xie,
Chenru Duan,
John M. Gregoire,
Carla P. Gomes
Abstract:
We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lag behind, which is partly due to the lack of an integrated platform that enables access to d…
▽ More
We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lag behind, which is partly due to the lack of an integrated platform that enables access to diverse tasks for materials discovery. To bridge this gap, M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation, including 9 datasets that covers 6 types of materials with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for the purpose of generative tasks on materials. In addition to random data splits, we also provide 3 additional data partitions to reflect the real-world materials discovery scenarios. State-of-the-art machine learning methods (including those are suitable for materials structures but never compared in the literature) are benchmarked on representative tasks. Our codes and library are publicly available at https://github.com/yuanqidu/M2Hub.
△ Less
Submitted 14 June, 2023;
originally announced July 2023.
-
Automating Crystal-Structure Phase Map**: Combining Deep Learning with Constraint Reasoning
Authors:
Di Chen,
Yiwei Bai,
Sebastian Ament,
Wenting Zhao,
Dan Guevarra,
Lan Zhou,
Bart Selman,
R. Bruce van Dover,
John M. Gregoire,
Carla P. Gomes
Abstract:
Crystal-structure phase map** is a core, long-standing challenge in materials science that requires identifying crystal structures, or mixtures thereof, in synthesized materials. Materials science experts excel at solving simple systems but cannot solve complex systems, creating a major bottleneck in high-throughput materials discovery. Herein we show how to automate crystal-structure phase mapp…
▽ More
Crystal-structure phase map** is a core, long-standing challenge in materials science that requires identifying crystal structures, or mixtures thereof, in synthesized materials. Materials science experts excel at solving simple systems but cannot solve complex systems, creating a major bottleneck in high-throughput materials discovery. Herein we show how to automate crystal-structure phase map**. We formulate phase map** as an unsupervised pattern demixing problem and describe how to solve it using Deep Reasoning Networks (DRNets). DRNets combine deep learning with constraint reasoning for incorporating scientific prior knowledge and consequently require only a modest amount of (unlabeled) data. DRNets compensate for the limited data by exploiting and magnifying the rich prior knowledge about the thermodynamic rules governing the mixtures of crystals with constraint reasoning seamlessly integrated into neural network optimization. DRNets are designed with an interpretable latent space for encoding prior-knowledge domain constraints and seamlessly integrate constraint reasoning into neural network optimization. DRNets surpass previous approaches on crystal-structure phase map**, unraveling the Bi-Cu-V oxide phase diagram, and aiding the discovery of solar-fuels materials.
△ Less
Submitted 21 August, 2021;
originally announced August 2021.
-
Materials Representation and Transfer Learning for Multi-Property Prediction
Authors:
Shufeng Kong,
Dan Guevarra,
Carla P. Gomes,
John M. Gregoire
Abstract:
The adoption of machine learning in materials science has rapidly transformed materials property prediction. Hurdles limiting full capitalization of recent advancements in machine learning include the limited development of methods to learn the underlying interactions of multiple elements, as well as the relationships among multiple properties, to facilitate property prediction in new composition…
▽ More
The adoption of machine learning in materials science has rapidly transformed materials property prediction. Hurdles limiting full capitalization of recent advancements in machine learning include the limited development of methods to learn the underlying interactions of multiple elements, as well as the relationships among multiple properties, to facilitate property prediction in new composition spaces. To address these issues, we introduce the Hierarchical Correlation Learning for Multi-property Prediction (H-CLMP) framework that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning. The model is demonstrated for prediction of spectral optical absorption of complex metal oxides spanning 69 3-cation metal oxide composition spaces. H-CLMP accurately predicts non-linear composition-property relationships in composition spaces for which no training data is available, which broadens the purview of machine learning to the discovery of materials with exceptional properties. This achievement results from the principled integration of latent embedding learning, property correlation learning, generative transfer learning, and attention models. The best performance is obtained using H-CLMP with Transfer learning (H-CLMP(T)) wherein a generative adversarial network is trained on computational density of states data and deployed in the target domain to augment prediction of optical absorption from composition. H-CLMP(T) aggregates multiple knowledge sources with a framework that is well-suited for multi-target regression across the physical sciences.
△ Less
Submitted 17 June, 2021; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Autonomous synthesis of metastable materials
Authors:
Sebastian Ament,
Maximilian Amsler,
Duncan R. Sutherland,
Ming-Chiang Chang,
Dan Guevarra,
Aine B. Connolly,
John M. Gregoire,
Michael O. Thompson,
Carla P. Gomes,
R. Bruce van Dover
Abstract:
Autonomous experimentation enabled by artificial intelligence (AI) offers a new paradigm for accelerating scientific discovery. Non-equilibrium materials synthesis is emblematic of complex, resource-intensive experimentation whose acceleration would be a watershed for materials discovery and development. The map** of non-equilibrium synthesis phase diagrams has recently been accelerated via high…
▽ More
Autonomous experimentation enabled by artificial intelligence (AI) offers a new paradigm for accelerating scientific discovery. Non-equilibrium materials synthesis is emblematic of complex, resource-intensive experimentation whose acceleration would be a watershed for materials discovery and development. The map** of non-equilibrium synthesis phase diagrams has recently been accelerated via high throughput experimentation but still limits materials research because the parameter space is too vast to be exhaustively explored. We demonstrate accelerated synthesis and exploration of metastable materials through hierarchical autonomous experimentation governed by the Scientific Autonomous Reasoning Agent (SARA). SARA integrates robotic materials synthesis and characterization along with a hierarchy of AI methods that efficiently reveal the structure of processing phase diagrams. SARA designs lateral gradient laser spike annealing (lg-LSA) experiments for parallel materials synthesis and employs optical spectroscopy to rapidly identify phase transitions. Efficient exploration of the multi-dimensional parameter space is achieved with nested active learning (AL) cycles built upon advanced machine learning models that incorporate the underlying physics of the experiments as well as end-to-end uncertainty quantification. With this, and the coordination of AL at multiple scales, SARA embodies AI harnessing of complex scientific tasks. We demonstrate its performance by autonomously map** synthesis phase boundaries for the Bi$_2$O$_3$ system, leading to orders-of-magnitude acceleration in establishment of a synthesis phase diagram that includes conditions for kinetically stabilizing $δ$-Bi$_2$O$_3$ at room temperature, a critical development for electrochemical technologies such as solid oxide fuel cells.
△ Less
Submitted 19 December, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Deep Reasoning Networks: Thinking Fast and Slow
Authors:
Di Chen,
Yiwei Bai,
Wenting Zhao,
Sebastian Ament,
John M. Gregoire,
Carla P. Gomes
Abstract:
We introduce Deep Reasoning Networks (DRNets), an end-to-end framework that combines deep learning with reasoning for solving complex tasks, typically in an unsupervised or weakly-supervised setting. DRNets exploit problem structure and prior knowledge by tightly combining logic and constraint reasoning with stochastic-gradient-based neural network optimization. We illustrate the power of DRNets o…
▽ More
We introduce Deep Reasoning Networks (DRNets), an end-to-end framework that combines deep learning with reasoning for solving complex tasks, typically in an unsupervised or weakly-supervised setting. DRNets exploit problem structure and prior knowledge by tightly combining logic and constraint reasoning with stochastic-gradient-based neural network optimization. We illustrate the power of DRNets on de-mixing overlap** hand-written Sudokus (Multi-MNIST-Sudoku) and on a substantially more complex task in scientific discovery that concerns inferring crystal structures of materials from X-ray diffraction data under thermodynamic rules (Crystal-Structure-Phase-Map**). At a high level, DRNets encode a structured latent space of the input data, which is constrained to adhere to prior knowledge by a reasoning module. The structured latent encoding is used by a generative decoder to generate the targeted output. Finally, an overall objective combines responses from the generative decoder (thinking fast) and the reasoning module (thinking slow), which is optimized using constraint-aware stochastic gradient descent. We show how to encode different tasks as DRNets and demonstrate DRNets' effectiveness with detailed experiments: DRNets significantly outperform the state of the art and experts' capabilities on Crystal-Structure-Phase-Map**, recovering more precise and physically meaningful crystal structures. On Multi-MNIST-Sudoku, DRNets perfectly recovered the mixed Sudokus' digits, with 100% digit accuracy, outperforming the supervised state-of-the-art MNIST de-mixing models. Finally, as a proof of concept, we also show how DRNets can solve standard combinatorial problems -- 9-by-9 Sudoku puzzles and Boolean satisfiability problems (SAT), outperforming other specialized deep learning models. DRNets are general and can be adapted and expanded to tackle other tasks.
△ Less
Submitted 4 June, 2019; v1 submitted 3 June, 2019;
originally announced June 2019.
-
Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery
Authors:
Stefano Ermon,
Ronan Le Bras,
Santosh K. Suram,
John M. Gregoire,
Carla Gomes,
Bart Selman,
Robert B. van Dover
Abstract:
Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, aimed at discovering new materials for renewable energy, e.g. for fuel and solar cells, we introduce CombiFD, a framework for factor based pattern decomposition that allows the incorporation of a-priori…
▽ More
Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, aimed at discovering new materials for renewable energy, e.g. for fuel and solar cells, we introduce CombiFD, a framework for factor based pattern decomposition that allows the incorporation of a-priori knowledge as constraints, including complex combinatorial constraints. In addition, we propose a new pattern decomposition algorithm, called AMIQO, based on solving a sequence of (mixed-integer) quadratic programs. Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge on other application domains.
△ Less
Submitted 26 November, 2014;
originally announced November 2014.