Emulating Expert Insight: A Robust Strategy for Optimal Experimental Design

Authors: Matthew R. Carbone, Hyeong ** Kim, Chandima Fernando, Shinjae Yoo, Daniel Olds, Howie Joress, Brian DeCost, Bruce Ravel, Yugang Zhang, Phillip M. Maffettone

Abstract: The challenge of optimal design of experiments (DOE) pervades materials science, physics, chemistry, and biology. Bayesian optimization has been used to address this challenge in vast sample spaces, although it requires framing experimental campaigns through the lens of maximizing some observable. This framing is insufficient for epistemic research goals that seek to comprehensively analyze a sample space, without an explicit scalar objective (e.g., the characterization of a wafer or sample library). In this work, we propose a flexible formulation of scientific value that recasts a dataset of input conditions and higher-dimensional observable data into a continuous, scalar metric. Intuitively, the scientific value function measures where observables change significantly, emulating the perspective of experts driving an experiment, and can be used in collaborative analysis tools or as an objective for optimization techniques. We demonstrate this technique by exploring simulated phase boundaries from different observables, autonomously driving a variable temperature measurement of a ferroelectric material, and providing feedback from a nanoparticle synthesis campaign. The method is seamlessly compatible with existing optimization tools, can be extended to multi-modal and multi-fidelity experiments, and can integrate existing models of an experimental system. Because of its flexibility, it can be deployed in a range of experimental settings for autonomous or accelerated experiments.

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2306.16349

Accurate, uncertainty-aware classification of molecular chemical motifs from multi-modal X-ray absorption spectroscopy

Authors: Matthew R. Carbone, Phillip M. Maffettone, Xiaohui Qu, Shinjae Yoo, Deyu Lu

Abstract: Accurate classification of molecular chemical motifs from experimental measurement is an important problem in molecular physics, chemistry and biology. In this work, we present neural network ensemble classifiers for predicting the presence (or lack thereof) of 41 different chemical motifs on small molecules from simulated C, N and O K-edge X-ray absorption near-edge structure (XANES) spectra. Our classifiers not only reach a maximum average class-balanced accuracy of 0.99 but also accurately quantify uncertainty. We also show that including multiple XANES modalities improves predictions notably on average, demonstrating a "multi-modal advantage" over any single modality. In addition to structure refinement, our approach can be generalized for broad applications with molecular design pipelines.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2305.18089

Inverse Protein Folding Using Deep Bayesian Optimization

Authors: Natalie Maus, Yimeng Zeng, Daniel Allen Anderson, Phillip Maffettone, Aaron Solomon, Peyton Greenside, Osbert Bastani, Jacob R. Gardner

Abstract: Inverse protein folding -- the task of predicting a protein sequence from its backbone atom coordinates -- has surfaced as an important problem in the "top down", de novo design of proteins. Contemporary approaches have cast this problem as a conditional generative modelling problem, where a large generative model over protein sequences is conditioned on the backbone. While these generative models very rapidly produce promising sequences, independent draws from generative models may fail to produce sequences that reliably fold to the correct backbone. Furthermore, it is challenging to adapt pure generative approaches to other settings, e.g., when constraints exist. In this paper, we cast the problem of improving generated inverse folds as an optimization problem that we solve using recent advances in "deep" or "latent space" Bayesian optimization. Our approach consistently produces protein sequences with greatly reduced structural error to the target backbone structure as measured by TM score and RMSD while using fewer computational resources. Additionally, we demonstrate other advantages of an optimization-based approach to the problem, such as the ability to handle constraints.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2304.11120

What is missing in autonomous discovery: Open challenges for the community

Authors: Phillip M. Maffettone, Pascal Friederich, Sterling G. Baird, Ben Blaiszik, Keith A. Brown, Stuart I. Campbell, Orion A. Cohen, Tantum Collins, Rebecca L. Davis, Ian T. Foster, Navid Haghmoradi, Mark Hereld, Nicole Jung, Ha-Kyung Kwon, Gabriella Pizzuto, Jacob Rintamaki, Casper Steinmann, Luca Torresi, Shi**g Sun

Abstract: Self-driving labs (SDLs) leverage combinations of artificial intelligence, automation, and advanced computing to accelerate scientific discovery. The promise of this field has given rise to a rich community of passionate scientists, engineers, and social scientists, as evidenced by the development of the Acceleration Consortium and recent Accelerate Conference. Despite its strengths, this rapidly developing field presents numerous opportunities for growth, challenges to overcome, and potential risks of which to remain aware. This community perspective builds on a discourse instantiated during the first Accelerate Conference, and looks to the future of self-driving labs with a tempered optimism. Incorporating input from academia, government, and industry, we briefly describe the current status of self-driving labs, then turn our attention to barriers, opportunities, and a vision for what is possible. Our field is delivering solutions in technology and infrastructure, artificial intelligence and knowledge generation, and education and workforce development. In the spirit of community, we intend for this work to foster discussion and drive best practices as our field grows.

Submitted 2 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2301.09177

Self-driving Multimodal Studies at User Facilities

Authors: Phillip M. Maffettone, Daniel B. Allan, Stuart I. Campbell, Matthew R. Carbone, Thomas A. Caswell, Brian L. DeCost, Dmitri Gavrilov, Marcus D. Hanwell, Howie Joress, Joshua Lynch, Bruce Ravel, Stuart B. Wilkins, Jakub Wlodek, Daniel Olds

Abstract: Multimodal characterization is commonly required for understanding materials. User facilities possess the infrastructure to perform these measurements, albeit in serial over days to months. In this paper, we describe a unified multimodal measurement of a single sample library at distant instruments, driven by a concert of distributed agents that use analysis from each modality to inform the direction of the other in real time. Powered by the Bluesky project at the National Synchrotron Light Source II, this experiment is a world's first for beamline science, and provides a blueprint for future approaches to multimodal and multifidelity experiments at user facilities.

Submitted 22 January, 2023; originally announced January 2023.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). AI4Mat Workshop

arXiv:2203.12742

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

Authors: Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson

Abstract: Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing \emph{in silico} and \emph{in vitro} properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.

Submitted 12 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: ICML 2022. Code available at https://github.com/samuelstanton/lambo

arXiv:2201.03550

Machine learning enabling high-throughput and remote operations at large-scale user facilities

Authors: Tatiana Konstantinova, Phillip M. Maffettone, Bruce Ravel, Stuart I. Campbell, Andi M. Barbour, Daniel Olds

Abstract: Imaging, scattering, and spectroscopy are fundamental in understanding and discovering new functional materials. Contemporary innovations in automation and experimental techniques have led to these measurements being performed much faster and with higher resolution, thus producing vast amounts of data for analysis. These innovations are particularly pronounced at user facilities and synchrotron light sources. Machine learning (ML) methods are regularly developed to process and interpret large datasets in real-time with measurements. However, there remain conceptual barriers to entry for the facility general user community, who often lack expertise in ML, and technical barriers for deploying ML models. Herein, we demonstrate a variety of archetypal ML models for on-the-fly analysis at multiple beamlines at the National Synchrotron Light Source II (NSLS-II). We describe these examples instructively, with a focus on integrating the models into existing experimental workflows, such that the reader can easily include their own ML techniques into experiments at NSLS-II or facilities with a common infrastructure. The framework presented here shows how with little effort, diverse ML models operate in conjunction with feedback loops via integration into the existing Bluesky Suite for experimental orchestration and data management.

Submitted 9 January, 2022; originally announced January 2022.

Comments: 12 pages, 5 figures

arXiv:2104.04392

Deep learning for visualization and novelty detection in large X-ray diffraction datasets

Authors: Lars Banko, Phillip M. Maffettone, Dennis Naujoks, Daniel Olds, Alfred Ludwig

Abstract: We apply variational autoencoders (VAE) to X-ray diffraction (XRD) data analysis on both simulated and experimental thin-film data. We show that crystal structure representations learned by a VAE reveal latent information, such as the structural similarity of textured diffraction patterns. While other artificial intelligence (AI) agents are effective at classifying XRD data into known phases, a similarly conditioned VAE is uniquely effective at knowing what it does not know, rapidly identifying novel phases and mixtures. These capabilities demonstrate that a VAE is a valuable AI agent for materials discovery and understanding XRD measurements both on-the-fly and during post hoc analysis.

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2104.00864

doi 10.1063/5.0052859

Constrained non-negative matrix factorization enabling real-time insights of $\textit{in situ}$ and high-throughput experiments

Authors: Phillip M. Maffettone, Aidan C. Daly, Daniel Olds

Abstract: Non-negative Matrix Factorization (NMF) methods offer an appealing unsupervised learning method for real-time analysis of streaming spectral data in time-sensitive data collection, such as $\textit{in situ}$ characterization of materials. However, canonical NMF methods are optimized to reconstruct a full dataset as closely as possible, with no underlying requirement that the reconstruction produces components or weights representative of the true physical processes. In this work, we demonstrate how constraining NMF weights or components, provided as known or assumed priors, can provide significant improvement in revealing true underlying phenomena. We present a PyTorch based method for efficiently applying constrained NMF and demonstrate this on several synthetic examples. When applied to streaming experimentally measured spectral data, an expert researcher-in-the-loop can provide and dynamically adjust the constraints. This set of interactive priors to the NMF model can, for example, contain known or identified independent components, as well as functional expectations about the mixing of components. We demonstrate this application on measured X-ray diffraction and pair distribution function data from $\textit{in situ}$ beamline experiments. Details of the method are described, and general guidance provided to employ constrained NMF in extraction of critical information and insights during $\textit{in situ}$ and high-throughput experiments.

Submitted 1 April, 2021; originally announced April 2021.

Comments: This article has been submitted to Applied Physics Reviews. After it is published, it will be found at https://aip.scitation.org/journal/are. Copyright (2021) Phillip M. Maffettone, Aidan C. Daly, Daniel Olds

arXiv:2008.00283

doi 10.1038/s43588-021-00059-2

Crystallography companion agent for high-throughput materials discovery

Authors: Phillip M. Maffettone, Lars Banko, Peng Cui, Yury Lysogorskiy, Marc A. Little, Daniel Olds, Alfred Ludwig, Andrew I. Cooper

Abstract: The discovery of new structural and functional materials is driven by phase identification, often using X-ray diffraction (XRD). Automation has accelerated the rate of XRD measurements, greatly outpacing XRD analysis techniques that remain manual, time-consuming, error-prone, and impossible to scale. With the advent of autonomous robotic scientists or self-driving labs, contemporary techniques prohibit the integration of XRD. Here, we describe a computer program for the autonomous characterization of XRD data, driven by artificial intelligence (AI), for the discovery of new materials. Starting from structural databases, we train an ensemble model using a physically accurate synthetic dataset, which outputs probabilistic classifications -- rather than absolutes -- to overcome the overconfidence in traditional neural networks. This AI agent behaves as a companion to the researcher, improving accuracy and offering significant time savings. It was demonstrated on a diverse set of organic and inorganic materials characterization challenges. This innovation is directly applicable to inverse design approaches, robotic discovery systems, and can be immediately considered for other forms of characterization such as spectroscopy and the pair distribution function.

Submitted 17 March, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: For associated code, see https://github.com/maffettone/xca

Journal ref: Nat. Comput. Sci. 1, 290 (2021)

arXiv:1809.08080

Quantitative imaging of the complexity in liquid bubbles' evolution reveals the dynamics of film retraction

Authors: Biagio Mandracchia, Zhe Wang, Vincenzo Ferraro, Massimiliano Maria Villone, Ernesto Di Maio, Pier Luca Maffettone, Pietro Ferraro

Abstract: The dynamics and stability of thin liquid films have fascinated scientists over many decades. Thin film flows are central to numerous areas of engineering, geophysics, and biophysics and occur over a wide range of length, velocity, and liquid properties scales. In spite of many significant developments in this area, we still lack appropriate quantitative experimental tools with the spatial and temporal resolution necessary for a comprehensive study of film evolution. We propose tackling this problem with a holographic technique that combines quantitative phase imaging with a custom setup designed to form and manipulate bubbles. The results, gathered on a model aqueous polymeric solution, provide an unparalleled insight into bubble dynamics through the combination of full-field thickness estimation, three-dimensional imaging, and fast acquisition time. The unprecedented level of detail offered by the proposed methodology will promote a deeper understanding of the underlying physics of thin film dynamics.

Submitted 12 December, 2018; v1 submitted 9 September, 2018; originally announced September 2018.

arXiv:1804.04906

doi 10.1103/PhysRevLett.120.265501

Negative Hydration Expansion in ZrW2O8: Microscopic Mechanism, Spaghetti Dynamics, and Negative Thermal Expansion

Authors: Mia Baise, Phillip M. Maffettone, Fabien Trousselet, Nicholas P. Funnell, François-Xavier Coudert, Andrew L. Goodwin

Abstract: We use a combination of X-ray diffraction, total scattering and quantum mechanical calculations to determine the mechanism responsible for hydration-driven contraction in ZrW$_2$O$_8$. Inclusion of H$_2$O molecules within the ZrW$_2$O$_8$ network drives the concerted formation of new W--O bonds to give one-dimensional (--W--O--)$_n$ strings. The topology of the ZrW$_2$O$_8$ network is such that there is no unique choice for the string trajectories: the same local changes in coordination can propagate with a large number of different periodicities. Consequently, ZrW$_2$O$_8$ is heavily disordered, with each configuration of strings forming a dense aperiodic `spaghetti'. This new connectivity contracts the unit cell \emph{via} large shifts in the Zr and W atom positions. Fluctuations of the undistorted parent structure towards this spaghetti phase emerge as the key NTE phonon modes in ZrW$_2$O$_8$ itself. The large relative density of NTE phonon modes in ZrW$_2$O$_8$ actually reflects the degeneracy of volume-contracting spaghetti excitations, itself a function of the particular topology of this remarkable material.

Submitted 13 April, 2018; originally announced April 2018.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. Lett. 120, 265501 (2018)

arXiv:1802.07629

Extreme cooperative swelling in topologically disordered fibre entanglements

Authors: Alistair R. Overy, Raj Pandya, Phillip M. Maffettone, Philip A. Chater, Arkadiy Simonov, Andrew L. Goodwin

Abstract: Entangled states are ubiquitous amongst fibrous materials, whether naturally occurring (keratin, collagen, DNA) or synthetic (nanotube assemblies, elastane). A key mechanical characteristic of these systems is their ability to reorganise in response to external stimuli, as implicated in e.g. hydration-induced swelling of keratin fibrils in human skin. During swelling, the curvature of individual fibres changes to give a cooperative and reversible structural reorganisation that opens up a pore network. The phenomenon is known to be highly dependent on topology, even if the nature of this dependence is not well understood: certain ordered entanglements (`weavings') can swell to many times their original volume while others are entirely incapable of swelling at all. Given this sensitivity to topology, it is puzzling how the disordered entanglements of many real materials manage to support cooperative dilation mechanisms. Here we use a combination of geometric and lattice-dynamical modelling to study the effect of disorder on swelling behaviour. The model system we devise spans a continuum of disordered topologies and is bounded by ordered states whose swelling behaviour is already known to be either vanishingly small or extreme. We find that while topological disorder often quenches swelling behaviour, certain disordered states possess a surprisingly large swelling capacity. Crucially, we show that the extreme swelling response previously observed only for certain specific weavings can be matched---and even superseded---by that of disordered entanglements. Our results establish a counterintuitive link between topological disorder and mechanical flexibility that has implications not only for polymer science but also for our broader understanding of collective phenomena in disordered systems.

Submitted 21 February, 2018; originally announced February 2018.

Comments: 17 pages, 4 figures

arXiv:1403.6629

Microrheology with Optical Tweezers: Measuring the solutions' relative viscosity at a glance

Authors: Francesco Del Giudice, Andrew Glidle, Francesco Greco, Paolo Antonio Netti, Pier Luca Maffettone, Jonathan M. Cooper, Manlio Tassieri

Abstract: We present a straightforward method for measuring the fluids' relative viscosity via a simple graphical analysis of the normalised position autocorrelation function of an optically trapped bead, without the need of embarking on laborious calculations. The advantages of the proposed microrheology method become evident, for instance, when it is adopted for measuring the molecular weight of rare or precious materials by means of their intrinsic viscosity. The proposed method has been validated by direct comparison with conventional bulk rheology methods.

Submitted 27 March, 2014; v1 submitted 26 March, 2014; originally announced March 2014.

Comments: 4 pages, 3 figures

Showing 1–14 of 14 results for author: Maffettone, P