Search | arXiv e-print repository

arXiv:2207.02209 [pdf]

Tackling Data Scarcity with Transfer Learning: A Case Study of Thickness Characterization from Optical Spectra of Perovskite Thin Films

Authors: Siyu Isaac Parker Tian, Zekun Ren, Selvaraj Venkataraj, Yuanhang Cheng, Daniil Bash, Felipe Oviedo, J. Senthilnath, Vijila Chellappan, Yee-Fun Lim, Armin G. Aberle, Benjamin P MacLeod, Fraser G. L. Parlane, Curtis P. Berlinguette, Qianxiao Li, Tonio Buonassisi, Zhe Liu

Abstract: Transfer learning is increasingly becoming an important tool for handling the data scarcity often encountered in machine learning. In high-throughput thickness characterization, a downstream process of the high-throughput optimization of optoelectronic thin films with autonomous workflows, data scarcity occurs especially for new materials. To achieve high-throughput thickness characterization, we propose a machine learning model called thicknessML, which predicts thickness from UV-Vis spectrophotometry input, together with an overarching transfer learning workflow. We demonstrate the transfer learning workflow from a generic source domain of generic band-gapped materials to a specific target domain of perovskite materials, where the target domain data come from only a limited number (18) of refractive indices from the literature. The target domain can be easily extended to other material classes using only a small amount of literature data. Defining thickness prediction accuracy as within-10% deviation, thicknessML achieves 92.2% (with a deviation of 3.6%) accuracy with transfer learning compared to 81.8% (with a deviation of 11.7%) without (lower mean and larger standard deviation). Experimental validation on six deposited perovskite films also corroborates the efficacy of the proposed workflow by yielding a 10.5% mean absolute percentage error (MAPE).

Submitted 20 December, 2022; v1 submitted 14 June, 2022; originally announced July 2022.

arXiv:2204.03738 [pdf, other]

BankNote-Net: Open dataset for assistive universal currency recognition

Authors: Felipe Oviedo, Srinivas Vinnakota, Eugene Seleznev, Hemant Malhotra, Saqib Shaikh, Juan Lavista Ferres

Abstract: Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including optical character recognition, scene identification, person recognition, and currency recognition. This last task, the recognition of banknotes of different denominations, has been addressed by the use of computer vision models for image recognition. However, the datasets and models available for this task are limited, both in terms of dataset size and in the variety of currencies covered. In this work, we collect a total of 24,826 images of banknotes in a variety of assistive settings, spanning 17 currencies and 112 denominations. Using supervised contrastive learning, we develop a machine learning model for universal currency recognition. This model learns compliant embeddings of banknote images in a variety of contexts, which can be shared publicly (as a compressed vector representation), and can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the last version of the Seeing AI app developed by Microsoft. We share our encoder model and the embeddings as an open dataset in our BankNote-Net repository.

Submitted 7 April, 2022; originally announced April 2022.

Comments: Pre-print

arXiv:2202.01340 [pdf, other]

An Artificial Intelligence Dataset for Solar Energy Locations in India

Authors: Anthony Ortiz, Dhaval Negandhi, Sagar R Mysorekar, Joseph Kiesecker, Shivaprakash K Nagaraju, Caleb Robinson, Priyal Bhatia, Aditi Khurana, Jane Wang, Felipe Oviedo, Juan Lavista Ferres

Abstract: Rapid development of renewable energy sources, particularly solar photovoltaics (PV), is critical to mitigate climate change. As a result, India has set ambitious goals to install 500 gigawatts of solar energy capacity by 2030. Given the large footprint projected to meet renewable energy targets, the potential for land use conflicts over environmental values is high. To expedite development of solar energy, land use planners will need access to up-to-date and accurate geo-spatial information of PV infrastructure. In this work, we developed a spatially explicit machine learning model to map utility-scale solar projects across India using freely available satellite imagery with a mean accuracy of 92%. Our model predictions were validated by human experts to obtain a dataset of 1363 solar PV farms. Using this dataset, we measured the solar footprint across India and quantified the degree of landcover modification associated with the development of PV infrastructure. Our analysis indicates that over 74% of solar development in India was built on landcover types that have natural ecosystem preservation or agricultural value.

Submitted 30 June, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: Accepted for publication in Nature Scientific Data

arXiv:2111.01037 [pdf, other]

doi 10.1021/accountsmr.1c00244

Interpretable and Explainable Machine Learning for Materials Science and Chemistry

Authors: Felipe Oviedo, Juan Lavista Ferres, Tonio Buonassisi, Keith Butler

Abstract: While the uptake of data-driven approaches for materials science and chemistry is at an exciting, early stage, to realise the true potential of machine learning models for successful scientific discovery, they must have qualities beyond purely predictive power. The predictions and inner workings of models should provide a certain degree of explainability by human experts, permitting the identification of potential model issues or limitations, building trust in model predictions and unveiling unexpected correlations that may lead to scientific insights. In this work, we summarize applications of interpretability and explainability techniques for materials science and chemistry and discuss how these techniques can improve the outcome of scientific studies. We discuss various challenges for interpretable machine learning in materials science and, more broadly, in scientific settings. In particular, we emphasize the risks of inferring causation or reaching generalization by purely interpreting machine learning models and the need for uncertainty estimates for model explanations. Finally, we showcase a number of exciting developments in other fields that could benefit interpretability in materials science and chemistry problems.

Submitted 3 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Under review at Accounts of Materials Research

Journal ref: Accounts of Materials Research, 2022

arXiv:2104.11757 [pdf, ps, other]

Becoming Good at AI for Good

Authors: Meghana Kshirsagar, Caleb Robinson, Siyu Yang, Shahrzad Gholami, Ivan Klyuzhin, Sumit Mukherjee, Md Nasir, Anthony Ortiz, Felipe Oviedo, Darren Tanner, Anusua Trivedi, Yixi Xu, Ming Zhong, Bistra Dilkina, Rahul Dodhia, Juan M. Lavista Ferres

Abstract: AI for good (AI4G) projects involve developing and applying artificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collaboration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Based on our experiences, we detail the different aspects of this type of collaboration broken down into four high-level categories: communication, data, modeling, and impact, and distill eleven takeaways to guide such projects in the future. We briefly describe two case studies to illustrate how some of these takeaways were applied in practice during our past collaborations.

Submitted 3 May, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: Accepted to AIES-2021

arXiv:2005.07609 [pdf]

doi 10.1016/j.matt.2021.11.032

An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties

Authors: Zekun Ren, Siyu Isaac Parker Tian, Juhwan Noh, Felipe Oviedo, Guangzong Xing, Jiali Li, Qiaohao Liang, Ruiming Zhu, Armin G. Aberle, Shijing Sun, Xiaonan Wang, Yi Liu, Qianxiao Li, Senthilnath Jayavelu, Kedar Hippalgaonkar, Yousung Jung, Tonio Buonassisi

Abstract: Realizing general inverse design could greatly accelerate the discovery of new materials with user-defined properties. However, state-of-the-art generative models tend to be limited to a specific composition or crystal structure. Herein, we present a framework capable of general inverse design (not limited to a given set of elements or crystal structures), featuring a generalized invertible representation that encodes crystals in both real and reciprocal space, and a property-structured latent space from a variational autoencoder (VAE). In three design cases, the framework generates 142 new crystals with user-defined formation energies, bandgap, thermoelectric (TE) power factor, and combinations thereof. These generated crystals, absent in the training database, are validated by first-principles calculations. The success rates (number of first-principles-validated target-satisfying crystals/number of designed crystals) range between 7.1% and 38.9%. These results represent a significant step toward property-driven general inverse design using generative models, although practical challenges remain when coupled with experimental synthesis.

Submitted 15 December, 2021; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:2004.13599 [pdf]

Bridging the gap between photovoltaics R&D and manufacturing with data-driven optimization

Authors: Felipe Oviedo, Zekun Ren, Xue Hansong, Siyu Isaac Parker Tian, Kaicheng Zhang, Mariya Layurova, Thomas Heumueller, Ning Li, Erik Birgersson, Shijing Sun, Benji Mayurama, Ian Marius Peters, Christoph J. Brabec, John Fisher III, Tonio Buonassisi

Abstract: Novel photovoltaics, such as perovskites and perovskite-inspired materials, have shown great promise due to high efficiency and potentially low manufacturing cost. So far, solar cell R&D has mostly focused on achieving record efficiencies, a process that often results in small batches, large variance, and limited understanding of the physical causes of underperformance. This approach is intensive in time and resources, and ignores many relevant factors for industrial production, particularly the need for high reproducibility and high manufacturing yield, and the accompanying need for physical insights. The record-efficiency paradigm is effective in early-stage R&D, but becomes unsuitable for industrial translation, requiring a repetition of the optimization procedure in the industrial setting. This mismatch between optimization objectives, combined with the complexity of physical root-cause analysis, contributes to decade-long timelines to transfer new technologies into the market. Based on recent machine learning and technoeconomic advances, our perspective articulates a data-driven optimization framework to bridge R&D and manufacturing optimization approaches. We extend the maximum-efficiency optimization paradigm by considering two additional dimensions: a technoeconomic figure of merit and scalable physical inference. Our framework naturally aligns different stages of technology development with shared optimization objectives, and accelerates the optimization process by providing physical insights.

Submitted 28 April, 2020; originally announced April 2020.

arXiv:2002.08471 [pdf, other]

Scaled Fixed Point Algorithm for Computing the Matrix Square Root

Authors: Harry F. Oviedo, Hugo J. Lara, Oscar S. Dalmau

Abstract: This paper addresses the numerical solution of the matrix square root problem. Two fixed point iterations are proposed by rearranging the nonlinear matrix equation $A - X^2 = 0$ and incorporating a positive scaling parameter. The proposals only need to compute one matrix inverse and at most two matrix multiplications per iteration. A global convergence result is established. The numerical comparisons versus some existing methods from the literature, on several test problems, demonstrate the efficiency and effectiveness of our proposals.

Submitted 18 February, 2020; originally announced February 2020.

MSC Class: 65J15; 65F30; 65H10

arXiv:1908.06497 [pdf, ps, other]

A Spectral Gradient Projection Method for the Positive Semi-definite Procrustes Problem

Authors: Harry F. Oviedo

Abstract: This paper addresses the positive semi-definite Procrustes problem (PSDP). The PSDP corresponds to a least squares problem over the set of symmetric positive semi-definite matrices. These kinds of problems appear in many applications such as structure analysis and signal processing, among others. A non-monotone spectral projected gradient algorithm is proposed to obtain a numerical solution for the PSDP. The proposed algorithm employs Zhang and Hager's non-monotone technique in combination with Barzilai and Borwein's step size to accelerate convergence. Some theoretical results are presented. Finally, numerical experiments are performed to demonstrate the effectiveness and efficiency of the proposed method, and comparisons are made with other state-of-the-art algorithms.

Submitted 18 August, 2019; originally announced August 2019.

arXiv:1907.10995 [pdf]

Embedding Physics Domain Knowledge into a Bayesian Network Enables Layer-by-Layer Process Innovation for Photovoltaics

Authors: Zekun Ren, Felipe Oviedo, Muang Thway, Siyu I. P. Tian, Yue Wang, Hansong Xue, Jose Dario Perea, Mariya Layurova, Thomas Heumueller, Erik Birgersson, Armin Aberle, Christoph J. Brabec, Rolf Stangl, Shijing Sun, Qianxiao Li, Fen Lin, Ian Marius Peters, Tonio Buonassisi

Abstract: Process optimization of photovoltaic devices is a time-intensive, trial-and-error endeavor, without full transparency of the underlying physics, and with user-imposed constraints that may or may not lead to a global optimum. Herein, we demonstrate that embedding physics domain knowledge into a Bayesian network enables an optimization approach that identifies the root cause(s) of underperformance with layer-by-layer resolution and reveals alternative optimal process windows beyond global black-box optimization. Our Bayesian-network approach links process conditions to materials descriptors (bulk and interface properties, e.g., bulk lifetime, doping, and surface recombination) and device performance parameters (e.g., cell efficiency), using a Bayesian inference framework with an autoencoder-based surrogate device-physics model that is 100x faster than numerical solvers. With the trained surrogate model, our approach is robust and significantly reduces time-consuming experimentalist intervention, even with small numbers of fabricated samples. To demonstrate our method, we perform layer-by-layer optimization of GaAs solar cells. In a single cycle of learning, we find an improved growth temperature for the GaAs solar cells without any secondary measurements, and demonstrate a 6.5% relative AM1.5G efficiency improvement above baseline and traditional black-box optimization methods.

Submitted 3 November, 2019; v1 submitted 25 July, 2019; originally announced July 2019.

arXiv:1812.01025 [pdf]

doi 10.1016/j.joule.2019.05.014

Accelerating Photovoltaic Materials Development via High-Throughput Experiments and Machine-Learning-Assisted Diagnosis

Authors: Shijing Sun, Noor T. P. Hartono, Zekun D. Ren, Felipe Oviedo, Antonio M. Buscemi, Mariya Layurova, De Xin Chen, Tofunmi Ogunfunmi, Janak Thapa, Savitha Ramasamy, Charles Settens, Brian L. DeCost, Aaron Gilad Kusne, Zhe Liu, Siyu I. P. Tian, I. Marius Peters, Juan-Pablo Correa-Baena, Tonio Buonassisi

Abstract: Accelerating the experimental cycle for new materials development is vital for addressing the grand energy challenges of the 21st century. We fabricate and characterize 75 unique halide perovskite-inspired solution-based thin-film materials within a two-month period, with 87% exhibiting band gaps between 1.2 eV and 2.4 eV that are of interest for energy-harvesting applications. This increased throughput is enabled by streamlining experimental workflows, developing a set of precursors amenable to high-throughput synthesis, and developing machine-learning-assisted diagnosis. We utilize a deep neural network to classify compounds based on experimental X-ray diffraction data into 0D, 2D, and 3D structures more than 10 times faster than human analysis and with 90% accuracy. We validate our methods using lead-halide perovskites and extend the application to novel lead-free compositions. The wider synthesis window and faster cycle of learning enable three noteworthy scientific findings: (1) we realize four inorganic layered perovskites, A3B2Br9 (A = Cs, Rb; B = Bi, Sb) in thin-film form via one-step liquid deposition; (2) we report a multi-site lead-free alloy series that was not previously described in literature, Cs3(Bi1-xSbx)2(I1-xBrx)9; and (3) we reveal the effect on bandgap (reduction to <2 eV) and structure upon simultaneous alloying on the B-site and X-site of Cs3Bi2I9 with Sb and Br. This study demonstrates that combining an accelerated experimental cycle of learning and machine-learning-based diagnosis represents an important step toward realizing fully-automated laboratories for materials discovery and development.

Submitted 25 November, 2018; originally announced December 2018.

Comments: NIPS 2018 Workshop: Machine Learning for Molecules and Materials

Journal ref: Joule 3, 2019, 1437-1451

arXiv:1811.08425 [pdf]

Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks

Authors: Felipe Oviedo, Zekun Ren, Shijing Sun, Charlie Settens, Zhe Liu, Noor Titan Putri Hartono, Ramasamy Savitha, Brian L. DeCost, Siyu I. P. Tian, Giuseppe Romano, Aaron Gilad Kusne, Tonio Buonassisi

Abstract: X-ray diffraction (XRD) data acquisition and analysis is among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine-learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce-data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal halides spanning 3 dimensionalities and 7 space groups are synthesized and classified. After testing various algorithms, we develop and implement an all-convolutional neural network, with cross-validated accuracies for dimensionality and space-group classification of 93% and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16°, which enables an XRD pattern to be obtained and classified in 5.5 minutes or less.

Submitted 23 April, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

Comments: Accepted with minor revisions in npj Computational Materials, Presented in NIPS 2018 Workshop: Machine Learning for Molecules and Materials

Showing 1–12 of 12 results for author: Oviedo, F