-
A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences
Authors:
Miguel González-Duque,
Richard Michael,
Simon Bartels,
Yevgen Zainchkovskyy,
Søren Hauberg,
Wouter Boomsma
Abstract:
Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveals highly heterogeneous experimental set-ups across methods and technical barriers for the replicability and application of published algorithms to real-world tasks. To address these issues, we develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions representing real-world application domains in chemistry and biology. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers. Project website: https://machinelearninglifescience.github.io/hdbo_benchmark
Submitted 7 June, 2024;
originally announced June 2024.
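The benchmark described above separates two pluggable components: black-box objectives over discrete sequences, and optimizers that query them under an evaluation budget. The sketch below illustrates that separation under stated assumptions. The class `DiscreteBlackBox`, the toy objective `MatchTarget` and the baseline `mutation_hill_climb` are hypothetical names written for this example; they are not the poli or poli-baselines API.

```python
from abc import ABC, abstractmethod

import numpy as np


class DiscreteBlackBox(ABC):
    """Hypothetical objective interface (not the poli API): scores batches of token sequences."""

    def __init__(self, alphabet: list[str], seq_len: int) -> None:
        self.alphabet = alphabet
        self.seq_len = seq_len
        self.n_evals = 0  # number of sequences queried so far, for budget accounting

    @abstractmethod
    def _evaluate(self, x: np.ndarray) -> np.ndarray:
        """Return one score per row of x, where x has shape (batch, seq_len)."""

    def __call__(self, x: np.ndarray) -> np.ndarray:
        x = np.asarray(x)
        if x.ndim != 2 or x.shape[1] != self.seq_len:
            raise ValueError(f"expected shape (batch, {self.seq_len}), got {x.shape}")
        self.n_evals += x.shape[0]
        return self._evaluate(x).reshape(-1, 1)


class MatchTarget(DiscreteBlackBox):
    """Hypothetical toy objective: number of positions agreeing with a hidden target."""

    def __init__(self, target: str, alphabet: list[str]) -> None:
        super().__init__(alphabet, len(target))
        self.target = np.array(list(target))

    def _evaluate(self, x: np.ndarray) -> np.ndarray:
        return (x == self.target).sum(axis=1).astype(float)


def mutation_hill_climb(f: DiscreteBlackBox, x0: np.ndarray, budget: int, seed: int = 0):
    """Hypothetical baseline optimizer: random single-site mutations, kept if not worse."""
    rng = np.random.default_rng(seed)
    best_x = np.asarray(x0).copy()
    best_y = f(best_x[None, :])[0, 0]
    for _ in range(budget):
        cand = best_x.copy()
        pos = rng.integers(f.seq_len)          # position to mutate
        cand[pos] = rng.choice(f.alphabet)     # new token drawn uniformly from the alphabet
        y = f(cand[None, :])[0, 0]
        if y >= best_y:
            best_x, best_y = cand, y
    return best_x, best_y
```

For example, `mutation_hill_climb(MatchTarget("GATTACA", list("ACGT")), np.array(list("AAAAAAA")), budget=200)` runs the baseline against the toy objective. Because the optimizer only sees the `__call__` interface, a new objective or a new optimizer can be swapped in without touching the other side, which is the kind of extensibility the abstract attributes to the two libraries.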
-
A Continuous Relaxation for Discrete Bayesian Optimization
Authors:
Richard Michael,
Simon Bartels,
Miguel González-Duque,
Yevgen Zainchkovskyy,
Jes Frellsen,
Søren Hauberg,
Wouter Boomsma
Abstract:
Optimizing efficiently over discrete data with only a few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist, motivated by optimizing protein sequences for expensive-to-evaluate biochemical properties. The advantages of our approach are twofold: the problem is treated in the continuous setting, and available prior knowledge over sequences can be incorporated directly. More specifically, we use available and learned distributions over the problem domain to weight the Hellinger distance, which yields a covariance function. We show that the resulting acquisition function can be optimized with either continuous or discrete optimization algorithms, and we empirically assess our method on two biochemical sequence optimization tasks.
Submitted 26 April, 2024;
originally announced April 2024.
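The covariance function above is built from a weighted Hellinger distance between distributions over sequences. A minimal sketch, assuming each sequence is represented as a product of independent per-position categorical distributions and omitting the paper's weighting by available and learned distributions: the squared Hellinger distance follows from the Bhattacharyya coefficient, which factorizes across positions, and $\exp(-H^2/\ell)$ is a valid covariance because $H^2$ is half a squared Euclidean distance between square-root probability vectors. The function names and the lengthscale $\ell$ are illustrative assumptions.

```python
import numpy as np


def hellinger_sq(p: np.ndarray, q: np.ndarray) -> float:
    """Squared Hellinger distance between two product-of-categoricals distributions.

    p, q: arrays of shape (L, A); row l is a probability vector over an
    A-letter alphabet at sequence position l. Positions are assumed independent.
    """
    # Bhattacharyya coefficient of a product distribution = product over positions.
    bc = np.prod(np.sum(np.sqrt(p * q), axis=1))
    return float(1.0 - bc)


def hellinger_kernel(p: np.ndarray, q: np.ndarray, lengthscale: float = 1.0) -> float:
    """Unweighted Hellinger-based covariance (the paper's weighting is not included)."""
    return float(np.exp(-hellinger_sq(p, q) / lengthscale))


def one_hot(seq: str, alphabet: str) -> np.ndarray:
    """Embed a discrete sequence as a degenerate (one-hot) distribution, shape (L, A)."""
    idx = [alphabet.index(c) for c in seq]
    return np.eye(len(alphabet))[idx]
```

In this unweighted form, two distinct one-hot sequences always sit at the maximal distance $H^2 = 1$, however many positions they share; similarities only become graded once sequences are relaxed to non-degenerate distributions, which is one way to see why a continuous relaxation and a distribution-based weighting are useful.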
-
Atomic step disorder on polycrystalline surfaces leads to spatially inhomogeneous work functions
Authors:
Morgann Berg,
Sean W. Smith,
David A. Scrymgeour,
Michael T. Brumbach,
** Lu,
Sara M. Dickens,
Joseph R. Michael,
Taisuke Ohta,
Ezra Bussmann,
Harold P. Hjalmarson,
Peter A. Schultz,
Paul G. Clem,
Matthew M. Hopkins,
Christopher H. Moore
Abstract:
Structural disorder causes materials' surface electronic properties, e.g. the work function ($φ$), to vary spatially, yet it is challenging to prove exact causal relationships to underlying ensemble disorder, e.g. roughness or granularity. For polycrystalline Pt, nanoscale-resolution photoemission threshold mapping reveals a spatially varying $φ= 5.70\pm 0.03$~eV over a distribution of (111) textured vicinal grain surfaces prepared by sputter deposition and annealing. With regard to field emission and related phenomena, e.g. vacuum arc initiation, a salient feature of the $φ$ distribution is that it is skewed, with a long tail to values down to 5.4 eV, i.e. far below the mean, which matters exponentially for field emission via the Fowler-Nordheim relation. We show that the $φ$ spatial variation and distribution can be explained by ensemble variations of granular tilts and surface slopes via a Smoluchowski smoothing model wherein local $φ$ variations result from spatially varying densities of electric dipole moments, intrinsic to atomic steps, that locally modify $φ$. Atomic step-terrace structure is confirmed with scanning tunneling microscopy (STM) at several locations on our surfaces, and prior works showed STM evidence for atomic step dipoles on various metal surfaces. From our model, we find an atomic step edge dipole $μ=0.12$ D/edge atom, which is comparable to values reported in studies that utilized other methods and materials. Our results elucidate a connection between macroscopic $φ$ and nanostructure that may contribute to the spread of reported $φ$ for Pt and other surfaces, and may be useful toward more complete descriptions of polycrystalline metals in models of field emission and other related vacuum electronics phenomena, e.g. arc initiation.
Submitted 29 December, 2021;
originally announced December 2021.
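The abstract stresses that the low-$φ$ tail matters because field emission depends exponentially on $φ$ through the Fowler-Nordheim relation. A minimal sketch, assuming the elementary Fowler-Nordheim form $J = (a F^2/φ)\exp(-b φ^{3/2}/F)$ with no image-charge (barrier-shape) correction and an illustrative local field $F = 5$ V/nm chosen only for demonstration, compares the current density from a patch at the 5.4 eV tail value with one at the 5.70 eV mean. The function names are hypothetical.

```python
import math

# First and second Fowler-Nordheim constants for the elementary FN equation.
A_FN = 1.541434e-6  # A eV V^-2
B_FN = 6.830890     # eV^(-3/2) V nm^-1


def fn_current_density(phi_ev: float, field_v_per_nm: float) -> float:
    """Elementary Fowler-Nordheim current density in A/m^2 (no image-charge correction)."""
    field_v_per_m = field_v_per_nm * 1e9           # prefactor needs the field in V/m
    prefactor = A_FN * field_v_per_m**2 / phi_ev
    exponent = -B_FN * phi_ev**1.5 / field_v_per_nm  # dimensionless with F in V/nm
    return prefactor * math.exp(exponent)


def emission_ratio(phi_low: float, phi_high: float, field_v_per_nm: float) -> float:
    """Factor by which a low-phi patch out-emits a high-phi patch at the same local field."""
    return fn_current_density(phi_low, field_v_per_nm) / fn_current_density(phi_high, field_v_per_nm)


ratio = emission_ratio(5.4, 5.70, field_v_per_nm=5.0)
```

Under these assumptions a 0.3 eV drop raises the local current density by a factor of roughly 4.5, and the factor grows as the field decreases, since the field sits in the denominator of the exponent. This is why a skewed tail, rather than the mean, can dominate emission.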
-
Modelling of flow through spatially varying porous media with application to topology optimization
Authors:
Rakotobe Michaël,
Ramalingom Delphine,
Cocquet Pierre-Henri,
Bastide Alain
Abstract:
The objective of this study is to highlight the effect of porosity variation in a topology optimization process in the field of fluid dynamics. Usually, a penalization term added to the momentum equation yields the material distribution. Every time material is added inside the computational domain, new fluid-solid interfaces are created and porosity gradients appear. However, porosity variation is currently not taken into account in topology optimization, and the penalization term used to locate the solid is analogous to a Darcy term used for flows in porous media. With that in mind, in this paper we first develop an original one-domain macroscopic model for flow through spatially varying porous media that goes beyond the Darcy regime. Next, we numerically solve a topology optimization problem and compare the results obtained with the standard model, which does not include the effect of porosity variation, with those obtained with our model. Among our results, we show, for instance, that the designs obtained differ but the percentage reductions of the objective functional remain quite close (less than 4% difference). In addition, we illustrate the effects of porosity and particle diameter on the final optimized designs.
Submitted 22 April, 2020;
originally announced April 2020.
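The abstract contrasts the usual Darcy-like penalization with a model that goes beyond the Darcy regime and depends on porosity and particle diameter. The paper's own model is not reproduced here. As a hedged illustration only, the sketch below evaluates a common one-dimensional Darcy-Forchheimer momentum sink in which the permeability $K$ and the inertial coefficient $F$ are taken from the Ergun correlation for packed beds; the choice of Ergun, the fluid properties and the function names are assumptions of this example.

```python
import math


def ergun_permeability(porosity: float, particle_diameter: float) -> float:
    """Permeability K (m^2) from the Ergun correlation: eps^3 d^2 / (150 (1 - eps)^2)."""
    return porosity**3 * particle_diameter**2 / (150.0 * (1.0 - porosity) ** 2)


def forchheimer_coefficient(porosity: float) -> float:
    """Dimensionless inertial coefficient F = 1.75 / sqrt(150 eps^3) from the Ergun correlation."""
    return 1.75 / math.sqrt(150.0 * porosity**3)


def porous_momentum_sink(u: float, porosity: float, particle_diameter: float,
                         mu: float = 1.0e-3, rho: float = 1.0e3) -> float:
    """Darcy (linear) plus Forchheimer (quadratic) drag per unit volume, in N/m^3.

    u is the superficial velocity in m/s; mu (Pa s) and rho (kg/m^3) default to
    roughly water. A pure Darcy penalization keeps only the first term.
    """
    K = ergun_permeability(porosity, particle_diameter)
    darcy = -(mu / K) * u
    forchheimer = -(rho * forchheimer_coefficient(porosity) / math.sqrt(K)) * abs(u) * u
    return darcy + forchheimer
```

With these correlations, lowering the porosity or the particle diameter reduces $K$ and increases the drag, which gives a concrete sense of how both parameters can steer an optimized design; the quadratic term is what carries the model beyond the purely linear Darcy regime.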
-
Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach
Authors:
José M. Maisog,
Wenhong Li,
Yanchun Xu,
Brian Hurley,
Hetal Shah,
Ryan Lemberg,
Tina Borden,
Stephen Bandeian,
Melissa Schline,
Roxanna Cross,
Alan Spiro,
Russ Michael,
Alexander Gutfraind
Abstract:
Due to escalating healthcare costs, accurately predicting which patients will incur high costs is an important task for payers and providers of healthcare. High-cost claimants (HiCCs) are patients who have annual costs above $\$250,000$ and who represent just 0.16% of the insured population but currently account for 9% of all healthcare costs. In this study, we aimed to develop a high-performance algorithm to predict HiCCs to inform a novel care management system. Using health insurance claims from 48 million people and augmented with census data, we applied machine learning to train binary classification models to calculate the personal risk of HiCC. To train the models, we developed a platform starting with 6,006 variables across all clinical and demographic dimensions and constructed over one hundred candidate models. The best model achieved an area under the receiver operating characteristic curve of 91.2%. The model exceeds the highest published performance (84%) and remains high for patients with no prior history of high-cost status (89%), who have less than a full year of enrollment (87%), or lack pharmacy claims data (88%). It attains an area under the precision-recall curve of 23.1%, and precision of 74% at a threshold of 0.99. A care management program enrolling 500 people with the highest HiCC risk is expected to treat 199 true HiCCs and generate a net savings of $\$7.3$ million per year. Our results demonstrate that high-performing predictive models can be constructed using claims data and publicly available data alone, even for rare high-cost claimants exceeding $\$250,000$. Our model demonstrates the transformational power of machine learning and artificial intelligence in care management, which would allow healthcare payers and providers to introduce the next generation of care management programs.
Submitted 30 December, 2019;
originally announced December 2019.
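Several figures quoted in this abstract can be unpacked with simple arithmetic. The derived quantities below follow directly from the stated numbers and are not additional results from the paper; the variable names are illustrative.

```python
# Figures stated in the abstract.
hicc_population_share = 0.16 / 100  # fraction of the insured population who are HiCCs
hicc_cost_share = 9 / 100           # fraction of all healthcare costs they account for
enrolled = 500                      # people enrolled in the care management program
true_hiccs_treated = 199            # expected true HiCCs among those enrolled
net_savings_usd = 7.3e6             # expected net savings per year

# How over-represented HiCCs are in costs relative to their headcount.
cost_concentration = hicc_cost_share / hicc_population_share
# Fraction of the 500 enrollees who are true HiCCs (precision of the top-500 list).
top500_precision = true_hiccs_treated / enrolled
# Expected net savings per enrolled person per year.
savings_per_enrollee = net_savings_usd / enrolled
```

HiCCs thus account for about 56 times their population share of costs, roughly 40% of the 500 enrollees are expected to be true HiCCs, and the program's net savings work out to about \$14,600 per enrollee per year. The 40% figure is a different operating point from the 74% precision quoted at a threshold of 0.99.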
-
Linking microstructural evolution and macro-scale friction behavior in metals
Authors:
Nicolas Argibay,
Michael E. Chandross,
Shengfeng Cheng,
Joseph R. Michael
Abstract:
A correlation is established between the macro-scale friction regimes of metals and a transition between two dominant atomistic mechanisms of deformation. Metals tend to exhibit bi-stable friction behavior -- low and converging or high and diverging. These general trends in behavior are shown to be largely explained using a simplified model based on grain size evolution, as a function of contact stress and temperature, and are demonstrated for pure copper and gold. Specifically, the low friction regime is linked to the formation of ultra-nanocrystalline surface films (10 to 20 nm), driving toward shear accommodation by grain boundary sliding. Above a critical combination of stress and temperature -- demonstrated to be a material property -- shear accommodation transitions to dislocation dominated plasticity and high friction. We utilize a combination of experimental and computational methods to develop and validate the proposed structure-property relationship. This quantitative framework provides a shift from phenomenological to mechanistic and predictive fundamental understanding of friction for crystalline materials, including engineering alloys.
Submitted 24 November, 2016;
originally announced November 2016.
-
K$^+$-nucleus quasielastic scattering
Authors:
C. M. Kormanyos,
R. J. Peterson,
J. R. Shepard,
J. E. Wise,
S. Bart,
R. E. Chrien,
L. Lee,
B. L. Clausen,
J. Piekarewicz,
M. B. Barakat,
R. A. Michael,
T. Kishimoto
Abstract:
K$^+$--nucleus quasielastic cross sections measured for a laboratory kaon beam momentum of 705 MeV/$c$ are presented for 3--momentum transfers of 300 and 500 MeV/$c$. The measured differential cross sections for C, Ca and Pb at 500 MeV/$c$ are used to deduce the effective number of nucleons participating in the scattering, which are compared with estimates based on the eikonal approximation. The long mean free path expected for K$^+$ mesons in nuclei is observed. Double differential cross sections for C and Ca are compared to relativistic nuclear structure calculations.
Submitted 30 July, 1993;
originally announced July 1993.
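The abstract compares the measured effective number of participating nucleons with eikonal estimates. The sketch below is a textbook-style simplification, not the paper's calculation: it assumes a uniform-density sphere of radius $R = r_0 A^{1/3}$ with $r_0 = 1.2$ fm, straight-line kaon trajectories, and an effective K$^+$N cross section $σ$ left as a free parameter, and evaluates $N_{\rm eff} = \int d^2b\, T(b)\, e^{-σ T(b)}$, where $T(b)$ is the nuclear thickness function. The function name and default parameters are hypothetical.

```python
import numpy as np


def n_eff_uniform_sphere(A: int, sigma_fm2: float, r0: float = 1.2, n: int = 20000) -> float:
    """Eikonal-style effective nucleon number for a uniform sphere.

    sigma_fm2 is the attenuating cross section in fm^2 (1 mb = 0.1 fm^2).
    Integrates N_eff = int d^2b T(b) exp(-sigma T(b)) with the midpoint rule in b.
    """
    R = r0 * A ** (1.0 / 3.0)                   # nuclear radius in fm
    rho0 = 3.0 * A / (4.0 * np.pi * R**3)       # uniform density normalized to A nucleons
    db = R / n
    b = (np.arange(n) + 0.5) * db               # midpoints of impact-parameter bins
    T = 2.0 * rho0 * np.sqrt(R**2 - b**2)       # thickness function (nucleons per fm^2)
    integrand = 2.0 * np.pi * b * T * np.exp(-sigma_fm2 * T)
    return float(np.sum(integrand) * db)
```

With $σ = 0$ the integral returns $A$, so $N_{\rm eff}/A$ measures how strongly the projectile is attenuated; a small $σ$, i.e. a long mean free path, keeps $N_{\rm eff}$ close to $A$, and for fixed $σ$ the ratio falls from light nuclei such as C to heavy ones such as Pb.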