Look-Ahead Acquisition Functions for Bernoulli Level Set Estimation

Authors: Benjamin Letham, Phillip Guan, Chase Tymms, Eytan Bakshy, Michael Shvartsman

Abstract: Level set estimation (LSE) is the problem of identifying regions where an unknown function takes values above or below a specified threshold. Active sampling strategies for efficient LSE have primarily been studied in continuous-valued functions. Motivated by applications in human psychophysics where common experimental designs produce binary responses, we study LSE active sampling with Bernoulli outcomes. With Gaussian process classification surrogate models, the look-ahead model posteriors used by state-of-the-art continuous-output methods are intractable. However, we derive analytic expressions for look-ahead posteriors of sublevel set membership, and show how these lead to analytic expressions for a class of look-ahead LSE acquisition functions, including information-based methods. Benchmark experiments show the importance of considering the global look-ahead impact on the entire posterior. We demonstrate a clear benefit to using this new class of acquisition functions on benchmark problems, and on a challenging real-world task of estimating a high-dimensional contrast sensitivity function.

Submitted 18 March, 2022; originally announced March 2022.

Comments: In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS

arXiv:2203.01900

Sparse Bayesian Optimization

Authors: Sulin Liu, Qing Feng, David Eriksson, Benjamin Letham, Eytan Bakshy

Abstract: Bayesian optimization (BO) is a powerful approach to sample-efficient optimization of black-box objective functions. However, the application of BO to areas such as recommendation systems often requires taking the interpretability and simplicity of the configurations into consideration, a setting that has not been previously studied in the BO literature. To make BO useful for this setting, we present several regularization-based approaches that allow us to discover sparse and more interpretable configurations. We propose a novel differentiable relaxation based on homotopy continuation that makes it possible to target sparsity by working directly with $L_0$ regularization. We identify failure modes for regularized BO and develop a hyperparameter-free method, sparsity exploring Bayesian optimization (SEBO) that seeks to simultaneously maximize a target objective and sparsity. SEBO and methods based on fixed regularization are evaluated on synthetic and real-world problems, and we show that we are able to efficiently optimize for sparsity.

Submitted 3 March, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2104.09549

Adaptive Nonparametric Psychophysics

Authors: Lucy Owen, Jonathan Browder, Benjamin Letham, Gideon Stocek, Chase Tymms, Michael Shvartsman

Abstract: We introduce a new set of models and adaptive psychometric testing methods for multidimensional psychophysics. In contrast to traditional adaptive staircase methods like PEST and QUEST, the method is multi-dimensional and does not require a grid over contextual dimensions, retaining sub-exponential scaling in the number of stimulus dimensions. In contrast to more recent multi-dimensional adaptive methods, our underlying model does not require a parametric assumption about the interaction between intensity and the additional dimensions. In addition, we introduce a new active sampling policy that explicitly targets psychometric detection threshold estimation and does so substantially faster than policies that attempt to estimate the full psychometric function (though it still provides estimates of the function, albeit with lower accuracy). Finally, we introduce AEPsych, a user-friendly open-source package for nonparametric psychophysics that makes these technically-challenging methods accessible to the broader community.

Submitted 19 April, 2021; originally announced April 2021.

arXiv:2001.11659

Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

Authors: Benjamin Letham, Roberto Calandra, Akshara Rai, Eytan Bakshy

Abstract: Bayesian optimization (BO) is a popular approach to optimize expensive-to-evaluate black-box functions. A significant challenge in BO is to scale to high-dimensional parameter spaces while retaining sample efficiency. A solution considered in existing literature is to embed the high-dimensional space in a lower-dimensional manifold, often via a random linear embedding. In this paper, we identify several crucial issues and misconceptions about the use of linear embeddings for BO. We study the properties of linear embeddings from the literature and show that some of the design choices in current approaches adversely impact their performance. We show empirically that properly addressing these issues significantly improves the efficacy of linear embeddings for BO on a range of problems, including learning a gait policy for robot locomotion.

Submitted 22 October, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

arXiv:1910.06403

BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization

Authors: Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, Eytan Bakshy

Abstract: Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization that combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques. BoTorch's modular design facilitates flexible specification and optimization of probabilistic models written in PyTorch, simplifying implementation of new acquisition functions. Our approach is backed by novel theoretical convergence results and made practical by a distinctive algorithmic foundation that leverages fast predictive distributions, hardware acceleration, and deterministic optimization. We also propose a novel "one-shot" formulation of the Knowledge Gradient, enabled by a combination of our theoretical and software contributions. In experiments, we demonstrate the improved sample efficiency of BoTorch relative to other popular libraries.

Submitted 8 December, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Journal ref: Advances in Neural Information Processing Systems 33, 2020

arXiv:1904.01049

Bayesian Optimization for Policy Search via Online-Offline Experimentation

Authors: Benjamin Letham, Eytan Bakshy

Abstract: Online field experiments are the gold-standard way of evaluating changes to real-world interactive machine learning systems. Yet our ability to explore complex, multi-dimensional policy spaces - such as those found in recommendation and ranking problems - is often constrained by the limited number of experiments that can be run simultaneously. To alleviate these constraints, we augment online experiments with an offline simulator and apply multi-task Bayesian optimization to tune live machine learning systems. We describe practical issues that arise in these types of applications, including biases that arise from using a simulator and assumptions for the multi-task kernel. We measure empirical learning curves which show substantial gains from including data from biased offline experiments, and show how these learning curves are consistent with theoretical results for multi-task Gaussian process generalization. We find that improved kernel inference is a significant driver of multi-task generalization. Finally, we show several examples of Bayesian optimization efficiently tuning a live machine learning system by combining offline and online experiments.

Submitted 29 April, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

arXiv:1802.02219

Practical Transfer Learning for Bayesian Optimization

Authors: Matthias Feurer, Benjamin Letham, Frank Hutter, Eytan Bakshy

Abstract: When hyperparameter optimization of a machine learning algorithm is repeated for multiple datasets it is possible to transfer knowledge to an optimization run on a new dataset. We develop a new hyperparameter-free ensemble model for Bayesian optimization that is a generalization of two existing transfer learning extensions to Bayesian optimization and establish a worst-case bound compared to vanilla Bayesian optimization. Using a large collection of hyperparameter optimization benchmark problems, we demonstrate that our contributions substantially reduce optimization time compared to standard Gaussian process-based Bayesian optimization and improve over the current state-of-the-art for transfer hyperparameter optimization.

Submitted 24 October, 2022; v1 submitted 6 February, 2018; originally announced February 2018.

Comments: This version fixes a minor error in the equation in Section 3.2 of V3

arXiv:1706.07094

Constrained Bayesian Optimization with Noisy Experiments

Authors: Benjamin Letham, Brian Karrer, Guilherme Ottoni, Eytan Bakshy

Abstract: Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting its applicability to many randomized experiments. We derive an expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Simulations with synthetic functions show that optimization performance on noisy, constrained problems outperforms existing methods. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system, and optimizing server compiler flags.

Submitted 26 June, 2018; v1 submitted 21 June, 2017; originally announced June 2017.

arXiv:1511.03395

doi 10.1063/1.4953795

Prediction uncertainty and optimal experimental design for learning dynamical systems

Authors: Benjamin Letham, Portia A. Letham, Cynthia Rudin, Edward P. Browne

Abstract: Dynamical systems are frequently used to model biological systems. When these models are fit to data it is necessary to ascertain the uncertainty in the model fit. Here we present prediction deviation, a new metric of uncertainty that determines the extent to which observed data have constrained the model's predictions. This is accomplished by solving an optimization problem that searches for a pair of models that each provide a good fit for the observed data, yet have maximally different predictions. We develop a method for estimating a priori the impact that additional experiments would have on the prediction deviation, allowing the experimenter to design a set of experiments that would most reduce uncertainty. We use prediction deviation to assess uncertainty in a model of interferon-alpha inhibition of viral infection, and to select a sequence of experiments that reduces this uncertainty. Finally we prove a theoretical result which shows that prediction deviation provides bounds on the trajectories of the underlying true model. These results show that prediction deviation is a meaningful metric of uncertainty that can be used for optimal experimental design.

Submitted 6 June, 2017; v1 submitted 11 November, 2015; originally announced November 2015.

Journal ref: Chaos 26, 063110 (2016)

arXiv:1511.01644

doi 10.1214/15-AOAS848

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

Authors: Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan

Abstract: We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS$_2$ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more accurate.

Submitted 5 November, 2015; originally announced November 2015.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS848 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS848

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1350-1371

arXiv:1502.04243

Bayesian Inference of Arrival Rate and Substitution Behavior from Sales Transaction Data with Stockouts

Authors: Benjamin Letham, Lydia M. Letham, Cynthia Rudin

Abstract: When an item goes out of stock, sales transaction data no longer reflect the original customer demand, since some customers leave with no purchase while others substitute alternative products for the one that was out of stock. Here we develop a Bayesian hierarchical model for inferring the underlying customer arrival rate and choice model from sales transaction data and the corresponding stock levels. The model uses a nonhomogeneous Poisson process to allow the arrival rate to vary throughout the day, and allows for a variety of choice models. Model parameters are inferred using a stochastic gradient MCMC algorithm that can scale to large transaction databases. We fit the model to data from a local bakery and show that it is able to make accurate out-of-sample predictions, and to provide actionable insight into lost cookie sales.

Submitted 13 January, 2016; v1 submitted 14 February, 2015; originally announced February 2015.

Showing 1–11 of 11 results for author: Letham, B