Search | arXiv e-print repository

arXiv:2406.20062 [pdf, other]

Cost-aware Bayesian optimization via the Pandora's Box Gittins index

Authors: Qian Xie, Raul Astudillo, Peter Frazier, Ziv Scully, Alexander Terenin

Abstract: Bayesian optimization is a technique for efficiently optimizing unknown functions in a black-box manner. To handle practical settings where gathering data requires use of finite resources, it is desirable to explicitly incorporate function evaluation costs into Bayesian optimization policies. To understand how to do so, we develop a previously-unexplored connection between cost-aware Bayesian opti… ▽ More Bayesian optimization is a technique for efficiently optimizing unknown functions in a black-box manner. To handle practical settings where gathering data requires use of finite resources, it is desirable to explicitly incorporate function evaluation costs into Bayesian optimization policies. To understand how to do so, we develop a previously-unexplored connection between cost-aware Bayesian optimization and the Pandora's Box problem, a decision problem from economics. The Pandora's Box problem admits a Bayesian-optimal solution based on an expression called the Gittins index, which can be reinterpreted as an acquisition function. We study the use of this acquisition function for cost-aware Bayesian optimization, and demonstrate empirically that it performs well, particularly in medium-high dimensions. We further show that this performance carries over to classical Bayesian optimization without explicit evaluation costs. Our work constitutes a first step towards integrating techniques from Gittins index theory into Bayesian optimization. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.07866 [pdf, other]

Asymptotically Optimal Regret for Black-Box Predict-then-Optimize

Authors: Samuel Tan, Peter I. Frazier

Abstract: We consider the predict-then-optimize paradigm for decision-making in which a practitioner (1) trains a supervised learning model on historical data of decisions, contexts, and rewards, and then (2) uses the resulting model to make future binary decisions for new contexts by finding the decision that maximizes the model's predicted reward. This approach is common in industry. Past analysis assumes… ▽ More We consider the predict-then-optimize paradigm for decision-making in which a practitioner (1) trains a supervised learning model on historical data of decisions, contexts, and rewards, and then (2) uses the resulting model to make future binary decisions for new contexts by finding the decision that maximizes the model's predicted reward. This approach is common in industry. Past analysis assumes that rewards are observed for all actions for all historical contexts, which is possible only in problems with special structure. Motivated by problems from ads targeting and recommender systems, we study new black-box predict-then-optimize problems that lack this special structure and where we only observe the reward from the action taken. We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training compared to classical accuracy-based metrics like mean-squared error. This loss function targets the regret achieved when taking a suboptimal decision; because the regret is generally not differentiable, we propose a differentiable "soft" regret term that allows the use of neural networks and other flexible machine learning models dependent on gradient-based training. In the particular case of paired data, we show theoretically that optimizing our loss function yields asymptotically optimal regret within the class of supervised learning models. We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare compared to benchmark methods from contextual bandits and conditional average treatment effect estimation. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 2 figures, 3 tables

arXiv:2402.01845 [pdf, other]

Multi-Armed Bandits with Interference

Authors: Su Jia, Peter Frazier, Nathan Kallus

Abstract: Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy. The cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of {\em Multi-armed Bandits with Interference} (MABI), where the learner assig… ▽ More Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy. The cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of {\em Multi-armed Bandits with Interference} (MABI), where the learner assigns an arm to each of $N$ experimental units over a time horizon of $T$ rounds. The reward of each unit in each round depends on the treatments of {\em all} units, where the influence of a unit decays in the spatial distance between units. Furthermore, we employ a general setup wherein the reward functions are chosen by an adversary and may vary arbitrarily across rounds and units. We first show that switchback policies achieve an optimal {\em expected} regret $\tilde O(\sqrt T)$ against the best fixed-arm policy. Nonetheless, the regret (as a random variable) for any switchback policy suffers a high variance, as it does not account for $N$. We propose a cluster randomization policy whose regret (i) is optimal in {\em expectation} and (ii) admits a high probability bound that vanishes in $N$. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2311.02146 [pdf, other]

Bayesian Optimization of Function Networks with Partial Evaluations

Authors: Poompol Buathong, Jiayue Wan, Raul Astudillo, Samuel Daulton, Maximilian Balandat, Peter I. Frazier

Abstract: Bayesian optimization is a powerful framework for optimizing functions that are expensive or time-consuming to evaluate. Recent work has considered Bayesian optimization of function networks (BOFN), where the objective function is given by a network of functions, each taking as input the output of previous nodes in the network as well as additional parameters. Leveraging this network structure has… ▽ More Bayesian optimization is a powerful framework for optimizing functions that are expensive or time-consuming to evaluate. Recent work has considered Bayesian optimization of function networks (BOFN), where the objective function is given by a network of functions, each taking as input the output of previous nodes in the network as well as additional parameters. Leveraging this network structure has been shown to yield significant performance improvements. Existing BOFN algorithms for general-purpose networks evaluate the full network at each iteration. However, many real-world applications allow for evaluating nodes individually. To exploit this, we propose a novel knowledge gradient acquisition function that chooses which node and corresponding inputs to evaluate in a cost-aware manner, thereby reducing query costs by evaluating only on a part of the network at each step. We provide an efficient approach to optimizing our acquisition function and show that it outperforms existing BOFN methods and other benchmarks across several synthetic and real-world problems. Our acquisition function is the first to enable cost-aware optimization of a broad class of function networks. △ Less

Submitted 12 June, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 34 pages, 15 figures, 3 tables

arXiv:2310.05649 [pdf, other]

Context, Composition, Automation, and Communication -- The C2AC Roadmap for Modeling and Simulation

Authors: Adelinde Uhrmacher, Peter Frazier, Reiner Hähnle, Franziska Klügl, Fabian Lorig, Bertram Ludäscher, Laura Nenzi, Cristina Ruiz-Martin, Bernhard Rumpe, Claudia Szabo, Gabriel A. Wainer, Pia Wilsdorf

Abstract: Simulation has become, in many application areas, a sine-qua-non. Most recently, COVID-19 has underlined the importance of simulation studies and limitations in current practices and methods. We identify four goals of methodological work for addressing these limitations. The first is to provide better support for capturing, representing, and evaluating the context of simulation studies, including… ▽ More Simulation has become, in many application areas, a sine-qua-non. Most recently, COVID-19 has underlined the importance of simulation studies and limitations in current practices and methods. We identify four goals of methodological work for addressing these limitations. The first is to provide better support for capturing, representing, and evaluating the context of simulation studies, including research questions, assumptions, requirements, and activities contributing to a simulation study. In addition, the composition of simulation models and other simulation studies' products must be supported beyond syntactical coherence, including aspects of semantics and purpose, enabling their effective reuse. A higher degree of automating simulation studies will contribute to more systematic, standardized simulation studies and their efficiency. Finally, it is essential to invest increased effort into effectively communicating results and the processes involved in simulation studies to enable their use in research and decision-making. These goals are not pursued independently of each other, but they will benefit from and sometimes even rely on advances in other subfields. In the present paper, we explore the basis and interdependencies evident in current research and practice and delineate future research directions based on these considerations. △ Less

Submitted 27 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

ACM Class: I.6

arXiv:2303.15746 [pdf, other]

qEUBO: A Decision-Theoretic Acquisition Function for Preferential Bayesian Optimization

Authors: Raul Astudillo, Zhiyuan Jerry Lin, Eytan Bakshy, Peter I. Frazier

Abstract: Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradien… ▽ More Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradient acquisition function. We also show that qEUBO enjoys an additive constant approximation guarantee to the one-step Bayes-optimal policy when the decision maker's responses are corrupted by noise. We provide an extensive evaluation of qEUBO and demonstrate that it outperforms the state-of-the-art acquisition functions for PBO across many settings. Finally, we show that, under sufficient regularity conditions, qEUBO's Bayesian simple regret converges to zero at a rate $o(1/n)$ as the number of queries, $n$, goes to infinity. In contrast, we show that simple regret under qEI, a popular acquisition function for standard BO often used for PBO, can fail to converge to zero. Enjoying superior performance, simple computation, and a grounded decision-theoretic justification, qEUBO is a promising acquisition function for PBO. △ Less

Submitted 28 March, 2023; originally announced March 2023.

Comments: In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023

arXiv:2301.12366 [pdf, other]

Smooth Non-Stationary Bandits

Authors: Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier

Abstract: In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde Θ(T^{2/3})$ regret. However, in practice environments are often changing {… ▽ More In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde Θ(T^{2/3})$ regret. However, in practice environments are often changing {\bf smoothly}, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandits problem where we assume that an arm's mean reward is a $β$-Hölder function over (normalized) time, meaning it is $(β-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $β=2$. We complement this result by an $\Omg(T^{(β+1)/(2β+1)})$ lower bound for any integer $β\ge 1$, which matches our upper bound for $β=2$. △ Less

Submitted 7 June, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

Comments: Accepted by ICML 2023

arXiv:2205.09679 [pdf, ps, other]

Dynamic Pricing Provides Robust Equilibria in Stochastic Ridesharing Networks

Authors: J. Massey Cashore, Peter I. Frazier, Eva Tardos

Abstract: Ridesharing markets are complex: drivers are strategic, rider demand and driver availability are stochastic, and complex city-scale phenomena like weather induce large scale correlation across space and time. At the same time, past work has focused on a subset of these challenges. We propose a model of ridesharing networks with strategic drivers, spatiotemporal dynamics, and stochasticity. Support… ▽ More Ridesharing markets are complex: drivers are strategic, rider demand and driver availability are stochastic, and complex city-scale phenomena like weather induce large scale correlation across space and time. At the same time, past work has focused on a subset of these challenges. We propose a model of ridesharing networks with strategic drivers, spatiotemporal dynamics, and stochasticity. Supporting both computational tractability and better modeling flexibility than classical fluid limits, we use a two-level stochastic model that allows correlated shocks caused by weather or large public events. Using this model, we propose a novel pricing mechanism: stochastic spatiotemporal pricing (SSP). We show that the SSP mechanism is asymptotically incentive-compatible and that all (approximate) equilibria of the resulting game are asymptotically welfare-maximizing when the market is large enough. The SSP mechanism iteratively recomputes prices based on realized demand and supply, and in this sense prices dynamically. We show that this is critical: while a static variant of the SSP mechanism (whose prices vary with the market-level stochastic scenario but not individual rider and driver decisions) has a sequence of asymptotically welfare-optimal approximate equilibria, we demonstrate that it also has other equilibria producing extremely low social welfare. Thus, we argue that dynamic pricing is important for ensuring robustness in stochastic ride-sharing networks. △ Less

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2203.15853 [pdf, other]

Near-optimality for infinite-horizon restless bandits with many arms

Authors: Xiangyu Zhang, Peter I. Frazier

Abstract: Restless bandits are an important class of problems with applications in recommender systems, active learning, revenue management and other areas. We consider infinite-horizon discounted restless bandits with many arms where a fixed proportion of arms may be pulled in each period and where arms share a finite state space. Although an average-case-optimal policy can be computed via stochastic dynam… ▽ More Restless bandits are an important class of problems with applications in recommender systems, active learning, revenue management and other areas. We consider infinite-horizon discounted restless bandits with many arms where a fixed proportion of arms may be pulled in each period and where arms share a finite state space. Although an average-case-optimal policy can be computed via stochastic dynamic programming, the computation required grows exponentially with the number of arms $N$. Thus, it is important to find scalable policies that can be computed efficiently for large $N$ and that are near optimal in this regime, in the sense that the optimality gap (i.e. the loss of expected performance against an optimal policy) per arm vanishes for large $N$. However, the most popular approach, the Whittle index, requires a hard-to-verify indexability condition to be well-defined and another hard-to-verify condition to guarantee a $o(N)$ optimality gap. We present a method resolving these difficulties. By replacing a global Lagrange multiplier used by the Whittle index with a sequence of Lagrangian multipliers, one per time period up to a finite truncation point, we derive a class of policies, called fluid-balance policies, that have a $O(\sqrt{N})$ optimality gap. Unlike the Whittle index, fluid-balance policies do not require indexability to be well-defined and their $O(\sqrt{N})$ optimality gap bound holds universally without sufficient conditions. We also demonstrate empirically that fluid-balance policies provide state-of-the-art performance on specific problems. △ Less

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.11382 [pdf, other]

Preference Exploration for Efficient Bayesian Optimization with Multiple Outcomes

Authors: Zhiyuan Jerry Lin, Raul Astudillo, Peter I. Frazier, Eytan Bakshy

Abstract: We consider Bayesian optimization of expensive-to-evaluate experiments that generate vector-valued outcomes over which a decision-maker (DM) has preferences. These preferences are encoded by a utility function that is not known in closed form but can be estimated by asking the DM to express preferences over pairs of outcome vectors. To address this problem, we develop Bayesian optimization with pr… ▽ More We consider Bayesian optimization of expensive-to-evaluate experiments that generate vector-valued outcomes over which a decision-maker (DM) has preferences. These preferences are encoded by a utility function that is not known in closed form but can be estimated by asking the DM to express preferences over pairs of outcome vectors. To address this problem, we develop Bayesian optimization with preference exploration, a novel framework that alternates between interactive real-time preference learning with the DM via pairwise comparisons between outcomes, and Bayesian optimization with a learned compositional model of DM utility and outcomes. Within this framework, we propose preference exploration strategies specifically designed for this task, and demonstrate their performance via extensive simulation studies. △ Less

Submitted 21 March, 2022; originally announced March 2022.

Journal ref: AISTATS 2022

arXiv:2201.00272 [pdf, other]

Thinking inside the box: A tutorial on grey-box Bayesian optimization

Authors: Raul Astudillo, Peter I. Frazier

Abstract: Bayesian optimization (BO) is a framework for global optimization of expensive-to-evaluate objective functions. Classical BO methods assume that the objective function is a black box. However, internal information about objective function computation is often available. For example, when optimizing a manufacturing line's throughput with simulation, we observe the number of parts waiting at each wo… ▽ More Bayesian optimization (BO) is a framework for global optimization of expensive-to-evaluate objective functions. Classical BO methods assume that the objective function is a black box. However, internal information about objective function computation is often available. For example, when optimizing a manufacturing line's throughput with simulation, we observe the number of parts waiting at each workstation, in addition to the overall throughput. Recent BO methods leverage such internal information to dramatically improve performance. We call these "grey-box" BO methods because they treat objective computation as partially observable and even modifiable, blending the black-box approach with so-called "white-box" first-principles knowledge of objective function computation. This tutorial describes these methods, focusing on BO of composite objective functions, where one can observe and selectively evaluate individual constituents that feed into the overall objective; and multi-fidelity BO, where one can evaluate cheaper approximations of the objective function by varying parameters of the evaluation oracle. △ Less

Submitted 1 January, 2022; originally announced January 2022.

Comments: Published as an advanced tutorial in the proceedings of the 2021 Winter Simulation Conference

arXiv:2112.15311 [pdf, other]

Bayesian Optimization of Function Networks

Authors: Raul Astudillo, Peter I. Frazier

Abstract: We consider Bayesian optimization of the output of a network of functions, where each function takes as input the output of its parent nodes, and where the network takes significant time to evaluate. Such problems arise, for example, in reinforcement learning, engineering design, and manufacturing. While the standard Bayesian optimization approach observes only the final output, our approach deliv… ▽ More We consider Bayesian optimization of the output of a network of functions, where each function takes as input the output of its parent nodes, and where the network takes significant time to evaluate. Such problems arise, for example, in reinforcement learning, engineering design, and manufacturing. While the standard Bayesian optimization approach observes only the final output, our approach delivers greater query efficiency by leveraging information that the former ignores: intermediate output within the network. This is achieved by modeling the nodes of the network using Gaussian processes and choosing the points to evaluate using, as our acquisition function, the expected improvement computed with respect to the implied posterior on the objective. Although the non-Gaussian nature of this posterior prevents computing our acquisition function in closed form, we show that it can be efficiently maximized via sample average approximation. In addition, we prove that our method is asymptotically consistent, meaning that it finds a globally optimal solution as the number of evaluations grows to infinity, thus generalizing previously known convergence results for the expected improvement. Notably, this holds even though our method might not evaluate the domain densely, instead leveraging problem structure to leave regions unexplored. Finally, we show that our approach dramatically outperforms standard Bayesian optimization methods in several synthetic and real-world problems. △ Less

Submitted 31 December, 2021; originally announced December 2021.

Comments: In Advances in Neural Information Processing Systems, 2021

arXiv:2112.02833 [pdf, other]

Two-step Lookahead Bayesian Optimization with Inequality Constraints

Authors: Yunxiang Zhang, Xiangyu Zhang, Peter I. Frazier

Abstract: Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost. These advances have been largely limited, however, to unconstrained optimization. For constrained optimization, the few existing non-myopic BO methods require heavy computation. For i… ▽ More Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost. These advances have been largely limited, however, to unconstrained optimization. For constrained optimization, the few existing non-myopic BO methods require heavy computation. For instance, one existing non-myopic constrained BO method [Lam and Willcox, 2017] relies on computationally expensive unreliable brute-force derivative-free optimization of a Monte Carlo rollout acquisition function. Methods that use the reparameterization trick for more efficient derivative-based optimization of non-myopic acquisition functions in the unconstrained setting, like sample average approximation and infinitesimal perturbation analysis, do not extend: constraints introduce discontinuities in the sampled acquisition function surface that hinder its optimization. Moreover, we argue here that being non-myopic is even more important in constrained problems because fear of violating constraints pushes myopic methods away from sampling the boundary between feasible and infeasible regions, slowing the discovery of optimal solutions with tight constraints. In this paper, we propose a computationally efficient two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings. To enable fast acquisition function optimization, we develop a novel likelihood-ratio-based unbiased estimator of the gradient of the two-step optimal acquisition function that does not use the reparameterization trick. In numerical experiments, 2-OPT-C typically improves query efficiency by 2x or more over previous methods, and in some cases by 10x or more. △ Less

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2111.06537 [pdf, other]

Multi-Step Budgeted Bayesian Optimization with Unknown Evaluation Costs

Authors: Raul Astudillo, Daniel R. Jiang, Maximilian Balandat, Eytan Bakshy, Peter I. Frazier

Abstract: Bayesian optimization (BO) is a sample-efficient approach to optimizing costly-to-evaluate black-box functions. Most BO methods ignore how evaluation costs may vary over the optimization domain. However, these costs can be highly heterogeneous and are often unknown in advance. This occurs in many practical settings, such as hyperparameter tuning of machine learning algorithms or physics-based simu… ▽ More Bayesian optimization (BO) is a sample-efficient approach to optimizing costly-to-evaluate black-box functions. Most BO methods ignore how evaluation costs may vary over the optimization domain. However, these costs can be highly heterogeneous and are often unknown in advance. This occurs in many practical settings, such as hyperparameter tuning of machine learning algorithms or physics-based simulation optimization. Moreover, those few existing methods that acknowledge cost heterogeneity do not naturally accommodate a budget constraint on the total evaluation cost. This combination of unknown costs and a budget constraint introduces a new dimension to the exploration-exploitation trade-off, where learning about the cost incurs the cost itself. Existing methods do not reason about the various trade-offs of this problem in a principled way, leading often to poor performance. We formalize this claim by proving that the expected improvement and the expected improvement per unit of cost, arguably the two most widely used acquisition functions in practice, can be arbitrarily inferior with respect to the optimal non-myopic policy. To overcome the shortcomings of existing approaches, we propose the budgeted multi-step expected improvement, a non-myopic acquisition function that generalizes classical expected improvement to the setting of heterogeneous and unknown evaluation costs. Finally, we show that our acquisition function outperforms existing methods in a variety of synthetic and real problems. △ Less

Submitted 11 November, 2021; originally announced November 2021.

Comments: In Advances in Neural Information Processing Systems, 2021

arXiv:2107.11911 [pdf, other]

Restless Bandits with Many Arms: Beating the Central Limit Theorem

Authors: Xiangyu Zhang, Peter I. Frazier

Abstract: We consider finite-horizon restless bandits with multiple pulls per period, which play an important role in recommender systems, active learning, revenue management, and many other areas. While an optimal policy can be computed, in principle, using dynamic programming, the computation required scales exponentially in the number of arms $N$. Thus, there is substantial value in understanding the per… ▽ More We consider finite-horizon restless bandits with multiple pulls per period, which play an important role in recommender systems, active learning, revenue management, and many other areas. While an optimal policy can be computed, in principle, using dynamic programming, the computation required scales exponentially in the number of arms $N$. Thus, there is substantial value in understanding the performance of index policies and other policies that can be computed efficiently for large $N$. We study the growth of the optimality gap, i.e., the loss in expected performance compared to an optimal policy, for such policies in a classical asymptotic regime proposed by Whittle in which $N$ grows while holding constant the fraction of arms that can be pulled per period. Intuition from the Central Limit Theorem and the tightest previous theoretical bounds suggest that this optimality gap should grow like $O(\sqrt{N})$. Surprisingly, we show that it is possible to outperform this bound. We characterize a non-degeneracy condition and a wide class of novel practically-computable policies, called fluid-priority policies, in which the optimality gap is $O(1)$. These include most widely-used index policies. When this non-degeneracy condition does not hold, we show that fluid-priority policies nevertheless have an optimality gap that is $O(\sqrt{N})$, significantly generalizing the class of policies for which convergence rates are known. We demonstrate that fluid-priority policies offer state-of-the-art performance on a collection of restless bandit problems in numerical experiments. △ Less

Submitted 25 July, 2021; originally announced July 2021.

arXiv:2007.05554 [pdf, other]

Bayesian Optimization of Risk Measures

Authors: Sait Cakmak, Raul Astudillo, Peter Frazier, Enlu Zhou

Abstract: We consider Bayesian optimization of objective functions of the form $ρ[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function and $ρ$ denotes either the VaR or CVaR risk measure, computed with respect to the randomness induced by the environmental random variable $W$. Such problems arise in decision making under uncertainty, such as in portfolio optimization and robust systems desig… ▽ More We consider Bayesian optimization of objective functions of the form $ρ[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function and $ρ$ denotes either the VaR or CVaR risk measure, computed with respect to the randomness induced by the environmental random variable $W$. Such problems arise in decision making under uncertainty, such as in portfolio optimization and robust systems design. We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency. Instead of modeling the objective function directly as is typical in Bayesian optimization, these algorithms model $F$ as a Gaussian process, and use the implied posterior on the objective function to decide which points to evaluate. We demonstrate the effectiveness of our approach in a variety of numerical experiments. △ Less

Submitted 3 November, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: Main paper: 15 pages with 2 figures. Supplement: 14 pages with 3 figures. To appear in NeurIPS 2020

arXiv:2006.08863 [pdf, other]

Matching Queues, Flexibility and Incentives

Authors: Francisco Castro, Peter Frazier, Hongyao Ma, Hamid Nazerzadeh, Chiwei Yan

Abstract: Problem definition: Agents in online marketplaces (such as ridesharing and freelancing platforms) are often strategic, and heterogeneous in their compatibility with different types of jobs: fully flexible agents can fulfill any job, whereas specialized agents can only complete specific subsets of jobs. Convention wisdom suggests reserving agents that are more flexible whenever possible, however th… ▽ More Problem definition: Agents in online marketplaces (such as ridesharing and freelancing platforms) are often strategic, and heterogeneous in their compatibility with different types of jobs: fully flexible agents can fulfill any job, whereas specialized agents can only complete specific subsets of jobs. Convention wisdom suggests reserving agents that are more flexible whenever possible, however this may incentivize agents to pretend to be more specialized, leading to loss in matches. We focus on designing a practical matching policy that performs well in a strategic environment. Methodology/results: We model the allocation of jobs to agents as a matching queue, and analyze the equilibrium performance of various matching policies when agents are strategic and report their own types. We show that reserving flexibility naively can backfire, to the extent that the equilibrium throughput can be arbitrarily bad compared to a policy which simply dispatches jobs to agents at random. To balance matching efficiency with agents' strategic considerations, we propose a new policy dubbed flexibility reservation with fallback and show that it enjoys robust performance. Managerial implications: Our work highlights the importance of considering agent strategic behavior when designing matching policies in online platforms and service systems. The robust performance guarantee, along with the parameter-free nature of our proposed policy makes it easy to implement in practice. We illustrate how this policy is implemented in the driver destination product of major ridesharing platforms. △ Less

Submitted 14 January, 2024; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:1911.05934 [pdf, ps, other]

Multi-Attribute Bayesian Optimization With Interactive Preference Learning

Authors: Raul Astudillo, Peter I. Frazier

Abstract: We consider black-box global optimization of time-consuming-to-evaluate functions on behalf of a decision-maker (DM) whose preferences must be learned. Each feasible design is associated with a time-consuming-to-evaluate vector of attributes and each vector of attributes is assigned a utility by the DM's utility function, which may be learned approximately using preferences expressed over pairs of… ▽ More We consider black-box global optimization of time-consuming-to-evaluate functions on behalf of a decision-maker (DM) whose preferences must be learned. Each feasible design is associated with a time-consuming-to-evaluate vector of attributes and each vector of attributes is assigned a utility by the DM's utility function, which may be learned approximately using preferences expressed over pairs of attribute vectors. Past work has used a point estimate of this utility function as if it were error-free within single-objective optimization. However, utility estimation errors may yield a poor suggested design. Furthermore, this approach produces a single suggested "best" design, whereas DMs often prefer to choose from a menu. We propose a novel multi-attribute Bayesian optimization with preference learning approach. Our approach acknowledges the uncertainty in preference estimation and implicitly chooses designs to evaluate that are good not just for a single estimated utility function but a range of likely ones. The outcome of our approach is a menu of designs and evaluated attributes from which the DM makes a final selection. We demonstrate the value and flexibility of our approach in a variety of experiments. △ Less

Submitted 3 March, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

Comments: In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

arXiv:1909.12723 [pdf, other]

Information Design in Spatial Resource Competition

Authors: Pu Yang, Krishnamurthy Iyer, Peter Frazier

Abstract: We consider the information design problem in spatial resource competition settings. Agents gather at a location deciding whether to move to another location for possibly higher level of resources, and the utility each agent gets by moving to the other location decreases as more agents move there. The agents do not observe the resource level at the other location while a principal does and the pri… ▽ More We consider the information design problem in spatial resource competition settings. Agents gather at a location deciding whether to move to another location for possibly higher level of resources, and the utility each agent gets by moving to the other location decreases as more agents move there. The agents do not observe the resource level at the other location while a principal does and the principal would like to carefully release this information to attract a proper number of agents to move. We adopt the Bayesian persuasion framework and analyze the principal's optimal signaling mechanism design problem. We study both private and public signaling mechanisms. For private signaling, we show the optimal mechanism can be computed in polynomial time with respect to the number of agents. Obtaining the optimal private mechanism involves two steps: first, solve a linear program to get the marginal probability each agent should be recommended to move; second, sample the moving agents satisfying the marginal probabilities with a sequential sampling procedure. For public signaling, we show the sender preferred equilibrium has a simple threshold structure and the optimal public mechanism with respect to the sender preferred equilibrium can be computed in polynomial time. We support our analytical results with numerical computations that show the optimal private and public signaling mechanisms achieve substantially higher social welfare compared with no information or full information benchmarks in many settings. △ Less

Submitted 27 September, 2019; originally announced September 2019.

Comments: 25 pages, 2 figures

arXiv:1906.01537 [pdf, ps, other]

Bayesian Optimization of Composite Functions

Authors: Raul Astudillo, Peter I. Frazier

Abstract: We consider optimization of composite objective functions, i.e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function. While these problems can be solved with standard Bayesian optimization, we propose a novel approach that exploits the composite structure of the objective… ▽ More We consider optimization of composite objective functions, i.e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function. While these problems can be solved with standard Bayesian optimization, we propose a novel approach that exploits the composite structure of the objective function to substantially improve sampling efficiency. Our approach models $h$ using a multi-output Gaussian process and chooses where to sample using the expected improvement evaluated on the implied non-Gaussian posterior on $f$, which we call expected improvement for composite functions (\ei). Although \ei\ cannot be computed in closed form, we provide a novel stochastic gradient estimator that allows its efficient maximization. We also show that our approach is asymptotically consistent, i.e., that it recovers a globally optimal solution as sampling effort grows to infinity, generalizing previous convergence results for classical expected improvement. Numerical experiments show that our approach dramatically outperforms standard Bayesian optimization benchmarks, reducing simple regret by several orders of magnitude. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: In Proceedings of the 36th International Conference on Machine Learning, PMLR 97:354-363, 2019

Journal ref: In Proceedings of the 36th International Conference on Machine Learning, PMLR 97:354-363, 2019

arXiv:1903.04703 [pdf, other]

Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning

Authors: Jian Wu, Saul Toscano-Palmerin, Peter I. Frazier, Andrew Gordon Wilson

Abstract: Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives --- for example, validation error for a network traine… ▽ More Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives --- for example, validation error for a network trained using a subset of the training points or fewer iterations than required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing hyperparameters for iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge-gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations --- values of the objective at a sequence of fidelities, available when varying fidelity using training iterations. We provide a provably convergent method for optimizing our acquisition function and show it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning. △ Less

Submitted 11 March, 2019; originally announced March 2019.

arXiv:1807.02811 [pdf, other]

A Tutorial on Bayesian Optimization

Authors: Peter I. Frazier

Abstract: Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique… ▽ More Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications. △ Less

Submitted 8 July, 2018; originally announced July 2018.

arXiv:1803.08661 [pdf, other]

Bayesian Optimization with Expensive Integrands

Authors: Saul Toscano-Palmerin, Peter I. Frazier

Abstract: We propose a Bayesian optimization algorithm for objective functions that are sums or integrals of expensive-to-evaluate functions, allowing noisy evaluations. These objective functions arise in multi-task Bayesian optimization for tuning machine learning hyperparameters, optimization via simulation, and sequential design of experiments with random environmental conditions. Our method is average-c… ▽ More We propose a Bayesian optimization algorithm for objective functions that are sums or integrals of expensive-to-evaluate functions, allowing noisy evaluations. These objective functions arise in multi-task Bayesian optimization for tuning machine learning hyperparameters, optimization via simulation, and sequential design of experiments with random environmental conditions. Our method is average-case optimal by construction when a single evaluation of the integrand remains within our evaluation budget. Achieving this one-step optimality requires solving a challenging value of information optimization problem, for which we provide a novel efficient discretization-free computational method. We also provide consistency proofs for our method in both continuum and discrete finite domains for objective functions that are sums. In numerical experiments comparing against previous state-of-the-art methods, including those that also leverage sum or integral structure, our method performs as well or better across a wide range of problems and offers significant improvements when evaluations are noisy or the integrand varies smoothly in the integrated variables. △ Less

Submitted 23 March, 2018; originally announced March 2018.

arXiv:1707.07919 [pdf, other]

Mean Field Equilibria for Resource Competition in Spatial Settings

Authors: Pu Yang, Krishnamurthy Iyer, Peter Frazier

Abstract: We study a model of competition among nomadic agents for time-varying and location-specific resources, arising in crowd-sourced transportation services, online communities, and traditional location-based economic activity. This model comprises a group of agents and a single location endowed with a dynamic stochastic resource process. Periodically, each agent derives a reward determined by the loca… ▽ More We study a model of competition among nomadic agents for time-varying and location-specific resources, arising in crowd-sourced transportation services, online communities, and traditional location-based economic activity. This model comprises a group of agents and a single location endowed with a dynamic stochastic resource process. Periodically, each agent derives a reward determined by the location's resource level and the number of other agents there, and has to decide whether to stay at the location or move. Upon moving, the agent arrives at a different location whose dynamics are independent and identical to the original location. Using the methodology of mean field equilibrium, we study the equilibrium behavior of the agents as a function of the dynamics of the stochastic resource process and the nature of the competition among co-located agents. We show that an equilibrium exists, where each agent decides whether to switch locations based only on their current location's resource level and the number of other agents there. We additionally show that when an agent's payoff is decreasing in the number of other agents at her location, equilibrium strategies obey a simple threshold structure. We show how to exploit this structure to compute equilibria numerically, and use these numerical techniques to study how system structure affects the agents' collective ability to explore their domain to find and effectively utilize resource-rich areas. △ Less

Submitted 16 August, 2018; v1 submitted 25 July, 2017; originally announced July 2017.

Comments: 22 pages(49 including appendices), 3 figures

ACM Class: J.4

arXiv:1707.06541

Discretization-free Knowledge Gradient Methods for Bayesian Optimization

Authors: Jian Wu, Peter I. Frazier

Abstract: This paper studies Bayesian ranking and selection (R&S) problems with correlated prior beliefs and continuous domains, i.e. Bayesian optimization (BO). Knowledge gradient methods [Frazier et al., 2008, 2009] have been widely studied for discrete R&S problems, which sample the one-step Bayes-optimal point. When used over continuous domains, previous work on the knowledge gradient [Scott et al., 201… ▽ More This paper studies Bayesian ranking and selection (R&S) problems with correlated prior beliefs and continuous domains, i.e. Bayesian optimization (BO). Knowledge gradient methods [Frazier et al., 2008, 2009] have been widely studied for discrete R&S problems, which sample the one-step Bayes-optimal point. When used over continuous domains, previous work on the knowledge gradient [Scott et al., 2011, Wu and Frazier, 2016, Wu et al., 2017] often rely on a discretized finite approximation. However, the discretization introduces error and scales poorly as the dimension of domain grows. In this paper, we develop a fast discretization-free knowledge gradient method for Bayesian optimization. Our method is not restricted to the fully sequential setting, but useful in all settings where knowledge gradient can be used over continuous domains. We show how our method can be generalized to handle (i) batch of points suggestion (parallel knowledge gradient); (ii) the setting where derivative information is available in the optimization process (derivative-enabled knowledge gradient). In numerical experiments, we demonstrate that the discretization-free knowledge gradient method finds global optima significantly faster than previous Bayesian optimization algorithms on both synthetic test functions and real-world applications, especially when function evaluations are noisy; and derivative-enabled knowledge gradient can further improve the performances, even outperforming the gradient-based optimizer such as BFGS when derivative information is available. △ Less

Submitted 26 July, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

Comments: This paper, which combines and extends two conference papers (arXiv:1703.04389, arXiv:1606.04414), has been withdrawn by the authors because it was submitted prematurely before proper attribution could be provided

arXiv:1706.04304 [pdf, other]

Dueling Bandits With Weak Regret

Authors: Bangrui Chen, Peter I. Frazier

Abstract: We consider online content recommendation with implicit feedback through pairwise comparisons, formalized as the so-called dueling bandit problem. We study the dueling bandit problem in the Condorcet winner setting, and consider two notions of regret: the more well-studied strong regret, which is 0 only when both arms pulled are the Condorcet winner; and the less well-studied weak regret, which is… ▽ More We consider online content recommendation with implicit feedback through pairwise comparisons, formalized as the so-called dueling bandit problem. We study the dueling bandit problem in the Condorcet winner setting, and consider two notions of regret: the more well-studied strong regret, which is 0 only when both arms pulled are the Condorcet winner; and the less well-studied weak regret, which is 0 if either arm pulled is the Condorcet winner. We propose a new algorithm for this problem, Winner Stays (WS), with variations for each kind of regret: WS for weak regret (WS-W) has expected cumulative weak regret that is $O(N^2)$, and $O(N\log(N))$ if arms have a total order; WS for strong regret (WS-S) has expected cumulative strong regret of $O(N^2 + N \log(T))$, and $O(N\log(N)+N\log(T))$ if arms have a total order. WS-W is the first dueling bandit algorithm with weak regret that is constant in time. WS is simple to compute, even for problems with many arms, and we demonstrate through numerical experiments on simulated and real data that WS has significantly smaller regret than existing algorithms in both the weak- and strong-regret settings. △ Less

Submitted 13 June, 2017; originally announced June 2017.

arXiv:1703.04389 [pdf, other]

Bayesian Optimization with Gradients

Authors: Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier

Abstract: Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performa… ▽ More Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors. △ Less

Submitted 6 February, 2018; v1 submitted 13 March, 2017; originally announced March 2017.

Comments: Advances in Neural Information Processing Systems 30 (NIPS), 2017

Journal ref: Advances in Neural Information Processing Systems 30 (NIPS), 2017

arXiv:1702.07694 [pdf, other]

Bayes-Optimal Entropy Pursuit for Active Choice-Based Preference Learning

Authors: Stephen N. Pallone, Peter I. Frazier, Shane G. Henderson

Abstract: We analyze the problem of learning a single user's preferences in an active learning setting, sequentially and adaptively querying the user over a finite time horizon. Learning is conducted via choice-based queries, where the user selects her preferred option among a small subset of offered alternatives. These queries have been shown to be a robust and efficient way to learn an individual's prefer… ▽ More We analyze the problem of learning a single user's preferences in an active learning setting, sequentially and adaptively querying the user over a finite time horizon. Learning is conducted via choice-based queries, where the user selects her preferred option among a small subset of offered alternatives. These queries have been shown to be a robust and efficient way to learn an individual's preferences. We take a parametric approach and model the user's preferences through a linear classifier, using a Bayesian prior to encode our current knowledge of this classifier. The rate at which we learn depends on the alternatives offered at every time epoch. Under certain noise assumptions, we show that the Bayes-optimal policy for maximally reducing entropy of the posterior distribution of this linear classifier is a greedy policy, and that this policy achieves a linear lower bound when alternatives can be constructed from the continuum. Further, we analyze a different metric called misclassification error, proving that the performance of the optimal policy that minimizes misclassification error is bounded below by a linear function of differential entropy. Lastly, we numerically compare the greedy entropy reduction policy with a knowledge gradient policy under a number of scenarios, examining their performance under both differential entropy and misclassification error. △ Less

Submitted 24 February, 2017; originally announced February 2017.

arXiv:1608.03585 [pdf, other]

Warm Starting Bayesian Optimization

Authors: Matthias Poloczek, Jialei Wang, Peter I. Frazier

Abstract: We develop a framework for warm-starting Bayesian optimization, that reduces the solution time required to solve an optimization problem that is one in a sequence of related problems. This is useful when optimizing the output of a stochastic simulator that fails to provide derivative information, for which Bayesian optimization methods are well-suited. Solving sequences of related optimization pro… ▽ More We develop a framework for warm-starting Bayesian optimization, that reduces the solution time required to solve an optimization problem that is one in a sequence of related problems. This is useful when optimizing the output of a stochastic simulator that fails to provide derivative information, for which Bayesian optimization methods are well-suited. Solving sequences of related optimization problems arises when making several business decisions using one optimization model and input data collected over different time periods or markets. While many gradient-based methods can be warm started by initiating optimization at the solution to the previous problem, this warm start approach does not apply to Bayesian optimization methods, which carry a full metamodel of the objective function from iteration to iteration. Our approach builds a joint statistical model of the entire collection of related objective functions, and uses a value of information calculation to recommend points to evaluate. △ Less

Submitted 11 August, 2016; originally announced August 2016.

Comments: To Appear in the Proc. of the 2016 Winter Simulation Conference

arXiv:1607.03195 [pdf, other]

Multi-Step Bayesian Optimization for One-Dimensional Feasibility Determination

Authors: J. Massey Cashore, Lemuel Kumarga, Peter I. Frazier

Abstract: Bayesian optimization methods allocate limited sampling budgets to maximize expensive-to-evaluate functions. One-step-lookahead policies are often used, but computing optimal multi-step-lookahead policies remains a challenge. We consider a specialized Bayesian optimization problem: finding the superlevel set of an expensive one-dimensional function, with a Markov process prior. We compute the Baye… ▽ More Bayesian optimization methods allocate limited sampling budgets to maximize expensive-to-evaluate functions. One-step-lookahead policies are often used, but computing optimal multi-step-lookahead policies remains a challenge. We consider a specialized Bayesian optimization problem: finding the superlevel set of an expensive one-dimensional function, with a Markov process prior. We compute the Bayes-optimal sampling policy efficiently, and characterize the suboptimality of one-step lookahead. Our numerical experiments demonstrate that the one-step lookahead policy is close to optimal in this problem, performing within 98% of optimal in the experimental settings considered. △ Less

Submitted 11 July, 2016; originally announced July 2016.

arXiv:1606.04414 [pdf, other]

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Authors: Jian Wu, Peter I. Frazier

Abstract: In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm --- the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optim… ▽ More In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm --- the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy. △ Less

Submitted 22 April, 2018; v1 submitted 14 June, 2016; originally announced June 2016.

Comments: Minor edits and typo fixes. Please cite "J. Wu and P. Frazier. The parallel knowledge gradient method for batch bayesian optimization. In Advances In Neural Information Processing Systems, pp. 3126-3134. 2016"

arXiv:1605.09088 [pdf, other]

The Bayesian Linear Information Filtering Problem

Authors: Bangrui Chen, Peter I. Frazier

Abstract: We present a Bayesian sequential decision-making formulation of the information filtering problem, in which an algorithm presents items (news articles, scientific papers, tweets) arriving in a stream, and learns relevance from user feedback on presented items. We model user preferences using a Bayesian linear model, similar in spirit to a Bayesian linear bandit. We compute a computational upper bo… ▽ More We present a Bayesian sequential decision-making formulation of the information filtering problem, in which an algorithm presents items (news articles, scientific papers, tweets) arriving in a stream, and learns relevance from user feedback on presented items. We model user preferences using a Bayesian linear model, similar in spirit to a Bayesian linear bandit. We compute a computational upper bound on the value of the optimal policy, which allows computing an optimality gap for implementable policies. We then use this analysis as motivation in introducing a pair of new Decompose-Then-Decide (DTD) heuristic policies, DTD-Dynamic-Programming (DTD-DP) and DTD-Upper-Confidence-Bound (DTD-UCB). We compare DTD-DP and DTD-UCB against several benchmarks on real and simulated data, demonstrating significant improvement, and show that the achieved performance is close to the upper bound. △ Less

Submitted 22 October, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

arXiv:1605.08838 [pdf, other]

Dueling Bandits with Dependent Arms

Authors: Bangrui Chen, Peter I. Frazier

Abstract: We study dueling bandits with weak utility-based regret when preferences over arms have a total order and carry observable feature vectors. The order is assumed to be determined by these feature vectors, an unknown preference vector, and a known utility function. This structure introduces dependence between preferences for pairs of arms, and allows learning about the preference over one pair of ar… ▽ More We study dueling bandits with weak utility-based regret when preferences over arms have a total order and carry observable feature vectors. The order is assumed to be determined by these feature vectors, an unknown preference vector, and a known utility function. This structure introduces dependence between preferences for pairs of arms, and allows learning about the preference over one pair of arms from the preference over another pair of arms. We propose an algorithm for this setting called Comparing The Best (CTB), which we show has constant expected cumulative weak utility-based regret. We provide a Bayesian interpretation for CTB, an implementation appropriate for a small number of arms, and an alternate implementation for many arms that can be used when the input parameters satisfy a decomposability condition. We demonstrate through numerical experiments that CTB with appropriate input parameters outperforms all benchmarks considered. △ Less

Submitted 14 June, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

arXiv:1604.07209 [pdf, ps, other]

Unbiased Comparative Evaluation of Ranking Functions

Authors: Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims

Abstract: Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, w… ▽ More Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing $k$ systems against a baseline, and ranking $k$ systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they generally cut the number of required relevance judgments at least in half. △ Less

Submitted 25 April, 2016; originally announced April 2016.

Comments: Under review; 10 pages

arXiv:1602.06571 [pdf, ps, other]

doi 10.1145/2872427.2883011

Mean Field Equilibria for Competitive Exploration in Resource Sharing Settings

Authors: Pu Yang, Krishnamurthy Iyer, Peter Frazier

Abstract: We consider a model of nomadic agents exploring and competing for time-varying location-specific resources, arising in crowdsourced transportation services, online communities, and in traditional location based economic activity. This model comprises a group of agents, and a set of locations each endowed with a dynamic stochastic resource process. Each agent derives a periodic reward determined by… ▽ More We consider a model of nomadic agents exploring and competing for time-varying location-specific resources, arising in crowdsourced transportation services, online communities, and in traditional location based economic activity. This model comprises a group of agents, and a set of locations each endowed with a dynamic stochastic resource process. Each agent derives a periodic reward determined by the overall resource level at her location, and the number of other agents there. Each agent is strategic and free to move between locations, and at each time decides whether to stay at the same node or switch to another one. We study the equilibrium behavior of the agents as a function of dynamics of the stochastic resource process and the nature of the externality each agent imposes on others at the same location. In the asymptotic limit with the number of agents and locations increasing proportionally, we show that an equilibrium exists and has a threshold structure, where each agent decides to switch to a different location based only on their current location's resource level and the number of other agents at that location. This result provides insight into how system structure affects the agents' collective ability to explore their domain to find and effectively utilize resource-rich areas. It also allows assessing the impact of changing the reward structure through penalties or subsidies. △ Less

Submitted 21 February, 2016; originally announced February 2016.

Comments: 17 pages, 1 figure, 1 table, to appear in proceedings of the 25th International World Wide Web Conference(WWW2016)

ACM Class: J.4

arXiv:1602.02338 [pdf, other]

Stratified Bayesian Optimization

Authors: Saul Toscano-Palmerin, Peter I. Frazier

Abstract: We consider derivative-free black-box global optimization of expensive noisy functions, when most of the randomness in the objective is produced by a few influential scalar random inputs. We present a new Bayesian global optimization algorithm, called Stratified Bayesian Optimization (SBO), which uses this strong dependence to improve performance. Our algorithm is similar in spirit to stratificati… ▽ More We consider derivative-free black-box global optimization of expensive noisy functions, when most of the randomness in the objective is produced by a few influential scalar random inputs. We present a new Bayesian global optimization algorithm, called Stratified Bayesian Optimization (SBO), which uses this strong dependence to improve performance. Our algorithm is similar in spirit to stratification, a technique from simulation, which uses strong dependence on a categorical representation of the random input to reduce variance. We demonstrate in numerical experiments that SBO outperforms state-of-the-art Bayesian optimization benchmarks that do not leverage this dependence. △ Less

Submitted 20 February, 2016; v1 submitted 6 February, 2016; originally announced February 2016.

arXiv:1512.09204 [pdf, other]

Bayes-Optimal Effort Allocation in Crowdsourcing: Bounds and Index Policies

Authors: Weici Hu, Peter I. Frazier

Abstract: We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation i… ▽ More We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation infeasible. Based on the Lagrangian Relaxation technique in Adelman & Mersereau (2008), we provide a computationally tractable instance-specific upper bound on the value of this Bayes-optimal policy, which can in turn be used to bound the optimality gap of any other sub-optimal policy. In an approach similar in spirit to the Whittle index for restless multiarmed bandits, we provide an index policy for effort allocation in crowdsourcing and demonstrate numerically that it outperforms other stateof- arts and performs close to optimal solution. △ Less

Submitted 30 December, 2015; originally announced December 2015.

arXiv:1505.06538 [pdf, other]

Clustering via Content-Augmented Stochastic Blockmodels

Authors: J. Massey Cashore, Xiaoting Zhao, Alexander A. Alemi, Yujia Liu, Peter I. Frazier

Abstract: Much of the data being created on the web contains interactions between users and items. Stochastic blockmodels, and other methods for community detection and clustering of bipartite graphs, can infer latent user communities and latent item clusters from this interaction data. These methods, however, typically ignore the items' contents and the information they provide about item clusters, despite… ▽ More Much of the data being created on the web contains interactions between users and items. Stochastic blockmodels, and other methods for community detection and clustering of bipartite graphs, can infer latent user communities and latent item clusters from this interaction data. These methods, however, typically ignore the items' contents and the information they provide about item clusters, despite the tendency of items in the same latent cluster to share commonalities in content. We introduce content-augmented stochastic blockmodels (CASB), which use item content together with user-item interaction data to enhance the user communities and item clusters learned. Comparisons to several state-of-the-art benchmark methods, on datasets arising from scientists interacting with scientific articles, show that content-augmented stochastic blockmodels provide highly accurate clusters with respect to metrics representative of the underlying community structure. △ Less

Submitted 25 May, 2015; originally announced May 2015.

arXiv:1504.05929 [pdf, other]

A Hierarchical Distance-dependent Bayesian Model for Event Coreference Resolution

Authors: Bishan Yang, Claire Cardie, Peter Frazier

Abstract: We present a novel hierarchical distance-dependent Bayesian model for event coreference resolution. While existing generative models for event coreference resolution are completely unsupervised, our model allows for the incorporation of pairwise distances between event mentions -- information that is widely used in supervised coreference models to guide the generative clustering processing for bet… ▽ More We present a novel hierarchical distance-dependent Bayesian model for event coreference resolution. While existing generative models for event coreference resolution are completely unsupervised, our model allows for the incorporation of pairwise distances between event mentions -- information that is widely used in supervised coreference models to guide the generative clustering processing for better event clustering both within and across documents. We model the distances between event mentions using a feature-rich learnable distance function and encode them as Bayesian priors for nonparametric clustering. Experiments on the ECB+ corpus show that our model outperforms state-of-the-art methods for both within- and cross-document event coreference resolution. △ Less

Submitted 25 September, 2015; v1 submitted 22 April, 2015; originally announced April 2015.

Comments: 12 pages, 3 figures

arXiv:1410.7852 [pdf, other]

A Markov Decision Process Analysis of the Cold Start Problem in Bayesian Information Filtering

Authors: Xiaoting Zhao, Peter I. Frazier

Abstract: We consider the information filtering problem, in which we face a stream of items, and must decide which ones to forward to a user to maximize the number of relevant items shown, minus a penalty for each irrelevant item shown. Forwarding decisions are made separately in a personalized way for each user. We focus on the cold-start setting for this problem, in which we have limited historical data o… ▽ More We consider the information filtering problem, in which we face a stream of items, and must decide which ones to forward to a user to maximize the number of relevant items shown, minus a penalty for each irrelevant item shown. Forwarding decisions are made separately in a personalized way for each user. We focus on the cold-start setting for this problem, in which we have limited historical data on the user's preferences, and must rely on feedback from forwarded articles to learn which the fraction of items relevant to the user in each of several item categories. Performing well in this setting requires trading exploration vs. exploitation, forwarding items that are likely to be irrelevant, to allow learning that will improve later performance. In a Bayesian setting, and using Markov decision processes, we show how the Bayes-optimal forwarding algorithm can be computed efficiently when the user will examine each forwarded article, and how an upper bound on the Bayes-optimal procedure and a heuristic index policy can be obtained for the setting when the user will examine only a limited number of forwarded items. We present results from simulation experiments using parameters estimated using historical data from arXiv.org. △ Less

Submitted 28 October, 2014; originally announced October 2014.

Comments: 12 pages, 9 figures

arXiv:1407.8186 [pdf, other]

Exploration vs. Exploitation in the Information Filtering Problem

Authors: Xiaoting Zhao, Peter I. Frazier

Abstract: We consider information filtering, in which we face a stream of items too voluminous to process by hand (e.g., scientific articles, blog posts, emails), and must rely on a computer system to automatically filter out irrelevant items. Such systems face the exploration vs. exploitation tradeoff, in which it may be beneficial to present an item despite a low probability of relevance, just to learn ab… ▽ More We consider information filtering, in which we face a stream of items too voluminous to process by hand (e.g., scientific articles, blog posts, emails), and must rely on a computer system to automatically filter out irrelevant items. Such systems face the exploration vs. exploitation tradeoff, in which it may be beneficial to present an item despite a low probability of relevance, just to learn about future items with similar content. We present a Bayesian sequential decision-making model of this problem, show how it may be solved to optimality using a decomposition to a collection of two-armed bandit problems, and show structural results for the optimal policy. We show that the resulting method is especially useful when facing the cold start problem, i.e., when filtering items for new users without a long history of past interactions. We then present an application of this information filtering method to a historical dataset from the arXiv.org repository of scientific articles. △ Less

Submitted 8 February, 2015; v1 submitted 30 July, 2014; originally announced July 2014.

Comments: 36 pages, 5 figures

arXiv:1407.4446 [pdf, other]

Probabilistic Group Testing under Sum Observations: A Parallelizable 2-Approximation for Entropy Loss

Authors: Weidong Han, Purnima Rajan, Peter I. Frazier, Bruno M. Jedynak

Abstract: We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We study a probabilistic setting with entropy loss, in which we assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entr… ▽ More We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We study a probabilistic setting with entropy loss, in which we assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entropy of the Bayesian posterior distribution after a fixed number of questions. We present a new non-adaptive policy, called the dyadic policy, show it is optimal among non-adaptive policies, and is within a factor of two of optimal among adaptive policies. This policy is quick to compute, its nonadaptive nature makes it easy to parallelize, and our bounds show it performs well even when compared with adaptive policies. We also study an adaptive greedy policy, which maximizes the one-step expected reduction in entropy, and show that it performs at least as well as the dyadic policy, offering greater query efficiency but reduced parallelism. Numerical experiments demonstrate that both procedures outperform a divide-and-conquer benchmark policy from the literature, called sequential bifurcation, and show how these procedures may be applied in a stylized computer vision problem. △ Less

Submitted 22 September, 2015; v1 submitted 16 July, 2014; originally announced July 2014.

arXiv:1407.2676 [pdf, other]

A New Optimal Stepsize For Approximate Dynamic Programming

Authors: Ilya O. Ryzhov, Peter I. Frazier, Warren B. Powell

Abstract: Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally inte… ▽ More Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments. △ Less

Submitted 13 July, 2014; v1 submitted 9 July, 2014; originally announced July 2014.

Comments: Matlab files are included with the paper source

Showing 1–43 of 43 results for author: Frazier, P