-
Battery Operations in Electricity Markets: Strategic Behavior and Distortions
Authors:
Jerry Anunrojwong,
Santiago R. Balseiro,
Omar Besbes,
Bolun Xu
Abstract:
Electric power systems are undergoing a major transformation as they integrate intermittent renewable energy sources, and batteries to smooth out variations in renewable energy production. As privately-owned batteries grow from their role as marginal "price-takers" to significant players in the market, a natural question arises: How do batteries operate in electricity markets, and how does the strategic behavior of decentralized batteries distort decisions compared to centralized batteries?
We propose an analytically tractable model that captures salient features of the highly complex electricity market. We derive in closed form the resulting battery behavior and generation cost in three operating regimes: (i) no battery, (ii) centralized battery, and (iii) decentralized profit-maximizing battery. We establish that a decentralized battery distorts its discharge decisions in three ways. First, there is quantity withholding, i.e., discharging less than centrally optimal. Second, there is a shift in participation from day-ahead to real-time, i.e., postponing some of its discharge from day-ahead to real-time. Third, there is a reduction in real-time responsiveness, i.e., discharging less in response to smoothing real-time demand than centrally optimal. We quantify each of the three forms of distortion in terms of market fundamentals. To illustrate our results, we calibrate our model to Los Angeles and Houston and show that the loss from incentive misalignment could be consequential.
Submitted 26 June, 2024;
originally announced June 2024.
-
The Best of Many Robustness Criteria in Decision Making: Formulation and Application to Robust Pricing
Authors:
Jerry Anunrojwong,
Santiago R. Balseiro,
Omar Besbes
Abstract:
In robust decision-making under non-Bayesian uncertainty, different robust optimization criteria, such as maximin performance, minimax regret, and maximin ratio, have been proposed. In many problems, all three criteria are well-motivated and well-grounded from a decision-theoretic perspective, yet different criteria give different prescriptions. This paper initiates a systematic study of overfitting to robustness criteria. How good is a prescription derived from one criterion when evaluated against another criterion? Does there exist a prescription that performs well against all criteria of interest? We formalize and study these questions through the prototypical problem of robust pricing under various information structures, including support, moments, and percentiles of the distribution of values. We provide a unified analysis of three focal robust criteria across various information structures and evaluate the relative performance of mechanisms optimized for each criterion against the others. We find that mechanisms optimized for one criterion often perform poorly against other criteria, highlighting the risk of overfitting to a particular robustness criterion. Remarkably, we show it is possible to design mechanisms that achieve good performance across all three criteria simultaneously, suggesting that decision-makers need not compromise among criteria.
Submitted 18 March, 2024;
originally announced March 2024.
-
Dynamic Pricing for Reusable Resources: The Power of Two Prices
Authors:
Santiago R. Balseiro,
Will Ma,
Wenxin Zhang
Abstract:
Motivated by real-world applications such as rental and cloud computing services, we investigate pricing for reusable resources. We consider a system where a single resource with a fixed number of identical copies serves customers with heterogeneous willingness-to-pay (WTP), and the usage duration distribution is general. Optimal dynamic policies are computationally intractable when usage durations are not memoryless, so existing literature has focused on static pricing, whose steady-state reward rate converges to optimality at rate $\mathcal{O}(c^{-1/2})$ when supply and demand scale with $c$. We show, however, that this convergence rate is suboptimal, and propose a class of dynamic "stock-dependent" policies that 1) preserves computational tractability and 2) has a steady-state reward rate converging to optimality faster than $c^{-1/2}$. We characterize the tight convergence rate for stock-dependent policies and show that it can in fact be achieved by a simple two-price policy that sets a higher price when the stock is below some threshold and a lower price otherwise. Finally, we demonstrate that this "minimally dynamic" class of two-price policies performs well numerically, even in non-asymptotic settings, suggesting that a little dynamicity can go a long way.
Submitted 26 August, 2023;
originally announced August 2023.
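As a rough illustration of the two-price policy described in the abstract above, the sketch below picks a price from the current stock level. The function name, the threshold, and the two price values are hypothetical placeholders; how they should be chosen is the subject of the paper and is not reproduced here.

```python
def two_price(stock: int, threshold: int, high_price: float, low_price: float) -> float:
    """Return the high price when stock is below the threshold, else the low price.

    All arguments are illustrative assumptions, not values from the paper.
    """
    return high_price if stock < threshold else low_price
```

For instance, with a threshold of 5, a stock of 2 would be charged the higher price, while a stock of 5 or more would be charged the lower one.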
-
Robust Auction Design with Support Information
Authors:
Jerry Anunrojwong,
Santiago R. Balseiro,
Omar Besbes
Abstract:
A seller wants to sell an item to n buyers. Buyer valuations are drawn i.i.d. from a distribution unknown to the seller; the seller only knows that the support is included in [a, b]. To be robust, the seller chooses a DSIC mechanism that optimizes the worst-case performance relative to the first-best benchmark. Our analysis unifies the regret and the ratio objectives.
For these objectives, we derive an optimal mechanism and the corresponding performance in quasi-closed form, as a function of the support information and the number of buyers n. Our analysis reveals three regimes of support information and a new class of robust mechanisms. i.) With "low" support information, the optimal mechanism is a second-price auction (SPA) with random reserve, a focal class in earlier literature. ii.) With "high" support information, SPAs are strictly suboptimal, and an optimal mechanism belongs to a class of mechanisms we introduce, which we call pooling auctions (POOL); whenever the highest value is above a threshold, the mechanism still allocates to the highest bidder, but otherwise the mechanism allocates to a uniformly random buyer, i.e., pools low types. iii.) With "moderate" support information, a randomization between SPA and POOL is optimal.
We also characterize optimal mechanisms within nested central subclasses of mechanisms: standard mechanisms that only allocate to the highest bidder, SPA with random reserve, and SPA with no reserve. We show strict separations in terms of performance across classes, implying that deviating from standard mechanisms is necessary for robustness.
Submitted 26 August, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
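The allocation rule of the pooling auction (POOL) described in the abstract above can be sketched as follows. This is only a minimal sketch of the allocation step: payments are omitted, the threshold is taken as a given input, and the function name and the tie-breaking rule (lowest index wins) are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def pool_allocate(values: Sequence[float], threshold: float,
                  rng: Optional[random.Random] = None) -> int:
    """Return the index of the buyer who receives the item.

    If the highest reported value exceeds the threshold, the highest bidder
    wins; otherwise the item goes to a uniformly random buyer (low types are
    pooled). Payments are not modeled in this sketch.
    """
    rng = rng or random.Random()
    # Ties are broken in favor of the lowest index (an assumption).
    highest = max(range(len(values)), key=lambda i: values[i])
    if values[highest] > threshold:
        return highest
    return rng.randrange(len(values))
```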
-
A Field Guide for Pacing Budget and ROS Constraints
Authors:
Santiago R. Balseiro,
Kshipra Bhawalkar,
Zhe Feng,
Haihao Lu,
Vahab Mirrokni,
Balasubramanian Sivan,
Di Wang
Abstract:
Budget pacing is a popular service that has been offered by major internet advertising platforms since their inception. Budget pacing systems seek to optimize advertiser returns subject to budget constraints by smoothly spending advertiser budgets. In the past few years, autobidding products that provide real-time bidding as a service to advertisers have seen a prominent rise in adoption. A popular autobidding strategy is value maximization subject to return-on-spend (ROS) constraints. For historical/business reasons, the systems that govern these two services, namely budget pacing and ROS pacing, are not always a unified and coordinated entity that optimizes a global objective subject to both constraints. The purpose of this work is to theoretically and empirically compare algorithms with different degrees of coordination between these two pacing systems.
In particular, we compare (a) a fully-decoupled sequential algorithm that first constructs the advertiser's ROS-pacing bid and then lowers that bid for budget pacing; (b) a minimally-coupled min-pacing algorithm that runs these two services independently, obtains the bid multipliers from both of them and applies the minimum of the two multipliers as the effective multiplier; and (c) a fully-coupled dual-based algorithm that optimally combines the dual variables from both the systems. Our main contribution is to theoretically analyze the min-pacing algorithm and show that it attains similar guarantees to the fully-coupled canonical dual-based algorithm. On the other hand, we show that the sequential algorithm, even though appealing by virtue of being fully decoupled, could badly violate the constraints. We validate our theoretical findings empirically by showing that the min-pacing algorithm performs almost as well as the canonical dual-based algorithm on a semi-synthetic dataset based on a large online advertising platform's data.
Submitted 15 December, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
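The min-pacing rule in option (b) of the abstract above can be sketched in a few lines. The sketch assumes that each pacing service outputs a multiplier that scales the advertiser's value into a bid; how each service computes its multiplier is not shown, and the function and parameter names are hypothetical.

```python
def min_pacing_bid(value: float, budget_multiplier: float, ros_multiplier: float) -> float:
    """Bid using the smaller of the two independently computed multipliers.

    budget_multiplier and ros_multiplier stand in for the outputs of the
    budget-pacing and ROS-pacing services, respectively (assumed inputs).
    """
    effective_multiplier = min(budget_multiplier, ros_multiplier)
    return value * effective_multiplier
```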
-
Robust Budget Pacing with a Single Sample
Authors:
Santiago Balseiro,
Rachitesh Kumar,
Vahab Mirrokni,
Balasubramanian Sivan,
Di Wang
Abstract:
Major Internet advertising platforms offer budget pacing tools as a standard service for advertisers to manage their ad campaigns. Given the inherent non-stationarity in an advertiser's value and also competing advertisers' values over time, a commonly used approach is to learn a target expenditure plan that specifies a target spend as a function of time, and then run a controller that tracks this plan. This raises the question: how many historical samples are required to learn a good expenditure plan? We study this question by considering an advertiser repeatedly participating in $T$ second-price auctions, where the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution. The advertiser seeks to maximize her total utility subject to her budget constraint. Prior work has shown the sufficiency of $T\log T$ samples per distribution to achieve the optimal $O(\sqrt{T})$-regret. We dramatically improve this state-of-the-art and show that just one sample per distribution is enough to achieve the near-optimal $\tilde O(\sqrt{T})$-regret, while still being robust to noise in the sampling distributions.
Submitted 3 February, 2023;
originally announced February 2023.
-
Online Resource Allocation under Horizon Uncertainty
Authors:
Santiago Balseiro,
Christian Kroer,
Rachitesh Kumar
Abstract:
We study stochastic online resource allocation: a decision maker needs to allocate limited resources to stochastically-generated sequentially-arriving requests in order to maximize reward. At each time step, requests are drawn independently from a distribution that is unknown to the decision maker. Online resource allocation and its special cases have been studied extensively in the past, but prior results crucially and universally rely on the strong assumption that the total number of requests (the horizon) is known to the decision maker in advance. In many applications, such as revenue management and online advertising, the number of requests can vary widely because of fluctuations in demand or user traffic intensity. In this work, we develop online algorithms that are robust to horizon uncertainty. In sharp contrast to the known-horizon setting, no algorithm can achieve even a constant asymptotic competitive ratio that is independent of the horizon uncertainty. We introduce a novel generalization of dual mirror descent which allows the decision maker to specify a schedule of time-varying target consumption rates, and prove corresponding performance guarantees. We go on to give a fast algorithm for computing a schedule of target consumption rates that leads to near-optimal performance in the unknown-horizon setting. In particular, our competitive ratio attains the optimal rate of growth (up to logarithmic factors) as the horizon uncertainty grows large. Finally, we also provide a way to incorporate machine-learned predictions about the horizon which interpolates between the known and unknown horizon settings.
Submitted 22 June, 2023; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Uniformly Bounded Regret in Dynamic Fair Allocation
Authors:
Santiago R. Balseiro,
Shangzhou Xia
Abstract:
We study a dynamic allocation problem in which $T$ sequentially arriving divisible resources are to be allocated to a number of agents with linear utilities. The marginal utilities of each resource to the agents are drawn stochastically from a known joint distribution, independently and identically across time, and the central planner makes immediate and irrevocable allocation decisions. Most works on dynamic resource allocation aim to maximize the utilitarian welfare, i.e., the efficiency of the allocation, which may result in unfair concentration of resources on certain high-utility agents while leaving others' demands under-fulfilled. In this paper, aiming at balancing efficiency and fairness, we instead consider a broad collection of welfare metrics, the Hölder means, which includes the Nash social welfare and the egalitarian welfare. To this end, we first study a fluid-based policy derived from a deterministic surrogate to the underlying problem and show that for all smooth Hölder mean welfare metrics it attains an $O(1)$ regret over the time horizon length $T$ against the hindsight optimum, i.e., the optimal welfare if all utilities were known in advance of deciding on allocations. However, when evaluated under the non-smooth egalitarian welfare, the fluid-based policy attains a regret of order $\Theta(\sqrt{T})$. We then propose a new policy built thereupon, called Backward Infrequent Re-solving with Thresholding ($\mathsf{BIRT}$), which consists of re-solving the deterministic surrogate problem at most $O(\log\log T)$ times. We prove the $\mathsf{BIRT}$ policy attains an $O(1)$ regret against the hindsight optimal egalitarian welfare, independently of the time horizon length $T$. We conclude by presenting numerical experiments to corroborate our theoretical claims and to illustrate the significant performance improvement against several benchmark policies.
Submitted 23 June, 2023; v1 submitted 24 May, 2022;
originally announced May 2022.
-
On the Robustness of Second-Price Auctions in Prior-Independent Mechanism Design
Authors:
Jerry Anunrojwong,
Santiago R. Balseiro,
Omar Besbes
Abstract:
Classical Bayesian mechanism design relies on the common prior assumption, but such prior is often not available in practice. We study the design of prior-independent mechanisms that relax this assumption: the seller is selling an indivisible item to $n$ buyers such that the buyers' valuations are drawn from a joint distribution that is unknown to both the buyers and the seller; buyers do not need to form beliefs about competitors, and the seller assumes the distribution is adversarially chosen from a specified class. We measure performance through the worst-case regret, or the difference between the expected revenue achievable with perfect knowledge of buyers' valuations and the actual mechanism revenue.
We study a broad set of classes of valuation distributions that capture a wide spectrum of possible dependencies: independent and identically distributed (i.i.d.) distributions, mixtures of i.i.d. distributions, affiliated and exchangeable distributions, exchangeable distributions, and all joint distributions. We derive in quasi closed form the minimax values and the associated optimal mechanism. In particular, we show that the first three classes admit the same minimax regret value, which is decreasing with the number of competitors, while the last two have the same minimax regret equal to that of the single buyer case. Furthermore, we show that the minimax optimal mechanisms have a simple form across all settings: a second-price auction with random reserve prices, which shows its robustness in prior-independent mechanism design. En route to our results, we also develop a principled methodology to determine the form of the optimal mechanism and worst-case distribution via first-order conditions that should be of independent interest in other minimax problems.
Submitted 18 January, 2024; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Single-Leg Revenue Management with Advice
Authors:
Santiago Balseiro,
Christian Kroer,
Rachitesh Kumar
Abstract:
Single-leg revenue management is a foundational problem of revenue management that has been particularly impactful in the airline and hotel industry: Given $n$ units of a resource, e.g. flight seats, and a stream of sequentially-arriving customers segmented by fares, what is the optimal online policy for allocating the resource? Previous work focused on designing algorithms when forecasts are available, which are not robust to inaccuracies in the forecast, or online algorithms with worst-case performance guarantees, which can be too conservative in practice. In this work, we look at the single-leg revenue management problem through the lens of the algorithms-with-advice framework, which attempts to harness the increasing prediction accuracy of machine learning methods by optimally incorporating advice about the future into online algorithms. In particular, we characterize the Pareto frontier that captures the tradeoff between consistency (performance when advice is accurate) and competitiveness (performance when advice is inaccurate) for every advice. Moreover, we provide an online algorithm that always achieves performance on this Pareto frontier. We also study the class of protection level policies, which is the most widely-deployed technique for single-leg revenue management: we provide an algorithm to incorporate advice into protection levels that optimally trades off consistency and competitiveness. Moreover, we empirically evaluate the performance of these algorithms on synthetic data. We find that our algorithm for protection level policies performs remarkably well on most instances, even if it is not guaranteed to be on the Pareto frontier in theory. Our results extend to other unit-cost online allocation problems such as the display advertising and the multiple secretary problem, together with more general variable-cost problems such as the online knapsack problem.
Submitted 22 June, 2023; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Analysis of Dual-Based PID Controllers through Convolutional Mirror Descent
Authors:
Santiago R. Balseiro,
Haihao Lu,
Vahab Mirrokni,
Balasubramanian Sivan
Abstract:
Dual-based proportional-integral-derivative (PID) controllers are often employed in practice to solve online allocation problems with global constraints, such as budget pacing in online advertising. However, controllers are used in a heuristic fashion and come with no provable guarantees on their performance. This paper provides the first regret bounds on the performance of dual-based PID controllers for online allocation problems. We do so by first establishing a fundamental connection between dual-based PID controllers and a new first-order algorithm for online convex optimization called \emph{Convolutional Mirror Descent} (CMD), which updates iterates based on a weighted moving average of past gradients. CMD recovers, in a special case, online mirror descent with momentum and optimistic mirror descent. We establish sufficient conditions under which CMD attains low regret for general online convex optimization problems with adversarial inputs. We leverage this new result to give the first regret bound for dual-based PID controllers for online allocation problems. As a byproduct of our proofs, we provide the first regret bound for CMD for non-smooth convex optimization, which might be of independent interest.
Submitted 19 December, 2023; v1 submitted 12 February, 2022;
originally announced February 2022.
-
Robust Auction Design in the Auto-bidding World
Authors:
Santiago Balseiro,
Yuan Deng,
Jieming Mao,
Vahab Mirrokni,
Song Zuo
Abstract:
In classic auction theory, reserve prices are known to be effective for improving revenue for the auctioneer against quasi-linear utility maximizing bidders. The introduction of reserve prices, however, usually does not help improve the total welfare of the auctioneer and the bidders. In this paper, we focus on value maximizing bidders with return on spend constraints -- a paradigm that has drawn considerable attention recently as more advertisers adopt auto-bidding algorithms in advertising platforms -- and show that the introduction of reserve prices has a novel impact on the market. Namely, by choosing reserve prices appropriately the auctioneer can improve not only the total revenue but also the total welfare. Our results also demonstrate that reserve prices are robust to bidder types, i.e., reserve prices work well for different bidder types, such as value maximizers and utility maximizers, without using bidder type information. We generalize these results for a variety of auction mechanisms such as VCG, GSP, and first-price auctions. Moreover, we show how to combine these results with additive boosts to improve the welfare of the outcomes of the auction further. Finally, we complement our theoretical observations with an empirical study confirming the effectiveness of these ideas using data from online advertising auctions.
Submitted 3 November, 2021;
originally announced November 2021.
-
Mechanism Design under Approximate Incentive Compatibility
Authors:
Santiago Balseiro,
Omar Besbes,
Francisco Castro
Abstract:
A fundamental assumption in classical mechanism design is that buyers are perfect optimizers. However, in practice, buyers may be limited by their computational capabilities or a lack of information, and may not be able to perfectly optimize. This has motivated the introduction of approximate incentive compatibility (IC) as an appealing solution concept for practical mechanism design. While most of the literature focuses on the analysis of particular approximate IC mechanisms, this paper is the first to study the design of optimal mechanisms in the space of approximate IC mechanisms and to explore how much revenue can be garnered by moving from exact to approximate incentive constraints. We study the problem of a seller facing one buyer with private values and analyze optimal selling mechanisms under $\varepsilon$-incentive compatibility. We establish that the gains that can be garnered depend on the local curvature of the seller's revenue function around the optimal posted price when the buyer is a perfect optimizer. If the revenue function behaves locally like an $\alpha$-power for $\alpha\in (1,\infty)$, then no mechanism can garner gains higher than order $\varepsilon^{\alpha/(2\alpha-1)}$. This improves upon state-of-the-art results which imply maximum gains of $\varepsilon^{1/2}$ by providing the first parametric bounds that capture the impact of the revenue function's curvature on revenue gains. Furthermore, we establish that an optimal mechanism needs to randomize as soon as $\varepsilon>0$ and construct a randomized mechanism that is guaranteed to achieve order $\varepsilon^{\alpha/(2\alpha-1)}$ additional revenues, leading to a tight characterization of the revenue implications of approximate IC constraints. Our work brings forward the need to optimize not only over allocations and payments but also over best responses, and we develop a new framework to address this challenge.
Submitted 24 March, 2022; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Contextual Standard Auctions with Budgets: Revenue Equivalence and Efficiency Guarantees
Authors:
Santiago Balseiro,
Christian Kroer,
Rachitesh Kumar
Abstract:
The internet advertising market is a multi-billion dollar industry, in which advertisers buy thousands of ad placements every day by repeatedly participating in auctions. An important and ubiquitous feature of these auctions is the presence of campaign budgets, which specify the maximum amount the advertisers are willing to pay over a specified time period. In this paper, we present a new model to study the equilibrium bidding strategies in standard auctions, a large class of auctions that includes first- and second-price auctions, for advertisers who satisfy budget constraints on average. Our model dispenses with the common, yet unrealistic assumption that advertisers' values are independent and instead assumes a contextual model in which advertisers determine their values using a common feature vector. We show the existence of a natural value-pacing-based Bayes-Nash equilibrium under very mild assumptions. Furthermore, we prove a revenue equivalence showing that all standard auctions yield the same revenue even in the presence of budget constraints. Leveraging this equivalence, we prove Price of Anarchy bounds for liquid welfare and structural properties of pacing-based equilibria that hold for all standard auctions. In recent years, the internet advertising market has adopted first-price auctions as the preferred paradigm for selling advertising slots. Our work thus takes an important step toward understanding the implications of the shift to first-price auctions in internet advertising markets by studying how the choice of the selling mechanism impacts revenues, welfare, and advertisers' bidding strategies.
Submitted 9 October, 2022; v1 submitted 20 February, 2021;
originally announced February 2021.
-
The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems
Authors:
Santiago Balseiro,
Haihao Lu,
Vahab Mirrokni
Abstract:
Online allocation problems with resource constraints are central problems in revenue management and online advertising. In these problems, requests arrive sequentially during a finite horizon and, for each request, a decision maker needs to choose an action that consumes a certain amount of resources and generates reward. The objective is to maximize cumulative rewards subject to a constraint on the total consumption of resources. In this paper, we consider a data-driven setting in which the reward and resource consumption of each request are generated using an input model that is unknown to the decision maker. We design a general class of algorithms that attain good performance in various input models without knowing which type of input they are facing. In particular, our algorithms are asymptotically optimal under independent and identically distributed inputs as well as various non-stationary stochastic input models, and they attain an asymptotically optimal fixed competitive ratio when the input is adversarial. Our algorithms operate in the Lagrangian dual space: they maintain a dual multiplier for each resource that is updated using online mirror descent. By choosing the reference function accordingly, we recover the dual sub-gradient descent and dual multiplicative weights update algorithm. The resulting algorithms are simple, fast, and do not require convexity in the revenue function, consumption function and action space, in contrast to existing methods for online allocation problems. We discuss applications to network revenue management, online bidding in repeated auctions with budget constraints, online proportional matching with high entropy, and personalized assortment optimization with limited inventory.
Submitted 4 November, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Regularized Online Allocation Problems: Fairness and Beyond
Authors:
Santiago Balseiro,
Haihao Lu,
Vahab Mirrokni
Abstract:
Online allocation problems with resource constraints have a rich history in operations research. In this paper, we introduce the \emph{regularized online allocation problem}, a variant that includes a non-linear regularizer acting on the total resource consumption. In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources. The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints. Our primary motivation is allowing decision makers to trade off separable objectives such as the economic efficiency of an allocation with ancillary, non-separable objectives such as the fairness or equity of an allocation. We design an algorithm that is simple, fast, and attains good performance with both stochastic i.i.d. and adversarial inputs. In particular, our algorithm is asymptotically optimal under stochastic i.i.d. input models and attains a fixed competitive ratio that depends on the regularizer when the input is adversarial. Furthermore, the algorithm and analysis do not require convexity or concavity of the reward function and the consumption function, which allows more model flexibility. Numerical experiments confirm the effectiveness of the proposed algorithm and of regularization in an internet advertising application.
Submitted 4 November, 2021; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Dual Mirror Descent for Online Allocation Problems
Authors:
Haihao Lu,
Santiago Balseiro,
Vahab Mirrokni
Abstract:
We consider online allocation problems with concave revenue functions and resource constraints, which are central problems in revenue management and online advertising. In these settings, requests arrive sequentially during a finite horizon and, for each request, a decision maker needs to choose an action that consumes a certain amount of resources and generates revenue. The revenue function and resource consumption of each request are drawn independently and at random from a probability distribution that is unknown to the decision maker. The objective is to maximize cumulative revenues subject to a constraint on the total consumption of resources.
We design a general class of algorithms that achieve sub-linear expected regret compared to the hindsight optimal allocation. Our algorithms operate in the Lagrangian dual space: they maintain a dual multiplier for each resource that is updated using online mirror descent. By choosing the reference function accordingly, we recover the dual sub-gradient descent and dual exponential weights algorithms. The resulting algorithms are simple, efficient, and shown to attain the optimal order of regret when the length of the horizon and the initial number of resources are scaled proportionally. We discuss applications to online bidding in repeated auctions with budget constraints and online proportional matching with high entropy.
Submitted 4 November, 2021; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Contextual Bandits with Cross-learning
Authors:
Santiago Balseiro,
Negin Golrezaei,
Mohammad Mahdian,
Vahab Mirrokni,
Jon Schneider
Abstract:
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{i,t}(c)$, the learner also learns the values of $r_{i,t}(c')$ for some other contexts $c'$ in set $\mathcal{O}_i(c)$; i.e., the rewards that would have been achieved by performing that action under different contexts $c'\in \mathcal{O}_i(c)$. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We design and analyze new algorithms for the contextual bandits problem with cross-learning and show that their regret has better dependence on the number of contexts. Under complete cross-learning where the rewards for all contexts are learned when choosing an action, i.e., set $\mathcal{O}_i(c)$ contains all contexts, we show that our algorithms achieve regret $\tilde{O}(\sqrt{KT})$, removing the dependence on $C$. In all other cases, i.e., under partial cross-learning where $|\mathcal{O}_i(c)|< C$ for some context-action pair $(i,c)$, the regret bounds depend on how the sets $\mathcal{O}_i(c)$ impact the degree to which cross-learning between contexts is possible. We simulate our algorithms on real auction data from an ad exchange running first-price auctions and show that they outperform traditional contextual bandit algorithms.
Submitted 15 November, 2021; v1 submitted 25 September, 2018;
originally announced September 2018.
-
Yield Optimization of Display Advertising with Ad Exchange
Authors:
Santiago Balseiro,
Jon Feldman,
Vahab Mirrokni,
S. Muthukrishnan
Abstract:
In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and quality of allocating reservation ads. In this paper, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids. We prove asymptotic optimality of this policy in terms of any trade-off between quality of delivered reservation ads and revenue from the exchange, and provide a rigorous bound for its convergence rate to the optimal policy. We also give experimental results on data derived from real publisher inventory, showing that our policy can achieve any Pareto-optimal point on the quality vs. revenue curve. Finally, we study a parametric training-based algorithm in which, instead of learning the dual variables from sample data (as is done in non-parametric training-based algorithms), we learn the parameters of the distribution and construct those dual variables from the learned parameter values. We compare parametric and non-parametric ways to estimate from data both analytically and experimentally in the special case without the ad exchange, and show that though both methods converge to the optimal policy as the sample size grows, our parametric method converges faster, and thus performs better on smaller samples.
Submitted 21 September, 2012; v1 submitted 12 February, 2011;
originally announced February 2011.