Stochastic Approach for Price Optimization Problems with Decision-dependent Uncertainty
Abstract
Price determination is a central research topic of revenue management in marketing. An important aspect of pricing is controlling the stochastic behavior of demand, and previous studies have tackled price optimization problems with uncertainties. However, many of those studies assumed that uncertainties are independent of decision variables (i.e., prices) and did not consider situations where demand uncertainty depends on price. Although some price optimization studies have dealt with decision-dependent uncertainty, they make application-specific assumptions in order to obtain an optimal solution or an approximation solution. To handle a wider range of applications with decision-dependent uncertainty, we propose a general non-convex stochastic optimization formulation. This approach aims to maximize the expectation of a revenue function with respect to a random variable representing demand under a decision-dependent distribution. We derive an unbiased stochastic gradient estimator that includes a well-tuned variance reduction parameter and use it in a projected stochastic gradient descent method to find a stationary point of our problem. We conducted synthetic experiments and simulation experiments with real data on a retail service application. The results show that the proposed method outputs solutions with higher total revenues than the baselines.
1 Introduction
Price determination is a central research topic of revenue management in marketing, and many pricing studies have targeted applications in agricultural (Wang and Wang, 2019), online retail (Ferreira et al., 2016), electrical power (Dong et al., 2017), and hospitality industries (Koushik et al., 2012).
An important aspect in pricing is controlling the stochastic behavior of demand. This is because stochastic over/under demand causes a loss in many cases; for example, in road pricing, overuse of a certain road causes congestion or traffic accidents; in an electricity market, if demand is much lower than the available electricity supply, capital investment costs cannot be recovered.
To obtain greater profits under demand uncertainty, many of the previous studies have tackled price optimization problems with decision-independent random variables. For example, He et al. (2009) and Dong et al. (2017) define the demand for a product/service as , where is price and is a decision-independent random variable. Correa et al. (2017) and Chawla et al. (2010) assume multi-agent systems where each buyer has a random variable as their value for a product and purchases it when the price is below the value. However, in practical applications, it is natural for the distribution of stochastic demand to vary with price: when the price of a product is close to (far from) those of competing products, it is difficult (easy) to predict the demand and its uncertainty is large (small). Furthermore, the settings of these studies with decision-independent random variables need to use discontinuous functions to represent buyers’ discrete actions (e.g., buy or leave), which makes the optimization problem difficult to solve (see Section 3.4.1).
Although some pricing studies have dealt with decision-dependent uncertainty, they assume specific demand distributions and problem settings in order to obtain an optimal solution or an approximation solution. For example, Bertsimas and de Boer (2005) determine prices for multiple products produced with limited resources. They consider demand of item at time , i.e., , where and are given functions, is price, and is a random variable following a given distribution. Schulte and Sachs (2020) optimize prices over multiple periods to sell a single product with a fixed unit cost , where the item’s demand follows a Poisson distribution with a given intensity function . While these studies can find optimal or approximation solutions for their problems, they appear to be difficult to apply to a wide range of probability distributions or problem settings (e.g., a nonlinear cost setting for selling products) due to their specific assumptions.
To resolve these issues, for general price optimization, we propose a non-convex stochastic optimization formulation that maximizes the expectation of a revenue function with respect to a random variable representing demand under a decision-dependent distribution. Our formulation assumes that (i) the objective function is differentiable and Lipschitz continuous, (ii) the given probability density function of the random variables is differentiable and its gradient, normalized by the value of the probability density function, is bounded, and (iii) the feasible region is compact and convex. These assumptions may seem strong, but they often hold in the price optimization literature. Indeed, we show three application examples satisfying our assumptions (see Section 3.3).
The formulated problem for practical applications is generally non-convex and the dimension of the decision variables may be large. We derive an unbiased stochastic gradient estimator of the objective function by using information on the probability density function and incorporate the estimator in a projected stochastic gradient descent method to find a stationary point of our problem. When deriving a gradient estimator, it is important to design it so that its variance is small, which allows the algorithm to converge quickly. Our unbiased stochastic gradient includes a variance reduction parameter, which is inspired by the baseline technique (Williams, 1992; Sutton and Barto, 2018) in the reinforcement-learning literature. After confirming that the variance of the proposed stochastic gradient is bounded, we present a method for calculating the variance reduction parameter. Then, we develop a projected stochastic gradient descent method, which converges to a stationary point, by incorporating the proposed stochastic gradient and the method for calculating the variance reduction parameter into a recent gradient descent algorithm (Ghadimi and Lan, 2016). Moreover, we show a way of speeding up the computation of the minibatch gradient under additional assumptions that hold in applications where multiple agents make purchase decisions.
While some of the previous methods might seem applicable to our formulation, they are not suitable for the following reasons: the retraining method (Perdomo et al., 2020; Mendler-Dünner et al., 2020) requires strong convexity of the objective function; the Bayesian optimization (Brochu et al., 2010; Frazier, 2018) and gradient-free methods (Spall, 2005; Flaxman et al., 2005) require a huge number of evaluations of objective values, which makes it difficult to find good solutions for large-scale problems in a reasonable time.
We conducted synthetic experiments and simulation experiments with real data on a retail service application. The results show that the proposed method outputs solutions with higher total revenues than do baselines such as the (modified) retraining method and Bayesian optimization.
Notation
Bold lowercase symbols (e.g., ) denote vectors, and denotes the Euclidean norm of a vector . The inner product of the vectors is denoted by . Let be the set of positive real numbers. The gradient for a real-valued function w.r.t. is denoted by and the Jacobian matrix for a vector valued function w.r.t. is denoted by . A binomial coefficient of a pair of integers and is written as . Let be the set of .
2 Related Works
2.1 Price Optimization Problems with Stochastic Demand
The previous studies on pricing with stochastic demand considered three types of random variable: (a) decision-independent random variables included in buyers’ purchase behavior; (b) decision-independent random variables directly included in demand; (c) decision-dependent random variables included in demand. Regarding (a), Chawla et al. (2010) and Correa et al. (2017) address pricing problems with stochastic behaviors of multiple agents; each agent has a (decision-independent) random variable as its value for a product and purchases a product when the price is below that value. Regarding (b), He et al. (2009); Heydari and Norouzinasab (2015), and Dong et al. (2017) deal with demand with a decision-independent uncertainty, such as , where is price and is a random variable independent of price. Regarding (c), Bertsimas and de Boer (2005); Wang and Wang (2019); Schulte and Sachs (2020), and Hikima et al. (2021, 2022, 2023) tackle pricing problems with decision-dependent stochastic demand, such as , where is a (decision-dependent) random variable. Our study is categorized into (c).
In this paper, we propose a new general pricing problem with decision-dependent random variables. Our problem has advantages over the previous ones for tackling (a), (b), and (c). Regarding (a), while the previous studies need to define agents’ actions (e.g., buy or leave) by discontinuous functions, in our formulation, we can define those without a discontinuous function, leading to gradient-based methods. Regarding (b), we generalize the noise of demand to make it depend on the decision variable, which allows us to deal with situations where the demand uncertainty varies with price. Regarding (c), previous studies have limited applications since they make application-specific assumptions to obtain an optimal solution or an approximation solution: Wang and Wang (2019) and Schulte and Sachs (2020) consider specific situations to optimize prices over multiple periods to sell items and describe efficient methods to find an optimal solution; Hikima et al. (2021, 2022, 2023) tackle resource allocation problems while controlling agents’ acceptance probabilities for prices and present approximation algorithms with constant approximation ratios; Bertsimas and de Boer (2005) consider a simple demand function where the price of each item does not affect demand for other items and present heuristics to obtain an approximation solution. In contrast, we deal with a more general framework that has various applications (see Section 3.3). Consequently, our formulation is a non-convex optimization problem and we develop a stochastic method that is theoretically guaranteed to converge to a stationary point.
2.2 Optimization Methods for Stochastic Problems with Decision-dependent Uncertainty
Our price optimization problem, (P) in Section 3.1, is categorized as a stochastic problem with decision-dependent uncertainty (Hellemo et al., 2018; Varaiya and Wets, 1989). This is because the demand of items and services follows a probability distribution depending on price (decision variables). Here, we explain three different techniques for solving the problem. [Footnote 2: Another formulation dealing with decision-dependent random variables is decision-dependent distributionally robust optimization (Luo and Mehrotra, 2020; Basciftci et al., 2021). Although such methods are effective at finding an optimal solution in the worst case when the probability distribution is ambiguous, they are not appropriate for the purpose of this study.]
Retraining methods (Perdomo et al., 2020; Mendler-Dünner et al., 2020).
Retraining methods fix the distribution at each iteration and update the current iterate. Specifically, Perdomo et al. (2020) proposed repeated gradient descent (RGD): where is the feasible region and is the Euclidean projection operator onto . It converges to a performatively stable point . However, these methods assume the strong convexity of w.r.t. and are not applicable to our problem. We provide an intuitive example where RGD fails to work in price optimization, where the objective function is generally not strongly convex.
Example 1.
Suppose that a seller determines the price of a product. The buyer purchases the product () with probability or does not purchase it () with probability , where is a decreasing function. The seller wants to solve to maximize the expected revenue, where is the distribution for . Then, the optimal solution is . However, RGD continues to raise the price until the purchase probability reaches zero or the price reaches since and for all . This price is generally not equal to .
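The failure mode in Example 1 can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a hypothetical linear purchase probability on the price range [0, 1], and the function names, step size, and iteration count are chosen only for illustration. With the distribution frozen at the current price, the gradient of the expected loss is minus the purchase probability, so the update can only raise the price.

```python
def purchase_prob(x):
    # Hypothetical decreasing purchase probability on the price range [0, 1].
    return max(0.0, 1.0 - x)


def expected_revenue(x):
    # Expected revenue of a single buyer: price times purchase probability.
    return x * purchase_prob(x)


def rgd(x0, x_max=1.0, step=0.1, iters=200):
    """Repeated gradient descent on the loss -x * xi for Example 1."""
    x = x0
    for _ in range(iters):
        # The distribution is frozen at the current price, so the gradient of
        # E[-x * xi] w.r.t. x is -E[xi] = -purchase_prob(x), which is <= 0.
        grad = -purchase_prob(x)
        # Projected step onto the feasible interval [0, x_max].
        x = min(max(x - step * grad, 0.0), x_max)
    return x


def best_price_on_grid(n=1000):
    # Brute-force maximizer of the true expected revenue on a grid.
    grid = [i / n for i in range(n + 1)]
    return max(grid, key=expected_revenue)


x_rgd = rgd(0.2)
x_opt = best_price_on_grid()
print(x_rgd, expected_revenue(x_rgd))
print(x_opt, expected_revenue(x_opt))
```

Under these assumptions the RGD iterate drifts to the upper price bound, where almost nobody buys, while the revenue-maximizing price lies in the interior of the interval.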
Meta-model methods (Brochu et al., 2010; Frazier, 2018; Miller et al., 2021).
This type of method creates a meta-model of the objective function or the distribution map from multiple sample points. Bayesian optimization (Brochu et al., 2010; Frazier, 2018) is the process of learning the objective function through Gaussian process regression while finding the global optimal solution. The two-stage approach (Miller et al., 2021) estimates a coarse model of the distribution map and then optimizes a proxy to the objective function by treating the estimated distribution as if it were the true distribution map. While these methods are powerful for certain problems, they are not suitable for ours: Bayesian optimization cannot find good solutions when the dimension of the decision variables is too large to be adequately explored; the two-stage approach assumes that the distribution map is included in location-scale families (Miller et al., 2021, Eq. (2)), which cannot be assumed in our problem.
Gradient-free methods (Spall, 2005; Flaxman et al., 2005).
Gradient-free methods estimate the gradient by querying objective values at randomly perturbed points around the current iterate. While this type of method is generic, it often requires many evaluations of objective values to estimate the gradient accurately.
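As a rough illustration of this idea, the sketch below builds a two-point, simultaneous-perturbation gradient estimate in the spirit of SPSA. It is an assumption-laden toy rather than the implementation used in the experiments: the function name `spsa_gradient`, the perturbation size `delta`, and the quadratic test objective are all hypothetical.

```python
import random


def spsa_gradient(objective, x, delta=1e-2, rng=random):
    """Two-point gradient estimate from a single random sign perturbation."""
    # Draw one random +/-1 sign per coordinate.
    signs = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + delta * s for xi, s in zip(x, signs)]
    x_minus = [xi - delta * s for xi, s in zip(x, signs)]
    # Only two objective evaluations are needed, whatever the dimension.
    diff = objective(x_plus) - objective(x_minus)
    return [diff / (2.0 * delta * s) for s in signs]


def quadratic(x):
    # Hypothetical smooth objective used only to exercise the estimator.
    return sum(xi * xi for xi in x)


estimate_1d = spsa_gradient(quadratic, [3.0], rng=random.Random(0))
estimate_2d = spsa_gradient(quadratic, [1.0, 2.0], rng=random.Random(0))
print(estimate_1d, estimate_2d)
```

In one dimension the estimate of this quadratic is exact up to rounding. In higher dimensions each coordinate picks up cross terms from the other coordinates, which is one reason why many objective evaluations are typically needed for an accurate estimate.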
We developed a new projected stochastic gradient descent method by deriving an unbiased stochastic gradient. Our method has advantages over the existing ones: unlike retraining methods, it can find stationary points for general pricing problems whose objective functions are not strongly convex; unlike meta-model methods, it can find stationary points in high-dimensional optimization problems and does not place a strong assumption on the distribution map; while gradient-free methods naively approximate the gradient, our method approximates it by using gradient information on the objective function and the probability density function, which enables us to estimate gradients more accurately in a shorter computation time.
3 Optimization Problem
3.1 Problem Definition
We will consider the following hypothetical situation. There is a decision maker determining a price vector for items , where the index denotes the type of items and/or the time period. Then, the demand vector of items is sampled from a probability distribution . The decision maker obtains a profit of , where and are the sales and cost functions, respectively.
The revenue maximization problem is as follows:
where is real-valued and possibly non-convex. is a decision-dependent distribution for the measurable set . Here, we let be the probability density function of and assume that the decision maker can obtain the value of and for given and . This assumption naturally holds in many applications of price optimization. [Footnote 3: For example, in Bertsimas and de Boer (2005), the demand for item at price is defined by , where is a random variable and its probability density function is given. In Schulte and Sachs (2020), the buyer’s arrival rate at price is assumed to follow a Poisson process with the intensity function , which identifies the probability density function of demand. Hikima et al. (2022) define as the probability that buyer arrives at time interval for price , and then give a probability density function for demand.]
3.2 Assumptions
Our development of an unbiased stochastic gradient for (P) whose variance is bounded by a constant requires a number of assumptions. In particular, we will make the following assumptions.
Assumption 1.
For all and , the following hold,
(i) is differentiable and Lipschitz continuous with modulus w.r.t. and continuous w.r.t. ,
(ii) is differentiable w.r.t. and , and
(iii) for a constant .
Assumption 2.
The set is compact and convex. The set is compact.
Moreover, we need the following assumption when is a continuous random vector:
Assumption 3.
The set is a Borel set on . Moreover, is continuous w.r.t. for all .
Assumptions 1–3 do not depend on a specific application; there are various applications that satisfy them (see Section 3.3). Condition (i) of Assumption 1 usually holds in pricing applications; the sales function is usually expressed as (the product of price and demand), so it can be differentiable w.r.t. and Lipschitz continuous when is bounded; the cost function is usually continuous w.r.t. since the production cost is usually continuous with respect to demand. Condition (ii) of Assumption 1 is satisfied by many distributions with a (statistical) parameter , where is a differentiable vector-valued function. [Footnote 4: For example, the probability density functions of normal and multinomial distributions satisfy condition (ii) of Assumption 1. Since the probability density functions of these distributions are differentiable with respect to their parameters (e.g., mean, variance), they are also differentiable w.r.t. from the differentiability of .] Condition (iii) of Assumption 1 means that when the probability of a given demand is small, the effect of price on that probability is also small. In our application examples presented in Section 3.3, the multinomial and truncated normal distributions parameterized by price satisfy these conditions. Assumption 2 is natural for practical pricing applications since price and demand ranges are usually bounded. Assumption 3 is satisfied if follows one of the major continuous probability distributions such as the normal and logistic distributions. In the next section, we show that our application examples satisfy Assumptions 1–3.
Remark.
Assumption 2 does not hold in the case of unconstrained price optimization, but we can assume for a sufficiently large in practice.
3.3 Application Examples
3.3.1 Multiproduct Pricing
We consider a variant of (Gallego and Wang, 2014; Zhang et al., 2018) in which a decision maker determines the prices of multiple products and there are buyers and products. Let be the price vector for the products. We assume buyers choose one product stochastically; each buyer chooses product with probability or does not choose any product with probability . [Footnote 5: Besides the multinomial logit model, various other models can be considered, such as the nested logit model (Gallego and Wang, 2014) and the generalized nested logit model (Zhang et al., 2018).] Here, and are positive constants that can be estimated from historical transaction data (Croissant, 2012). Let be a random vector, where represents the number of buyers not purchasing any product and for represents the number of sales of each product. Let and be real-valued functions representing the sales and costs of products, respectively. The following functions are possible for and :
Here, , , , , and are constants for each . The function represents the case where the cost rate varies with the number of sold products (which is also called economies of scale or diseconomies of scale).
The revenue-maximizing problem is as follows:
where the probability mass function of is . It can be written in the form of (P). [Footnote 6: If the functions and are linear w.r.t. , then , and the problem is a deterministic optimization as tackled in (Gallego and Wang, 2014; Zhang et al., 2018). Therefore, the problem can be regarded as a generalization of the problems in the previous studies in terms of sales and cost functions.]
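To make the demand model of this example concrete, the sketch below computes choice probabilities and samples one demand vector. It assumes the standard multinomial logit form, in which product i receives weight exp(gamma_i - beta_i * x_i) and the no-purchase option receives a constant weight a0; the exact parameterization used in the paper is not reproduced here, so the parameter names `gamma`, `beta`, and `a0` and all numerical values are hypothetical.

```python
import math
import random


def choice_probabilities(prices, gamma, beta, a0=1.0):
    """Multinomial logit probabilities; index 0 is the no-purchase option."""
    weights = [math.exp(g - b * x) for g, b, x in zip(gamma, beta, prices)]
    total = a0 + sum(weights)
    return [a0 / total] + [w / total for w in weights]


def sample_demand(prices, gamma, beta, n_buyers, rng, a0=1.0):
    """Draw one multinomial demand vector; entry 0 counts non-buyers."""
    probs = choice_probabilities(prices, gamma, beta, a0)
    counts = [0] * len(probs)
    for _ in range(n_buyers):
        # Each buyer independently picks one option.
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        counts[choice] += 1
    return counts


gamma = [1.0, 0.5]  # hypothetical product attractiveness
beta = [2.0, 1.0]   # hypothetical price sensitivities
probs_low = choice_probabilities([0.5, 0.5], gamma, beta)
probs_high = choice_probabilities([1.0, 0.5], gamma, beta)
demand = sample_demand([0.5, 0.5], gamma, beta, n_buyers=100,
                       rng=random.Random(0))
print(probs_low, probs_high, demand)
```

Raising the price of the first product lowers its own choice probability and shifts mass to the other options, so the whole demand distribution moves with the price vector, which is the decision dependence that the formulation is built around.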
The following proposition shows that this application satisfies our assumptions.
Proposition 1.
The proofs of this proposition and the others can be found in Appendix A.
Remark.
3.3.2 Congestion Pricing for HOT Lanes
We consider a stochastic variant of (Lou et al., 2011) in the following traffic situation [Footnote 7: Our method can be extended to more general situations, such as ones with many lanes.]: there are two lanes, a high-occupancy/toll (HOT) lane and a regular lane; drivers can only switch from the regular lane to the HOT lane. There is a decision maker determining a price of the HOT lane for each time interval . The purpose of the decision maker is (i) to maximize the total flow rate at the bottlenecks of the HOT and regular lanes and (ii) to prevent the density of vehicles at the switching point from exceeding a certain level (to avoid traffic accidents). Let be the number of homogeneous drivers in the regular lane in a time interval . Here, each driver in changes lane with a probability , where is a constant indicating the average time savings if a driver chooses the HOT lane at time . The parameters , , and are constants, which can be estimated in real time (Lou et al., 2011, Section 2.2).
The optimization problem is as follows (the details can be found in (Lou et al., 2011, Section 3)):
where is a random variable indicating the number of drivers switching their lanes in . Regarding the first term, the values of and are continuous functions representing flow rates at the bottlenecks on the HOT lane and the regular lane, respectively. This term aims to maximize the flow rate of each lane. Regarding the second term, is the critical density (the density likely to cause traffic accidents) of vehicles at the switching point, and is a continuous function representing the density at the switching point for the demand in . Therefore, is a penalty term for densities above the critical density, where is the penalty parameter. This optimization problem can be written in the form of (P), where , , and .
The following proposition shows that this application satisfies our assumptions.
3.3.3 Pricing with Demand Prediction from Limited Data Points
Here, we will consider optimizing prices of types of item. Regarding the prices , the demand of item is predicted using data points through the truncated Gaussian process (Swiler et al., 2020, Section 8.1):
Here, , , , and are respectively the function, vector, scalar, and matrix learned from the data points . The -th element of is defined by , where and are learned constants. The normalization function is the probability that a sample lies in . Here, for some . [Footnote 8: Given that the observations are subject to noise, it is natural to predict that the variance is more than or equal to a certain constant ().]
The revenue-maximizing problem is as follows:
where , , and is a continuous function for . represents the cost for item . This problem can be written in the form of (P), where for .
The following proposition shows that this application satisfies our assumptions.
3.4 Advantages of Our Formulation
3.4.1 Benefits of Using Decision-dependent Random Variables
The multiproduct pricing problem in Section 3.3.1 can also be expressed in terms of decision-independent random variables as follows. Each buyer has a value for each product , where and are constants, is the price, and is a random variable following a Gumbel distribution with mode 0 and variance . Each buyer purchases a product with the highest . Accordingly, the demand for product can be defined by , where and if with otherwise. The optimization problem can be written as follows by letting :
Although the multiproduct pricing problem can be formulated in the above manner with decision-independent random variables, the discontinuous function makes it difficult to optimize. [Footnote 9: Optimization problems involving such discontinuous functions have been addressed by Correa et al. (2017). They propose approximation methods to deal with this difficulty.] In contrast, our problem does not involve a discontinuous function, which allows us to use gradient-based methods.
Moreover, He et al. (2009) and Dong et al. (2017) tackle similar problems to ours by defining the demand for a product/service as , where is price and is a decision-independent random variable. However, assuming is decision-independent makes it impossible to handle situations where demand uncertainty varies with price. In contrast, our problem setting can deal with such a situation by using decision-dependent random variables.
3.4.2 Differences from Existing Pricing Problems with Decision-dependent Uncertainty
The existing formulations with decision-dependent uncertainty make assumptions specific to their applications. For example, Schulte and Sachs (2020) assume that demand follows a Poisson distribution with an intensity , and they cannot use a multinomial or truncated Gaussian distribution as the demand distribution. Moreover, since a fixed cost is charged on their products, they cannot handle a nonlinear cost. Hikima et al. (2021) assume that the probability density function for demand is , where is the probability that service user accepts the price ; they cannot use a truncated Gaussian distribution. In addition, they assume a specific objective function, which is defined by a bipartite matching problem with uncertainty.
In contrast to the existing formulations, ours has more varied applications because its assumptions are more general. The trade-off for this generality is that our problem is non-convex and that the dependence of the probability distribution on the decision variables prevents the direct use of conventional stochastic optimization theory. Below, we focus on finding a stationary point and develop a projected stochastic gradient descent method by deriving unbiased stochastic gradient estimators.
4 Proposed Method
4.1 Preliminaries
Definition 1 (Projection oracle).
Given a point , we define the following as a projection oracle:
Definition 2 (Unbiased stochastic gradient).
Given a point , we call an “unbiased stochastic gradient” if
Definition 3 (Gradient mapping).
Given a point and , the gradient mapping of (P) is defined by
Definition 4 (-stationary point).
We call an -stationary point for (P) if for some , where denotes the point returned by a stochastic algorithm.
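For intuition, the projection oracle and the gradient mapping can be sketched for the common case where the feasible region is a box of price bounds. The sketch assumes the usual form of the gradient mapping from the projected-gradient literature, namely the difference between the current point and the projected gradient step, divided by the step size; the helper names and the box-shaped feasible region are assumptions made for illustration.

```python
def project_box(x, lower, upper):
    """Euclidean projection onto the box [lower_i, upper_i] in each coordinate."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def gradient_mapping(x, grad, eta, lower, upper):
    """(x - Proj(x - eta * grad)) / eta for a box-shaped feasible region."""
    stepped = [xi - eta * gi for xi, gi in zip(x, grad)]
    x_next = project_box(stepped, lower, upper)
    return [(xi - xn) / eta for xi, xn in zip(x, x_next)]


# Interior point: the projection is inactive and the mapping equals the gradient.
interior = gradient_mapping([1.0], [2.0], 0.1, [0.0], [5.0])
# Boundary point with the gradient pointing out of the box: the mapping is zero,
# so the point counts as stationary although the gradient itself is non-zero.
boundary = gradient_mapping([0.0], [2.0], 0.1, [0.0], [5.0])
print(interior, boundary)
```

This is why stationarity for the constrained problem is measured through the gradient mapping rather than through the raw gradient.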
The following preliminary lemmas are needed for ensuring our method’s convergence when the random variables are continuous.
Lemma 5.
4.2 Unbiased Stochastic Gradient for (P)
First, we propose an unbiased stochastic gradient for (P).
Lemma 6.
Inspired by a technique called baseline in reinforcement learning (Williams, 1992; Sutton and Barto, 2018), we decided to include a variance reduction parameter in the unbiased stochastic gradient. If is close to , the second term of is small, and the variance of is reduced. We show how to determine in Section 4.3.
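The role of the variance reduction parameter can be checked on a toy problem where the expectation can be enumerated exactly. The sketch below is not the estimator of Lemma 6 verbatim, whose statement is not reproduced here; it uses the standard score-function form with a baseline from the reinforcement-learning literature, that is, the gradient of the loss plus the loss minus a constant c, multiplied by the gradient of the log-probability. The single-buyer Bernoulli model, the logistic purchase probability, and the constants `A` and `B` are hypothetical.

```python
import math

A, B = 9.0, 2.0  # hypothetical logit parameters of the purchase probability


def purchase_prob(x):
    return 1.0 / (1.0 + math.exp(-(A - B * x)))


def loss(x, xi):
    # Negative revenue from one buyer; xi is 1 (purchase) or 0 (no purchase).
    return -x * xi


def estimator(x, xi, c):
    """Score-function gradient estimate with baseline c."""
    p = purchase_prob(x)
    dp = -B * p * (1.0 - p)  # derivative of the purchase probability
    # Derivative of the log-probability of the observed outcome.
    score = dp / p if xi == 1 else -dp / (1.0 - p)
    return -xi + (loss(x, xi) - c) * score


def mean_and_variance(x, c):
    # Exact moments obtained by enumerating the two possible outcomes.
    p = purchase_prob(x)
    outcomes = [(1, p), (0, 1.0 - p)]
    mean = sum(w * estimator(x, xi, c) for xi, w in outcomes)
    var = sum(w * (estimator(x, xi, c) - mean) ** 2 for xi, w in outcomes)
    return mean, var


def exact_gradient(x):
    # Derivative of the expected loss -x * purchase_prob(x).
    p = purchase_prob(x)
    return -p + x * B * p * (1.0 - p)


x = 5.0
expected_loss = -x * purchase_prob(x)
for c in (0.0, expected_loss, 3.0):
    print(c, mean_and_variance(x, c), exact_gradient(x))
```

For every baseline value the mean of the estimator matches the exact gradient, so the baseline does not introduce bias. For these particular parameters, setting c to the expected loss gives a smaller variance than c = 0; this is an observation about the toy instance, not a general guarantee.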
The gradient in Lemma 6 has the following useful feature.
Lemma 7.
Lemma 7 shows that the variance of the stochastic gradient of Lemma 6 can be bounded by a constant. This is a necessary condition for stochastic gradient methods to have a convergence rate independent of the number of possible values of (Li and Li, 2018).
Moreover, the following lemma is necessary for ensuring the convergence of the proposed method.
4.3 Calculation of Variance Reduction Parameter
To reduce the variance of the gradient in Lemma 6, the parameter should be close to for the iterate . During the iterations of the algorithm, is updated to bring it closer to the target value. We consider the following sequential stochastic problem for : a decision maker selects at iteration and incurs an unobserved cost , where and are given [Footnote 10: is an arbitrarily small positive value. It extends the range of , which is needed in Proposition 9.]; for the decision maker, an unbiased estimate of , denoted by , is obtained by sampling. Here, we assume , which usually holds from the definition (1) of .
As a way to solve the above problem, we propose Algorithm 1, which is based on the online gradient descent (OGD) algorithm (Besbes et al., 2015).
This method updates by using the stochastic gradient since . Note that from the definition of . Accordingly, the following proposition holds from (Besbes et al., 2015, Lemma C-5), which guarantees that Algorithm 1 outputs a solution close to the optimum in terms of regret.
Proposition 9.
Let for , and let be the output of Algorithm 1 for . Then, there exists a constant such that
From this proposition and the definition of , we find that the output of Algorithm 1 is a reasonable approximation of .
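The update idea behind Algorithm 1 can be sketched as a projected online gradient step on a squared-error cost. The listing of Algorithm 1 is not reproduced here, so the following is only an illustration under stated assumptions: the cost is taken to be half the squared distance between the parameter and its target, the step size is 1/k, and the names `update_baseline`, `lower`, and `upper` are hypothetical.

```python
def update_baseline(c, sample, k, lower, upper):
    """One projected online gradient step on the cost 0.5 * (c - target)**2.

    `sample` is an unbiased estimate of the target, so (c - sample) is a
    stochastic gradient of the cost. The 1/k step size is an assumption.
    """
    step = 1.0 / k
    c_new = c - step * (c - sample)
    # Keep the parameter inside its admissible interval.
    return min(max(c_new, lower), upper)


c = 0.0
for k, sample in enumerate([1.0, 2.0, 3.0, 4.0], start=1):
    c = update_baseline(c, sample, k, lower=-10.0, upper=10.0)
print(c)
```

With this step size and an inactive projection, the update reduces to a running average of the sampled values, which gives a simple way to see why the parameter tracks the target.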
4.4 Proposed Algorithm
We propose Algorithm 2 for solving problem (P). It incorporates our stochastic gradient and Algorithm 1 into a projected stochastic gradient method (Ghadimi and Lan, 2016, Algorithm 4). Lines 5–9 update the iterate on the basis of (Ghadimi and Lan, 2016, Algorithm 4) by using our proposed stochastic gradient. Line 10 updates the variance reduction parameter on the basis of Algorithm 1 by letting be . Note that line 10 does not impose any additional computation cost since is already computed on line 7.
Then, from Lemmas 6–8 and (Ghadimi and Lan, 2016, Corollary 6), the following convergence theorem holds.
Theorem 10.
Suppose that Assumptions 1 and 2 hold. Moreover, suppose that Assumption 3 holds if is a continuous random vector. Let the inputs of Algorithm 2 be , and for where is some parameter, , , and for . Let for . Then,
where . Consequently, to obtain an -stationary point of Definition 4, we need at most iterations.
The parameter in Theorem 10 determines the balance between the minibatch size and the iteration complexity: a small results in smaller iteration complexity but a larger minibatch size; a large leads to a smaller minibatch size but larger iteration complexity.
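The overall loop (minibatch sampling at the current price, a stochastic gradient with a baseline, a projected step, and a baseline update) can be sketched on a one-dimensional toy problem. This is not Algorithm 2 itself, which follows (Ghadimi and Lan, 2016, Algorithm 4) and whose listing is not reproduced here. The sketch assumes a single product, a hypothetical logistic purchase probability with constants `A` and `B`, binomial demand from `M` buyers, a constant step size, and a running-average baseline.

```python
import math
import random

A, B, M = 2.0, 1.0, 10  # hypothetical logit parameters and number of buyers


def purchase_prob(x):
    return 1.0 / (1.0 + math.exp(-(A - B * x)))


def sample_demand(x, rng):
    # Binomial demand: each of the M buyers purchases independently.
    p = purchase_prob(x)
    return sum(1 for _ in range(M) if rng.random() < p)


def stochastic_gradient(x, xi, c):
    # Loss is -x * xi. For binomial demand with a logistic probability, the
    # derivative of the log-probability of xi reduces to -B * (xi - M * p).
    p = purchase_prob(x)
    score = -B * (xi - M * p)
    return -xi + (-x * xi - c) * score


def projected_sgd(x0, lower, upper, step=0.05, batch=64, iters=300, seed=0):
    rng = random.Random(seed)
    x, c = x0, 0.0
    history = []
    for k in range(1, iters + 1):
        demands = [sample_demand(x, rng) for _ in range(batch)]
        grad = sum(stochastic_gradient(x, xi, c) for xi in demands) / batch
        mean_loss = sum(-x * xi for xi in demands) / batch
        # Projected gradient step onto the price interval [lower, upper].
        x = min(max(x - step * grad, lower), upper)
        # Baseline update reuses the losses already computed for the minibatch.
        c = c - (1.0 / k) * (c - mean_loss)
        history.append(x)
    return history


history = projected_sgd(x0=0.5, lower=0.0, upper=5.0)
average_price = sum(history[-100:]) / 100
print(average_price)
```

For these constants the expected revenue x * M * purchase_prob(x) has its stationary point at x = 2, where x * B * (1 - purchase_prob(x)) = 1, so the late iterates should settle near that price. The minibatch size and step size used here are arbitrary choices and do not follow the schedule of Theorem 10.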
Bottleneck of Algorithm 2.
4.5 Specialized Projected Stochastic Gradient Method for Price Optimization in Multi-agent Applications
To reduce the computation cost at the bottleneck of Algorithm 2, we propose a specialized projected stochastic gradient method that adds the following assumptions to (P).
Assumption 4.
and is continuous.
Assumption 5.
The probability density function is defined as , where is real-valued and differentiable w.r.t. , is easily computed, and is vector-valued and differentiable.
The above assumptions are often satisfied in price optimization for multi-agent applications. In particular, Assumption 4 tends to hold because the sales function in price optimization is usually linear with respect to and the cost function is usually continuous with respect to . Assumption 5 holds for many parameterized distributions (e.g., binomial, multinomial, and Poisson distributions) since the probability density function and its gradient can be simply written by its parameters. Many multi-agent applications satisfy Assumption 5 because the distribution of the demand follows a binomial or multinomial distribution with parameters , which represents the probabilities of each agent’s actions.
The following propositions show that the applications with multiple agents described in Section 3.3 satisfy Assumptions 4 and 5.
Proposition 11.
The problem of multiproduct pricing satisfies Assumption 4. Moreover, let Then, and
Proposition 12.
The problem of congestion pricing for HOT lanes satisfies Assumption 4. Moreover, let Then, and
Below, we present the lemmas for our specialized method under Assumptions 1–5. Let , which exists since is compact from Assumption 2 and is continuous from Assumption 4.
Lemma 13.
Lemma 14.
Now, let us examine Algorithm 3. Its computational cost is lower than that of Algorithm 2: Algorithm 2 requires calculations of , whereas Algorithm 3 requires calculations of , which can be easily computed from Assumption 5.
5 Experiments
We conducted experiments on an application of multiproduct pricing to show that Algorithm 3 outputs solutions with higher total revenues compared with the existing methods. We performed synthetic experiments and simulation experiments with real retail data from a supermarket service provider in Japan. [Footnote 11: We used publicly available data, “New Product Sales Ranking”, provided by KSP-SP Co., Ltd, http://www.ksp-sp.com.] The details of our experiments are in Appendix B.
We implemented the following methods.
Proposed Method: We implemented Algorithm 3 with , , and , where is the current iteration number and is the number of buyers.
Proposed Method (fixed ): This is the proposed method with a fixed from the information at the initial iterate .
Specifically, was set to for all , where .
Proposed Method ():
This is the proposed method with set to zero.
L2-Regularized Repeated Gradient Descent (L2-RGD()) (Perdomo et al., 2020, Appendix E):
This method applies a repeated gradient descent (Perdomo et al., 2020, Section 3.3) to the objective function with a regularization term , where is the initial point. (This regularization is introduced in (Perdomo et al., 2020, Appendix E) as a remedy for the retraining method for non-strongly convex objective functions.)
Note that the retraining method was originally intended for strongly convex objective functions.
We implemented this method for several .
Bayesian Optimization (BO) (GPyOpt-authors, 2016):
This method sequentially searches for points where the objective value is likely to be small and outputs the solution with the lowest objective value among the evaluated points.
We used GPyOpt, a Python open-source library for Bayesian optimization (GPyOpt-authors, 2016).
Simultaneous Perturbation Stochastic Approximation (SPSA) (Spall, 2005, 1998):
This method updates the current iterate by using an approximated gradient, which is calculated from the difference between the objective values at two perturbed iterates.
Projected Sub-gradient Descent for Average Demand (PSD-AD) (Boyd et al., 2003, Section 3):
This is a projected subgradient descent method for deterministic pricing problems with average demand.
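The behavior of the L2-RGD baseline described above can be illustrated with a small sketch. It assumes a single product, a hypothetical logistic binomial demand model, the objective `-x * xi`, a regularization term of the form `(lam / 2) * (x - x0) ** 2`, and a single demand sample per iteration; these choices and all names are illustrative assumptions rather than the exact baseline configuration. The key point is that the update uses only the partial derivative of the objective with respect to the price and ignores how the price changes the demand distribution.

```python
import math
import random

def purchase_prob(price, ref_price=1.0, sensitivity=4.0):
    """Hypothetical logistic purchase probability."""
    return 1.0 / (1.0 + math.exp(sensitivity * (price - ref_price)))

def sample_demand(price, num_buyers, rng):
    """Binomial demand drawn from the distribution induced by the current price."""
    p = purchase_prob(price)
    return sum(1 for _ in range(num_buyers) if rng.random() < p)

def expected_revenue(price, num_buyers):
    """Exact expected revenue under the toy demand model."""
    return price * num_buyers * purchase_prob(price)

def l2_rgd(x0, lam, step, num_iters, lo, hi, num_buyers, rng):
    """Repeated gradient descent with L2 regularization toward the initial point."""
    x = x0
    for _ in range(num_iters):
        xi = sample_demand(x, num_buyers, rng)
        # Gradient of -x * xi + (lam / 2) * (x - x0)^2 with xi treated as fixed:
        # the dependence of the demand distribution on x is ignored.
        grad = -xi + lam * (x - x0)
        x = min(max(x - step * grad, lo), hi)  # projection onto [lo, hi]
    return x

rng = random.Random(0)
x0, num_buyers = 1.0, 50
x_final = l2_rgd(x0, lam=0.1, step=0.01, num_iters=2000,
                 lo=0.5, hi=3.0, num_buyers=num_buyers, rng=rng)
print(f"final price {x_final:.2f}")
print(f"expected revenue at x0: {expected_revenue(x0, num_buyers):.2f}")
print(f"expected revenue at final price: {expected_revenue(x_final, num_buyers):.2f}")
```

Because the partial derivative `-xi` is never positive, the price keeps rising until the regularization term balances the vanishing demand, and the resulting expected revenue is far below that of the starting point in this toy model.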
We performed our experiments under the following settings.
Initial points.
For all methods other than BO, we set the initial points as , where is a vector with all elements equal to .
BO first evaluates five random points; then it runs the Bayesian optimization.
Metric.
We computed for each iterate , where , and defined the smallest value among the iteration points as the Negative Expected Revenue (NER).
Termination criteria.
We terminated all methods at a maximum computation time of seconds.
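The NER metric can be sketched as follows, assuming that the value computed for each iterate is a sample average of the objective over demand draws from the distribution induced by that iterate. The toy sampler, the toy objective, and all function names are hypothetical and serve only to make the sketch runnable.

```python
import random

def sample_average(x, objective, sampler, num_samples, rng):
    """Monte Carlo estimate of E[objective(x, xi)] with xi drawn at decision x."""
    total = 0.0
    for _ in range(num_samples):
        total += objective(x, sampler(x, rng))
    return total / num_samples

def negative_expected_revenue(iterates, objective, sampler, num_samples, rng):
    """Smallest estimated objective value among all iterates."""
    return min(sample_average(x, objective, sampler, num_samples, rng)
               for x in iterates)

def toy_sampler(price, rng, num_buyers=10):
    """Hypothetical binomial demand with purchase probability 1 - price / 2."""
    p = max(0.0, min(1.0, 1.0 - price / 2.0))
    return sum(1 for _ in range(num_buyers) if rng.random() < p)

def toy_objective(price, demand):
    """Negative revenue."""
    return -price * demand

rng = random.Random(0)
iterates = [0.5, 1.0, 1.5]
ner = negative_expected_revenue(iterates, toy_objective, toy_sampler, 5000, rng)
print(f"NER estimate: {ner:.2f}")
```

In this toy model the exact expected objective values at the three iterates are -3.75, -5, and -3.75, so the estimate should be close to -5.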
5.1 Synthetic Experiments
Synthetic Parameter Setup.
We performed experiments by varying each parameter from the following default settings. We set and , which are the numbers of products and buyers, respectively. For each product, we let the minimum price be and the maximum price be . For the parameters of the function , we generated for each from a uniform distribution of , and we let and let . For the parameters of the function for each , we set , , and , where was generated from a uniform distribution of . We let and . We then varied and under these default settings.
| (, ) | Proposed NER | Proposed SD | Proposed (fixed ) NER | Proposed (fixed ) SD | Proposed () NER | Proposed () SD | L2-RGD() NER | L2-RGD() SD | L2-RGD() NER | L2-RGD() SD | L2-RGD() NER | L2-RGD() SD | BO NER | BO SD | SPSA NER | SPSA SD | PSD-AD NER | PSD-AD SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | -56.3 | 5.4 | -54.9 | 5.5 | -54.7 | 5.2 | -28.0 | 10.6 | -28.2 | 10.7 | -28.2 | 10.7 | -22.3 | 6.2 | -33.7 | 17.6 | -45.9 | 8.9 |
| | -55.4 | 8.4 | -54.5 | 8.8 | -54.4 | 8.3 | -8.2 | 23.0 | -9.8 | 23.2 | -11.9 | 23.4 | -34.4 | 10.6 | -46.8 | 13.4 | -31.8 | 13.9 |
| | -56.6 | 3.2 | -54.7 | 3.5 | -54.1 | 3.4 | -24.5 | 9.5 | -24.4 | 9.5 | -24.4 | 9.6 | -14.6 | 3.9 | -1.7 | 13.0 | -47.4 | 3.7 |
| | -26.9 | 3.7 | -26.2 | 3.8 | -26.1 | 3.7 | -8.8 | 6.8 | -8.8 | 6.8 | -8.8 | 6.8 | -11.7 | 4.0 | -4.6 | 6.3 | -20.4 | 5.1 |
| | -106.9 | 14.2 | -104.4 | 14.6 | -103.4 | 14.0 | -38.5 | 28.5 | -38.5 | 28.2 | -39.2 | 26.8 | -37.8 | 7.9 | -36.4 | 17.5 | -79.4 | 21.0 |
Experimental Results
Table 1 shows the results of the synthetic experiments with different parameter values. The proposed method outperformed the baselines in terms of NER for all parameter settings, for the following reasons: (i) Proposed (fixed ) and Proposed () converged to low-quality local solutions because the variance of the gradient was larger than that of the proposed method; (ii) L2-RGD continued to increase prices without considering the effect of prices on the probability distribution, as shown in Example 1 in Section 2.2, which led to unreasonably high prices; (iii) BO did not adequately explore because it took a lot of time to evaluate the objective value at each search point; (iv) SPSA did not accurately estimate the gradient because the noise in the gradient was too large; (v) PSD-AD ignored demand uncertainty, which increases the objective value since over/under demand occurs stochastically and causes unprofitable costs.
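Reason (v) can be illustrated with a small exact calculation. The sketch below assumes a hypothetical absolute mismatch cost between demand and a fixed capacity and binomial demand; the cost function, the capacity, and all names are illustrative assumptions, not the cost function used in the experiments. It compares the cost evaluated at the average demand with the expected cost under the full demand distribution.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes among n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mismatch_cost(demand, capacity):
    """Hypothetical over/under-demand cost: absolute gap to capacity."""
    return abs(demand - capacity)

n, p, capacity = 50, 0.5, 25

# Deterministic view: plug the average demand into the cost.
cost_at_mean = mismatch_cost(n * p, capacity)

# Stochastic view: average the cost over the demand distribution.
expected_cost = sum(binomial_pmf(k, n, p) * mismatch_cost(k, capacity)
                    for k in range(n + 1))

print(f"cost at average demand: {cost_at_mean:.2f}")
print(f"expected cost:          {expected_cost:.2f}")
```

The deterministic view reports zero cost because the average demand matches the capacity exactly, whereas the expected cost is strictly positive. A method that optimizes against average demand therefore underestimates the cost it will actually incur.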
5.2 Simulation Experiments with Real Data
Data Set and Parameter Setup
We used retail data from a supermarket service provider in Japan. This data records the average sales prices of top-selling new products in food supermarkets. We targeted sales data for different confectionery products for randomly selected weeks from 2022. We set the recorded average selling price as the general value for each product . The other parameters were set the same as in the synthetic experiment. Since the parameter for each was generated randomly, experiments were performed on 20 problem instances for each week’s data.
| date | Proposed NER | Proposed SD | Proposed (fixed ) NER | Proposed (fixed ) SD | Proposed () NER | Proposed () SD | L2-RGD() NER | L2-RGD() SD | L2-RGD() NER | L2-RGD() SD | L2-RGD() NER | L2-RGD() SD | BO NER | BO SD | SPSA NER | SPSA SD | PSD-AD NER | PSD-AD SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 02/21–02/27 | -28.1 | 1.0 | -21.5 | 1.4 | -25.3 | 1.0 | 13.8 | 18.2 | 8.8 | 19.0 | 2.2 | 16.3 | -8.5 | 2.1 | 10.3 | 8.7 | -9.0 | 2.5 |
| 03/21–03/27 | -20.6 | 0.7 | -20.1 | 1.0 | -18.5 | 1.0 | -7.5 | 3.4 | -7.5 | 3.4 | -7.6 | 3.4 | -4.4 | 0.7 | -10.4 | 3.3 | -17.7 | 1.8 |
| 05/23–05/29 | -22.6 | 0.9 | -17.8 | 1.8 | -20.2 | 1.0 | 12.4 | 7.1 | 12.3 | 7.1 | 12.1 | 7.9 | -6.1 | 1.6 | -1.2 | 6.6 | -10.2 | 3.2 |
| 06/20–06/26 | -32.3 | 2.1 | -21.6 | 3.6 | -28.8 | 2.4 | 79.2 | 39.5 | 55.0 | 58.9 | 53.3 | 57.9 | -14.1 | 4.1 | 30.6 | 15.4 | -8.1 | 5.6 |
| 08/08–08/14 | -33.6 | 0.9 | -31.7 | 1.0 | -31.2 | 1.0 | -24.5 | 3.8 | -24.5 | 3.7 | -24.6 | 3.7 | -7.2 | 1.8 | -11.1 | 3.4 | -29.4 | 1.6 |
| 09/19–09/25 | -31.3 | 1.5 | -23.9 | 3.4 | -28.5 | 1.8 | 0.0 | 24.4 | -6.4 | 22.1 | -10.3 | 18.5 | -9.8 | 2.3 | 11.6 | 7.4 | -13.5 | 5.3 |
| 12/05–12/11 | -73.0 | 3.2 | -66.0 | 3.9 | -71.1 | 3.4 | 172.3 | 30.7 | 152.6 | 44.0 | 146.2 | 37.2 | -37.9 | 7.0 | 72.5 | 22.3 | -28.9 | 10.8 |
Experimental Results
Table 2 shows the results of the experiments using real data from different weeks. The proposed method was superior to the baselines in terms of NER for all weeks of data.
6 Conclusion
We formulated a new price optimization problem with decision-dependent uncertainty to address the drawbacks of existing formulations that (i) cannot deal with decision-dependent demand uncertainty, (ii) require discontinuous functions to define buyers’ discrete actions, or (iii) have limited applications due to specific assumptions. Moreover, we developed a projected stochastic gradient descent method by deriving an unbiased stochastic gradient with a variance reduction parameter. Our method is guaranteed to converge to an -stationary point. Synthetic experiments and simulation experiments with real data confirmed the effectiveness of our formulation and method.
Our formulation and results suggest directions for further research. The first is to construct methods to find a globally optimal solution rather than a stationary point (e.g., incorporating multi-start techniques (György and Kocsis, 2011) into our methods or building fast Bayesian optimization under more specific assumptions). The second is to analyze the performance of our method when some of our assumptions are relaxed, for example, when the probability density function is not differentiable and is smoothed with existing techniques.
References
- Basciftci et al. [2021] B. Basciftci, S. Ahmed, and S. Shen. Distributionally robust facility location problem under decision-dependent stochastic demand. European Journal of Operational Research, 292(2):548–561, 2021.
- Bertsimas and de Boer [2005] D. Bertsimas and S. de Boer. Special issue papers: Dynamic pricing and inventory control for multiple products. Journal of Revenue and Pricing Management, 3:303–319, 2005.
- Besbes et al. [2015] O. Besbes, Y. Gur, and A. Zeevi. Non-stationary stochastic optimization. Operations research, 63(5):1227–1244, 2015.
- Boyd et al. [2003] S. Boyd, L. Xiao, and A. Mutapcic. Subgradient methods. lecture notes of EE392o, Stanford University, Autumn Quarter, 2004:2004–2005, 2003.
- Brochu et al. [2010] E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
- Chawla et al. [2010] S. Chawla, J. D. Hartline, D. L. Malec, and B. Sivan. Multi-parameter mechanism design and sequential posted pricing. In STOC, pages 311–320, 2010.
- Chen and Mangasarian [1996] C. Chen and O. L. Mangasarian. A class of smoothing functions for nonlinear and mixed complementarity problems. Computational Optimization and Applications, 5(2):97–138, 1996.
- Chen [2012] X. Chen. Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1):71–99, 2012.
- Correa et al. [2017] J. Correa, P. Foncea, R. Hoeksma, T. Oosterwijk, and T. Vredeveld. Posted price mechanisms for a random stream of customers. In EC, pages 169–186, 2017.
- Croissant [2012] Y. Croissant. Estimation of multinomial logit models in r: The mlogit packages. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=03dbc1728d3860d239132b5af95367d4a5b273c3, 2012.
- Dong et al. [2017] C. Dong, C. T. Ng, and T. Cheng. Electricity time-of-use tariff with stochastic demand. Production and Operations Management, 26(1):64–79, 2017.
- Ferreira et al. [2016] K. J. Ferreira, B. H. A. Lee, and D. Simchi-Levi. Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing & Service Operations Management, 18(1):69–88, 2016.
- Flaxman et al. [2005] A. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, pages 385–394, 2005.
- Frazier [2018] P. I. Frazier. A tutorial on bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
- Gallego and Wang [2014] G. Gallego and R. Wang. Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities. Operations Research, 62(2):450–461, 2014.
- Ghadimi and Lan [2016] S. Ghadimi and G. Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1):59–99, 2016.
- GPyOpt-authors [2016] T. GPyOpt-authors. Gpyopt: A bayesian optimization framework in python. http://github.com/SheffieldML/GPyOpt, 2016.
- György and Kocsis [2011] A. György and L. Kocsis. Efficient multi-start strategies for local search algorithms. Journal of Artificial Intelligence Research, 41:407–444, 2011.
- He et al. [2009] Y. He, X. Zhao, L. Zhao, and J. He. Coordinating a supply chain with effort and price dependent stochastic demand. Applied Mathematical Modelling, 33(6):2777–2790, 2009.
- Hellemo et al. [2018] L. Hellemo, P. I. Barton, and A. Tomasgard. Decision-dependent probabilities in stochastic programs with recourse. Computational Management Science, 15(3):369–395, 2018.
- Heydari and Norouzinasab [2015] J. Heydari and Y. Norouzinasab. A two-level discount model for coordinating a decentralized supply chain considering stochastic price-sensitive demand. Journal of Industrial Engineering International, 11:531–542, 2015.
- Hikima et al. [2021] Y. Hikima, Y. Akagi, H. Kim, M. Kohjima, T. Kurashima, and H. Toda. Integrated optimization of bipartite matching and its stochastic behavior: New formulation and approximation algorithm via min-cost flow optimization. In AAAI, pages 3796–3805, 2021.
- Hikima et al. [2022] Y. Hikima, Y. Akagi, N. Marumo, and H. Kim. Online matching with controllable rewards and arrival probabilities. In IJCAI, pages 1825–1833, 2022.
- Hikima et al. [2023] Y. Hikima, Y. Akagi, H. Kim, and T. Asami. An improved approximation algorithm for wage determination and online task allocation in crowd-sourcing. In AAAI, pages 3977–3986, 2023.
- Koushik et al. [2012] D. Koushik, J. A. Higbie, and C. Eister. Retail price optimization at intercontinental hotels group. Interfaces, 42(1):45–57, 2012.
- Li and Li [2018] Z. Li and J. Li. A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. In NeurIPS, pages 5569–5579, 2018.
- Lou et al. [2011] Y. Lou, Y. Yin, and J. A. Laval. Optimal dynamic pricing strategies for high-occupancy/toll lanes. Transportation Research Part C: Emerging Technologies, 19(1):64–74, 2011.
- Luo and Mehrotra [2020] F. Luo and S. Mehrotra. Distributionally robust optimization with decision dependent ambiguity sets. Optimization Letters, 14:2565–2594, 2020.
- Mendler-Dünner et al. [2020] C. Mendler-Dünner, J. Perdomo, T. Zrnic, and M. Hardt. Stochastic optimization for performative prediction. In NeurIPS, pages 4929–4939, 2020.
- Miller et al. [2021] J. P. Miller, J. C. Perdomo, and T. Zrnic. Outside the echo chamber: Optimizing the performative risk. In ICML, pages 7710–7720, 2021.
- Perdomo et al. [2020] J. Perdomo, T. Zrnic, C. Mendler-Dünner, and M. Hardt. Performative prediction. In ICML, pages 7599–7609, 2020.
- Royden and Fitzpatrick [1988] H. L. Royden and P. Fitzpatrick. Real analysis, volume 32. Macmillan New York, 1988.
- Schulte and Sachs [2020] B. Schulte and A.-L. Sachs. The price-setting newsvendor with poisson demand. European Journal of Operational Research, 283(1):125–137, 2020.
- Spall [1998] J. C. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on aerospace and electronic systems, 34(3):817–823, 1998.
- Spall [2005] J. C. Spall. Introduction to stochastic search and optimization: estimation, simulation, and control. John Wiley & Sons, 2005.
- Sutton and Barto [2018] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
- Swiler et al. [2020] L. P. Swiler, M. Gulian, A. L. Frankel, C. Safta, and J. D. Jakeman. A survey of constrained gaussian process regression: Approaches and implementation challenges. Journal of Machine Learning for Modeling and Computing, 1(2), 2020.
- Varaiya and Wets [1989] P. Varaiya and R. J. Wets. Stochastic dynamic optimization, approaches and computation. In Mathematical Programming, Recent Developments and Applications, 1989.
- Wang and Wang [2019] X.-z. Wang and G.-q. Wang. Integrating dynamic pricing and inventory control for fresh-agri product under consumer choice. Australian Economic Papers, 58(1):96–111, 2019.
- Williams [1992] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8:229–256, 1992.
- Zhang et al. [2018] H. Zhang, P. Rusmevichientong, and H. Topaloglu. Multiproduct pricing under the generalized extreme value models with homogeneous price sensitivity parameters. Operations Research, 66(6):1559–1570, 2018.
Appendix A Proofs
A.1 Proof of Proposition 1
Proof.
Assumption 2 holds since and .
It remains to prove conditions (i)–(iii) of Assumption 1.
(i) From definitions of and , the function is differentiable w.r.t. and continuous w.r.t. for all
and .
Moreover,
,
where the second inequality is due to the fact that the total demand for all products never exceeds the number of buyers.
Therefore, is Lipschitz continuous with modulus .
(ii) Since , the function is differentiable w.r.t. and for all and from the definition of for each .
(iii) We have for all and from the definition of for each . Then, since , we have
(2) |
Let for all . Then, . For ,
(3) |
For and ,
(4) |
For ,
(5) |
where the first inequality follows from for all . The second inequality follows from the definition whereby is equal to the number of buyers. Then, for all and , . ∎
A.2 Proof of Proposition 2
Proof.
Assumption 2 holds since and for all .
It remains to prove conditions (i)–(iii) of Assumption 1.
(i) From the definition, the value of is independent of .
Therefore, is differentiable and Lipschitz continuous with modulus w.r.t. .
Moreover, is continuous w.r.t. from the definition since , , and are continuous functions.
(ii) Since , is differentiable w.r.t. from the definition of . Moreover, since for all and , for all .
A.3 Proof of Proposition 3
Proof.
Assumption 2 holds since and .
Moreover, Assumption 3 holds since
is a Borel set on and is continuous w.r.t. for all from the definition of .
We now prove each condition of Assumption 1.
(i) From definitions of and , is differentiable w.r.t. and continuous w.r.t. for all and . Moreover, .
Therefore, is Lipschitz continuous with modulus .
(ii) is differentiable w.r.t. and for all and from definitions of , , and .
(iii) Let and . Then, . Therefore,
(8) |
Here,
(9) |
For all ,
(10) |
where the second inequality comes from (9) and the fact that and for and .
A.4 Proof of Lemma 4
Proof.
For a given , let be a sequence of scalars such that and , where is a vector such that the -th element is and other elements are . Let . There exists such that from the mean-value theorem. Moreover, exists since and are compact from Assumption 2 and is a real-valued continuous function from Assumption 1. Then, for all and ,
where the first inequality comes from , and the second inequality follows from conditions (i) and (iii) of Assumption 1. Here, is measurable on since is a Borel set and is continuous w.r.t. from Assumption 3 and the definitions of and . The constant function is integrable over . Moreover, pointwise when since is differentiable w.r.t. from conditions (i) and (ii) of Assumption 1. Then, the Lebesgue dominated convergence theorem [Royden and Fitzpatrick, 1988, Chapter 4.4, page 88] holds for for all and , that is,
Then, for all and ,
Therefore, for all ,
∎
A.5 Proof of Lemma 5
Proof.
For given , let be a sequence of scalars such that and , where is a vector such that the -th element is and other elements are . Let . There exists such that from the mean-value theorem. Moreover, let , which exists since is compact from Assumption 2 and is a real-valued continuous function. Then, for all and ,
where the first inequality follows from . The second inequality comes from condition (iii) of Assumption 1. Here, is measurable on since is a Borel set and is continuous w.r.t. from Assumption 3 and the definition of . The constant function is integrable over . Moreover, pointwise when since is differentiable w.r.t. from condition (ii) of Assumption 1. Then, the Lebesgue dominated convergence theorem [Royden and Fitzpatrick, 1988, Chapter 4.4, page 88] holds for for all and , that is,
Then, for all and ,
(12) |
Therefore, for all ,
∎
A.6 Proof of Lemma 6
Proof.
We have
(13) |
where the second equality comes from Lemma 5 with . Then,
Here, the third equality obviously holds when is a discrete random vector. If is a continuous random vector, the third equality follows from Lemma 4 since Assumptions 1–3 hold. The fourth equality is due to the fact that and are differentiable w.r.t. from conditions (i) and (ii) of Assumption 1. The fifth equality is due to the fact that from condition (ii) of Assumption 1. The seventh equality comes from (13). Then, Lemma 6 holds from Definition 2. ∎
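The reason a variance reduction parameter does not bias the gradient can be seen from a short calculation. The notation below is assumed for illustration and may differ from the symbols used elsewhere in this paper: $x$ is the price vector, $\xi$ is the demand with finite support $\Xi$, $\Pr(\xi \mid x)$ is its probability mass function (positive and differentiable in $x$), $f(x,\xi)$ is the objective integrand, and $b$ is any quantity that does not depend on $\xi$.

```latex
\begin{align*}
\sum_{\xi \in \Xi} b \, \nabla_x \Pr(\xi \mid x)
  &= b \, \nabla_x \sum_{\xi \in \Xi} \Pr(\xi \mid x)
   = b \, \nabla_x 1 = 0, \\
\nabla_x \mathbb{E}_{\xi \sim \Pr(\cdot \mid x)}\bigl[f(x,\xi)\bigr]
  &= \sum_{\xi \in \Xi} \Bigl( \nabla_x f(x,\xi) \, \Pr(\xi \mid x)
     + f(x,\xi) \, \nabla_x \Pr(\xi \mid x) \Bigr) \\
  &= \sum_{\xi \in \Xi} \Bigl( \nabla_x f(x,\xi) \, \Pr(\xi \mid x)
     + \bigl(f(x,\xi) - b\bigr) \nabla_x \Pr(\xi \mid x) \Bigr) \\
  &= \mathbb{E}_{\xi \sim \Pr(\cdot \mid x)}\Bigl[ \nabla_x f(x,\xi)
     + \bigl(f(x,\xi) - b\bigr) \nabla_x \log \Pr(\xi \mid x) \Bigr].
\end{align*}
```

The first line shows that the term involving $b$ sums to zero, so subtracting it in the third line leaves the gradient unchanged; the last line uses $\nabla_x \Pr = \Pr \, \nabla_x \log \Pr$. For continuous demand the same steps require exchanging the gradient and the integral, which is what Lemmas 4 and 5 provide.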
A.7 Proof of Lemma 7
A.8 Proof of Lemma 8
A.9 Proof of Proposition 9
Proof.
We show that our problem satisfies the assumptions of [Besbes et al., 2015, Lemma C-5]. First, we show that in our problem is included in defined by [Besbes et al., 2015, Section 5]. is a class of sequences of convex cost functions from into , where is convex, compact, and non-empty. Moreover, and satisfy the following conditions for all :
1. There is a finite number such that and for all .
2. There is some such that , where .
3. There are finite numbers and such that , where is the -dimensional identity matrix.
We consider the case of and . Accordingly, in our problem is included in since the following holds for any and :
A.10 Proof of Theorem 10
Proof.
When , the output of Algorithm 2 is included in from , , and line 10 of Algorithm 2. Therefore, for all from . From Lemmas 6-8, Assumption 2, and [Ghadimi and Lan, 2016, Corollary 6], we have
Here, in [Ghadimi and Lan, 2016, Corollary 6], we let , , and . Then, to obtain an -stationary point, we need the iteration number such that
(14) |
Eq. (14) can be reformulated as
Therefore, the sufficient condition for (14) is as follows:
∎
A.11 Proof of Proposition 11
Proof.
Assumption 4 holds since is continuous from the definition and
.
Moreover, since for all and from the definition of for each ,
.
∎
A.12 Proof of Proposition 12
Proof.
Assumption 4 holds since is continuous and . Moreover, since for all from the definition of , we have
∎
A.13 Proof of Lemma 13
Proof.
It follows from the definition of that
(15) |
where the third equality comes from Lemma 5 with . Since
from Assumptions 4 and 5, we have
Here, the third equality holds when is a discrete random vector. When is a continuous random vector, the third equality comes from Assumption 3 and Lemma 5 by letting . The fourth equality is due to the fact that from condition (ii) of Assumption 1. The fifth equality comes from (15). ∎
A.14 Proof of Lemma 14
A.15 Proof of Theorem 15
Proof.
When , the output of Algorithm 3 is included in from , , and the update rule for in Algorithm 3. Therefore, for all from . From Lemmas 8, 13, 14, and [Ghadimi and Lan, 2016, Corollary 6], we have
Here, in [Ghadimi and Lan, 2016, Corollary 6], we let , , and . Then, as in the proof of Theorem 10, we need the iteration number to obtain an -stationary point such that:
∎
Appendix B Details of our experiments
B.1 Common Settings
All experiments were conducted on a computer with an AMD EPYC 7413 24-Core Processor, 503.6 GiB of RAM, and Ubuntu 20.04.6 LTS. The program code was implemented in Python 3.8.3.
B.2 Settings of Baselines
L2-Regularized Repeated Gradient Descent (L2-RGD()): This method is described in Section 2.2. We used the fixed step size at each iteration .
Bayesian Optimization (BO): We used GPyOpt, a Python open-source library for Bayesian optimization [GPyOpt-authors, 2016]. We used the default setting of the library for parameters other than the termination criteria.
Simultaneous Perturbation Stochastic Approximation (SPSA): At each iteration , this method updates the current iterate by using the stochastic perturbation gradient:
where , each element of is sampled from a Rademacher distribution (i.e., it takes the value +1 or −1, each with probability 1/2), and and are random vectors sampled from the distribution . We set as the step size at each iteration. The settings of , , and are based on [Spall, 1998, Section III].
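The SPSA update described above can be sketched in a few lines. This is a generic illustration rather than the exact baseline configuration: the gain sequences `a / (k + 1 + big_a) ** alpha` and `c / (k + 1) ** gamma` with `alpha = 0.602` and `gamma = 0.101` follow values commonly recommended by Spall, while the remaining constants, the box projection, the noisy quadratic test objective, and all function names are assumptions made for illustration.

```python
import random

def project_box(x, lo, hi):
    """Project each coordinate onto the interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]

def spsa_minimize(noisy_objective, x0, lo, hi, num_iters, rng,
                  a=0.1, c=0.1, big_a=10.0, alpha=0.602, gamma=0.101):
    """Projected SPSA with Rademacher perturbations."""
    x = list(x0)
    for k in range(num_iters):
        a_k = a / (k + 1 + big_a) ** alpha   # step size
        c_k = c / (k + 1) ** gamma           # perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        # Perturbed points may leave the box; acceptable for this toy objective.
        x_plus = [v + c_k * d for v, d in zip(x, delta)]
        x_minus = [v - c_k * d for v, d in zip(x, delta)]
        diff = noisy_objective(x_plus, rng) - noisy_objective(x_minus, rng)
        # Two noisy evaluations give an estimate of every gradient coordinate.
        grad = [diff / (2.0 * c_k * d) for d in delta]
        x = project_box([v - a_k * g for v, g in zip(x, grad)], lo, hi)
    return x

def noisy_quadratic(x, rng):
    """Toy objective with minimizer (0.3, ..., 0.3) plus Gaussian noise."""
    return sum((v - 0.3) ** 2 for v in x) + rng.gauss(0.0, 0.01)

rng = random.Random(0)
x_final = spsa_minimize(noisy_quadratic, [0.9, 0.9], 0.0, 1.0, 2000, rng)
print([round(v, 3) for v in x_final])
```

Each iteration needs only two objective evaluations regardless of the dimension, but the estimate divides a noisy difference by a small perturbation size, so the gradient noise grows as the evaluation noise grows. This is consistent with the difficulty reported for SPSA in Section 5.1.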
Projected Sub-gradient Descent for Average Demand (PSD-AD): This method is a projected subgradient descent method for
where , which represents the average demand for . We set the step size at each iteration so that the objective value decreases by repeatedly multiplying by .