
License: arXiv.org perpetual non-exclusive license
arXiv:2307.00492v2 [math.OC] 03 Jan 2024

Stochastic Approach for Price Optimization Problems with Decision-dependent Uncertainty

Yuya Hikima (corresponding author; e-mail: [email protected]), Graduate School of Information Science and Technology, University of Tokyo, Tokyo, Japan
Akiko Takeda, Graduate School of Information Science and Technology, University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Abstract

Price determination is a central research topic of revenue management in marketing. An important aspect of pricing is controlling the stochastic behavior of demand, and previous studies have tackled price optimization problems with uncertainties. However, many of those studies assumed that uncertainties are independent of the decision variables (i.e., prices) and did not consider situations where demand uncertainty depends on price. Although some price optimization studies have dealt with decision-dependent uncertainty, they make application-specific assumptions in order to obtain an optimal or approximate solution. To handle a wider range of applications with decision-dependent uncertainty, we propose a general non-convex stochastic optimization formulation. This approach aims to maximize the expectation of a revenue function with respect to a random variable representing demand under a decision-dependent distribution. We derived an unbiased stochastic gradient estimator by using a well-tuned variance reduction parameter and used it in a projected stochastic gradient descent method to find a stationary point of our problem. We conducted synthetic experiments and simulation experiments with real data on a retail service application. The results show that the proposed method outputs solutions with higher total revenues than the baselines.

1 Introduction

Price determination is a central research topic of revenue management in marketing, and many pricing studies have targeted applications in agricultural (Wang and Wang, 2019), online retail (Ferreira et al., 2016), electrical power (Dong et al., 2017), and hospitality industries (Koushik et al., 2012).

An important aspect of pricing is controlling the stochastic behavior of demand, because demand that deviates stochastically above or below the expected level causes losses in many cases: for example, in road pricing, overuse of a certain road causes congestion or traffic accidents, and in an electricity market, if demand is much lower than the available electricity supply, capital investment costs cannot be recovered.

To obtain greater profits under demand uncertainty, many previous studies have tackled price optimization problems with decision-independent random variables. For example, He et al. (2009) and Dong et al. (2017) define the demand for a product/service as $d(x)+\xi$, where $x$ is the price and $\xi$ is a decision-independent random variable. Correa et al. (2017) and Chawla et al. (2010) assume multi-agent systems where each buyer $i$ has a random variable $v_i$ as their value for a product and purchases it when the price is below that value. However, in practical applications, it is natural for the distribution of stochastic demand to vary with price: when the price of a product is close to (far from) those of competing products, demand is difficult (easy) to predict and its uncertainty is large (small). Furthermore, the settings of these studies with decision-independent random variables need discontinuous functions to represent buyers' discrete actions (e.g., buy or leave), which makes the optimization problem difficult to solve (see Section 3.4.1).

Although some pricing studies have dealt with decision-dependent uncertainty, they assume specific demand distributions and problem settings in order to obtain an optimal or approximate solution. For example, Bertsimas and de Boer (2005) determine prices for multiple products produced with limited resources. They model the demand of item $i$ at time $t$ as $\alpha^t_i(x^t_i)+\beta^t_i(x^t_i)\xi^t_i$, where $\alpha^t_i(\cdot)$ and $\beta^t_i(\cdot)$ are given functions, $x^t_i$ is the price, and $\xi^t_i$ is a random variable following a given distribution. Schulte and Sachs (2020) optimize prices over multiple periods to sell a single product with a fixed unit cost $c\geq 0$, where the item's demand follows a Poisson distribution with a given intensity function $\lambda(x)$. While these studies can find optimal or approximate solutions for their problems, their specific assumptions appear to make them difficult to apply to a wide range of probability distributions or problem settings (e.g., a nonlinear cost of selling products).

To resolve these issues, we propose a non-convex stochastic optimization formulation for general price optimization that maximizes the expectation of a revenue function with respect to a random variable representing demand under a decision-dependent distribution. Our formulation assumes that (i) the objective function is differentiable and Lipschitz continuous, (ii) the given probability density function of the random variables is differentiable and its gradient, normalized by the value of the probability density function, is bounded, and (iii) the feasible region is compact and convex. These assumptions may seem strong, but they often hold in the price optimization literature; indeed, we show three application examples satisfying them (see Section 3.3).

The formulated problem for practical applications is generally non-convex, and the dimension of the decision variables may be large. We derive an unbiased stochastic gradient estimator of the objective function by using information on the probability density function and incorporate the estimator into a projected stochastic gradient descent method to find a stationary point of our problem. When deriving a gradient estimator, it is important to keep its variance small so that the algorithm converges quickly. Our unbiased stochastic gradient includes a variance reduction parameter, which is inspired by the baseline technique (Williams, 1992; Sutton and Barto, 2018) in the reinforcement-learning literature. After confirming that the variance of the proposed stochastic gradient is bounded, we present a method for calculating the variance reduction parameter. Then, we develop a projected stochastic gradient descent method that converges to a stationary point by incorporating the proposed stochastic gradient and the method for calculating the variance reduction parameter into a recent gradient descent algorithm (Ghadimi and Lan, 2016). Moreover, we show a way of speeding up the computation of the minibatch gradient under additional assumptions that hold in applications where multiple agents make purchase decisions.

While some previous methods might seem applicable to our formulation, they are not suitable for the following reasons: the retraining method (Perdomo et al., 2020; Mendler-Dünner et al., 2020) requires strong convexity of the objective function, and Bayesian optimization (Brochu et al., 2010; Frazier, 2018) and gradient-free methods (Spall, 2005; Flaxman et al., 2005) require a huge number of objective evaluations, which makes it difficult to find good solutions for large-scale problems in a reasonable time.

We conducted synthetic experiments and simulation experiments with real data on a retail service application. The results show that the proposed method outputs solutions with higher total revenues than baselines such as the (modified) retraining method and Bayesian optimization.

Notation

Bold lowercase symbols (e.g., $\bm{x},\bm{y}$) denote vectors, and $\|\bm{x}\|$ denotes the Euclidean norm of a vector $\bm{x}$. The inner product of the vectors $\bm{x},\bm{y}$ is denoted by $\bm{x}^{\top}\bm{y}$. Let $\mathbb{R}_{+}$ be the set of positive real numbers. The gradient of a real-valued function $f$ w.r.t. $\bm{x}$ is denoted by $\nabla_{\bm{x}} f$, and the Jacobian matrix of a vector-valued function $\bm{p}$ w.r.t. $\bm{x}$ is denoted by $\frac{d\bm{p}}{d\bm{x}}$. The binomial coefficient of a pair of integers $m$ and $n$ is written as ${}_{m}C_{n}$. Let $[N]$ be the set $\{1,2,\dots,N\}$.

2 Related Works

2.1 Price Optimization Problems with Stochastic Demand

Previous studies on pricing with stochastic demand have considered three types of random variables: (a) decision-independent random variables included in buyers' purchase behavior; (b) decision-independent random variables directly included in demand; and (c) decision-dependent random variables included in demand. Regarding (a), Chawla et al. (2010) and Correa et al. (2017) address pricing problems with stochastic behaviors of multiple agents; each agent $i$ has a (decision-independent) random variable $v_i$ as its value for a product and purchases the product when the price is below that value. Regarding (b), He et al. (2009), Heydari and Norouzinasab (2015), and Dong et al. (2017) deal with demand with decision-independent uncertainty, such as $d(x)+\xi$, where $x$ is the price and $\xi$ is a random variable independent of price. Regarding (c), Bertsimas and de Boer (2005), Wang and Wang (2019), Schulte and Sachs (2020), and Hikima et al. (2021, 2022, 2023) tackle pricing problems with decision-dependent stochastic demand, such as $d(x)+\xi(x)$, where $\xi(x)$ is a (decision-dependent) random variable. Our study falls into category (c).

In this paper, we propose a new general pricing problem with decision-dependent random variables. Our problem has advantages over the previous ones in each of categories (a), (b), and (c). Regarding (a), while previous studies need to define agents' actions (e.g., buy or leave) by discontinuous functions, our formulation can define them without discontinuous functions, which enables gradient-based methods. Regarding (b), we generalize the demand noise $\xi$ to depend on the decision variable, which allows us to handle situations where demand uncertainty varies with price. Regarding (c), previous studies have limited applications since they make application-specific assumptions to obtain an optimal or approximate solution: Wang and Wang (2019) and Schulte and Sachs (2020) consider specific situations of optimizing prices over multiple periods to sell items and describe efficient methods to find an optimal solution; Hikima et al. (2021, 2022, 2023) tackle resource allocation problems while controlling agents' acceptance probabilities for prices and present approximation algorithms with constant approximation ratios; Bertsimas and de Boer (2005) consider a simple demand function where the price of each item does not affect the demand for other items and present heuristics to obtain an approximate solution. In contrast, we deal with a more general framework that has various applications (see Section 3.3). Because of this generality, our formulation is a non-convex optimization problem, and we develop a stochastic method that is theoretically guaranteed to converge to a stationary point.

2.2 Optimization Methods for Stochastic Problems with Decision-dependent Uncertainty

Our price optimization problem, (P) in Section 3.1, is categorized as a stochastic problem with decision-dependent uncertainty (Hellemo et al., 2018; Varaiya and Wets, 1989), because the demand for items and services follows a probability distribution that depends on price (the decision variables). Here, we explain three different techniques for solving such problems. (Footnote 2: Another formulation dealing with decision-dependent random variables is decision-dependent distributionally robust optimization (Luo and Mehrotra, 2020; Basciftci et al., 2021). Although such methods are effective at finding an optimal solution for the worst case when the probability distribution is ambiguous, they are not appropriate for the purpose of this study, which is to optimize the expected revenue under a given decision-dependent distribution.)

Retraining methods (Perdomo et al., 2020; Mendler-Dünner et al., 2020).

Retraining methods fix the distribution at the current iterate and then update the iterate. Specifically, Perdomo et al. (2020) proposed repeated gradient descent (RGD): $\bm{x}_{k+1} := \mathrm{proj}_{\mathcal{C}}\bigl(\bm{x}_k-\eta_k\,\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[\nabla_{\bm{x}} f(\bm{x}_k,\bm{\xi})]\bigr)$, where $\mathcal{C}$ is the feasible region, $\eta_k$ is the step size, and $\mathrm{proj}_{\mathcal{C}}$ is the Euclidean projection operator onto $\mathcal{C}$. It converges to a performatively stable point $\bm{x}_{\mathrm{PS}} = \arg\min_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_{\mathrm{PS}})}[f(\bm{x},\bm{\xi})]$. However, these methods assume that $f(\bm{x},\bm{\xi})$ is strongly convex w.r.t. $\bm{x}$ and are therefore not applicable to our problem. We give an intuitive example in which RGD fails in price optimization, where the objective function is generally not strongly convex.

Example 1.

Suppose that a seller determines the price $x\in[0,M]$ of a product. The buyer purchases the product ($\xi=1$) with probability $p(x)$ or does not purchase it ($\xi=0$) with probability $1-p(x)$, where $p:[0,M]\to[0,1]$ is a decreasing function. The seller wants to solve $\min_{x\in[0,M]}\mathbb{E}_{\xi\sim D(x)}[-x\xi]$ to maximize the expected revenue, where $D(x)$ is the distribution of $\xi$. Then, the optimal solution is $x^{*}\in\arg\min_{x\in[0,M]} -xp(x)$. However, RGD continues to raise the price until the purchase probability reaches zero or the price reaches $M$, since $\mathbb{E}_{\xi\sim D(x^k)}[-\nabla_x(x\xi)] = \mathbb{E}_{\xi\sim D(x^k)}[-\xi] = -p(x^k)$ and $p(x^k)\geq 0$ for all $x^k\in[0,M]$; that is, each RGD step $x^{k+1}=\mathrm{proj}_{[0,M]}(x^k+\eta_k p(x^k))$ never lowers the price. This price is generally not equal to $x^{*}$.
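A minimal numerical sketch of Example 1 follows. It assumes a hypothetical purchase probability $p(x)=e^{-x}$, $M=5$, a constant step size $\eta=0.5$, and exact expectations (no sampling). Under these assumptions the expected revenue $xp(x)$ is maximized at $x^{*}=1$. The sketch compares RGD, which differentiates only $-x\xi$ while $D(x^k)$ is held fixed, with plain projected gradient descent on the full objective $-xp(x)$, which also differentiates through $p$.

```python
import math

M = 5.0    # hypothetical upper bound on the price
ETA = 0.5  # hypothetical constant step size


def p(x):
    """Hypothetical decreasing purchase probability p(x) = exp(-x)."""
    return math.exp(-x)


def dp(x):
    """Derivative of p(x)."""
    return -math.exp(-x)


def proj(x):
    """Euclidean projection onto the feasible interval [0, M]."""
    return min(max(x, 0.0), M)


def rgd(x0, iters):
    """Repeated gradient descent: D(x_k) is frozen, so the gradient is E[-xi] = -p(x_k)."""
    x = x0
    for _ in range(iters):
        grad = -p(x)
        x = proj(x - ETA * grad)
    return x


def full_gd(x0, iters):
    """Projected gradient descent on F(x) = -x p(x), differentiating through p as well."""
    x = x0
    for _ in range(iters):
        grad = -(p(x) + x * dp(x))
        x = proj(x - ETA * grad)
    return x


x_rgd = rgd(0.0, 2000)
x_gd = full_gd(0.0, 2000)
print("RGD:", x_rgd, "expected revenue:", x_rgd * p(x_rgd))
print("GD on F:", x_gd, "expected revenue:", x_gd * p(x_gd))
```

In this sketch RGD ends at the upper bound $M$, whose expected revenue $5e^{-5}$ is far below the value $e^{-1}$ attained at $x^{*}=1$ by gradient descent on the full objective.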

Meta-model methods (Brochu et al., 2010; Frazier, 2018; Miller et al., 2021).

This type of method creates a meta-model of the objective function or the distribution map $D(\cdot)$ from multiple sample points. Bayesian optimization (Brochu et al., 2010; Frazier, 2018) learns the objective function through Gaussian process regression while searching for the global optimal solution. The two-stage approach (Miller et al., 2021) estimates a coarse model of the distribution map $D(\cdot)$ and then optimizes a proxy of the objective function by treating the estimated distribution as if it were the true distribution map. While these methods are powerful for certain problems, they are not suitable for ours: Bayesian optimization cannot find good solutions when the dimension of the decision variables is too large for the search space to be adequately explored, and the two-stage approach assumes that the distribution map belongs to a location-scale family (Miller et al., 2021, Eq. (2)), which cannot be assumed in our problem.

Gradient-free methods (Spall, 2005; Flaxman et al., 2005).

Gradient-free methods estimate the gradient by querying objective values at randomly perturbed points around the current iterate. While this type of method is generic, it often requires many evaluations of objective values to estimate the gradient accurately.
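To make the evaluation cost concrete, here is a sketch of one well-known gradient-free estimator, simultaneous perturbation stochastic approximation (SPSA), which perturbs all coordinates at once with random $\pm 1$ signs and spends two objective evaluations per estimate. The quadratic test function, the perturbation size `c`, and the sample count are hypothetical choices, and the objective here is deterministic; in (P), each objective value would itself be an expectation over $D(\bm{x})$ that must be estimated, which adds further noise.

```python
import random


def spsa_gradient(F, x, c, rng):
    """One two-point SPSA gradient estimate with a Rademacher (+/-1) perturbation."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = F(x_plus) - F(x_minus)  # two objective evaluations
    return [diff / (2.0 * c * di) for di in delta]


def averaged_spsa(F, x, c, num_samples, seed=0):
    """Average num_samples SPSA estimates (2 * num_samples objective evaluations)."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = spsa_gradient(F, x, c, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


def quadratic(x):
    """Hypothetical test objective F(x) = ||x||^2 with gradient 2x."""
    return sum(xi * xi for xi in x)


x0 = [1.0, -2.0, 0.5]
g_hat = averaged_spsa(quadratic, x0, c=0.1, num_samples=20000)
print(g_hat)  # approximates the true gradient [2.0, -4.0, 1.0]
```

A single SPSA estimate has variance that grows with the squared magnitudes of the other coordinates, so many estimates are averaged; here 40,000 objective evaluations are spent on the gradient of a three-dimensional function.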

We developed a new projected stochastic gradient descent method by deriving an unbiased stochastic gradient. Our method has advantages over the existing ones: unlike retraining methods, it can find stationary points of general pricing problems whose objective functions are not strongly convex; unlike meta-model methods, it can find stationary points in high-dimensional optimization problems and does not impose a strong assumption on the distribution map; and while gradient-free methods approximate the gradient only from objective values, our method uses gradient information on the objective function and the probability density function, which enables us to estimate gradients more accurately in a shorter computation time.

3 Optimization Problem

3.1 Problem Definition

We consider the following situation. A decision maker determines a price vector $\bm{x}\in\mathcal{C}\subseteq\mathbb{R}^n$ for items $i=1,2,\dots,n$, where the index $i$ denotes the type of item and/or the time period. Then, the demand vector $\bm{\xi}\in\Xi\subseteq\mathbb{R}^n$ of the $n$ items is sampled from a probability distribution $D(\bm{x})$. The decision maker obtains a profit of $s(\bm{x},\bm{\xi})-c(\bm{\xi})$, where $s:\mathcal{C}\times\Xi\to\mathbb{R}$ and $c:\Xi\to\mathbb{R}$ are the sales and cost functions, respectively.

The revenue maximization problem is as follows:

$$\mathrm{(P)}\quad\min_{\bm{x}\in\mathcal{C}}\quad\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})],$$

where $f(\bm{x},\bm{\xi}):=-s(\bm{x},\bm{\xi})+c(\bm{\xi})$ is real-valued and possibly non-convex, and $D(\bm{x})$ is a decision-dependent distribution on the measurable set $\Xi$. Here, we let $\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ be the probability density function of $D(\bm{x})$ and assume that the decision maker can obtain the values of $\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ and $\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ for given $\bm{x}$ and $\bm{\xi}$. This assumption naturally holds in many applications of price optimization. (Footnote 3: For example, in (Bertsimas and de Boer, 2005), the demand for item $i$ at price $x_i$ is defined by $d_i(x_i):=\alpha_i(x_i)+\beta_i(x_i)\xi_i$, where $\xi_i$ is a random variable whose probability density function is given. In (Schulte and Sachs, 2020), buyers' arrivals at price $x$ are assumed to follow a Poisson process with intensity function $\lambda(x)$, which determines the probability density function of demand. Hikima et al. (2022) define $p_{vt}(x)$ as the probability that buyer $v$ arrives in time interval $t$ for price $x$, and then give a probability density function for demand.)
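Access to $\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ and $\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ is what allows a gradient of the objective of (P) to be estimated from demand samples. The sketch below uses the standard likelihood-ratio (score-function) identity with a constant baseline $b$, $\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]=\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\bigl[\nabla_{\bm{x}}f(\bm{x},\bm{\xi})+(f(\bm{x},\bm{\xi})-b)\,\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})/\mathrm{Pr}(\bm{\xi}\mid\bm{x})\bigr]$, which is unbiased for any $b$ because the expected score is zero (assuming differentiation and integration can be interchanged). It illustrates the general idea only; it is not the estimator or the rule for choosing the variance reduction parameter derived in this paper. The single-buyer model with logistic purchase probability $p(x)=1/(1+e^{x-A})$, zero cost, and the values $A=2$, $x=1.5$, and $b=0.4$ are all hypothetical.

```python
import math
import random

A = 2.0  # hypothetical parameter of the purchase probability


def purchase_prob(x):
    """Hypothetical decreasing purchase probability p(x) = 1 / (1 + exp(x - A))."""
    return 1.0 / (1.0 + math.exp(x - A))


def f(x, xi):
    """Negative revenue f(x, xi) = -x * xi (cost assumed to be zero)."""
    return -x * xi


def grad_f(x, xi):
    """Partial derivative of f with respect to x."""
    return -xi


def score(x, xi):
    """d/dx log Pr(xi | x) for xi ~ Bernoulli(p(x)); since p' = -p(1 - p), it lies in [-1, 1]."""
    p = purchase_prob(x)
    return -(1.0 - p) if xi == 1 else p


def stochastic_grad(x, xi, b):
    """Likelihood-ratio gradient estimate with a constant baseline b."""
    return grad_f(x, xi) + (f(x, xi) - b) * score(x, xi)


def sample_demand(x, rng):
    """Draw xi = 1 (purchase) with probability p(x), else xi = 0."""
    return 1 if rng.random() < purchase_prob(x) else 0


def estimate(x, b, n, seed=0):
    """Sample mean and sample variance of n stochastic gradients at price x."""
    rng = random.Random(seed)
    gs = [stochastic_grad(x, sample_demand(x, rng), b) for _ in range(n)]
    mean = sum(gs) / n
    var = sum((g - mean) ** 2 for g in gs) / (n - 1)
    return mean, var


def true_grad(x):
    """Exact derivative of F(x) = -x p(x)."""
    p = purchase_prob(x)
    return -p + x * p * (1.0 - p)


x = 1.5
print("exact gradient:", true_grad(x))
print("b = 0.0 (mean, variance):", estimate(x, b=0.0, n=50000))
print("b = 0.4 (mean, variance):", estimate(x, b=0.4, n=50000))
```

For this Bernoulli model the score is $-(1-p(x))$ when the buyer purchases and $p(x)$ otherwise, so its absolute value never exceeds 1, and a suitable $b$ leaves the sample mean essentially unchanged while shrinking the sample variance.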

3.2 Assumptions

To develop an unbiased stochastic gradient for (P) whose variance is bounded by a constant, we make the following assumptions.

Assumption 1.

For all $\bm{x}\in\mathbb{R}^n$ and $\bm{\xi}\in\Xi$, the following hold:

  (i) $f(\bm{x},\bm{\xi})$ is differentiable and Lipschitz continuous with modulus $L_f$ w.r.t. $\bm{x}$, and continuous w.r.t. $\bm{\xi}$;

  (ii) $\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ is differentiable w.r.t. $\bm{x}$, and $\mathrm{Pr}(\bm{\xi}\mid\bm{x})>0$; and

  (iii) $\left\|\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right\|\leq M$ for a constant $M$.

Assumption 2.

The set $\mathcal{C}$ is compact and convex. The set $\Xi$ is compact.

Moreover, we need the following assumption when $\bm{\xi}$ is a continuous random vector:

Assumption 3.

The set $\Xi$ is a Borel set on $\mathbb{R}^n$. Moreover, $\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ is continuous w.r.t. $\bm{\xi}$ for all $\bm{x}\in\mathbb{R}^n$.

Assumptions 1 to 3 do not depend on a specific application; various applications satisfy them (see Section 3.3). Condition (i) of Assumption 1 usually holds in pricing applications: the sales function $s(\bm{x},\bm{\xi})$ is typically expressed as $\bm{x}^{\top}\bm{\xi}$ (the product of price and demand), so it is differentiable w.r.t. $\bm{x}$ and Lipschitz continuous when $\bm{\xi}$ is bounded, and the cost function $c(\bm{\xi})$ is typically continuous w.r.t. $\bm{\xi}$ since production cost is usually continuous with respect to demand. Condition (ii) of Assumption 1 is satisfied by many distributions with a (statistical) parameter $\bm{\lambda}(\bm{x})$, where $\bm{\lambda}$ is a differentiable vector-valued function. (Footnote 4: For example, the probability density functions of normal and multinomial distributions satisfy condition (ii) of Assumption 1. Since the probability density functions of these distributions are differentiable with respect to their parameters $\bm{\lambda}$ (e.g., mean, variance), they are also differentiable w.r.t. $\bm{x}$ by the differentiability of $\bm{\lambda}(\bm{x})$.) Condition (iii) of Assumption 1 means that when the probability of a given demand is small, the effect of price on that probability is also small. In our application examples presented in Section 3.3, the multinomial and truncated normal distributions parameterized by price satisfy these conditions. Assumption 2 is natural for practical pricing applications since price and demand ranges are usually bounded. Assumption 3 is satisfied if $\bm{\xi}$ follows a common continuous probability distribution such as the normal or logistic distribution. In the next section, we show that our application examples satisfy Assumptions 1 to 3.
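As a worked check of condition (iii) of Assumption 1, assume that $m$ buyers choose independently according to the multinomial logit probabilities $p_0(\bm{x}),\dots,p_n(\bm{x})$ used in Section 3.3.1, so that $\bm{\xi}=(\xi_0,\dots,\xi_n)$ is multinomial. Then $\log\mathrm{Pr}(\bm{\xi}\mid\bm{x})=\log\frac{m!}{\prod_{i=0}^{n}\xi_i!}+\sum_{i=0}^{n}\xi_i\log p_i(\bm{x})$, and since $\partial\log p_i(\bm{x})/\partial x_k=\gamma_k p_k(\bm{x})-\gamma_k\mathbb{1}[i=k]$ for $i=0,\dots,n$ and $k=1,\dots,n$, we get $\partial\log\mathrm{Pr}(\bm{\xi}\mid\bm{x})/\partial x_k=\gamma_k(m\,p_k(\bm{x})-\xi_k)$. Because $0\le\xi_k\le m$ and $0\le p_k(\bm{x})\le 1$, condition (iii) holds with $M=m\|\bm{\gamma}\|$, where $\bm{\gamma}=(\gamma_1,\dots,\gamma_n)$. The sketch below checks this closed form against finite differences of the log-probability; all parameter values are hypothetical.

```python
import math
import random

# Hypothetical parameters for n = 3 products and m buyers (not taken from the paper).
ALPHA = [3.0, 2.5, 4.0]
GAMMA = [1.0, 1.5, 0.8]
A0 = 1.0
M_BUYERS = 10


def choice_probs(x):
    """Multinomial logit probabilities [p_0(x), p_1(x), ..., p_n(x)]."""
    w = [math.exp(g * (a - xi)) for g, a, xi in zip(GAMMA, ALPHA, x)]
    den = A0 + sum(w)
    return [A0 / den] + [wi / den for wi in w]


def sample_demand(x, rng):
    """Counts [xi_0, xi_1, ..., xi_n]: non-buyers, then sales of each product."""
    probs = choice_probs(x)
    counts = [0] * len(probs)
    for _ in range(M_BUYERS):
        counts[rng.choices(range(len(probs)), weights=probs)[0]] += 1
    return counts


def log_pmf(xi, x):
    """log Pr(xi | x) of the multinomial distribution with logit probabilities."""
    probs = choice_probs(x)
    m = sum(xi)
    log_coef = math.lgamma(m + 1) - sum(math.lgamma(k + 1) for k in xi)
    return log_coef + sum(k * math.log(p) for k, p in zip(xi, probs))


def score(xi, x):
    """Closed form: d/dx_k log Pr(xi | x) = gamma_k * (m * p_k(x) - xi_k)."""
    probs = choice_probs(x)
    m = sum(xi)
    return [g * (m * probs[k + 1] - xi[k + 1]) for k, g in enumerate(GAMMA)]


def finite_diff_score(xi, x, h=1e-6):
    """Central finite differences of log_pmf, used only to check the closed form."""
    out = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        out.append((log_pmf(xi, xp) - log_pmf(xi, xm)) / (2.0 * h))
    return out


x = [2.0, 2.0, 3.0]
bound = M_BUYERS * math.sqrt(sum(g * g for g in GAMMA))  # m * ||gamma||
rng = random.Random(0)
for _ in range(5):
    xi = sample_demand(x, rng)
    s = score(xi, x)
    norm = math.sqrt(sum(v * v for v in s))
    print(xi, [round(v, 6) for v in s], [round(v, 6) for v in finite_diff_score(xi, x)], norm <= bound)
```

The bound $m\|\bm{\gamma}\|$ does not depend on $\bm{x}$, which is what condition (iii) requires for all prices.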

Remark.

Assumption 2 does not hold for unconstrained price optimization, but in practice we can assume $\bm{x}\in[-G,G]^n$ for a sufficiently large $G\in\mathbb{R}_{+}$.

3.3 Application Examples

3.3.1 Multiproduct Pricing

We consider a variant of the models in (Gallego and Wang, 2014; Zhang et al., 2018) in which a single decision maker sets the prices of $n$ products offered to $m$ buyers. Let $\bm{x} := (x_1, x_2, \dots, x_n) \in [x_{\min}, x_{\max}]^n$ be the price vector for the products. We assume that each buyer chooses at most one product stochastically: each buyer chooses product $i \in I := \{1, \dots, n\}$ with probability

$$p_i(\bm{x}) = \frac{e^{\gamma_i(\alpha_i - x_i)}}{a_0 + \sum_{j=1}^{n} e^{\gamma_j(\alpha_j - x_j)}}$$

or does not choose any product with probability

$$p_0(\bm{x}) = \frac{a_0}{a_0 + \sum_{j=1}^{n} e^{\gamma_j(\alpha_j - x_j)}}.$$

Here, $\alpha_i$ and $\gamma_i$ are positive constants that can be estimated from historical transaction data (Croissant, 2012). Let $\bm{\xi} = (\xi_0, \xi_1, \dots, \xi_n) \in \{0, 1, \dots, m\}^{n+1}$ be a random vector, where $\xi_0$ is the number of buyers who do not purchase any product and $\xi_i$ for $i = 1, \dots, n$ is the number of units of product $i$ sold. Let $s(\bm{x}, \bm{\xi})$ and $c(\bm{\xi})$ be real-valued functions representing the sales and costs of the products, respectively.

Footnote 5: Besides the multinomial logit model, various other models can be considered, such as the nested logit model (Gallego and Wang, 2014) and the generalized nested logit model (Zhang et al., 2018).
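To make the choice model concrete, the Python sketch below computes $p_0(\bm{x}), \dots, p_n(\bm{x})$ and draws one demand vector $\bm{\xi}$ by letting each of the $m$ buyers choose independently. It is an illustration under stated assumptions, not the authors' code: the function names and parameter values are hypothetical, and $a_0$ is assumed to be a given positive constant.

```python
import math
import random

def mnl_probs(x, alpha, gamma, a0):
    """Return [p_0(x), p_1(x), ..., p_n(x)] under the multinomial logit model."""
    weights = [math.exp(g * (a - price)) for price, a, g in zip(x, alpha, gamma)]
    denom = a0 + sum(weights)
    return [a0 / denom] + [w / denom for w in weights]

def sample_demand(x, alpha, gamma, a0, m, rng):
    """Draw xi = (xi_0, ..., xi_n): each of the m buyers independently picks
    option k (k = 0 means no purchase) with probability p_k(x); return the counts."""
    probs = mnl_probs(x, alpha, gamma, a0)
    draws = rng.choices(range(len(probs)), weights=probs, k=m)
    return [draws.count(k) for k in range(len(probs))]

# Hypothetical example: two products with equal alpha and gamma, 100 buyers.
rng = random.Random(0)
p = mnl_probs([3.0, 4.0], alpha=[5.0, 5.0], gamma=[1.0, 1.0], a0=1.0)
xi = sample_demand([3.0, 4.0], [5.0, 5.0], [1.0, 1.0], 1.0, m=100, rng=rng)
```

With equal $\alpha_i$ and $\gamma_i$, the cheaper product receives the larger choice probability, and the counts in `xi` always sum to $m$.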
The following functions are possible choices for $s$ and $c$:

$$s(\bm{x},\bm{\xi}) := \sum_{i=1}^{n} x_i \xi_i, \quad c(\bm{\xi}) := \sum_{i=1}^{n} c_i(\xi_i), \quad \text{where } c_i(\xi_i) := \begin{cases} \eta^1_i \xi_i, & \xi_i \le l_i,\\ \eta^2_i(\xi_i - l_i) + \eta^1_i l_i, & l_i < \xi_i \le u_i,\\ \eta^3_i(\xi_i - u_i) + \eta^2_i(u_i - l_i) + \eta^1_i l_i, & \xi_i > u_i. \end{cases}$$

Here, $\eta^1_i$, $\eta^2_i$, $\eta^3_i$, $l_i$, and $u_i$ are constants for each $i$. Since $c_i$ is piecewise linear with breakpoints $l_i$ and $u_i$, the cost of each additional unit changes with the number of units sold, which models economies of scale (decreasing rates) or diseconomies of scale (increasing rates).
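The sales and cost functions above translate directly into code. The sketch below is a minimal illustration; the function names and the parameter layout `(eta1, eta2, eta3, l, u)` are hypothetical, and it follows the convention that `xi[0]` is the no-purchase count, so only `xi[1:]` enters $s$ and $c$.

```python
def piecewise_cost(q, eta1, eta2, eta3, l, u):
    """Three-segment piecewise-linear cost c_i(q) with breakpoints l <= u.
    The segments join continuously at q = l and q = u."""
    if q <= l:
        return eta1 * q
    if q <= u:
        return eta2 * (q - l) + eta1 * l
    return eta3 * (q - u) + eta2 * (u - l) + eta1 * l

def sales(x, xi):
    """s(x, xi) = sum_i x_i * xi_i over products i = 1..n (xi[0] is excluded)."""
    return sum(price * qty for price, qty in zip(x, xi[1:]))

def total_cost(xi, params):
    """c(xi) = sum_i c_i(xi_i); params[i - 1] holds (eta1, eta2, eta3, l, u) for product i."""
    return sum(piecewise_cost(q, *p) for q, p in zip(xi[1:], params))
```

Choosing, for example, $\eta^1_i = 2 > \eta^2_i = 1.5 > \eta^3_i = 1$ makes the marginal cost fall as volume grows, i.e., economies of scale.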

The revenue-maximizing problem is as follows:

$$\min_{\bm{x}\in[x_{\min},x_{\max}]^n} \mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[-s(\bm{x},\bm{\xi}) + c(\bm{\xi})\right],$$

where $D(\bm{x})$ is the multinomial distribution with $m$ trials and outcome probabilities $p_0(\bm{x}), \dots, p_n(\bm{x})$, whose probability mass function is

$$\Pr(\bm{\xi}\mid\bm{x}) := \frac{m!}{\prod_{i=0}^{n} \xi_i!} \prod_{i=0}^{n} p_i(\bm{x})^{\xi_i} \quad \text{if } \sum_{i=0}^{n} \xi_i = m,$$

and $\Pr(\bm{\xi}\mid\bm{x}) := 0$ otherwise. This problem can be written in the form of (P).

Footnote 6: If the functions $s$ and $c$ are linear in $\bm{\xi}$, then $\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[-s(\bm{x},\bm{\xi}) + c(\bm{\xi})\right] = -s(\bm{x}, \mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}]) + c(\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])$, and the problem reduces to the deterministic optimization problem tackled in (Gallego and Wang, 2014; Zhang et al., 2018). Therefore, our problem can be regarded as a generalization of the problems in those studies with respect to the sales and cost functions.
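For small $m$, the expectation in the objective can be evaluated exactly by enumerating every outcome $\bm{\xi}$ with $\sum_i \xi_i = m$ and weighting it by the multinomial probability mass. The sketch below does this. It also illustrates Footnote 6: with linear $s$ and $c$, the exact value coincides with the plug-in value $-s(\bm{x}, \mathbb{E}[\bm{\xi}]) + c(\mathbb{E}[\bm{\xi}])$, where $\mathbb{E}[\xi_i] = m\,p_i(\bm{x})$. The prices, probabilities, and unit cost are hypothetical, and enumeration is only practical for small $m$ and $n$.

```python
import math
from itertools import combinations

def compositions(m, parts):
    """Yield every tuple of `parts` non-negative integers summing to m (stars and bars)."""
    for bars in combinations(range(m + parts - 1), parts - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(m + parts - 2 - prev)
        yield tuple(counts)

def multinomial_pmf(xi, probs):
    """Pr(xi | x) = m! / (xi_0! ... xi_n!) * prod_i p_i(x)^xi_i, with m = sum(xi)."""
    coef = math.factorial(sum(xi))
    for k in xi:
        coef //= math.factorial(k)  # exact integer division at every step
    return coef * math.prod(p ** k for p, k in zip(probs, xi))

def exact_expectation(probs, m, f):
    """E_{xi ~ D(x)}[f(xi)], summing f over all outcomes weighted by the pmf."""
    return sum(multinomial_pmf(xi, probs) * f(xi) for xi in compositions(m, len(probs)))

# Hypothetical fixed prices x = (3, 4), probabilities (p_0, p_1, p_2), unit cost 1.
probs, m = [0.2, 0.5, 0.3], 4

def f(xi):
    """-s(x, xi) + c(xi) with linear sales and a linear unit cost of 1."""
    return -(3.0 * xi[1] + 4.0 * xi[2]) + 1.0 * (xi[1] + xi[2])

exact = exact_expectation(probs, m, f)
plug_in = f([m * p for p in probs])  # -s(x, E[xi]) + c(E[xi])
```

Here both `exact` and `plug_in` equal $-7.6$ up to rounding; replacing the linear cost with a nonlinear one (such as the piecewise cost above) generally breaks this equality, which is why the stochastic formulation is needed.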

The following proposition shows that this application satisfies our assumptions.

Proposition 1.

Let $\gamma^{\max} := \max_{i\in I}|\gamma_i|$. The problem of multiproduct pricing satisfies Assumptions 1 and 2, where $f(\bm{x},\bm{\xi}) := -s(\bm{x},\bm{\xi}) + c(\bm{\xi})$, $\mathcal{C} := [x_{\min}, x_{\max}]^n$, $L_f := m$, and $M := nm\gamma^{\max}$.

The proofs of this proposition and the others can be found in Appendix A.
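A quantity that appears in likelihood-ratio (score-function) gradient estimators for expectations under a decision-dependent distribution is $\nabla_{\bm{x}} \log \Pr(\bm{\xi}\mid\bm{x})$. For the multinomial logit model it has a closed form: differentiating $\sum_{i=0}^{n}\xi_i \log p_i(\bm{x})$ gives $\partial_{x_k} \log \Pr(\bm{\xi}\mid\bm{x}) = \gamma_k\,(m\,p_k(\bm{x}) - \xi_k)$, so each component is bounded by $m\gamma_k$ and the $\ell_1$ norm by $nm\gamma^{\max}$. This is one way to see where a constant of the form $M$ above can arise; the actual proof is in Appendix A. The sketch checks the closed form against finite differences and forms a plain one-sample estimate $\nabla_{\bm{x}} f + f\,\nabla_{\bm{x}}\log\Pr$ without any variance reduction. It is a standard illustration, not the estimator proposed in this paper, and all names are hypothetical.

```python
import math

def mnl_probs(x, alpha, gamma, a0):
    """[p_0(x), ..., p_n(x)] under the multinomial logit model."""
    w = [math.exp(g * (a - price)) for price, a, g in zip(x, alpha, gamma)]
    denom = a0 + sum(w)
    return [a0 / denom] + [wi / denom for wi in w]

def log_pmf(xi, x, alpha, gamma, a0):
    """log Pr(xi | x) for the multinomial model (xi[0] is the no-purchase count)."""
    m = sum(xi)
    p = mnl_probs(x, alpha, gamma, a0)
    log_coef = math.lgamma(m + 1) - sum(math.lgamma(k + 1) for k in xi)
    return log_coef + sum(k * math.log(pk) for k, pk in zip(xi, p))

def score(xi, x, alpha, gamma, a0):
    """Closed form d/dx_k log Pr(xi | x) = gamma_k * (m * p_k(x) - xi_k), k = 1..n."""
    m = sum(xi)
    p = mnl_probs(x, alpha, gamma, a0)
    return [g * (m * p[k + 1] - xi[k + 1]) for k, g in enumerate(gamma)]

def lr_gradient(xi, x, alpha, gamma, a0, f_val, grad_f):
    """One-sample likelihood-ratio estimate of grad_x E_{xi ~ D(x)}[f(x, xi)]:
    grad_x f(x, xi) + f(x, xi) * grad_x log Pr(xi | x) (no variance reduction)."""
    s = score(xi, x, alpha, gamma, a0)
    return [gf + f_val * sk for gf, sk in zip(grad_f, s)]

# Hypothetical parameters and one observed demand vector (m = 10 buyers).
x, alpha, gamma, a0 = [3.0, 4.0], [5.0, 6.0], [1.0, 0.5], 1.0
xi = [2, 5, 3]
g = score(xi, x, alpha, gamma, a0)
```

For $f = -s + c$ with $s(\bm{x},\bm{\xi}) = \sum_i x_i\xi_i$, one would pass `grad_f = [-q for q in xi[1:]]`, since $c$ does not depend on $\bm{x}$.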

Remark.

Assumption 1 is not satisfied if $p_i(\bm{x})$ is non-differentiable. In that case, it can be satisfied by replacing $p_i(\bm{x})$ with a smooth approximation obtained by a smoothing technique (Chen and Mangasarian, 1996; Chen, 2012).
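The remark does not fix a particular smoothing function. As an illustration only, the sketch below uses the softplus smoothing $\mu\log(1 + e^{t/\mu})$ of $\max(t, 0)$, which to our understanding is one member of the Chen and Mangasarian family, and applies it to a hypothetical non-differentiable purchase probability $\min(1, \max(0, y))$; this clipped-linear model is an assumption for the example and is not used in this paper. The parameter $\mu > 0$ trades accuracy (error at most $\mu\log 2$) for smoothness.

```python
import math

def smooth_plus(t, mu):
    """Softplus smoothing of max(t, 0): mu * log(1 + exp(t / mu)), evaluated stably.
    It overestimates max(t, 0) by at most mu * log(2), attained at t = 0."""
    if t > 0:
        return t + mu * math.log1p(math.exp(-t / mu))
    return mu * math.log1p(math.exp(t / mu))

def smooth_clip(y, mu):
    """Smooth surrogate for the hypothetical probability min(1, max(0, y)),
    using min(1, max(0, y)) = max(y, 0) - max(y - 1, 0); the result lies in [0, 1]."""
    return smooth_plus(y, mu) - smooth_plus(y - 1.0, mu)
```

For instance, a hypothetical $p(x) = \min(1, \max(0, a - bx))$ would be replaced by `smooth_clip(a - b * x, mu)`, which is differentiable in $x$ for every $\mu > 0$.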

3.3.2 Congestion Pricing for HOT Lanes

We consider a stochastic variant of (Lou et al., 2011) in the following traffic situation: there are two lanes, a high-occupancy/toll (HOT) lane and a regular lane, and drivers can only switch from the regular lane to the HOT lane. A decision maker sets the price $x_i \in [x_{\min}, x_{\max}]$ of the HOT lane for each time interval $i \in I := \{1, 2, \dots, n\}$. The decision maker aims (i) to maximize the total flow rate at the bottlenecks of the HOT and regular lanes and (ii) to prevent the density of vehicles at the switching point from exceeding a certain level (to avoid traffic accidents). Let $d_i$ be the number of homogeneous drivers in the regular lane in time interval $i \in I$. Each driver in interval $i$ switches to the HOT lane with probability $p_i(x_i) := \frac{1}{1 + e^{\alpha_i h_i + \beta_i x_i + \gamma_i}}$, where $h_i$ is a constant indicating the average time saved by choosing the HOT lane in interval $i \in I$. The parameters $\alpha_i$, $\beta_i$, and $\gamma_i$ are constants that can be estimated in real time (Lou et al., 2011, Section 2.2).

Footnote 7: Our method can be extended to more general situations, such as ones with many lanes.
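The lane-switching model can be simulated directly: given $p_i(x_i)$, the number of switching drivers in interval $i$ is binomial with $d_i$ trials. The sketch below is a minimal illustration with hypothetical parameter values. The text does not fix the signs of $\alpha_i$ and $\beta_i$; the example assumes $\alpha_i < 0$ (larger time savings attract drivers) and $\beta_i > 0$ (higher tolls deter them).

```python
import math
import random

def switch_prob(x_i, h_i, alpha_i, beta_i, gamma_i):
    """p_i(x_i) = 1 / (1 + exp(alpha_i * h_i + beta_i * x_i + gamma_i))."""
    return 1.0 / (1.0 + math.exp(alpha_i * h_i + beta_i * x_i + gamma_i))

def sample_switches(d_i, p_i, rng):
    """xi_i ~ Binomial(d_i, p_i): each of the d_i drivers switches independently."""
    return sum(rng.random() < p_i for _ in range(d_i))

# Hypothetical parameters: alpha_i < 0 and beta_i > 0 (assumed signs).
rng = random.Random(1)
p_low = switch_prob(1.0, 4.0, -0.5, 0.8, 0.0)   # lower toll
p_high = switch_prob(2.0, 4.0, -0.5, 0.8, 0.0)  # higher toll
xi_i = sample_switches(50, p_low, rng)
```

Under these assumed signs, raising the toll from 1 to 2 lowers the switching probability (about 0.77 versus 0.60).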

The optimization problem is as follows (see Lou et al., 2011, Section 3, for details):

$$\max_{\bm{x}\in[x_{\min},x_{\max}]^{I}} \mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[\sum_{i\in I}\left(q_H(\xi_i) + q_R(d_i - \xi_i)\right) + \theta\min\left(\tilde{k} - \frac{1}{|I|}\sum_{i\in I} k(\xi_i),\ 0\right)\right],$$

where $\xi_i \in \{0, 1, \dots, d_i\}$ is a random variable indicating the number of drivers switching lanes in interval $i$. In the first term, $q_H$ and $q_R$ are continuous functions representing the flow rates at the bottlenecks of the HOT lane and the regular lane, respectively, so that $q_H(\xi_i)$ and $q_R(d_i - \xi_i)$ are the flow rates when $\xi_i$ drivers switch; this term rewards high flow rates on both lanes. In the second term, $\tilde{k} \in \mathbb{R}_+$ is the critical density of vehicles at the switching point (the density likely to cause traffic accidents), and $k(\xi_i)$ is a continuous function representing the density at the switching point for the demand $\xi_i$ in interval $i$. Therefore, $\theta\min\left(\tilde{k} - \frac{1}{|I|}\sum_{i\in I} k(\xi_i), 0\right)$ is a penalty term that is negative whenever the average density exceeds the critical density, where $\theta \ge 0$ is the penalty parameter. This optimization problem can be written in the form of (P), where $s(\bm{x},\bm{\xi}) := 0$,

$$c(\bm{\xi}) := \sum_{i\in I}\left(q_H(\xi_i) + q_R(d_i - \xi_i)\right) + \theta\min\left(\tilde{k} - \frac{1}{|I|}\sum_{i\in I} k(\xi_i),\ 0\right),$$

and

$$\Pr(\bm{\xi}\mid\bm{x}) := \prod_{i\in I} {}_{d_i}C_{\xi_i}\, p_i(x_i)^{\xi_i} \left(1 - p_i(x_i)\right)^{d_i - \xi_i}.$$
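Because the $\xi_i$ are independent binomial variables, the expected objective can be computed exactly for a few short intervals by enumerating all outcomes. The sketch below does this; the flow functions $q_H$, $q_R$ and the density function $k$ are passed in as arguments, and the concrete choices in the checks are hypothetical placeholders rather than the traffic models of Lou et al. (2011).

```python
import math
from itertools import product

def binom_pmf(k, d, p):
    """Binomial probability of k successes in d trials: dC_k * p^k * (1 - p)^(d - k)."""
    return math.comb(d, k) * p ** k * (1 - p) ** (d - k)

def hot_objective(xi, d, q_H, q_R, k, k_crit, theta):
    """Sum of bottleneck flows plus theta * min(k_crit - mean density, 0)."""
    flow = sum(q_H(x) + q_R(di - x) for x, di in zip(xi, d))
    mean_density = sum(k(x) for x in xi) / len(xi)
    return flow + theta * min(k_crit - mean_density, 0.0)

def expected_hot_objective(p, d, **kw):
    """Exact expectation over independent xi_i ~ Binomial(d_i, p_i), enumerating
    all prod_i (d_i + 1) outcomes (practical only for small d_i and |I|)."""
    total = 0.0
    for xi in product(*(range(di + 1) for di in d)):
        prob = math.prod(binom_pmf(x, di, pi) for x, di, pi in zip(xi, d, p))
        total += prob * hot_objective(xi, d, **kw)
    return total
```

With identity flow functions, the flow term always equals $\sum_i d_i$, and with zero flows and $\tilde{k} = 0$ the objective reduces to minus the expected average density; both serve as simple sanity checks.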

The following proposition shows that this application satisfies our assumptions.

Proposition 2.

The problem of congestion pricing for HOT lanes satisfies Assumptions 1 and 2, where $f(\bm{x},\bm{\xi}) := \sum_{i\in I}\left(q_H(\xi_i) + q_R(d_i - \xi_i)\right) + \theta\min\left(\tilde{k} - \frac{1}{|I|}\sum_{i\in I} k(\xi_i),\ 0\right)$, $L_f := 0$, $\mathcal{C} := [x_{\min}, x_{\max}]^{I}$, and $M := |I|\max_{i\in I}\left(|\beta_i| d_i\right)$.

3.3.3 Pricing with Demand Prediction from Limited Data Points

Here, we consider optimizing the prices of $n$ types of items. For prices $\bm{x} \in [x_{\min}, x_{\max}]^n$, the demand $\xi_i \in \{\xi \mid 0 \le \xi \le \xi_i^{\max}\}$ of item $i$ is predicted from data points $\hat{D} := \{(\hat{\bm{x}}^d, \hat{\bm{\xi}}^d)\}_{d=1}^{N}$ through the truncated Gaussian process (Swiler et al., 2020, Section 8.1):

$$\xi_i \sim \frac{\bm{1}_{\{0 \le \xi_i \le \xi_i^{\max}\}}(\xi_i)}{C^i(\bm{x})}\, N\left[\bm{v}^i(\bm{x})^\top \bm{a}^i,\ (\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i \bm{v}^i(\bm{x})\right],$$

$$\text{where } C^i(\bm{x}) = \int_0^{\xi_i^{\max}} \frac{1}{\sqrt{2\pi\left((\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i \bm{v}^i(\bm{x})\right)}} \exp\left(-\frac{\left(\phi - \bm{v}^i(\bm{x})^\top \bm{a}^i\right)^2}{2\left((\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i \bm{v}^i(\bm{x})\right)}\right) d\phi.$$

Here, $\bm{v}^i : \mathbb{R}^n \to \mathbb{R}^N$, $\bm{a}^i \in \mathbb{R}^N$, $\sigma^i \in \mathbb{R}$, and $A^i \in \mathbb{R}^{N\times N}$ are, respectively, a function, a vector, a scalar, and a matrix learned from the data points $\hat{D}$. The $j$-th element $v^i_j(\bm{x})$ of $\bm{v}^i(\bm{x})$ is defined by $v^i_j(\bm{x}) := \theta_1^i \exp\left(-\frac{\|\bm{x} - \hat{\bm{x}}^j\|^2}{\theta_2^i}\right)$, where $\theta_1^i \in \mathbb{R}_+$ and $\theta_2^i \in \mathbb{R}_+$ are learned constants. The normalization function $C^i(\bm{x})$ is the probability that a sample from the untruncated normal distribution lies in $\{0 \le \xi_i \le \xi_i^{\max}\}$. We assume that $(\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i \bm{v}^i(\bm{x}) \ge \Delta$ for some $\Delta \in \mathbb{R}_+$.

Footnote 8: Given that the observations are subject to noise, it is natural for the predicted variance $(\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i \bm{v}^i(\bm{x})$ to be bounded below by a constant $\Delta$.
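The text only says that $\bm{a}^i$, $A^i$, and $\sigma^i$ are learned. The sketch below assumes one standard construction, textbook Gaussian-process regression with a zero prior mean and observation-noise variance $\sigma_n^2$: $\bm{a}^i = (K + \sigma_n^2 I)^{-1}\hat{\bm{\xi}}_i$, $A^i = (K + \sigma_n^2 I)^{-1}$, and $(\sigma^i)^2 = \theta_1^i + \sigma_n^2$, where $K_{jl} = \theta_1^i\exp(-\|\hat{\bm{x}}^j - \hat{\bm{x}}^l\|^2/\theta_2^i)$. Under this assumption the predictive variance is at least $\sigma_n^2$, which is one concrete way the lower bound $\Delta$ of Footnote 8 can arise. The sketch also evaluates $C^i(\bm{x})$ with the normal CDF. All names and data are hypothetical, and the small dense solver is meant only for small $N$.

```python
import math
from statistics import NormalDist

def rbf(x, z, theta1, theta2):
    """Kernel value theta1 * exp(-||x - z||^2 / theta2), as in v_j^i(x)."""
    return theta1 * math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / theta2)

def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting (small N only)."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= factor * A[c][j]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (A[r][n] - sum(A[r][j] * y[j] for j in range(r + 1, n))) / A[r][r]
    return y

def gp_predict(x, X_hat, y_hat, theta1, theta2, noise_var):
    """Predictive mean v(x)^T a and variance sigma^2 - v(x)^T A v(x) under the
    assumed GP construction (a = K_n^{-1} y_hat, A = K_n^{-1}, sigma^2 = theta1 + noise)."""
    N = len(X_hat)
    K_n = [[rbf(X_hat[r], X_hat[c], theta1, theta2) + (noise_var if r == c else 0.0)
            for c in range(N)] for r in range(N)]
    v = [rbf(x, xj, theta1, theta2) for xj in X_hat]
    mean = sum(vi * ai for vi, ai in zip(v, solve(K_n, y_hat)))
    var = theta1 + noise_var - sum(vi * wi for vi, wi in zip(v, solve(K_n, v)))
    return mean, var

def truncation_constant(mean, var, upper):
    """C^i(x): probability that N(mean, var) falls in [0, upper]."""
    nd = NormalDist(mean, math.sqrt(var))
    return nd.cdf(upper) - nd.cdf(0.0)

# Hypothetical one-dimensional data: observed demand falls as the price rises.
X_hat, y_hat = [[1.0], [2.0], [3.0]], [10.0, 7.0, 5.0]
```

Far from the data, the kernel vector vanishes and the prediction reverts to the prior (mean $0$, variance $\theta_1^i + \sigma_n^2$), which is why the truncation to $[0, \xi_i^{\max}]$ matters in practice.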

The revenue-maximizing problem is as follows:

$$\min_{\bm{x}\in[x_{\min},x_{\max}]^n} \mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[-s(\bm{x},\bm{\xi}) + c(\bm{\xi})\right],$$

where $s(\bm{x},\bm{\xi}) := \bm{\xi}^{\top}\bm{x}$, $c(\bm{\xi}) := \sum_{i=1}^{n} c_i(\xi_i)$, and $c_i:\mathbb{R}\to\mathbb{R}$ is a continuous function for $i=1,2,\dots,n$. Here, $c_i(\xi_i)$ represents the cost for item $i$. This problem can be written in the form of (P), where

$$\Pr(\bm{\xi}\mid\bm{x}) := \prod_{i=1}^{n}\frac{1}{C^{i}(\bm{x})\sqrt{2\pi\bigl((\sigma^{i})^{2}-\bm{v}^{i}(\bm{x})^{\top}A^{i}\bm{v}^{i}(\bm{x})\bigr)}}\exp\left(-\frac{(\xi_{i}-\bm{v}^{i}(\bm{x})^{\top}\bm{a}^{i})^{2}}{2\bigl((\sigma^{i})^{2}-\bm{v}^{i}(\bm{x})^{\top}A^{i}\bm{v}^{i}(\bm{x})\bigr)}\right)$$

for $\bm{\xi}\in\Xi := \{\bm{\xi}\mid \forall i\in[n],\ 0\le\xi_i\le\xi_i^{\max}\}$.
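To make the demand model concrete, the following Python sketch evaluates the truncated Gaussian density above and draws samples from it. It is an illustration only and not the authors' implementation. It makes three assumptions. First, the per-item mean $m_i = \bm{v}^i(\bm{x})^\top\bm{a}^i$ and standard deviation $s_i = \sqrt{(\sigma^i)^2-\bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})}$ have already been computed at the current prices; the quantities $\bm{v}^i$, $A^i$, $\bm{a}^i$, and $\sigma^i$ come with the application and are not reproduced here. Second, $C^i(\bm{x})$ is taken to be the Gaussian mass on $[0,\xi_i^{\max}]$, which is the value that makes the density integrate to one over $\Xi$. Third, all function names are hypothetical.

```python
import math
import numpy as np


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncation_constant(mean, std, upper):
    """Assumed C^i(x): probability mass of N(mean, std^2) on [0, upper]."""
    return std_normal_cdf((upper - mean) / std) - std_normal_cdf(-mean / std)


def demand_density(xi, means, stds, xi_max):
    """Pr(xi | x) as a product of independent truncated Gaussians on the box Xi."""
    xi, xi_max = np.asarray(xi, dtype=float), np.asarray(xi_max, dtype=float)
    if np.any(xi < 0.0) or np.any(xi > xi_max):
        return 0.0  # outside Xi the density is zero
    dens = 1.0
    for xi_i, m_i, s_i, u_i in zip(xi, means, stds, xi_max):
        c_i = truncation_constant(m_i, s_i, u_i)
        gauss = math.exp(-((xi_i - m_i) ** 2) / (2.0 * s_i**2))
        dens *= gauss / (c_i * math.sqrt(2.0 * math.pi * s_i**2))
    return dens


def sample_demand(means, stds, xi_max, size, rng):
    """Rejection sampling per item: draw Gaussians and keep only draws in [0, xi_max_i]."""
    out = np.empty((size, len(means)))
    for i, (m_i, s_i, u_i) in enumerate(zip(means, stds, xi_max)):
        filled = 0
        while filled < size:
            draw = rng.normal(m_i, s_i, size=size)
            draw = draw[(draw >= 0.0) & (draw <= u_i)]
            take = min(size - filled, draw.size)
            out[filled:filled + take, i] = draw[:take]
            filled += take
    return out
```

Because the items are independent under this density, rejection sampling can be done coordinate by coordinate. With a price vector `x`, `np.mean(sample_demand(...) @ x)` is then a Monte Carlo estimate of the expected sales $\mathbb{E}[s(\bm{x},\bm{\xi})]$.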

The following proposition shows that this application satisfies our assumptions.

Proposition 3.

Let $\theta_1^{\max} := \max_i \theta_1^i$, $\theta_2^{\min} := \min_i \theta_2^i$, $a^{\max} := \max_{i,k}|a^i_k|$, $A^{\max} := \max_{i,k,l}|A^i_{k,l}|$, and $\xi^{\max} := \max_i \xi_i^{\max}$.
The problem of pricing with demand prediction from limited data points satisfies Assumptions 1-3, where $f(\bm{x},\bm{\xi}) := -s(\bm{x},\bm{\xi}) + c(\bm{\xi})$, $\mathcal{C} := [x_{\min},x_{\max}]^n$, $L_f := n\xi^{\max}$, and

$$M := \frac{4n^2 N\theta_1^{\max}(x_{\max}-x_{\min})}{\Delta\theta_2^{\min}}\left(NA^{\max}\theta_1^{\max} + (\xi^{\max}+N\theta_1^{\max}a^{\max})\left(a^{\max} + NA^{\max}\theta_1^{\max}\frac{\xi^{\max}+N\theta_1^{\max}a^{\max}}{\Delta}\right)\right).$$
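The constant $M$ enters the variance bound of Lemma 7 and the gradient bound of Lemma 8, so it can be useful to evaluate it for given problem data. The helper below transcribes the two constants of Proposition 3 directly; the function and argument names are hypothetical. The quantities $N$, $\Delta$, $\theta_1^i$, and $\theta_2^i$ are the ones defined with the application, and the code simply takes them as inputs. The formula shows that $M$ grows quadratically in the number of items $n$.

```python
def lipschitz_constant(n, xi_max):
    """L_f := n * xi^max (Proposition 3)."""
    return n * xi_max


def constant_m(n, N, theta1_max, theta2_min, x_min, x_max, Delta, A_max, a_max, xi_max):
    """Constant M of Proposition 3, transcribed term by term."""
    prefactor = 4.0 * n**2 * N * theta1_max * (x_max - x_min) / (Delta * theta2_min)
    shifted = xi_max + N * theta1_max * a_max  # xi^max + N theta_1^max a^max
    inner = N * A_max * theta1_max + shifted * (
        a_max + N * A_max * theta1_max * shifted / Delta
    )
    return prefactor * inner
```

For example, if every input equals 1 (with $x_{\min}=0$ and $x_{\max}=1$), the prefactor is 4 and the bracket is $1 + 2\cdot 3 = 7$, so `constant_m` returns 28.0.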

3.4 Advantages of Our Formulation

3.4.1 Benefits of Using Decision-dependent Random Variables

The multiproduct pricing problem in Section 3.3.1 can also be expressed in terms of decision-independent random variables as follows. Each buyer $j=1,2,\dots,m$ has a value $\gamma_i(\alpha_i - x_i) + \mu_{ij}$ for each product $i$, where $\alpha_i$ and $\gamma_i$ are constants, $x_i$ is the price, and $\mu_{ij}$ is a random variable following a Gumbel distribution with mode 0 and variance $\pi^2/6$. Each buyer purchases the product $i$ with the highest value $\gamma_i(\alpha_i - x_i) + \mu_{ij}$.
Accordingly, the demand $\xi_i$ for product $i$ can be defined by $\xi_i(\bm{x},\bm{\mu}) := \sum_{j=1}^{m}\xi_{ij}(\bm{x},\bm{\mu}_j)$, where $\bm{\mu}_j = (\mu_{ij})_{i=1}^{n}$, $\xi_{ij}(\bm{x},\bm{\mu}_j) := 1$ if $i = \mathrm{argmax}_r\{\gamma_r(\alpha_r - x_r) + \mu_{rj}\}$, and $\xi_{ij}(\bm{x},\bm{\mu}_j) := 0$ otherwise.
The optimization problem can be written as follows by letting $\bm{\xi}(\bm{x},\bm{\mu}) := (\xi_i(\bm{x},\bm{\mu}))_{i=1}^{n}$:

$$\min_{\bm{x}\in[x_{\min},x_{\max}]^n}\ \mathbb{E}_{\bm{\mu}}\left[-s(\bm{x},\bm{\xi}(\bm{x},\bm{\mu})) + c(\bm{\xi}(\bm{x},\bm{\mu}))\right].$$
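The decision-independent formulation can be simulated directly, which also makes the model concrete. The NumPy sketch below does the following:

- draws standard Gumbel noise $\mu_{ij}$;
- lets each buyer choose the product with the largest $\gamma_i(\alpha_i-x_i)+\mu_{ij}$;
- counts the demand $\xi_i$;
- averages $-s(\bm{x},\bm{\xi})+c(\bm{\xi})$ over samples.

The function names, and the `cost` callback standing in for $c$, are hypothetical. Because the noise is standard Gumbel, each buyer's choice probabilities are the multinomial logit probabilities $\exp(u_i)/\sum_r \exp(u_r)$ with $u_i=\gamma_i(\alpha_i-x_i)$, so the simulated demand can be checked against them.

```python
import numpy as np


def sample_demand(x, alpha, gamma, m, rng):
    """One draw of xi(x, mu): how many of the m buyers choose each product."""
    n = len(x)
    mu = rng.gumbel(loc=0.0, scale=1.0, size=(m, n))  # mu[j, i]: standard Gumbel noise
    utility = gamma * (alpha - x) + mu                 # row j holds buyer j's values
    choice = np.argmax(utility, axis=1)                # xi_ij = 1 for the argmax product
    return np.bincount(choice, minlength=n)            # xi_i = sum_j xi_ij


def mc_objective(x, alpha, gamma, m, cost, num_samples, rng):
    """Monte Carlo estimate of E_mu[-s(x, xi(x, mu)) + c(xi(x, mu))]."""
    total = 0.0
    for _ in range(num_samples):
        xi = sample_demand(x, alpha, gamma, m, rng)
        total += -float(xi @ x) + cost(xi)
    return total / num_samples


def logit_probabilities(x, alpha, gamma):
    """Choice probabilities implied by standard Gumbel noise (multinomial logit)."""
    u = gamma * (alpha - x)
    w = np.exp(u - u.max())  # shift for numerical stability
    return w / w.sum()
```

With many buyers, `sample_demand(...) / m` should be close to `logit_probabilities(...)`.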

Although the multiproduct pricing problem can be formulated in the above manner with decision-independent random variables, the discontinuous function $\xi_{ij}(\bm{x},\bm{\mu}_j)$ makes it difficult to optimize.⁹ In contrast, our problem does not involve a discontinuous function, which allows us to use gradient-based methods.

⁹ Optimization problems involving such discontinuous functions have been addressed by Correa et al. (2017), who propose approximation methods to deal with this difficulty.
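The source of the discontinuity can be seen by fixing a single buyer's noise vector $\bm{\mu}_j$. Consider the price $x_i$ at which the buyer is indifferent between product $i$ and the best alternative. As $x_i$ crosses that value, $\xi_{ij}(\bm{x},\bm{\mu}_j)$ jumps from 1 to 0, and the sampled objective jumps with it. Away from such points, $\xi_{ij}$ is locally constant in $\bm{x}$, so its pathwise derivative says nothing about how demand responds to price. The sketch below locates that switching price; the helper names and the zero cost are hypothetical.

```python
import numpy as np


def chosen_product(x, alpha, gamma, mu_j):
    """Index of the product with the highest gamma_r (alpha_r - x_r) + mu_rj."""
    return int(np.argmax(gamma * (alpha - x) + mu_j))


def switching_price(i, x, alpha, gamma, mu_j):
    """Price of product i at which the buyer is indifferent between i and the best other product."""
    utility = gamma * (alpha - x) + mu_j
    best_other = np.max(np.delete(utility, i))
    # Solve gamma_i (alpha_i - p) + mu_ij = best_other for p.
    return alpha[i] - (best_other - mu_j[i]) / gamma[i]


def sampled_objective(x, alpha, gamma, mu_j):
    """Single-buyer sample of -s(x, xi) with zero cost: minus the price actually paid."""
    return -x[chosen_product(x, alpha, gamma, mu_j)]
```

Evaluating `sampled_objective` just below and just above `switching_price(i, ...)` exposes a jump equal to the price difference between product $i$ and the buyer's fallback product.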

Moreover, He et al. (2009) and Dong et al. (2017) tackle problems similar to ours by defining the demand for a product/service as $d(x)+\xi$, where $x$ is the price and $\xi$ is a decision-independent random variable. However, assuming that $\xi$ is decision-independent makes it impossible to handle situations where demand uncertainty varies with price. In contrast, our problem setting can deal with such situations by using decision-dependent random variables.
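A small simulation illustrates the difference. Under additive noise $d(x)+\xi$, the spread of demand is the same at every price. Under a decision-dependent model, such as Poisson demand with a price-dependent intensity $\lambda(x)$ (the model of Schulte and Sachs (2020) discussed in Section 3.4.2), the variance $\lambda(x)$ changes with the price. The linear demand curve, the exponential form of $\lambda(x)$, and all parameter values below are illustrative assumptions.

```python
import numpy as np


def additive_demand(x, size, rng, d0=10.0, slope=2.0, noise_std=1.0):
    """d(x) + xi with d(x) = d0 - slope * x and xi ~ N(0, noise_std^2), independent of x."""
    return d0 - slope * x + rng.normal(0.0, noise_std, size=size)


def poisson_demand(x, size, rng, scale=10.0, rate=0.5):
    """Decision-dependent demand: Poisson with intensity lambda(x) = scale * exp(-rate * x)."""
    return rng.poisson(scale * np.exp(-rate * x), size=size)
```

Compare `np.std` of each model's samples at a low and a high price. The additive model shows a spread of about `noise_std` at both prices, while the Poisson model shows a spread of about $\sqrt{\lambda(x)}$, which shrinks as the price rises.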

3.4.2 Differences from Existing Pricing Problems with Decision-dependent Uncertainty

The existing formulations with decision-dependent uncertainty make assumptions specific to their applications. For example, Schulte and Sachs (2020) assume that demand follows a Poisson distribution with intensity $\lambda(x)$, so they cannot use a multinomial or truncated Gaussian distribution as the demand distribution. Moreover, since a fixed cost is charged for their products, they cannot handle a nonlinear cost. Hikima et al. (2021) assume that the probability distribution of demand is $\Pr(\bm{\xi}\mid\bm{x}) = \prod_{u\in U}\left\{p_u(x_u)^{\xi_u}(1-p_u(x_u))^{1-\xi_u}\right\}$, where $p_u(x_u)$ is the probability that service user $u\in U$ accepts the price $x_u$; they cannot use a truncated Gaussian distribution. In addition, they assume a specific objective function, which is defined by a bipartite matching problem with uncertainty.

In contrast to the existing formulations, ours applies to a wider range of applications because its assumptions are more general. The trade-off for this generality is that our problem is non-convex, and the dependence of the probability distribution on the decision variables means that conventional stochastic optimization theory does not apply directly. Below, we focus on finding a stationary point and develop a projected stochastic gradient descent method by deriving unbiased stochastic gradient estimators.

4 Proposed Method

4.1 Preliminaries

Definition 1 (Projection oracle).

Given a point $\bm{x}$, we define the following as a projection oracle:

$$\mathrm{proj}_{\mathcal{C}}(\bm{x}) := \mathrm{arg}\min_{\bm{y}\in\mathbb{R}^{n}}\{\|\bm{x}-\bm{y}\|_{2}\mid\bm{y}\in\mathcal{C}\}.$$
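For the box $\mathcal{C}=[x_{\min},x_{\max}]^n$ used in Proposition 3, the projection oracle has a closed form. The squared distance separates across coordinates, so each coordinate is clipped to $[x_{\min},x_{\max}]$ independently. Below is a minimal NumPy sketch; the function name is hypothetical, and other feasible sets would need their own projection routine.

```python
import numpy as np


def proj_box(x, x_min, x_max):
    """Euclidean projection onto C = [x_min, x_max]^n: clip each coordinate independently."""
    return np.clip(np.asarray(x, dtype=float), x_min, x_max)
```

For instance, projecting $(-1, 0.5, 3)$ onto $[0, 2]^3$ gives the point $(0, 0.5, 2)$.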
Definition 2 (Unbiased stochastic gradient).

Given a point $\bm{x}$, we call $g(\bm{x},\bm{\xi})$ an “unbiased stochastic gradient” if

$$\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[g(\bm{x},\bm{\xi})] = \nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})].$$
Definition 3 (Gradient mapping).

Given a point $\bm{x}$ and $\eta\in\mathbb{R}_{+}$, the gradient mapping of (P) is defined by

$$\mathcal{G}(\bm{x},\eta) := \frac{1}{\eta}\left(\bm{x} - \mathrm{proj}_{\mathcal{C}}\left(\bm{x} - \eta\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\right)\right).$$
Definition 4 ($\varepsilon$-stationary point).

We call $\hat{\bm{x}}$ an $\varepsilon$-stationary point for (P) if $\mathbb{E}_{\hat{\bm{x}}}[\|\mathcal{G}(\hat{\bm{x}},\eta)\|^{2}]\le\varepsilon^{2}$ for some $\eta\in\mathbb{R}_{+}$, where $\hat{\bm{x}}$ denotes the point returned by a stochastic algorithm.
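For the box $\mathcal{C}=[x_{\min},x_{\max}]^n$, the gradient mapping of Definition 3 can be computed by clipping. The sketch below assumes the gradient $\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]$ is supplied as an array; in practice it is unknown and must be estimated. The function name is hypothetical. The sketch shows two properties:

- $\mathcal{G}(\bm{x},\eta)$ equals the gradient when the projected step stays inside $\mathcal{C}$.
- $\mathcal{G}(\bm{x},\eta)$ vanishes at a boundary point where the negative gradient points out of $\mathcal{C}$.

This is why $\|\mathcal{G}(\hat{\bm{x}},\eta)\|$ serves as the stationarity measure in Definition 4.

```python
import numpy as np


def gradient_mapping(x, grad, eta, x_min, x_max):
    """G(x, eta) = (x - proj_C(x - eta * grad)) / eta for C = [x_min, x_max]^n."""
    x = np.asarray(x, dtype=float)
    step = np.clip(x - eta * np.asarray(grad, dtype=float), x_min, x_max)  # projected step
    return (x - step) / eta
```

For example, at $\bm{x}=(0,2)$ in $[0,2]^2$ with gradient $(1,-1)$, the step $\bm{x}-\eta\nabla$ leaves the box in both coordinates and is clipped back to $\bm{x}$. The mapping is therefore zero, and $\bm{x}$ is stationary for (P).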

The following preliminary lemmas are needed for ensuring our method’s convergence when the random variables are continuous.

Lemma 4.

Suppose that Assumptions 1-3 hold. Let $h(\bm{x},\bm{\xi}) := f(\bm{x},\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})$. Then,

$$\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}h(\bm{x},\bm{\xi})\,d\bm{\xi} = \int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}h(\bm{x},\bm{\xi})\,d\bm{\xi}$$

for all $\bm{x}\in\mathcal{C}$.

Lemma 5.

Suppose that conditions (ii) and (iii) of Assumption 1 and Assumptions 2 and 3 hold. Then,

$$\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})\,d\bm{\xi} = \int_{\bm{\xi}\in\Xi}q(\bm{\xi})\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})\,d\bm{\xi}$$

for all $\bm{x}\in\mathcal{C}$ and any real-valued continuous function $q:\Xi\to\mathbb{R}$.
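The sketch below is a numerical sanity check of Lemma 5, not a substitute for its proof. It works as follows:

- It takes a one-dimensional truncated Gaussian $\Pr(\xi\mid x)$ on $[0,\xi^{\max}]$ whose mean $m(x)=a_0-b_0x$ depends on the decision.
- It computes $\partial_x\Pr(\xi\mid x)$ analytically, including the derivative of the truncation constant.
- It compares $\int q(\xi)\,\partial_x\Pr(\xi\mid x)\,d\xi$ with a finite-difference derivative of $\int q(\xi)\Pr(\xi\mid x)\,d\xi$.

The linear mean, the fixed standard deviation, the test function $q$, and all constants are hypothetical.

```python
import math
import numpy as np

XI_MAX, STD = 5.0, 1.5   # hypothetical support [0, XI_MAX] and fixed standard deviation
A0, B0 = 4.0, 0.8        # hypothetical decision-dependent mean m(x) = A0 - B0 * x


def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def big_phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def density(xi, x):
    """Pr(xi | x): N(m(x), STD^2) truncated to [0, XI_MAX]."""
    m = A0 - B0 * x
    c = big_phi((XI_MAX - m) / STD) - big_phi(-m / STD)
    return np.exp(-0.5 * ((xi - m) / STD) ** 2) / (c * STD * math.sqrt(2.0 * math.pi))


def density_grad(xi, x):
    """Analytic d/dx Pr(xi | x) = Pr(xi | x) * d/dx log Pr(xi | x)."""
    m, dm = A0 - B0 * x, -B0
    z_hi, z_lo = (XI_MAX - m) / STD, -m / STD
    c = big_phi(z_hi) - big_phi(z_lo)
    dc = -(dm / STD) * (phi(z_hi) - phi(z_lo))  # derivative of the truncation constant
    score = (xi - m) * dm / STD**2 - dc / c
    return density(xi, x) * score


def trapezoid(y, grid):
    """Trapezoidal rule on a 1-D grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)


def check_interchange(q, x, h=1e-5, num_points=20001):
    """Return (finite-difference d/dx of the integral, integral of the analytic d/dx)."""
    grid = np.linspace(0.0, XI_MAX, num_points)
    qv = q(grid)
    lhs = (trapezoid(qv * density(grid, x + h), grid)
           - trapezoid(qv * density(grid, x - h), grid)) / (2.0 * h)
    rhs = trapezoid(qv * density_grad(grid, x), grid)
    return lhs, rhs
```

With $q\equiv 1$, both sides should be close to zero, since the density integrates to one for every $x$.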

Throughout the paper, we let

$$f_{\max} := \max_{\bm{x}\in\mathcal{C},\,\bm{\xi}\in\Xi}|f(\bm{x},\bm{\xi})|, \tag{1}$$

which exists since $\mathcal{C}$ and $\Xi$ are compact by Assumption 2 and $f(\bm{x},\bm{\xi})$ is real-valued and continuous by Assumption 1.

4.2 Unbiased Stochastic Gradient for (P)

First, we propose an unbiased stochastic gradient for (P).

Lemma 6.

Suppose that conditions (i) and (ii) of Assumption 1 hold. Moreover, suppose that condition (iii) of Assumption 1, Assumption 2, and Assumption 3 hold if $\bm{\xi}$ is a continuous random vector. Let $\delta\in\mathbb{R}$ and

$$\bm{g}(\bm{x},\bm{\xi},\delta) := \nabla_{\bm{x}}f(\bm{x},\bm{\xi}) + \left(f(\bm{x},\bm{\xi})-\delta\right)\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}.$$

Then, $\bm{g}(\bm{x},\bm{\xi},\delta)$ is an unbiased stochastic gradient for (P) for any $\delta\in\mathbb{R}$.
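Below is a minimal sketch of the estimator in Lemma 6 for a toy single-product model with Poisson demand. The model is chosen only because its score $\nabla_x\Pr(\xi\mid x)/\Pr(\xi\mid x)=\nabla_x\log\Pr(\xi\mid x)$ has a closed form. The intensity $\lambda(x)=\exp(A-Bx)$, the linear cost $c(\xi)=C_0\xi$, and the constants are hypothetical. The model is not claimed to satisfy Assumptions 1-3, since Poisson demand is unbounded. Here $\mathbb{E}[f(x,\xi)]=(C_0-x)\lambda(x)$, so the exact gradient $-\lambda(x)-B(C_0-x)\lambda(x)$ is available for comparison. Averaging $g$ over many samples should recover it for every $\delta$.

```python
import numpy as np

A, B, C0 = 3.0, 0.5, 1.0  # hypothetical: lambda(x) = exp(A - B x), unit cost C0


def intensity(x):
    return np.exp(A - B * x)


def f(x, xi):
    """f(x, xi) = -s(x, xi) + c(xi) = -x xi + C0 xi for a single product."""
    return (C0 - x) * xi


def grad_f(x, xi):
    """Partial derivative of f with respect to x, holding xi fixed."""
    return -xi


def score(x, xi):
    """d/dx log Pr(xi | x) for Poisson(lambda(x)), which equals -B (xi - lambda(x))."""
    return -B * (xi - intensity(x))


def g(x, xi, delta):
    """Lemma 6 estimator: grad_x f + (f - delta) * grad_x Pr / Pr."""
    return grad_f(x, xi) + (f(x, xi) - delta) * score(x, xi)


def exact_gradient(x):
    """d/dx E[f] = d/dx (C0 - x) lambda(x) = -lambda(x) - B (C0 - x) lambda(x)."""
    lam = intensity(x)
    return -lam - B * (C0 - x) * lam
```

For example, with `xi = rng.poisson(intensity(x), size=N)` for large `N`, `np.mean(g(x, xi, delta))` should be close to `exact_gradient(x)` whatever value `delta` takes. The score term has zero mean, so $\delta$ drops out in expectation.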

Inspired by the baseline technique in reinforcement learning (Williams, 1992; Sutton and Barto, 2018), we include a variance reduction parameter $\delta$ in the unbiased stochastic gradient. If $\delta$ is close to $\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]$, the factor $f(\bm{x},\bm{\xi})-\delta$ in the second term of $\bm{g}(\bm{x},\bm{\xi},\delta)$ is small, and the variance of $\bm{g}(\bm{x},\bm{\xi},\delta)$ is reduced. We show how to determine $\delta$ in Section 4.3.
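The effect of $\delta$ can be checked on a toy Poisson model, again hypothetical: $\lambda(x)=\exp(A-Bx)$ and $f(x,\xi)=(C_0-x)\xi$. The sketch estimates $\delta$ from a small pilot sample of $f$. It then compares the empirical variance of $g$ with $\delta=0$ against the variance with $\delta\approx\mathbb{E}[f(x,\xi)]$. In this example the variance drops substantially. In general, however, a baseline equal to $\mathbb{E}[f]$ is not guaranteed to minimize the variance.

```python
import numpy as np

A, B, C0 = 3.0, 0.5, 1.0  # hypothetical: lambda(x) = exp(A - B x), unit cost C0


def g(x, xi, delta):
    """Lemma 6 estimator for f(x, xi) = (C0 - x) xi under Poisson(lambda(x)) demand."""
    lam = np.exp(A - B * x)
    f_val = (C0 - x) * xi
    score = -B * (xi - lam)  # d/dx log Pr(xi | x)
    return -xi + (f_val - delta) * score


def estimator_variance(x, delta, rng, num_samples=200_000):
    """Empirical variance of g(x, xi, delta) with xi ~ Poisson(lambda(x))."""
    xi = rng.poisson(np.exp(A - B * x), size=num_samples)
    return float(np.var(g(x, xi, delta)))


def pilot_baseline(x, rng, num_samples=1_000):
    """Estimate delta ~ E[f(x, xi)] from a small pilot sample."""
    xi = rng.poisson(np.exp(A - B * x), size=num_samples)
    return float(np.mean((C0 - x) * xi))
```

At $x=2$ with these constants, the variance with the pilot baseline should be well under half of the variance with $\delta=0$.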

The gradient in Lemma 6 has the following useful feature.

Lemma 7.

Suppose that Assumptions 1 and 2 hold. Moreover, suppose that Assumption 3 holds if $\bm{\xi}$ is a continuous random vector. Let $\delta\in[-f_{\max},f_{\max}]$. Then, for all $\bm{x}\in\mathcal{C}$,

$$\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\left[\|\bm{g}(\bm{x},\bm{\xi}',\delta) - \nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\|^{2}\right] \le (L_f + 2f_{\max}M)^{2},$$

where $\bm{g}(\bm{x},\bm{\xi}',\delta)$ is defined as in Lemma 6.

Lemma 7 shows that the variance of the stochastic gradient of Lemma 6 can be bounded by a constant. This is a necessary condition for stochastic gradient methods to have a convergence rate independent of the number of possible values of $\bm{\xi}$ (Li and Li, 2018).

Moreover, the following lemma is necessary for ensuring the convergence of the proposed method.

Lemma 8.

Suppose that Assumptions 1 and 2 hold. Moreover, suppose that Assumption 3 holds if $\bm{\xi}$ is a continuous random vector. Then, for all $\bm{x}\in\mathcal{C}$,

$$\|\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\| \le L_f + f_{\max}M.$$

4.3 Calculation of Variance Reduction Parameter δ

To reduce the variance of the gradient in Lemma 6, the parameter $\delta$ should be close to $\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$ for the iterate $\bm{x}_k$. During the iterations of the algorithm, $\delta_k$ is updated to bring it closer to this target value. We consider the following sequential stochastic problem for $\delta_k$: a decision maker selects $\delta_k\in[-f_{\max}-\kappa, f_{\max}+\kappa]$ at iteration $k$ and incurs an unobserved cost $\psi_k(\delta_k) := \frac{1}{2}\left(\delta_k - \mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]\right)^{2}$, where $\bm{x}_k$ and $\kappa\in\mathbb{R}_{+}$ are given.¹⁰ The decision maker obtains, by sampling, an unbiased estimate $v_k$ of $\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$. Here, we assume $v_k\in[-f_{\max},f_{\max}]$, which usually holds by the definition (1) of $f_{\max}$.

¹⁰ $\kappa$ is an arbitrarily small positive value. It extends the range of $\delta_k$, which is needed in Proposition 9.

As a way to solve the above problem, we propose Algorithm 1, which is based on the online gradient descent (OGD) algorithm (Besbes et al., 2015).

Algorithm 1 OGD algorithm
Input: iteration limit $R$, initial parameter $\delta_1\in[-f_{\max},f_{\max}]$, stepsize parameters $\{\zeta_k\}_{k=2}^{R}$, noisy target values $\{v_k\}_{k\in[R]}$.
Output: $\delta_k$ for $k\in[R]$.
for $k = 1,\dots,R-1$:
    $\delta_{k+1} \leftarrow (1-\zeta_{k+1})\delta_k + \zeta_{k+1}v_k$
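A direct Python transcription of Algorithm 1 is sketched below; the function and argument names are hypothetical. In the full method, each $v_k$ would be computed online by sampling at the current iterate $\bm{x}_k$. Here the sequence is simply passed in, and the stepsize defaults to $\zeta_k=1/k$ as in Proposition 9.

```python
def ogd_baseline(R, delta_1, v, zeta=lambda k: 1.0 / k):
    """Algorithm 1: return [delta_1, ..., delta_R].

    v[k - 1] holds the noisy target v_k; only v_1, ..., v_{R-1} are used.
    zeta(k) is the stepsize zeta_k for k = 2, ..., R.
    """
    deltas = [float(delta_1)]  # deltas[k - 1] holds delta_k
    for k in range(1, R):      # k = 1, ..., R - 1
        step = zeta(k + 1)
        deltas.append((1.0 - step) * deltas[-1] + step * v[k - 1])
    return deltas
```

With the default stepsize, `ogd_baseline(5, 0.0, [1.0, 2.0, 3.0, 4.0])` returns values equal, up to floating-point rounding, to 0, 0.5, 1, 1.5, 2. These are the running averages of $\delta_1, v_1, \dots, v_k$.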

This method updates $\delta_k$ with a stochastic gradient step, since $(1-\zeta_{k+1})\delta_k + \zeta_{k+1}v_k = \delta_k - \zeta_{k+1}(\delta_k - v_k) \approx \delta_k - \zeta_{k+1}\nabla\psi_k(\delta_k)$. Note that $\nabla\psi_k(\delta_k) = \delta_k - \mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$ by the definition of $\psi_k$. Because $v_k$ is an unbiased estimate of $\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$, $\delta_k - v_k$ is an unbiased estimate of $\nabla\psi_k(\delta_k)$.
Accordingly, the following proposition follows from (Besbes et al., 2015, Lemma C-5); it guarantees that the sequence produced by Algorithm 1 has logarithmic regret with respect to the best fixed $\delta$.

Proposition 9.

Let $\zeta_k:=\frac{1}{k}$ for $k\in[R]$, and let $\delta_k$ be the output of Algorithm 1 for $k\in[R]$. Then, there exists a constant $\bar{C}$ such that

$$\mathbb{E}\left[\sum_{k=1}^{R}\psi_k(\delta_k)\right]-\min_{\delta}\sum_{k=1}^{R}\psi_k(\delta)\leq\bar{C}\log R.$$

From this proposition and the definition of $\psi_k$, whose gradient $\nabla\psi_k(\delta)=\delta-\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$ vanishes exactly at $\delta=\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$, we find that the output $\delta_k$ of Algorithm 1 is a reasonable approximation of $\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]$.
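To make the running-average interpretation concrete, here is a minimal Python sketch of the update on line 4 of Algorithm 1 with $\zeta_k=1/k$, as in Proposition 9. The drifting means and the Gaussian noise standing in for $v_k$ are hypothetical, and the sketch assumes $\psi_k(\delta)=\frac{1}{2}\left(\delta-\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]\right)^2$, which matches the gradient stated above up to an additive constant. Under this step size, $\delta_{k+1}$ is simply the average of $\delta_1,v_1,\dots,v_k$.

```python
import numpy as np


def algorithm1_deltas(v, delta1):
    """Iterate delta_{k+1} <- (1 - zeta_{k+1}) * delta_k + zeta_{k+1} * v_k with zeta_k = 1/k.

    v holds the observations v_1, ..., v_{R-1}; returns [delta_1, ..., delta_R].
    """
    deltas = [float(delta1)]
    for k, v_k in enumerate(v, start=1):  # k = 1, ..., R-1
        zeta_next = 1.0 / (k + 1)         # zeta_{k+1}
        delta_k = deltas[-1]
        # Gradient step on psi_k(delta) = 0.5 * (delta - mean_k)**2 with the
        # unknown mean_k = E[f(x_k, xi)] replaced by its noisy sample v_k.
        deltas.append(delta_k - zeta_next * (delta_k - v_k))
    return np.array(deltas)


rng = np.random.default_rng(0)
R = 200
means = 5.0 + 1.0 / np.arange(1, R)            # hypothetical drifting E[f(x_k, xi)]
v = means + rng.normal(scale=2.0, size=R - 1)  # noisy observations v_k
deltas = algorithm1_deltas(v, delta1=0.0)

# With zeta_k = 1/k, delta_{k+1} is the plain average of delta_1, v_1, ..., v_k.
closed_form = np.concatenate(([0.0], np.cumsum(v))) / np.arange(1, R + 1)
print(np.allclose(deltas, closed_form))  # True
print(deltas[-1])                        # near 5, up to sampling noise
```

The equivalence with a plain average is what makes the logarithmic regret of Proposition 9 plausible: each $\delta_k$ pools all previous observations with equal weight.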

4.4 Proposed Algorithm

We propose Algorithm 2 for solving problem (P). It incorporates our stochastic gradient and Algorithm 1 into a projected stochastic gradient method (Ghadimi and Lan, 2016, Algorithm 4). Lines 5–9 update the iterate following (Ghadimi and Lan, 2016, Algorithm 4), using our proposed stochastic gradient. Line 10 updates the variance reduction parameter following Algorithm 1 with $v_k:=\frac{1}{m_k}\sum_{\ell=1}^{m_k}f(\bm{x}_k^{md},\bm{\xi}^\ell)$. Note that line 10 incurs no additional computational cost, since the values $f(\bm{x}_k^{md},\bm{\xi}^\ell)$ entering $\frac{1}{m_k}\sum_{\ell=1}^{m_k}f(\bm{x}_k^{md},\bm{\xi}^\ell)$ are already computed on line 7.

Then, from Lemmas 6–8 and (Ghadimi and Lan, 2016, Corollary 6), the following convergence theorem holds.

Algorithm 2 Projected stochastic gradient algorithm
1: Input: initial iterate $\bm{x}_0\in\mathcal{C}$, initial variance reduction parameter $\delta_1\in[-f_{\max},f_{\max}]$, iteration limit $N\geq 1$, batch sizes $\{m_k\}_{k\in[N]}$, probability distribution $D_R(N)$ over the set $\{1,2,\dots,N\}$, and stepsize parameters $\{\alpha_k\in(0,1]\}_{k\in[N]}$, $\{\beta_k\in\mathbb{R}_+\}_{k\in[N]}$, $\{\lambda_k\in\mathbb{R}_+\}_{k\in[N]}$, and $\{\zeta_k\}_{k=2}^{N}$
2: Output: $\bm{x}^{md}_R$
3: Set $\bm{x}^{ag}_0=\bm{x}_0$ and sample $R\sim D_R(N)$.
4: for $k=1,2,\dots,R$:
5:     $\bm{x}_k^{md}\leftarrow(1-\alpha_k)\bm{x}_{k-1}^{ag}+\alpha_k\bm{x}_{k-1}$
6:     Sample $\bm{\xi}^\ell\sim D(\bm{x}_k^{md})$ for $\ell=1,\dots,m_k$.
7:     $\bm{g}_k\leftarrow\frac{1}{m_k}\sum_{\ell=1}^{m_k}\left(\nabla_{\bm{x}}f(\bm{x}_k^{md},\bm{\xi}^\ell)+\left(f(\bm{x}_k^{md},\bm{\xi}^\ell)-\delta_k\right)\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}^\ell\mid\bm{x}_k^{md})}{\mathrm{Pr}(\bm{\xi}^\ell\mid\bm{x}_k^{md})}\right)$
8:     $\bm{x}_k\leftarrow\mathrm{proj}_{\mathcal{C}}(\bm{x}_{k-1}-\lambda_k\bm{g}_k)$
9:     $\bm{x}_k^{ag}\leftarrow\mathrm{proj}_{\mathcal{C}}(\bm{x}_k^{md}-\beta_k\bm{g}_k)$
10:     if $k\leq R-1$: $\delta_{k+1}\leftarrow(1-\zeta_{k+1})\delta_k+\frac{\zeta_{k+1}}{m_k}\sum_{\ell=1}^{m_k}f(\bm{x}_k^{md},\bm{\xi}^\ell)$
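The loop of Algorithm 2 can be sketched in Python on a deliberately small, hypothetical instance: a single price $x\in\mathcal{C}=[0,15]$, demand $\xi\sim\mathrm{Binomial}(n,p(x))$ with a logistic purchase probability $p(x)$, and $f(x,\xi)=-x\xi$ (negative revenue). For this model $\frac{\partial}{\partial x}\log\Pr(\xi\mid x)=-b(\xi-np(x))$. The step sizes, the minibatch sizes $m_k\propto k$, and the distribution of $R$ follow the rules stated in Theorem 10, but the constant `L` standing in for $L_{Ef}$ and the batch factor `c_batch` are hand-picked assumptions, not values derived from the theorem's assumptions. Comments mark the corresponding algorithm lines.

```python
import numpy as np

# Hypothetical toy instance (not from the paper): n customers each buy one unit
# with logistic probability p(x) at price x, so xi ~ Binomial(n, p(x)), and
# f(x, xi) = -x * xi is the negative revenue to be minimized over C = [LO, HI].
n, a, b = 100, 3.0, 0.5
LO, HI = 0.0, 15.0


def p(x):
    return 1.0 / (1.0 + np.exp(-(a - b * x)))


def f(x, xi):
    return -x * xi


def grad_f(x, xi):  # partial derivative of f with respect to x
    return -xi


def score(x, xi):  # d/dx log Pr(xi | x) for Binomial(n, p(x)) with logistic p
    return -b * (xi - n * p(x))


def proj(x):  # Euclidean projection onto C = [LO, HI]
    return float(np.clip(x, LO, HI))


def algorithm2(x0, N, L, c_batch, rng):
    """Sketch of Algorithm 2 with Theorem 10's step-size rules; L is a guess of L_Ef."""
    ks = np.arange(1, N + 1)
    alpha = 2.0 / (ks + 1)
    beta = np.full(N, 1.0 / (2.0 * L))
    lam = ks * beta / 2.0
    m = np.ceil(c_batch * ks).astype(int)                    # minibatch sizes grow with k
    gamma = np.cumprod(np.where(ks == 1, 1.0, 1.0 - alpha))  # Gamma_1 = 1
    w = beta * (1.0 - L * beta) / gamma
    R = int(rng.choice(ks, p=w / w.sum()))                   # line 3: R ~ D_R(N)

    f_max = HI * n                                           # |f(x, xi)| <= f_max on C
    x, x_ag = x0, x0
    delta = rng.uniform(-f_max, f_max)                       # delta_1 in [-f_max, f_max]
    for k in range(1, R + 1):
        i = k - 1
        x_md = (1 - alpha[i]) * x_ag + alpha[i] * x                          # line 5
        xi = rng.binomial(n, p(x_md), size=m[i])                             # line 6
        f_vals = f(x_md, xi)
        g = np.mean(grad_f(x_md, xi) + (f_vals - delta) * score(x_md, xi))  # line 7
        x = proj(x - lam[i] * g)                                             # line 8
        x_ag = proj(x_md - beta[i] * g)                                      # line 9
        if k <= R - 1:                                                       # line 10
            zeta = 1.0 / (k + 1)
            delta = (1 - zeta) * delta + zeta * f_vals.mean()
    return x_md, R


rng = np.random.default_rng(1)
x_out, R = algorithm2(x0=1.0, N=100, L=500.0, c_batch=1.0, rng=rng)
print(R, x_out)
```

For this toy model the expected revenue $n\,x\,p(x)$ is maximized near $x\approx 5.1$, where $b\,x\,(1-p(x))=1$; this is a convenient reference point when varying `N`, `L`, and `c_batch`, although a single run is not guaranteed to land close to it. The output $\bm{x}_R^{md}$ always lies in $\mathcal{C}$, since it is a convex combination of two points of $\mathcal{C}$.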
Theorem 10.

Suppose that Assumptions 1 and 2 hold. Moreover, suppose that Assumption 3 holds if $\bm{\xi}$ is a continuous random vector. Let the inputs of Algorithm 2 be $\alpha_k:=\frac{2}{k+1}$, $\beta_k:=\frac{1}{2L_{Ef}}$, $\lambda_k:=\frac{k\beta_k}{2}$, $m_k:=\left\lceil\frac{(L_f+2f_{\max}M)^2k}{L_{Ef}\tilde{D}^2}\right\rceil$, and $\Pr(R=k):=\frac{\Gamma_k^{-1}\beta_k(1-L_{Ef}\beta_k)}{\sum_{\tau=1}^{N}\Gamma_\tau^{-1}\beta_\tau(1-L_{Ef}\beta_\tau)}$ for $k=1,2,\dots,N$, where $\tilde{D}>0$ is a tunable parameter, $L_{Ef}:=L_f+f_{\max}M$, $\Gamma_1:=1$, and $\Gamma_k:=(1-\alpha_k)\Gamma_{k-1}$ for $k=2,\dots,N$. Let $\zeta_k:=\frac{1}{k}$ for $k=2,\dots,N$. Then,

$$\mathbb{E}\left[\|\mathcal{G}(\bm{x}_R^{md},\beta_R)\|^2\right]\leq 96L_{Ef}\left[\frac{4L_{Ef}\|\bm{x}_0-\bm{x}^*\|^2}{N(N+1)(N+2)}+\frac{L_{Ef}(\|\bm{x}^*\|^2+H^2)+2\tilde{D}^2}{N}\right],$$

where $H:=\max_{\bm{x}\in\mathcal{C}}\|\bm{x}\|$. Consequently, to obtain an $\varepsilon$-stationary point in the sense of Definition 4, we need at most $O\left(\left[\frac{L_{Ef}^2\|\bm{x}_0-\bm{x}^*\|^2}{\varepsilon^2}\right]^{\frac{1}{3}}+\frac{L_{Ef}^2(\|\bm{x}^*\|^2+H^2)+L_{Ef}\tilde{D}^2}{\varepsilon^2}\right)$ iterations.

The parameter $\tilde{D}$ in Theorem 10 controls the trade-off between the minibatch size and the iteration complexity: a small $\tilde{D}$ yields a smaller iteration complexity but a larger minibatch size, whereas a large $\tilde{D}$ yields a smaller minibatch size but a larger iteration complexity.
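The trade-off governed by $\tilde{D}$ can be made tangible by tabulating the inputs that Theorem 10 prescribes. The following Python sketch computes $\alpha_k$, $\beta_k$, $\lambda_k$, $m_k$, $\Gamma_k$, the distribution of $R$, and the right-hand side of the bound, for hypothetical constants $L_f=1$, $f_{\max}=10$, $M=0.5$ (hence $L_{Ef}=6$) and two values of $\tilde{D}$; none of these numbers come from the paper.

```python
import numpy as np


def theorem10_schedule(N, L_f, f_max, M, D_tilde):
    """Inputs of Algorithm 2 as prescribed in Theorem 10 (constants supplied by the user)."""
    L_Ef = L_f + f_max * M
    k = np.arange(1, N + 1)
    alpha = 2.0 / (k + 1)
    beta = np.full(N, 1.0 / (2.0 * L_Ef))
    lam = k * beta / 2.0
    m = np.ceil((L_f + 2.0 * f_max * M) ** 2 * k / (L_Ef * D_tilde ** 2)).astype(int)
    gamma = np.cumprod(np.where(k == 1, 1.0, 1.0 - alpha))  # Gamma_1 = 1, Gamma_k = (1 - alpha_k) Gamma_{k-1}
    w = beta * (1.0 - L_Ef * beta) / gamma
    return alpha, beta, lam, m, gamma, w / w.sum()          # last entry: Pr(R = k)


def theorem10_bound(N, L_Ef, dist0, norm_xstar, H, D_tilde):
    """Right-hand side of the bound on E[||G(x_R^md, beta_R)||^2] in Theorem 10."""
    return 96.0 * L_Ef * (4.0 * L_Ef * dist0 ** 2 / (N * (N + 1) * (N + 2))
                          + (L_Ef * (norm_xstar ** 2 + H ** 2) + 2.0 * D_tilde ** 2) / N)


# Hypothetical constants: L_f = 1, f_max = 10, M = 0.5, hence L_Ef = 6.
N, L_f, f_max, M = 1000, 1.0, 10.0, 0.5
for D_tilde in (1.0, 10.0):
    _, _, _, m, _, _ = theorem10_schedule(N, L_f, f_max, M, D_tilde)
    bound = theorem10_bound(N, L_f + f_max * M, dist0=1.0, norm_xstar=1.0, H=2.0, D_tilde=D_tilde)
    print(f"D_tilde={D_tilde}: m_1={m[0]}, m_N={m[-1]}, total samples={m.sum()}, bound={bound:.4f}")
```

With $\alpha_k=2/(k+1)$ one gets $\Gamma_k=2/(k(k+1))$, and because $\beta_k$ is constant, $\Pr(R=k)$ is proportional to $k(k+1)$, so late iterations are favored. Since $m_k$ grows linearly in $k$, the total number of samples over $N$ iterations grows quadratically in $N$; shrinking $\tilde{D}$ increases this total while lowering the bound.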

Bottleneck of Algorithm 2.

The bottleneck of Algorithm 2 is line 7, which requires computing $\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}^\ell\mid\bm{x}_k^{md})}{\mathrm{Pr}(\bm{\xi}^\ell\mid\bm{x}_k^{md})}$ $m_k$ times. This is expensive because, to obtain the convergence rate of Theorem 10, $m_k$ must grow at least linearly in the iteration counter $k$, so the total number of such evaluations over $R$ iterations grows quadratically in $R$.

4.5 Specialized Projected Stochastic Gradient Method for Price Optimization in Multi-agent Applications

To reduce the computational cost of this bottleneck, we propose a specialized projected stochastic gradient method that imposes the following additional assumptions on (P).

Assumption 4.

$\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[s(\bm{x},\bm{\xi})]=s\left(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}]\right)$ and $c(\bm{\xi})$ is continuous.

Assumption 5.

The probability density function $\Pr(\bm{\xi}\mid\bm{x})$ is given by $\phi(\bm{p}(\bm{x}),\bm{\xi})$, where $\phi$ is real-valued and differentiable with respect to $\bm{p}$, $\frac{\nabla_{\bm{p}}\phi(\bm{p},\bm{\xi})}{\phi(\bm{p},\bm{\xi})}$ is easy to compute, and $\bm{p}$ is a vector-valued differentiable function of $\bm{x}$.

The above assumptions are often satisfied in price optimization for multi-agent applications. In particular, Assumption 4 tends to hold because the sales function $s$ in price optimization is usually linear in $\bm{\xi}$ and the cost function $c$ is usually continuous in $\bm{\xi}$. Assumption 5 holds for many parametric distributions (e.g., binomial, multinomial, and Poisson distributions), since their probability functions and gradients can be written in closed form in terms of their parameters. Many multi-agent applications satisfy Assumption 5 because demand follows a binomial or multinomial distribution whose parameters $\bm{p}$ represent the probabilities of each agent's actions.
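The role of linearity in Assumption 4 can be seen on a small hypothetical example with $\xi\sim\mathrm{Binomial}(20,0.6)$: sales that are linear in $\xi$ satisfy $\mathbb{E}[s(x,\xi)]=s(x,\mathbb{E}[\xi])$, whereas sales capped by a stock level (an illustrative nonlinear counterexample, not a model from the paper) do not, because the cap is concave and Jensen's inequality makes the expectation strictly smaller.

```python
import math


def binom_pmf(n, q):
    """Probability mass function of Binomial(n, q) as a list indexed by k."""
    return [math.comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(n + 1)]


def s_linear(x, xi):
    return x * xi                 # sales linear in xi (price times demand)


def s_capped(x, xi, cap=12):
    return x * min(xi, cap)       # hypothetical: sales capped by a stock of `cap` units


n, q, x = 20, 0.6, 3.0            # hypothetical customers, purchase probability, price
pmf = binom_pmf(n, q)
mean_xi = sum(k * pk for k, pk in enumerate(pmf))  # equals n * q = 12 up to rounding

E_lin = sum(pk * s_linear(x, k) for k, pk in enumerate(pmf))
E_cap = sum(pk * s_capped(x, k) for k, pk in enumerate(pmf))
print(E_lin, s_linear(x, mean_xi))   # agree up to rounding: Assumption 4 holds
print(E_cap, s_capped(x, mean_xi))   # E_cap is strictly smaller: Assumption 4 fails
```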

The following propositions show that the multi-agent applications described in Section 3.3 satisfy Assumptions 4 and 5.

Proposition 11.

The problem of multiproduct pricing satisfies Assumption 4. Moreover, let $\phi(\bm{p}(\bm{x}),\bm{\xi}):=\prod_{i=0}^{n}{}_{m}C_{\xi_i}\,p_i(\bm{x})^{\xi_i}$. Then, $\Pr(\bm{\xi}\mid\bm{x})=\phi(\bm{p}(\bm{x}),\bm{\xi})$ and $\left(\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right)_k=\frac{\xi_k}{p_k(\bm{x})}$.
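As a numerical sanity check of the score in Proposition 11, the following Python sketch combines $\left(\nabla_{\bm{p}}\phi/\phi\right)_k=\xi_k/p_k$ with the Jacobian $\frac{d\bm{p}(\bm{x})}{d\bm{x}}$ by the chain rule and compares the result with a central finite difference of $\log\Pr(\bm{\xi}\mid\bm{x})$. The multinomial-logit (MNL) purchase probabilities and all parameter values are illustrative assumptions; the model of Section 3.3 may differ. For this particular model the chain rule collapses to $-b_j\left(\xi_j-m\,p_j(\bm{x})\right)$ with $m=\sum_i\xi_i$.

```python
import numpy as np


def mnl_probs(x, u, b):
    """Hypothetical MNL model: p_0 is 'no purchase', p_1..p_n are the n products."""
    v = np.concatenate(([0.0], u - b * x))
    e = np.exp(v - v.max())  # numerically stable softmax
    return e / e.sum()


def mnl_jacobian(x, u, b):
    """J[i, j] = d p_i / d x_{j+1}, shape (n + 1, n)."""
    p = mnl_probs(x, u, b)
    n = len(x)
    J = np.empty((n + 1, n))
    for j in range(n):
        onehot = np.zeros(n + 1)
        onehot[j + 1] = 1.0
        J[:, j] = -b[j] * p * (onehot - p[j + 1])
    return J


def score_x(x, xi, u, b):
    """grad_x log Pr(xi | x) = (dp/dx)^T (grad_p phi / phi), with (grad_p phi / phi)_k = xi_k / p_k."""
    p = mnl_probs(x, u, b)
    return mnl_jacobian(x, u, b).T @ (xi / p)


def log_lik(x, xi, u, b):
    # Only the x-dependent part; any coefficient depending on xi alone cancels in the derivative.
    return np.sum(xi * np.log(mnl_probs(x, u, b)))


u = np.array([1.0, 0.5, 0.2])
b = np.array([0.3, 0.2, 0.4])
x = np.array([2.0, 3.0, 1.5])
xi = np.array([4, 3, 2, 1])  # counts for 'no purchase' and products 1..3 (m = 10)
h = 1e-6
fd = np.array([(log_lik(x + h * e, xi, u, b) - log_lik(x - h * e, xi, u, b)) / (2 * h)
               for e in np.eye(len(x))])
print(np.allclose(score_x(x, xi, u, b), fd, atol=1e-6))  # True
```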

Proposition 12.

The problem of congestion pricing for HOT lanes satisfies Assumption 4. Moreover, let $\phi(\bm{p}(\bm{x}),\bm{\xi}):=\prod_{i\in I}{}_{d_i}C_{\xi_i}\,p_i(x_i)^{\xi_i}(1-p_i(x_i))^{d_i-\xi_i}$.
Then, $\Pr(\bm{\xi}\mid\bm{x})=\phi(\bm{p}(\bm{x}),\bm{\xi})$ and $\left(\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right)_k=\frac{\xi_k}{p_k(x_k)}-\frac{d_k-\xi_k}{1-p_k(x_k)}$.
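Similarly, the binomial score of Proposition 12 can be checked numerically. The Python sketch below assumes a hypothetical logistic lane-choice probability $p_i(x_i)=1/(1+e^{-(a_i-b_ix_i)})$ for each segment $i$ with illustrative parameter values; because each $p_i$ depends only on its own toll $x_i$, the Jacobian $\frac{d\bm{p}}{d\bm{x}}$ is diagonal and the score with respect to $x_k$ reduces to $-b_k\left(\xi_k-d_kp_k(x_k)\right)$.

```python
import math

import numpy as np


def p_lane(x, a, b):
    """Hypothetical logistic probability of choosing the HOT lane at toll x_i."""
    return 1.0 / (1.0 + np.exp(-(a - b * x)))


def log_phi(x, xi, d, a, b):
    """log phi(p(x), xi) for independent binomials, as in Proposition 12."""
    p = p_lane(x, a, b)
    log_coef = np.array([math.log(math.comb(int(n_i), int(k_i))) for n_i, k_i in zip(d, xi)])
    return np.sum(log_coef + xi * np.log(p) + (d - xi) * np.log(1 - p))


def score_p(x, xi, d, a, b):
    """(grad_p phi / phi)_k = xi_k / p_k - (d_k - xi_k) / (1 - p_k)."""
    p = p_lane(x, a, b)
    return xi / p - (d - xi) / (1 - p)


def score_x(x, xi, d, a, b):
    """grad_x log phi: dp/dx is diagonal with entries p_k'(x_k) = -b_k p_k (1 - p_k)."""
    p = p_lane(x, a, b)
    return -b * p * (1 - p) * score_p(x, xi, d, a, b)


d = np.array([50, 80])
a = np.array([1.0, 0.5])
b = np.array([0.4, 0.3])
x = np.array([2.0, 1.0])
xi = np.array([20, 45])
h = 1e-6
fd = np.array([(log_phi(x + h * e, xi, d, a, b) - log_phi(x - h * e, xi, d, a, b)) / (2 * h)
               for e in np.eye(2)])
print(np.allclose(score_x(x, xi, d, a, b), fd, atol=1e-6))                    # True
print(np.allclose(score_x(x, xi, d, a, b), -b * (xi - d * p_lane(x, a, b))))  # True
```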

Below, we present the lemmas for our specialized method under Assumptions 1–5. Let $c_{\max}:=\max_{\bm{\xi}\in\Xi}|c(\bm{\xi})|$, which exists since $\Xi$ is compact by Assumption 2 and $c(\bm{\xi})$ is continuous by Assumption 4.

Lemma 13.

Suppose that condition (ii) of Assumption 1 and Assumptions 4 and 5 hold. Moreover, suppose that condition (iii) of Assumption 1, Assumption 2, and Assumption 3 hold if $\bm{\xi}$ is a continuous random vector. Let $\bm{g}_2(\bm{x},\bm{\xi}',\delta):=-\nabla_{\bm{x}}s\left(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}]\right)+\left(c(\bm{\xi}')-\delta\right)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi}')}{\phi(\bm{p}(\bm{x}),\bm{\xi}')}$. Then, $\bm{g}_2(\bm{x},\bm{\xi}',\delta)$ is an unbiased stochastic gradient for (P).
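Lemma 13 can be illustrated with a Monte Carlo check. The Python sketch below uses a hypothetical two-segment binomial model, a sales function $s(\bm{x},\bm{\xi})=\sum_ix_i\xi_i$ that is linear in $\bm{\xi}$ (so Assumption 4 holds), and a hypothetical continuous cost $c(\bm{\xi})=\sum_i\kappa_i\xi_i^2$. It assumes $f(\bm{x},\bm{\xi})=-s(\bm{x},\bm{\xi})+c(\bm{\xi})$, the form under which $\bm{g}_2$ targets $\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]$, and it reads $\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])$ as the total derivative of $\bm{x}\mapsto s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])$. The sample mean of $\bm{g}_2$ is compared with a finite-difference gradient of the closed-form expectation; any fixed $\delta$ leaves the mean unchanged because the score has zero expectation.

```python
import numpy as np

# Hypothetical two-segment instance (illustrative values only).
d = np.array([50, 80])          # number of drivers per segment
a = np.array([1.0, 0.5])
b = np.array([0.4, 0.3])
kappa = np.array([0.01, 0.01])  # weights of the hypothetical congestion cost


def p_lane(x):
    """Hypothetical probability of choosing the HOT lane at toll x_i."""
    return 1.0 / (1.0 + np.exp(-(a - b * x)))


def cost(xi):
    """c(xi) = sum_i kappa_i * xi_i^2, evaluated row-wise."""
    return np.sum(kappa * xi ** 2, axis=-1)


def expected_f(x):
    """Closed-form E[f(x, xi)] for f = -s + c with s(x, xi) = sum_i x_i * xi_i."""
    p = p_lane(x)
    mean, var = d * p, d * p * (1 - p)
    return -np.sum(x * mean) + np.sum(kappa * (var + mean ** 2))


def g2(x, xi, delta):
    """Lemma 13's estimator g_2(x, xi', delta), one row per sample xi' in xi."""
    p = p_lane(x)
    dp = -b * p * (1 - p)                       # diagonal of dp/dx
    grad_s = d * p + x * d * dp                 # total derivative of x -> s(x, E[xi])
    score = dp * (xi / p - (d - xi) / (1 - p))  # (dp/dx)^T grad_p(phi) / phi
    return -grad_s + (cost(xi) - delta)[:, None] * score


rng = np.random.default_rng(0)
x = np.array([2.0, 1.0])
xi = rng.binomial(d, p_lane(x), size=(200_000, 2))
g = g2(x, xi, delta=20.0)
mc_mean = g.mean(axis=0)
se = g.std(axis=0) / np.sqrt(len(g))  # Monte Carlo standard error per coordinate

h = 1e-5
exact = np.array([(expected_f(x + h * e) - expected_f(x - h * e)) / (2 * h)
                  for e in np.eye(2)])
print(mc_mean, exact, se)
```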

Lemma 14.

Suppose that conditions (ii) and (iii) of Assumption 1, and Assumptions 2, 4, and 5 hold. Moreover, suppose that Assumption 3 holds if $\bm{\xi}$ is a continuous random vector. Let $\delta\in[-c_{\max},c_{\max}]$. Then, for all $\bm{x}\in\mathcal{C}$, $\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\left[\left\|\bm{g}_2(\bm{x},\bm{\xi}',\delta)-\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\right\|^2\right]\leq 4\left(c_{\max}M\right)^2$, where $\bm{g}_2(\bm{x},\bm{\xi}',\delta)$ is defined as in Lemma 13.

We now turn to Algorithm 3, whose computational cost is lower than that of Algorithm 2. Algorithm 2 requires $m_k$ evaluations of $\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}^\ell\mid\bm{x}_k^{md})}{\mathrm{Pr}(\bm{\xi}^\ell\mid\bm{x}_k^{md})}$ per iteration, whereas Algorithm 3 requires $m_k$ evaluations of $\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}_k^{md}),\bm{\xi}^\ell)}{\phi(\bm{p}(\bm{x}_k^{md}),\bm{\xi}^\ell)}$, which is easy to compute by Assumption 5, together with a single evaluation of the Jacobian $\frac{d\bm{p}(\bm{x}_k^{md})}{d\bm{x}}$ and of $\nabla_{\bm{x}}s$, both of which appear outside the sum.

Algorithm 3 Specialized projected stochastic gradient algorithm
In Algorithm 2, add $\delta_1\in[-c_{\max},c_{\max}]$ to the input, replace line 7 by
$$\bm{g}_k\leftarrow-\nabla_{\bm{x}}s\left(\bm{x}_k^{md},\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k^{md})}[\bm{\xi}]\right)+\frac{d\bm{p}(\bm{x}_k^{md})}{d\bm{x}}\frac{1}{m_k}\sum_{\ell=1}^{m_k}\left(c(\bm{\xi}^{\ell})-\delta_k\right)\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}_k^{md}),\bm{\xi}^{\ell})}{\phi(\bm{p}(\bm{x}_k^{md}),\bm{\xi}^{\ell})},$$
and replace line 10 by
if $k\leq R-1$: $\;\delta_{k+1}\leftarrow(1-\zeta_{k+1})\delta_k+\frac{\zeta_{k+1}}{m_k}\sum_{\ell=1}^{m_k}c(\bm{\xi}^{\ell})$
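The two replaced lines can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. The callables `grad_s`, `jac_p`, `p_of_x`, `cost`, and `score` are hypothetical stand-ins for $\nabla_{\bm{x}}s$, $d\bm{p}/d\bm{x}$, $\bm{p}(\cdot)$, $c$, and $\nabla_{\bm{p}}\phi/\phi$. Here $d\bm{p}/d\bm{x}$ is assumed to be an array of shape (dim x, dim p). The remaining lines of Algorithm 2, such as the projection step, are omitted. The toy components at the end are chosen only so that the result can be checked by hand.

```python
import numpy as np


def algorithm3_gradient(x_md, xi_samples, delta, mean_xi,
                        grad_s, jac_p, p_of_x, cost, score):
    """Replaced line 7: stochastic gradient g_k at x_k^{md}.

    xi_samples has shape (m_k, dim_xi) and holds samples xi^l ~ D(x_k^{md});
    mean_xi is E_{xi ~ D(x_k^{md})}[xi].
    """
    p = p_of_x(x_md)
    # Average of (c(xi^l) - delta_k) * grad_p(phi) / phi over the m_k samples.
    weighted = np.mean([(cost(xi) - delta) * score(p, xi) for xi in xi_samples],
                       axis=0)
    return -grad_s(x_md, mean_xi) + jac_p(x_md) @ weighted


def update_delta(delta, zeta_next, xi_samples, cost):
    """Replaced line 10: (1 - zeta_{k+1}) delta_k + zeta_{k+1} * mean of c(xi^l)."""
    batch_mean = np.mean([cost(xi) for xi in xi_samples])
    return (1.0 - zeta_next) * delta + zeta_next * batch_mean


# Toy, hypothetical components: identity map p(x) = x and s(x, xi) = x . xi.
grad_s = lambda x, mean_xi: mean_xi
jac_p = lambda x: np.eye(x.size)
p_of_x = lambda x: x
cost = lambda xi: float(np.sum(xi))
score = lambda p, xi: xi / p

x_md = np.array([0.5, 0.5])
xi_samples = np.array([[1.0, 0.0], [0.0, 2.0]])
g = algorithm3_gradient(x_md, xi_samples, delta=1.0, mean_xi=np.array([0.5, 1.0]),
                        grad_s=grad_s, jac_p=jac_p, p_of_x=p_of_x, cost=cost,
                        score=score)
new_delta = update_delta(1.0, 0.5, xi_samples, cost)
# expected: g == [-0.5, 1.0] and new_delta == 1.25
print(g, new_delta)
```

Subtracting $\delta_k$ leaves the expectation unchanged, because the score $\nabla_{\bm{p}}\phi/\phi$ has zero mean. A well-chosen $\delta_k$ therefore only affects the variance of $\bm{g}_k$.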

As with Algorithm 2, we can establish the convergence rate of Algorithm 3.

Theorem 15.

Suppose that Assumptions 1, 2, 4, and 5 hold. Moreover, suppose that Assumption 3 holds if $\bm{\xi}$ is a continuous random vector. Let the inputs of Algorithm 3 other than $m_k$ be as in Theorem 10, and let $m_k:=\left\lceil\frac{4(c_{\max}M)^2k}{L_{Ef}\tilde{D}^2}\right\rceil$, where $L_{Ef}:=L_f+f_{\max}M$. Then, Algorithm 3 achieves the same convergence rate as in Theorem 10.
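For concreteness, the sample-size rule of Theorem 15 can be evaluated as below. The constants in the example are arbitrary placeholders, not values from the paper. The point is only that $m_k$ grows linearly in $k$.

```python
import math


def theorem15_sample_size(k, c_max, M, L_f, f_max, D_tilde):
    """m_k = ceil(4 (c_max M)^2 k / (L_Ef D_tilde^2)) with L_Ef = L_f + f_max M."""
    L_Ef = L_f + f_max * M
    return math.ceil(4.0 * (c_max * M) ** 2 * k / (L_Ef * D_tilde ** 2))


# Placeholder constants: L_Ef = 1 + 1 * 1 = 2, so m_k = ceil(4k / 2) = 2k.
sizes = [theorem15_sample_size(k, c_max=1.0, M=1.0, L_f=1.0, f_max=1.0, D_tilde=1.0)
         for k in range(1, 6)]
print(sizes)  # [2, 4, 6, 8, 10]
```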

5 Experiments

We conducted experiments on a multiproduct pricing application to show that Algorithm 3 outputs solutions with higher total revenues than existing methods. We performed synthetic experiments and simulation experiments with real retail data from a supermarket service provider in Japan. (Footnote 11: We used publicly available data, “New Product Sales Ranking”, provided by KSP-SP Co., Ltd., http://www.ksp-sp.com.) The details of our experiments are given in Appendix B.

We implemented the following methods.
Proposed Method: We implemented Algorithm 3 with $\alpha_k:=\frac{10}{k+1}$, $\beta_k:=\frac{0.1}{2m}$, $\lambda_k:=\frac{k\beta_k}{2}$, $m_k=0.1km$, and $\zeta_k=\frac{1}{k}$, where $k$ is the current iteration number and $m$ is the number of buyers.
Proposed Method (fixed $\delta$): This is the proposed method with $\delta_k$ fixed to a value computed at the initial iterate $\bm{x}_0$. Specifically, $\delta_k$ was set to $\frac{1}{10^3}\sum_{\ell=1}^{10^3}c(\bm{\xi}^{\ell}(\bm{x}_0))$ for all $k$, where $\bm{\xi}^{\ell}(\bm{x}_0)\sim D(\bm{x}_0)$.
Proposed Method ($\delta=0$): This is the proposed method with $\delta_k$ set to zero.
L2-Regularized Repeated Gradient Descent (L2-RGD($\alpha$)) (Perdomo et al., 2020, Appendix E):
This method applies repeated gradient descent (Perdomo et al., 2020, Section 3.3) to the objective function with a regularization term $\frac{\alpha}{2}\|\bm{x}-\bm{x}_0\|$, where $\bm{x}_0$ is the initial point. (Footnote 12: This is introduced in (Perdomo et al., 2020, Appendix E) as a remedy for the retraining method when the objective function is not strongly convex; the retraining method was originally intended for strongly convex objective functions.) We implemented this method for $\alpha\in\{0.1,1,10\}$.
Bayesian Optimization (BO) (GPyOpt-authors, 2016): This method sequentially searches for points where the objective value is likely to be small and outputs the solution with the lowest objective value among the evaluated points. We used GPyOpt, an open-source Python library for Bayesian optimization (GPyOpt-authors, 2016).
Simultaneous Perturbation Stochastic Approximation (SPSA) (Spall, 2005, 1998): This method updates the current iterate using an approximate gradient computed from the difference between the objective values at two perturbed iterates.
Projected Sub-gradient Descent for Average Demand (PSD-AD) (Boyd et al., 2003, Section 3): This is a projected subgradient descent method for deterministic pricing problems with average demand.
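The schedules listed for the Proposed Method can be written as a small Python helper. Two details are assumptions rather than statements from the paper. The first is rounding $0.1km$ to the nearest integer; it is already an integer for the values of $m$ used here. The second is the helper names. The sketch also checks one consequence of $\zeta_k=1/k$. By induction, line 10 of Algorithm 3 gives $\delta_{k+1}=\frac{1}{k+1}\left(\delta_1+\sum_{j=1}^{k}\bar{c}_j\right)$, where $\bar{c}_j$ is the sample mean of $c(\bm{\xi}^{\ell})$ at iteration $j$. In other words, $\delta_{k+1}$ is a running average.

```python
def proposed_schedules(k, m):
    """Hyperparameters of the Proposed Method at iteration k >= 1 with m buyers."""
    beta = 0.1 / (2 * m)
    return {
        "alpha": 10.0 / (k + 1),
        "beta": beta,
        "lambda": k * beta / 2,
        "m_k": max(1, round(k * m / 10)),  # 0.1 k m; rounding is our assumption
        "zeta": 1.0 / k,
    }


def run_delta(delta_1, batch_means):
    """Iterate delta_{k+1} = (1 - zeta_{k+1}) delta_k + zeta_{k+1} * batch_mean_k."""
    delta = delta_1
    for k, batch_mean in enumerate(batch_means, start=1):
        zeta_next = 1.0 / (k + 1)  # zeta_{k+1} with zeta_k = 1 / k
        delta = (1.0 - zeta_next) * delta + zeta_next * batch_mean
    return delta


print(proposed_schedules(3, m=200))
# delta_1 = 0 and batch means 3, 5, 1 give approximately (0 + 3 + 5 + 1) / 4 = 2.25
print(run_delta(0.0, [3.0, 5.0, 1.0]))
```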

We performed our experiments under the following settings.
Initial points. For all methods other than BO, we set the initial point to $\bm{x}_0:=0.5\bm{e}$, where $\bm{e}\in\mathbb{R}^n$ is the all-ones vector. BO first evaluates five random points and then runs Bayesian optimization.
Metric. For each iterate $\bm{x}_k$, we computed $\frac{1}{10^3}\sum_{\ell=1}^{10^3}\left(-s(\bm{x}_k,\bm{\xi}^{\ell}(\bm{x}_k))+c(\bm{\xi}^{\ell}(\bm{x}_k))\right)$, where $\bm{\xi}^{\ell}(\bm{x}_k)\sim D(\bm{x}_k)$, and defined the smallest of these values over all iterates as the Negative Expected Revenue (NER).
Termination criteria. We terminated all methods after a maximum computation time of 500 seconds.
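The NER metric can be sketched as a Monte Carlo average per iterate followed by a minimum over iterates. In this Python sketch, `sample_demand`, `s`, and `c` are hypothetical user-supplied callables. The toy model in the usage is an illustration only: a single product, binomial demand with purchase probability $e^{-x}$, and zero cost. It was chosen because its expected negative revenue, $-100xe^{-x}$, is minimized at $x=1$.

```python
import numpy as np


def negative_expected_revenue(iterates, sample_demand, s, c, n_samples=1000, seed=0):
    """NER: min over iterates x_k of the sample mean of -s(x_k, xi) + c(xi), xi ~ D(x_k)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for x in iterates:
        samples = [sample_demand(x, rng) for _ in range(n_samples)]
        estimates.append(float(np.mean([-s(x, xi) + c(xi) for xi in samples])))
    return min(estimates)


# Toy single-product model (hypothetical): 100 buyers, each buys with probability exp(-x).
sample_demand = lambda x, rng: rng.binomial(100, np.exp(-x))
s = lambda x, xi: x * xi  # revenue from price x and demand xi
c = lambda xi: 0.0        # no over- or under-demand cost in this toy

ner = negative_expected_revenue([0.5, 1.0, 2.0], sample_demand, s, c)
print(ner)  # close to -100 * exp(-1), about -36.8
```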

5.1 Synthetic Experiments

Synthetic Parameter Setup.

We performed experiments by varying parameters from the following default settings. We set $n:=20$ and $m:=200$, which are the numbers of products and buyers, respectively. For each product, we let the minimum price be $x_{\min}:=0.01$ and the maximum price be $x_{\max}:=10$. For the parameters of the function $p_i$, we generated $\alpha_i$ for each $i$ from a uniform distribution on $[0.01,1]$, and we let $\gamma_i:=\frac{2\pi}{\sqrt{6}\alpha_i}$ and $a_0:=0.25n$.
For the parameters of the function $c_i$ for each $i$, we set $\eta^1_i:=2.0w_i$, $\eta^2_i:=w_i$, and $\eta^3_i:=3.0w_i$, where $w_i$ was generated from a uniform distribution on $[0.25\alpha_i,0.5\alpha_i]$. We let $l_i:=\frac{0.5m}{n}$ and $u_i:=\frac{1.5m}{n}$. Starting from these defaults, we varied $n$ and $m$ one at a time.
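The default synthetic instance generation can be sketched with NumPy as follows. The sketch only draws the numbers stated above. How $\alpha_i$, $\gamma_i$, $a_0$, $\eta^1_i$, $\eta^2_i$, $\eta^3_i$, $l_i$, and $u_i$ enter $p_i$ and $c_i$ is defined earlier in the paper and is not reproduced here. The random-number generator and the per-instance seeds are our choices.

```python
import numpy as np


def synthetic_instance(n=20, m=200, seed=0):
    """Draw one problem instance with the default synthetic settings."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.01, 1.0, size=n)        # alpha_i ~ U[0.01, 1]
    gamma = 2.0 * np.pi / (np.sqrt(6.0) * alpha)  # gamma_i = 2 pi / (sqrt(6) alpha_i)
    w = rng.uniform(0.25 * alpha, 0.5 * alpha)    # w_i ~ U[0.25 alpha_i, 0.5 alpha_i]
    return {
        "x_min": 0.01, "x_max": 10.0,
        "alpha": alpha, "gamma": gamma, "a0": 0.25 * n,
        "eta1": 2.0 * w, "eta2": w, "eta3": 3.0 * w,
        "l": np.full(n, 0.5 * m / n), "u": np.full(n, 1.5 * m / n),
    }


# 20 instances, matching the count in Table 1; one seed per instance (our choice).
instances = [synthetic_instance(seed=s) for s in range(20)]
print(instances[0]["a0"], instances[0]["l"][0], instances[0]["u"][0])  # 5.0 5.0 15.0
```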

Table 1: Results of synthetic experiments for 20 randomly generated problem instances. The NER (SD) column represents the average (standard deviation) of NER. The best value of the average NER for each experiment is in bold.
| $(n, m)$ | Proposed | Proposed (fixed $\delta$) | Proposed ($\delta=0$) | L2-RGD ($\alpha=0.1$) | L2-RGD ($\alpha=1$) | L2-RGD ($\alpha=10$) | BO | SPSA | PSD-AD |
|---|---|---|---|---|---|---|---|---|---|
| (20, 200) | **-56.3** (5.4) | -54.9 (5.5) | -54.7 (5.2) | -28.0 (10.6) | -28.2 (10.7) | -28.2 (10.7) | -22.3 (6.2) | -33.7 (17.6) | -45.9 (8.9) |
| (10, 200) | **-55.4** (8.4) | -54.5 (8.8) | -54.4 (8.3) | -8.2 (23.0) | -9.8 (23.2) | -11.9 (23.4) | -34.4 (10.6) | -46.8 (13.4) | -31.8 (13.9) |
| (40, 200) | **-56.6** (3.2) | -54.7 (3.5) | -54.1 (3.4) | -24.5 (9.5) | -24.4 (9.5) | -24.4 (9.6) | -14.6 (3.9) | -1.7 (13.0) | -47.4 (3.7) |
| (20, 100) | **-26.9** (3.7) | -26.2 (3.8) | -26.1 (3.7) | -8.8 (6.8) | -8.8 (6.8) | -8.8 (6.8) | -11.7 (4.0) | -4.6 (6.3) | -20.4 (5.1) |
| (20, 400) | **-106.9** (14.2) | -104.4 (14.6) | -103.4 (14.0) | -38.5 (28.5) | -38.5 (28.2) | -39.2 (26.8) | -37.8 (7.9) | -36.4 (17.5) | -79.4 (21.0) |
Experimental Results

Table 1 shows the results of the synthetic experiments with different parameter values. The proposed method outperformed the baselines in terms of NER for all parameter settings, for the following reasons: (i) Proposed (fixed $\delta$) and Proposed ($\delta=0$) converged to low-quality local solutions because their stochastic gradients had larger variance than that of the proposed method; (ii) L2-RGD continued to increase prices without considering the effect of prices on the probability distribution, as shown in Example 1 in Section 2.2, which led to unreasonably high prices; (iii) BO did not adequately explore $\bm{x}$ because evaluating the objective value at each search point took a long time; (iv) SPSA did not accurately estimate the gradient because the noise in its gradient estimates was too large; (v) PSD-AD ignored demand uncertainty, which increased the objective value because excess or insufficient demand occurs stochastically and incurs costs.

5.2 Simulation Experiments with Real Data

Data Set and Parameter Setup

We used retail data from a supermarket service provider in Japan. These data record the average selling prices of top-selling new products in food supermarkets. We used sales data for $n=50$ different confectionery products in randomly selected weeks of 2022. For each product $i$, we set the general value $\alpha_i$ to the recorded average selling price. The other parameters were set as in the synthetic experiments. Since $w_i$ for each $i$ was generated randomly, we generated 20 problem instances for each week's data.

Table 2: Results of simulation experiments with real data for 20 randomly generated problem instances. The NER (SD) column represents the average (standard deviation) of NER. The best value of the average NER for each experiment is in bold.
| Week (2022) | Proposed | Proposed (fixed $\delta$) | Proposed ($\delta=0$) | L2-RGD ($\alpha=0.1$) | L2-RGD ($\alpha=1$) | L2-RGD ($\alpha=10$) | BO | SPSA | PSD-AD |
|---|---|---|---|---|---|---|---|---|---|
| 02/21–02/27 | **-28.1** (1.0) | -21.5 (1.4) | -25.3 (1.0) | 13.8 (18.2) | 8.8 (19.0) | 2.2 (16.3) | -8.5 (2.1) | 10.3 (8.7) | -9.0 (2.5) |
| 03/21–03/27 | **-20.6** (0.7) | -20.1 (1.0) | -18.5 (1.0) | -7.5 (3.4) | -7.5 (3.4) | -7.6 (3.4) | -4.4 (0.7) | -10.4 (3.3) | -17.7 (1.8) |
| 05/23–05/29 | **-22.6** (0.9) | -17.8 (1.8) | -20.2 (1.0) | 12.4 (7.1) | 12.3 (7.1) | 12.1 (7.9) | -6.1 (1.6) | -1.2 (6.6) | -10.2 (3.2) |
| 06/20–06/26 | **-32.3** (2.1) | -21.6 (3.6) | -28.8 (2.4) | 79.2 (39.5) | 55.0 (58.9) | 53.3 (57.9) | -14.1 (4.1) | 30.6 (15.4) | -8.1 (5.6) |
| 08/08–08/14 | **-33.6** (0.9) | -31.7 (1.0) | -31.2 (1.0) | -24.5 (3.8) | -24.5 (3.7) | -24.6 (3.7) | -7.2 (1.8) | -11.1 (3.4) | -29.4 (1.6) |
| 09/19–09/25 | **-31.3** (1.5) | -23.9 (3.4) | -28.5 (1.8) | 0.0 (24.4) | -6.4 (22.1) | -10.3 (18.5) | -9.8 (2.3) | 11.6 (7.4) | -13.5 (5.3) |
| 12/05–12/11 | **-73.0** (3.2) | -66.0 (3.9) | -71.1 (3.4) | 172.3 (30.7) | 152.6 (44.0) | 146.2 (37.2) | -37.9 (7.0) | 72.5 (22.3) | -28.9 (10.8) |
Experimental Results

Table 2 shows the results of the experiments using real data from different weeks. The proposed method outperformed the baselines in terms of NER for every week of data.

6 Conclusion

We formulated a new price optimization problem with decision-dependent uncertainty to address the drawbacks of existing formulations that (i) cannot deal with decision-dependent demand uncertainty, (ii) require discontinuous functions to define buyers’ discrete actions, or (iii) have limited applications due to specific assumptions. Moreover, we developed a projected stochastic gradient descent method by deriving an unbiased stochastic gradient with a variance reduction parameter. Our method is guaranteed to converge to an $\varepsilon$-stationary point. Synthetic experiments and simulation experiments with real data confirmed the effectiveness of our formulation and method.

Our formulation and results suggest several directions for further research. The first is to construct methods that find a globally optimal solution rather than a stationary point (e.g., by incorporating multi-start techniques (György and Kocsis, 2011) into our methods or by building fast Bayesian optimization under more specific assumptions). The second is to analyze the performance of our method when some of our assumptions are relaxed, for instance, when the probability density function is not differentiable and is smoothed with existing techniques.

References

  • Basciftci et al. [2021] B. Basciftci, S. Ahmed, and S. Shen. Distributionally robust facility location problem under decision-dependent stochastic demand. European Journal of Operational Research, 292(2):548–561, 2021.
  • Bertsimas and de Boer [2005] D. Bertsimas and S. de Boer. Special issue papers: Dynamic pricing and inventory control for multiple products. Journal of Revenue and Pricing Management, 3:303–319, 2005.
  • Besbes et al. [2015] O. Besbes, Y. Gur, and A. Zeevi. Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244, 2015.
  • Boyd et al. [2003] S. Boyd, L. Xiao, and A. Mutapcic. Subgradient methods. Lecture notes of EE392o, Stanford University, Autumn Quarter, 2003.
  • Brochu et al. [2010] E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
  • Chawla et al. [2010] S. Chawla, J. D. Hartline, D. L. Malec, and B. Sivan. Multi-parameter mechanism design and sequential posted pricing. In STOC, pages 311–320, 2010.
  • Chen and Mangasarian [1996] C. Chen and O. L. Mangasarian. A class of smoothing functions for nonlinear and mixed complementarity problems. Computational Optimization and Applications, 5(2):97–138, 1996.
  • Chen [2012] X. Chen. Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1):71–99, 2012.
  • Correa et al. [2017] J. Correa, P. Foncea, R. Hoeksma, T. Oosterwijk, and T. Vredeveld. Posted price mechanisms for a random stream of customers. In EC, pages 169–186, 2017.
  • Croissant [2012] Y. Croissant. Estimation of multinomial logit models in R: The mlogit packages. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=03dbc1728d3860d239132b5af95367d4a5b273c3, 2012.
  • Dong et al. [2017] C. Dong, C. T. Ng, and T. Cheng. Electricity time-of-use tariff with stochastic demand. Production and Operations Management, 26(1):64–79, 2017.
  • Ferreira et al. [2016] K. J. Ferreira, B. H. A. Lee, and D. Simchi-Levi. Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing & Service Operations Management, 18(1):69–88, 2016.
  • Flaxman et al. [2005] A. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, pages 385–394, 2005.
  • Frazier [2018] P. I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
  • Gallego and Wang [2014] G. Gallego and R. Wang. Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities. Operations Research, 62(2):450–461, 2014.
  • Ghadimi and Lan [2016] S. Ghadimi and G. Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1):59–99, 2016.
  • GPyOpt-authors [2016] The GPyOpt authors. GPyOpt: A Bayesian optimization framework in Python. http://github.com/SheffieldML/GPyOpt, 2016.
  • György and Kocsis [2011] A. György and L. Kocsis. Efficient multi-start strategies for local search algorithms. Journal of Artificial Intelligence Research, 41:407–444, 2011.
  • He et al. [2009] Y. He, X. Zhao, L. Zhao, and J. He. Coordinating a supply chain with effort and price dependent stochastic demand. Applied Mathematical Modelling, 33(6):2777–2790, 2009.
  • Hellemo et al. [2018] L. Hellemo, P. I. Barton, and A. Tomasgard. Decision-dependent probabilities in stochastic programs with recourse. Computational Management Science, 15(3):369–395, 2018.
  • Heydari and Norouzinasab [2015] J. Heydari and Y. Norouzinasab. A two-level discount model for coordinating a decentralized supply chain considering stochastic price-sensitive demand. Journal of Industrial Engineering International, 11:531–542, 2015.
  • Hikima et al. [2021] Y. Hikima, Y. Akagi, H. Kim, M. Kohjima, T. Kurashima, and H. Toda. Integrated optimization of bipartite matching and its stochastic behavior: New formulation and approximation algorithm via min-cost flow optimization. In AAAI, pages 3796–3805, 2021.
  • Hikima et al. [2022] Y. Hikima, Y. Akagi, N. Marumo, and H. Kim. Online matching with controllable rewards and arrival probabilities. In IJCAI, pages 1825–1833, 2022.
  • Hikima et al. [2023] Y. Hikima, Y. Akagi, H. Kim, and T. Asami. An improved approximation algorithm for wage determination and online task allocation in crowd-sourcing. In AAAI, pages 3977–3986, 2023.
  • Koushik et al. [2012] D. Koushik, J. A. Higbie, and C. Eister. Retail price optimization at intercontinental hotels group. Interfaces, 42(1):45–57, 2012.
  • Li and Li [2018] Z. Li and J. Li. A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. In NeurIPS, pages 5569–5579, 2018.
  • Lou et al. [2011] Y. Lou, Y. Yin, and J. A. Laval. Optimal dynamic pricing strategies for high-occupancy/toll lanes. Transportation Research Part C: Emerging Technologies, 19(1):64–74, 2011.
  • Luo and Mehrotra [2020] F. Luo and S. Mehrotra. Distributionally robust optimization with decision dependent ambiguity sets. Optimization Letters, 14:2565–2594, 2020.
  • Mendler-Dünner et al. [2020] C. Mendler-Dünner, J. Perdomo, T. Zrnic, and M. Hardt. Stochastic optimization for performative prediction. In NeurIPS, pages 4929–4939, 2020.
  • Miller et al. [2021] J. P. Miller, J. C. Perdomo, and T. Zrnic. Outside the echo chamber: Optimizing the performative risk. In ICML, pages 7710–7720, 2021.
  • Perdomo et al. [2020] J. Perdomo, T. Zrnic, C. Mendler-Dünner, and M. Hardt. Performative prediction. In ICML, pages 7599–7609, 2020.
  • Royden and Fitzpatrick [1988] H. L. Royden and P. Fitzpatrick. Real analysis, volume 32. Macmillan New York, 1988.
  • Schulte and Sachs [2020] B. Schulte and A.-L. Sachs. The price-setting newsvendor with Poisson demand. European Journal of Operational Research, 283(1):125–137, 2020.
  • Spall [1998] J. C. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems, 34(3):817–823, 1998.
  • Spall [2005] J. C. Spall. Introduction to stochastic search and optimization: estimation, simulation, and control. John Wiley & Sons, 2005.
  • Sutton and Barto [2018] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT Press, 2018.
  • Swiler et al. [2020] L. P. Swiler, M. Gulian, A. L. Frankel, C. Safta, and J. D. Jakeman. A survey of constrained Gaussian process regression: Approaches and implementation challenges. Journal of Machine Learning for Modeling and Computing, 1(2), 2020.
  • Varaiya and Wets [1989] P. Varaiya and R. J. Wets. Stochastic dynamic optimization, approaches and computation. In Mathematical Programming, Recent Developments and Applications, 1989.
  • Wang and Wang [2019] X.-z. Wang and G.-q. Wang. Integrating dynamic pricing and inventory control for fresh-agri product under consumer choice. Australian Economic Papers, 58(1):96–111, 2019.
  • Williams [1992] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
  • Zhang et al. [2018] H. Zhang, P. Rusmevichientong, and H. Topaloglu. Multiproduct pricing under the generalized extreme value models with homogeneous price sensitivity parameters. Operations Research, 66(6):1559–1570, 2018.

Appendix A Proofs

A.1 Proof of Proposition 1

Proof.

Assumption 2 holds since $\mathcal{C}=[x_{\min},x_{\max}]^n$ and $\bm{\xi}\in\{0,1,\dots,m\}^{n+1}$. It therefore remains to verify conditions (i)–(iii) of Assumption 1.
(i) From the definitions of $s$ and $c$, the function $f(\bm{x},\bm{\xi})$ is differentiable w.r.t. $\bm{x}$ and continuous w.r.t. $\bm{\xi}$ for all $\bm{x}\in\mathbb{R}^n$ and $\bm{\xi}\in\Xi$. Moreover,
$$\|\nabla_{\bm{x}}f(\bm{x},\bm{\xi})\|=\left\|\nabla_{\bm{x}}\left(-s(\bm{x},\bm{\xi})+c(\bm{\xi})\right)\right\|=\|-\nabla_{\bm{x}}s(\bm{x},\bm{\xi})\|\leq\sum_{i=1}^n|\xi_i|\leq m,$$
where the second inequality holds because the total demand for all products never exceeds the number $m$ of buyers. Therefore, $f(\bm{x},\bm{\xi})$ is Lipschitz continuous in $\bm{x}$ with modulus $L_f=m$.

(ii) Since $\Pr(\bm{\xi}\mid\bm{x})=\prod_{i=0}^n{}_{m}C_{\xi_i}\,p_i(\bm{x})^{\xi_i}$, the function $\Pr(\bm{\xi}\mid\bm{x})$ is differentiable w.r.t. $\bm{x}$, and $\Pr(\bm{\xi}\mid\bm{x})\neq0$ for all $\bm{x}\in\mathbb{R}^n$ and $\bm{\xi}\in\Xi$, by the definition of $p_i(\bm{x})$ for each $i\in\{0,1,\dots,n\}$.

(iii) By the definition of $p_i(\bm{x})$, we have $0<p_i(\bm{x})<1$ for all $\bm{x}\in\mathbb{R}^n$ and $i\in\{0,1,\dots,n\}$. Then, since
$$\left(\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})\right)_k=\prod_{i=0}^n{}_{m}C_{\xi_i}\,p_i(\bm{x})^{\xi_i}\sum_{j=0}^n\frac{\xi_j}{p_j(\bm{x})}\frac{\partial p_j(\bm{x})}{\partial x_k},$$
we have

$$\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}=\sum_{j=0}^{n}\frac{\xi_{j}}{p_{j}(\bm{x})}\frac{\partial p_{j}(\bm{x})}{\partial x_{k}}. \tag{2}$$

Let $a_{i}(x):=e^{\gamma_{i}(\alpha_{i}-x)}$ for all $i\in I\ (=\{1,2,\dots,n\})$, so that $p_{i}(\bm{x})=a_{i}(x_{i})/\bigl(a_{0}+\sum_{l\in I}e^{\gamma_{l}(\alpha_{l}-x_{l})}\bigr)$ for $i\in I$ and $p_{0}(\bm{x})=a_{0}/\bigl(a_{0}+\sum_{l\in I}e^{\gamma_{l}(\alpha_{l}-x_{l})}\bigr)$. Then $\frac{\partial a_{i}(x)}{\partial x}=-\gamma_{i}a_{i}(x)$. For $k\in I$,

$$\begin{aligned}
\frac{\partial p_{k}(\bm{x})}{\partial x_{k}}
&=\frac{\frac{\partial a_{k}(x_{k})}{\partial x_{k}}\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)-a_{k}(x_{k})\frac{\partial a_{k}(x_{k})}{\partial x_{k}}}{\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)^{2}}
=\frac{-\gamma_{k}a_{k}(x_{k})\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}-a_{k}(x_{k})\bigr)}{\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)^{2}}\\
&=-\gamma_{k}\frac{a_{k}(x_{k})}{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}}\cdot\frac{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}-a_{k}(x_{k})}{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}}
=-\gamma_{k}p_{k}(\bm{x})\bigl(1-p_{k}(\bm{x})\bigr).
\end{aligned}\tag{3}$$

For $k\in I$ and $j\in I\setminus\{k\}$,

$$\begin{aligned}
\frac{\partial p_{j}(\bm{x})}{\partial x_{k}}
&=\frac{-a_{j}(x_{j})\frac{\partial a_{k}(x_{k})}{\partial x_{k}}}{\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)^{2}}
=\frac{-a_{j}(x_{j})\bigl(-\gamma_{k}a_{k}(x_{k})\bigr)}{\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)^{2}}\\
&=\gamma_{k}\frac{a_{j}(x_{j})}{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}}\cdot\frac{a_{k}(x_{k})}{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}}
=\gamma_{k}p_{j}(\bm{x})p_{k}(\bm{x}).
\end{aligned}\tag{4}$$

For $k\in I$,

$$\begin{aligned}
\frac{\partial p_{0}(\bm{x})}{\partial x_{k}}
&=\frac{-a_{0}\frac{\partial a_{k}(x_{k})}{\partial x_{k}}}{\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)^{2}}
=\frac{-a_{0}\bigl(-\gamma_{k}a_{k}(x_{k})\bigr)}{\bigl(a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}\bigr)^{2}}\\
&=\gamma_{k}\frac{a_{0}}{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}}\cdot\frac{a_{k}(x_{k})}{a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}}
=\gamma_{k}p_{0}(\bm{x})p_{k}(\bm{x}).
\end{aligned}\tag{5}$$
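Equations (3) to (5) can be collected into one identity, $\partial p_{j}(\bm{x})/\partial x_{k}=-\gamma_{k}p_{k}(\bm{x})\bigl(\delta_{jk}-p_{j}(\bm{x})\bigr)$ for $j\in\{0\}\cup I$ and $k\in I$, where $\delta_{jk}$ is the Kronecker delta (so $\delta_{0k}=0$). The Python sketch below is a numerical sanity check of that identity against central finite differences, not part of the proof. It assumes the multinomial logit form used in the derivation ($p_{0}=a_{0}/S$, $p_{i}=a_{i}(x_{i})/S$ with $S=a_{0}+\sum_{i\in I}e^{\gamma_{i}(\alpha_{i}-x_{i})}$); the function names and parameter values are hypothetical illustrations.

```python
import numpy as np


def mnl_probs(x, alpha, gamma, a0):
    """Choice probabilities [p_0, p_1, ..., p_n] under the assumed multinomial logit form.

    p_0 = a0 / S and p_i = exp(gamma_i * (alpha_i - x_i)) / S,
    with S = a0 + sum_i exp(gamma_i * (alpha_i - x_i)).
    """
    a = np.exp(gamma * (alpha - x))  # a_i(x_i) for i = 1, ..., n
    return np.concatenate(([a0], a)) / (a0 + a.sum())


def mnl_jacobian(x, alpha, gamma, a0):
    """Closed form of (3)-(5): dp_j/dx_k = -gamma_k p_k (delta_jk - p_j).

    Rows run over j = 0, ..., n (row 0 is the no-purchase option);
    column k - 1 holds the derivative w.r.t. the k-th price x_k.
    """
    p = mnl_probs(x, alpha, gamma, a0)
    n = len(x)
    delta = np.vstack((np.zeros(n), np.eye(n)))  # delta[j, k-1] = 1 iff j == k
    return -(delta - p[:, None]) * (gamma * p[1:])[None, :]


def fd_jacobian(x, alpha, gamma, a0, h=1e-6):
    """Central finite-difference approximation of the same Jacobian."""
    n = len(x)
    J = np.zeros((n + 1, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        J[:, k] = (mnl_probs(x + e, alpha, gamma, a0)
                   - mnl_probs(x - e, alpha, gamma, a0)) / (2 * h)
    return J


alpha = np.array([1.0, 2.0, 1.5])
gamma = np.array([0.8, 1.2, 0.5])
x = np.array([1.1, 1.9, 2.4])
a0 = 1.0
J_closed = mnl_jacobian(x, alpha, gamma, a0)
J_fd = fd_jacobian(x, alpha, gamma, a0)
print(np.max(np.abs(J_closed - J_fd)))  # finite-difference error only
```

Each column of the Jacobian sums to zero because the probabilities $p_{0},\dots,p_{n}$ always sum to one, which gives a second cheap check on the closed form.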

From (2)–(5),

$$\begin{aligned}
\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}
&=\sum_{j\in I\setminus\{k\}}\left(\frac{\xi_{j}}{p_{j}(\bm{x})}\gamma_{k}p_{k}(\bm{x})p_{j}(\bm{x})\right)-\frac{\xi_{k}}{p_{k}(\bm{x})}\gamma_{k}p_{k}(\bm{x})\bigl(1-p_{k}(\bm{x})\bigr)+\frac{\xi_{0}}{p_{0}(\bm{x})}\gamma_{k}p_{0}(\bm{x})p_{k}(\bm{x})\\
&=\sum_{j\in(I\cup\{0\})\setminus\{k\}}\xi_{j}\gamma_{k}p_{k}(\bm{x})-\xi_{k}\gamma_{k}\bigl(1-p_{k}(\bm{x})\bigr),
\end{aligned}$$

and hence

$$\left|\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}\right|\leq\sum_{j\in I\cup\{0\}}|\xi_{j}||\gamma_{k}|\leq m\gamma^{\max},$$

where the first inequality follows from $0<p_{k}(\bm{x})<1$ for all $\bm{x}\in\mathbb{R}^{n}$, and the second follows from $|\gamma_{k}|\leq\gamma^{\max}$ together with the fact that, by definition, $\sum_{j\in I\cup\{0\}}|\xi_{j}|$ equals the number $m$ of buyers. Consequently, for all $\bm{x}\in\mathbb{R}^{n}$ and $\bm{\xi}\in\Xi$, $\left\|\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right\|\leq\sum_{k=1}^{n}\left|\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}\right|\leq nm\gamma^{\max}$. ∎
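Because $\sum_{j\in I\cup\{0\}}\xi_{j}=m$, the expression for the $k$-th entry simplifies to $\gamma_{k}\bigl(m\,p_{k}(\bm{x})-\xi_{k}\bigr)$. The Python sketch below compares this closed form with a central finite-difference gradient of $\sum_{j}\xi_{j}\log p_{j}(\bm{x})$, which differs from $\log\Pr(\bm{\xi}\mid\bm{x})$ only by an $\bm{x}$-independent constant, and checks the bound $nm\gamma^{\max}$ on sampled demand vectors. It assumes the same multinomial logit form as above, takes $\gamma^{\max}:=\max_{k}|\gamma_{k}|$, and draws $\bm{\xi}$ with NumPy's multinomial sampler; all names and values are illustrative assumptions.

```python
import numpy as np


def mnl_probs(x, alpha, gamma, a0):
    """[p_0, p_1, ..., p_n] with p_i = exp(gamma_i (alpha_i - x_i)) / S and p_0 = a0 / S."""
    a = np.exp(gamma * (alpha - x))
    return np.concatenate(([a0], a)) / (a0 + a.sum())


def mnl_score(x, xi, alpha, gamma, a0):
    """Closed-form score: entry k is gamma_k * (m * p_k(x) - xi_k), where m = sum(xi)."""
    p = mnl_probs(x, alpha, gamma, a0)
    m = xi.sum()
    return gamma * (m * p[1:] - xi[1:])


def log_kernel(x, xi, alpha, gamma, a0):
    """sum_j xi_j log p_j(x): log Pr(xi | x) up to an x-independent constant."""
    return float(xi @ np.log(mnl_probs(x, alpha, gamma, a0)))


def fd_grad(fun, x, h=1e-6):
    """Central finite-difference gradient of a scalar function of x."""
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g


rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 1.5])
gamma = np.array([0.8, 1.2, 0.5])
a0, m = 1.0, 20
x = np.array([1.1, 1.9, 2.4])

xi = rng.multinomial(m, mnl_probs(x, alpha, gamma, a0))  # (xi_0, ..., xi_n), sums to m
score = mnl_score(x, xi, alpha, gamma, a0)
fd = fd_grad(lambda z: log_kernel(z, xi, alpha, gamma, a0), x)
bound = len(x) * m * np.max(np.abs(gamma))  # n * m * gamma^max with gamma^max := max_k |gamma_k|
print(np.max(np.abs(score - fd)), np.linalg.norm(score) <= bound)
```

Since both $m\,p_{k}(\bm{x})$ and $\xi_{k}$ lie in $[0,m]$, each entry is in fact bounded by $m|\gamma_{k}|$, which is why the sampled norms stay under $nm\gamma^{\max}$.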

A.2 Proof of Proposition 2

Proof.

Assumption 2 holds since $\mathcal{C}=[x_{\min},x_{\max}]^{I}$ and $\xi_{i}\in\{0,1,\dots,d_{i}\}$ for all $i\in I$. It therefore remains to verify conditions (i)–(iii) of Assumption 1.
(i) By definition, the value of $f(\bm{x},\bm{\xi})$ is independent of $\bm{x}$. Therefore, $f(\bm{x},\bm{\xi})$ is differentiable and Lipschitz continuous w.r.t. $\bm{x}$ with modulus $L_{f}=0$. Moreover, $f(\bm{x},\bm{\xi})$ is continuous w.r.t. $\bm{\xi}$ by definition, since $q_{H}$, $q_{R}$, and $k$ are continuous functions.

(ii) Since $\Pr(\bm{\xi}\mid\bm{x})=\prod_{i\in I}{}_{d_{i}}C_{\xi_{i}}\,p_{i}(x_{i})^{\xi_{i}}\bigl(1-p_{i}(x_{i})\bigr)^{d_{i}-\xi_{i}}$, the function $\Pr(\bm{\xi}\mid\bm{x})$ is differentiable w.r.t. $\bm{x}$ by the definition of $p_{i}$. Moreover, since $0<p_{i}(x)<1$ for all $x\in\mathbb{R}$ and $i\in I$, we have $\Pr(\bm{\xi}\mid\bm{x})\neq 0$ for all $(\bm{x},\bm{\xi})\in\mathbb{R}^{n}\times\Xi$.

(iii) For all $x\in\mathbb{R}$ and $i\in I$, the definition of $p_{i}$ gives $0<p_{i}(x)<1$. Then, since

$$\left(\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})\right)_{k}=\prod_{i\in I}{}_{d_{i}}C_{\xi_{i}}\,p_{i}(x_{i})^{\xi_{i}}\bigl(1-p_{i}(x_{i})\bigr)^{d_{i}-\xi_{i}}\left(\frac{\xi_{k}p^{\prime}_{k}(x_{k})}{p_{k}(x_{k})}-\frac{(d_{k}-\xi_{k})p^{\prime}_{k}(x_{k})}{1-p_{k}(x_{k})}\right),$$

we have

$$\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}=\frac{\xi_{k}p^{\prime}_{k}(x_{k})}{p_{k}(x_{k})}-\frac{(d_{k}-\xi_{k})p^{\prime}_{k}(x_{k})}{1-p_{k}(x_{k})}. \tag{6}$$

Here,

$$p^{\prime}_{k}(x)=\frac{-\beta_{k}e^{\alpha_{k}h_{k}+\beta_{k}x+\gamma_{k}}}{\bigl(1+e^{\alpha_{k}h_{k}+\beta_{k}x+\gamma_{k}}\bigr)^{2}}=-\beta_{k}\frac{1}{1+e^{\alpha_{k}h_{k}+\beta_{k}x+\gamma_{k}}}\cdot\frac{e^{\alpha_{k}h_{k}+\beta_{k}x+\gamma_{k}}}{1+e^{\alpha_{k}h_{k}+\beta_{k}x+\gamma_{k}}}=-\beta_{k}p_{k}(x)\bigl(1-p_{k}(x)\bigr). \tag{7}$$
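Identity (7) is the standard derivative of a logistic function. The Python sketch below checks it numerically, assuming (as the first expression in (7) implies) that $p_{k}(x)=1/\bigl(1+e^{\alpha_{k}h_{k}+\beta_{k}x+\gamma_{k}}\bigr)$; it compares $-\beta_{k}p_{k}(x)\bigl(1-p_{k}(x)\bigr)$ with a central finite difference on a grid of prices. The function names and the parameter values, including $h_{k}$, are arbitrary illustrations.

```python
import numpy as np


def purchase_prob(x, alpha_k, beta_k, gamma_k, h_k):
    """Assumed form: p_k(x) = 1 / (1 + exp(alpha_k h_k + beta_k x + gamma_k))."""
    z = alpha_k * h_k + beta_k * x + gamma_k
    return 1.0 / (1.0 + np.exp(z))


def purchase_prob_derivative(x, alpha_k, beta_k, gamma_k, h_k):
    """Closed form (7): p_k'(x) = -beta_k p_k(x) (1 - p_k(x))."""
    p = purchase_prob(x, alpha_k, beta_k, gamma_k, h_k)
    return -beta_k * p * (1.0 - p)


params = dict(alpha_k=0.3, beta_k=1.5, gamma_k=-2.0, h_k=1.0)
xs = np.linspace(-3.0, 3.0, 13)
step = 1e-6
fd = (purchase_prob(xs + step, **params) - purchase_prob(xs - step, **params)) / (2 * step)
print(np.max(np.abs(fd - purchase_prob_derivative(xs, **params))))
```

With $\beta_{k}>0$ the derivative is negative everywhere, matching the intuition that a higher price lowers the purchase probability.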

From (6) and (7),

$$\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}=-\beta_{k}\bigl(1-p_{k}(x_{k})\bigr)\xi_{k}+\beta_{k}p_{k}(x_{k})(d_{k}-\xi_{k})=\beta_{k}\bigl(d_{k}p_{k}(x_{k})-\xi_{k}\bigr),$$

and hence

$$\left|\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_{k}\right|\leq|\beta_{k}|\,\bigl|d_{k}p_{k}(x_{k})-\xi_{k}\bigr|\leq|\beta_{k}|d_{k},$$

where the second inequality follows from $\xi_k \in \{0, 1, \dots, d_k\}$ and $0 \le p_k(x_k) \le 1$, which together give $|d_k p_k(x_k) - \xi_k| \le d_k$ for all $k \in I$. Because $\bigl|\bigl(\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})/\Pr(\bm{\xi}\mid\bm{x})\bigr)_k\bigr| = |\beta_k|\,|d_k p_k(x_k) - \xi_k|$, the same bound applies to the absolute value of each component. Hence, for all $\bm{x} \in \mathbb{R}^{I}$ and $\bm{\xi} \in \Xi$,
$$\left\|\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right\| \le \sum_{k\in I}\left|\left(\frac{\nabla_{\bm{x}}\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right)_k\right| \le |I|\max_{i\in I}\bigl(|\beta_i|\,d_i\bigr). \qquad ∎$$
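The component formula $\beta_k(d_k p_k(x_k) - \xi_k)$ and the bound $|\beta_k| d_k$ can be sanity-checked numerically. The sketch below assumes that each $\xi_k$ follows a binomial distribution with $d_k$ trials and success probability $p_k(x_k) = 1/(1+e^{\alpha_k h_k + \beta_k x_k + \gamma_k})$. This assumption is consistent with the derivative obtained from (6) and (7), but it is made here only for illustration. All parameter values are hypothetical. The sketch compares the closed form with a central finite difference of $\log \Pr$, using the identity $(\partial \Pr/\partial x_k)/\Pr = \partial \log \Pr/\partial x_k$.

```python
import math


def p_k(x, alpha, h, beta, gamma):
    # p_k(x) = 1 / (1 + exp(alpha*h + beta*x + gamma)); its derivative is
    # -beta * p * (1 - p), matching (7).
    return 1.0 / (1.0 + math.exp(alpha * h + beta * x + gamma))


def log_binom_pmf(xi, d, p):
    # Assumed model: xi ~ Binomial(d, p); returns log Pr(xi | x).
    return (math.lgamma(d + 1) - math.lgamma(xi + 1) - math.lgamma(d - xi + 1)
            + xi * math.log(p) + (d - xi) * math.log1p(-p))


def score_closed_form(x, xi, d, alpha, h, beta, gamma):
    # k-th component of grad Pr / Pr = beta_k * (d_k * p_k(x_k) - xi_k)
    return beta * (d * p_k(x, alpha, h, beta, gamma) - xi)


def score_finite_diff(x, xi, d, alpha, h, beta, gamma, eps=1e-6):
    # (dPr/dx) / Pr = d(log Pr)/dx, approximated by a central difference
    def f(t):
        return log_binom_pmf(xi, d, p_k(t, alpha, h, beta, gamma))
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    # Hypothetical parameters for a single product k
    alpha, h, beta, gamma, d, x = 0.5, 1.0, 0.8, -1.0, 10, 1.2
    for xi in range(d + 1):
        exact = score_closed_form(x, xi, d, alpha, h, beta, gamma)
        approx = score_finite_diff(x, xi, d, alpha, h, beta, gamma)
        assert abs(exact - approx) < 1e-5
        assert abs(exact) <= abs(beta) * d  # the bound |beta_k| d_k
    print("score formula and bound agree on all xi in {0, ..., d}")
```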

A.3 Proof of Proposition 3

Proof.

Assumption 2 holds since $\mathcal{C} = [x_{\min}, x_{\max}]^n$ and $\Xi = \{\bm{\xi} \mid \forall i \in [n],\ 0 \le \xi_i \le \xi_i^{\max}\}$. Moreover, Assumption 3 holds since $\Xi$ is a Borel set in $\mathbb{R}^n$ and, by the definition of $\Pr(\bm{\xi}\mid\bm{x})$, $\Pr(\bm{\xi}\mid\bm{x})$ is continuous w.r.t. $\bm{\xi}$ for all $\bm{x} \in \mathbb{R}^n$. We now verify each condition of Assumption 1.
(i) From the definitions of $s$ and $c$, $f(\bm{x},\bm{\xi})$ is differentiable w.r.t. $\bm{x}$ and continuous w.r.t. $\bm{\xi}$ for all $\bm{x} \in \mathbb{R}^n$ and $\bm{\xi} \in \Xi$. Moreover, $\|\nabla_{\bm{x}} f(\bm{x},\bm{\xi})\| = \|\bm{\xi}\| \le n\xi^{\max}$. Therefore, $f(\bm{x},\bm{\xi})$ is Lipschitz continuous in $\bm{x}$ with modulus $L_f = n\xi^{\max}$.

(ii) By the definitions of $\Pr(\bm{\xi}\mid\bm{x})$, $C^i(\bm{x})$, and $\bm{v}^i(\bm{x})$, $\Pr(\bm{\xi}\mid\bm{x})$ is differentiable w.r.t. $\bm{x}$ and $\Pr(\bm{\xi}\mid\bm{x}) \neq 0$ for all $\bm{x} \in \mathbb{R}^n$ and $\bm{\xi} \in \Xi$.

(iii) Let $g_i(\bm{x},\xi) := -\dfrac{(\xi - \bm{v}^i(\bm{x})^\top\bm{a}^i)^2}{2\bigl((\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})\bigr)}$ and $h_i(\bm{x}) := (\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})$.
Then $\Pr(\bm{\xi}\mid\bm{x}) = \prod_{i=1}^n \dfrac{1}{C^i(\bm{x})\sqrt{2\pi h_i(\bm{x})}}\exp\bigl(g_i(\bm{x},\xi_i)\bigr)$. Therefore,

$$\frac{\partial \Pr(\bm{\xi}\mid\bm{x})}{\partial x_k} = \prod_{i=1}^n \frac{1}{C^i(\bm{x})\sqrt{2\pi h_i(\bm{x})}}\exp\bigl(g_i(\bm{x},\xi_i)\bigr)\left(-\sum_{i=1}^n \frac{1}{C^i(\bm{x})}\frac{\partial C^i(\bm{x})}{\partial x_k} - \sum_{i=1}^n \frac{1}{2}\frac{1}{h_i(\bm{x})}\frac{\partial h_i(\bm{x})}{\partial x_k} + \sum_{i=1}^n \frac{\partial g_i(\bm{x},\xi_i)}{\partial x_k}\right). \tag{8}$$
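Identity (8) is the derivative of $\log \Pr(\bm{\xi}\mid\bm{x}) = \sum_i \bigl(-\log C^i(\bm{x}) - \tfrac12 \log(2\pi h_i(\bm{x})) + g_i(\bm{x},\xi_i)\bigr)$, multiplied by $\Pr(\bm{\xi}\mid\bm{x})$. The sketch below checks it numerically on a small hypothetical instance with $n = N = 2$ and invented parameter values. It takes $v_j^i(\bm{x}) = \theta_1^i \exp(-\|\bm{x}-\hat{\bm{x}}^j\|^2/\theta_2^i)$ as in (9), and $C^i(\bm{x}) = \int_0^{\xi_i^{\max}} (2\pi h_i(\bm{x}))^{-1/2}\exp(g_i(\bm{x},\phi))\,d\phi$, which it evaluates with `math.erf`. The derivatives of $h_i$ and $g_i$ are analytic. Because the closed form of $\partial C^i/\partial x_k$ is not used here, that term is taken by finite differences.

```python
import math

# Hypothetical toy instance (values invented for illustration): n = 2, N = 2.
X_HAT = [[1.0, 2.0], [2.5, 1.5]]                              # x_hat^j
THETA1, THETA2 = [0.6, 0.4], [1.5, 2.0]                       # theta_1^i, theta_2^i
A_VEC = [[1.0, -0.5], [0.3, 0.8]]                             # a^i
A_MAT = [[[0.4, 0.1], [0.1, 0.3]], [[0.2, 0.0], [0.0, 0.5]]]  # A^i
SIGMA, XI_MAX = [1.0, 1.2], [3.0, 4.0]                        # sigma^i, xi_i^max
N = len(X_HAT)


def sqdist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def v(i, x):
    # v_j^i(x) = theta_1^i * exp(-||x - x_hat^j||^2 / theta_2^i)
    return [THETA1[i] * math.exp(-sqdist(x, xh) / THETA2[i]) for xh in X_HAT]


def dv(i, x, k):
    # Analytic d v_j^i / d x_k (first equality in (9))
    return [-THETA1[i] * (2 * x[k] - 2 * xh[k]) / THETA2[i]
            * math.exp(-sqdist(x, xh) / THETA2[i]) for xh in X_HAT]


def h(i, x):
    vv = v(i, x)
    return SIGMA[i] ** 2 - sum(A_MAT[i][s][t] * vv[s] * vv[t]
                               for s in range(N) for t in range(N))


def mu(i, x):
    return sum(vj * aj for vj, aj in zip(v(i, x), A_VEC[i]))


def g(i, x, xi):
    return -(xi - mu(i, x)) ** 2 / (2 * h(i, x))


def C(i, x):
    # Gaussian mass on [0, xi_i^max] with mean mu_i(x) and variance h_i(x)
    m, s = mu(i, x), math.sqrt(2 * h(i, x))
    return 0.5 * (math.erf((XI_MAX[i] - m) / s) - math.erf(-m / s))


def pr(x, xi):
    return math.prod(math.exp(g(i, x, xi[i])) / (C(i, x) * math.sqrt(2 * math.pi * h(i, x)))
                     for i in range(len(xi)))


def partial(fun, x, k, eps=1e-6):
    # Central finite difference of fun w.r.t. x_k
    xp, xm = list(x), list(x)
    xp[k] += eps
    xm[k] -= eps
    return (fun(xp) - fun(xm)) / (2 * eps)


def rhs_of_8(x, xi, k):
    # Right-hand side of (8): Pr times the bracketed sum of log-derivatives
    total = 0.0
    for i in range(len(xi)):
        vv, dvv, hi = v(i, x), dv(i, x, k), h(i, x)
        dh = -sum(A_MAT[i][s][t] * (vv[t] * dvv[s] + vv[s] * dvv[t])
                  for s in range(N) for t in range(N))
        u = xi[i] - mu(i, x)
        du = -sum(d * a for d, a in zip(dvv, A_VEC[i]))
        dg = -u * du / hi + u ** 2 * dh / (2 * hi ** 2)
        dC = partial(lambda y: C(i, y), x, k)  # numerical, see lead-in
        total += -dC / C(i, x) - 0.5 * dh / hi + dg
    return pr(x, xi) * total


if __name__ == "__main__":
    x, xi = [1.7, 1.9], [0.8, 2.1]
    for k in range(2):
        print(k, partial(lambda y: pr(y, xi), x, k), rhs_of_8(x, xi, k))
```

The two printed columns should agree to finite-difference accuracy; any discrepancy would point to a term missing from (8).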

To bound the terms in (8), we first note that

$$\left|\frac{\partial v_j^i(\bm{x})}{\partial x_k}\right| = \left|-\theta_1^i\,\frac{2x_k - 2\hat{x}_k^j}{\theta_2^i}\exp\left(-\frac{\|\bm{x}-\hat{\bm{x}}^j\|^2}{\theta_2^i}\right)\right| \le \frac{2|\theta_1^i|\,|x_k - \hat{x}_k^j|}{|\theta_2^i|} \le \frac{2\theta_1^{\max}(x_{\max}-x_{\min})}{\theta_2^{\min}}. \tag{9}$$
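Bound (9) can be probed by sampling. The sketch below uses hypothetical values for $\theta_1^i$, $\theta_2^i$, and the reference points $\hat{\bm{x}}^j$. It places the $\hat{\bm{x}}^j$ inside $\mathcal{C} = [x_{\min}, x_{\max}]^2$ so that $|x_k - \hat{x}_k^j| \le x_{\max} - x_{\min}$, as the last inequality of (9) requires. It then checks the analytic derivative against finite differences and confirms that the largest observed $|\partial v_j^i/\partial x_k|$ stays below $2\theta_1^{\max}(x_{\max}-x_{\min})/\theta_2^{\min}$.

```python
import math
import random

# Hypothetical instance: C = [X_MIN, X_MAX]^2, reference points x_hat^j inside C.
X_MIN, X_MAX = 0.0, 3.0
X_HAT = [[1.0, 2.0], [2.5, 1.5], [0.2, 2.9]]
THETA1 = [0.6, -0.4]   # theta_1^i (sign allowed; the bound uses |theta_1^i|)
THETA2 = [1.5, 2.0]    # theta_2^i > 0


def v(i, x):
    return [THETA1[i] * math.exp(-sum((a - b) ** 2 for a, b in zip(x, xh)) / THETA2[i])
            for xh in X_HAT]


def dv(i, x, k):
    # First equality in (9)
    return [-THETA1[i] * (2 * x[k] - 2 * xh[k]) / THETA2[i]
            * math.exp(-sum((a - b) ** 2 for a, b in zip(x, xh)) / THETA2[i])
            for xh in X_HAT]


def bound_9():
    # Right-hand side of (9): 2 * theta_1^max * (x_max - x_min) / theta_2^min
    return 2 * max(abs(t) for t in THETA1) * (X_MAX - X_MIN) / min(THETA2)


def check(num_samples=1000, seed=0, eps=1e-6):
    # Returns the largest |d v_j^i / d x_k| seen over random x in C,
    # asserting along the way that the analytic derivative matches finite differences.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_samples):
        x = [rng.uniform(X_MIN, X_MAX) for _ in range(2)]
        for i in range(len(THETA1)):
            for k in range(2):
                xp, xm = list(x), list(x)
                xp[k] += eps
                xm[k] -= eps
                fd = [(p - m) / (2 * eps) for p, m in zip(v(i, xp), v(i, xm))]
                for exact, approx in zip(dv(i, x, k), fd):
                    assert abs(exact - approx) < 1e-6
                    worst = max(worst, abs(exact))
    return worst


if __name__ == "__main__":
    print("max observed:", check(), "bound (9):", bound_9())
```

As expected for a worst-case estimate, the bound is loose: the sketch drops the factor $\exp(-\|\bm{x}-\hat{\bm{x}}^j\|^2/\theta_2^i) \le 1$ exactly where the proof does.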

For all $\xi_i \in [0, \xi_i^{\max}]$,

$$
\begin{aligned}
\left|\frac{\partial g_i(\bm{x},\xi_i)}{\partial x_k}\right|
&= \left|-\frac{2\bigl(\xi_i - \bm{v}^i(\bm{x})^\top\bm{a}^i\bigr)\Bigl(-\sum_{j=1}^N \frac{\partial v_j^i(\bm{x})}{\partial x_k}a_j^i\Bigr)}{2\bigl((\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})\bigr)}\right.\\
&\qquad\left.+\frac{\bigl(\xi_i - \bm{v}^i(\bm{x})^\top\bm{a}^i\bigr)^2\Bigl(-2\sum_{s=1}^N\sum_{t=1}^N A_{st}^i\Bigl(v_t^i(\bm{x})\frac{\partial v_s^i(\bm{x})}{\partial x_k} + v_s^i(\bm{x})\frac{\partial v_t^i(\bm{x})}{\partial x_k}\Bigr)\Bigr)}{\Bigl(2\bigl((\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})\bigr)\Bigr)^2}\right|\\
&\le \left|\frac{\xi_i - \bm{v}^i(\bm{x})^\top\bm{a}^i}{(\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})}\right|\left|\sum_{j=1}^N \frac{\partial v_j^i(\bm{x})}{\partial x_k}a_j^i\right|\\
&\qquad+\left|\frac{\bigl(\xi_i - \bm{v}^i(\bm{x})^\top\bm{a}^i\bigr)^2}{\Bigl(2\bigl((\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x})\bigr)\Bigr)^2}\right|\left|2\sum_{s=1}^N\sum_{t=1}^N A_{st}^i\Bigl(v_t^i(\bm{x})\frac{\partial v_s^i(\bm{x})}{\partial x_k} + v_s^i(\bm{x})\frac{\partial v_t^i(\bm{x})}{\partial x_k}\Bigr)\right|\\
&\le \frac{\xi^{\max} + N\theta_1^{\max}a^{\max}}{\Delta}\cdot 2Na^{\max}\frac{\theta_1^{\max}(x_{\max}-x_{\min})}{\theta_2^{\min}}\\
&\qquad+\frac{(\xi^{\max} + N\theta_1^{\max}a^{\max})^2}{4\Delta^2}\cdot 8N^2A^{\max}\frac{(\theta_1^{\max})^2(x_{\max}-x_{\min})}{\theta_2^{\min}}\\
&\le \frac{2N\theta_1^{\max}(\xi^{\max} + N\theta_1^{\max}a^{\max})(x_{\max}-x_{\min})}{\Delta\,\theta_2^{\min}}\left(a^{\max} + NA^{\max}\theta_1^{\max}\frac{\xi^{\max} + N\theta_1^{\max}a^{\max}}{\Delta}\right),
\end{aligned}
\tag{10}
$$

where the second inequality follows from (9) and the facts that $(\sigma^i)^2 - \bm{v}^i(\bm{x})^\top A^i\bm{v}^i(\bm{x}) \ge \Delta$ and $|v_t^i(\bm{x})| \le |\theta_1^i| \le \theta_1^{\max}$ for $i \in [n]$ and $t \in [N]$; the latter also gives $|\xi_i - \bm{v}^i(\bm{x})^\top\bm{a}^i| \le \xi^{\max} + N\theta_1^{\max}a^{\max}$.
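The chain (10) can be exercised on a toy instance. The sketch below uses invented values for all model parameters. It chooses one admissible $\Delta$, namely $\min_i(\sigma^i)^2 - N^2A^{\max}(\theta_1^{\max})^2$, which lower-bounds $h_i(\bm{x})$ because $|\bm{v}^\top A\bm{v}| \le N^2A^{\max}(\theta_1^{\max})^2$. The paper's $\Delta$ may be defined differently. The sketch checks the first equality of (10) against finite differences of $g_i$. It then confirms that the largest sampled $|\partial g_i/\partial x_k|$, over $\bm{x} \in \mathcal{C}$ and $\xi_i \in [0, \xi_i^{\max}]$, stays below the final right-hand side of (10).

```python
import math
import random

# Hypothetical toy instance (all values invented): n = 2 prices, N = 2 reference points.
X_MIN, X_MAX = 0.0, 3.0
X_HAT = [[1.0, 2.0], [2.5, 1.5]]                              # x_hat^j, inside C
THETA1, THETA2 = [0.6, 0.4], [1.5, 2.0]                       # theta_1^i, theta_2^i
A_VEC = [[1.0, -0.5], [0.3, 0.8]]                             # a^i
A_MAT = [[[0.4, 0.1], [0.1, 0.3]], [[0.2, 0.0], [0.0, 0.5]]]  # A^i
SIGMA, XI_MAX = [1.0, 1.2], [3.0, 4.0]                        # sigma^i, xi_i^max
N = len(X_HAT)

# Constants appearing in (10)
TH1, TH2 = max(abs(t) for t in THETA1), min(THETA2)
A_MAX_VEC = max(abs(a) for row in A_VEC for a in row)              # a^max
A_MAX_MAT = max(abs(a) for M in A_MAT for row in M for a in row)   # A^max
XI_MAX_ALL = max(XI_MAX)                                           # xi^max
# Assumed choice of Delta (see lead-in): a lower bound on every h_i(x)
DELTA = min(s * s for s in SIGMA) - N * N * A_MAX_MAT * TH1 ** 2


def v(i, x):
    return [THETA1[i] * math.exp(-sum((a - b) ** 2 for a, b in zip(x, xh)) / THETA2[i])
            for xh in X_HAT]


def dv(i, x, k):
    return [-THETA1[i] * (2 * x[k] - 2 * xh[k]) / THETA2[i]
            * math.exp(-sum((a - b) ** 2 for a, b in zip(x, xh)) / THETA2[i])
            for xh in X_HAT]


def h(i, x):
    vv = v(i, x)
    return SIGMA[i] ** 2 - sum(A_MAT[i][s][t] * vv[s] * vv[t] for s in range(N) for t in range(N))


def g(i, x, xi):
    mu = sum(a * b for a, b in zip(v(i, x), A_VEC[i]))
    return -(xi - mu) ** 2 / (2 * h(i, x))


def dg(i, x, xi, k):
    # First equality in (10), written as -u u'/h + u^2 h'/(2 h^2) with u = xi - v^T a
    vv, dvv, hi = v(i, x), dv(i, x, k), h(i, x)
    dh = -sum(A_MAT[i][s][t] * (vv[t] * dvv[s] + vv[s] * dvv[t])
              for s in range(N) for t in range(N))
    u = xi - sum(a * b for a, b in zip(vv, A_VEC[i]))
    du = -sum(a * b for a, b in zip(dvv, A_VEC[i]))
    return -u * du / hi + u ** 2 * dh / (2 * hi ** 2)


def bound_10():
    # Final right-hand side of (10)
    c = XI_MAX_ALL + N * TH1 * A_MAX_VEC
    return (2 * N * TH1 * c * (X_MAX - X_MIN) / (DELTA * TH2)
            * (A_MAX_VEC + N * A_MAX_MAT * TH1 * c / DELTA))


def max_abs_dg(num_samples=2000, seed=1):
    # Largest |d g_i / d x_k| over random x in C and xi_i in [0, xi_i^max]
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_samples):
        x = [rng.uniform(X_MIN, X_MAX) for _ in range(2)]
        for i in range(len(SIGMA)):
            xi = rng.uniform(0.0, XI_MAX[i])
            for k in range(2):
                worst = max(worst, abs(dg(i, x, xi, k)))
    return worst


if __name__ == "__main__":
    print("Delta:", DELTA, "max observed:", max_abs_dg(), "bound (10):", bound_10())
```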

Moreover, for all $\bm{x} \in \mathbb{R}^n$,

$$
\begin{aligned}
\left|\frac{1}{2}\frac{1}{h_i(\bm{x})}\frac{\partial h_i(\bm{x})}{\partial x_k}\right|
&= \frac{1}{2}\left|\frac{1}{h_i(\bm{x})}\right|\left|\sum_{s=1}^N\sum_{t=1}^N A_{st}^i\left(v_t^i(\bm{x})\frac{\partial v_s^i(\bm{x})}{\partial x_k} + v_s^i(\bm{x})\frac{\partial v_t^i(\bm{x})}{\partial x_k}\right)\right|\\
&\le \frac{2}{\Delta}N^2A^{\max}\frac{(\theta_1^{\max})^2(x_{\max}-x_{\min})}{\theta_2^{\min}},
\end{aligned}
\tag{11}
$$

where the inequality follows from (9) and the facts that $h_i(\bm{x}) \ge \Delta$ and $|v_t^i(\bm{x})| \le \theta_1^{\max}$ for $i \in [n]$ and $t \in [N]$.
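Bound (11) admits the same kind of empirical check. On the hypothetical instance below, with invented parameters and the assumed choice $\Delta = \min_i(\sigma^i)^2 - N^2A^{\max}(\theta_1^{\max})^2$, the sketch evaluates $\tfrac12\,(\partial h_i/\partial x_k)/h_i(\bm{x})$ analytically. It compares this value with a finite difference of $\tfrac12\log h_i$ and checks the sampled maximum against the right-hand side of (11).

```python
import math
import random

# Hypothetical toy instance (all values invented): n = 2 prices, N = 2 reference points.
X_MIN, X_MAX = 0.0, 3.0
X_HAT = [[1.0, 2.0], [2.5, 1.5]]                              # x_hat^j, inside C
THETA1, THETA2 = [0.6, 0.4], [1.5, 2.0]                       # theta_1^i, theta_2^i
A_MAT = [[[0.4, 0.1], [0.1, 0.3]], [[0.2, 0.0], [0.0, 0.5]]]  # A^i
SIGMA = [1.0, 1.2]                                            # sigma^i
N = len(X_HAT)

TH1, TH2 = max(abs(t) for t in THETA1), min(THETA2)
A_MAX = max(abs(a) for M in A_MAT for row in M for a in row)
DELTA = min(s * s for s in SIGMA) - N * N * A_MAX * TH1 ** 2  # assumed admissible Delta


def v(i, x):
    return [THETA1[i] * math.exp(-sum((a - b) ** 2 for a, b in zip(x, xh)) / THETA2[i])
            for xh in X_HAT]


def dv(i, x, k):
    return [-THETA1[i] * (2 * x[k] - 2 * xh[k]) / THETA2[i]
            * math.exp(-sum((a - b) ** 2 for a, b in zip(x, xh)) / THETA2[i])
            for xh in X_HAT]


def h(i, x):
    vv = v(i, x)
    return SIGMA[i] ** 2 - sum(A_MAT[i][s][t] * vv[s] * vv[t] for s in range(N) for t in range(N))


def half_dlogh(i, x, k):
    # (1/2) * (d h_i / d x_k) / h_i(x), the quantity bounded in (11)
    vv, dvv = v(i, x), dv(i, x, k)
    dh = -sum(A_MAT[i][s][t] * (vv[t] * dvv[s] + vv[s] * dvv[t])
              for s in range(N) for t in range(N))
    return 0.5 * dh / h(i, x)


def bound_11():
    # Right-hand side of (11)
    return 2.0 / DELTA * N ** 2 * A_MAX * TH1 ** 2 * (X_MAX - X_MIN) / TH2


def max_abs_half_dlogh(num_samples=2000, seed=2):
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_samples):
        x = [rng.uniform(X_MIN, X_MAX) for _ in range(2)]
        for i in range(len(SIGMA)):
            for k in range(2):
                worst = max(worst, abs(half_dlogh(i, x, k)))
    return worst


if __name__ == "__main__":
    print("max observed:", max_abs_half_dlogh(), "bound (11):", bound_11())
```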

Let $r_1 := \dfrac{2N\theta_1^{\max}(\xi^{\max} + N\theta_1^{\max}a^{\max})(x_{\max}-x_{\min})}{\Delta\,\theta_2^{\min}}\left(a^{\max} + NA^{\max}\theta_1^{\max}\dfrac{\xi^{\max} + N\theta_1^{\max}a^{\max}}{\Delta}\right)$ and
$r_2 := \dfrac{2}{\Delta}N^2A^{\max}\dfrac{(\theta_1^{\max})^2(x_{\max}-x_{\min})}{\theta_2^{\min}}$, the right-hand sides of (10) and (11), respectively. Then,

\begin{align*}
\left|\frac{1}{C^i(\bm{x})}\frac{\partial C^i(\bm{x})}{\partial x_k}\right|
&=\left|\frac{1}{C^i(\bm{x})}\frac{\partial}{\partial x_k}\int_0^{\xi_i^{\max}}\frac{1}{\sqrt{2\pi h_i(\bm{x})}}\exp\left(g_i(\bm{x},\phi)\right)d\phi\right|\\
&=\left|\frac{1}{C^i(\bm{x})}\int_0^{\xi_i^{\max}}\frac{1}{\sqrt{2\pi h_i(\bm{x})}}\exp\left(g_i(\bm{x},\phi)\right)\left(-\frac{1}{2}\frac{1}{h_i(\bm{x})}\frac{\partial h_i(\bm{x})}{\partial x_k}+\frac{\partial g_i(\bm{x},\phi)}{\partial x_k}\right)d\phi\right|\\
&\leq\left|\frac{1}{C^i(\bm{x})}\int_0^{\xi_i^{\max}}\frac{1}{\sqrt{2\pi h_i(\bm{x})}}\exp\left(g_i(\bm{x},\phi)\right)\left(\left|\frac{1}{2}\frac{1}{h_i(\bm{x})}\frac{\partial h_i(\bm{x})}{\partial x_k}\right|+\left|\frac{\partial g_i(\bm{x},\phi)}{\partial x_k}\right|\right)d\phi\right|\\
&\leq\left|\frac{1}{C^i(\bm{x})}\int_0^{\xi_i^{\max}}\frac{1}{\sqrt{2\pi h_i(\bm{x})}}\exp\left(g_i(\bm{x},\phi)\right)\left(r_1+r_2\right)d\phi\right|=r_1+r_2,
\end{align*}

where the second inequality follows from (10) and (11). Here, from (8),

\begin{align*}
\left|\frac{\partial\Pr(\bm{\xi}\mid\bm{x})}{\partial x_k}\cdot\frac{1}{\Pr(\bm{\xi}\mid\bm{x})}\right|
&\leq\sum_{i=1}^{n}\left|\frac{1}{C^i(\bm{x})}\frac{\partial C^i(\bm{x})}{\partial x_k}\right|+\sum_{i=1}^{n}\left|\frac{1}{2}\frac{1}{h_i(\bm{x})}\frac{\partial h_i(\bm{x})}{\partial x_k}\right|\\
&\quad+\sum_{i=1}^{n}\left|\frac{\partial g_i(\bm{x},\xi_i)}{\partial x_k}\right|\leq 2n(r_1+r_2).
\end{align*}

Therefore, $\left\|\frac{\nabla\Pr(\bm{\xi}\mid\bm{x})}{\Pr(\bm{\xi}\mid\bm{x})}\right\|\leq\sum_{k=1}^{n}\left|\frac{\partial\Pr(\bm{\xi}\mid\bm{x})}{\partial x_k}\cdot\frac{1}{\Pr(\bm{\xi}\mid\bm{x})}\right|\leq 2n^2(r_1+r_2)$, where the first inequality uses $\|\bm{v}\|\leq\|\bm{v}\|_1$. Condition (iii) of Assumption 1 then follows from the definitions of $r_1$, $r_2$, and $M$. ∎
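Because $r_1$, $r_2$, and the resulting bound $2n^2(r_1+r_2)$ are explicit in the problem constants, they can be evaluated directly. The Python sketch below does only that arithmetic; the argument names are our own spellings of the symbols in the proof, and the sample values are arbitrary placeholders rather than constants used in the paper.

```python
def score_bound_constants(n, N, theta1_max, theta2_min, xi_max, a_max, A_max,
                          x_min, x_max, delta):
    """Return (r1, r2, 2 n^2 (r1 + r2)) as defined in the proof above.

    Argument names are this sketch's spellings of the proof's symbols:
    theta1_max = theta_1^max, theta2_min = theta_2^min, xi_max = xi^max,
    a_max = a^max, A_max = A^max, delta = Delta (the lower bound on h_i).
    """
    width = x_max - x_min
    # Factor xi^max + N theta_1^max a^max, which appears twice in r1.
    s = xi_max + N * theta1_max * a_max
    r1 = (2 * N * theta1_max * s * width) / (delta * theta2_min) * (
        a_max + N * A_max * theta1_max * s / delta
    )
    r2 = (2 / delta) * N**2 * A_max * theta1_max**2 * width / theta2_min
    # Bound on ||grad Pr / Pr|| obtained by summing the per-coordinate bounds.
    return r1, r2, 2 * n**2 * (r1 + r2)


# Placeholder constants, all equal to 1 on the price interval [0, 1].
print(score_bound_constants(n=1, N=1, theta1_max=1.0, theta2_min=1.0,
                            xi_max=1.0, a_max=1.0, A_max=1.0,
                            x_min=0.0, x_max=1.0, delta=1.0))  # (12.0, 2.0, 28.0)
```

Since the second factor of $r_1$ already carries a $1/\Delta$, $r_1$ grows like $\Delta^{-2}$ as $\Delta\to 0$, so the bound $2n^2(r_1+r_2)$ can become large when $h_i$ is allowed to approach zero.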

A.4 Proof of Lemma 4

Proof.

For a given $\bm{x}\in\mathcal{C}$, let $\{\Delta_k\}$ be a sequence of nonzero scalars such that $\lim_{k\to\infty}\Delta_k=0$ and $\bm{x}+\Delta_k\bm{e}^i\in\mathcal{C}$, where $\bm{e}^i$ is the vector whose $i$-th element is $1$ and whose other elements are $0$. Let $g_{k,i}(\bm{x},\bm{\xi}):=\frac{h(\bm{x}+\Delta_k\bm{e}^i,\bm{\xi})-h(\bm{x},\bm{\xi})}{\Delta_k}$. By the mean-value theorem, there exists $\bm{x}'\in\mathcal{C}$ such that $g_{k,i}(\bm{x},\bm{\xi})=\frac{\partial h(\bm{x}',\bm{\xi})}{\partial x_i}$.
Moreover, $f_{\max}:=\max_{\bm{x}\in\mathcal{C},\bm{\xi}\in\Xi}|f(\bm{x},\bm{\xi})|$ exists since $\Xi$ and $\mathcal{C}$ are compact by Assumption 2 and $f$ is a real-valued continuous function by Assumption 1. Then, for all $\bm{x}\in\mathcal{C}$ and $i=1,\dots,n$,

\begin{align*}
|g_{k,i}(\bm{x},\bm{\xi})|
&=\left|\frac{\partial h(\bm{x}',\bm{\xi})}{\partial x_i}\right|=\left|\frac{\partial f(\bm{x}',\bm{\xi})}{\partial x_i}\Pr(\bm{\xi}\mid\bm{x}')+f(\bm{x}',\bm{\xi})\frac{\partial\Pr(\bm{\xi}\mid\bm{x}')}{\partial x_i}\right|\\
&\leq\left|\frac{\partial f(\bm{x}',\bm{\xi})}{\partial x_i}\right|+f_{\max}\left|\frac{\partial\Pr(\bm{\xi}\mid\bm{x}')}{\partial x_i}\frac{1}{\Pr(\bm{\xi}\mid\bm{x}')}\right|\leq L_f+f_{\max}M,
\end{align*}

where the first inequality follows from $0<\Pr(\bm{\xi}\mid\bm{x}')\leq 1$, and the second follows from conditions (i) and (iii) of Assumption 1. Here, $g_{k,i}$ is measurable on $\Xi$ since $\Xi$ is a Borel set and $g_{k,i}$ is continuous w.r.t. $\bm{\xi}$ by Assumption 3 and the definitions of $g_{k,i}$ and $h$. The constant function $r(\bm{\xi}):=L_f+f_{\max}M<\infty$ dominates $|g_{k,i}(\bm{x},\cdot)|$ and is integrable over $\Xi$ because $\Xi$ is compact and therefore has finite Lebesgue measure. Moreover, $g_{k,i}(\bm{x},\bm{\xi})\to\frac{\partial h(\bm{x},\bm{\xi})}{\partial x_i}$ pointwise as $k\to\infty$, since $h(\bm{x},\bm{\xi})$ is differentiable w.r.t. $\bm{x}$ by conditions (i) and (ii) of Assumption 1. Hence the Lebesgue dominated convergence theorem [Royden and Fitzpatrick, 1988, Chapter 4.4, page 88] applies to $g_{k,i}$ for all $\bm{x}\in\mathcal{C}$ and $i=1,\dots,n$, that is,

\[
\lim_{k\to\infty}\int_{\bm{\xi}\in\Xi}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi}=\int_{\bm{\xi}\in\Xi}\lim_{k\to\infty}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi},\quad\text{for all }\bm{x}\in\mathcal{C}\text{ and }i=1,\dots,n.
\]

Then, for all $\bm{x}\in\mathcal{C}$ and $i=1,\dots,n$,

\begin{align*}
\left(\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}h(\bm{x},\bm{\xi})\,d\bm{\xi}\right)_i
&=\lim_{k\to\infty}\frac{\int_{\bm{\xi}\in\Xi}h(\bm{x}+\Delta_k\bm{e}^i,\bm{\xi})\,d\bm{\xi}-\int_{\bm{\xi}\in\Xi}h(\bm{x},\bm{\xi})\,d\bm{\xi}}{\Delta_k}\\
&=\lim_{k\to\infty}\int_{\bm{\xi}\in\Xi}\frac{h(\bm{x}+\Delta_k\bm{e}^i,\bm{\xi})-h(\bm{x},\bm{\xi})}{\Delta_k}\,d\bm{\xi}\\
&=\lim_{k\to\infty}\int_{\bm{\xi}\in\Xi}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi}=\int_{\bm{\xi}\in\Xi}\lim_{k\to\infty}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi}\\
&=\int_{\bm{\xi}\in\Xi}\left(\nabla_{\bm{x}}h(\bm{x},\bm{\xi})\right)_i\,d\bm{\xi}.
\end{align*}

Therefore, for all $\bm{x}\in\mathcal{C}$,

\[
\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}h(\bm{x},\bm{\xi})\,d\bm{\xi}=\int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}h(\bm{x},\bm{\xi})\,d\bm{\xi}.
\]
∎
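The interchange of gradient and integral in Lemma 4 can be checked numerically on a toy instance. The Python sketch below rests entirely on assumptions of our own, not the paper's: a single price $x$ (so $n=1$), a hypothetical revenue $f(x,\xi)=x\xi$, and demand following a normal distribution with mean $8-2x$ and standard deviation $1.5$, truncated to $\Xi=[0,10]$. It compares a central difference of $\int_\Xi h(x,\xi)\,d\xi$ with the integral of $\partial h/\partial x=(\partial f/\partial x)\Pr(\xi\mid x)+f\,\partial\Pr(\xi\mid x)/\partial x$, both computed with composite Simpson quadrature.

```python
import math

# Hypothetical model (our assumption): price x, demand xi on Xi = [0, XI_MAX]
# normal with mean mu(x) = 8 - 2x and std SIGMA, truncated to Xi.
XI_MAX = 10.0
SIGMA = 1.5
D_MU = -2.0  # derivative of the mean with respect to the price x


def mu(x):
    return 8.0 + D_MU * x


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normalizer(x):
    """Probability mass of N(mu(x), SIGMA^2) on [0, XI_MAX]."""
    return std_normal_cdf((XI_MAX - mu(x)) / SIGMA) - std_normal_cdf(-mu(x) / SIGMA)


def density(xi, x):
    """Truncated normal density Pr(xi | x) on [0, XI_MAX]."""
    return std_normal_pdf((xi - mu(x)) / SIGMA) / (SIGMA * normalizer(x))


def score(xi, x):
    """d/dx log Pr(xi | x), including the derivative of the normalizer."""
    a = (XI_MAX - mu(x)) / SIGMA
    b = -mu(x) / SIGMA
    d_normalizer = (-D_MU / SIGMA) * (std_normal_pdf(a) - std_normal_pdf(b))
    return D_MU * (xi - mu(x)) / SIGMA**2 - d_normalizer / normalizer(x)


def simpson(g, lo, hi, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    step = (hi - lo) / m
    total = g(lo) + g(hi)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(lo + j * step)
    return total * step / 3.0


def f(x, xi):
    """Hypothetical revenue: price times demand."""
    return x * xi


def expected_revenue(x):
    """Integral over Xi of h(x, xi) = f(x, xi) Pr(xi | x)."""
    return simpson(lambda xi: f(x, xi) * density(xi, x), 0.0, XI_MAX)


def grad_inside(x):
    """Integral over Xi of dh/dx = (df/dx) Pr + f Pr * score, with df/dx = xi."""
    return simpson(lambda xi: (xi + f(x, xi) * score(xi, x)) * density(xi, x),
                   0.0, XI_MAX)


def grad_outside(x, eps=1e-5):
    """Central difference of the integral itself."""
    return (expected_revenue(x + eps) - expected_revenue(x - eps)) / (2 * eps)
```

On this instance `grad_inside(x)` and `grad_outside(x)` should agree up to quadrature and finite-difference error for any price in the range tested; the check says nothing about assumptions beyond this single smooth example.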

A.5 Proof of Lemma 5

Proof.

For a given $\bm{x}\in\mathcal{C}$, let $\{\Delta_k\}$ be a sequence of nonzero scalars such that $\lim_{k\to\infty}\Delta_k=0$ and $\bm{x}+\Delta_k\bm{e}^i\in\mathcal{C}$, where $\bm{e}^i$ is the vector whose $i$-th element is $1$ and whose other elements are $0$. Let $g_{k,i}(\bm{x},\bm{\xi}):=\frac{q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x}+\Delta_k\bm{e}^i)-q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})}{\Delta_k}$.
By the mean-value theorem, there exists $\bm{x}'\in\mathcal{C}$ such that $g_{k,i}(\bm{x},\bm{\xi})=\frac{\partial\left(q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x}')\right)}{\partial x_i}$. Moreover, let $q^{\max}:=\max_{\bm{\xi}\in\Xi}|q(\bm{\xi})|$, which exists since $\Xi$ is compact by Assumption 2 and $q$ is a real-valued continuous function. Then, for all $\bm{x}\in\mathcal{C}$ and $i=1,\dots,n$,

\begin{align*}
|g_{k,i}(\bm{x},\bm{\xi})|
&=\left|\frac{\partial\left(q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x}')\right)}{\partial x_i}\right|=|q(\bm{\xi})|\left|\frac{\partial\Pr(\bm{\xi}\mid\bm{x}')}{\partial x_i}\right|\\
&\leq|q(\bm{\xi})|\left|\frac{\partial\Pr(\bm{\xi}\mid\bm{x}')}{\partial x_i}\frac{1}{\Pr(\bm{\xi}\mid\bm{x}')}\right|\leq q^{\max}M,
\end{align*}

where the first inequality follows from $0<\Pr(\bm{\xi}\mid\bm{x}')\leq 1$ and the second from condition (iii) of Assumption 1. Here, $g_{k,i}$ is measurable on $\Xi$ since $\Xi$ is a Borel set and $g_{k,i}$ is continuous w.r.t. $\bm{\xi}$ by Assumption 3 and the definition of $g_{k,i}$. The constant function $r(\bm{\xi}):=q^{\max}M<\infty$ dominates $|g_{k,i}(\bm{x},\cdot)|$ and is integrable over $\Xi$ because $\Xi$ is compact and therefore has finite Lebesgue measure. Moreover, $g_{k,i}(\bm{x},\bm{\xi})\to\frac{\partial\left(q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})\right)}{\partial x_i}$ pointwise as $k\to\infty$, since $\Pr(\bm{\xi}\mid\bm{x})$ is differentiable w.r.t. $\bm{x}$ by condition (ii) of Assumption 1. Hence the Lebesgue dominated convergence theorem [Royden and Fitzpatrick, 1988, Chapter 4.4, page 88] applies to $g_{k,i}$ for all $\bm{x}\in\mathcal{C}$ and $i=1,\dots,n$, that is,

\[
\lim_{k\to\infty}\int_{\bm{\xi}\in\Xi}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi}=\int_{\bm{\xi}\in\Xi}\lim_{k\to\infty}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi},\quad\text{for all }\bm{x}\in\mathcal{C}\text{ and }i=1,\dots,n.
\]

Then, for all $\bm{x}\in\mathcal{C}$ and $i=1,\dots,n$,

\begin{align}
\left(\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})\,d\bm{\xi}\right)_i
&=\lim_{k\to\infty}\frac{\int_{\bm{\xi}\in\Xi}q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x}+\Delta_k\bm{e}^i)\,d\bm{\xi}-\int_{\bm{\xi}\in\Xi}q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})\,d\bm{\xi}}{\Delta_k}\nonumber\\
&=\lim_{k\to\infty}\int_{\bm{\xi}\in\Xi}\frac{q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x}+\Delta_k\bm{e}^i)-q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})}{\Delta_k}\,d\bm{\xi}\nonumber\\
&=\lim_{k\to\infty}\int_{\bm{\xi}\in\Xi}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi}=\int_{\bm{\xi}\in\Xi}\lim_{k\to\infty}g_{k,i}(\bm{x},\bm{\xi})\,d\bm{\xi}\nonumber\\
&=\int_{\bm{\xi}\in\Xi}\left(\nabla_{\bm{x}}\left(q(\bm{\xi})\Pr(\bm{\xi}\mid\bm{x})\right)\right)_i\,d\bm{\xi}.\tag{12}
\end{align}

Therefore, for all $\bm{x}\in\mathcal{C}$,

$$
\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}q(\bm{\xi})\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}=\int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}\big(q(\bm{\xi})\mathrm{Pr}(\bm{\xi}\mid\bm{x})\big)\,d\bm{\xi}.
$$
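As a numerical illustration of Lemma 5 (not part of the original proof), the sketch below compares both sides of the interchange for an assumed Gaussian family $\mathrm{Pr}(\xi\mid x)=\mathcal{N}(x,1)$ with $q(\xi)=\xi^2$. Since $\mathbb{E}[\xi^2]=x^2+1$, both sides should equal $2x$. The left side uses a central finite difference in $x$ and the right side uses the analytic $x$-derivative of the density. Both integrals use a trapezoidal rule on a truncated grid, so they agree only up to discretization error. The final line takes $q(\xi)=1$, the case used in (13), where the integral should vanish.

```python
import numpy as np

def gauss_pdf(xi, x):
    """Density of N(x, 1) at xi (an assumed decision-dependent family)."""
    return np.exp(-0.5 * (xi - x) ** 2) / np.sqrt(2.0 * np.pi)

def dgauss_dx(xi, x):
    """Analytic derivative of the N(x, 1) density with respect to the mean x."""
    return (xi - x) * gauss_pdf(xi, x)

def trapezoid(values, grid):
    """Trapezoidal rule on a uniform grid (avoids NumPy-version-specific helpers)."""
    dx = grid[1] - grid[0]
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))

def lhs(q, x, grid, h=1e-5):
    """d/dx of the integral of q(xi) Pr(xi | x), by a central finite difference."""
    plus = trapezoid(q(grid) * gauss_pdf(grid, x + h), grid)
    minus = trapezoid(q(grid) * gauss_pdf(grid, x - h), grid)
    return (plus - minus) / (2.0 * h)

def rhs(q, x, grid):
    """Integral of d/dx [q(xi) Pr(xi | x)], i.e. the derivative moved inside."""
    return trapezoid(q(grid) * dgauss_dx(grid, x), grid)

def q(xi):
    return xi ** 2          # E[xi^2] = x^2 + 1, so both sides should equal 2x

grid = np.linspace(-20.0, 20.0, 200_001)   # truncation of Xi = R
x = 0.7
print(lhs(q, x, grid), rhs(q, x, grid))    # both approximately 1.4
print(rhs(lambda xi: np.ones_like(xi), x, grid))  # approximately 0, as in (13)
```

The truncation to $[-20,20]$ and the grid size are arbitrary choices. They are adequate for this light-tailed example.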

A.6 Proof of Lemma 6

Proof.

We have

$$
\begin{aligned}
\int_{\bm{\xi}\in\Xi}\left(\delta\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right)\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}
&=\delta\int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}\\
&=\delta\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}=\delta\nabla_{\bm{x}}(1)=0,
\end{aligned}\tag{13}
$$

where the second equality follows from Lemma 5 with $q(\bm{\xi})=1$. Then,

$$
\begin{aligned}
\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]
&=\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}f(\bm{x},\bm{\xi})\,d\mathrm{Pr}(\bm{\xi}\mid\bm{x})
=\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}f(\bm{x},\bm{\xi})\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}\\
&=\int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}\big(f(\bm{x},\bm{\xi})\mathrm{Pr}(\bm{\xi}\mid\bm{x})\big)\,d\bm{\xi}
=\int_{\bm{\xi}\in\Xi}\big(\nabla_{\bm{x}}f(\bm{x},\bm{\xi})\mathrm{Pr}(\bm{\xi}\mid\bm{x})+f(\bm{x},\bm{\xi})\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})\big)\,d\bm{\xi}\\
&=\int_{\bm{\xi}\in\Xi}\left(\nabla_{\bm{x}}f(\bm{x},\bm{\xi})+f(\bm{x},\bm{\xi})\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right)\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}\\
&=\int_{\bm{\xi}\in\Xi}\left(\nabla_{\bm{x}}f(\bm{x},\bm{\xi})+f(\bm{x},\bm{\xi})\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right)d\mathrm{Pr}(\bm{\xi}\mid\bm{x})\\
&=\int_{\bm{\xi}\in\Xi}\left(\nabla_{\bm{x}}f(\bm{x},\bm{\xi})+\big(f(\bm{x},\bm{\xi})-\delta\big)\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right)d\mathrm{Pr}(\bm{\xi}\mid\bm{x})\\
&=\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[\nabla_{\bm{x}}f(\bm{x},\bm{\xi})+\big(f(\bm{x},\bm{\xi})-\delta\big)\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right].
\end{aligned}
$$

Here, the third equality holds immediately when $\bm{\xi}$ is a discrete random vector. If $\bm{\xi}$ is a continuous random vector, the third equality follows from Lemma 4 since Assumptions 1-3 hold. The fourth equality holds because $f(\bm{x},\bm{\xi})$ and $\mathrm{Pr}(\bm{\xi}\mid\bm{x})$ are differentiable with respect to $\bm{x}$ by conditions (i) and (ii) of Assumption 1. The fifth equality holds because $\mathrm{Pr}(\bm{\xi}\mid\bm{x})\neq 0$ by condition (ii) of Assumption 1. The seventh equality follows from (13). Lemma 6 then follows from Definition 2. ∎
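The following sketch illustrates Lemma 6 in a setting where every expectation can be computed exactly. It assumes a hypothetical single-product model: at price $x$ a customer buys ($\xi=1$) with probability $p(x)=1/(1+e^{-(a-bx)})$, and revenue is $f(x,\xi)=x\xi$. For this model $\nabla_x\log\mathrm{Pr}(\xi\mid x)=-b(\xi-p(x))$ and $\nabla_x\mathbb{E}[f]=p(x)\big(1-bx(1-p(x))\big)$. Because $\xi$ is discrete, each expectation is a two-term sum, so the check is exact. The mean of the estimator matches the true gradient for every $\delta$, and the score has mean zero, as in (13). The parameter values and helper names are illustrative only.

```python
import math

A, B = 2.0, 0.8  # hypothetical parameters: Pr(buy | x) = sigmoid(A - B * x)

def buy_prob(x):
    return 1.0 / (1.0 + math.exp(-(A - B * x)))

def pmf(xi, x):
    """Pr(xi | x) for xi in {0, 1}."""
    p = buy_prob(x)
    return p if xi == 1 else 1.0 - p

def revenue(x, xi):
    return x * xi                        # f(x, xi)

def grad_revenue(x, xi):
    return float(xi)                     # d f / d x

def score(x, xi):
    return -B * (xi - buy_prob(x))       # d/dx log Pr(xi | x) = grad Pr / Pr

def g2(x, xi, delta):
    """Estimator of Lemma 6: grad f + (f - delta) * grad Pr / Pr."""
    return grad_revenue(x, xi) + (revenue(x, xi) - delta) * score(x, xi)

def expect(fn, x):
    """Exact expectation of fn(xi) for xi ~ D(x), summing over both outcomes."""
    return sum(pmf(xi, x) * fn(xi) for xi in (0, 1))

def true_grad(x):
    p = buy_prob(x)
    return p * (1.0 - B * x * (1.0 - p))  # d/dx [x * p(x)]

x = 1.5
print("mean score:", expect(lambda xi: score(x, xi), x))   # zero up to rounding, cf. (13)
for delta in (-3.0, 0.0, 1.0, 3.0):
    print(delta, expect(lambda xi: g2(x, xi, delta), x), true_grad(x))
```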

A.7 Proof of Lemma 7

Proof.

We have

$$
\begin{aligned}
\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\Big[\big\|g_{2}(\bm{x},\bm{\xi}')-\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\big\|^{2}\Big]
&=\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\big[\|g_{2}(\bm{x},\bm{\xi}')\|^{2}\big]-\big\|\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\big\|^{2}\\
&\leq\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\big[\|g_{2}(\bm{x},\bm{\xi}')\|^{2}\big]
\leq\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\left[\left(\|\nabla_{\bm{x}}f(\bm{x},\bm{\xi}')\|+|f(\bm{x},\bm{\xi}')-\delta|\left\|\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}'\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}'\mid\bm{x})}\right\|\right)^{2}\right]\\
&\leq\mathbb{E}_{\bm{\xi}'\sim D(\bm{x})}\big[(L_{f}+2f_{\max}M)^{2}\big]=(L_{f}+2f_{\max}M)^{2},
\end{aligned}
$$

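where the first equality follows from Lemma 6, which shows that $g_2(\bm{x},\bm{\xi}')$ is an unbiased estimator of $\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]$. The second inequality follows from the triangle inequality, and the third inequality follows from conditions (i) and (iii) of Assumption 1 and Assumption 2. ∎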
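For the same kind of hypothetical logit purchase model, the constants of Lemma 7 can be read off directly once prices are restricted to an assumed interval $[x_{\mathrm{lo}},x_{\mathrm{hi}}]$ with $x_{\mathrm{lo}}\ge 0$. In that case $\|\nabla_x f\|=\xi\le 1$, $|f|\le x_{\mathrm{hi}}$, and $|\nabla_x\log\mathrm{Pr}(\xi\mid x)|=b|\xi-p(x)|\le b$, so we take $L_f=1$, $f_{\max}=x_{\mathrm{hi}}$, and $M=b$. The sketch computes the exact variance of $g_2$ on a grid of prices and of $\delta\in[-f_{\max},f_{\max}]$. This is the range in which the proof of Theorem 10 keeps $\delta_k$. The sketch then compares the largest variance with $(L_f+2f_{\max}M)^2$.

```python
import math

# Hypothetical logit purchase model and price interval C = [X_LO, X_HI].
A, B = 2.0, 0.8
X_LO, X_HI = 0.5, 4.0
# Constants of Lemma 7 for this model: |grad f| <= 1, |f| <= X_HI, |score| <= B.
L_F, F_MAX, M = 1.0, X_HI, B

def buy_prob(x):
    return 1.0 / (1.0 + math.exp(-(A - B * x)))

def g2(x, xi, delta):
    """grad f + (f - delta) * score, with f = x * xi and score = -B * (xi - p)."""
    p = buy_prob(x)
    return xi + (x * xi - delta) * (-B * (xi - p))

def variance_g2(x, delta):
    """Exact Var[g2] = E[g2^2] - (E[g2])^2, enumerating xi in {0, 1}."""
    p = buy_prob(x)
    probs = {0: 1.0 - p, 1: p}
    mean = sum(probs[xi] * g2(x, xi, delta) for xi in (0, 1))
    second = sum(probs[xi] * g2(x, xi, delta) ** 2 for xi in (0, 1))
    return second - mean ** 2

bound = (L_F + 2.0 * F_MAX * M) ** 2
prices = [X_LO + i * (X_HI - X_LO) / 20 for i in range(21)]
deltas = [-F_MAX + j * 2.0 * F_MAX / 20 for j in range(21)]   # delta in [-f_max, f_max]
variances = [variance_g2(x, d) for x in prices for d in deltas]
print(f"largest variance on grid = {max(variances):.3f}, bound = {bound:.3f}")
```

In this example the largest variance is well below the bound. In the analysis, the bound only serves to supply the constant $\sigma^2$.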

A.8 Proof of Lemma 8

Proof.

For all $\bm{x}\in\mathcal{C}$,

$$
\begin{aligned}
\|\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\|
&=\left\|\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[\nabla_{\bm{x}}f(\bm{x},\bm{\xi})+f(\bm{x},\bm{\xi})\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right]\right\|\\
&\leq\left\|\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[\|\nabla_{\bm{x}}f(\bm{x},\bm{\xi})\|+|f(\bm{x},\bm{\xi})|\left\|\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}\mid\bm{x})}\right\|\right]\right\|
\leq\left\|\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[L_{f}+f_{\max}M\right]\right\|\\
&=L_{f}+f_{\max}M,
\end{aligned}
$$

where the first equality follows from Lemma 6 with $\delta=0$, the first inequality from Jensen's inequality and the triangle inequality, and the second inequality from conditions (i) and (iii) of Assumption 1 and Assumption 2. ∎
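Lemma 8 can be checked on the same hypothetical logit purchase model, with the same constants as before: $L_f=1$, $f_{\max}=x_{\mathrm{hi}}$, and $M=b$. The sketch evaluates $|\nabla_x\mathbb{E}[f(x,\xi)]|$ by a central finite difference of $x\,p(x)$ on a price grid and compares its maximum with $L_f+f_{\max}M$.

```python
import math

A, B = 2.0, 0.8                 # hypothetical logit parameters
X_LO, X_HI = 0.5, 4.0           # hypothetical price interval C
L_F, F_MAX, M = 1.0, X_HI, B    # constants of Lemma 8 for this model

def buy_prob(x):
    return 1.0 / (1.0 + math.exp(-(A - B * x)))

def expected_revenue(x):
    return x * buy_prob(x)      # E[f(x, xi)] = x * p(x)

def grad_expected_revenue(x, h=1e-6):
    """Central finite difference of the expected revenue."""
    return (expected_revenue(x + h) - expected_revenue(x - h)) / (2.0 * h)

prices = [X_LO + i * (X_HI - X_LO) / 100 for i in range(101)]
largest = max(abs(grad_expected_revenue(x)) for x in prices)
print(f"max |gradient| on grid = {largest:.4f}, bound L_f + f_max*M = {L_F + F_MAX * M:.4f}")
```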

A.9 Proof of Proposition 9

Proof.

We show that our problem satisfies the assumptions of [Besbes et al., 2015, Lemma C-5]. First, we show that $\{\psi_k\}_{k=1}^{R}$ in our problem belongs to the class $\mathcal{F}_s$ defined in [Besbes et al., 2015, Section 5]. The class $\mathcal{F}_s$ consists of sequences $\{g_k\}_{k=1}^{R}$ of convex cost functions from $\mathcal{Y}\subset\mathbb{R}^{d}$ into $\mathbb{R}$, where $\mathcal{Y}$ is convex, compact, and non-empty. Moreover, the sequences in $\mathcal{F}_s$ and the set $\mathcal{Y}$ satisfy the following conditions for all $k\in[R]$:

  1. There is a finite number $G>0$ such that $|g_k(\bm{y})|\leq G$ and $\|\nabla_{\bm{y}}g_k(\bm{y})\|\leq G$ for all $\bm{y}\in\mathcal{Y}$.

  2. There is some $\nu>0$ such that $\{\bm{y}\in\mathbb{R}^{d}:\|\bm{y}-\bm{y}^{*}_{k}\|\leq\nu\}\subset\mathcal{Y}$, where $\bm{y}^{*}_{k}\in\arg\min_{\bm{y}\in\mathcal{Y}}g_k(\bm{y})$.

  3. There are finite numbers $H>0$ and $G>0$ such that $H\bm{I}_{d}\preceq\nabla_{\bm{y}}^{2}g_k(\bm{y})\preceq G\bm{I}_{d}$, where $\bm{I}_{d}$ is the $d$-dimensional identity matrix.

We consider the case $d=1$ and $\mathcal{Y}=[-f_{\max}-\kappa,\,f_{\max}+\kappa]$. Then $\{\psi_k\}_{k=1}^{R}$ in our problem belongs to $\mathcal{F}_s$, since the following hold for any $\delta\in\mathcal{Y}$ and $k\in[R]$:

$$
\begin{aligned}
&|\psi_k(\delta)|=\frac{1}{2}\left(\delta-\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]\right)^{2}\leq\frac{1}{2}(2f_{\max}+\kappa)^{2},\\
&|\nabla_{\delta}\psi_k(\delta)|=\left|\delta-\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]\right|\leq 2f_{\max}+\kappa,\\
&\arg\min_{\delta'}\psi_k(\delta')\in[-f_{\max},f_{\max}],\ \text{and}\\
&\nabla_{\delta}^{2}\psi_k(\delta)=1.
\end{aligned}
$$

Hence each $\psi_k$ is convex, and conditions 1-3 hold with $\nu=\kappa$, $H=1$, and $G=\max\{\frac{1}{2}(2f_{\max}+\kappa)^{2},\,2f_{\max}+\kappa,\,1\}$. Moreover, (C-9) in the proof of Lemma C-5 of Besbes et al. [2015] holds with $\phi^{1}(\delta_k,\psi_k):=\delta_k-v_k$, since $\mathbb{E}_{v_k}[\delta_k-v_k]=\delta_k-\mathbb{E}_{\bm{\xi}\sim D(\bm{x}_k)}[f(\bm{x}_k,\bm{\xi})]=\nabla\psi_k(\delta_k)$ and $\mathbb{E}_{v_k}[|\delta_k-v_k|^{2}]\leq(2f_{\max}+\kappa)^{2}$. Proposition 9 then follows from the same argument as in the proof of [Besbes et al., 2015, Lemma C-5]. ∎
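The role of $\psi_k$ and of the stochastic gradient $\delta_k-v_k$ can be illustrated with plain projected stochastic gradient descent on $\psi(\delta)=\frac{1}{2}(\delta-\mathbb{E}[f(x,\xi)])^2$ over $\mathcal{Y}$. This sketch makes several simplifying assumptions. The price $x$ is held fixed, so the functions do not change with $k$, unlike in Proposition 9. Each $v$ is a single sampled revenue under the hypothetical logit purchase model. The step size is $1/k$, which we choose because $\nabla^2\psi=1$. The sketch is not the paper's Algorithm 2. With this step size, and as long as the projection is inactive (which is the case here), the iterate is exactly the running average of the sampled revenues, so it approaches $\mathbb{E}[f(x,\xi)]$.

```python
import math
import random

# Hypothetical logit purchase model; the price is held fixed in this sketch.
A, B = 2.0, 0.8
F_MAX, KAPPA = 4.0, 1.0                        # hypothetical f_max and margin kappa
Y_LOW, Y_HIGH = -F_MAX - KAPPA, F_MAX + KAPPA  # the interval Y

def buy_prob(x):
    return 1.0 / (1.0 + math.exp(-(A - B * x)))

def track_expected_revenue(x, iters, seed=0):
    """Projected SGD on psi(delta) = 0.5 * (delta - E[f(x, xi)])^2 with step 1/k."""
    rng = random.Random(seed)
    delta = 0.0                                 # delta_1 in [-f_max, f_max]
    for k in range(1, iters + 1):
        xi = 1 if rng.random() < buy_prob(x) else 0
        v = x * xi                              # one sampled revenue f(x, xi)
        grad = delta - v                        # unbiased estimate of psi'(delta)
        delta = min(Y_HIGH, max(Y_LOW, delta - grad / k))  # step, then project onto Y
    return delta

x = 2.0
print(track_expected_revenue(x, 20_000), x * buy_prob(x))
```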

A.10 Proof of Theorem 10

Proof.

When $\delta_k\in[-f_{\max},f_{\max}]$, the output $\delta_{k+1}$ of Algorithm 2 also lies in $[-f_{\max},f_{\max}]$. This follows from $\frac{1}{m_k}\sum_{\ell=1}^{m_k}f(\bm{x}_k^{md},\bm{\xi}^{\ell})\in[-f_{\max},f_{\max}]$, from $\zeta_{k+1}\in(0,1)$, and from line 10 of Algorithm 2. Since $\delta_1\in[-f_{\max},f_{\max}]$, induction gives $\delta_k\in[-f_{\max},f_{\max}]$ for all $k\in[R]$. From Lemmas 6-8, Assumption 2, and [Ghadimi and Lan, 2016, Corollary 6], we have

$$
\mathbb{E}\big[\|\mathcal{G}(\bm{x}_R^{md},\beta_R)\|^{2}\big]\leq 96L_{Ef}\left[\frac{4L_{Ef}\|\bm{x}_0-\bm{x}^{*}\|^{2}}{N(N+1)(N+2)}+\frac{L_{Ef}(\|\bm{x}^{*}\|^{2}+H^{2})+2\tilde{D}^{2}}{N}\right].
$$

Here, in [Ghadimi and Lan, 2016, Corollary 6], we set $L_{\psi}:=L_{Ef}$, $L_f:=L_{Ef}$, and $\sigma^{2}:=(L_f+2f_{\max}M)^{2}$. This $\sigma^{2}$ is the variance bound of Lemma 7, so the $L_f$ appearing in it is the constant from Lemma 7, not the constant of Ghadimi and Lan. Then, to obtain an $\varepsilon$-stationary point, it suffices to choose the number of iterations $\hat{N}$ such that

$$
96L_{Ef}\left[\frac{4L_{Ef}\|\bm{x}_0-\bm{x}^{*}\|^{2}}{\hat{N}(\hat{N}+1)(\hat{N}+2)}+\frac{L_{Ef}(\|\bm{x}^{*}\|^{2}+H^{2})+2\tilde{D}^{2}}{\hat{N}}\right]\leq\varepsilon^{2}.\tag{14}
$$

Multiplying both sides of (14) by $\hat{N}(\hat{N}+1)(\hat{N}+2)/\varepsilon^{2}$ and rearranging, (14) is equivalent to

$$
\hat{N}(\hat{N}+1)(\hat{N}+2)\geq\frac{384L_{Ef}^{2}}{\varepsilon^{2}}\|\bm{x}_0-\bm{x}^{*}\|^{2}+\frac{96L_{Ef}}{\varepsilon^{2}}\left(L_{Ef}(\|\bm{x}^{*}\|^{2}+H^{2})+2\tilde{D}^{2}\right)(\hat{N}+1)(\hat{N}+2).
$$

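It suffices that each term on the right-hand side be at most half of the left-hand side. Using $\hat{N}(\hat{N}+1)(\hat{N}+2)\geq\hat{N}^{3}$ for the first term and dividing by $(\hat{N}+1)(\hat{N}+2)$ for the second, we obtain the following sufficient condition for (14):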

$$\hat{N}^{3}\geq\frac{768L_{Ef}^{2}}{\varepsilon^{2}}\|\bm{x}_{0}-\bm{x}^{*}\|^{2},\qquad\hat{N}\geq\frac{192L_{Ef}}{\varepsilon^{2}}\left(L_{Ef}(\|\bm{x}^{*}\|^{2}+H^{2})+2\tilde{D}^{2}\right).$$ ∎
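As a numerical illustration of the two sufficient conditions, the following Python sketch computes an integer $\hat{N}$ satisfying both and then evaluates the left-hand side of (14) at that $\hat{N}$. The helper names (`sufficient_n_hat`, `lhs_eq14`) and all constant values are hypothetical placeholders chosen for illustration; they are not constants of any specific problem in this paper.

```python
import math


def sufficient_n_hat(L_Ef, dist0_sq, xstar_sq, H, D_tilde, eps):
    """Return an integer N_hat meeting both sufficient conditions for (14).

    dist0_sq = ||x_0 - x*||^2 and xstar_sq = ||x*||^2 (hypothetical inputs).
    """
    eps2 = eps ** 2
    # Condition 1: N_hat^3 >= 768 L_Ef^2 ||x_0 - x*||^2 / eps^2
    rhs1 = 768 * L_Ef ** 2 * dist0_sq / eps2
    n1 = max(1, math.ceil(rhs1 ** (1.0 / 3.0)))
    while n1 ** 3 < rhs1:  # guard against floating-point error in the cube root
        n1 += 1
    # Condition 2: N_hat >= 192 L_Ef (L_Ef (||x*||^2 + H^2) + 2 D_tilde^2) / eps^2
    rhs2 = 192 * L_Ef * (L_Ef * (xstar_sq + H ** 2) + 2 * D_tilde ** 2) / eps2
    n2 = max(1, math.ceil(rhs2))
    return max(n1, n2)


def lhs_eq14(N, L_Ef, dist0_sq, xstar_sq, H, D_tilde):
    """Left-hand side of (14) evaluated at N_hat = N."""
    first = 4 * L_Ef * dist0_sq / (N * (N + 1) * (N + 2))
    second = (L_Ef * (xstar_sq + H ** 2) + 2 * D_tilde ** 2) / N
    return 96 * L_Ef * (first + second)


# Placeholder constants, not taken from the paper.
params = dict(L_Ef=1.0, dist0_sq=4.0, xstar_sq=1.0, H=1.0, D_tilde=0.5)
eps = 0.1
N_hat = sufficient_n_hat(eps=eps, **params)
print(N_hat, lhs_eq14(N_hat, **params), eps ** 2)
```

With these placeholder values the second condition dominates (about $4.8\times10^{4}$ against 68 from the first), and the left-hand side of (14) comes out near $\varepsilon^{2}/2$, reflecting the factor-of-two slack introduced by bounding each term by half of the left-hand side.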

A.11 Proof of Proposition 11

Proof.

Assumption 4 holds since $c(\bm{\xi})$ is continuous by definition and
$$\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[s(\bm{x},\bm{\xi})]=\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[\sum_{i=1}^{n}x_{i}\xi_{i}\right]=\sum_{i=1}^{n}x_{i}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\xi_{i}]=s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}]).$$
Moreover, since $p_{k}(\bm{x})\neq 0$ for all $\bm{x}\in\mathbb{R}^{n}$ and $k\in\{0,\dots,n\}$ by the definition of $p_{i}(\bm{x})$ for each $i\in\{0,1,\dots,n\}$, we have
$$\left(\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right)_{k}=\frac{\partial\prod_{i=0}^{n}{}_{m}C_{\xi_{i}}p_{i}(\bm{x})^{\xi_{i}}}{\partial p_{k}}\,\frac{1}{\prod_{i=0}^{n}{}_{m}C_{\xi_{i}}p_{i}(\bm{x})^{\xi_{i}}}=\prod_{i=0}^{n}{}_{m}C_{\xi_{i}}p_{i}(\bm{x})^{\xi_{i}}\,\frac{\xi_{k}}{p_{k}(\bm{x})}\,\frac{1}{\prod_{i=0}^{n}{}_{m}C_{\xi_{i}}p_{i}(\bm{x})^{\xi_{i}}}=\frac{\xi_{k}}{p_{k}(\bm{x})}.$$ ∎
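Proposition 11 states that the score is $\xi_k/p_k(\bm{x})$: the combinatorial factors and all other coordinates cancel in the ratio. The Python sketch below checks this numerically by comparing the closed form with central finite differences of $\log\phi$, using $\nabla_{\bm{p}}\phi/\phi=\nabla_{\bm{p}}\log\phi$. It treats $\bm{p}$ as unconstrained variables, as the partial derivative in the proof does; the probabilities, the outcome vector, and the function names are hypothetical.

```python
import math


def log_phi(p, xi, m):
    """log of prod_i mC_{xi_i} p_i^{xi_i}, with p treated as free variables."""
    return sum(math.log(math.comb(m, x)) + x * math.log(pk) for pk, x in zip(p, xi))


def score_prop11(p, xi):
    """Closed form from Proposition 11: (grad_p phi / phi)_k = xi_k / p_k."""
    return [x / pk for pk, x in zip(p, xi)]


def score_central_diff(p, xi, m, h=1e-6):
    """Central finite differences of log phi, since grad phi / phi = grad log phi."""
    grad = []
    for k in range(len(p)):
        up, down = list(p), list(p)
        up[k] += h
        down[k] -= h
        grad.append((log_phi(up, xi, m) - log_phi(down, xi, m)) / (2 * h))
    return grad


p = [0.2, 0.3, 0.5]  # hypothetical probabilities p_0(x), p_1(x), p_2(x)
xi = [1, 2, 3]       # hypothetical outcome with m = 6 customers
print(score_prop11(p, xi))
print(score_central_diff(p, xi, m=6))
```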

A.12 Proof of Proposition 12

Proof.

Assumption 4 holds since $c(\bm{\xi})$ is continuous and $s(\bm{x},\bm{\xi})=0$, so that $\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[s(\bm{x},\bm{\xi})]=0=s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])$. Moreover, since $0<p_{i}(x)<1$ for all $x\in\mathbb{R}$ by the definition of $p_{i}$, we have

$$\begin{aligned}
\left(\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right)_{k}
&=\frac{\partial\prod_{i\in I}{}_{d_{i}}C_{\xi_{i}}p_{i}(x_{i})^{\xi_{i}}(1-p_{i}(x_{i}))^{d_{i}-\xi_{i}}}{\partial p_{k}}\,\frac{1}{\prod_{i\in I}{}_{d_{i}}C_{\xi_{i}}p_{i}(x_{i})^{\xi_{i}}(1-p_{i}(x_{i}))^{d_{i}-\xi_{i}}}\\
&=\prod_{i\in I}{}_{d_{i}}C_{\xi_{i}}p_{i}(x_{i})^{\xi_{i}}(1-p_{i}(x_{i}))^{d_{i}-\xi_{i}}\left(\frac{\xi_{k}}{p_{k}(x_{k})}-\frac{d_{k}-\xi_{k}}{1-p_{k}(x_{k})}\right)\frac{1}{\prod_{i\in I}{}_{d_{i}}C_{\xi_{i}}p_{i}(x_{i})^{\xi_{i}}(1-p_{i}(x_{i}))^{d_{i}-\xi_{i}}}\\
&=\frac{\xi_{k}}{p_{k}(x_{k})}-\frac{d_{k}-\xi_{k}}{1-p_{k}(x_{k})}.
\end{aligned}$$ ∎
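Because $\phi$ factorizes over $i\in I$, only the $k$-th binomial factor depends on $p_k$, so Proposition 12 can be checked one coordinate at a time. The Python sketch below compares the closed form with central finite differences of the log of that factor, and also sums the score against the binomial pmf to confirm that it has zero mean, which is consistent with (15) after applying the chain rule. The values of $d_k$ and $p_k$ and the helper names are hypothetical.

```python
import math


def log_phi_k(xi_k, d_k, p_k):
    """log of the k-th factor dC_xi p^xi (1 - p)^(d - xi) of phi."""
    return (math.log(math.comb(d_k, xi_k)) + xi_k * math.log(p_k)
            + (d_k - xi_k) * math.log(1.0 - p_k))


def score_prop12(xi_k, d_k, p_k):
    """Closed form from Proposition 12."""
    return xi_k / p_k - (d_k - xi_k) / (1.0 - p_k)


def score_central_diff(xi_k, d_k, p_k, h=1e-6):
    """Central finite difference of log phi_k with respect to p_k."""
    return (log_phi_k(xi_k, d_k, p_k + h) - log_phi_k(xi_k, d_k, p_k - h)) / (2 * h)


def mean_score(d_k, p_k):
    """Exact expectation of the score over xi_k in {0, ..., d_k}."""
    return sum(math.comb(d_k, x) * p_k ** x * (1 - p_k) ** (d_k - x) * score_prop12(x, d_k, p_k)
               for x in range(d_k + 1))


d_k, p_k = 10, 0.3  # hypothetical number of customers and purchase probability
max_err = max(abs(score_prop12(x, d_k, p_k) - score_central_diff(x, d_k, p_k))
              for x in range(d_k + 1))
print(max_err)
print(mean_score(d_k, p_k))  # zero up to rounding
```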

A.13 Proof of Lemma 13

Proof.

It follows from the definition of ϕitalic-ϕ\phiitalic_ϕ that

$$\begin{aligned}
\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[\delta\frac{\nabla_{\bm{x}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right]
&=\int_{\bm{\xi}\in\Xi}\left(\delta\frac{\nabla_{\bm{x}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right)d\mathrm{Pr}(\bm{\xi}\mid\bm{x})=\delta\int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}\\
&=\delta\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}\mathrm{Pr}(\bm{\xi}\mid\bm{x})\,d\bm{\xi}=\delta\nabla_{\bm{x}}(1)=0,
\end{aligned}\tag{15}$$

where the third equality follows from Lemma 5 with $q(\bm{\xi})=1$. Since

$$\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]=-s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}$$

from Assumptions 4 and 5, we have

$$\begin{aligned}
\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]
&=\nabla_{\bm{x}}\left(-s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}\right)\\
&=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}\\
&=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\int_{\bm{\xi}\in\Xi}\nabla_{\bm{x}}\left(c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\right)d\bm{\xi}\\
&=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[c(\bm{\xi})\frac{\nabla_{\bm{x}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right]\\
&=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[(c(\bm{\xi})-\delta)\frac{\nabla_{\bm{x}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right]\\
&=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[(c(\bm{\xi})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right].
\end{aligned}$$

Here, the third equality holds when $\bm{\xi}$ is a discrete random vector; when $\bm{\xi}$ is a continuous random vector, it follows from Assumption 3 and Lemma 5 with $q(\bm{\xi}):=c(\bm{\xi})$. The fourth equality uses $\phi(\bm{p}(\bm{x}),\bm{\xi})=\mathrm{Pr}(\bm{\xi}\mid\bm{x})\neq 0$, which holds by condition (ii) of Assumption 1. The fifth equality follows from (15), and the sixth from the chain rule $\nabla_{\bm{x}}\phi(\bm{p}(\bm{x}),\bm{\xi})=\frac{d\bm{p}(\bm{x})}{d\bm{x}}\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})$. ∎
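To illustrate Lemma 13, consider a hypothetical one-product instance in the setting of Proposition 12 ($s\equiv 0$) with $d$ customers, a logistic purchase probability $p(x)=1/(1+e^{-(a-bx)})$, and $c(\xi)=\xi^{2}$; all three modelling choices are assumptions made only for this example. Because $\xi$ has finite support, every expectation can be computed exactly by enumeration. The Python sketch checks that $\mathbb{E}[(c(\xi)-\delta)\,p'(x)\,\nabla_p\phi/\phi]$ matches a finite-difference derivative of $\mathbb{E}[c(\xi)]$ for several values of $\delta$, and shows that $\delta$ changes only the second moment of the estimator.

```python
import math

D = 10           # hypothetical number of customers
A, B = 2.0, 0.5  # hypothetical logistic parameters


def p_of_x(x):
    """Hypothetical purchase probability p(x) = 1 / (1 + exp(-(A - B x)))."""
    return 1.0 / (1.0 + math.exp(-(A - B * x)))


def dp_dx(x):
    p = p_of_x(x)
    return -B * p * (1.0 - p)


def pmf(xi, p):
    return math.comb(D, xi) * p ** xi * (1.0 - p) ** (D - xi)


def c(xi):
    """Hypothetical decision-independent term c(xi)."""
    return float(xi * xi)


def expected_c(x):
    p = p_of_x(x)
    return sum(pmf(xi, p) * c(xi) for xi in range(D + 1))


def w(xi, x):
    """dp/dx times the binomial score from Proposition 12."""
    p = p_of_x(x)
    return dp_dx(x) * (xi / p - (D - xi) / (1.0 - p))


def estimator_moments(x, delta):
    """Exact mean and second moment of (c(xi) - delta) * w(xi) under xi ~ Bin(D, p(x))."""
    p = p_of_x(x)
    mean = second = 0.0
    for xi in range(D + 1):
        g = (c(xi) - delta) * w(xi, x)
        mean += pmf(xi, p) * g
        second += pmf(xi, p) * g * g
    return mean, second


def best_delta(x):
    """Minimizer of the second moment over delta: E[c w^2] / E[w^2]."""
    p = p_of_x(x)
    num = sum(pmf(xi, p) * c(xi) * w(xi, x) ** 2 for xi in range(D + 1))
    den = sum(pmf(xi, p) * w(xi, x) ** 2 for xi in range(D + 1))
    return num / den


x, h = 3.0, 1e-5
true_grad = (expected_c(x + h) - expected_c(x - h)) / (2 * h)
for delta in (0.0, 10.0, best_delta(x)):
    print(delta, estimator_moments(x, delta), true_grad)
```

For this scalar toy, the second moment is a quadratic in $\delta$ minimized at $\delta^{*}=\mathbb{E}[c(\xi)w(\xi)^{2}]/\mathbb{E}[w(\xi)^{2}]$ with $w(\xi)=p'(x)\,\nabla_p\phi/\phi$. This is a standard control-variate calculation and is not claimed to be the rule used to tune $\delta$ in this paper.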

A.14 Proof of Lemma 14

Proof.

Under Assumptions 4 and 5, we have

$$\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}. \tag{16}$$

From Lemma 13,

$$\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]=-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[(c(\bm{\xi})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right]. \tag{17}$$

Then, from (16) and (17),

$$\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}\left[(c(\bm{\xi})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi})}{\phi(\bm{p}(\bm{x}),\bm{\xi})}\right]=\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}. \tag{18}$$

Then,

$$\begin{aligned}
&\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\left[\|g_{2}(\bm{x},\bm{\xi}^{\prime})-\nabla_{\bm{x}}\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[f(\bm{x},\bm{\xi})]\|^{2}\right]\\
&=\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\Bigl[\Bigl\|-\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])+(c(\bm{\xi}^{\prime})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}{\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}\\
&\qquad\qquad\qquad+\nabla_{\bm{x}}s(\bm{x},\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}])-\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}\Bigr\|^{2}\Bigr]\\
&=\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\left[\left\|(c(\bm{\xi}^{\prime})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}{\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}-\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}\right\|^{2}\right]\\
&=\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\left[\left\|(c(\bm{\xi}^{\prime})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}{\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}\right\|^{2}\right]-\left\|\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}\right\|^{2}\\
&\leq\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\left[\left\|(c(\bm{\xi}^{\prime})-\delta)\frac{d\bm{p}(\bm{x})}{d\bm{x}}\frac{\nabla_{\bm{p}}\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}{\phi(\bm{p}(\bm{x}),\bm{\xi}^{\prime})}\right\|^{2}\right]=\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\left[\left\|(c(\bm{\xi}^{\prime})-\delta)\frac{\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}^{\prime}\mid\bm{x})}{\mathrm{Pr}(\bm{\xi}^{\prime}\mid\bm{x})}\right\|^{2}\right]
\end{aligned}$$
𝔼𝝃D(𝒙)[(2cmax)2M2]=4(cmaxM)2,absentsubscript𝔼similar-tosuperscript𝝃𝐷𝒙delimited-[]superscript2subscript𝑐2superscript𝑀24superscriptsubscript𝑐𝑀2\displaystyle\leq\mathbb{E}_{\bm{\xi}^{\prime}\sim D(\bm{x})}\left[(2c_{\max})% ^{2}M^{2}\right]=4(c_{\max}M)^{2},≤ blackboard_E start_POSTSUBSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∼ italic_D ( bold_italic_x ) end_POSTSUBSCRIPT [ ( 2 italic_c start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_M start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] = 4 ( italic_c start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT italic_M ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ,

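where the first equality comes from (16); the second equality is the identity $\mathbb{E}\|\bm{X}-\mathbb{E}\bm{X}\|^2=\mathbb{E}\|\bm{X}\|^2-\|\mathbb{E}\bm{X}\|^2$ applied to the estimator, whose mean is the subtracted gradient; and the third equality follows from (18). The first inequality drops the nonnegative term $\|\nabla_{\bm{x}}\int_{\bm{\xi}\in\Xi}c(\bm{\xi})\phi(\bm{p}(\bm{x}),\bm{\xi})\,d\bm{\xi}\|^2$. The second inequality follows from condition (iii) of Assumption 1 together with $|c(\bm{\xi}')-\delta|\le 2c_{\max}$, which holds because $c(\bm{\xi}')$ and $\delta$ both lie in $[-c_{\max},c_{\max}]$. ∎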
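As a sanity check on this bound, the following Python sketch evaluates the estimator $(c(\bm{\xi}')-\delta)\nabla_{\bm{x}}\mathrm{Pr}(\bm{\xi}'\mid\bm{x})/\mathrm{Pr}(\bm{\xi}'\mid\bm{x})$ exactly on a hypothetical toy model that does not come from the paper: a scalar decision $x$, binary demand $\xi\in\{0,1\}$ with $\mathrm{Pr}(\xi=1\mid x)=1/(1+e^{-x})$, and values $c(0)=c_0$, $c(1)=c_1$, where a probability mass function stands in for the density $\phi$. The function names are illustrative. For every $\delta\in[-c_{\max},c_{\max}]$ the mean should match the true gradient, and the variance should stay below $4(c_{\max}M)^2$, with $M$ taken as the largest absolute score at $x$.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def estimator_moments(x: float, c0: float, c1: float, delta: float):
    """Exact mean, second moment, and score bound of the estimator
    g(xi) = (c(xi) - delta) * d/dx log Pr(xi | x)
    in the hypothetical toy model xi ~ Bernoulli(sigmoid(x))."""
    p1 = sigmoid(x)          # Pr(xi = 1 | x)
    p0 = 1.0 - p1            # Pr(xi = 0 | x)
    score1 = 1.0 - p1        # d/dx log sigmoid(x)
    score0 = -p1             # d/dx log(1 - sigmoid(x))
    g1 = (c1 - delta) * score1
    g0 = (c0 - delta) * score0
    mean = p1 * g1 + p0 * g0
    second_moment = p1 * g1 ** 2 + p0 * g0 ** 2
    M = max(abs(score0), abs(score1))  # largest |score| at this x
    return mean, second_moment, M


def true_gradient(x: float, c0: float, c1: float) -> float:
    # d/dx E[c(xi)] = d/dx [c0 (1 - s) + c1 s] = (c1 - c0) s (1 - s), s = sigmoid(x)
    s = sigmoid(x)
    return (c1 - c0) * s * (1.0 - s)


c0, c1 = -1.0, 3.0
c_max = max(abs(c0), abs(c1))
for x in (-2.0, 0.0, 1.5):
    for delta in (-c_max, 0.0, 1.0, c_max):
        mean, second, M = estimator_moments(x, c0, c1, delta)
        variance = second - mean ** 2
        print(f"x={x:+.1f} delta={delta:+.1f} mean={mean:.4f} "
              f"true={true_gradient(x, c0, c1):.4f} "
              f"var={variance:.4f} bound={4 * (c_max * M) ** 2:.4f}")
```

In this toy model, changing $\delta$ leaves the mean unchanged but changes the variance, which illustrates why the choice of the variance reduction parameter matters.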

A.15 Proof of Theorem 15

Proof.

If $\delta_k\in[-c_{\max},c_{\max}]$, then the output $\delta_{k+1}$ of Algorithm 3 also lies in $[-c_{\max},c_{\max}]$; this follows from $\frac{1}{m_k}\sum_{\ell=1}^{m_k}c(\bm{\xi}^{\ell})\in[-c_{\max},c_{\max}]$, $\zeta_{k+1}\in(0,1)$, and the update rule for $\delta_k$ in Algorithm 3. Since $\delta_1\in[-c_{\max},c_{\max}]$, induction on $k$ gives $\delta_k\in[-c_{\max},c_{\max}]$ for all $k\in[R]$. Combining Lemmas 8, 13, and 14 with [Ghadimi and Lan, 2016, Corollary 6], we obtain

$$
\mathbb{E}\left[\|\mathcal{G}(\bm{x}_{R}^{md},\beta_{R})\|^{2}\right]\le 96L_{Ef}\left[\frac{4L_{Ef}\|\bm{x}_{0}-\bm{x}^{*}\|^{2}}{N(N+1)(N+2)}+\frac{L_{Ef}(\|\bm{x}^{*}\|^{2}+H^{2})+2\tilde{D}^{2}}{N}\right].
$$

Here, we apply [Ghadimi and Lan, 2016, Corollary 6] with $L_{\psi}:=L_{Ef}$, $L_{f}:=L_{Ef}$, and $\sigma^{2}:=4(c_{\max}M)^{2}$. Then, as in the proof of Theorem 10, an $\varepsilon$-stationary point is obtained once the iteration number $\hat{N}$ satisfies the following two conditions, each of which makes the corresponding summand of the right-hand side at most $\varepsilon^2/2$ (using $N(N+1)(N+2)\ge N^3$):

$$
\hat{N}^{3}\ge\frac{768L_{Ef}^{2}}{\varepsilon^{2}}\|\bm{x}_{0}-\bm{x}^{*}\|^{2},\qquad \hat{N}\ge\frac{192L_{Ef}}{\varepsilon^{2}}\left(L_{Ef}(\|\bm{x}^{*}\|^{2}+H^{2})+2\tilde{D}^{2}\right).
$$
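These two conditions translate directly into a concrete iteration budget. The Python sketch below is illustrative only: the function names `iteration_budget` and `rhs_bound` are hypothetical, and all constants are user-supplied placeholders rather than values from the paper. It returns the smallest integer $\hat{N}$ meeting both conditions and evaluates the right-hand side of the bound on $\mathbb{E}[\|\mathcal{G}(\bm{x}_{R}^{md},\beta_{R})\|^{2}]$, which should then be at most $\varepsilon^2$.

```python
import math


def iteration_budget(L_Ef, dist0_sq, xstar_sq, H, D_tilde, eps):
    """Smallest integer N with
        N^3 >= 768 L^2 ||x_0 - x*||^2 / eps^2   and
        N   >= 192 L (L (||x*||^2 + H^2) + 2 D~^2) / eps^2.
    dist0_sq = ||x_0 - x*||^2 and xstar_sq = ||x*||^2 are user-supplied."""
    n_cubic = (768.0 * L_Ef ** 2 * dist0_sq / eps ** 2) ** (1.0 / 3.0)
    n_linear = 192.0 * L_Ef * (L_Ef * (xstar_sq + H ** 2) + 2.0 * D_tilde ** 2) / eps ** 2
    return max(1, math.ceil(max(n_cubic, n_linear)))


def rhs_bound(N, L_Ef, dist0_sq, xstar_sq, H, D_tilde):
    """Right-hand side of the bound on E||G(x_R^md, beta_R)||^2."""
    first = 4.0 * L_Ef * dist0_sq / (N * (N + 1) * (N + 2))
    second = (L_Ef * (xstar_sq + H ** 2) + 2.0 * D_tilde ** 2) / N
    return 96.0 * L_Ef * (first + second)


# Illustrative placeholder constants (not taken from the paper).
params = dict(L_Ef=1.0, dist0_sq=1.0, xstar_sq=1.0, H=1.0, D_tilde=1.0)
eps = 0.1
N_hat = iteration_budget(eps=eps, **params)
print(N_hat, rhs_bound(N_hat, **params) <= eps ** 2)
```

With these placeholders the $1/N$ term dominates, so $\hat{N}$ scales as $\varepsilon^{-2}$; the cubic condition only binds when $\|\bm{x}_0-\bm{x}^*\|$ is large relative to the other constants.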

Appendix B Details of our experiments

B.1 Common Settings

All experiments were conducted on a computer with an AMD EPYC 7413 24-Core Processor, 503.6 GiB of RAM, and Ubuntu 20.04.6 LTS. The program code was implemented in Python 3.8.3.

B.2 Settings of Baselines

L2-Regularized Repeated Gradient Descent (L2-RGD($\alpha$)): This method is described in Section 2.2. We used a fixed step size $\eta_k:=0.01$ at every iteration $k$.
Bayesian Optimization (BO): We used GPyOpt, an open-source Python library for Bayesian optimization [GPyOpt-authors, 2016], with its default settings for all parameters other than the termination criteria.
Simultaneous Perturbation Stochastic Approximation (SPSA): At each iteration $k$, this method updates the current iterate using the simultaneous perturbation gradient estimate

$$
\frac{f(\bm{x}^{k}+c_{k}\Delta^{k},\bm{\xi}^{k,1})-f(\bm{x}^{k}-c_{k}\Delta^{k},\bm{\xi}^{k,2})}{c_{k}\Delta^{k}},
$$

where $c_k:=\frac{1}{(k+1)^{0.101}}$, the division by the vector $c_k\Delta^k$ is taken elementwise, each element of $\Delta^k$ is sampled from a Rademacher distribution (i.e., it takes the values $+1$ and $-1$ with probability $0.5$ each), and $\bm{\xi}^{k,1}$ and $\bm{\xi}^{k,2}$ are random vectors sampled from the distribution $D(\bm{x}^k)$. We set the step size at iteration $k$ to $a_k:=\frac{0.16}{(100+k+1)^{0.602}}$. The settings of $c_k$, $\Delta^k$, and $a_k$ follow [Spall, 1998, Section III].
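A minimal Python sketch of this SPSA baseline follows, under stated assumptions: a generic objective `f(x, xi)` to be minimized, a user-supplied sampler `sample_xi(x, rng)` for $D(\bm{x})$, and a box projection via `np.clip` (the projection is an assumption; the text does not say how SPSA handles the price bounds). The gains $c_k$, $a_k$ and the Rademacher perturbation follow the description above. The toy objective in the usage lines is hypothetical, and for simplicity its sampler ignores $\bm{x}$.

```python
import numpy as np


def spsa(f, sample_xi, x0, lower, upper, num_iters=3000, seed=0):
    """Sketch of the SPSA baseline for min_x E_{xi ~ D(x)}[f(x, xi)].

    Gains follow the text: c_k = 1/(k+1)^0.101, a_k = 0.16/(100+k+1)^0.602.
    Clipping to [lower, upper] is an assumed projection step.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for k in range(num_iters):
        c_k = 1.0 / (k + 1) ** 0.101
        a_k = 0.16 / (100 + k + 1) ** 0.602
        # Rademacher perturbation: each entry is +1 or -1 with probability 0.5.
        delta = rng.choice([-1.0, 1.0], size=x.shape)
        # Two independent demand samples from D(x^k), as described above.
        xi1 = sample_xi(x, rng)
        xi2 = sample_xi(x, rng)
        diff = f(x + c_k * delta, xi1) - f(x - c_k * delta, xi2)
        grad = diff / (c_k * delta)  # elementwise division by c_k * Delta^k
        x = np.clip(x - a_k * grad, lower, upper)
    return x


# Hypothetical toy problem: f(x, xi) = ||x - xi||^2 with xi ~ N(target, 0.1^2 I).
target = np.array([1.0, 2.0])


def f(x, xi):
    return float(np.sum((x - xi) ** 2))


def sample_xi(x, rng):
    return target + 0.1 * rng.standard_normal(target.shape)


x_hat = spsa(f, sample_xi, x0=[4.0, 4.0], lower=0.0, upper=5.0)
print(x_hat)  # should end up close to target
```

Because the denominator is $c_k\Delta^k$ rather than $2c_k\Delta^k$, this estimate is twice a central-difference gradient in expectation; the factor is absorbed into the step size $a_k$.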
Projected Sub-gradient Descent for Average Demand (PSD-AD): This method is a projected subgradient descent method for

$$
\min_{\bm{x}\in[x_{\min},x_{\max}]^{n}}\left(-s(\bm{x},\bar{\bm{\xi}}(\bm{x}))+c(\bar{\bm{\xi}}(\bm{x}))\right),
$$

where $\bar{\bm{\xi}}(\bm{x}):=\mathbb{E}_{\bm{\xi}\sim D(\bm{x})}[\bm{\xi}]$ denotes the average demand for $\bm{x}$. At each iteration, the step size is repeatedly multiplied by $\delta=0.9$ until the objective value decreases.
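The step-size rule can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the initial trial step `eta0` and the cap `max_shrinks` are not given in the text, the toy problem (linear average demand $\bar{\xi}_i(\bm{x})=a_i-bx_i$, zero cost $c$, revenue $s=\sum_i x_i\bar{\xi}_i$) is hypothetical, and because the toy objective is smooth its gradient plays the role of the subgradient.

```python
import numpy as np


def projected_descent_backtracking(obj, grad, x0, lower, upper, num_iters=100,
                                   eta0=1.0, shrink=0.9, max_shrinks=200):
    """Projected (sub)gradient descent on the box [lower, upper]^n.

    Each iteration starts from the trial step eta0 (assumed value) and multiplies
    it by `shrink` (delta = 0.9 in the text) until the projected point strictly
    lowers the objective.
    """
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(num_iters):
        g = grad(x)
        f_x = obj(x)
        eta = eta0
        for _ in range(max_shrinks):
            x_new = np.clip(x - eta * g, lower, upper)
            if obj(x_new) < f_x:
                x = x_new
                break
            eta *= shrink
        else:
            break  # no decreasing step found: stop at the current iterate
    return x


# Hypothetical toy instance: average demand xi_bar(x) = a - b * x elementwise,
# revenue s = sum(x * xi_bar(x)), cost c = 0, so the objective is -s.
a = np.array([10.0, 30.0])
b = 1.25


def objective(x):
    return -float(np.sum(x * (a - b * x)))


def gradient(x):
    return 2.0 * b * x - a


x_hat = projected_descent_backtracking(objective, gradient, x0=[2.0, 2.0],
                                       lower=1.0, upper=8.0)
print(x_hat)  # unconstrained optimum a / (2b) = [4, 12]; the box caps 12 at 8
```

Requiring a strict decrease makes the loop stop on its own once no shrunken step improves the objective, which is the natural termination for this rule.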