License: CC BY 4.0
arXiv:2401.10616v1 [math.OC] 19 Jan 2024

Mini-batch stochastic subgradient for functional constrained optimization

Nitesh Kumar Singh (a), Ion Necoara (a,b), Vyacheslav Kungurtsev (c)
Corresponding author: Ion Necoara, email: [email protected].
(a) Automatic Control and Systems Engineering Department, University Politehnica Bucharest, Spl. Independentei 313, 060042 Bucharest, Romania
(b) Gheorghe Mihoc-Caius Iacob Institute of Mathematical Statistics and Applied Mathematics of the Romanian Academy, 050711 Bucharest, Romania
(c) Computer Science Department, Czech Technical University, Karlovo Namesti 13, 12135 Prague, Czech Republic

Abstract

In this paper we consider finite sum composite convex optimization problems with many functional constraints. The objective function is expressed as a finite sum of two terms, one of which admits easy computation of (sub)gradients while the other is amenable to proximal evaluations. We assume a generalized bounded gradient condition on the objective which allows us to simultaneously tackle both smooth and nonsmooth problems. We consider both the case where the objective satisfies a strong convexity property and the case where it does not. Further, we assume that each constraint set is given as the level set of a convex but not necessarily differentiable function. We reformulate the constrained finite sum problem into a stochastic optimization problem for which the stochastic subgradient projection method from [17] specializes to a collection of mini-batch variants, with different mini-batch sizes for the objective function and functional constraints, respectively. More specifically, at each iteration, our algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function and then a subsequent mini-batch subgradient projection step minimizing the feasibility violation. By specializing different mini-batching strategies, we derive exact expressions for the stepsizes as a function of the mini-batch size and in some cases we also derive insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. We also prove sublinear convergence rates for the mini-batch subgradient projection algorithm which depend explicitly on the mini-batch sizes and on the properties of the objective function. Numerical results also show that our mini-batch scheme outperforms its single-batch counterpart.

keywords:
Finite sum convex optimization, functional constraints, stochastic subgradient method, mini-batching, convergence rates.

1 Introduction

In this work we consider the following composite convex optimization problem with many functional constraints:

$$\begin{array}{rl} F^{*} = \min_{x \in \mathcal{Y} \subseteq \mathbb{R}^{n}} & F(x) \quad \left(:= \frac{1}{N}\sum_{i=1}^{N}(f_{i}(x)+g_{i}(x))\right) \\ \text{subject to } & h_{j}(x) \leq 0 \;\;\; \forall j = 1:m, \end{array} \tag{1}$$

where $f_i$, $g_i$ and $h_j$ are proper lower semi-continuous convex functions, $\mathcal{Y}$ is a closed convex set and the number of sum-additive objective function components, $N$, and/or the number of constraints, $m$, are assumed to be large. This model is very general and covers many practical optimization applications, including machine learning and statistics [34, 2], distributed control [16], signal processing [18, 33], operations research and finance [28]. More commonly, one sees a single $g$ representing the regularizer on the parameters. However, we are interested in the more general problem, as there are also applications where one encounters several $g$'s, e.g., in Lasso problems with mixed $\ell_1 - \ell_2$ regularizers, and in the case of regularizers with overlapping groups [11]. Multiple functional constraints can arise from multistage stochastic programming with equity constraints [36], robust classification [2], and fairness constraints in machine learning [37]. In the aforementioned applications the corresponding problems are becoming increasingly large in terms of both the number of variables and the size of training data. The use of regularizers and constraints in a composite objective structure makes proximal gradient methods particularly natural for these classes of problems, see, e.g., [18, 31]. Moreover, when the composite objective function is expressed as a large finite sum of functions, then out of computational necessity we may have access only to stochastic estimates, via samples, of the (sub)gradients, proximal operators or projections.
In this setting, the most popular stochastic methods are the stochastic gradient descent (SGD) [30, 19, 8, 18] and the stochastic proximal point (SPP) algorithms [15, 22, 18, 26, 31]. However, in practice it has been noticed that these stochastic methods converge slowly. To improve the convergence speed, one can use techniques such as mini-batching [1, 21, 23, 29, 3, 32], averaging [22, 25, 35] or variance reduction strategies [12, 14, 7]. In this work we consider a versatile mini-batching framework for a stochastic subgradient projection method for solving the constrained finite sum problem (1), and demonstrate, theoretically and experimentally, its favorable convergence properties.

The papers most related to our work are [21, 29, 17]. However, the optimization problems, the algorithms and consequently the convergence analyses in those papers differ from the ones in the present paper. In particular, [21] considers the optimization problem (1) with a single nonsmooth convex function $f$ and $g_i \equiv 0$ for all $i = 1:N$. Additionally, the objective function $f$ is assumed to be strongly convex. Under this setting, [21] proposes a stochastic subgradient scheme with mini-batch for constraints and derives a sublinear convergence rate for it, whose proof heavily relies on the strong convexity of $f$, the boundedness of the subgradients of $f$, and the uniqueness of the optimal solution. In this work, these conditions do not hold anymore as we consider more general assumptions (i.e., smooth/nonsmooth functions $f_i$, and an objective function $F$ that is convex or satisfies a strong convexity condition) and a more general optimization problem (i.e., finite sum composite objective). Moreover, our mini-batch subgradient method differs from the one in [21]: we consider mini-batching to handle both the objective function and the constraints, while [21] considers only mini-batching for constraints; in addition, the data selection rules used to form the mini-batches are also different in these two papers. Due to these distinctions, our convergence analysis and rates are not the same as the ones in [21].
In [29] an unconstrained finite sum problem is considered, i.e., in problem (1) $g_i \equiv 0$ for all $i = 1:N$, and $h_j \equiv 0$ for all $j = 1:m$, and reformulated as a stochastic optimization problem. This reformulation is then solved using SGD. In this paper we extend the stochastic reformulation from [29] to the finite sum composite objective function in (1) and add a new stochastic reformulation for the constraints. Then, we use the stochastic subgradient projection method from [17] to solve the reformulated problem, leading to an array of mini-batch variants depending on the data selection rule used to form mini-batches. This is the first time such an analysis is performed on the general problem (1), and most of our mini-batch variants of the stochastic subgradient projection method are new.

Contributions. In this paper we propose mini-batch variants of the stochastic subgradient projection algorithm for solving the constrained finite sum composite convex problem (1). The main advantage of our formulation is that the theoretical convergence guarantees of the corresponding numerical scheme only require very basic properties of our problem functions (convexity, bounded gradient type conditions) and access only to stochastic (sub)gradients and proximal operators. The main contributions are:

Stochastic reformulation: we propose an equivalent stochastic reformulation for the finite sum composite objective function and for the constraints of problem (1) using arbitrary sampling rules. We also extend the assumptions considered for the original problem to the new stochastic reformulation and derive explicit bounds for the corresponding constants appearing in the assumptions, which depend on the random variables that define the stochastic problem. By specializing our bounds to different mini-batching strategies, such as partition sampling and nice sampling, we derive exact expressions for these constants.

Convergence rates: the stochastic problem is then solved with the stochastic subgradient projection method from [17], which specializes to a range of possible mini-batch schemes with different batch sizes for the objective function and functional constraints. Based on the constants defining the assumptions, we derive exact expressions for the stepsize as a function of the mini-batch size. Moreover, when the objective function satisfies a strong convexity condition we also derive informative stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. At each iteration, the algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function, followed by a feasibility step for minimizing the feasibility violation of the observed mini-batch of random constraints. We prove sublinear convergence rates for weighted averages of the iterates in terms of expected distance to the constraint set, as well as for expected optimality of the function values/distance to the optimal set. Our rates depend explicitly on the mini-batch sizes and on the properties of the problem functions. This work is the first analysis of a mini-batch stochastic subgradient projection method on the general problem (1), and most of our mini-batch variants were never explicitly considered in the literature before.

Content. In Section 2 we introduce some basic notation and present the main assumptions. In Section 3 we provide a stochastic reformulation for the original problem, present several sampling strategies and derive some relevant bounds. In Section 4 we present a mini-batch stochastic subgradient projection algorithm and analyze its convergence. Finally, in Section 5, the performance on numerical simulations is presented, providing support for the effectiveness of our method.

2 Notations and assumptions

For the finite sum problem (1) we assume that $\mathcal{Y}$ is a simple convex set, i.e., it is easy to evaluate the projection onto $\mathcal{Y}$. Moreover, we assume that the interior of $\mathcal{Y}$ is contained in the effective domains of the functions $f_i$, $g_i$ and $h_j$. Additionally, all the functions $g_i$ have a common domain, $\text{dom}\, g$. We make no assumptions on the differentiability of $f_i$ and use, with some abuse of notation, the same expression for the gradient or the subgradient of $f_i$ at $x$, that is $\nabla f_i(x) \in \partial f_i(x)$, where the subdifferential $\partial f_i(x)$ is either a singleton or a nonempty set for any $i = 1:N$. Similarly for the $g_i$'s. Throughout the paper, the subgradient of $h(x,\xi)$ w.r.t. $x$, $\nabla_x h(x,\xi)$, is denoted simply by $\nabla h(x,\xi)$.
Let us denote $F_i(x) = f_i(x) + g_i(x)$. Assuming the $g_i$'s are convex functions, then from basic calculus rules we have $\nabla F_i(x) = \nabla f_i(x) + \nabla g_i(x)$. Further, for a given $x \in \mathbb{R}^{N}$, $\|x\|$ denotes its Euclidean norm and $(x)_+ = \max\{0, x\}$. The feasible set of (1) is denoted by:

$$\mathcal{X} = \left\{x \in \mathcal{Y} : \; h_j(x) \leq 0 \;\; \forall j = 1:m\right\}.$$

We assume that the optimal value satisfies $F^* > -\infty$ and denote by $\mathcal{X}^* \neq \emptyset$ the optimal set, i.e.:

$$F^* = \min_{x \in \mathcal{X}} F(x) := \frac{1}{N}\sum_{i=1}^{N} F_i(x), \quad \mathcal{X}^* = \{x \in \mathcal{X} \mid F(x) = F^*\}.$$

For any $x \in \mathbb{R}^n$ we denote its projection onto the optimal set $\mathcal{X}^*$ by $\bar{x}$, that is:

$$\bar{x} = \Pi_{\mathcal{X}^*}(x).$$

We consider additionally the following assumptions. First, we assume that the objective function satisfies some bounded gradient condition.

Assumption 2.1.

The (sub)gradients of $F$ satisfy the following bounded gradient condition: there exist nonnegative constants $L \geq 0$ and $B \geq 0$ such that:

$$B^2 + L(F(x) - F^*) \geq \frac{1}{N}\sum_{i=1}^{N} \|\nabla F_i(x)\|^2 \quad \forall x \in \mathcal{Y}. \tag{2}$$

To the best of our knowledge this assumption was first introduced in [18] and further studied in [17, 29]. We present two examples of functions satisfying this assumption below (see [17] for proofs).

Example 1 [Non-smooth (Lipschitz) functions satisfy Assumption 2.1]: Assume that the convex functions $f_i$ and $g_i$ have bounded (sub)gradients:

$$\|\nabla f_i(x)\| \leq B_{f_i} \quad \text{and} \quad \|\nabla g_i(x)\| \leq B_{g_i} \quad \forall x \in \mathcal{Y}.$$

Then, Assumption 2.1 holds with $L = 0$ and $B^2 = \frac{2}{N}\sum_{i=1}^{N}(B_{f_i}^2 + B_{g_i}^2)$.
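
The constants of Example 1 can be recovered in one line. The derivation below is a sketch that uses only the elementary inequality $\|a+b\|^2 \leq 2\|a\|^2 + 2\|b\|^2$ and the (sub)gradient bounds assumed in the example; the full argument is given in [17].

```latex
\begin{align*}
\frac{1}{N}\sum_{i=1}^{N}\|\nabla F_i(x)\|^2
  &= \frac{1}{N}\sum_{i=1}^{N}\|\nabla f_i(x)+\nabla g_i(x)\|^2 \\
  &\leq \frac{2}{N}\sum_{i=1}^{N}\left(\|\nabla f_i(x)\|^2+\|\nabla g_i(x)\|^2\right)
   \leq \frac{2}{N}\sum_{i=1}^{N}\left(B_{f_i}^2+B_{g_i}^2\right) = B^2
   \quad \forall x \in \mathcal{Y}.
\end{align*}
```

Since the right-hand side does not involve $F(x) - F^*$, the bounded gradient condition of Assumption 2.1 holds with $L = 0$.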

Example 2 [Smooth (Lipschitz gradient) functions satisfy Assumption 2.1]: Condition (2) contains the class of convex functions formed as a sum of two convex terms, one having Lipschitz continuous gradients with constants $L_{f_i}$ and the other having bounded subgradients over the bounded set $\mathcal{Y}$ with constant $B_g$. Then, Assumption 2.1 holds with (here $D$ denotes the diameter of $\mathcal{Y}$):

$$L = 4\max_{i=1:N} L_{f_i} \;\; \text{and} \;\; B^2 = \frac{4}{N}\sum_{i=1}^{N} B_g^2 + 4\max_{\bar{x} \in \mathcal{X}^*}\left(\frac{1}{N}\sum_{i=1}^{N}\|\nabla f_i(\bar{x})\|^2 + D \max_{i=1:N} L_{f_i} \|\nabla F(\bar{x})\|\right).$$

In our analysis below we also assume $F$ to satisfy a (strong) convexity condition:

Assumption 2.2.

The function $F$ satisfies a (strong) convexity condition on $\mathcal{Y}$, i.e., there exists a nonnegative constant $\mu \geq 0$ such that:

$$F(y) \geq F(x) + \langle \nabla F(x), y - x \rangle + \frac{\mu}{2}\|y - x\|^2 \quad \forall x, y \in \mathcal{Y}. \tag{3}$$

Note that when $\mu = 0$ relation (3) states that $F$ is convex on $\mathcal{Y}$. Additionally, we assume the following bound for the functional constraints:

Assumption 2.3.

The functional constraints $h_j$ have bounded subgradients on $\text{dom}\, g$, i.e., there exists $B_j > 0$ such that:

$$\|\nabla h_j(x)\| \leq B_j \quad \forall\, \nabla h_j(x) \in \partial h_j(x), \; x \in \text{dom}\, g, \;\; j = 1:m. \tag{4}$$

Note that this assumption implies that the functional constraints $h_j$ are Lipschitz continuous. Additionally, we assume a Hölderian growth condition for the constraints.

Assumption 2.4.

The functional constraints satisfy additionally the following Hölderian growth condition for some constants $\bar{c} > 0$ and $q \geq 1$:

$$\text{dist}^{2q}(y, \mathcal{X}) \leq \bar{c}\left(\max_{j=1:m}(h_j(y))\right)_+^2 \;\;\; \forall y \in \text{dom}\, g. \tag{5}$$

Note that this assumption has been used in [32] in the context of convex feasibility problems and in [17] for $q = 1$ in the context of stochastic optimization problems. It holds, e.g., when the feasible set $\mathcal{X}$ has an interior point, see e.g. [13], or when the feasible set is polyhedral. However, Assumption 2.4 holds for more general sets, e.g., when a strengthened Slater condition holds for the collection of functional constraints, such as the generalized Robinson condition, as detailed in [13, Corollary 3].
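
As a concrete illustration of Assumption 2.4, the sketch below checks the growth condition numerically in the simplest polyhedral case. It assumes, purely for illustration, $\mathcal{Y} = \mathbb{R}^n$ and a single linear constraint $h(y) = a^\top y - b$ (so $m = 1$); in that case the distance to the halfspace is $(a^\top y - b)_+ / \|a\|$, and the condition holds with $q = 1$ and $\bar{c} = 1/\|a\|^2$. The function names and the random data are hypothetical and not taken from the paper.

```python
import numpy as np

def dist_to_halfspace(y, a, b):
    """Euclidean distance from y to the halfspace {x : a @ x <= b}."""
    return max(0.0, a @ y - b) / np.linalg.norm(a)

def growth_gap(y, a, b, c_bar, q=1):
    """c_bar * (h(y))_+^2 - dist(y, X)^(2q) for the single constraint h(y) = a @ y - b.

    The growth condition holds at y exactly when this gap is nonnegative.
    """
    h_plus = max(0.0, a @ y - b)
    return c_bar * h_plus**2 - dist_to_halfspace(y, a, b) ** (2 * q)

# Hypothetical data: one random halfspace in dimension 5.
rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = 0.3
c_bar = 1.0 / np.linalg.norm(a) ** 2

# Evaluate the gap at 1000 random points; it should vanish up to rounding,
# because for a halfspace the condition holds with equality.
gaps = [growth_gap(3.0 * rng.standard_normal(5), a, b, c_bar) for _ in range(1000)]
print(min(gaps), max(gaps))
```

For a halfspace the inequality is tight, which is why the gap is zero up to floating-point error; for general feasible sets one only expects the gap to be nonnegative for a suitable $\bar{c}$.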

3 Stochastic reformulation

In this section we reformulate the deterministic problem (1) into a stochastic one wherein the objective function is expressed in the form of an expectation. We analyze its main properties and then use the machinery of stochastic sampling to devise efficient mini-batch schemes. For this we use an arbitrary sampling paradigm. More precisely, let $(\Omega_1, \mathcal{F}_1, \mathbb{P}_1)$ be a finite probability space with $\Omega_1 = \{1, \ldots, N\}$ and a random vector $\zeta \in \mathbb{R}^N$ drawn from some probability distribution $\mathbb{P}_1$ having the property $\mathbb{E}[\zeta^i] = 1$ for all $i = 1:N$. Then, let us define the following functions:

$$f(x, \zeta) = \frac{1}{N}\sum_{i=1}^{N} \zeta^i f_i(x) \;\; \text{and} \;\; g(x, \zeta) = \frac{1}{N}\sum_{i=1}^{N} \zeta^i g_i(x). \tag{6}$$

Note that if $f_i$ and $g_i$, with $i = 1:N$, are convex functions and $\zeta^i \geq 0$, then $f(\cdot, \zeta)$ and $g(\cdot, \zeta)$ are also convex functions. Also consider a probability space $(\Omega_2, \mathcal{F}_2, \mathbb{P}_2)$, with $\Omega_2 = \{1, \ldots, m\}$ and a random vector $\xi \in \mathbb{R}^m$ drawn from some probability distribution $\mathbb{P}_2$ having the property $\mathbb{E}[\xi^j] > 0$ and $0 \leq \xi^j \leq \bar{\xi}$, for all $j = 1:m$ and some $\bar{\xi} < \infty$. Then, let us define the functional constraints:

$$h(x, \xi) = \max_{j=1:m}(\xi^j h_j(x)). \tag{7}$$

Since $\xi^j \geq 0$, $h(\cdot, \xi)$ is a convex function provided that $h_j$, with $j = 1:m$, are convex functions. Then, we can define a stochastic reformulation of the original optimization problem (1):

$$\begin{array}{rl} F^* = & \min\limits_{x \in \mathcal{Y} \subseteq \mathbb{R}^n} \; \mathbb{E}[f(x, \zeta) + g(x, \zeta)] \\ & \text{subject to } \; h(x, \xi) \leq 0 \;\; \forall \xi \in \mathcal{F}_2. \end{array} \tag{8}$$
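
To make the stochastic constraint function in (7) concrete, the following minimal sketch evaluates $h(x, \xi)$ together with one of its subgradients. It assumes an indicator-type weight vector (entry 1 for constraints in the observed mini-batch, 0 otherwise), which is only one admissible choice of $\xi$, and uses three hypothetical linear constraints; the helper name `h_stochastic` is not from the paper.

```python
import numpy as np

def h_stochastic(x, xi, h_list, subgrad_list):
    """Return h(x, xi) = max_j xi[j] * h_j(x) and one subgradient in x.

    A subgradient of a pointwise maximum of convex functions is obtained
    from any function attaining the maximum, scaled here by xi[j] >= 0.
    """
    values = np.array([xi_j * h_j(x) for xi_j, h_j in zip(xi, h_list)])
    j_star = int(np.argmax(values))
    return float(values[j_star]), xi[j_star] * subgrad_list[j_star](x)

# Hypothetical data: three linear constraints h_j(x) = A[j] @ x - b[j].
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])
h_list = [lambda x, j=j: A[j] @ x - b[j] for j in range(3)]
subgrad_list = [lambda x, j=j: A[j] for j in range(3)]

x = np.array([2.0, 0.5])
xi_batch = np.array([1.0, 1.0, 0.0])  # only the first two constraints observed
xi_full = np.ones(3)                  # all constraints observed

val_batch, sub_batch = h_stochastic(x, xi_batch, h_list, subgrad_list)
val_full, sub_full = h_stochastic(x, xi_full, h_list, subgrad_list)
print(val_batch, sub_batch)
print(val_full, sub_full)
```

With indicator weights, unobserved constraints contribute the value 0 to the maximum, so $h(x, \xi) \leq 0$ holds exactly when every observed constraint is satisfied at $x$.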

Note that $F(x, \zeta) = f(x, \zeta) + g(x, \zeta)$ and $\nabla F(x, \zeta) = \nabla f(x, \zeta) + \nabla g(x, \zeta)$ are unbiased estimators of $F(x)$ and $\nabla F(x)$, respectively. Indeed:

$$\mathbb{E}[\nabla F(x, \zeta)] \overset{(6)}{=} \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}[\zeta^i](\nabla f_i(x) + \nabla g_i(x)) \overset{\mathbb{E}[\zeta^i]=1}{=} \nabla F(x).$$
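
The unbiasedness property can be checked exactly on a small instance. The sketch below assumes the usual definition of $\tau$-nice sampling, which is not spelled out in this section: a subset $S$ of size $\tau$ is drawn uniformly without replacement and $\zeta^i = N/\tau$ for $i \in S$, $\zeta^i = 0$ otherwise, so that $\mathbb{E}[\zeta^i] = 1$. The gradient data and the helper names are hypothetical.

```python
from itertools import combinations

import numpy as np

def nice_sampling(N, tau, rng):
    """Draw zeta for tau-nice sampling: zeta[i] = N / tau on a uniform
    random subset of size tau (without replacement), 0 elsewhere."""
    zeta = np.zeros(N)
    batch = rng.choice(N, size=tau, replace=False)
    zeta[batch] = N / tau
    return zeta

def minibatch_gradient(grads, zeta):
    """(1/N) * sum_i zeta[i] * grads[i], with grads of shape (N, n)."""
    return zeta @ grads / len(zeta)

# Hypothetical component gradients at a fixed point x.
N, tau, n = 6, 2, 3
rng = np.random.default_rng(0)
grads = rng.standard_normal((N, n))
full_gradient = grads.mean(axis=0)

# Exact expectation: average the estimator over all subsets of size tau,
# each of which is equally likely under tau-nice sampling.
subsets = list(combinations(range(N), tau))
expectation = np.zeros(n)
for S in subsets:
    zeta = np.zeros(N)
    zeta[list(S)] = N / tau
    expectation += minibatch_gradient(grads, zeta) / len(subsets)

one_draw = nice_sampling(N, tau, rng)
print(np.allclose(expectation, full_gradient))
```

Enumerating all subsets avoids Monte Carlo error; for realistic values of $N$ one would instead call `nice_sampling` once per iteration to form the mini-batch estimate.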

In the following lemma we prove that under some basic conditions on the random vectors, the deterministic problem (1) is equivalent to the stochastic problem (8).

Lemma 3.1.

Let the random vectors $\zeta$ and $\xi$ satisfy $\mathbb{E}[\zeta^{i}] = 1$ for all $i = 1:N$ and $\xi \geq 0$, with $\mathbb{E}[\xi^{j}] > 0$ for all $j = 1:m$. Then, the deterministic problem (1) is equivalent to the stochastic problem (8).

Proof.

For the objective function in problem (8), we have:

$$
\begin{aligned}
\mathbb{E}[f(x,\zeta) + g(x,\zeta)] &= \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N} \zeta^{i} (f_{i}(x) + g_{i}(x))\right] \\
&= \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}[\zeta^{i}] (f_{i}(x) + g_{i}(x)) = \frac{1}{N}\sum_{i=1}^{N} (f_{i}(x) + g_{i}(x)) \\
&= F(x).
\end{aligned}
$$

For the functional constraints, if $x$ is feasible for the stochastic problem (8), i.e. $h(x,\xi) \leq 0$, then we have:

$$
h(x,\xi) = \max_{j=1:m} (\xi^{j} h_{j}(x)) \leq 0 \implies \xi^{j} h_{j}(x) \leq 0 \;\;\; \forall j = 1:m.
$$

Taking expectation on both sides, we get:

$$
\mathbb{E}[\xi^{j} h_{j}(x)] \leq 0 \overset{\mathbb{E}[\xi^{j}] > 0}{\implies} h_{j}(x) \leq 0 \;\;\; \forall j = 1:m,
$$

hence $x$ is feasible for the original problem (1). On the other hand, if $h_{j}(x) \leq 0$ for all $j = 1:m$, then using $\xi^{j} \geq 0$, we get:

$$
\xi^{j} h_{j}(x) \leq 0 \implies \max_{j=1:m} (\xi^{j} h_{j}(x)) \leq 0 \implies h(x,\xi) \leq 0.
$$

This concludes our proof. ∎
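
Both parts of Lemma 3.1 can be checked exactly on a small instance. The Python sketch below is only an illustration under stated assumptions: the component functions `F_i`, the constraints `h_j` and the uniform fixed-size samplings are hypothetical, and the sampling vectors are taken as $\zeta^{i} = \mathbf{1}_{i \in S}/p_{i}$ and $\xi^{j} = \mathbf{1}_{j \in S'}$, which satisfy the hypotheses of the lemma. Indices are 0-based in the code.

```python
from fractions import Fraction
from itertools import combinations

N, m = 4, 3  # toy sizes (assumption)

def F_i(i, x):
    # Hypothetical component f_i(x) + g_i(x): a quadratic plus an absolute value.
    return (i + 1) * x * x + abs(x - i)

def h_j(j, x):
    # Hypothetical convex constraints; together they describe 0 <= x <= 2.
    return [x - 2, -x, x * x - 9][j]

def uniform_sampling(size, tau):
    # All subsets of cardinality tau, each drawn with equal probability.
    subsets = list(combinations(range(size), tau))
    return {S: Fraction(1, len(subsets)) for S in subsets}

def marginals(sampling, size):
    # p_i = P[i in S]
    return [sum(pr for S, pr in sampling.items() if i in S) for i in range(size)]

def expected_objective(x, sampling):
    # E[F(x, zeta)] with zeta^i = 1_{i in S} / p_i, computed by enumeration.
    p = marginals(sampling, N)
    return sum(pr * sum(F_i(i, x) / p[i] for i in S) / N
               for S, pr in sampling.items())

def full_objective(x):
    # F(x) = (1/N) sum_i F_i(x)
    return sum(F_i(i, x) for i in range(N)) / N

def feasible_stochastic(x, sampling):
    # h(x, xi) = max_j xi^j h_j(x) <= 0 for every xi in the support.
    return all(max((1 if j in S else 0) * h_j(j, x) for j in range(m)) <= 0
               for S in sampling)

def feasible_deterministic(x):
    return all(h_j(j, x) <= 0 for j in range(m))

S_obj = uniform_sampling(N, 2)
S_con = uniform_sampling(m, 2)
x = Fraction(3, 2)
grid = [Fraction(k, 4) for k in range(-8, 17)]
print(expected_objective(x, S_obj) == full_objective(x))
print(all(feasible_stochastic(y, S_con) == feasible_deterministic(y) for y in grid))
```

Exact rational arithmetic is used so that the two objective values can be compared with `==` rather than with a tolerance.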

3.1 Properties of stochastic problem

In this section we prove that the assumptions valid for the original problem (1) can be extended to the stochastic reformulation (8). Moreover, we derive explicit bounds for the corresponding assumptions’ constants depending on the random variables that define the stochastic problem. Let $\nabla \hat{F}(x)$ be the matrix of dimension $n \times N$ obtained by arranging $\nabla F_{i}(x)$’s as its columns. In the next lemma we prove that a stochastic bounded gradient type condition holds for the objective function of problem (8).

Lemma 3.2.

Let Assumption 2.1 hold and consider the random vector $\zeta$ satisfying $\mathbb{E}[\zeta^{i}] = 1$. Then, the (sub)gradients of $F(\cdot,\zeta)$ from the problem (8) satisfy a stochastic bounded gradient condition:

$$
\mathcal{B}^{2} + \mathcal{L}(F(x) - F^{*}) \geq \mathbb{E}_{\zeta}[\|\nabla F(x,\zeta)\|^{2}] \quad \forall x \in \mathcal{Y}, \tag{9}
$$

with the parameters $\mathcal{B}^{2} = \frac{\mathbb{E}[\|\zeta\|^{2}]}{N} B^{2}$ and $\mathcal{L} = \frac{\mathbb{E}[\|\zeta\|^{2}]}{N} L$.

Proof.

Using the definition of $F(x,\zeta)$, we get:

$$
\begin{aligned}
\|\nabla F(x,\zeta)\|^{2} &= \left\|\frac{1}{N}\sum_{i=1}^{N} \zeta^{i} \nabla F_{i}(x)\right\|^{2} = \frac{1}{N^{2}} \|\nabla \hat{F}(x)\zeta\|^{2} \leq \frac{1}{N^{2}} \|\nabla \hat{F}(x)\|^{2} \|\zeta\|^{2} \\
&\leq \frac{1}{N^{2}} \|\nabla \hat{F}(x)\|_{F}^{2} \|\zeta\|^{2} = \frac{\|\zeta\|^{2}}{N} \left(\frac{1}{N}\sum_{i=1}^{N} \|\nabla F_{i}(x)\|^{2}\right) \\
&\overset{(2)}{\leq} \frac{\|\zeta\|^{2}}{N} B^{2} + \frac{\|\zeta\|^{2}}{N} L (F(x) - F(\bar{x})),
\end{aligned}
$$

where the second inequality follows from the fact that the Frobenius norm of a matrix is at least as large as its 2-norm. Then, the statement follows after taking expectation with respect to $\zeta$. ∎
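
The chain of inequalities in this proof, before Assumption 2.1 is invoked, can be tested numerically. The sketch below is a sanity check rather than part of the analysis: the gradients are random vectors, and $\zeta$ is built from a uniform mini-batch of size `tau`, so that $\zeta^{i} = N/\tau$ on the batch and $0$ elsewhere (all of these are assumptions made for the example).

```python
import random

def sq_norm(v):
    # Squared Euclidean norm of a list of floats.
    return sum(c * c for c in v)

def lemma_3_2_sides(grads, zeta):
    """Return (lhs, rhs) of
    ||(1/N) sum_i zeta^i grad F_i||^2 <= (||zeta||^2 / N) * (1/N) sum_i ||grad F_i||^2."""
    N, n = len(grads), len(grads[0])
    stoch_grad = [sum(zeta[i] * grads[i][k] for i in range(N)) / N for k in range(n)]
    lhs = sq_norm(stoch_grad)
    rhs = sq_norm(zeta) / N * (sum(sq_norm(g) for g in grads) / N)
    return lhs, rhs

rng = random.Random(0)
N, n, tau = 6, 3, 2
for trial in range(1000):
    grads = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(N)]
    batch = rng.sample(range(N), tau)          # uniform mini-batch, so p_i = tau / N
    zeta = [N / tau if i in batch else 0.0 for i in range(N)]
    lhs, rhs = lemma_3_2_sides(grads, zeta)
    assert lhs <= rhs + 1e-12                  # small slack for floating-point rounding
```

The bound holds for every realization of $\zeta$, not only in expectation, which is why the check is performed sample by sample.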

From Jensen’s inequality, taking $x = x^{*} \in \mathcal{X}^{*}$ in (2), we get:

$$
B^{2} \geq \mathbb{E}_{\zeta}[\|\nabla F(x^{*},\zeta)\|^{2}] \geq \|\mathbb{E}_{\zeta}[\nabla F(x^{*},\zeta)]\|^{2} = \|\nabla F(x^{*})\|^{2} \quad \forall x^{*} \in \mathcal{X}^{*}. \tag{10}
$$

Since $F(x,\zeta)$ is an unbiased estimator of $F(x)$, it also follows that if Assumption 2.2 holds for the original objective function, then the same condition is valid for the objective function of the stochastic problem (8) with the same constant $\mu$. Further, for a given $x$ let us define the set of active constraints by $J^{*}(x) = \{j = 1:m \;|\; h(x,\xi) = \xi^{j} h_{j}(x)\}$. In the next lemma we provide a bounded subgradient condition for the functional constraints of the stochastic problem (8).

Lemma 3.3.

Let Assumption 2.3 hold and consider the random vector $\xi \geq 0$ satisfying $\mathbb{E}[\xi^{j}] > 0$ and $\xi^{j} \leq \bar{\xi}$, for all $j = 1:m$ and some $\bar{\xi} < \infty$. Then, the functional constraints $h(\cdot,\xi)$ of the problem (8) have bounded subgradients on $\mathrm{dom}\, g$, i.e.:

$$
\|\nabla h(x,\xi)\| \leq \mathcal{B}_{h} \quad \forall x \in \mathrm{dom}\, g \;\; \text{and} \;\; \xi \in \mathcal{F}_{2}, \tag{11}
$$

where $\nabla h(x,\xi) \in \partial h(x,\xi)$ and $\mathcal{B}_{h} = \bar{\xi} \max_{j=1:m} B_{j}$.

Proof.

Let $x \in \mathrm{dom}\, g$ and $\nabla h(x,\xi) \in \partial h(x,\xi)$. Then, from the definition of $h(\cdot,\xi)$ and of the index set $J^{*}(x)$, we have:

$$
h(x,\xi) = \max_{j=1:m} (\xi^{j} h_{j}(x)) = \xi^{j^{*}} \cdot h_{j^{*}}(x) \quad \forall j^{*} \in J^{*}(x).
$$

Then, we further have (see Lemma 3.1.13 in [20]):

$$
\begin{aligned}
& \nabla h(x,\xi) \in \text{Conv}\{\xi^{j^{*}} \nabla h_{j^{*}}(x) \;|\; j^{*} \in J^{*}(x)\} \\
\implies\; & \|\nabla h(x,\xi)\| \leq \max_{\substack{\theta_{j^{*}} \geq 0, \\ \sum_{j^{*} \in J^{*}(x)} \theta_{j^{*}} = 1}} \left\| \sum_{j^{*} \in J^{*}(x)} \theta_{j^{*}} \xi^{j^{*}} \cdot \nabla h_{j^{*}}(x) \right\| \\
& \leq \max_{\substack{\theta_{j^{*}} \geq 0, \\ \sum_{j^{*} \in J^{*}(x)} \theta_{j^{*}} = 1}} \; \sum_{j^{*} \in J^{*}(x)} \theta_{j^{*}} \xi^{j^{*}} \cdot \|\nabla h_{j^{*}}(x)\| \\
& \leq \max_{\substack{\theta_{j^{*}} \geq 0, \\ \sum_{j^{*} \in J^{*}(x)} \theta_{j^{*}} = 1}} \; \sum_{j^{*} \in J^{*}(x)} \theta_{j^{*}} \bar{\xi} \cdot \|\nabla h_{j^{*}}(x)\| \\
& = \bar{\xi} \max_{j^{*} \in J^{*}(x)} \|\nabla h_{j^{*}}(x)\| \overset{(4)}{\leq} \bar{\xi} \max_{j=1:m} B_{j} = \mathcal{B}_{h},
\end{aligned} \tag{12}
$$

which proves our statement. ∎
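
To make the bound (11) concrete, the following Python sketch evaluates it for affine constraints $h_{j}(x) = a_{j}^{T} x - b_{j}$, for which $B_{j} = \|a_{j}\|$. The data `A`, `b`, the bound `xi_bar` and the way $\xi$ is drawn are all hypothetical; the subgradient returned is the one associated with a single active index, which is one admissible choice among the elements of the convex hull used in the proof.

```python
import math
import random

def norm(v):
    # Euclidean norm of a list of floats.
    return math.sqrt(sum(c * c for c in v))

def h_and_subgradient(A, b, xi, x):
    """Return h(x, xi) = max_j xi^j (a_j . x - b_j) and the subgradient
    xi^{j*} a_{j*} for one active index j*."""
    vals = [xi[j] * (sum(a * c for a, c in zip(A[j], x)) - b[j])
            for j in range(len(A))]
    j_star = max(range(len(A)), key=lambda j: vals[j])
    return vals[j_star], [xi[j_star] * a for a in A[j_star]]

rng = random.Random(1)
m, n, xi_bar = 5, 3, 2.0
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
b = [rng.uniform(0, 1) for _ in range(m)]
B_h = xi_bar * max(norm(row) for row in A)     # xi_bar * max_j B_j

for trial in range(1000):
    xi = [rng.uniform(0, xi_bar) for _ in range(m)]   # 0 <= xi^j <= xi_bar
    x = [rng.uniform(-5, 5) for _ in range(n)]
    value, sub = h_and_subgradient(A, b, xi, x)
    assert norm(sub) <= B_h + 1e-12
```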

In the next lemma we provide a Hölderian growth type condition for the functional constraints of the stochastic problem (8).

Lemma 3.4.

Let Assumption 2.4 hold and consider the random vector $\xi$ satisfying $\xi \geq 0$ and $\mathbb{E}[\xi^{j}] > 0$ for all $j = 1:m$. Then, the functional constraints of the problem (8) satisfy the following Hölderian growth type condition:

$$
\text{dist}^{2q}(y,\mathcal{X}) \leq c \cdot \mathbb{E}\left[(h(y,\xi))_{+}^{2}\right] \;\; \forall y \in \mathrm{dom}\, g, \tag{13}
$$

with the parameter $c = \left(\frac{\bar{c}}{\min_{j=1:m} \mathbb{E}[\xi^{j}]}\right)$.

Proof.

Let $y \in \mathrm{dom}\, g$. Using the definition of $h(\cdot,\xi)$ and Jensen’s inequality, we have:

$$
\begin{aligned}
\mathbb{E}\left[(h(y,\xi))_{+}^{2}\right] &= \mathbb{E}\left[\left(\max_{j=1:m} (\xi^{j} h_{j}(y))\right)_{+}^{2}\right] \geq \left(\max_{j=1:m} (\mathbb{E}[\xi^{j}] h_{j}(y))\right)_{+}^{2} \\
&\overset{\mathbb{E}[\xi^{j}] > 0}{\geq} \min_{j=1:m} \mathbb{E}[\xi^{j}] \left(\max_{j=1:m} (h_{j}(y))\right)_{+}^{2} \overset{(5)}{\geq} \left(\min_{j=1:m} \mathbb{E}[\xi^{j}]\right) \frac{1}{\bar{c}}\, \text{dist}^{2q}(y,\mathcal{X}).
\end{aligned}
$$

Thus, we have:

$$
\text{dist}^{2q}(y,\mathcal{X}) \leq \frac{\bar{c}}{\min\limits_{j=1:m} \mathbb{E}[\xi^{j}]} \, \mathbb{E}\left[(h(y,\xi))_{+}^{2}\right], \tag{14}
$$

which proves our statement. ∎
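
A one-dimensional instance shows how the constant $c$ of Lemma 3.4 arises. The sketch below assumes a hypothetical feasible set $\mathcal{X} = [-1, 1]$ described by $h_{1}(y) = y - 1$ and $h_{2}(y) = -y - 1$, for which the growth condition, in the form it is used in the proof above, holds with $q = 1$ and $\bar{c} = 1$. The vector $\xi$ selects a single constraint with probability $1/2$ each, so $\min_{j} \mathbb{E}[\xi^{j}] = 1/2$ and $c = 2$.

```python
from fractions import Fraction

# Hypothetical toy instance: X = [-1, 1] with h_1(y) = y - 1 and h_2(y) = -y - 1.
constraints = [lambda y: y - 1, lambda y: -y - 1]
probs = [Fraction(1, 2), Fraction(1, 2)]   # S' = {j} with probability p_j, so E[xi^j] = p_j
c_bar, q = 1, 1                            # dist^2(y, X) = (max_j h_j(y))_+^2 on this instance

def dist_sq(y):
    # Squared distance from y to the interval [-1, 1].
    return max(abs(y) - 1, 0) ** 2

def expected_violation(y):
    # E[(h(y, xi))_+^2]: when S' = {j}, h(y, xi) = max(h_j(y), 0).
    return sum(p * max(h(y), 0) ** 2 for p, h in zip(probs, constraints))

c = Fraction(c_bar) / min(probs)           # constant from Lemma 3.4, here c = 2

for k in range(-12, 13):
    y = Fraction(k, 4)
    assert dist_sq(y) ** q <= c * expected_violation(y)
```

On this instance the inequality (13) holds with equality for every $y$ outside $[-1, 1]$, so the factor $1/\min_{j} \mathbb{E}[\xi^{j}]$ cannot be improved here.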

3.2 Choices for random vectors $\zeta$ and $\xi$

In this section we provide several choices for the two random vectors $\zeta$ and $\xi$. Let $\mathcal{I} \subseteq [1:N]$ and let $e_{\mathcal{I}} = \sum_{i \in \mathcal{I}} e_{i}$, where $\{e_{1}, \ldots, e_{N}\}$ is the standard basis of $\mathbb{R}^{N}$. These subsets will be selected using a random set-valued map, i.e. a sampling $S$. A sampling $S$ is uniquely characterized by choosing the probabilities $p_{\mathcal{I}} \geq 0$ for all subsets $\mathcal{I}$:

$$
\mathbb{P}[S = \mathcal{I}] = p_{\mathcal{I}} \;\; \forall \mathcal{I} \subset [1:N],
$$

such that $\sum_{\mathcal{I} \subset [1:N]} p_{\mathcal{I}} = 1$. A sampling $S$ is called proper if $p_{i} = \mathbb{P}[i \in S] = \mathbb{E}[\mathbf{1}_{i \in S}] = \sum_{\mathcal{I} : i \in \mathcal{I}} p_{\mathcal{I}}$ is positive for all $i = 1:N$, see also [29, 27]. We now define some practical sampling vectors $\zeta = \zeta(S)$. For example, let $S$ be a proper sampling and let $\hat{\mathbb{P}} = \text{Diag}(p_{1}, \ldots, p_{N})$. Then, we can consider the sampling vector as:

$$
\zeta = \hat{\mathbb{P}}^{-1} e_{S} \implies \zeta^{i} = \frac{\mathbf{1}_{i \in S}}{p_{i}}. \tag{15}
$$

Note that $\mathbb{E}[\zeta^{i}] = \frac{\mathbb{E}[\mathbf{1}_{i \in S}]}{p_{i}} = 1$ and, since $\zeta^{T}\zeta = \sum_{i=1}^{N} (\zeta^{i})^{2} = \sum_{i=1}^{N} \mathbf{1}_{i \in S} / p_{i}^{2}$, we have $\mathbb{E}[\|\zeta\|^{2}] = \sum_{i=1}^{N} 1/p_{i}$.

For constraints, if we let $\mathcal{I}' \subseteq [1:m]$ and define $e_{\mathcal{I}'} = \sum_{j \in \mathcal{I}'} e_{j}$, then a sampling $S'$ is uniquely characterized by choosing probabilities $p_{\mathcal{I}'} \geq 0$ for all subsets $\mathcal{I}'$ of $[1:m]$. Let $S'$ be a proper sampling, then we can define the practical sampling vector $\xi = \xi(S')$ as:

$$\xi=e_{S'}\implies\xi^{j}=\mathbf{1}_{j\in S'}. \tag{16}$$

Note that $\mathbb{E}[\xi^{j}]=\mathbb{E}[\mathbf{1}_{j\in S'}]=p_{j}>0$ and $\xi^{j}\leq\bar{\xi}=1$. Furthermore, the samplings $S$ and $S'$ give rise to particular sampling vectors $\zeta=\zeta(S)$ and $\xi=\xi(S')$, respectively. Below we provide some sampling examples.

Partition sampling: A partition $\mathcal{P}$ of $[1:N]$ is a set consisting of subsets of $[1:N]$ such that $\cup_{\mathcal{I}\in\mathcal{P}}\mathcal{I}=[1:N]$ and $\mathcal{I}_{i}\cap\mathcal{I}_{l}=\emptyset$ for any $\mathcal{I}_{i},\mathcal{I}_{l}\in\mathcal{P}$ with $i\neq l$. A partition sampling $S$ is a sampling such that $p_{\mathcal{I}}=\mathbb{P}[S=\mathcal{I}]>0$ for all $\mathcal{I}\in\mathcal{P}$ and $\sum_{\mathcal{I}\in\mathcal{P}}p_{\mathcal{I}}=1$.

$\tau$-nice sampling: We say that $S$ is $\tau$-nice if $S$ samples from all subsets of $[1:N]$ of cardinality $\tau$ uniformly at random. In this case we have $p_{i}=\frac{\tau}{N}$ for all $i=1:N$, and $p_{\mathcal{I}}=\mathbb{P}\left[S=\mathcal{I}\right]=1/\binom{N}{\tau}$ for all subsets $\mathcal{I}\subset\{1,\ldots,N\}$ with $\tau$ elements.
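The two samplings above can be checked numerically on a small index set. The following is a minimal sketch, assuming the sampling vector has entries $\zeta^{i}=\mathbf{1}_{i\in S}/p_{i}$ as used in the expectation computed earlier; the helper names (`tau_nice_sampling`, `partition_sampling`, `marginals`, `expected_sq_norm`) and the choice of consecutive blocks for the partition are illustrative assumptions, not part of the paper. Exact rational arithmetic is used so that the identity $\mathbb{E}[\|\zeta\|^{2}]=\sum_{i}1/p_{i}$ can be verified without rounding.

```python
from fractions import Fraction
from itertools import combinations


def tau_nice_sampling(N, tau):
    """All subsets of {0,...,N-1} with tau elements, each with probability 1/C(N, tau)."""
    subsets = [frozenset(c) for c in combinations(range(N), tau)]
    prob = Fraction(1, len(subsets))
    return [(s, prob) for s in subsets]


def partition_sampling(N, tau):
    """Consecutive blocks of size tau, chosen uniformly (assumes tau divides N)."""
    assert N % tau == 0
    blocks = [frozenset(range(s, s + tau)) for s in range(0, N, tau)]
    prob = Fraction(1, len(blocks))
    return [(b, prob) for b in blocks]


def marginals(sampling, N):
    """p_i = P[i in S] for every index i."""
    return [sum(p for s, p in sampling if i in s) for i in range(N)]


def expected_sq_norm(sampling, N):
    """E[||zeta||^2] with zeta^i = 1_{i in S} / p_i, computed by enumeration."""
    p = marginals(sampling, N)
    return sum(prob * sum(1 / p[i] ** 2 for i in s) for s, prob in sampling)


if __name__ == "__main__":
    N, tau = 6, 2
    for name, sampling in [("tau-nice", tau_nice_sampling(N, tau)),
                           ("partition", partition_sampling(N, tau))]:
        p = marginals(sampling, N)
        print(name, "p_i =", p[0], "E||zeta||^2 =", expected_sq_norm(sampling, N),
              "sum 1/p_i =", sum(1 / pi for pi in p))
```

For both samplings the marginals equal $\tau/N$, so the two printed quantities coincide with $N^{2}/\tau$.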

The reader can also consider other examples of sampling; see, e.g., [29] for more details. Let the cardinalities of the samples $S$ and $S'$ be $\tau_{1}$ and $\tau_{2}$, respectively. In the next theorem, using Lemmas 3.2, 3.3 and 3.4, we derive explicit expressions, which depend on the mini-batch sizes $\tau_{1}$ and $\tau_{2}$, for the constants $\mathcal{B},\mathcal{L},\mathcal{B}_{h}$ and $c$ appearing in our assumptions, for the two samplings given previously.

Theorem 3.5.

Let Assumptions 2.1, 2.3 and 2.4 hold. Let also $S$ and $S'$ be sampled uniformly at random with partition sampling, where all sets of the respective partitions have the same cardinality $\tau_{1}$ and $\tau_{2}$, or alternatively with $\tau_{1}$- and $\tau_{2}$-nice sampling. Then, the constants $\mathcal{B},\mathcal{L},\mathcal{B}_{h}$ and $c$ are:

$$\mathcal{B}^{2}=\frac{N}{\tau_{1}}B^{2},\quad\mathcal{L}=\frac{N}{\tau_{1}}L,\quad\mathcal{B}_{h}=\max_{j=1:m}B_{j}\quad\text{and}\quad c=\frac{\bar{c}\,m}{\tau_{2}}.$$
Proof.

From Lemma 3.2, for the parameters $\mathcal{B}$ and $\mathcal{L}$, we have:

$$\begin{aligned}
\mathcal{B}^{2}&=\frac{\mathbb{E}[\|\zeta\|^{2}]}{N}B^{2}\overset{(15)}{=}\frac{1}{N}\sum_{S}p_{S}\sum_{i\in S}\frac{1}{p_{i}^{2}}B^{2}, \qquad (17)\\
\mathcal{L}&=\frac{\mathbb{E}[\|\zeta\|^{2}]}{N}L\overset{(15)}{=}\frac{1}{N}\sum_{S}p_{S}\sum_{i\in S}\frac{1}{p_{i}^{2}}L.
\end{aligned}$$

For partition sampling, given the realization $S=\mathcal{I}$, we have $p_{i}=p_{\mathcal{I}}$ if $i\in\mathcal{I}$. Since the cardinality of each $\mathcal{I}$ is $\tau_{1}$ and the sampling $S\in\{\mathcal{I}_{1},\ldots,\mathcal{I}_{\ell}\}$ is chosen uniformly at random, we have $p_{i}=p_{\mathcal{I}}=\frac{1}{\ell}=\frac{\tau_{1}}{N}$. Thus, using (17) we have:

$$\mathcal{B}^{2}=\frac{1}{N}\sum_{\mathcal{I}\in\mathcal{P}}p_{\mathcal{I}}\sum_{i\in\mathcal{I}}\frac{1}{p_{i}^{2}}B^{2}=\frac{1}{N}\sum_{\mathcal{I}\in\mathcal{P}}\frac{\tau_{1}}{N}\,\tau_{1}\frac{N^{2}}{\tau_{1}^{2}}B^{2}=\sum_{\mathcal{I}\in\mathcal{P}}B^{2}=\ell B^{2}=\frac{N}{\tau_{1}}B^{2}.$$

Similarly, we can prove that $\mathcal{L}=\frac{N}{\tau_{1}}L$. For $\tau_{1}$-nice sampling, given the realization $S=\mathcal{I}$, we have $p_{i}=\frac{\tau_{1}}{N}$ for all $i$ and $p_{\mathcal{I}}=1/\binom{N}{\tau_{1}}$. Using (17), we get:

$$\mathcal{B}^{2}=\frac{B^{2}}{N}\sum_{\mathcal{I}}p_{\mathcal{I}}\sum_{i\in\mathcal{I}}\frac{1}{p_{i}^{2}}=\frac{B^{2}}{N}\sum_{\mathcal{I}}\frac{1}{\binom{N}{\tau_{1}}}\,\tau_{1}\frac{N^{2}}{\tau_{1}^{2}}=\frac{N}{\tau_{1}}B^{2}\sum_{\mathcal{I}}\frac{1}{\binom{N}{\tau_{1}}}=\frac{N}{\tau_{1}}B^{2}.$$

Similarly, we obtain the value of the other parameter, i.e., $\mathcal{L}=\frac{N}{\tau_{1}}L$. By Lemma 3.3, for the parameter $\mathcal{B}_{h}$, we have:

$$\mathcal{B}_{h}=\bar{\xi}\max_{j=1:m}B_{j}.$$

Using the definition of $\xi^{j}$ from (16), i.e., $\xi^{j}\leq\bar{\xi}=1$, we get:

$$\mathcal{B}_{h}=\max_{j=1:m}B_{j}.$$

Note that this bound holds for both types of sampling. Finally, from Lemma 3.4, for the parameter $c$, we have:

$$c=\frac{\bar{c}}{\min_{j=1:m}\mathbb{E}[\xi^{j}]}\overset{(16)}{=}\frac{\bar{c}}{\min_{j=1:m}p_{j}}.$$

Here we use the fact that $\mathbb{E}[\xi^{j}]=\mathbb{E}[\mathbf{1}_{j\in S'}]=p_{j}$. Now, for the given realization $S'=\mathcal{I}'$, we have $p_{j}=p_{\mathcal{I}'}=\frac{\tau_{2}}{m}$ for partition sampling and $p_{j}=\frac{\tau_{2}}{m}$ for $\tau_{2}$-nice sampling. Therefore, $c=\frac{\bar{c}\,m}{\tau_{2}}$. This proves our statements. ∎
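As a sanity check of the constants in Theorem 3.5, the computation in the proof can be traced with concrete numbers. The values $N=6$, $\tau_{1}=2$, $m=4$, $\tau_{2}=2$ below are illustrative choices made here and do not come from the paper.

```latex
% Illustrative values (assumed): N = 6, tau_1 = 2, m = 4, tau_2 = 2.
% Partition sampling: \ell = N/\tau_1 = 3 blocks, p_{\mathcal{I}} = p_i = 1/3.
\mathcal{B}^{2}
  = \frac{1}{6}\sum_{\mathcal{I}\in\mathcal{P}} \frac{1}{3}\cdot 2\cdot 9\, B^{2}
  = \frac{1}{6}\cdot 3\cdot 6\, B^{2}
  = 3B^{2} = \frac{N}{\tau_{1}}B^{2}.
% tau_1-nice sampling: \binom{6}{2} = 15 subsets, p_{\mathcal{I}} = 1/15, p_i = 1/3.
\mathcal{B}^{2}
  = \frac{1}{6}\sum_{\mathcal{I}} \frac{1}{15}\cdot 2\cdot 9\, B^{2}
  = \frac{1}{6}\cdot 15\cdot \frac{18}{15}\, B^{2}
  = 3B^{2} = \frac{N}{\tau_{1}}B^{2}.
% Constraints: p_j = \tau_2/m = 1/2 for both samplings.
c = \frac{\bar{c}}{\min_{j} p_{j}} = \frac{\bar{c}}{1/2} = 2\bar{c} = \frac{\bar{c}\,m}{\tau_{2}}.
```

In both cases the constants shrink as the mini-batch sizes grow, and for $\tau_{1}=N$, $\tau_{2}=m$ they reduce to $B^{2}$, $L$ and $\bar{c}$.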

4 Mini-batch stochastic subgradient projection algorithm

For solving the stochastic reformulation (8) of the optimization problem (1) we adapt the stochastic subgradient projection method from [17]. We refer to this algorithm as the Mini-batch Stochastic Subgradient Projection method (Mini-batch SSP).

Algorithm 1 (Mini-batch SSP):
Choose $x_{0}\in\mathcal{Y}$ and stepsizes $\alpha_{k}>0$, $\beta\in(0,2)$.
For $k\geq 0$ repeat:
    Draw sample vectors $\zeta_{k}\sim\mathbb{P}_{1}$ and $\xi_{k}\sim\mathbb{P}_{2}$ independently. (18)
    $v_{k}=\text{prox}_{\alpha_{k}g(\cdot,\zeta_{k})}\left(x_{k}-\alpha_{k}\nabla f(x_{k},\zeta_{k})\right)$ (19)
    Compute $h(v_{k},\xi_{k})=\max(\xi^{1}_{k}h_{1}(v_{k}),\ldots,\xi^{m}_{k}h_{m}(v_{k}))$
    $z_{k}=v_{k}-\beta\frac{(h(v_{k},\xi_{k}))_{+}}{\|\nabla h(v_{k},\xi_{k})\|^{2}}\nabla h(v_{k},\xi_{k})$ (20)
    $x_{k+1}=\Pi_{\mathcal{Y}}(z_{k})$.
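To make the iteration concrete, the following is a minimal pure-Python sketch of the three steps of Algorithm 1 on a toy problem. Everything specific in it is an assumption made for illustration: least-squares terms $f_{i}(x)=\tfrac12(a_{i}^{T}x-b_{i})^{2}$, $g_{i}=0$ (so the prox is the identity), linear constraints $h_{j}(x)=c_{j}^{T}x-d_{j}$ with nonzero $c_{j}$, a box $\mathcal{Y}=[-R,R]^{n}$, $\tau$-nice sampling for both mini-batches, averaging of the sampled gradients, and the stepsize $\alpha_{k}=\alpha_{0}/\sqrt{k+1}$, which is not one of the stepsize rules derived in the paper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mini_batch_ssp(A, b, C, d, radius, tau1, tau2,
                   alpha0=0.5, beta=1.0, iters=20000, seed=0):
    """Sketch of Mini-batch SSP for
         min (1/N) sum_i 0.5*(a_i.x - b_i)^2  s.t.  c_j.x - d_j <= 0,  x in [-radius, radius]^n.
    Assumes g_i = 0 and every row of C is nonzero."""
    rng = random.Random(seed)
    N, m, n = len(A), len(C), len(A[0])
    x = [0.0] * n
    for k in range(iters):
        alpha = alpha0 / (k + 1) ** 0.5          # hypothetical decreasing stepsize
        S = rng.sample(range(N), tau1)           # tau1-nice sampling of objective terms
        S_prime = rng.sample(range(m), tau2)     # tau2-nice sampling of constraints
        # Step (19): mini-batch gradient step (prox is the identity since g_i = 0).
        grad = [0.0] * n
        for i in S:
            r = dot(A[i], x) - b[i]
            for t in range(n):
                grad[t] += r * A[i][t] / tau1
        v = [x[t] - alpha * grad[t] for t in range(n)]
        # Step (20): Polyak-type step on the most violated sampled constraint.
        j_star = max(S_prime, key=lambda j: dot(C[j], v) - d[j])
        viol = max(dot(C[j_star], v) - d[j_star], 0.0)
        if viol > 0.0:
            g = C[j_star]
            scale = beta * viol / dot(g, g)
            z = [v[t] - scale * g[t] for t in range(n)]
        else:
            z = v
        # Final step: projection onto the simple set Y (a box here).
        x = [min(max(zt, -radius), radius) for zt in z]
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    b = [2.0, 2.0, 2.0, 2.0]
    C = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    d = [2.0, 5.0, 5.0]
    print(mini_batch_ssp(A, b, C, d, radius=10.0, tau1=2, tau2=2))
```

In this toy instance the unconstrained minimizer is $(2,2)$ and the constraint $x_{1}+x_{2}\leq 2$ is active at the solution $(1,1)$, so the iterates should settle near $(1,1)$ up to stochastic noise of the order of the final stepsize.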

Using the sampling paradigm in Section 3, the Mini-batch SSP algorithm can incorporate a diverse array of mini-batch variants, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. Most of our variants of Mini-batch SSP, with different mini-batch sizes for the objective function and functional constraints, have not been explicitly considered in the literature before, e.g., the variants corresponding to partition and nice samplings. Note that at each iteration our algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function (see (19)) and then a subsequent mini-batch subgradient projection step minimizing the feasibility violation (see (20)). More precisely, if the random vector $\zeta_{k}$ has $\zeta_{k}^{i}=1$ for all $i\in\mathcal{I}_{k}$ and $\zeta_{k}^{i}=0$ for all $i\in\{1,\cdots,N\}\setminus\mathcal{I}_{k}$, then step (19) is a mini-batch proximal subgradient iteration:

$$v_{k}=\text{prox}_{\alpha_{k}\sum_{i\in\mathcal{I}_{k}}g_{i}}\left(x_{k}-\alpha_{k}\sum_{i\in\mathcal{I}_{k}}\nabla f_{i}(x_{k})\right).$$
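For a concrete instance of this mini-batch proximal step, suppose (as an assumption made only for this sketch) that every $g_{i}=\lambda\|\cdot\|_{1}$. Then $\alpha_{k}\sum_{i\in\mathcal{I}_{k}}g_{i}=\alpha_{k}\lambda|\mathcal{I}_{k}|\,\|\cdot\|_{1}$ and the prox is componentwise soft-thresholding with threshold $\alpha_{k}\lambda|\mathcal{I}_{k}|$. The function names and the way gradients are passed in (a dictionary of callables) are hypothetical.

```python
def soft_threshold(u, thr):
    """Componentwise prox of thr * ||.||_1: shrink each entry toward zero by thr."""
    return [(1.0 if t > 0 else -1.0) * max(abs(t) - thr, 0.0) for t in u]


def minibatch_prox_step(x, batch, grads, lam, alpha):
    """v = prox_{alpha * sum_{i in batch} g_i}(x - alpha * sum_{i in batch} grad f_i(x)),
    assuming g_i = lam * ||.||_1 for every i (illustrative choice)."""
    n = len(x)
    total = [0.0] * n
    for i in batch:
        gi = grads[i](x)              # (sub)gradient of f_i at x
        for t in range(n):
            total[t] += gi[t]
    u = [x[t] - alpha * total[t] for t in range(n)]
    return soft_threshold(u, alpha * lam * len(batch))


if __name__ == "__main__":
    grads = {0: lambda x: [1.0, 0.0], 1: lambda x: [0.0, -1.0]}
    print(minibatch_prox_step([1.0, -1.0], [0, 1], grads, lam=0.2, alpha=0.5))
```

The threshold scales with the mini-batch size because the sum in the prox index runs over the sampled terms only.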

Similarly, if the random vector $\xi_{k}$ has $\xi_{k}^{i}=1$ for all $i\in\mathcal{I}_{k}'$ and $\xi_{k}^{i}=0$ for all $i\in\{1,\cdots,m\}\setminus\mathcal{I}_{k}'$, then step (20) minimizes the feasibility violation of the observed mini-batch of constraints, i.e., we choose from the mini-batch the constraint that is violated the most, $h(v_{k},\xi_{k})=\max_{j\in\mathcal{I}_{k}'}h_{j}(v_{k})=h_{j_{k}^{*}}(v_{k})$ for some index $j_{k}^{*}\in\mathcal{I}_{k}'$, and then perform a Polyak-type subgradient update on it [24]:

$$z_{k}=v_{k}-\beta\frac{(h(v_{k},\xi_{k}))_{+}}{\|\nabla h(v_{k},\xi_{k})\|^{2}}\nabla h(v_{k},\xi_{k})=v_{k}-\beta\frac{(h_{j_{k}^{*}}(v_{k}))_{+}}{\|\nabla h_{j_{k}^{*}}(v_{k})\|^{2}}\nabla h_{j_{k}^{*}}(v_{k}).$$

Consider an arbitrary nonzero vector $s_{h}\in\mathbb{R}^{n}$. With a slight abuse of notation, we compute the vector $\nabla h(v_{k},\xi_{k})=\nabla h_{j_{k}^{*}}(v_{k})$ by:

$$\nabla h_{j_{k}^{*}}(v_{k})=\begin{cases}\nabla h_{j_{k}^{*}}(v_{k})\in\partial h_{j_{k}^{*}}(v_{k})&\mbox{if }\;h_{j_{k}^{*}}(v_{k})>0\\ s_{h}\neq 0&\mbox{if }\;h_{j_{k}^{*}}(v_{k})\leq 0.\end{cases}$$

When $(h(v_{k},\xi_{k}))_{+}=(h_{j_{k}^{*}}(v_{k}))_{+}=0$, we have $z_{k}=v_{k}$ for any choice of $s_{h}\neq 0$. Note that in the Mini-batch SSP algorithm $\alpha_{k}>0$ and $\beta>0$ are deterministic stepsizes. Moreover, when $\beta=1$, $z_{k}$ is the projection of $v_{k}$ onto the halfspace obtained by linearizing at $v_{k}$ the functional constraint that is violated the most in the observed mini-batch of constraints given by the index set $\mathcal{I}_{k}'$:

\[
\mathcal{H}_{v_k,\xi_k} = \{z : h(v_k,\xi_k) + \nabla h(v_k,\xi_k)^T (z - v_k) \leq 0\} = \{z : h_{j_k^*}(v_k) + \nabla h_{j_k^*}(v_k)^T (z - v_k) \leq 0\},
\]

that is, we have $z_k = \Pi_{\mathcal{H}_{v_k,\xi_k}}(v_k)$ when we choose $\beta = 1$. In the next sections we analyse the convergence behaviour of the Mini-batch SSP algorithm and derive rates depending explicitly on the mini-batch sizes and on the properties of the objective function.
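The feasibility step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the update is taken to be the Polyak-type rule $z_k = v_k - \beta \frac{(h_{j_k^*}(v_k))_+}{\|\nabla h_{j_k^*}(v_k)\|^2} \nabla h_{j_k^*}(v_k)$, which is consistent with the projection interpretation for $\beta = 1$, and the function name `feasibility_step`, the list-of-pairs representation of the constraints and the two linear constraints in the example are hypothetical.

```python
import numpy as np

def feasibility_step(v, constraints, batch, beta=1.0):
    """One subgradient projection step on the most violated sampled constraint.

    constraints: list of (h_j, subgrad_h_j) pairs of callables (assumed layout).
    batch: indices of the constraints observed in the current mini-batch.
    """
    # Evaluate the sampled constraints at v and select the largest value.
    values = [constraints[j][0](v) for j in batch]
    j_star = batch[int(np.argmax(values))]
    h_val = max(values)
    if h_val <= 0.0:
        # (h)_+ = 0: the step vanishes for any nonzero s_h, hence z = v.
        return v.copy()
    grad = constraints[j_star][1](v)
    return v - beta * h_val / np.dot(grad, grad) * grad

# Illustrative data: h_1(x) = x_1 - 1 and h_2(x) = x_2 - 2.
constraints = [
    (lambda x: x[0] - 1.0, lambda x: np.array([1.0, 0.0])),
    (lambda x: x[1] - 2.0, lambda x: np.array([0.0, 1.0])),
]
z = feasibility_step(np.array([3.0, 3.0]), constraints, batch=[0, 1], beta=1.0)
# With beta = 1 the most violated constraint h_1 becomes active at z.
```

Choosing $\beta \neq 1$ in $(0,2)$ gives an under- or over-relaxed step along the same direction.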

4.1 Convergence analysis: convex objective function

In this section we assume that the functions $f_i, g_i$ and $h_j$ in problem (1) are convex and that the random vectors $\zeta$ and $\xi$ are nonnegative. Let us define the filtration as the sigma algebra generated by the history of the random vectors $\zeta$ and $\xi$:

\[
\mathcal{F}_{[k]} = \sigma(\{\zeta_t, \xi_t : \; 0 \leq t \leq k\}).
\]

The next lemma, whose proof is similar to that of Lemma 5 in [17], provides a key descent property for the sequence $v_k$ (recall that $\bar{v}_k = \Pi_{\mathcal{X}^*}(v_k)$ and $\bar{x}_k = \Pi_{\mathcal{X}^*}(x_k)$).

Lemma 4.1.

Let $f_i$ and $g_i$, with $i = 1:N$, be convex functions and $\zeta \geq 0$. Additionally, let the bounded gradient condition from Assumption 2.1 hold. Then, for any $k \geq 0$ and stepsize $\alpha_k > 0$, we have the following recursion:

\[
\mathbb{E}[\|v_k - \bar{v}_k\|^2] \leq \mathbb{E}[\|x_k - \bar{x}_k\|^2] - \alpha_k(2 - \alpha_k \mathcal{L}) \, \mathbb{E}[F(x_k) - F(\bar{x}_k)] + \alpha_k^2 \mathcal{B}^2, \tag{21}
\]

with $\mathcal{B}$ and $\mathcal{L}$ given in Lemma 3.2.

The following lemma establishes a relation between $x_k$ and $v_{k-1}$. The proof is similar to that of Lemma 6 in [17].

Lemma 4.2.

Let $h_j$, with $j = 1:m$, be convex functions and $\xi \geq 0$. Additionally, assume that the bounded subgradient condition from Assumption 2.3 holds. Then, for any $y \in \mathcal{Y}$ such that $(h(y,\xi_{k-1}))_+ = 0$, the following relation holds:

\[
\|x_k - y\|^2 \leq \|v_{k-1} - y\|^2 - \beta(2-\beta) \left[ \frac{(h(v_{k-1},\xi_{k-1}))_+^2}{\mathcal{B}_h^2} \right], \tag{22}
\]

with $\mathcal{B}_h$ given in Lemma 3.3.
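As a sanity check, relation (22) can be tested numerically on a toy instance. The sketch below makes several simplifying assumptions that are not part of the lemma: a single linear constraint $h(x) = a^T x - b$ with illustrative data, $\mathcal{Y} = \mathbb{R}^n$ (so that no projection onto $\mathcal{Y}$ is needed and $x_k$ coincides with the output of the feasibility step), the Polyak-type update for that step, and $\mathcal{B}_h = \|a\|$. The helper name `check_relation_22` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = np.array([3.0, 4.0]), 1.0      # h(x) = a^T x - b (illustrative data)
B_h = np.linalg.norm(a)               # subgradient norm bound, here exactly ||a||

def check_relation_22(v, y, beta):
    """Return True if (22) holds for the toy linear constraint."""
    h_plus = max(a @ v - b, 0.0)
    # Feasibility step; with Y = R^n this is x_k itself (assumption).
    x = v - beta * h_plus / (a @ a) * a
    lhs = np.sum((x - y) ** 2)
    rhs = np.sum((v - y) ** 2) - beta * (2.0 - beta) * h_plus**2 / B_h**2
    return bool(lhs <= rhs + 1e-9)    # small tolerance for rounding errors

results = []
for _ in range(1000):
    v = 5.0 * rng.normal(size=2)
    y = 5.0 * rng.normal(size=2)
    # Make y feasible by projecting it onto {x : a^T x <= b} when needed.
    y = y - max(a @ y - b, 0.0) / (a @ a) * a
    beta = rng.uniform(0.0, 2.0)
    results.append(check_relation_22(v, y, beta))

all_ok = all(results)
```

The check relies only on $a^T(v - y) \geq h(v)$ for feasible $y$, which is the convexity inequality specialized to a linear $h$.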

Taking now $y = \Pi_{\mathcal{X}}(v_{k-1}) \in \mathcal{X} \subseteq \mathcal{Y}$, we have $(h(\Pi_{\mathcal{X}}(v_{k-1}),\xi_{k-1}))_+ = 0$ and

\begin{align*}
\text{dist}^2(x_k,\mathcal{X}) &= \|x_k - \Pi_{\mathcal{X}}(x_k)\|^2 \leq \|x_k - \Pi_{\mathcal{X}}(v_{k-1})\|^2 \\
&\overset{(22)}{\leq} \text{dist}^2(v_{k-1},\mathcal{X}) - \beta(2-\beta) \frac{(h(v_{k-1},\xi_{k-1}))_+^2}{\mathcal{B}_h^2} \\
&\leq \text{dist}^2(v_{k-1},\mathcal{X}).
\end{align*}

Thus, for any $q \geq 1$, we have:

\[
\text{dist}^{2q}(x_k,\mathcal{X}) \leq \text{dist}^{2q}(v_{k-1},\mathcal{X}). \tag{23}
\]

Lemma 4.3.

Let Assumptions 2.3 and 2.4 hold and the random vectors $\xi$ and $\zeta$ be nonnegative. Then, the following relation is valid:

\[
\mathbb{E}[\|x_k - \bar{x}_k\|^2] \leq \mathbb{E}[\|v_{k-1} - \bar{v}_{k-1}\|^2] - \frac{\beta(2-\beta)}{c\mathcal{B}_h^2} \mathbb{E}\left[\text{dist}^{2q}(x_k,\mathcal{X})\right],
\]

with $\mathcal{B}_h$ and $c$ given in Lemmas 3.3 and 3.4, respectively.

Proof.

Note that for $\bar{v}_{k-1} \in \mathcal{X}^* \subseteq \mathcal{X} \subseteq \mathcal{Y}$ we have $(h(\bar{v}_{k-1},\xi_{k-1}))_+ = 0$. Using Lemma 4.2 with $y = \bar{v}_{k-1}$, we get:

\[
\|x_k - \bar{x}_k\|^2 \leq \|x_k - \bar{v}_{k-1}\|^2 \leq \|v_{k-1} - \bar{v}_{k-1}\|^2 - \beta(2-\beta) \left[ \frac{(h(v_{k-1},\xi_{k-1}))_+^2}{\mathcal{B}_h^2} \right].
\]

Taking the conditional expectation with respect to $\xi_{k-1}$ given $\mathcal{F}_{[k-2]}$, we get:

\begin{align*}
\mathbb{E}_{\xi_{k-1}}[\|x_k - \bar{x}_k\|^2 \,|\, \mathcal{F}_{[k-2]}] &\leq \|v_{k-1} - \bar{v}_{k-1}\|^2 - \beta(2-\beta) \, \mathbb{E}_{\xi_{k-1}}\left[ \frac{(h(v_{k-1},\xi_{k-1}))_+^2}{\mathcal{B}_h^2} \,\Big|\, \mathcal{F}_{[k-2]} \right] \\
&\overset{(13)}{\leq} \|v_{k-1} - \bar{v}_{k-1}\|^2 - \frac{\beta(2-\beta)}{c\mathcal{B}_h^2} \text{dist}^{2q}(v_{k-1},\mathcal{X}) \\
&\overset{(23)}{\leq} \|v_{k-1} - \bar{v}_{k-1}\|^2 - \frac{\beta(2-\beta)}{c\mathcal{B}_h^2} \text{dist}^{2q}(x_k,\mathcal{X}).
\end{align*}

Taking now the full expectation, we obtain our statement. ∎

For simplicity of the exposition let us introduce the following constant:

\[
C_{\beta,c,\mathcal{B}_h} := \frac{\beta(2-\beta)}{c\mathcal{B}_h^2} > 0. \tag{24}
\]

We impose the following conditions on the stepsize $\alpha_k$:

\[
0 < \alpha_k \leq \alpha_k(2 - \alpha_k\mathcal{L}) < 1 \;\; \iff \;\; \alpha_k \in \begin{cases} \left(0, \frac{1}{2}\right) & \text{if} \; \mathcal{L} = 0 \\ \left(0, \frac{1 - \sqrt{(1-\mathcal{L})_+}}{\mathcal{L}}\right) & \text{if} \; \mathcal{L} > 0. \end{cases} \tag{25}
\]
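To make condition (25) concrete, the following Python sketch computes the open upper end of the admissible stepsize interval and checks the chain of inequalities directly. The function names `stepsize_upper_bound` and `satisfies_25` are hypothetical helpers introduced only for illustration, and the sample values of $\mathcal{L}$ in the comments are arbitrary.

```python
import math

def stepsize_upper_bound(L):
    """Open upper end of the admissible interval for alpha_k in (25)."""
    if L < 0:
        raise ValueError("L must be nonnegative")
    if L == 0:
        return 0.5
    # (1 - L)_+ is the positive part, so the bound reduces to 1/L when L >= 1.
    return (1.0 - math.sqrt(max(1.0 - L, 0.0))) / L

def satisfies_25(alpha, L):
    """Check 0 < alpha <= alpha * (2 - alpha * L) < 1 directly."""
    w = alpha * (2.0 - alpha * L)
    return 0.0 < alpha <= w < 1.0

# Arbitrary sample values: L = 0.75 gives the bound 2/3, L = 4 gives 1/4.
bound_small_L = stepsize_upper_bound(0.75)
bound_large_L = stepsize_upper_bound(4.0)
```

Any stepsize strictly below the returned bound passes `satisfies_25`, while a stepsize at or above it fails the last inequality when $\mathcal{L} \leq 1$ and the middle one when $\mathcal{L} > 1$.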

Then, we can define the following average sequence generated by the SSP algorithm:

\[
\hat{x}_k = \frac{\sum_{j=1}^{k} \alpha_j(2 - \alpha_j\mathcal{L}) x_j}{S_k}, \quad \text{where} \; S_k = \sum_{j=1}^{k} \alpha_j(2 - \alpha_j\mathcal{L}).
\]
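The weighted average above is straightforward to compute once the iterates and stepsizes are stored. The following is a minimal sketch; the function name `weighted_average`, the row-wise storage of the iterates $x_1, \dots, x_k$ and the numerical values in the example are assumptions made for illustration.

```python
import numpy as np

def weighted_average(iterates, alphas, L):
    """Return (x_hat_k, S_k) with weights alpha_j * (2 - alpha_j * L)."""
    x = np.asarray(iterates, dtype=float)   # shape (k, n), rows are x_1..x_k
    a = np.asarray(alphas, dtype=float)     # shape (k,)
    w = a * (2.0 - a * L)                   # per-iterate weights
    S_k = w.sum()
    x_hat = (w[:, None] * x).sum(axis=0) / S_k
    return x_hat, S_k

# Illustrative data: two iterates in R^2 with stepsizes 0.4 and 0.2, L = 0.5.
x_hat, S_k = weighted_average([[1.0, 0.0], [0.0, 1.0]], [0.4, 0.2], L=0.5)
```

Under (25) all weights are positive, so $\hat{x}_k$ is a convex combination of the iterates.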

Note that this type of average sequence is also considered in [5] for unconstrained stochastic optimization problems. The next theorem derives sublinear convergence rates for the average sequence $\hat{x}_k$.

Theorem 4.4.

Let $f_i$, $g_i$, with $i = 1:N$, and $h_j$, with $j = 1:m$, be convex functions. Additionally, let Assumptions 2.1, 2.3 and 2.4 hold and the random vectors $\zeta, \; \xi$ be nonnegative. Further, consider a nonincreasing positive stepsize sequence $\alpha_k$ as in (25), satisfying $\sum_{k \geq 0} \alpha_k = \infty$ and $\sum_{k \geq 0} \alpha_k^2 < \infty$, and stepsize $\beta \in (0,2)$. Then, we have the following convergence rates for the average sequence $\hat{x}_k$ in terms of optimality and feasibility violation for problem (1):

\begin{align*}
&\mathbb{E}\left[F(\hat{x}_k) - F^*\right] \leq \frac{\|v_0 - \bar{v}_0\|^2}{S_k} + \frac{\mathcal{B}^2 \sum_{t=1}^{k} \alpha_t^2}{S_k}, \\
&\mathbb{E}\left[\text{dist}^2(\hat{x}_k,\mathcal{X})\right] \leq \left( \frac{1}{C_{\beta,c,\mathcal{B}_h} \cdot S_k} \right)^{\frac{1}{q}} \left[ \|v_0 - \bar{v}_0\|^{\frac{2}{q}} + \mathcal{B}^{\frac{2}{q}} \sum_{t=1}^{k} \alpha_t^{\frac{2}{q}} \right].
\end{align*}
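To get a feeling for how the two bounds of Theorem 4.4 behave, one can evaluate their right-hand sides numerically. The sketch below assumes a stepsize of the form $\alpha_t = \alpha_0 / t^{\gamma}$ with $\gamma \in (1/2, 1]$, which is nonincreasing and satisfies both summability conditions; this choice, the function name `rate_bounds` and all numerical constants are illustrative assumptions, and `R0_sq` stands for $\|v_0 - \bar{v}_0\|^2$.

```python
import numpy as np

def rate_bounds(k, alpha0, gamma, L, B, R0_sq, C, q):
    """Right-hand sides of the optimality and feasibility bounds at iteration k."""
    t = np.arange(1, k + 1)
    alpha = alpha0 / t**gamma                       # assumed stepsize schedule
    S_k = np.sum(alpha * (2.0 - alpha * L))
    opt = (R0_sq + B**2 * np.sum(alpha**2)) / S_k
    feas = (1.0 / (C * S_k)) ** (1.0 / q) * (
        R0_sq ** (1.0 / q) + B ** (2.0 / q) * np.sum(alpha ** (2.0 / q))
    )
    return opt, feas

# Illustrative constants; alpha0 = 0.4 is admissible in (25) for L = 1.
opt_100, feas_100 = rate_bounds(100, 0.4, 0.75, 1.0, 1.0, 1.0, 1.0, 1)
opt_10000, feas_10000 = rate_bounds(10000, 0.4, 0.75, 1.0, 1.0, 1.0, 1.0, 1)
```

For this schedule $S_k$ grows without bound while $\sum_t \alpha_t^2$ stays bounded, so the optimality bound decreases as $k$ grows.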
Proof.

Combining Lemma 4.3 with Lemma 4.1, we have:

\begin{align*}
&\mathbb{E}\left[\|v_k - \bar{v}_k\|^2\right] + \frac{\beta(2-\beta)}{c\mathcal{B}_h^2} \mathbb{E}[\text{dist}^{2q}(x_k,\mathcal{X})] + \alpha_k(2 - \alpha_k\mathcal{L}) \mathbb{E}\left[F(x_k) - F(\bar{x}_k)\right] \\
&\leq \mathbb{E}[\|v_{k-1} - \bar{v}_{k-1}\|^2] + \alpha_k^2 \mathcal{B}^2.
\end{align*}

Since $\alpha_k(2 - \alpha_k\mathcal{L}) < 1$, this yields:

\begin{align*}
&\mathbb{E}\left[\|v_k - \bar{v}_k\|^2\right] + C_{\beta,c,\mathcal{B}_h} \alpha_k(2 - \alpha_k\mathcal{L}) \mathbb{E}[\text{dist}^{2q}(x_k,\mathcal{X})] + \alpha_k(2 - \alpha_k\mathcal{L}) \mathbb{E}\left[F(x_k) - F(\bar{x}_k)\right] \\
&\leq \mathbb{E}[\|v_{k-1} - \bar{v}_{k-1}\|^2] + \alpha_k^2 \mathcal{B}^2.
\end{align*}

Summing this relation over $t = 1:k$, we get:

\begin{align*}
&\mathbb{E}\left[\|v_k - \bar{v}_k\|^2\right] + C_{\beta,c,\mathcal{B}_h} \sum_{t=1}^{k} \alpha_t(2 - \alpha_t\mathcal{L}) \mathbb{E}\left[\text{dist}^{2q}(x_t,\mathcal{X})\right] \\
&\quad + \sum_{t=1}^{k} \alpha_t(2 - \alpha_t\mathcal{L}) \mathbb{E}\left[F(x_t) - F^*\right] \leq \|v_0 - \bar{v}_0\|^2 + \mathcal{B}^2 \sum_{t=1}^{k} \alpha_t^2.
\end{align*}

From the definition of the average sequence $\hat{x}_{k}$ and the convexity of $F$ and of $\text{dist}^{2}(\cdot,\mathcal{X})$, we get a sublinear rate in expectation for the average sequence in terms of optimality:

\[
\mathbb{E}\left[F(\hat{x}_{k})-F^{*}\right]\leq\sum_{t=1}^{k}\frac{\alpha_{t}(2-\alpha_{t}\mathcal{L})}{S_{k}}\,\mathbb{E}\left[F(x_{t})-F^{*}\right]\leq\frac{\|v_{0}-\bar{v}_{0}\|^{2}}{S_{k}}+\mathcal{B}^{2}\frac{\sum_{t=1}^{k}\alpha_{t}^{2}}{S_{k}}.
\]

Also, by using Jensen's inequality and $q\geq 1$, we have:

\begin{align*}
&C_{\beta,c,\mathcal{B}_{h}}\left(\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right]\right)^{q}\leq C_{\beta,c,\mathcal{B}_{h}}\mathbb{E}\left[\text{dist}^{2q}(\hat{x}_{k},\mathcal{X})\right] \\
&\quad\leq C_{\beta,c,\mathcal{B}_{h}}\sum_{t=1}^{k}\frac{\alpha_{t}(2-\alpha_{t}\mathcal{L})}{S_{k}}\,\mathbb{E}\left[\text{dist}^{2q}(x_{t},\mathcal{X})\right]\leq\frac{\|v_{0}-\bar{v}_{0}\|^{2}}{S_{k}}+\mathcal{B}^{2}\frac{\sum_{t=1}^{k}\alpha_{t}^{2}}{S_{k}}.
\end{align*}

These bounds prove our statements. ∎
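The averaging step used in the proof can be illustrated numerically. The sketch below assumes, as the weights in the bounds suggest, that $\hat{x}_{k}$ is the weighted average of $x_{1},\dots,x_{k}$ with weights $\alpha_{t}(2-\alpha_{t}\mathcal{L})/S_{k}$, where $S_{k}=\sum_{t=1}^{k}\alpha_{t}(2-\alpha_{t}\mathcal{L})$. The function name `weighted_average`, the argument `L_cal` (standing for $\mathcal{L}$), the random iterates and the toy objective are illustrative assumptions, not part of the algorithm's definition.

```python
import numpy as np

def weighted_average(iterates, alphas, L_cal):
    """Average x_1..x_k with weights alpha_t * (2 - alpha_t * L_cal) / S_k.

    Assumed form of the average sequence; L_cal stands for the constant
    written as calligraphic L in the text.
    """
    iterates = np.asarray(iterates, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    w = alphas * (2.0 - alphas * L_cal)          # unnormalized weights
    if np.any(w <= 0.0):
        raise ValueError("weights must be positive; need alpha_t * L_cal < 2")
    S_k = w.sum()
    return (w[:, None] * iterates).sum(axis=0) / S_k, w / S_k

# Toy check of the Jensen step F(x_hat) <= sum_t w_t F(x_t) for a convex F.
rng = np.random.default_rng(0)
xs = rng.normal(size=(50, 3))                     # stand-in iterates, not real output
alphas = 0.4 / np.sqrt(np.arange(1, 51) + 1.0)    # alpha_t = alpha_0 / (t + 1)^(1/2)
x_hat, weights = weighted_average(xs, alphas, L_cal=1.0)

F = lambda x: np.sum(np.abs(x), axis=-1)          # convex, nonsmooth toy objective
assert F(x_hat) <= weights @ F(xs) + 1e-12
```

Since the normalized weights are nonnegative and sum to one, the same inequality holds for any convex function, which is the property the proof applies to $F$ and to the distance function.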

For the stepsize $\alpha_{k}=\frac{\alpha_{0}}{(k+1)^{\gamma}}$, with $\gamma\in[1/2,1)$ and $\alpha_{0}$ satisfying (25), we have:

\[
\frac{1}{\alpha_{0}}S_{k}\overset{(25)}{\geq}\frac{1}{\alpha_{0}}\sum_{t=1}^{k}\alpha_{t}\geq\mathcal{O}(k^{1-\gamma})\quad\text{and}\quad\frac{1}{\alpha_{0}^{2}}\sum_{t=1}^{k}\alpha_{t}^{2}\leq\begin{cases}\mathcal{O}(1) & \text{if}\;\gamma>1/2\\ \mathcal{O}(\ln(k)) & \text{if}\;\gamma=1/2.\end{cases}
\]

Consequently, for $\gamma\in(1/2,1)$ we obtain from Theorem 4.4 the following sublinear convergence rates:

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\frac{\|v_{0}-\bar{v}_{0}\|^{2}}{\alpha_{0}\mathcal{O}(k^{1-\gamma})}+\frac{\alpha_{0}\mathcal{B}^{2}\mathcal{O}(1)}{\mathcal{O}(k^{1-\gamma})}, \tag{26}\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\frac{1}{C_{\beta,c,\mathcal{B}_{h}}\cdot\alpha_{0}\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}+\left(\alpha_{0}^{2}\mathcal{B}^{2}\mathcal{O}(1)\right)^{\frac{1}{q}}\right].
\end{align*}
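For completeness, the feasibility bound above can be obtained from the Jensen-type estimate in the proof together with the elementary inequality $(x+y)^{1/q}\leq x^{1/q}+y^{1/q}$, valid for $x,y\geq 0$ and $q\geq 1$. A sketch of the intermediate step, using only the stepsize estimates stated before (26):

\begin{align*}
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right]
&\leq\left(\frac{\|v_{0}-\bar{v}_{0}\|^{2}+\mathcal{B}^{2}\sum_{t=1}^{k}\alpha_{t}^{2}}{C_{\beta,c,\mathcal{B}_{h}}S_{k}}\right)^{\frac{1}{q}}
\leq\left(\frac{\|v_{0}-\bar{v}_{0}\|^{2}+\alpha_{0}^{2}\mathcal{B}^{2}\mathcal{O}(1)}{C_{\beta,c,\mathcal{B}_{h}}\cdot\alpha_{0}\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}} \\
&\leq\left(\frac{1}{C_{\beta,c,\mathcal{B}_{h}}\cdot\alpha_{0}\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[\left(\|v_{0}-\bar{v}_{0}\|^{2}\right)^{\frac{1}{q}}+\left(\alpha_{0}^{2}\mathcal{B}^{2}\mathcal{O}(1)\right)^{\frac{1}{q}}\right].
\end{align*}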

For the particular choice $\gamma=1/2$ we can perform the same analysis as before and obtain similar convergence bounds (by replacing $\mathcal{O}(1)$ with $\mathcal{O}(\ln(k))$). Now, if we neglect the logarithmic terms, we get exactly the same rates as in (26), but replacing $k^{1-\gamma}$ with $k^{1/2}$. Hence, we omit the details for this case.
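The two stepsize estimates used above can be checked numerically. The following sketch evaluates $\sum_{t=1}^{k}\alpha_{t}$ and $\sum_{t=1}^{k}\alpha_{t}^{2}$ for $\alpha_{t}=\alpha_{0}/(t+1)^{\gamma}$ and prints them rescaled by $k^{1-\gamma}$ and $\ln(k)$, respectively. The helper name `stepsize_sums` and the chosen values of $k$, $\gamma$ and $\alpha_{0}$ are illustrative assumptions.

```python
import numpy as np

def stepsize_sums(k, gamma, alpha0=1.0):
    """Return (sum_t alpha_t, sum_t alpha_t^2) for alpha_t = alpha0 / (t + 1)^gamma, t = 1..k."""
    t = np.arange(1, k + 1, dtype=float)
    alpha = alpha0 / (t + 1.0) ** gamma
    return alpha.sum(), (alpha ** 2).sum()

for gamma in (0.5, 0.75):
    for k in (10**2, 10**4, 10**6):
        s1, s2 = stepsize_sums(k, gamma)
        print(
            f"gamma={gamma}, k={k}: "
            f"sum/k^(1-gamma)={s1 / k ** (1.0 - gamma):.3f}, "
            f"sum_sq={s2:.3f}, sum_sq/ln(k)={s2 / np.log(k):.3f}"
        )
```

For $\gamma>1/2$ the sum of squares should stay bounded as $k$ grows, while for $\gamma=1/2$ it should grow like $\ln(k)$; in both cases the first ratio should stay bounded away from zero.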

Minimizing the right-hand side of the bound for optimality in (26) w.r.t. $\alpha_{0}$, we get an optimal choice for the initial stepsize, i.e., $\alpha_{0}^{*}=\frac{\|v_{0}-\bar{v}_{0}\|}{\mathcal{B}}$. Since $\alpha_{0}$ must be in $\left(0,\min\left(\frac{1}{2},\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}\right)\right)$, we consider $\alpha^{*}_{0}=\min\left(\frac{\|v_{0}-\bar{v}_{0}\|}{\mathcal{B}},\min\left(\frac{1}{2},\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}\right)-\delta\right)$ for some $\delta\in(0,\frac{1}{2})$. We distinguish two cases:

Case 1: If $\alpha_{0}^{*}=\frac{\mathcal{R}_{0}}{\mathcal{B}}\leq\min\left(\frac{1}{2},\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}\right)-\delta$, where $\mathcal{R}_{0}$ is an estimate of $\|v_{0}-\bar{v}_{0}\|$, then the expressions for the rates from (26) are (after ignoring $\mathcal{O}(1)/\mathcal{O}(\ln(k))$ terms):

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\frac{\mathcal{B}\|v_{0}-\bar{v}_{0}\|^{2}}{\mathcal{R}_{0}\mathcal{O}(k^{1-\gamma})}+\frac{\mathcal{R}_{0}\mathcal{B}}{\mathcal{O}(k^{1-\gamma})},\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\frac{\mathcal{B}}{C_{\beta,c,\mathcal{B}_{h}}\mathcal{R}_{0}\cdot\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}+(\mathcal{R}_{0})^{\frac{2}{q}}\right].
\end{align*}

Using the definition of $C_{\beta,c,\mathcal{B}_{h}}$ and replacing the values for $\mathcal{L}$, $\mathcal{B}$, $\mathcal{B}_{h}$ and $c$ from Theorem 3.5 for both types of samplings, i.e., partition or $\tau_{1}$-, $\tau_{2}$-nice samplings, we get:

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\sqrt{\frac{N}{\tau_{1}}}\frac{B}{\mathcal{O}(k^{1-\gamma})}\left(\frac{\|v_{0}-\bar{v}_{0}\|^{2}}{\mathcal{R}_{0}}+\mathcal{R}_{0}\right),\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\sqrt{\frac{N}{\tau_{1}}}\frac{Bm\bar{c}\max_{j=1:m}^{2}B_{j}}{\tau_{2}\cdot\beta(2-\beta)\mathcal{R}_{0}\cdot\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}+(\mathcal{R}_{0})^{\frac{2}{q}}\right].
\end{align*}
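The rule for the initial stepsize, and the test that separates Case 1 from Case 2, can be sketched in a few lines. The unconstrained minimizer of $\frac{\mathcal{R}_{0}^{2}}{\alpha_{0}}+\alpha_{0}\mathcal{B}^{2}$ over $\alpha_{0}>0$ is $\mathcal{R}_{0}/\mathcal{B}$, and it is then capped at $\min\left(\frac{1}{2},\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}\right)-\delta$. The function names `stepsize_cap` and `initial_stepsize`, the arguments `R0`, `B_cal`, `L_cal`, the treatment of $\mathcal{L}=0$ through the limit value $1/2$, and the guard against a nonpositive cap are assumptions made for this illustration.

```python
import math

def stepsize_cap(L_cal):
    """min(1/2, (1 - sqrt((1 - L_cal)_+)) / L_cal).

    For L_cal == 0 the second term is replaced by its limit 1/2 (an assumption
    of this sketch, since the ratio is undefined there).
    """
    if L_cal <= 0.0:
        return 0.5
    return min(0.5, (1.0 - math.sqrt(max(1.0 - L_cal, 0.0))) / L_cal)

def initial_stepsize(R0, B_cal, L_cal, delta):
    """Return (alpha0_star, case) with case 1 or 2 as in the text.

    R0 is an estimate of ||v_0 - v_bar_0||; B_cal and L_cal stand for the
    calligraphic constants B and L.
    """
    if not (0.0 < delta < 0.5):
        raise ValueError("delta must lie in (0, 1/2)")
    cap = stepsize_cap(L_cal) - delta
    if cap <= 0.0:
        # Guard added for this sketch: delta is too large for the given L_cal.
        raise ValueError("delta leaves no admissible stepsize")
    unconstrained = R0 / B_cal        # minimizer of R0^2 / a + a * B_cal^2
    if unconstrained <= cap:
        return unconstrained, 1       # Case 1
    return cap, 2                     # Case 2

print(initial_stepsize(R0=1.0, B_cal=10.0, L_cal=1.0, delta=0.1))
print(initial_stepsize(R0=10.0, B_cal=1.0, L_cal=4.0, delta=0.05))
```

With such a helper, a large $\mathcal{B}$ relative to $\mathcal{R}_{0}$ lands in Case 1, while a large initial distance or a large $\mathcal{L}$ pushes the choice to the cap, which is Case 2.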

Case 2: If $\alpha_{0}^{*}=\min\left(\frac{1}{2},\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}\right)-\delta<\frac{\|v_{0}-\bar{v}_{0}\|}{\mathcal{B}}$, for some $\delta\in(0,1/2)$. Then, the expressions for the rates from (26) are (after ignoring $\mathcal{O}(1)/\mathcal{O}(\ln(k))$ terms):

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\frac{\|v_{0}-\bar{v}_{0}\|^{2}}{\alpha_{0}^{*}\cdot\mathcal{O}(k^{1-\gamma})}+\frac{\alpha_{0}^{*}\cdot\mathcal{B}^{2}}{\mathcal{O}(k^{1-\gamma})}\leq\frac{2\|v_{0}-\bar{v}_{0}\|^{2}}{\alpha_{0}^{*}\cdot\mathcal{O}(k^{1-\gamma})}, \tag{27}\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\frac{1}{C_{\beta,c,\mathcal{B}_{h}}\alpha_{0}^{*}\cdot\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}+\left((\alpha_{0}^{*})^{2}\mathcal{B}^{2}\right)^{\frac{1}{q}}\right] \\
&\leq\left(\frac{1}{C_{\beta,c,\mathcal{B}_{h}}\alpha_{0}^{*}\cdot\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[2\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}\right]. \tag{28}
\end{align*}
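The last inequality in each of (27) and (28) follows directly from the Case 2 condition $\alpha_{0}^{*}<\frac{\|v_{0}-\bar{v}_{0}\|}{\mathcal{B}}$. A short sketch of this step:

\begin{align*}
\alpha_{0}^{*}\mathcal{B}<\|v_{0}-\bar{v}_{0}\|
\;&\Longrightarrow\;(\alpha_{0}^{*})^{2}\mathcal{B}^{2}<\|v_{0}-\bar{v}_{0}\|^{2}
\;\Longrightarrow\;\alpha_{0}^{*}\mathcal{B}^{2}<\frac{\|v_{0}-\bar{v}_{0}\|^{2}}{\alpha_{0}^{*}}, \\
(\alpha_{0}^{*})^{2}\mathcal{B}^{2}<\|v_{0}-\bar{v}_{0}\|^{2}
\;&\Longrightarrow\;\left((\alpha_{0}^{*})^{2}\mathcal{B}^{2}\right)^{\frac{1}{q}}<\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}.
\end{align*}

The first line bounds the second term in (27) by the first one, and the second line does the same inside the bracket in (28).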

Consider the case when $\alpha^{*}_{0}=\frac{1}{2}-\delta$. From (27) and (28), we have:

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\frac{4\|v_{0}-\bar{v}_{0}\|^{2}}{(1-2\delta)\mathcal{O}(k^{1-\gamma})},\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\frac{2}{C_{\beta,c,\mathcal{B}_{h}}(1-2\delta)\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[2\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}\right].
\end{align*}

Using the definition of $C_{\beta,c,\mathcal{B}_{h}}$ and the expressions for $\mathcal{B}_{h}$ and $c$ from Theorem 3.5 for the partition or $\tau_{1}$-, $\tau_{2}$-nice samplings, we get:

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\frac{4\|v_{0}-\bar{v}_{0}\|^{2}}{(1-2\delta)\mathcal{O}(k^{1-\gamma})},\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\frac{2m\bar{c}\max_{j=1:m}^{2}B_{j}}{\tau_{2}\cdot\beta(2-\beta)\cdot(1-2\delta)\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[2\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}\right].
\end{align*}

When $\alpha^{*}_{0}=\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}-\delta$, from (27) and (28) we have:

\begin{align*}
\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right] &\leq\frac{2\mathcal{L}\|v_{0}-\bar{v}_{0}\|^{2}}{\left(1-\sqrt{(1-\mathcal{L})_{+}}-\delta\mathcal{L}\right)\mathcal{O}(k^{1-\gamma})},\\
\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right] &\leq\left(\frac{2\mathcal{L}}{C_{\beta,c,\mathcal{B}_{h}}\left(1-\sqrt{(1-\mathcal{L})_{+}}-\delta\mathcal{L}\right)\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[2\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}\right].
\end{align*}

Using the definition of $C_{\beta,c,\mathcal{B}_{h}}$ and the expressions for $\mathcal{L}$, $\mathcal{B}$, $\mathcal{B}_{h}$ and $c$ from Theorem 3.5 for the partition or $\tau_{1}$-, $\tau_{2}$-nice samplings, we get:

$$\mathbb{E}\left[(F(\hat{x}_{k})-F^{*})\right]\leq\frac{N}{\tau_{1}}\frac{2L\|v_{0}-\bar{v}_{0}\|^{2}}{\left(1-\sqrt{(1-\frac{N}{\tau_{1}}L)_{+}}-\delta\frac{N}{\tau_{1}}L\right)\cdot\mathcal{O}(k^{1-\gamma})},$$
$$\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right]\leq\left(\frac{N}{\tau_{1}}\frac{m}{\tau_{2}}\frac{L\bar{c}\max_{j=1:m}^{2}B_{j}}{\beta(2-\beta)\left(1-\sqrt{(1-\frac{N}{\tau_{1}}L)_{+}}-\delta\frac{N}{\tau_{1}}L\right)\cdot\mathcal{O}(k^{1-\gamma})}\right)^{\frac{1}{q}}\left[2\|v_{0}-\bar{v}_{0}\|^{\frac{2}{q}}\right].$$

Note that for the initial stepsize choices $\alpha_{0}^{*}=\frac{\mathcal{R}_{0}}{\mathcal{B}}$ or $\alpha_{0}^{*}=\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}-\delta$ and for the two particular choices of the sampling (partition or nice samplings), we obtain convergence rates depending explicitly on the mini-batch sizes $\tau_{1}$ and $\tau_{2}$, namely $\left(\sqrt{\frac{N}{\tau_{1}}},\left(\sqrt{\frac{N}{\tau_{1}}}\frac{m}{\tau_{2}}\right)^{1/q}\right)$ or $\left(\frac{N}{\tau_{1}},\left(\frac{N}{\tau_{1}}\frac{m}{\tau_{2}}\right)^{1/q}\right)$, respectively. Hence, in these settings we have linear dependence on the mini-batch sizes $(\tau_{1},\tau_{2})$ for algorithm Mini-batch SSP.

Furthermore, in the convex case we can consider a stepsize sequence $\alpha_{k}=\frac{\alpha_{0}}{(k+1)^{\gamma}}$. For $\alpha_{0}=\frac{\mathcal{R}_{0}}{\mathcal{B}}$ or $\alpha_{0}=\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}-\delta$, the stepsize sequence $\alpha_{k}$ then also depends linearly on the mini-batch size $\tau_{1}$ for the two particular choices of sampling (partition or nice samplings), i.e., $\alpha_{k}=\mathcal{O}\left(\frac{\tau_{1}}{N(k+1)^{\gamma}}\right)$.
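As an illustration of this dependence, the following minimal sketch evaluates the second initial stepsize choice and the resulting sequence $\alpha_{k}$. It assumes $\mathcal{L}=\frac{N}{\tau_{1}}L$, as read off from the bounds displayed above for the partition and nice samplings; the function names and the numerical values in the usage note are hypothetical, and the admissible range of $\delta$ is the one fixed earlier in the paper (it is not re-derived here).

```python
import math


def initial_stepsize(N, tau1, L, delta):
    """alpha_0 = (1 - sqrt((1 - cal_L)_+)) / cal_L - delta.

    Assumption: cal_L = (N / tau1) * L, i.e. the partition or nice sampling case.
    """
    cal_L = (N / tau1) * L
    positive_part = max(1.0 - cal_L, 0.0)  # (1 - cal_L)_+
    return (1.0 - math.sqrt(positive_part)) / cal_L - delta


def stepsize(k, alpha0, gamma):
    """alpha_k = alpha_0 / (k + 1)^gamma, with gamma in [0, 1)."""
    return alpha0 / (k + 1) ** gamma
```

For instance, whenever $\frac{N}{\tau_{1}}L\geq 1$ the positive part vanishes and `initial_stepsize(N, tau1, L, 0.0)` reduces to $\frac{\tau_{1}}{NL}$, so doubling `tau1` doubles the initial stepsize.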

Finally, one can notice that when $B=0$, from Theorem 4.4 improved rates can be derived for Mini-batch SSP in the convex case. For example, for stepsize $\alpha_{k}=\frac{\alpha_{0}}{(k+1)^{\gamma}}$, with $\gamma\in[0,1)$ and $\alpha_{0}=\min\left(\frac{1}{2},\frac{1-\sqrt{(1-\mathcal{L})_{+}}}{\mathcal{L}}\right)-\delta$, we obtain convergence rates for $\hat{x}_{k}$ in optimality and feasibility violation of order $\mathcal{O}\left(\frac{N}{\tau_{1}k^{1-\gamma}}\right)$ and $\mathcal{O}\left(\frac{Nm}{\tau_{1}\tau_{2}k^{1-\gamma}}\right)^{\frac{1}{q}}$, respectively.
In particular, for $\gamma=0$ these rates become of order $\mathcal{O}\left(\frac{N}{\tau_{1}k}\right)$ and $\mathcal{O}\left(\frac{Nm}{\tau_{1}\tau_{2}k}\right)^{\frac{1}{q}}$.

In conclusion, by specializing our Theorem 4.4 to different mini-batching strategies, such as partition or nice samplings, we derive explicit expressions for the stepsize $\alpha_{k}$ as a function of the mini-batch size and, consequently, convergence rates depending linearly on the mini-batch sizes $(\tau_{1},\tau_{2})$. Hence, Theorem 4.4 shows that a mini-batch variant of the stochastic subgradient projection scheme is more beneficial than its non-mini-batch counterpart.

4.2 Convergence analysis: strongly convex objective function

In this section, we additionally assume that the inequality from Assumption 2.2 holds. The next lemma derives an improved recurrence for the sequence $v_{k}$ under the strong convexity assumption. The proof is similar to that of Lemma 8 in [17].

Lemma 4.5.

Let $f_{i},\;g_{i}$, with $i=1:N$, and $h_{j}$, with $j=1:m$, be convex functions. Additionally, suppose Assumptions 2.1–2.4 hold, with $\mu>0$, and the random vectors $\zeta,\;\xi$ are nonnegative. Define $k_{0}=\lceil\frac{8\mathcal{L}}{\mu}\rceil$, $\beta\in(0,2)$, $\theta_{\mathcal{L},\mu}=1-\mu/(4\mathcal{L})$ and $\alpha_{k}=\frac{4}{\mu}\gamma_{k}$, where $\gamma_{k}$ is given by:

$$\gamma_{k}=\left\{\begin{array}{ll}\frac{\mu}{4\mathcal{L}}&\text{if}\;\;k\leq k_{0}\\ \frac{2}{k+1}&\text{if}\;\;k>k_{0}.\end{array}\right.$$

Then, the iterates of Algorithm Mini-batch SSP satisfy the following recurrence:

$$\mathbb{E}[\|v_{k_{0}}-x^{*}\|^{2}]\leq\left\{\begin{array}{ll}\frac{\mathcal{B}^{2}}{\mathcal{L}^{2}}&\text{if}\;\;\theta_{\mathcal{L},\mu}\leq 0\\ \theta_{\mathcal{L},\mu}^{k_{0}}\|v_{0}-x^{*}\|^{2}+\frac{1-\theta_{\mathcal{L},\mu}^{k_{0}}}{1-\theta_{\mathcal{L},\mu}}\left(1+\frac{2}{C_{\beta,c,\mathcal{B}_{h}}\theta_{\mathcal{L},\mu}}\right)\frac{\mathcal{B}^{2}}{\mathcal{L}^{2}}&\text{if}\;\;\theta_{\mathcal{L},\mu}>0,\end{array}\right.$$
$$\begin{aligned}&\mathbb{E}[\|v_{k}-x^{*}\|^{2}]+\gamma_{k}\mathbb{E}[\|x_{k}-x^{*}\|^{2}]+\frac{1}{6}C_{\beta,c,\mathcal{B}_{h}}\mathbb{E}[\text{dist}^{2q}(x_{k},\mathcal{X})]\\ &\quad\leq\left(1-\gamma_{k}\right)\mathbb{E}[\|v_{k-1}-x^{*}\|^{2}]+\left(1+\frac{6}{C_{\beta,c,\mathcal{B}_{h}}}\right)\frac{16}{\mu^{2}}\gamma_{k}^{2}\mathcal{B}^{2}\quad\forall k>k_{0}.\end{aligned}$$
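The stepsize schedule used in Lemma 4.5 can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, and `cal_L` and `mu` stand for the constants $\mathcal{L}$ and $\mu$ of the lemma, which are assumed to be known.

```python
import math


def switching_index(cal_L, mu):
    """k_0 = ceil(8 * cal_L / mu): last iteration of the constant-stepsize phase."""
    return math.ceil(8.0 * cal_L / mu)


def gamma_k(k, cal_L, mu):
    """gamma_k = mu / (4 * cal_L) if k <= k_0, and 2 / (k + 1) if k > k_0."""
    k0 = switching_index(cal_L, mu)
    if k <= k0:
        return mu / (4.0 * cal_L)
    return 2.0 / (k + 1)


def alpha_k(k, cal_L, mu):
    """alpha_k = (4 / mu) * gamma_k, the stepsize used by the algorithm."""
    return 4.0 / mu * gamma_k(k, cal_L, mu)
```

With these definitions the stepsize equals $\frac{1}{\mathcal{L}}$ during the first $k_{0}$ iterations and $\frac{8}{\mu(k+1)}$ afterwards, so the switch from the constant to the decreasing regime happens later when $\mathcal{L}/\mu$ is larger.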

Let us define for $k\geq k_{0}+1$ the sum:

$$S_{k}=\sum_{t=k_{0}+1}^{k}(t+1)^{2}\sim\mathcal{O}(k^{3}+k_{0}^{2}k+k^{2}k_{0})$$

and the corresponding average sequences:

$$\hat{x}_{k}=\frac{\sum_{t=k_{0}+1}^{k}(t+1)^{2}x_{t}}{S_{k}},\quad\text{and}\quad\hat{w}_{k}=\frac{\sum_{t=k_{0}+1}^{k}(t+1)^{2}\Pi_{\mathcal{X}}(x_{t})}{S_{k}}\in\mathcal{X}.$$
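To make the averaging concrete, here is a small sketch that computes $S_{k}$ in closed form and the weighted average $\hat{x}_{k}$ from stored iterates. The function names and the list-of-lists representation of the iterates are assumptions made for illustration only; the same routine applied to the projected points $\Pi_{\mathcal{X}}(x_{t})$ would give $\hat{w}_{k}$.

```python
def weight_sum(k, k0):
    """S_k = sum_{t=k0+1}^{k} (t+1)^2, via the closed form for sums of squares."""
    def sum_of_squares(n):
        # 1^2 + 2^2 + ... + n^2
        return n * (n + 1) * (2 * n + 1) // 6

    return sum_of_squares(k + 1) - sum_of_squares(k0 + 1)


def weighted_average(iterates, k0):
    """Return x_hat_k, where iterates[t] holds x_t for t = 0, ..., k.

    Only the iterates with index t > k0 enter the average, each with weight (t+1)^2.
    """
    k = len(iterates) - 1
    if k <= k0:
        raise ValueError("the average is only defined for k > k0")
    dim = len(iterates[0])
    acc = [0.0] * dim
    for t in range(k0 + 1, k + 1):
        w = (t + 1) ** 2
        for i in range(dim):
            acc[i] += w * iterates[t][i]
    s_k = weight_sum(k, k0)
    return [a / s_k for a in acc]
```

Because the weights grow quadratically in $t$, later iterates dominate the average; the iterates produced during the constant-stepsize phase ($t\leq k_{0}$) are discarded entirely.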
Theorem 4.6.

Let $f_{i},\;g_{i}$, with $i=1:N$, and $h_{j}$, with $j=1:m$, be convex functions. Additionally, suppose Assumptions 2.1–2.4 hold and the random vectors $\zeta,\;\xi$ are nonnegative. Further, consider the stepsize-switching rule $\alpha_{k}=\min\left(\frac{1}{\mathcal{L}},\frac{8}{\mu(k+1)}\right)$, $\beta\in(0,2)$ and $k_{0}=\lceil\frac{8\mathcal{L}}{\mu}\rceil$. Then, for $k>k_{0}$ we have the following sublinear convergence rates for the average sequence $\hat{x}_{k}$ in terms of optimality and feasibility violation for problem (1) (keeping only the dominant terms):

$$\mathbb{E}\left[\|\hat{x}_{k}-x^{*}\|^{2}\right]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2}}{\mu^{2}C_{\beta,c,\mathcal{B}_{h}}\,(k+1)}\right),$$
$$\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2/q}}{\mu^{2/q}C_{\beta,c,\mathcal{B}_{h}}^{2/q}(k+1)^{2/q}}\right).$$
Proof.

Using Lemma 4.5, we get the recurrence:

$$\begin{aligned}&(k+1)^{2}\mathbb{E}[\|v_{k}-x^{*}\|^{2}]+2(k+1)\mathbb{E}[\|x_{k}-x^{*}\|^{2}]+\frac{C_{\beta,c,\mathcal{B}_{h}}}{6}(k+1)^{2}\mathbb{E}[\text{dist}^{2q}(x_{k},\mathcal{X})]\\ &\quad\leq k^{2}\mathbb{E}[\|v_{k-1}-x^{*}\|^{2}]+\left(1+\frac{6}{C_{\beta,c,\mathcal{B}_{h}}}\right)\frac{64}{\mu^{2}}\mathcal{B}^{2}\quad\forall k>k_{0}.\end{aligned}$$

Summing this inequality from $k_{0}+1$ to $k$ and using linearity of the expectation operator and convexity of the norm, we get:

$$\begin{aligned}&(k+1)^{2}\mathbb{E}[\|v_{k}-x^{*}\|^{2}]+\frac{2S_{k}}{(k+1)}\mathbb{E}[\|\hat{x}_{k}-x^{*}\|^{2}]+\frac{S_{k}C_{\beta,c,\mathcal{B}_{h}}}{6}\mathbb{E}[\|\hat{w}_{k}-\hat{x}_{k}\|^{2q}]\\ &\quad\leq(k_{0}+1)^{2}\mathbb{E}[\|v_{k_{0}}-x^{*}\|^{2}]+\left(1+\frac{6}{C_{\beta,c,\mathcal{B}_{h}}}\right)\frac{64}{\mu^{2}}\mathcal{B}^{2}(k-k_{0}).\end{aligned}$$

After simple calculations and keeping only the dominant terms, we get the following convergence rate for the average sequence $\hat{x}_{k}$ in terms of optimality:

$$\mathbb{E}[\|\hat{x}_{k}-x^{*}\|^{2}]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2}}{\mu^{2}C_{\beta,c,\mathcal{B}_{h}}\,(k+1)}\right),$$
$$\left(\mathbb{E}[\|\hat{w}_{k}-\hat{x}_{k}\|^{2}]\right)^{q}\leq\mathbb{E}[\|\hat{w}_{k}-\hat{x}_{k}\|^{2q}]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2}}{\mu^{2}C^{2}_{\beta,c,\mathcal{B}_{h}}\,(k+1)^{2}}\right).$$

Since $\hat{w}_{k}\in\mathcal{X}$, we get the following convergence rate for the average sequence $\hat{x}_{k}$ in terms of feasibility violation:

$$\mathbb{E}[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})]\leq\mathbb{E}[\|\hat{w}_{k}-\hat{x}_{k}\|^{2}]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2}}{\mu^{2}C^{2}_{\beta,c,\mathcal{B}_{h}}\,(k+1)^{2}}\right)^{\frac{1}{q}}.$$

These prove our statements. ∎

Note that our previous theoretical convergence analysis naturally imposes a stepsize-switching rule which describes when one should switch from a constant stepsize regime (depending on the mini-batch size $\tau_{1}$) to a decreasing stepsize regime, i.e., $\alpha_{k}=\min\left(\frac{1}{\mathcal{L}},\frac{8}{\mu(k+1)}\right)$. For the particular choice of the stepsize $\beta=1$, we have (see (24)):

$$C_{1,c,\mathcal{B}_{h}}=\left(\frac{1}{c\mathcal{B}_{h}^{2}}\right)>0,$$

since we can always choose $c$ such that $c\mathcal{B}_{h}^{2}>1$. Using this expression in the convergence rates of Theorem 4.6, we obtain:

$$\mathbb{E}\left[\|\hat{x}_{k}-x^{*}\|^{2}\right]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2}(c\mathcal{B}_{h}^{2})}{\mu^{2}(k+1)}\right),$$
$$\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right]\leq\mathcal{O}\left(\frac{\mathcal{B}^{2}(c\mathcal{B}_{h}^{2})^{2}}{\mu^{2}(k+1)^{2}}\right)^{1/q}.$$
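The stepsize-switching rule $\alpha_{k}=\min\left(\frac{1}{\mathcal{L}},\frac{8}{\mu(k+1)}\right)$ stated above can be made concrete with a minimal Python sketch. The constants `L` (standing for $\mathcal{L}$) and `mu` are hypothetical values chosen only for illustration. The helper `switch_iteration` returns the smallest index at which the decreasing branch of the displayed formula becomes active; it is derived from that formula alone and need not coincide with the $k_{0}$ used in the analysis.

```python
import math


def stepsize(k, L, mu):
    """Stepsize alpha_k = min(1/L, 8/(mu*(k+1))) from the switching rule."""
    return min(1.0 / L, 8.0 / (mu * (k + 1)))


def switch_iteration(L, mu):
    """Smallest k >= 0 with 8/(mu*(k+1)) <= 1/L, i.e. k >= 8*L/mu - 1."""
    return max(0, math.ceil(8.0 * L / mu) - 1)


# Hypothetical constants, chosen only for illustration (not from the paper).
L, mu = 5.0, 0.5
k0 = switch_iteration(L, mu)

# Constant regime up to k0, decreasing regime afterwards.
alphas = [stepsize(k, L, mu) for k in range(k0 + 3)]
print("switch at k =", k0, "| last stepsizes:", alphas[-3:])
```

Since $\mathcal{L}$ depends on the mini-batch size $\tau_{1}$, the length of the constant regime also changes with $\tau_{1}$ in this sketch.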

By replacing the values for $\mathcal{L}$, $\mathcal{B}$, $\mathcal{B}_{h}$ and $c$ from Theorem 3.5 for both types of sampling, i.e., partition or $\tau_{1}$, $\tau_{2}$-nice samplings, we get:

$$\mathbb{E}\left[\|\hat{x}_{k}-x^{*}\|^{2}\right]\leq\mathcal{O}\left(\frac{m}{\tau_{2}}\frac{N}{\tau_{1}}\cdot\frac{B^{2}\bar{c}\max_{j=1:m}^{2}B_{j}}{\mu^{2}(k+1)}\right),$$
$$\mathbb{E}\left[\text{dist}^{2}(\hat{x}_{k},\mathcal{X})\right]\leq\mathcal{O}\left(\left(\frac{m}{\tau_{2}}\right)^{2}\frac{N}{\tau_{1}}\frac{B^{2}\bar{c}^{2}\max_{j=1:m}^{4}B_{j}}{\mu^{2}(k+1)^{2}}\right)^{1/q}.$$

One can easily see that also in this case the obtained rates have linear dependence on the mini-batch sizes $(\tau_{1},\tau_{2})$. Therefore, Theorem 4.6 also proves that in the quadratic growth convex case a mini-batch variant of the stochastic subgradient projection scheme with a stepsize-switching rule brings benefits over the nonmini-batch variant.
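To illustrate how the mini-batch sizes enter these bounds, the following Python sketch evaluates only the factors $\frac{m}{\tau_{2}}\frac{N}{\tau_{1}}$ and $\left(\frac{m}{\tau_{2}}\right)^{2}\frac{N}{\tau_{1}}$ appearing in the two rates, for $N=120$, $m=240$ and the mini-batch choices used in Section 5. The function names are hypothetical, and the remaining constants ($B$, $\bar{c}$, $B_{j}$, $\mu$) as well as the exponent $1/q$ are left out, so the numbers only indicate how the constants in the bounds scale, not actual errors.

```python
def optimality_factor(N, m, tau1, tau2):
    """Mini-batch factor (m/tau2) * (N/tau1) multiplying the optimality bound."""
    return (m / tau2) * (N / tau1)


def feasibility_factor(N, m, tau1, tau2):
    """Mini-batch factor (m/tau2)^2 * (N/tau1) inside the feasibility bound."""
    return (m / tau2) ** 2 * (N / tau1)


N, m = 120, 240  # sizes of the first test problem in Section 5
batch_sizes = [(1, 1), (20, 80), (60, 160), (N, m)]

factors = {
    (t1, t2): (optimality_factor(N, m, t1, t2), feasibility_factor(N, m, t1, t2))
    for (t1, t2) in batch_sizes
}

for (t1, t2), (opt, feas) in factors.items():
    print(f"tau1={t1:4d} tau2={t2:4d}  optimality={opt:10.1f}  feasibility={feas:12.1f}")
```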

5 Numerical simulations

In this section, we consider a general quadratic program with quadratic constraints:

$$\begin{array}{rl}\min_{x\in\mathbb{R}^{n}}&\frac{1}{2}\|Ax-b\|^{2}+\|\Delta x\|_{1}\\ \text{subject to }&Cx+d\geq 0,\;\;c_{i}^{T}x+d_{i}\geq\|Q_{i}^{-1/2}x\|\quad\forall i=1:m,\end{array}\tag{29}$$

with the matrices $A\in\mathbb{R}^{N\times n}$, $\Delta\in\mathbb{R}^{N\times n}$, $C\in\mathbb{R}^{m\times n}$, $Q_{i}\in\mathbb{R}^{m_{i}\times n}$ and $c_{i}\in\mathbb{R}^{n}$, with $i=1:m$. One can notice that this problem fits into our general modeling framework (1) (e.g., define $f_{i}(x)=1/2(a_{i}^{T}x-b_{i})^{2}$, with $a_{i}$ the $i$th row of matrix $A$, $g_{i}(x)=\|\delta_{i}^{T}x\|_{1}$, with $\delta_{i}$ the $i$th row of matrix $\Delta$, for all $i=1:N$, and $h_{j}$ are either linear or quadratic constraints, for all $j=1:2m$). Moreover, (29) is a general constrained Lasso problem which appears in many applications from machine learning, signal processing and statistics, see [2, 17, 10, 4, 9]. In particular, if one considers appropriate matrices $A$, $\Delta$, $C$ and $Q_{i}$, one can recast the robust (sparse) SVM problem from [2, 17] as problem (29). Indeed, the robust (sparse) SVM problem is defined as [2, 17]:

$$\begin{aligned}
&\min_{w,d,u}\;\frac{\lambda}{2}\|w\|^{2}+\delta\sum_{i=1}^{m}u_{i}+\|w\|_{1}\\
&\text{subject to:}\;u\geq 0,\;y_{i}(w^{T}\bar{z}_{i}+d)\geq 1-u_{i},\\
&\qquad\qquad\;\;\;y_{i}(w^{T}\bar{z}_{i}+d)\geq\|Q_{i}^{-1/2}w\|+1-u_{i}\;\;\;\forall i=1:m,
\end{aligned}$$

where $(\bar{z}_{i})_{i=1}^{m}$ is the training dataset, $(y_{i})_{i=1}^{m}\in\{-1,1\}$ are the corresponding labels, the $Q_{i}$'s are diagonal matrices with positive entries, $\delta>0$ and $(w,d)\in\mathbb{R}^{n}\times\mathbb{R}$ are the parameters of the hyperplane separating the data.
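One possible identification behind this recast is sketched below. It is an assumption made for illustration, since the matrices are not spelled out here. The decision variable of (29) is taken as $x=(w,d,u)$, so the dimension called $n$ in (29) becomes $n+1+m$ in the SVM notation, and $e_{i}$ denotes the $i$th unit vector of $\mathbb{R}^{m}$.

```latex
\begin{aligned}
& x = (w, d, u) \in \mathbb{R}^{n+1+m}, \qquad
  A = \sqrt{\lambda}\,[\, I_{n} \;\; 0 \;\; 0 \,], \quad b = 0
  \;\Longrightarrow\; \tfrac{1}{2}\|Ax-b\|^{2} = \tfrac{\lambda}{2}\|w\|^{2}, \\
& \Delta = \operatorname{diag}(I_{n},\, 0,\, \delta I_{m})
  \;\Longrightarrow\; \|\Delta x\|_{1} = \|w\|_{1} + \delta\sum_{i=1}^{m}|u_{i}|
  = \|w\|_{1} + \delta\sum_{i=1}^{m}u_{i} \quad \text{whenever } u \geq 0, \\
& c_{i} = (y_{i}\bar{z}_{i},\; y_{i},\; e_{i}), \quad d_{i} = -1
  \;\Longrightarrow\; c_{i}^{T}x + d_{i} = y_{i}(w^{T}\bar{z}_{i}+d) - 1 + u_{i}, \\
& \|[\, Q_{i}^{-1/2} \;\; 0 \;\; 0 \,]\,x\| = \|Q_{i}^{-1/2}w\| .
\end{aligned}
```

Under this reading, the linear block $Cx+d\geq 0$ collects the constraints $u\geq 0$ and $c_{i}^{T}x+d_{i}\geq 0$, while the second-order cone constraints of (29) use the same $c_{i}$, $d_{i}$ together with the padded matrix $[\,Q_{i}^{-1/2}\;\;0\;\;0\,]$.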

In the numerical experiments we consider random matrices $A$ and $C$ and diagonal matrices $\Delta$ and $Q_{i}$, all generated from normal distributions. We define an epoch as $\max\left(\frac{N}{\tau_{1}},\frac{m}{\tau_{2}}\right)$ iterations of the Mini-batch SSP algorithm, and our stopping criteria are $\|\max(0,h(x))\|_{2}\leq 10^{-2}$ and $F(x)-F^{*}\leq 10^{-2}$ (we use the CVX solution [6] for computing $F^{*}$, when CVX finishes in a reasonable time). The codes are written in Matlab and run on a PC with an i7 CPU at 2.1 GHz and 16 GB of RAM.
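The epoch length and the two stopping criteria described above can be sketched in a few lines of Python. This is an illustration, not the Matlab code used for the experiments. The function names are hypothetical, rounding the epoch length up to an integer is an assumption, and the constraint values are assumed to follow the convention $h_{j}(x)\leq 0$, so the constraints of (29), which are written in the form "$\geq$", would be passed with their sign flipped.

```python
import math


def iterations_per_epoch(N, m, tau1, tau2):
    """One epoch = max(N/tau1, m/tau2) iterations (rounded up: an assumption)."""
    return max(math.ceil(N / tau1), math.ceil(m / tau2))


def feasibility_residual(h_values):
    """Euclidean norm of max(0, h(x)) for a list of constraint values h_j(x)."""
    return math.sqrt(sum(max(0.0, h) ** 2 for h in h_values))


def should_stop(h_values, F_x, F_star, tol=1e-2):
    """Both criteria must hold: feasibility residual and optimality gap <= tol."""
    return feasibility_residual(h_values) <= tol and (F_x - F_star) <= tol


# Epoch lengths for N = 120, m = 240 and the mini-batch sizes of Section 5.
for tau1, tau2 in [(1, 1), (20, 80), (60, 160), (120, 240)]:
    print((tau1, tau2), iterations_per_epoch(120, 240, tau1, tau2))

# Hypothetical constraint values and objective values, for illustration only.
print(should_stop([-1.0, 0.003, 0.004], F_x=1.005, F_star=1.0))
```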

Figure 1 shows the convergence behaviour of the Mini-batch SSP algorithm along epochs for four different choices of the mini-batch sizes $(\tau_{1},\tau_{2})$, namely $(1,1)$, $(20,80)$, $(60,160)$ and $(N=120,m=240)$, in terms of optimality (left) and feasibility (right) for solving the constrained Lasso problem (29) with $N=120$, $n=110$, $m=240$. As we can see from this figure, increasing the mini-batch sizes $(\tau_{1},\tau_{2})$ leads to better convergence than the nonmini-batch counterpart, as our theory also predicted.

Figure 1: Behaviour of Mini-batch SSP algorithm in terms of optimality (left) and feasibility (right) for $N=120$, $n=110$, $m=240$ and different mini-batch sizes $(\tau_{1},\tau_{2})$.

Finally, in Table 1 we compare the Mini-batch SSP algorithm with CVX in terms of CPU time (in seconds) for solving problem (29) over different dimensions of the problem, ranging from several hundreds to thousands of functions ($N$) and constraints ($m$), respectively (note that if $N<n$, then the objective function $F$ is convex, otherwise $F$ is strongly convex). For the Mini-batch SSP algorithm we consider four different choices of mini-batch sizes, and in the table we also give the number of epochs. The results presented in the table are averages over $10$ runs on the same problem. From the table we observe that for some choices of mini-batch sizes the Mini-batch SSP algorithm is even $10$ times faster than CVX ("*" means that CVX has not finished after 3 hours). Moreover, Mini-batch SSP is much faster than its nonmini-batch counterpart.

| Sizes | Mini-batch SSP: sizes $(\tau_{1},\tau_{2})$ | Mini-batch SSP: epochs | Mini-batch SSP: cpu time (s) | CVX: cpu time (s) |
|---|---|---|---|---|
| N = 120, m = 240, n = 110 | (1, 1) | 655 | 0.17 | 1.44 |
| | (20, 80) | 148 | 0.06 | |
| | (60, 160) | 131 | 0.07 | |
| | (N, m) | 166 | 0.11 | |
| N = 100, m = 240, n = 110 | (1, 1) | 1023 | 0.25 | 1.51 |
| | (20, 80) | 202 | 0.08 | |
| | (60, 160) | 175 | 0.08 | |
| | (N, m) | 357 | 0.21 | |
| N = 1200, m = 2400, n = 1100 | (1, 1) | 8131 | 51.94 | 177.08 |
| | (200, 800) | 958 | 9.38 | |
| | (600, 1600) | 713 | 9.59 | |
| | (N, m) | 2327 | 48.81 | |
| N = 1000, m = 2400, n = 1100 | (1, 1) | 13115 | 66.15 | 179.67 |
| | (200, 800) | 1983 | 14.70 | |
| | (600, 1600) | 1158 | 12.07 | |
| | (N, m) | 5771 | 61.33 | |
| N = 3600, m = 7200, n = 3300 | (1, 1) | 19491 | 2008.60 | * |
| | (600, 2400) | 298 | 52.94 | |
| | (1800, 4800) | 1432 | 387.91 | |
| | (N, m) | 1200 | 464.79 | |
| N = 3000, m = 7200, n = 3300 | (1, 1) | 40168 | 3618.37 | * |
| | (600, 2400) | 2990 | 457.99 | |
| | (1800, 4800) | 2130 | 471.87 | |
| | (N, m) | 24903 | 7260.44 | |

Table 1: Comparison between Mini-batch SSP and CVX for different dimensions and mini-batch sizes. The CVX cpu time is reported once per problem size.

6 Conclusions

In this paper we have considered a deterministic general finite sum composite optimization problem with many functional constraints. We have reformulated this problem into a stochastic problem for which the stochastic subgradient projection method from [17] specializes to an infinite array of mini-batch variants, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. By specializing different mini-batching strategies, we have derived exact expressions for the stepsizes as a function of the mini-batch size and in some cases we have derived stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. We have also proved sublinear convergence rates for the mini-batch subgradient projection algorithm which depend explicitly on the mini-batch sizes and on the properties of the objective function. Preliminary numerical results support the effectiveness of our method in practice.

Funding

The research leading to these results has received funding from: the NO Grants 2014–2021 RO-NO-2019-0184, under project ELO-Hyp, contract no. 24/2020; UEFISCDI PN-III-P4-PCE-2021-0720, under project L2O-MOC, nr. 70/2022 for N.K. Singh and I. Necoara. The OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 Research Center for Informatics for V. Kungurtsev.

References

  • [1] H. Asi, K. Chadha, G. Cheng and J. Duchi, Minibatch stochastic approximate proximal point methods, Advances in Neural Information Processing Systems Conference, 2020.
  • [2] C. Bhattacharyya, L.R. Grate, M.I. Jordan, L. El Ghaoui and S. Mian, Robust sparse hyperplane classifiers: Application to uncertain molecular profiling data, Journal of Computational Biology, 11(6): 1073–1089, 2004.
  • [3] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10: 2899–2934, 2009.
  • [4] B.R. Gaines, J. Kim and H. Zhou, Algorithms for fitting the constrained Lasso, J. Comput. Graph. Stat., 27(4): 861–871, 2018.
  • [5] G. Garrigos and R.M. Gower, Handbook of convergence theorems for (stochastic) gradient methods, arXiv:2301.11235v2, 2023.
  • [6] M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.0 beta, http://cvxr.com/cvx, 2013.
  • [7] E. Gorbunov, F. Hanzely and P. Richtarik, A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, vol. 108, 2020.
  • [8] M. Hardt, B. Recht and Y. Singer, Train faster, generalize better: stability of stochastic gradient descent, International Conference on Machine Learning, 2016.
  • [9] Q. Hu, P. Zeng and L. Lin, The dual and degrees of freedom of linearly constrained generalized lasso, Comput. Stat. Data Anal., 86:13–26, 2015.
  • [10] G.M. James, C. Paulson and P. Rusmevichientong, Penalized and constrained optimization: an application to high-dimensional website advertising, SIAM Journal on Optimization, 30(4), 3230–3251, 2019.
  • [11] L. Jacob, G. Obozinski and J.P. Vert, Group lasso with overlap and graph lasso, International Conference on Machine Learning, 433–440, 2009.
  • [12] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, 315–323, 2013.
  • [13] A. Lewis and J.S. Pang, Error bounds for convex inequality systems, Generalized Convexity, Generalized Monotonicity (J.-P. Crouzeix, J.-E. Martinez-Legaz, and M. Volle, eds.), 75–110, Cambridge University Press, 1998.
  • [14] H. Lin, J. Mairal and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems Conference, 2015.
  • [15] E. Moulines and F. Bach, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems Conf., 2011.
  • [16] V. Nedelcu, I. Necoara and Q. Tran Dinh, Computational complexity of inexact gradient augmented Lagrangian methods: application to constrained MPC, SIAM Journal on Control and Optimization, 52(5): 3109–3134, 2014.
  • [17] I. Necoara and N.K. Singh, Stochastic subgradient projection methods for composite optimization with functional constraints, arXiv preprint: 2204.08204, 2022.
  • [18] I. Necoara, General convergence analysis of stochastic first order methods for composite optimization, Journal of Optimization Theory and Applications, 189: 66–95, 2021.
  • [19] A. Nemirovski and D.B. Yudin, Problem complexity and method efficiency in optimization, Wiley Interscience, 1983.
  • [20] Yu. Nesterov, Lectures on Convex Optimization, Springer Optimization and Its Applications, 137, 2018.
  • [21] A. Nedich and I. Necoara, Random minibatch subgradient algorithms for convex problems with functional constraints, Applied Mathematics and Optimization, 8(3): 801–833, 2019.
  • [22] A. Nemirovski, A. Juditsky, G. Lan and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journ. Optimization, 19(4): 1574–1609, 2009.
  • [23] X. Peng, L. Li and F. Wang, Accelerating minibatch stochastic gradient descent using typicality sampling, IEEE Trans. Neural Networks Learn. Syst., 2019.
  • [24] B.T. Polyak, Minimization of unsmooth functionals, USSR Computational Mathematics and Mathematical Physics, 9(3): 14–29, 1969.
  • [25] B.T. Polyak and A.B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, 30(4): 838–855, 1992.
  • [26] A. Patrascu and I. Necoara, Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization, Journal of Machine Learning Research, 18(198): 1–42, 2018.
  • [27] P. Richtarik and M. Takac, On optimal probabilities in stochastic coordinate descent methods, Optimization Letters, 10(6): 1233–1243, 2016.
  • [28] R.T. Rockafellar and S.P. Uryasev, Optimization of conditional value-at-risk, Journal of Risk, 2: 21–41, 2000.
  • [29] R.M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin and P. Richtarik, SGD: General Analysis and Improved Rates, International Conference on Machine Learning, 2019.
  • [30] H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 22(3): 400–407, 1951.
  • [31] L. Rosasco, S. Villa and B.C. Vu, Convergence of stochastic proximal gradient algorithm, Applied Mathematics and Optimization, 82: 891–917, 2020.
  • [32] J. Renegar and S. Zhou, A different perspective on the stochastic convex feasibility problem, arXiv preprint: 2108.12029v1, 2021.
  • [33] R. Tibshirani, The solution path of the generalized lasso, PhD Thesis, Stanford Univ., 2011.
  • [34] V. Vapnik, Statistical learning theory, John Wiley, 1998.
  • [35] T. Yang and Q. Lin, RSG: Beating subgradient method without smoothness and strong convexity, Journal of Machine Learning Research, 19(6): 1–33, 2018.
  • [36] X. Yin and İ Büyüktahtakın, A multi-stage stochastic programming approach to epidemic resource allocation with equity considerations, Health Care Management Science, 24(3): 597–622, 2021.
  • [37] M. Zafar, I. Valera, M. Gomez-Rodriguez and K. Gummadi, Fairness constraints: A flexible approach for fair classification, Journal of Machine Learning Research, 20(1): 2737–2778, 2019.