Sharp bounds for max-sliced Wasserstein distances

March T. Boedihardjo Department of Mathematics, Michigan State University, East Lansing, MI 48824 [email protected]
Abstract.

We obtain essentially matching upper and lower bounds for the expected max-sliced 1-Wasserstein distance between a probability measure on a separable Hilbert space and its empirical distribution from $n$ samples. By proving a Banach space version of this result, we also obtain an upper bound, that is sharp up to a log factor, for the expected max-sliced 2-Wasserstein distance between a symmetric probability measure $\mu$ on a Euclidean space and its symmetrized empirical distribution in terms of the operator norm of the covariance matrix of $\mu$ and the diameter of the support of $\mu$.

Key words and phrases:
Max-sliced Wasserstein, Projection robust Wasserstein
2020 Mathematics Subject Classification:
60B11, 62G05, 62R20

1. Introduction

Suppose that $\mu$ is a probability measure on $\mathbb{R}^{d}$ with $\int_{\mathbb{R}^{d}}\|x\|_{2}^{2}\,d\mu(x)<\infty$, where $\|\cdot\|_{2}$ is the Euclidean norm on $\mathbb{R}^{d}$. Let $X_{1},\ldots,X_{n}$ be i.i.d. samples of $\mu$. How many samples are needed so that the empirical distribution $\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ is “close” to $\mu$? Obviously the answer depends on the notion of “close” we use.
If we want the covariance matrix of $\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ to be close, in the operator norm, to the covariance matrix of $\mu$, the number of samples needed is already a very deep question, though by now this question has been settled in some aspects after a series of works [32, 2, 3, 39, 33, 21, 15, 35, 44, 1]. In general, after certain rescaling, $O(d\log d)$ samples suffice to accurately approximate the covariance matrix of $\mu$. On the other hand, if we want $\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ and $\mu$ to be close in the Wasserstein distance, we need $n$ to be exponentially large in $d$ (see, e.g., [12]).

To circumvent this curse of dimensionality, the notions of sliced, max-sliced and projection robust Wasserstein distances have been introduced in recent years and used in applications [31, 7, 9, 10, 13, 11, 14, 22, 28, 43, 18, 23, 24]. They were further studied in [26, 42, 19, 4, 25, 27]. The max-sliced $p$-Wasserstein distance between two probability measures $\mu_{1}$ and $\mu_{2}$ on $\mathbb{R}^{d}$ is

$$W_{p,1}(\mu_{1},\mu_{2}):=\sup_{v\in\mathbb{R}^{d},\,\|v\|_{2}=1}W_{p}(v_{\#}\mu_{1},v_{\#}\mu_{2}),\tag{1.1}$$

where $v_{\#}\mu_{i}$ is the pushforward probability measure of $\mu_{i}$ by the map $\langle\cdot,v\rangle$, i.e., if $\mu_{i}$ is the distribution of a random vector $X_{i}$ in $\mathbb{R}^{d}$, then $v_{\#}\mu_{i}$ is the distribution of the random variable $\langle X_{i},v\rangle$. The quantity $W_{p}(v_{\#}\mu_{1},v_{\#}\mu_{2})$ denotes the $p$-Wasserstein distance between the measures $v_{\#}\mu_{1}$ and $v_{\#}\mu_{2}$ on $\mathbb{R}$.
The sliced Wasserstein distance (which we do not study in this paper) is the notion where in (1.1), we replace the supremum over $v$ by the integral of $W_{p}(v_{\#}\mu_{1},v_{\#}\mu_{2})^{p}$ over $v$ on the unit sphere and then take the $p$th root. The projection robust Wasserstein distance $W_{p,s}$ (which we also study in this paper) is the notion where in (1.1), we take the $p$-Wasserstein distance between the pushforward measures of $\mu_{1}$ and $\mu_{2}$ by a projection onto a subspace of a fixed dimension $s$ and then take supremum over all such subspaces. When $s=1$, this is the max-sliced Wasserstein distance $W_{p,1}$.
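For readers who want to experiment with definition (1.1), the following is a minimal numerical sketch, not part of the paper. It assumes two empirical measures with the same number of equally weighted atoms, for which the one-dimensional $p$-Wasserstein distance is obtained by matching sorted projections. The function names (`wasserstein_1d`, `random_unit_vector`, `max_sliced_lower_bound`) and the random-direction search are illustrative choices; searching over finitely many directions only gives a lower bound on the supremum in (1.1).

```python
import math
import random


def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance between two empirical measures on the real line
    with equally many, equally weighted atoms (sorted matching is optimal)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms in both samples")
    xs, ys = sorted(xs), sorted(ys)
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


def random_unit_vector(d, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def max_sliced_lower_bound(sample1, sample2, p=1.0, n_dirs=200, seed=0):
    """Largest sliced distance over n_dirs random unit directions.
    Hypothetical helper: it only lower-bounds the supremum in (1.1)."""
    rng = random.Random(seed)
    d = len(sample1[0])
    best = 0.0
    for _ in range(n_dirs):
        v = random_unit_vector(d, rng)
        # Push both samples forward by the map x -> <x, v>.
        proj1 = [sum(a * b for a, b in zip(x, v)) for x in sample1]
        proj2 = [sum(a * b for a, b in zip(x, v)) for x in sample2]
        best = max(best, wasserstein_1d(proj1, proj2, p))
    return best
```

As a sanity check, if `sample2` is `sample1` translated by a fixed vector $t$, every slice has distance $|\langle t,v\rangle|$, so the returned value should approach $\|t\|_{2}$ from below as `n_dirs` grows.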

1.1. Max-sliced 1-Wasserstein distance

When $p=1$, by the Kantorovich-Rubinstein theorem, the max-sliced $1$-Wasserstein distance between two probability measures $\mu_{1}$ and $\mu_{2}$ on $\mathbb{R}^{d}$ coincides with the following quantity:

$$W_{1,1}(\mu_{1},\mu_{2})=\sup_{\substack{v\in\mathbb{R}^{d},\,\|v\|_{2}=1\\ f\text{ is 1-Lipschitz}}}\left|\int_{\mathbb{R}^{d}}f(\langle x,v\rangle)\,d\mu_{1}(x)-\int_{\mathbb{R}^{d}}f(\langle x,v\rangle)\,d\mu_{2}(x)\right|,\tag{1.2}$$

where the supremum is over all the $v$ on the unit sphere and over all the 1-Lipschitz functions $f:\mathbb{R}\to\mathbb{R}$ (i.e., $|f(x)-f(y)|\leq|x-y|$ for all $x,y\in\mathbb{R}$). Consider the following problem:

Problem 1.

Suppose that $\mu$ is a probability measure on $\mathbb{R}^{d}$. Let $X_{1},\ldots,X_{n}$ be i.i.d. samples of $\mu$. Estimate $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$.

There are known estimates (some of which are sharp) of $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ under certain regularity assumptions on the measure $\mu$, e.g., log-concavity of $\mu$ [25, Theorem 1] and [4, Theorem 1.6], or $\mu$ satisfying the spiked transport model and the transport inequality [26, Theorem 1], or $\mu$ satisfying the projection Bernstein tail condition or the projection Poincaré inequality [19, Theorem 3.5 and Theorem 3.6], or $\mu$ being isotropic with its marginal distributions having uniformly bounded 4th moments [4, Proposition 4.1] (see also [4, Remark 4.2]).

As for the most general setting, under the only assumption of $\mu$ being supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$, it was shown in [25, Proposition 1] that $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C\cdot\frac{rd}{\sqrt{n}}$, where $C\geq 1$ is a universal constant. In [27, Theorem 2], this was improved to $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C\cdot\frac{r\sqrt{d}}{\sqrt{n}}$.
In these two bounds, the rate of convergence $\frac{1}{\sqrt{n}}$ is optimal in $n$, but both bounds involve the dimension $d$.

There is a dimension-free bound for $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ that holds with the same generality. More precisely, if $\mu$ is supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$, then $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C\cdot r\cdot n^{-1/3}$, where $C\geq 1$ is a universal constant. This follows by taking $k=1$ and optimizing the $\epsilon>0$ in the term $\mathcal{J}_{n}$ in [42, Theorem 1]. This estimate is dimension-free but comes at the cost of a slower convergence rate in $n$.
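As a rough illustration of the trade-off between the two general bounds, one can compare them while ignoring the universal constants (a simplification made only for this comparison, not a statement from the cited works). The dimension-dependent bound of [27, Theorem 2] is the smaller of the two exactly when

```latex
\frac{r\sqrt{d}}{\sqrt{n}} \;\le\; r\,n^{-1/3}
\quad\Longleftrightarrow\quad
\sqrt{d} \;\le\; n^{\frac{1}{2}-\frac{1}{3}} = n^{1/6}
\quad\Longleftrightarrow\quad
d \;\le\; n^{1/3}.
```

So, up to constants, the $n^{-1/3}$ bound is the better one once $d$ exceeds $n^{1/3}$.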

In short, the literature concerning Problem 1 can be summarized as follows.

  (1) If $\mu$ is supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$, then $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C(d)\cdot\frac{r}{\sqrt{n}}$, where $C(d)\geq 1$ is a constant that depends only on $d$.

  (2) If $\mu$ is supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$, then $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C\cdot r\cdot n^{-1/3}$, where $C\geq 1$ is a universal constant.

  (3) If in addition, $\mu$ satisfies certain regularity assumptions, then $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C\cdot\frac{r}{\sqrt{n}}$, where $C\geq 1$ is a universal constant.

These results together suggest the following question. Does the dimension-free bound $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})\leq C\cdot\frac{r}{\sqrt{n}}$, where $C\geq 1$ is a universal constant, actually hold for every $\mu$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$ even without any regularity assumptions?

In the first main result of this paper, we answer this question affirmatively. We obtain essentially matching dimension-free upper and lower bounds for $\mathbb{E}W_{1,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ in the most general setting. This essentially settles Problem 1.

Theorem 1.1.

Suppose that $\mu$ is a probability measure on $\mathbb{R}^{d}$ with $\int_{\mathbb{R}^{d}}\|x\|_{2}\,d\mu(x)<\infty$ and $\int_{\mathbb{R}^{d}}x\,d\mu(x)=0$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random vectors in $\mathbb{R}^{d}$ sampled according to $\mu$. Then

$$\frac{1}{2\sqrt{2n}}\int_{\mathbb{R}^{d}}\|x\|_{2}\,d\mu(x)\leq\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\leq\frac{C}{\sqrt{n}}\cdot\inf_{0<\delta\leq 1}\frac{1}{\sqrt{\delta}}\left(\int_{\mathbb{R}^{d}}\|x\|_{2}^{2+\delta}\,d\mu(x)\right)^{\frac{1}{2+\delta}},$$

where $C\geq 1$ is a universal constant.
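To see how Theorem 1.1 relates to the question raised before it, consider, as an illustrative special case, a measure $\mu$ that is centered as in the theorem and supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$. Choosing $\delta=1$ in the infimum (a convenient choice, not necessarily the optimal one) gives

```latex
\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)
\;\le\; \frac{C}{\sqrt{n}}\cdot\frac{1}{\sqrt{1}}
        \left(\int_{\mathbb{R}^{d}}\|x\|_{2}^{3}\,d\mu(x)\right)^{\frac{1}{3}}
\;\le\; \frac{C}{\sqrt{n}}\,\bigl(r^{3}\bigr)^{\frac{1}{3}}
\;=\; \frac{C\,r}{\sqrt{n}}.
```

If moreover $\|x\|_{2}=r$ holds $\mu$-almost surely, the lower bound in the theorem equals $\frac{r}{2\sqrt{2n}}$, so in that case both sides are of order $\frac{r}{\sqrt{n}}$.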

We also obtain a version of Theorem 1.1 for probability measures on Banach spaces. Besides being a result of intrinsic interest in the study of probability in Banach spaces (see [17]), this result is essential for proving the second main result of this paper, Theorem 1.4, on the max-sliced 2-Wasserstein distance for probability measures on Euclidean spaces. Indeed, in proving the latter result, we will take the Banach space $E$ to be the space of all $d\times d$ matrices equipped with the operator norm. In the Banach space setting, to define the metric $W_{1,1}$, in (1.2), instead of taking the supremum over $v$ on the unit sphere, we take the supremum over all linear functionals $v^{*}\in B_{E^{*}}$, where $B_{E^{*}}$ is the unit ball of the dual space $E^{*}$ centered at the origin. See Section 1.3 for the precise definition.

Theorem 1.2.

Suppose that $\mu$ is a probability measure on a Banach space $(E,\|\cdot\|)$ with separable dual $E^{*}$ and that $\int_{E}\|x\|\,d\mu(x)<\infty$ and $\int_{E}x\,d\mu(x)=0$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$. Then

$$\begin{aligned}\frac{1}{2n}\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|&\leq\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\\&\leq\frac{C}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|+\frac{C\sqrt{\ln n}}{n}\cdot\mathbb{E}\sup_{v^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}},\end{aligned}$$

where $\epsilon_{1},\ldots,\epsilon_{n}$ are i.i.d. uniform $\pm 1$ random variables and $g_{1},\ldots,g_{n}$ are i.i.d. standard Gaussian random variables that are independent from $X_{1},\ldots,X_{n}$, and $C\geq 1$ is a universal constant.

Remark.

If we fix X1,,Xnsubscript𝑋1subscript𝑋𝑛X_{1},\ldots,X_{n}italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, the quantity in the last term supvBE(i=1n|v(Xi)|2)12subscriptsupremumsuperscript𝑣subscript𝐵superscript𝐸superscriptsuperscriptsubscript𝑖1𝑛superscriptsuperscript𝑣subscript𝑋𝑖212\sup_{v^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{% 1}{2}}roman_sup start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_E start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT is exactly the Lipschitz constant of the function (g1,,gn)i=1ngiXimaps-tosubscript𝑔1subscript𝑔𝑛normsuperscriptsubscript𝑖1𝑛subscript𝑔𝑖subscript𝑋𝑖(g_{1},\ldots,g_{n})\mapsto\|\sum_{i=1}^{n}g_{i}X_{i}\|( italic_g start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_g start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ↦ ∥ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ with respect to the Euclidean norm on nsuperscript𝑛\mathbb{R}^{n}blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. 
Moreover, by Khintchine’s inequality, if we take the expectation 𝔼ϵsubscript𝔼italic-ϵ\mathbb{E}_{\epsilon}blackboard_E start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT on ϵ1,,ϵnsubscriptitalic-ϵ1subscriptitalic-ϵ𝑛\epsilon_{1},\ldots,\epsilon_{n}italic_ϵ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_ϵ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, we have for vBEsuperscript𝑣subscript𝐵superscript𝐸v^{*}\in B_{E^{*}}italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_E start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT,

\[
\mathbb{E}_{\epsilon}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|\geq\mathbb{E}_{\epsilon}\left|\sum_{i=1}^{n}\epsilon_{i}v^{*}(X_{i})\right|\geq c\left(\mathbb{E}_{\epsilon}\left|\sum_{i=1}^{n}\epsilon_{i}v^{*}(X_{i})\right|^{2}\right)^{\frac{1}{2}}=c\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}},
\]

where $c>0$ is a universal constant. So if we take the supremum over $v^{*}\in B_{E^{*}}$ and then take the full expectation, we obtain

\[
\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|\geq c\cdot\mathbb{E}\sup_{v^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}}.
\]

Therefore, since $\mathbb{E}\|\sum_{i=1}^{n}g_{i}X_{i}\|\leq C\sqrt{\ln n}\cdot\mathbb{E}\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\|$ [37, Exercise 7.1], the upper and lower bounds in Theorem 1.2 differ by at most a $\sqrt{\ln n}$ factor.
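The three steps in the chain of inequalities for a fixed $v^{*}\in B_{E^{*}}$ can be unpacked as follows. This is only a sketch in the notation of the text. The explicit value $c=1/\sqrt{2}$ is the known sharp constant in the $L^{1}$-$L^{2}$ Khintchine inequality and is mentioned for orientation; the argument only needs $c$ to be universal.

```latex
% Step 1: every v^* in the unit ball of E^* satisfies |v^*(x)| <= ||x||, so
\mathbb{E}_{\epsilon}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|
  \geq \mathbb{E}_{\epsilon}\left|v^{*}\!\left(\sum_{i=1}^{n}\epsilon_{i}X_{i}\right)\right|
  = \mathbb{E}_{\epsilon}\left|\sum_{i=1}^{n}\epsilon_{i}v^{*}(X_{i})\right| .

% Step 2: Khintchine's inequality for the scalars a_i = v^*(X_i)
% (one may take c = 1/\sqrt{2}):
\mathbb{E}_{\epsilon}\left|\sum_{i=1}^{n}\epsilon_{i}a_{i}\right|
  \geq c\left(\mathbb{E}_{\epsilon}\left|\sum_{i=1}^{n}\epsilon_{i}a_{i}\right|^{2}\right)^{\frac{1}{2}} .

% Step 3: the signs are orthonormal, E[eps_i eps_j] = 1 if i = j and 0 otherwise, so
\mathbb{E}_{\epsilon}\left|\sum_{i=1}^{n}\epsilon_{i}a_{i}\right|^{2}
  = \sum_{i,j=1}^{n}a_{i}a_{j}\,\mathbb{E}_{\epsilon}[\epsilon_{i}\epsilon_{j}]
  = \sum_{i=1}^{n}a_{i}^{2} .
```

Since the left-hand side of Step 1 does not depend on $v^{*}$, the supremum over $v^{*}\in B_{E^{*}}$ can be taken on the right-hand side before the outer expectation over $X_{1},\ldots,X_{n}$.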

1.2. Max-sliced 2-Wasserstein distance

We now turn to the problem of estimating the expected max-sliced 2-Wasserstein distance $\mathbb{E}W_{2,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$.

Unlike in Theorem 1.1, for the max-sliced 2-Wasserstein distance, the convergence rate is not always the same. Even in dimension one, for certain log-concave measures $\mu$ on $\mathbb{R}$, for $p\geq 1$, the quantity $\mathbb{E}W_{p}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ is of order $\frac{1}{\sqrt{n}}$ [6]. However, if $\mu$ is uniformly distributed on two points $1,-1\in\mathbb{R}$, one can easily see that $\mathbb{E}W_{p}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ is of order $n^{-1/(2p)}$, which is much slower than $\frac{1}{\sqrt{n}}$ when $p>1$.
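The $n^{-1/(2p)}$ rate for the two-point measure can be checked by hand. The following sketch uses our own auxiliary notation ($K$ and $\Delta$ do not appear in the text) and gives the upper bound explicitly; the matching lower bound is only indicated.

```latex
% Notation (ours): K = #{ i : X_i = 1 } ~ Binomial(n, 1/2),  \Delta = K/n - 1/2.
% The empirical measure is (1/2 + \Delta)\delta_{1} + (1/2 - \Delta)\delta_{-1}.
% An optimal coupling moves exactly mass |\Delta| across the distance |1 - (-1)| = 2, so
W_{p}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)^{p} = 2^{p}\,|\Delta|
\quad\Longrightarrow\quad
W_{p}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right) = 2\,|\Delta|^{\frac{1}{p}} .

% Upper bound: t -> t^{1/(2p)} is concave and E[\Delta^2] = Var(K)/n^2 = 1/(4n), so by Jensen
\mathbb{E}\,W_{p}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)
  = 2\,\mathbb{E}\left(\Delta^{2}\right)^{\frac{1}{2p}}
  \leq 2\left(\mathbb{E}\,\Delta^{2}\right)^{\frac{1}{2p}}
  = 2\,(4n)^{-\frac{1}{2p}} .
```

For the lower bound, the central limit theorem shows that $|\Delta|$ is at least a constant multiple of $n^{-1/2}$ with probability bounded away from zero, which gives $\mathbb{E}W_{p}\geq c\,n^{-1/(2p)}$ for large $n$.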

Similarly, for the max-sliced 2-Wasserstein distance, under certain regularity assumptions on $\mu$ (e.g., $\mu$ is log-concave [4, 25]), we have $\mathbb{E}W_{2}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})=O(\frac{1}{\sqrt{n}})$ or $O(\frac{\log n}{\sqrt{n}})$. (We ignore the dimension factors for the moment.) On the other hand, even if $\mu$ is isotropic and its marginal distributions have uniformly bounded 4th moments, the quantity $\mathbb{E}W_{2}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ could already be as large as $c\cdot(d/n)^{\frac{1}{4}}$ for some universal constant $c>0$ [4, Example 3.3].

Thus, in the most general setting (i.e., no regularity assumptions on $\mu$), the best convergence rate in $n$ for the max-sliced 2-Wasserstein distance we can hope for is $n^{-1/4}$.

Corollary 1.3.

Let $r>0$. Suppose that $\mu$ is a probability measure on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random vectors in $\mathbb{R}^{d}$ sampled according to $\mu$. Then for all $p\geq 1$,

\[
\mathbb{E}W_{p,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\leq C\cdot r\cdot n^{-1/(2p)},
\]

where $C\geq 1$ is a universal constant.

Proof.

For two probability measures $\mu_{1},\mu_{2}$ on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$, it is easy to see that $W_{p,1}(\mu_{1},\mu_{2})^{p}\leq(2r)^{p-1}\cdot W_{1,1}(\mu_{1},\mu_{2})$. Thus by Theorem 1.1, the result follows. ∎
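The inequality called "easy to see" in the proof, and the way it combines with Theorem 1.1, can be spelled out as follows. This is a sketch under two assumptions that are not restated in this part of the text: that $W_{p,1}$ is the supremum over unit vectors $v$ of the $p$-Wasserstein distance between the one-dimensional projections, and that Theorem 1.1 yields a bound of the form $\mathbb{E}W_{1,1}\leq C\,r\,n^{-1/2}$. The symbols $\mu_{j}^{v}$ and $\pi$ are ours.

```latex
% \mu_j^v := law of <x, v> under \mu_j, for a unit vector v (notation ours).
% Both projections are supported on [-r, r], so |s - t| <= 2r on the supports and
|s-t|^{p} \leq (2r)^{p-1}\,|s-t| .

% Integrate against an optimal W_1 coupling \pi of (\mu_1^v, \mu_2^v):
W_{p}(\mu_{1}^{v},\mu_{2}^{v})^{p}
  \leq \int |s-t|^{p}\,d\pi(s,t)
  \leq (2r)^{p-1}\,W_{1}(\mu_{1}^{v},\mu_{2}^{v}) .

% Taking the supremum over unit vectors v gives
W_{p,1}(\mu_{1},\mu_{2})^{p} \leq (2r)^{p-1}\,W_{1,1}(\mu_{1},\mu_{2}) .

% Jensen's inequality (t -> t^{1/p} is concave), then the assumed form of Theorem 1.1:
\mathbb{E}\,W_{p,1}
  \leq (2r)^{\frac{p-1}{p}}\left(\mathbb{E}\,W_{1,1}\right)^{\frac{1}{p}}
  \leq (2r)^{\frac{p-1}{p}}\left(C\,r\,n^{-\frac{1}{2}}\right)^{\frac{1}{p}}
  = 2^{\frac{p-1}{p}}\,C^{\frac{1}{p}}\,r\,n^{-\frac{1}{2p}}
  \leq 2C\,r\,n^{-\frac{1}{2p}} .
```

The last step uses $C\geq 1$ and $p\geq 1$, so the constant in Corollary 1.3 can be taken independent of $p$.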

Corollary 1.3 removes the dimension factor in the estimate of $\mathbb{E}W_{p,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)$ in [27, Theorem 2].

The upper bound $C\cdot r\cdot n^{-1/(2p)}$ in Corollary 1.3 is attained, up to the constant $C$, when $\mu=\frac{1}{2}\delta_{y_{0}}+\frac{1}{2}\delta_{-y_{0}}$ is uniformly distributed on two points $y_{0},-y_{0}\in\mathbb{R}^{d}$ with $y_{0}$ being any vector with $\|y_{0}\|_{2}=r$.

While the bound $C\cdot r\cdot n^{-1/(2p)}$ in Corollary 1.3 is sharp in $n,r,p$, if one also has information on the covariance matrix of $\mu$, then one may be able to obtain a better bound that depends on the covariance matrix of $\mu$. Before discussing this further, we mention some simple connections between the max-sliced 2-Wasserstein distance and sample covariance matrices. The literature on sample covariance matrices gives us important intuition regarding the convergence in the max-sliced 2-Wasserstein distance.

If $\mu$ is a probability measure on $\mathbb{R}^{d}$ with $\int_{\mathbb{R}^{d}}\|x\|_{2}^{2}\,d\mu(x)<\infty$, then the max-sliced 2-Wasserstein distance between $\mu$ and $\delta_{0}$ (the probability measure with an atom of mass $1$ at the origin) is equal to

\[
W_{2,1}(\mu,\delta_{0})=\sup_{\|v\|_{2}=1}\left(\int_{\mathbb{R}^{d}}|\langle x,v\rangle|^{2}\,d\mu(x)\right)^{\frac{1}{2}}=\sup_{\|v\|_{2}=1}\langle\Sigma v,v\rangle^{\frac{1}{2}}=\|\Sigma\|_{\mathrm{op}}^{\frac{1}{2}},
\]

where $\Sigma=\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)$ is a $d\times d$ matrix and $\|\,\|_{\mathrm{op}}$ denotes the operator norm. Thus, for $X_{1},\ldots,X_{n}$ in $\mathbb{R}^{d}$, we have

\[
\begin{aligned}
W_{2,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right) &\geq W_{2,1}\left(\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}},\delta_{0}\right)-W_{2,1}(\mu,\delta_{0})\\
&= \left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}^{\frac{1}{2}}-\|\Sigma\|_{\mathrm{op}}^{\frac{1}{2}}.
\end{aligned}
\]

So in order for $W_{2,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)$ to be small, it is necessary that $\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}$ not be much larger than $\|\Sigma\|_{\mathrm{op}}$.
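The two facts behind this lower bound, the formula for the distance to $\delta_{0}$ and the triangle inequality, can be made explicit. The sketch below assumes that $W_{2,1}$ is the supremum over unit vectors $v$ of the 2-Wasserstein distance between one-dimensional projections; the symbols $\nu$, $\nu^{v}$, $\Sigma_{\nu}$ and $\hat{\mu}_{n}$ are our own shorthand.

```latex
% \nu^v := law of <x, v> under \nu. The only coupling of \nu^v with \delta_0 is the
% product measure, so for every unit vector v
W_{2}(\nu^{v},\delta_{0})^{2}
  = \int_{\mathbb{R}^{d}}\langle x,v\rangle^{2}\,d\nu(x)
  = \langle\Sigma_{\nu}v,v\rangle ,
\qquad
\Sigma_{\nu} := \int_{\mathbb{R}^{d}}xx^{T}\,d\nu(x) .

% With \nu = \hat{\mu}_n := (1/n) \sum_i \delta_{X_i} this gives
\Sigma_{\hat{\mu}_{n}} = \frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T},
\qquad
W_{2,1}(\hat{\mu}_{n},\delta_{0}) = \left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}^{\frac{1}{2}} .

% W_{2,1} is a supremum of quantities that each satisfy the triangle inequality, hence
W_{2,1}(\hat{\mu}_{n},\delta_{0})
  \leq W_{2,1}(\hat{\mu}_{n},\mu) + W_{2,1}(\mu,\delta_{0}) ,
% and rearranging yields the stated lower bound on W_{2,1}(\mu, \hat{\mu}_n).
```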

Given that $W_{2,1}(\mu,\delta_{0})=\|\Sigma\|_{\mathrm{op}}^{\frac{1}{2}}$, the quantity $W_{2,1}(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}})$ should be assessed relative to $\|\Sigma\|_{\mathrm{op}}^{\frac{1}{2}}$.

Problem 2.

Suppose that $\mu$ is a probability measure on $\mathbb{R}^{d}$. Let $\Sigma=\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)$. How many i.i.d. samples $X_{1},\ldots,X_{n}$ of $\mu$ are needed to make $\displaystyle\|\Sigma\|_{\mathrm{op}}^{-\frac{1}{2}}\cdot\mathbb{E}W_{2,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)$ small?

In [4, Theorem 1.3], it was shown that if $\mu$ is centered and isotropic (i.e., $\Sigma=I$) with $\sup_{v\in\mathbb{R}^{d},\,\|v\|_{2}=1}(\mathbb{E}|\langle X,v\rangle|^{q})^{\frac{1}{q}}\leq L$ where $q>4$, then with high probability,

\[
W_{2,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\leq C(q,L)\left[\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}-I\right\|_{\mathrm{op}}^{\frac{1}{2}}+\left(\frac{d}{n}\right)^{\frac{1}{4}}\right],
\tag{1.4}
\]

where $C(q,L)\geq 1$ is a constant that depends only on $q$ and $L$. By [35], the sample covariance error term $\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}-I\right\|_{\mathrm{op}}^{\frac{1}{2}}$ is of order $\left(\frac{d}{n}\right)^{\frac{1}{4}}$ with high probability. Thus, under the assumptions mentioned above, $n=O(d)$ suffices in Problem 2.

The literature on sample covariance matrices (see e.g., [32, 38, 36]) suggests that for a general isotropic probability measure $\mu$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq C\sqrt{d}\}$ but without the assumption $\sup_{v\in\mathbb{R}^{d},\,\|v\|_{2}=1}(\mathbb{E}|\langle X,v\rangle|^{q})^{\frac{1}{q}}\leq L$, the number of samples $n=O(d\log d)$ should suffice in Problem 2. More generally, if $\mu$ is supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$ but not necessarily isotropic, $n=O(\frac{r^{2}}{\|\Sigma\|_{\mathrm{op}}}\log\frac{r^{2}}{\|\Sigma\|_{\mathrm{op}}})$ should suffice in Problem 2.

In this paper, we show that these are indeed true for symmetric $\mu$ and its symmetrized empirical distribution. A probability measure $\mu$ on $\mathbb{R}^{d}$ is symmetric if $\mu(A)=\mu(-A)$ for all measurable $A\subset\mathbb{R}^{d}$.

Theorem 1.4.

Let $r>0$. Suppose that $\mu$ is a symmetric probability measure on $\mathbb{R}^{d}$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random vectors sampled according to $\mu$. Then

\[
\mathbb{E}\left[W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}\right]\leq C\|\Sigma\|_{\mathrm{op}}\left(\frac{r^{2}\ln n}{n\|\Sigma\|_{\mathrm{op}}}+\sqrt{\frac{r^{2}\ln n}{n\|\Sigma\|_{\mathrm{op}}}}\,\right),
\]

where $\displaystyle\Sigma=\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)$ and $C\geq 1$ is a universal constant.
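To see how Theorem 1.4 translates into the sample sizes predicted for Problem 2, one can substitute $n=A\rho\ln\rho$ into the bound. The following is a sketch with our own shorthand ($\rho$, $t$, $A$ and $\tilde{\mu}_{n}$ are not notation from the text), and it assumes $\rho\geq e$ and $A\geq 1$ to keep the logarithms simple.

```latex
% Shorthand (ours): \rho := r^2 / \|\Sigma\|_op,  t := \rho \ln n / n,
% \tilde{\mu}_n := (1/(2n)) \sum_i (\delta_{X_i} + \delta_{-X_i}).
% Theorem 1.4 and Jensen's inequality give
\|\Sigma\|_{\mathrm{op}}^{-\frac{1}{2}}\;\mathbb{E}\,W_{2,1}(\mu,\tilde{\mu}_{n})
  \leq \left(\frac{\mathbb{E}\left[W_{2,1}(\mu,\tilde{\mu}_{n})^{2}\right]}{\|\Sigma\|_{\mathrm{op}}}\right)^{\frac{1}{2}}
  \leq \left(C\,(t+\sqrt{t}\,)\right)^{\frac{1}{2}} .

% Take n = A \rho \ln\rho with A >= 1 and \rho >= e. Then \ln\ln\rho <= \ln\rho, so
\ln n = \ln A + \ln\rho + \ln\ln\rho \leq \ln A + 2\ln\rho ,
\qquad
t = \frac{\ln n}{A\ln\rho} \leq \frac{\ln A + 2}{A} .

% Hence t -> 0 as A -> \infty, uniformly in \rho: n = O(\rho \log\rho) samples suffice.
% Isotropic case: \Sigma = I and r = C\sqrt{d} give \rho = C^2 d, i.e. n = O(d \log d).
```

The estimate applies to the symmetrized empirical distribution of a symmetric $\mu$, as in the statement of the theorem.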

The $\ln n$ factors in Theorem 1.4 cannot always be removed. Indeed, consider the probability measure $\mu$ uniformly distributed on the $2d$ points $\pm\sqrt{d}\,e_{1},\ldots,\pm\sqrt{d}\,e_{d}$, where $\{e_{1},\ldots,e_{d}\}$ is the unit vector basis for $\mathbb{R}^{d}$. Then by (1.2), we have

\[
W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)\geq\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}^{\frac{1}{2}}-\|\Sigma\|^{\frac{1}{2}},
\]

where $\Sigma=\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)=I$. If we view $e_{1},\ldots,e_{d}$ as $d$ bins and each $X_{i}X_{i}^{T}$ as a ball thrown into a bin, then $\frac{1}{d}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|$ is the maximum number of balls in a bin after $n$ balls are thrown into $d$ bins. So by [30, Theorem 1], when $\frac{d}{\mathrm{polylog}(d)}\leq n\ll d\log d$,

𝔼1ni=1nXiXiTop12c(dnlogdlogdlogdn)12,𝔼superscriptsubscriptnorm1𝑛superscriptsubscript𝑖1𝑛subscript𝑋𝑖superscriptsubscript𝑋𝑖𝑇op12𝑐superscript𝑑𝑛𝑑𝑑𝑑𝑛12\mathbb{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}% ^{\frac{1}{2}}\geq c\left(\frac{d}{n}\cdot\frac{\log d}{\log\frac{d\log d}{n}}% \right)^{\frac{1}{2}},blackboard_E ∥ divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT roman_op end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ≥ italic_c ( divide start_ARG italic_d end_ARG start_ARG italic_n end_ARG ⋅ divide start_ARG roman_log italic_d end_ARG start_ARG roman_log divide start_ARG italic_d roman_log italic_d end_ARG start_ARG italic_n end_ARG end_ARG ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ,

where c>0𝑐0c>0italic_c > 0 is a universal constant. Thus, in this example, the lnn𝑛\ln nroman_ln italic_n factors in Theorem 1.4 cannot be removed.
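The balls-into-bins picture can be simulated directly. The following Python sketch is only an illustration: it assumes that each $X_{i}$ equals $\pm\sqrt{d}\,e_{j}$ with $j$ uniform in $\{1,\ldots,d\}$ (a model consistent with $\Sigma=I$ and with the identification above), and the function names `max_load` and `second_moment_op_norm` are hypothetical.

```python
import random


def max_load(n: int, d: int, seed: int = 0) -> int:
    """Throw n balls uniformly at random into d bins; return the fullest bin's count."""
    rng = random.Random(seed)
    counts = [0] * d
    for _ in range(n):
        counts[rng.randrange(d)] += 1
    return max(counts)


def second_moment_op_norm(n: int, d: int, seed: int = 0) -> float:
    """Operator norm of (1/n) * sum_i X_i X_i^T under the assumed model.

    Assumption: X_i = +/- sqrt(d) * e_j with j uniform, so X_i X_i^T = d * e_j e_j^T
    and the sum is diagonal with entries d * (number of balls in bin j).
    """
    return d * max_load(n, d, seed) / n


if __name__ == "__main__":
    n, d = 2000, 1000
    norm = second_moment_op_norm(n, d)
    # Right-hand side of the first display above, with ||Sigma|| = 1.
    print(norm ** 0.5 - 1.0)
```

Since the fullest bin always holds at least $n/d$ balls, the computed norm is at least $1$, so the printed lower bound is nonnegative.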

The following lower bound result shows that the upper bound in Theorem 1.4 is sharp for every covariance matrix $\Sigma$ up to the $\ln n$ factor.

Proposition 1.5.

Let $\Sigma$ be a $d\times d$ positive semidefinite matrix such that $\|\Sigma\|_{\mathrm{op}}\leq\frac{1}{2}\mathrm{Tr}(\Sigma)$. Then there exists a symmetric probability measure $\mu$ on $\mathbb{R}^{d}$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}^{2}=\mathrm{Tr}(\Sigma)\}$ such that $\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)=\Sigma$ and for every $n\in\mathbb{N}$,

\[
\mathbb{E}\left[W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}\right]\geq\frac{1}{16}\|\Sigma\|_{\mathrm{op}}\left(\frac{\mathrm{Tr}(\Sigma)}{n\|\Sigma\|_{\mathrm{op}}}+\sqrt{\frac{\mathrm{Tr}(\Sigma)}{n\|\Sigma\|_{\mathrm{op}}}}\,\right),
\]

where $X_{1},\ldots,X_{n}$ are i.i.d. random vectors sampled according to $\mu$.

1.3. Some definitions

Throughout this paper, unless specified otherwise, we always use the Euclidean metric $\|\,\|_{2}$ on $\mathbb{R}^{d}$. If $f:\Lambda\to\mathbb{R}$ is a bounded function, then $\|f\|_{\infty}:=\sup_{x\in\Lambda}|f(x)|$. A function $f:\mathbb{R}^{s}\to\mathbb{R}$ is $1$-Lipschitz if $|f(x)-f(y)|\leq\|x-y\|_{2}$ for all $x,y\in\mathbb{R}^{s}$. The operator norm (or equivalently the largest singular value) of a matrix $A$ is denoted by $\|A\|_{\mathrm{op}}$.

If $(T,\rho)$ is a metric space and $\epsilon>0$, then the covering number $N(T,\rho,\epsilon)$ is the smallest size of $S\subset T$ for which every element of $T$ has distance at most $\epsilon$ from an element of $S$. The packing number $N_{\mathrm{pack}}(T,\rho,\epsilon)$ is the largest size of $S\subset T$ for which all elements of $S$ have distance more than $\epsilon$ away from each other. We always have $N(T,\rho,\epsilon)\leq N_{\mathrm{pack}}(T,\rho,\epsilon)\leq N(T,\rho,\frac{\epsilon}{2})$.
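The first inequality between covering and packing numbers rests on the standard observation that a maximal $\epsilon$-separated set is automatically an $\epsilon$-cover. The Python sketch below illustrates this on a small subset of the real line with $\rho(x,y)=|x-y|$; the point set and the helper names `greedy_packing` and `is_cover` are made up for the example.

```python
def greedy_packing(points, eps):
    """Return a maximal subset whose elements are pairwise more than eps apart."""
    chosen = []
    for x in points:
        # Keep x only if it is farther than eps from every point kept so far.
        if all(abs(x - c) > eps for c in chosen):
            chosen.append(x)
    return chosen


def is_cover(centers, points, eps):
    """Check that every point lies within distance eps of some center."""
    return all(any(abs(x - c) <= eps for c in centers) for x in points)


points = [0.0, 0.3, 0.5, 1.1, 1.2, 2.0, 2.05, 3.4]
eps = 0.5
packing = greedy_packing(points, eps)

# Every rejected point was within eps of a kept point, so the packing also covers.
print(packing, is_cover(packing, points, eps))
```

Because the packing found here is also a cover, the smallest cover cannot be larger than the largest packing.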

If $E$ is a Banach space, then the unit ball $\{x\in E:\,\|x\|\leq 1\}$ of $E$ is denoted by $B_{E}$. The dual space of all bounded linear functionals $v^{*}:E\to\mathbb{R}$ is denoted by $E^{*}$.

Pushforward measure: If $\mu$ is a probability measure on a separable Banach space $E$ and $Q:E\to\mathbb{R}^{s}$ is a map, then $Q_{\#}\mu$ is the pushforward measure of $\mu$ by $Q$, i.e., if $X$ is a random element of $E$ with distribution $\mu$, then $Q(X)$ has distribution $Q_{\#}\mu$. In particular, $Q_{\#}\mu$ is a probability measure on $\mathbb{R}^{s}$.

Classical Wasserstein distance: If $\mu_{1}$ and $\mu_{2}$ are probability measures on $E$ and $p\geq 1$, then the $p$-Wasserstein distance between $\mu_{1}$ and $\mu_{2}$ is

\[
W_{p}(\mu_{1},\mu_{2})=\inf_{\gamma}\left(\int_{E\times E}\|x-y\|^{p}\,d\gamma(x,y)\right)^{\frac{1}{p}},
\]

where the infimum is over all distributions $\gamma$ on $E\times E$ with $\mu_{1}$ and $\mu_{2}$ being its marginal distributions for its first and second components.
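In general the infimum over couplings has no closed form, but in the special case $E=\mathbb{R}$ with two empirical measures having the same number of equally weighted atoms, it is a standard fact that matching the sorted samples is optimal. The Python sketch below uses only this special case; the function name `wasserstein_1d` is illustrative.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p between the uniform measures on xs and on ys (real numbers, equal counts).

    Relies on the one-dimensional fact that the monotone (sorted) matching is an
    optimal coupling for every p >= 1; it is not valid in higher dimensions.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two nonempty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)


# Shifting every atom by 1 moves the measure a distance of exactly 1.
print(wasserstein_1d([0.0, 1.0], [1.0, 2.0], p=2.0))
```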

Max-sliced and projection robust Wasserstein distances: If $\mu_{1}$ and $\mu_{2}$ are probability measures on $E$, $p\geq 1$ and $s\in\mathbb{N}$, then

\[
W_{p,s}(\mu_{1},\mu_{2})=\sup_{Q}W_{p}(Q_{\#}\mu_{1},Q_{\#}\mu_{2}),
\]

where the supremum is over all $Q:E\to\mathbb{R}^{s}$ of the form $Qx=(v_{1}^{*}(x),\ldots,v_{s}^{*}(x))$, for $x\in E$, with $v_{1}^{*},\ldots,v_{s}^{*}$ in the unit ball $B_{E^{*}}$ of $E^{*}$. Here we use the Euclidean distance $\|\,\|_{2}$ on $\mathbb{R}^{s}$ to define the Wasserstein distance $W_{p}$ on the right hand side.
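For intuition, the case $s=1$ and $E=\mathbb{R}^{2}$ with the Euclidean norm can be approximated numerically: the dual unit ball is then the Euclidean unit disc, and by homogeneity it suffices to scan unit directions. The Python sketch below is a rough illustration under these assumptions, for empirical measures with equally many atoms; it returns a lower estimate of $W_{p,1}$ obtained from a finite grid of directions, and all names in it are hypothetical.

```python
import math


def wasserstein_1d(xs, ys, p):
    """W_p on the real line between two equal-size uniform samples (sorted matching)."""
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(xs)) ** (1.0 / p)


def max_sliced_wasserstein_2d(xs, ys, p=1.0, n_dirs=720):
    """Grid approximation (from below) of W_{p,1} for point clouds in the plane."""
    best = 0.0
    for k in range(n_dirs):
        # Directions u and -u give the same value, so angles in [0, pi) suffice.
        theta = math.pi * k / n_dirs
        u = (math.cos(theta), math.sin(theta))
        proj_x = [u[0] * a + u[1] * b for a, b in xs]
        proj_y = [u[0] * a + u[1] * b for a, b in ys]
        best = max(best, wasserstein_1d(proj_x, proj_y, p))
    return best


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
ys = [(a + 3.0, b + 4.0) for a, b in xs]  # translate by the vector (3, 4)
print(max_sliced_wasserstein_2d(xs, ys, p=2.0))
```

For a pure translation by a vector $v$, every one-dimensional projection is translated by $\langle u,v\rangle$, so the exact value is $\|v\|_{2}=5$ in this example and the grid estimate should be slightly below it.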

When $p=1$, we have

\[
W_{1,s}(\mu_{1},\mu_{2})=\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\text{ is 1-Lipschitz}}}\left|\int_{E}f(v_{1}^{*}(x),\ldots,v_{s}^{*}(x))\,d\mu_{1}(x)-\int_{E}f(v_{1}^{*}(x),\ldots,v_{s}^{*}(x))\,d\mu_{2}(x)\right|,
\]

where the supremum is over all $v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}$ and all the $1$-Lipschitz functions $f:\mathbb{R}^{s}\to\mathbb{R}$.

1.4. Organization of this paper

In the rest of this paper, we prove the results stated in this introduction section.

In Section 2, we prove Theorem 1.1 and Theorem 1.2. The upper bound parts of Theorem 1.1 and Theorem 1.2 are contained in Corollary 2.8 and Corollary 2.9, respectively. The lower bound parts of Theorem 1.1 and Theorem 1.2 are stated as Corollary 2.11 and Proposition 2.10, respectively.

In Section 3, we prove Theorem 1.4 and Proposition 1.5. Theorem 1.4 is restated as Theorem 3.3. Proposition 1.5 is restated as Proposition 3.4.

2. Max-sliced 1-Wasserstein distance

In this section, we first derive a general upper bound, Theorem 2.7 (which we obtain in the greater generality of $W_{1,s}$), for the expected max-sliced $1$-Wasserstein distance between a probability measure on a Banach space and its empirical distribution. From this result, Corollary 2.8 and Corollary 2.9 follow as consequences. These give the upper bound parts of Theorem 1.1 and Theorem 1.2, respectively. Lower bound results are proved at the end of this section.

To prove Theorem 2.7, we use Gaussian symmetrization to reduce the problem of bounding the expected max-sliced 1-Wasserstein distance to bounding the expected supremum of a Gaussian process. To bound this expected supremum, we use Talagrand’s majorizing measure theorem. We bound the metric induced by the Gaussian process by the product metric of (1) a metric on some function space (which is locally an $\|\,\|_{\infty}$ metric) and (2) a Hilbert space metric. Since Talagrand’s $\gamma_{2}$ quantity of the product metric space is bounded by 3 times the sum of the $\gamma_{2}$ quantities of the two metric spaces, it suffices to bound $\gamma_{2}$ for each of these two metric spaces. To bound $\gamma_{2}$ for the first metric space, we use Dudley’s entropy integral. As for the second metric space, since it is a Hilbert space metric, its $\gamma_{2}$ is equivalent to the supremum of some Gaussian process which, in fact, coincides with the norm of a Gaussian sum.

Throughout this section,

\[
N_{k}=\begin{cases}2^{2^{k}},&k\geq 1\\ 1,&k=0\end{cases}.
\]

The following notion was introduced by Talagrand [34] (see also [40, Chapter 8] and [37, Chapter 6]). For a given metric space $(T,\rho)$, define

\[
\gamma_{2}(T,\rho)=\inf_{\mathrm{admissible}\,T_{0},T_{1},\ldots}\;\sup_{t\in T}\sum_{k=0}^{\infty}2^{\frac{k}{2}}\rho(t,T_{k}),\tag{2.1}
\]

where admissible means that $T_{0},T_{1},\ldots\subset T$ with $|T_{k}|\leq N_{k}$ for all $k\geq 0$. Also $\rho(t,T_{k})=\inf_{t_{k}\in T_{k}}\rho(t,t_{k})$.
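Any single admissible sequence gives an upper bound on $\gamma_{2}(T,\rho)$, namely the value of the supremum in (2.1) for that sequence. The Python sketch below evaluates this quantity for a five-point subset of the real line with $\rho(x,y)=|x-y|$. The point set, the chosen levels and the function name `chaining_sum` are illustrative assumptions, and the sequence is taken to be constant (equal to the whole set) after the last listed level.

```python
def N(k: int) -> int:
    """Cardinality cap from the text: N_0 = 1 and N_k = 2^(2^k) for k >= 1."""
    return 1 if k == 0 else 2 ** (2 ** k)


def chaining_sum(points, levels):
    """sup_t sum_k 2^(k/2) * rho(t, T_k) for rho(x, y) = |x - y|.

    `levels` lists T_0, ..., T_m. The last level must be the whole set, so that
    repeating it for every k > m adds nothing to the sum.
    """
    for k, level in enumerate(levels):
        if not level or len(level) > N(k) or not set(level) <= set(points):
            raise ValueError(f"level {k} is not admissible")
    if set(levels[-1]) != set(points):
        raise ValueError("the last level must be the whole set")
    return max(
        sum(2 ** (k / 2) * min(abs(t - c) for c in level)
            for k, level in enumerate(levels))
        for t in points
    )


points = [0.0, 1.0, 2.0, 3.0, 4.0]
levels = [[2.0], [0.0, 1.0, 2.0, 3.0], points]  # sizes 1 <= N_0, 4 <= N_1, 5 <= N_2
bound = chaining_sum(points, levels)
print(bound)  # an upper bound on gamma_2, not necessarily its exact value
```

Here the worst point is $t=4$, which contributes $2$ at level $0$ and $\sqrt{2}$ at level $1$, so this particular sequence gives the bound $2+\sqrt{2}$.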

Talagrand’s majorizing measure theorem states that if $(X_{t})_{t\in T}$ is a mean zero Gaussian process, then letting $\rho(t,s)=(\mathbb{E}|X_{t}-X_{s}|^{2})^{\frac{1}{2}}$, we have

\[
c\gamma_{2}(T,\rho)\leq\mathbb{E}\sup_{t\in T}X_{t}\leq C\gamma_{2}(T,\rho),\tag{2.2}
\]

where $C,c>0$ are universal constants.

Lemma 2.1.

Let $(T,\rho_{T})$ and $(Z,\rho_{Z})$ be metric spaces. Define the metric $\rho_{T}\times\rho_{Z}$ on $T\times Z$ by

\[
(\rho_{T}\times\rho_{Z})((t_{1},z_{1}),(t_{2},z_{2}))=\rho_{T}(t_{1},t_{2})+\rho_{Z}(z_{1},z_{2}).
\]

Then

\[
\gamma_{2}(T\times Z,\rho_{T}\times\rho_{Z})\leq 3\gamma_{2}(T,\rho_{T})+3\gamma_{2}(Z,\rho_{Z}).
\]
Proof.

Fix $\epsilon>0$. Let $T_{0},T_{1},\ldots\subset T$ be an admissible sequence that almost attains the infimum in (2.1), i.e.,

\[
\sup_{t\in T}\sum_{k=0}^{\infty}2^{\frac{k}{2}}\rho_{T}(t,T_{k})\leq\gamma_{2}(T,\rho_{T})+\epsilon.
\]

Similarly, let $Z_{0},Z_{1},\ldots\subset Z$ be an admissible sequence such that

\[
\sup_{z\in Z}\sum_{k=0}^{\infty}2^{\frac{k}{2}}\rho_{Z}(z,Z_{k})\leq\gamma_{2}(Z,\rho_{Z})+\epsilon.
\]

For notational convenience, let $T_{-1}=T_{0}$ and $Z_{-1}=Z_{0}$.

Observe that the sequence $(T_{k-1}\times Z_{k-1})_{k\geq 0}$ is admissible, since $|T_{k-1}\times Z_{k-1}|\leq N_{k-1}^{2}=N_{k}$ for $k\geq 2$ and $|T_{0}\times Z_{0}|\leq 1$. For all $t\in T$ and $z\in Z$, we have

\[
\begin{aligned}
\sum_{k=0}^{\infty}2^{\frac{k}{2}}(\rho_{T}\times\rho_{Z})((t,z),T_{k-1}\times Z_{k-1})
&=\sum_{k=0}^{\infty}2^{\frac{k}{2}}[\rho_{T}(t,T_{k-1})+\rho_{Z}(z,Z_{k-1})]\\
&=\sum_{k=-1}^{\infty}2^{\frac{k+1}{2}}\rho_{T}(t,T_{k})+\sum_{k=-1}^{\infty}2^{\frac{k+1}{2}}\rho_{Z}(z,Z_{k})\\
&\leq 3\sum_{k=0}^{\infty}2^{\frac{k}{2}}\rho_{T}(t,T_{k})+3\sum_{k=0}^{\infty}2^{\frac{k}{2}}\rho_{Z}(z,Z_{k})\\
&\leq 3[\gamma_{2}(T,\rho_{T})+\epsilon]+3[\gamma_{2}(Z,\rho_{Z})+\epsilon].
\end{aligned}
\]

Here the first inequality uses $T_{-1}=T_{0}$, $Z_{-1}=Z_{0}$ and $2^{\frac{k+1}{2}}=\sqrt{2}\cdot 2^{\frac{k}{2}}$, so that each sum starting at $k=-1$ is at most $1+\sqrt{2}\leq 3$ times the corresponding sum starting at $k=0$.

So

\[
\gamma_{2}(T\times Z,\rho_{T}\times\rho_{Z})\leq 3\gamma_{2}(T,\rho_{T})+3\gamma_{2}(Z,\rho_{Z})+6\epsilon.
\]

Since this holds for all $\epsilon>0$, the result follows. ∎
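The construction in this proof can be checked numerically on a toy example. The Python sketch below takes two small subsets of the real line with hand-picked admissible sequences (all of them assumptions made for illustration), builds the shifted product sequence $T_{k-1}\times Z_{k-1}$, and compares the resulting chaining sum with $3$ times the sum of the two individual chaining sums. It verifies the inequality for these particular sequences only and does not compute $\gamma_{2}$ itself.

```python
from itertools import product


def N(k: int) -> int:
    """Cardinality cap: N_0 = 1 and N_k = 2^(2^k) for k >= 1."""
    return 1 if k == 0 else 2 ** (2 ** k)


def chaining_sum(points, levels, dist):
    """sup_t sum_k 2^(k/2) * dist(t, T_k); the last level is assumed to be all of `points`."""
    return max(
        sum(2 ** (k / 2) * min(dist(t, c) for c in level)
            for k, level in enumerate(levels))
        for t in points
    )


def line(x, y):
    return abs(x - y)


def sum_metric(a, b):
    # The product metric of Lemma 2.1 for two copies of the real line.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


T = [0.0, 1.0, 2.0, 3.0, 4.0]
T_levels = [[2.0], [0.0, 1.0, 2.0, 3.0], T]
Z = [0.0, 5.0]
Z_levels = [[0.0], Z, Z]  # padded so both sequences have the same length

# Level k of the product sequence is T_{k-1} x Z_{k-1}, with T_{-1} = T_0 and Z_{-1} = Z_0.
shifted_T = [T_levels[0]] + T_levels
shifted_Z = [Z_levels[0]] + Z_levels
prod_levels = [list(product(a, b)) for a, b in zip(shifted_T, shifted_Z)]
prod_points = list(product(T, Z))

admissible = all(len(level) <= N(k) for k, level in enumerate(prod_levels))
lhs = chaining_sum(prod_points, prod_levels, sum_metric)
rhs = 3 * chaining_sum(T, T_levels, line) + 3 * chaining_sum(Z, Z_levels, line)
print(admissible, lhs, rhs)
```

In this example the product levels have sizes $1,1,8,10$, which respect the caps $N_{0},\ldots,N_{3}$, and the left-hand side $9+7\sqrt{2}$ is indeed below the right-hand side $21+3\sqrt{2}$.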

Lemma 2.2 ([34], page 12-13).

Let $(T,\rho_{T})$ be a metric space. Then

\[
\gamma_{2}(T,\rho_{T})\leq C\int_{0}^{\infty}\sqrt{\log N(T,\rho_{T},\epsilon)}\,d\epsilon,
\]

where $C\geq 1$ is a universal constant.

Next we bound the covering number of a set of 1-Lipschitz functions with respect to a certain norm (see Lemma 2.4 below). This will be needed when we apply Lemma 2.2 to bound the $\gamma_{2}$ quantity for that metric space of $1$-Lipschitz functions. Before we do that, we need a basic result.

In the sequel, readers who are interested in the max-sliced Wasserstein distances but not in the general projection robust Wasserstein distances may take $s=1$ in the rest of this paper. This will be enough to prove the main results mentioned in the introduction section.

Lemma 2.3.

Let $a>0$. Let

\[
D=\{h:[-a,a]^{s}\to\mathbb{R}\,|\,h\text{ is 1-Lipschitz and }h(0)=0\}.
\]

Then

\[
N(D,\|\,\|_{\infty},\epsilon)\leq\exp\left(\left(\frac{Ca\sqrt{s}}{\epsilon}\right)^{s}\right),
\]

for all $\epsilon>0$, where $C\geq 1$ is a universal constant.

Proof.

The map $h\mapsto(x\mapsto h(ax))$ defines an isometry from the metric space $(D,\|\,\|_{\infty})$ to the metric space $(\widetilde{D},\|\,\|_{\infty})$, where

$$\widetilde{D}=\{h:[-1,1]^{s}\to\mathbb{R}\,|\,h\text{ is }a\text{-Lipschitz and }h(0)=0\}.$$

So

$$N(D,\|\,\|_{\infty},\epsilon)=N(\widetilde{D},\|\,\|_{\infty},\epsilon).$$

Since

$$\begin{aligned}\widetilde{D}\subset\widehat{D}:=\{h:[-1,1]^{s}\to\mathbb{R}:\,&h(0)=0\text{ and }\\&|h(x)-h(y)|\leq a\sqrt{s}\max_{i}|x_{i}-y_{i}|\;\forall x,y\in[-1,1]^{s}\}\end{aligned}$$

and it is well known (see, e.g., [41, page 129]) that $N(\widehat{D},\|\,\|_{\infty},\epsilon)\leq\exp((\frac{Ca\sqrt{s}}{\epsilon})^{s})$, it follows that

$$N(\widetilde{D},\|\,\|_{\infty},\epsilon)\leq N(\widehat{D},\|\,\|_{\infty},\frac{\epsilon}{2})\leq\exp\left(\left(\frac{Ca\sqrt{s}}{\epsilon}\right)^{s}\right).$$

So the result follows. ∎
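The inclusion $\widetilde{D}\subset\widehat{D}$ in the proof above rests on the elementary comparison $\|x-y\|_{2}\leq\sqrt{s}\max_{i}|x_{i}-y_{i}|$. As a minimal numerical sanity check, and not as part of the original argument, the following Python sketch samples random pairs in $[-1,1]^{s}$ and records the largest observed ratio between the two sides. The function names `l2_dist`, `linf_dist` and `worst_ratio` are illustrative choices, not notation from the paper.

```python
import math
import random

def l2_dist(x, y):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def linf_dist(x, y):
    """Largest coordinate-wise gap max_i |x_i - y_i|."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

def worst_ratio(s, trials=2000, seed=0):
    """Largest observed ||x-y||_2 / (sqrt(s) * max_i |x_i-y_i|) over random pairs in [-1,1]^s."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(s)]
        y = [rng.uniform(-1, 1) for _ in range(s)]
        d_inf = linf_dist(x, y)
        if d_inf > 0:
            worst = max(worst, l2_dist(x, y) / (math.sqrt(s) * d_inf))
    return worst
```

The ratio never exceeds 1 (up to rounding), and it equals 1 for opposite corners of the cube, which suggests that the factor $\sqrt{s}$ in the definition of $\widehat{D}$ cannot be reduced in this comparison.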

Lemma 2.4.

Let $T$ be the set of all $1$-Lipschitz functions $f:\mathbb{R}^{s}\to\mathbb{R}$ with $f(0)=0$. For $0<\delta\leq 1$, define the norm $\|\,\|_{(\delta)}$ on $T$ by

$$\|f\|_{(\delta)}=\sup_{x\in\mathbb{R}^{s}}\frac{|f(x)|}{\|x\|_{2}^{1+\delta}+1}.\tag{2.3}$$

Then

$$\log N(T,\|\,\|_{(\delta)},\epsilon)\leq\left(\frac{C\sqrt{s}}{\epsilon}\right)^{s}\frac{1}{\delta},$$

for all $\epsilon>0$ and $0<\delta\leq 1$, where $C\geq 1$ is a universal constant.

Proof.

Set $\Omega_{0}=\{x\in\mathbb{R}^{s}:\,\|x\|_{2}\leq 1\}$, and for $j\in\mathbb{N}$, set

$$\Omega_{j}=\{x\in\mathbb{R}^{s}:\,2^{j-1}\leq\|x\|_{2}\leq 2^{j}\}\cup\{0\}.$$

Let

$$A_{j}=\{h:\Omega_{j}\to\mathbb{R}\,|\,h\text{ is 1-Lipschitz and }h(0)=0\}.$$

Define the following norm $\|\,\|_{(\delta),j}$ on $A_{j}$:

$$\|h\|_{(\delta),j}=\sup_{x\in\Omega_{j}}\frac{|h(x)|}{\|x\|_{2}^{1+\delta}+1}\quad\text{for }h\in A_{j}.$$

For every $f\in T$, observe that the restriction $f|_{\Omega_{j}}\in A_{j}$ and

$$\|f\|_{(\delta)}=\sup_{j\geq 0}\left\|f|_{\Omega_{j}}\right\|_{(\delta),j}.$$

Thus, $(T,\|\,\|_{(\delta)})$ can be identified as a metric subspace of the product metric space $\prod_{j=0}^{\infty}(A_{j},\|\,\|_{(\delta),j})$, equipped with the supremum of the coordinate metrics. So the $\epsilon$-covering number of $T$ is bounded by the $\frac{\epsilon}{2}$-covering number of $\prod_{j=0}^{\infty}A_{j}$. Hence

$$N(T,\|\,\|_{(\delta)},\epsilon)\leq\prod_{j=0}^{\infty}N(A_{j},\|\,\|_{(\delta),j},\frac{\epsilon}{2}).\tag{2.4}$$

Note that for all $j\geq 1+\frac{1}{\delta}\log_{2}\frac{1}{\epsilon}$ and $h\in A_{j}$, we have

$$\|h\|_{(\delta),j}=\sup_{x\in\Omega_{j}}\frac{|h(x)|}{\|x\|_{2}^{1+\delta}+1}\leq\sup_{x\in\Omega_{j}\backslash\{0\}}\frac{\|x\|_{2}}{\|x\|_{2}^{1+\delta}+1}\leq\sup_{x\in\Omega_{j}\backslash\{0\}}\|x\|_{2}^{-\delta}\leq 2^{-\delta(j-1)}\leq\epsilon.$$

So $N(A_{j},\|\,\|_{(\delta),j},\epsilon)=1$ for all $j\geq 1+\frac{1}{\delta}\log_{2}\frac{1}{\epsilon}$.
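To make the threshold on $j$ concrete, here is a small Python sketch, offered as an illustration under the assumption $0<\epsilon<1$ (so that the threshold is at least 1). It computes the first shell index $j$ with $j\geq 1+\frac{1}{\delta}\log_{2}\frac{1}{\epsilon}$ and evaluates the bound $2^{-\delta(j-1)}$ at that index. The names `first_trivial_shell` and `shell_norm_bound` are hypothetical helpers, not notation from the paper.

```python
import math

def first_trivial_shell(eps, delta):
    """Smallest integer j with j >= 1 + (1/delta) * log2(1/eps); assumes 0 < eps < 1."""
    return math.ceil(1 + math.log2(1 / eps) / delta)

def shell_norm_bound(j, delta):
    """The quantity 2^{-delta (j-1)}, which bounds ||h||_{(delta),j} on A_j for j >= 1."""
    return 2.0 ** (-delta * (j - 1))

# Example: how many shells carry a nontrivial covering number at scale eps.
for eps, delta in [(0.5, 1.0), (0.1, 0.5), (0.01, 0.1)]:
    j0 = first_trivial_shell(eps, delta)
    print(f"eps={eps}, delta={delta}: shells j >= {j0} need a single ball")
```

The threshold grows like $\frac{1}{\delta}\log_{2}\frac{1}{\epsilon}$, so smaller $\delta$ leaves more shells that contribute a nontrivial factor to the product in (2.4).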

For $j\geq 0$, let

$$D_{j}=\{h:[-2^{j},2^{j}]^{s}\to\mathbb{R}\,|\,h\text{ is 1-Lipschitz and }h(0)=0\}.\tag{2.5}$$

Note that $\Omega_{j}\subset[-2^{j},2^{j}]^{s}$. Every function $h\in A_{j}$ can be extended to a function $\tau(h)\in D_{j}$ (by Kirszbraun extension), where

$$[\tau(h)](x)=\inf_{y\in\Omega_{j}}(h(y)+\|x-y\|_{2})\quad\text{for }x\in[-2^{j},2^{j}]^{s}.$$
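The extension formula can be sketched numerically. The Python example below is a minimal sketch under stated assumptions: the infimum over $\Omega_{j}$ is replaced by a minimum over a finite random sample of $\Omega_{j}$ (with $j=1$, $s=2$), and the test function $h(y)=|y_{1}|$ is a hypothetical choice of a 1-Lipschitz function with $h(0)=0$. The helper names `sample_shell` and `extend` are illustrative only.

```python
import math
import random

def sample_shell(j, s, count, rng):
    """Origin plus `count` random points x with 2^(j-1) <= ||x||_2 <= 2^j (assumes j >= 1)."""
    origin = tuple(0.0 for _ in range(s))
    pts = [origin]
    while len(pts) < count + 1:
        x = tuple(rng.uniform(-2.0 ** j, 2.0 ** j) for _ in range(s))
        if 2.0 ** (j - 1) <= math.dist(x, origin) <= 2.0 ** j:
            pts.append(x)
    return pts

def extend(points, values):
    """Return the map x -> min over sampled y of (h(y) + ||x - y||_2)."""
    def tau_h(x):
        return min(v + math.dist(x, y) for y, v in zip(points, values))
    return tau_h

rng = random.Random(0)
j, s = 1, 2
points = sample_shell(j, s, 200, rng)    # finite stand-in for Omega_j (assumption)
values = [abs(y[0]) for y in points]     # hypothetical h(y) = |y_1|
tau_h = extend(points, values)           # defined on the whole cube [-2^j, 2^j]^s
```

On the sampled points the extension reproduces $h$, it vanishes at the origin, and, being a minimum of 1-Lipschitz functions, it is 1-Lipschitz on the whole cube.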

(Note that $\tau(0)$ is not the zero function, but $[\tau(h)](0)=0$.) For all $h_{1},h_{2}\in A_{j}$ with $j\geq 0$,

$$\begin{aligned}\|h_{1}-h_{2}\|_{(\delta),j}&=\sup_{x\in\Omega_{j}\backslash\{0\}}\frac{|h_{1}(x)-h_{2}(x)|}{\|x\|_{2}^{1+\delta}+1}\\&\leq\sup_{x\in\Omega_{j}\backslash\{0\}}\frac{|h_{1}(x)-h_{2}(x)|}{2^{(j-1)(1+\delta)}}\\&\leq 2^{(1-j)(1+\delta)}\sup_{x\in[-2^{j},2^{j}]^{s}}|[\tau(h_{1})](x)-[\tau(h_{2})](x)|\\&=2^{(1-j)(1+\delta)}\|\tau(h_{1})-\tau(h_{2})\|_{\infty},\end{aligned}$$

where $\|h\|_{\infty}=\sup_{x\in[-2^{j},2^{j}]^{s}}|h(x)|$ for $h\in D_{j}$. So for all $j\geq 0$,

$$\begin{aligned}N(A_{j},\|\,\|_{(\delta),j},\epsilon)&\leq N_{\mathrm{pack}}(A_{j},\|\,\|_{(\delta),j},\epsilon)\\&\leq N_{\mathrm{pack}}(D_{j},\|\,\|_{\infty},2^{(j-1)(1+\delta)}\epsilon)\\&\leq\exp\left(\left(\frac{C\cdot 2^{j}\sqrt{s}}{2^{(j-1)(1+\delta)}\epsilon}\right)^{s}\right)=\exp\left(\left(\frac{C\sqrt{s}}{\epsilon}\right)^{s}2^{s(1+\delta-j\delta)}\right),\end{aligned}$$

where the last inequality follows from Lemma 2.3. Therefore, by (2.4),

$$\log N(T,\|\,\|_{(\delta)},\epsilon)\leq\sum_{j=0}^{\infty}\left(\frac{C\sqrt{s}}{\epsilon}\right)^{s}2^{s(1+\delta-j\delta)}.$$

But

$$\sum_{j=0}^{\infty}2^{s(1+\delta-j\delta)}=\frac{2^{s(1+\delta)}}{1-2^{-s\delta}}\leq\frac{2^{2s}}{1-2^{-\delta}}\leq C\cdot\frac{2^{2s}}{\delta},$$

since $0<\delta\leq 1$. So the result follows. ∎
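The geometric sum at the end of the proof can be checked numerically. The sketch below is illustrative only. It assumes the explicit value $C=2$ for the last constant, which follows from the concavity bound $1-2^{-\delta}\geq\frac{\delta}{2}$ on $(0,1]$ and is not a value stated in the paper. The function names are hypothetical.

```python
def closed_form(s, delta):
    """Value 2^{s(1+delta)} / (1 - 2^{-s delta}) of the full geometric series."""
    return 2.0 ** (s * (1 + delta)) / (1 - 2.0 ** (-s * delta))

def partial_sum(s, delta, terms):
    """Sum of 2^{s(1+delta-j*delta)} over j = 0, ..., terms - 1."""
    return sum(2.0 ** (s * (1 + delta - j * delta)) for j in range(terms))

def final_bound(s, delta, C=2.0):
    """Right-hand side C * 2^{2s} / delta; C = 2 is an assumption made for this sketch."""
    return C * 2.0 ** (2 * s) / delta

# Compare the series with the 1/delta bound for a few parameter choices.
for s in (1, 2, 3):
    for delta in (1.0, 0.5, 0.1):
        print(s, delta, closed_form(s, delta), final_bound(s, delta))
```

With this choice of $C$ the two sides coincide at $s=1$, $\delta=1$, and the gap widens as $s$ grows, since the step $1-2^{-s\delta}\geq 1-2^{-\delta}$ discards the dependence on $s$ in the denominator.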

The following result is the main lemma of this section. We bound the expected supremum of the Gaussian process that arises when we use Gaussian symmetrization to prove Theorem 2.7. The key ingredient in proving this lemma is Talagrand’s majorizing measure theorem.

Lemma 2.5.

Let $0<\delta\leq 1$. Suppose that $E$ is a Banach space with separable dual $E^{*}$ and $x_{1},\ldots,x_{n}\in E$. Let $g_{1},\ldots,g_{n}$ be i.i.d. standard Gaussian random variables. Let $T$ be the set of all $1$-Lipschitz functions $f:\mathbb{R}^{s}\to\mathbb{R}$ with $f(0)=0$. Then

$$\begin{aligned}&\mathbb{E}\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\in T}}\left|\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\right|\\&\leq\frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|+\frac{CM\sqrt{s}}{\sqrt{n}}\cdot\begin{cases}(\delta n)^{-\frac{1}{2}},&s=1\\(\ln(\delta n+2))\cdot(\delta n)^{-\frac{1}{2}},&s=2\\(\delta n)^{-\frac{1}{s}},&s\geq 3\end{cases},\end{aligned}$$

where

$$M=\sqrt{2}\left(n+s^{1+\delta}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}\right)^{\frac{1}{2}},\tag{2.6}$$

and $B_{E^{*}}=\{v^{*}\in E^{*}:\,\|v^{*}\|\leq 1\}$.

Proof.

Let $Z=\{(v_{1}^{*},\ldots,v_{s}^{*}):\,v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\}$. Define the Gaussian process $(X_{f,z})_{(f,z)\in T\times Z}$ as follows. If $f\in T$ and $z=(v_{1}^{*},\ldots,v_{s}^{*})\in Z$, then

$$X_{f,z}=\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i})).$$

Recall that $\|\,\|_{(\delta)}$ is defined in (2.3). For $f,h\in T$ and $(v_{1}^{*},\ldots,v_{s}^{*})\in Z$, we have

$$\|\{f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}-\{h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}\|_{2}\tag{2.7}$$
$$=\left(\sum_{i=1}^{n}|f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))-h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))|^{2}\right)^{\frac{1}{2}}$$
$$\leq\|f-h\|_{(\delta)}\left(\sum_{i=1}^{n}\left[1+\|(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\|_{2}^{1+\delta}\right]^{2}\right)^{\frac{1}{2}}$$
$$\leq\|f-h\|_{(\delta)}\left(\sum_{i=1}^{n}2\left[1+\|(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\|_{2}^{2+2\delta}\right]\right)^{\frac{1}{2}}$$
$$\leq\|f-h\|_{(\delta)}\sqrt{2}\left(\sum_{i=1}^{n}\left[1+s^{\delta}(|v_{1}^{*}(x_{i})|^{2+2\delta}+\ldots+|v_{s}^{*}(x_{i})|^{2+2\delta})\right]\right)^{\frac{1}{2}}$$
\displaystyle\leq fh(δ)2(n+s1+δsupvBEi=1n|v(xi)|2+2δ)12subscriptnorm𝑓𝛿2superscript𝑛superscript𝑠1𝛿subscriptsupremumsuperscript𝑣subscript𝐵superscript𝐸superscriptsubscript𝑖1𝑛superscriptsuperscript𝑣subscript𝑥𝑖22𝛿12\displaystyle\|f-h\|_{(\delta)}\sqrt{2}\left(n+s^{1+\delta}\sup_{v^{*}\in B_{E% ^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}\right)^{\frac{1}{2}}∥ italic_f - italic_h ∥ start_POSTSUBSCRIPT ( italic_δ ) end_POSTSUBSCRIPT square-root start_ARG 2 end_ARG ( italic_n + italic_s start_POSTSUPERSCRIPT 1 + italic_δ end_POSTSUPERSCRIPT roman_sup start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_E start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | start_POSTSUPERSCRIPT 2 + 2 italic_δ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT
=\displaystyle== Mfh(δ),𝑀subscriptnorm𝑓𝛿\displaystyle M\|f-h\|_{(\delta)},italic_M ∥ italic_f - italic_h ∥ start_POSTSUBSCRIPT ( italic_δ ) end_POSTSUBSCRIPT ,

where $M>0$ is defined in (2.6).
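
For convenience, the elementary inequalities behind the chain ending at $M\|f-h\|_{(\delta)}$ can be spelled out as follows. This is only a sketch: $y=(y_{1},\ldots,y_{s})\in\mathbb{R}^{s}$ and $a\geq 0$ are auxiliary symbols introduced here for illustration, and $\delta\geq 0$ is assumed. The step replacing $[1+\|\cdot\|_{2}^{1+\delta}]^{2}$ by $2[1+\|\cdot\|_{2}^{2+2\delta}]$ uses

\[
(1+a)^{2}\leq 2(1+a^{2})\quad\text{with }a=\|y\|_{2}^{1+\delta}.
\]

The step introducing the factor $s^{\delta}$ uses the convexity of $t\mapsto t^{1+\delta}$ on $[0,\infty)$:

\[
\|y\|_{2}^{2+2\delta}=\left(\sum_{j=1}^{s}|y_{j}|^{2}\right)^{1+\delta}\leq s^{\delta}\sum_{j=1}^{s}|y_{j}|^{2+2\delta}.
\]

Finally, the step producing $s^{1+\delta}$ bounds each of the $s$ coordinate sums by the supremum over $B_{E^{*}}$:

\[
\sum_{i=1}^{n}\sum_{j=1}^{s}|v_{j}^{*}(x_{i})|^{2+2\delta}\leq s\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}.
\]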

Fix $b>0$. Let $T^{(b)}\subset T$ be a $b$-covering of $T$ with respect to $\|\,\|_{(\delta)}$ that has the smallest size, i.e., $|T^{(b)}|=N(T,\|\,\|_{(\delta)},b)$. For every $f\in T$, there exists $h\in T^{(b)}$ such that $\|f-h\|_{(\delta)}\leq b$ so by (2.7),

\[
\|\{f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}-\{h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}\|_{2}\leq bM,
\]

for all $v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}$. So

\[
\begin{aligned}
&\sup_{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}}\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\\
&\leq \sup_{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}}\frac{1}{n}\sum_{i=1}^{n}g_{i}h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))+\frac{1}{n}\|(g_{1},\ldots,g_{n})\|_{2}\cdot bM.
\end{aligned}
\]
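
The preceding bound is an application of the Cauchy-Schwarz inequality. As a sketch, write $a_{i}=f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))$ and $c_{i}=h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))$; this shorthand is introduced here only for illustration. Then, for fixed $v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}$,

\[
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}g_{i}a_{i}
&= \frac{1}{n}\sum_{i=1}^{n}g_{i}c_{i}+\frac{1}{n}\sum_{i=1}^{n}g_{i}(a_{i}-c_{i})\\
&\leq \frac{1}{n}\sum_{i=1}^{n}g_{i}c_{i}+\frac{1}{n}\|(g_{1},\ldots,g_{n})\|_{2}\left(\sum_{i=1}^{n}|a_{i}-c_{i}|^{2}\right)^{\frac{1}{2}}\\
&\leq \frac{1}{n}\sum_{i=1}^{n}g_{i}c_{i}+\frac{1}{n}\|(g_{1},\ldots,g_{n})\|_{2}\cdot bM,
\end{aligned}
\]

and taking the supremum over $v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}$, first on the right-hand side and then on the left-hand side, gives the stated inequality.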

So since $T=-T$, we have

\[
\begin{aligned}
&\mathbb{E}\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\in T}}\left|\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\right|\\
&= \mathbb{E}\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\in T}}\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\\
&\leq \mathbb{E}\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ h\in T^{(b)}}}\frac{1}{n}\sum_{i=1}^{n}g_{i}h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))+\mathbb{E}\frac{1}{n}\|(g_{1},\ldots,g_{n})\|_{2}\cdot bM\\
&\leq \mathbb{E}\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ h\in T^{(b)}}}\frac{1}{n}\sum_{i=1}^{n}g_{i}h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))+\frac{bM}{\sqrt{n}}\\
&= \frac{1}{n}\mathbb{E}\sup_{(h,z)\in T^{(b)}\times Z}X_{h,z}+\frac{bM}{\sqrt{n}},
\end{aligned}
\tag{2.8}
\]

where $X_{h,z}$ is defined at the beginning of this proof.
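
Two steps in (2.8) can be made explicit. The sketch below assumes, as the term $bM/\sqrt{n}$ indicates, that $g_{1},\ldots,g_{n}$ are i.i.d. standard Gaussian random variables. The first equality holds because $T=-T$: replacing $f$ by $-f$ changes the sign of the sum, so the supremum of the absolute value coincides with the supremum of the sum itself. The second inequality follows from Jensen's inequality,

\[
\mathbb{E}\|(g_{1},\ldots,g_{n})\|_{2}\leq\left(\mathbb{E}\sum_{i=1}^{n}g_{i}^{2}\right)^{\frac{1}{2}}=\sqrt{n},
\qquad\text{so}\qquad
\mathbb{E}\frac{1}{n}\|(g_{1},\ldots,g_{n})\|_{2}\cdot bM\leq\frac{bM}{\sqrt{n}}.
\]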

For $f,h\in T$ and $z_{1}=(v_{1}^{*},\ldots,v_{s}^{*})\in Z$, $z_{2}=(w_{1}^{*},\ldots,w_{s}^{*})\in Z$, we have

\[
\begin{aligned}
&\left(\mathbb{E}|X_{f,z_{1}}-X_{h,z_{2}}|^{2}\right)^{\frac{1}{2}}\\
&= \|\{f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}-\{h(w_{1}^{*}(x_{i}),\ldots,w_{s}^{*}(x_{i}))\}_{1\leq i\leq n}\|_{2}\\
&\leq \|\{f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}-\{h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}\|_{2}\\
&\quad+\|\{h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\}_{1\leq i\leq n}-\{h(w_{1}^{*}(x_{i}),\ldots,w_{s}^{*}(x_{i}))\}_{1\leq i\leq n}\|_{2}\\
&\leq M\|f-h\|_{(\delta)}+\left(\sum_{i=1}^{n}|h(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))-h(w_{1}^{*}(x_{i}),\ldots,w_{s}^{*}(x_{i}))|^{2}\right)^{\frac{1}{2}}\\
&\leq M\|f-h\|_{(\delta)}+\left(\sum_{i=1}^{n}\sum_{j=1}^{s}|v_{j}^{*}(x_{i})-w_{j}^{*}(x_{i})|^{2}\right)^{\frac{1}{2}},
\end{aligned}
\tag{2.9}
\]

where the second inequality follows from (2.7) and the last inequality follows from $h$ being 1-Lipschitz. Recall that $M>0$ is defined in (2.6) and $\|\,\|_{(\delta)}$ is defined in (2.3). Consider the metric $\rho_{T}(f,h)=M\|f-h\|_{(\delta)}$ on $T$. Also, define the metric $\rho_{Z}$ on $Z$ by

\[
\rho_{Z}((v_{1}^{*},\ldots,v_{s}^{*}),(w_{1}^{*},\ldots,w_{s}^{*}))=\left(\sum_{i=1}^{n}\sum_{j=1}^{s}|v_{j}^{*}(x_{i})-w_{j}^{*}(x_{i})|^{2}\right)^{\frac{1}{2}}.
\]

Then by (2.9), we have

\[
\left(\mathbb{E}|X_{f,z_{1}}-X_{h,z_{2}}|^{2}\right)^{\frac{1}{2}}\leq\rho_{T}(f,h)+\rho_{Z}(z_{1},z_{2}),
\]

for all $(f,z_{1}),(h,z_{2})\in T\times Z$. So by (2.2) and Lemma 2.1,

\[
\mathbb{E}\sup_{(f,z)\in T^{(b)}\times Z}X_{f,z}\leq C\gamma_{2}(T^{(b)}\times Z,\rho_{T}\times\rho_{Z})\leq C\gamma_{2}(T^{(b)},\rho_{T})+C\gamma_{2}(Z,\rho_{Z}).
\tag{2.10}
\]

Let’s bound each of these two terms. For the first term, by Lemma 2.2,

\[
\begin{aligned}
&\gamma_{2}(T^{(b)},\rho_{T})\\
&\leq C\int_{0}^{\infty}\sqrt{\log N(T^{(b)},\rho_{T},\epsilon)}\,d\epsilon\\
&= C\int_{0}^{\infty}\sqrt{\log N(T^{(b)},\|\,\|_{(\delta)},\frac{\epsilon}{M})}\,d\epsilon\\
&= CM\int_{0}^{\infty}\sqrt{\log N(T^{(b)},\|\,\|_{(\delta)},\epsilon)}\,d\epsilon\\
&\leq CM\left(\int_{b}^{\infty}\sqrt{\log N(T^{(b)},\|\,\|_{(\delta)},\epsilon)}\,d\epsilon+b\sqrt{\log|T^{(b)}|}\right)\\
&\leq CM\left(\int_{b}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\frac{\epsilon}{2})}\,d\epsilon+b\sqrt{\log|T^{(b)}|}\right)\\
&\leq CM\int_{\frac{b}{2}}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\epsilon)}\,d\epsilon,
\end{aligned}
\tag{2.11}
\]

where the second last inequality follows from $T^{(b)}\subset T$ and the last inequality follows from $|T^{(b)}|=N(T,\|\,\|_{(\delta)},b)$ (by definition of $T^{(b)}$) and $b=2\int_{\frac{b}{2}}^{b}1\,d\epsilon$.
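
In more detail, the last inequality in (2.11) can be checked as follows. This is a sketch in which the constant $C$ is allowed to change from line to line, as the chain suggests. A change of variables gives

\[
\int_{b}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\tfrac{\epsilon}{2})}\,d\epsilon=2\int_{\frac{b}{2}}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\epsilon)}\,d\epsilon,
\]

and since $\epsilon\mapsto N(T,\|\,\|_{(\delta)},\epsilon)$ is nonincreasing,

\[
b\sqrt{\log|T^{(b)}|}=2\int_{\frac{b}{2}}^{b}\sqrt{\log N(T,\|\,\|_{(\delta)},b)}\,d\epsilon\leq 2\int_{\frac{b}{2}}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\epsilon)}\,d\epsilon.
\]

Adding the two bounds yields at most $4\int_{\frac{b}{2}}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\epsilon)}\,d\epsilon$, and the factor $4$ is absorbed into $C$.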

We now bound the other term in (2.10). Let $(g_{i,j})_{1\leq i\leq n,\,1\leq j\leq s}$ be i.i.d. standard Gaussian random variables. Then for $(v_{1}^{*},\ldots,v_{s}^{*}),(w_{1}^{*},\ldots,w_{s}^{*})\in Z$, we have

\begin{align*}
\left(\mathbb{E}\left|\sum_{i=1}^{n}\sum_{j=1}^{s}g_{i,j}v_{j}^{*}(x_{i})-\sum_{i=1}^{n}\sum_{j=1}^{s}g_{i,j}w_{j}^{*}(x_{i})\right|^{2}\right)^{\frac{1}{2}}
&= \left(\sum_{i=1}^{n}\sum_{j=1}^{s}|v_{j}^{*}(x_{i})-w_{j}^{*}(x_{i})|^{2}\right)^{\frac{1}{2}}\\
&= \rho_{Z}((v_{1}^{*},\ldots,v_{s}^{*}),(w_{1}^{*},\ldots,w_{s}^{*})).
\end{align*}
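The first equality above is the usual second-moment identity for independent standard Gaussians. As a short sketch, write $a_{i,j}=v_{j}^{*}(x_{i})-w_{j}^{*}(x_{i})$ (a shorthand introduced here only for illustration, and assuming real scalars). Then

\[
\mathbb{E}\left|\sum_{i=1}^{n}\sum_{j=1}^{s}g_{i,j}a_{i,j}\right|^{2}
=\sum_{i,i'=1}^{n}\sum_{j,j'=1}^{s}a_{i,j}a_{i',j'}\,\mathbb{E}[g_{i,j}g_{i',j'}]
=\sum_{i=1}^{n}\sum_{j=1}^{s}a_{i,j}^{2},
\]

since $\mathbb{E}[g_{i,j}g_{i',j'}]$ equals $1$ when $(i,j)=(i',j')$ and $0$ otherwise.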

So by (2.2),

\begin{align*}
\gamma_{2}(Z,\rho_{Z})
&\leq C\cdot\mathbb{E}\sup_{(v_{1}^{*},\ldots,v_{s}^{*})\in Z}\sum_{i=1}^{n}\sum_{j=1}^{s}g_{i,j}v_{j}^{*}(x_{i})\\
&= C\cdot\mathbb{E}\sum_{j=1}^{s}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}g_{i,j}v^{*}(x_{i})\\
&= C\cdot\mathbb{E}\sum_{j=1}^{s}\left\|\sum_{i=1}^{n}g_{i,j}x_{i}\right\|=Cs\cdot\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|.
\end{align*}
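The two equalities in this chain can be unpacked as follows. This is only a sketch: it assumes that $Z$ is the $s$-fold product $B_{E^{*}}\times\cdots\times B_{E^{*}}$, as the second line of the chain suggests, and it uses the duality formula $\|x\|=\sup_{v^{*}\in B_{E^{*}}}v^{*}(x)$ for $x\in E$. By linearity of each $v_{j}^{*}$,

\[
\sup_{(v_{1}^{*},\ldots,v_{s}^{*})\in Z}\sum_{j=1}^{s}v_{j}^{*}\left(\sum_{i=1}^{n}g_{i,j}x_{i}\right)
=\sum_{j=1}^{s}\sup_{v^{*}\in B_{E^{*}}}v^{*}\left(\sum_{i=1}^{n}g_{i,j}x_{i}\right)
=\sum_{j=1}^{s}\left\|\sum_{i=1}^{n}g_{i,j}x_{i}\right\|.
\]

For each fixed $j$, the vector $(g_{1,j},\ldots,g_{n,j})$ has the same distribution as $(g_{1},\ldots,g_{n})$, so every summand has expectation $\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|$, which accounts for the factor $s$.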

So we have bounded the second term in (2.10). Together with the bound (2.11) for the first term, we obtain the following from (2.10).

\[
\mathbb{E}\sup_{(f,z)\in T^{(b)}\times Z}X_{f,z}\leq CM\int_{\frac{b}{2}}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\epsilon)}\,d\epsilon+Cs\cdot\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|.
\]

Combining this with (2.8), we obtain

\begin{align*}
&\mathbb{E}\sup_{\begin{subarray}{c}v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\in T\end{subarray}}\left|\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\right|\\
&\leq C\inf_{b>0}\left(\frac{bM}{\sqrt{n}}+\frac{M}{n}\int_{b}^{\infty}\sqrt{\log N(T,\|\,\|_{(\delta)},\epsilon)}\,d\epsilon\right)+\frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|\\
&\leq C\inf_{0<b\leq 1}\left(\frac{bM}{\sqrt{n}}+\frac{M}{n}\int_{b}^{1}\sqrt{\left(\frac{C\sqrt{s}}{\epsilon}\right)^{s}\frac{1}{\delta}}\,d\epsilon\right)+\frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|\quad\text{by Lemma 2.4}\\
&= \frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|+\frac{CM}{\sqrt{n}}\cdot\inf_{0<b\leq 1}\left(b+\frac{1}{\sqrt{\delta n}}\int_{b}^{1}\sqrt{\left(\frac{C\sqrt{s}}{\epsilon}\right)^{s}}\,d\epsilon\right)\\
&\leq \frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|+\frac{CM}{\sqrt{n}}\cdot\begin{cases}(\delta n)^{-\frac{1}{2}},&s=1\\ (\ln(\delta n+2))\cdot(\delta n)^{-\frac{1}{2}},&s=2\\ \sqrt{s}\cdot(\delta n)^{-\frac{1}{s}},&s\geq 3\end{cases},
\end{align*}

where we take $b=\begin{cases}0,&s=1\\ \min((\delta n)^{-\frac{1}{2}},1),&s=2\\ \min(C\sqrt{s}\cdot(\delta n)^{-\frac{1}{s}},\,1),&s\geq 3\end{cases}$.
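To see where the case $s\geq 3$ comes from, the following sketch evaluates the integral term at the stated value of $b$. It assumes that the constant $C$ in the choice of $b$ is the same as the one inside the integrand, and that $b=C\sqrt{s}\cdot(\delta n)^{-\frac{1}{s}}<1$ (if $b=1$, the integral vanishes and $b\leq C\sqrt{s}\cdot(\delta n)^{-\frac{1}{s}}$ holds directly).

\begin{align*}
\frac{1}{\sqrt{\delta n}}\int_{b}^{1}\left(\frac{C\sqrt{s}}{\epsilon}\right)^{\frac{s}{2}}d\epsilon
&\leq \frac{(C\sqrt{s})^{\frac{s}{2}}}{\sqrt{\delta n}}\cdot\frac{b^{1-\frac{s}{2}}}{\frac{s}{2}-1}\\
&= \frac{(C\sqrt{s})^{\frac{s}{2}}}{\sqrt{\delta n}}\cdot\frac{(C\sqrt{s})^{1-\frac{s}{2}}(\delta n)^{\frac{1}{2}-\frac{1}{s}}}{\frac{s}{2}-1}\\
&= \frac{C\sqrt{s}\cdot(\delta n)^{-\frac{1}{s}}}{\frac{s}{2}-1}\leq 2C\sqrt{s}\cdot(\delta n)^{-\frac{1}{s}},
\end{align*}

where the last step uses $\frac{s}{2}-1\geq\frac{1}{2}$ for $s\geq 3$. Adding $b$ itself gives a bound of order $\sqrt{s}\cdot(\delta n)^{-\frac{1}{s}}$, which matches the third case after adjusting the constant.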

In the sequel, we define

\[
\Phi(n,s,\delta)=\begin{cases}(\delta n)^{-\frac{1}{2}},&s=1\\ (\ln(\delta n+2))\cdot(\delta n)^{-\frac{1}{2}},&s=2\\ (\delta n)^{-\frac{1}{s}},&s\geq 3\end{cases}.
\tag{2.12}
\]
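For readers who want to evaluate the rate numerically, the definition (2.12) translates directly into a few lines of Python. This is only an illustrative sketch: the function name `phi` and the argument checks are assumptions made here and are not part of the paper.

```python
import math


def phi(n: int, s: int, delta: float) -> float:
    """Evaluate Phi(n, s, delta) as defined in (2.12).

    The name `phi` and the input validation are illustrative assumptions.
    """
    if n < 1 or s < 1:
        raise ValueError("n and s must be positive integers")
    if not (0 < delta <= 1):
        raise ValueError("delta must lie in (0, 1]")

    dn = delta * n
    if s == 1:
        # (delta * n)^(-1/2)
        return dn ** -0.5
    if s == 2:
        # ln(delta * n + 2) * (delta * n)^(-1/2)
        return math.log(dn + 2) * dn ** -0.5
    # s >= 3: (delta * n)^(-1/s)
    return dn ** (-1.0 / s)
```

With $\delta$ fixed, `phi(n, 1, delta)` decays like $n^{-\frac{1}{2}}$, while for $s\geq 3$ the decay slows to $n^{-\frac{1}{s}}$.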

Next we adjust the scale in Lemma 2.5.

Lemma 2.6.

Let $0<\delta\leq 1$. Suppose that $E$ is a Banach space with separable dual $E^{*}$ and $x_{1},\ldots,x_{n}\in E$. Let $g_{1},\ldots,g_{n}$ be i.i.d. standard Gaussian random variables. Let $T$ be the set of all $1$-Lipschitz functions $f:\mathbb{R}^{s}\to\mathbb{R}$ with $f(0)=0$. Then

\begin{align*}
&\mathbb{E}\sup_{\begin{subarray}{c}v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\in T\end{subarray}}\left|\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\right|\\
&\leq \frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|+Cs\cdot\left(\frac{1}{n}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}\cdot\Phi(n,s,\delta).
\end{align*}
Proof.

Observe that if $f\in T$ and $a>0$, then the map $y\mapsto\frac{1}{a}f(ay)$ from $\mathbb{R}^{s}$ to $\mathbb{R}$ is also in $T$. Thus, without loss of generality, by rescaling $x_{1},\ldots,x_{n}$, we may assume that

\[
\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}=n\cdot s^{-(1+\delta)}.
\]

Then in Lemma 2.5,

\[
M\leq\sqrt{2}\left[\sqrt{n}+s^{\frac{1+\delta}{2}}\left(\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}\right)^{\frac{1}{2}}\right]=2\sqrt{2}\cdot\sqrt{n}.
\]

So by Lemma 2.5,

\begin{align*}
&\mathbb{E}\sup_{\begin{subarray}{c}v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\in T\end{subarray}}\left|\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(x_{i}),\ldots,v_{s}^{*}(x_{i}))\right|\\
&\leq \frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|+C\sqrt{s}\cdot\Phi(n,s,\delta)\\
&= \frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}\right\|+Cs\cdot\left(\frac{1}{n}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}\cdot\Phi(n,s,\delta),
\end{align*}

since we assume that $\displaystyle\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}=n\cdot s^{-(1+\delta)}$. So the result follows. ∎
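As a quick check of the final equality in the proof, under the normalization assumed there the factor multiplying $\Phi(n,s,\delta)$ can be computed explicitly:

\[
s\cdot\left(\frac{1}{n}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(x_{i})|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}
=s\cdot\left(s^{-(1+\delta)}\right)^{\frac{1}{2(1+\delta)}}
=s\cdot s^{-\frac{1}{2}}=\sqrt{s}.
\]

The rescaling step is harmless because both sides of the inequality in Lemma 2.6 are homogeneous of degree one in $(x_{1},\ldots,x_{n})$: replacing each $x_{i}$ by $ax_{i}$ with $a>0$ multiplies the right-hand side by $a$, and it multiplies the left-hand side by $a$ as well, since $f(a\,\cdot)=a\cdot\frac{1}{a}f(a\,\cdot)$ and $y\mapsto\frac{1}{a}f(ay)$ ranges over $T$ as $f$ does.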

Theorem 2.7.

Let $0<\delta\leq 1$. Suppose that $\mu$ is a probability measure on a Banach space $E$ with separable dual $E^{*}$ and $\int_{E}\|x\|\,d\mu(x)<\infty$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$. Then

\begin{align*}
&\mathbb{E}W_{1,s}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\\
&\leq \frac{Cs}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|+Cs\cdot\mathbb{E}\left[\left(\frac{1}{n}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(X_{i})|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}\right]\cdot\Phi(n,s,\delta),
\end{align*}

where $g_{1},\ldots,g_{n}$ are i.i.d. standard Gaussian random variables that are independent of $X_{1},\ldots,X_{n}$, and $\Phi(n,s,\delta)$ is defined in (2.12).

Proof.

By the definition of $W_{1,s}$ in Section 1.3,

\begin{align*}
&W_{1,s}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\\
&= \sup_{\begin{subarray}{c}v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\text{ is 1-Lipschitz}\end{subarray}}\left|\frac{1}{n}\sum_{i=1}^{n}f(v_{1}^{*}(X_{i}),\ldots,v_{s}^{*}(X_{i}))-\int_{E}f(v_{1}^{*}(x),\ldots,v_{s}^{*}(x))\,d\mu(x)\right|,
\end{align*}

where the supremum is over all $v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}$ and all $1$-Lipschitz functions $f:\mathbb{R}^{s}\to\mathbb{R}$ with $f(0)=0$. By symmetrization,

\[
\mathbb{E}W_{1,s}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\leq C\cdot\mathbb{E}\sup_{\substack{v_{1}^{*},\ldots,v_{s}^{*}\in B_{E^{*}}\\ f\text{ is 1-Lipschitz}}}\left|\frac{1}{n}\sum_{i=1}^{n}g_{i}f(v_{1}^{*}(X_{i}),\ldots,v_{s}^{*}(X_{i}))\right|.
\]

So by Lemma 2.6, the result follows. ∎

Corollary 2.8.

Let $0<\delta\leq 1$. Suppose that $\mu$ is a probability measure on a separable Hilbert space $E$ with $\int_{E}\|x\|\,d\mu(x)<\infty$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$. Then

\[
\mathbb{E}W_{1,s}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\leq Cs\cdot\left(\int_{E}\|x\|^{2+2\delta}\,d\mu(x)\right)^{\frac{1}{2+2\delta}}\cdot\Phi(n,s,\delta),
\]

where $\Phi(n,s,\delta)$ is defined in (2.12).

Proof.

In Theorem 2.7,

\[
\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|\leq\left(\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|^{2}\right)^{\frac{1}{2}}=\left(\sum_{i=1}^{n}\mathbb{E}\|X_{i}\|^{2}\right)^{\frac{1}{2}}=\sqrt{n}\left(\int_{E}\|x\|^{2}\,d\mu(x)\right)^{\frac{1}{2}}.
\]

We also have

\[
\begin{aligned}
\mathbb{E}\left[\left(\frac{1}{n}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(X_{i})|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}\right]
&\leq\mathbb{E}\left[\left(\frac{1}{n}\sum_{i=1}^{n}\|X_{i}\|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}\right]\\
&\leq\left(\int_{E}\|x\|^{2+2\delta}\,d\mu(x)\right)^{\frac{1}{2+2\delta}}.
\end{aligned}
\]

Since $\frac{1}{\sqrt{n}}\leq\Phi(n,s,\delta)$ and, by Jensen's inequality, $\left(\int_{E}\|x\|^{2}\,d\mu(x)\right)^{\frac{1}{2}}\leq\left(\int_{E}\|x\|^{2+2\delta}\,d\mu(x)\right)^{\frac{1}{2+2\delta}}$, the result follows by Theorem 2.7. ∎

Corollary 2.9.

Suppose that $\mu$ is a probability measure on a Banach space $E$ with separable dual $E^{*}$ and $\int_{E}\|x\|\,d\mu(x)<\infty$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$. Then

\[
\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\leq\frac{C}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|+\frac{C\sqrt{\ln n}}{n}\cdot\mathbb{E}\sup_{v^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}},
\]

where $g_{1},\ldots,g_{n}$ are i.i.d. standard Gaussian random variables that are independent from $X_{1},\ldots,X_{n}$.

Proof.

By Theorem 2.7 with $s=1$,

\[
\begin{aligned}
\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)
&\leq\frac{C}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|+\frac{C}{\sqrt{n}}\cdot\inf_{0<\delta\leq 1}\frac{1}{\sqrt{\delta}}\mathbb{E}\left[\left(\frac{1}{n}\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(X_{i})|^{2+2\delta}\right)^{\frac{1}{2+2\delta}}\right]\\
&\leq\frac{C}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}\right\|+\frac{C}{\sqrt{n}}\cdot\inf_{0<\delta\leq 1}\frac{1}{\sqrt{\delta}}\mathbb{E}\left[n^{-\frac{1}{2+2\delta}}\left(\sup_{v^{*}\in B_{E^{*}}}\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}}\right].
\end{aligned}
\]

Take $\delta=1/\lceil\ln n\rceil$. Then $n^{-\frac{1}{2+2\delta}}\leq\frac{C}{\sqrt{n}}$. The result follows. ∎
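For the reader's convenience, the two elementary estimates used in the proof of Corollary 2.9 can be spelled out as follows. This is only a sketch under the assumption $n\geq 2$ (so that $\ln n>0$ and $\delta=1/\lceil\ln n\rceil\in(0,1]$ is well defined); the symbols $a_{i}$ and $q$ are introduced here for illustration and do not appear in the original argument.

```latex
% Step 1 (passing from the exponent 2+2\delta to the exponent 2).
% For real numbers a_1,\ldots,a_n and q = 2+2\delta \geq 2, the \ell^q norm is
% dominated by the \ell^2 norm, so with a_i = v^*(X_i):
\[
\left(\frac{1}{n}\sum_{i=1}^{n}|a_{i}|^{q}\right)^{\frac{1}{q}}
= n^{-\frac{1}{q}}\left(\sum_{i=1}^{n}|a_{i}|^{q}\right)^{\frac{1}{q}}
\leq n^{-\frac{1}{2+2\delta}}\left(\sum_{i=1}^{n}|a_{i}|^{2}\right)^{\frac{1}{2}}.
\]

% Step 2 (the choice \delta = 1/\lceil \ln n \rceil, so that \delta \ln n \leq 1).
\[
n^{-\frac{1}{2+2\delta}}
= n^{-\frac{1}{2}}\, n^{\frac{\delta}{2+2\delta}}
\leq n^{-\frac{1}{2}}\, e^{\frac{\delta \ln n}{2}}
\leq \frac{e^{1/2}}{\sqrt{n}},
\qquad
\frac{1}{\sqrt{\delta}} = \sqrt{\lceil \ln n\rceil} \leq C\sqrt{\ln n}.
\]

% Combining both steps, the second term in the bound is at most
\[
\frac{C}{\sqrt{n}}\cdot C\sqrt{\ln n}\cdot\frac{e^{1/2}}{\sqrt{n}}\,
\mathbb{E}\sup_{v^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}}
= \frac{C'\sqrt{\ln n}}{n}\,
\mathbb{E}\sup_{v^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|v^{*}(X_{i})|^{2}\right)^{\frac{1}{2}}.
\]
```

Here $C'$ is again a universal constant, consistent with the convention that $C$ may change from line to line.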

In the rest of this section, we prove some lower bound results. These results are quite standard.

Proposition 2.10.

Suppose that $\mu$ is a probability measure on a Banach space $E$ with separable dual $E^{*}$ and that $\int_{E}\|x\|\,d\mu(x)<\infty$ and $\int_{E}x\,d\mu(x)=0$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$. Then

\[
\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\geq\frac{1}{2n}\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|,
\]

where $\epsilon_{1},\ldots,\epsilon_{n}$ are i.i.d. uniform $\pm 1$ random variables that are independent from $X_{1},\ldots,X_{n}$.

Proof.

For fixed $x_{1},\ldots,x_{n}\in E$, by considering the 1-Lipschitz function $f(t)=t$, we have

\[
W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i}}\right)\geq\sup_{v^{*}\in B_{E^{*}}}\left|\int_{E}v^{*}(x)\,d\mu(x)-\frac{1}{n}\sum_{i=1}^{n}v^{*}(x_{i})\right|=\left\|\frac{1}{n}\sum_{i=1}^{n}x_{i}\right\|.
\]

So

\[
\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\geq\mathbb{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\right\|.
\]

Let $Y_{1},\ldots,Y_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$ that are independent from $X_{1},\ldots,X_{n}$ and $\epsilon_{1},\ldots,\epsilon_{n}$. Then

\[
\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}\right\|\geq\frac{1}{2}\mathbb{E}\left\|\sum_{i=1}^{n}(X_{i}-Y_{i})\right\|=\frac{1}{2}\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}(X_{i}-Y_{i})\right\|\geq\frac{1}{2}\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|,
\]

where the first inequality follows from the triangle inequality, the equality holds because the independent random elements $X_{i}-Y_{i}$ are symmetric, and the last inequality follows from Jensen's inequality and taking expectation on $Y_{1},\ldots,Y_{n}$ (using $\mathbb{E}Y_{i}=0$). The result follows. ∎
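As an informal sanity check of the symmetrization chain in the proof of Proposition 2.10, both ends of the chain can be computed exactly for a toy measure. The sketch below assumes a hypothetical measure, uniform on three points of $\mathbb{R}^{2}$ that sum to zero, equipped with the Euclidean norm. The names `SUPPORT`, `norm_of_sum`, and `expected_norms` are illustrative and are not taken from the paper.

```python
from itertools import product
from math import hypot

# Hypothetical toy measure: uniform on three points of R^2 whose mean is zero.
SUPPORT = [(2.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)]


def norm_of_sum(points, signs):
    """Euclidean norm of sum_i signs[i] * points[i] for points in R^2."""
    sx = sum(s * p[0] for s, p in zip(signs, points))
    sy = sum(s * p[1] for s, p in zip(signs, points))
    return hypot(sx, sy)


def expected_norms(support, n):
    """Return (E||sum X_i||, E||sum eps_i X_i||) by exact enumeration.

    X_1, ..., X_n are i.i.d. uniform on `support`, and eps_1, ..., eps_n are
    independent uniform +-1 signs. All len(support)**n samples and all 2**n
    sign patterns are enumerated, so no Monte Carlo error is involved.
    """
    samples = list(product(support, repeat=n))
    sign_patterns = list(product((-1, 1), repeat=n))
    plain = sum(norm_of_sum(xs, (1,) * n) for xs in samples) / len(samples)
    signed = sum(
        norm_of_sum(xs, eps) for xs in samples for eps in sign_patterns
    ) / (len(samples) * len(sign_patterns))
    return plain, signed


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        plain, signed = expected_norms(SUPPORT, n)
        # The proof gives plain >= signed / 2 for every mean-zero measure.
        print(n, plain, signed, plain >= 0.5 * signed)
```

Exact enumeration is used instead of random sampling so that the comparison is deterministic; it is only feasible for very small $n$ and small supports.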

Corollary 2.11.

Suppose that $\mu$ is a probability measure on a separable Hilbert space $E$ with $\int_{E}\|x\|\,d\mu(x)<\infty$ and $\int_{E}x\,d\mu(x)=0$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random elements of $E$ sampled according to $\mu$. Then

\[
\mathbb{E}W_{1,1}\left(\mu,\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\right)\geq\frac{1}{2\sqrt{2n}}\int_{E}\|x\|\,d\mu(x).
\]
Proof.

By Proposition 2.10, it suffices to show that

\[
\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|\geq\sqrt{\frac{n}{2}}\cdot\mathbb{E}\|X_{1}\|.
\]

If we first take expectation on $\epsilon_{1},\ldots,\epsilon_{n}$, then by the Kahane-Khintchine inequality [16], we have

\[
\mathbb{E}_{\epsilon}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|\geq\frac{1}{\sqrt{2}}\left(\mathbb{E}_{\epsilon}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|^{2}\right)^{\frac{1}{2}}=\frac{1}{\sqrt{2}}\left(\sum_{i=1}^{n}\|X_{i}\|^{2}\right)^{\frac{1}{2}}\geq\frac{1}{\sqrt{2n}}\sum_{i=1}^{n}\|X_{i}\|.
\]

So

\[
\mathbb{E}\left\|\sum_{i=1}^{n}\epsilon_{i}X_{i}\right\|\geq\frac{1}{\sqrt{2n}}\sum_{i=1}^{n}\mathbb{E}\|X_{i}\|=\sqrt{\frac{n}{2}}\cdot\mathbb{E}\|X_{1}\|.
\]
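The chain of inequalities obtained from the Kahane-Khintchine inequality in the proof of Corollary 2.11 can be checked exactly on small examples. The following sketch assumes fixed vectors in $\mathbb{R}^{2}$ with the Euclidean norm and enumerates all sign patterns. The function names `rademacher_mean_norm` and `khintchine_chain` and the example vectors are hypothetical and serve only as an illustration.

```python
from itertools import product
from math import hypot, sqrt


def rademacher_mean_norm(vectors):
    """Exact E_eps || sum_i eps_i x_i ||_2 over all 2^n sign patterns."""
    n = len(vectors)
    total = 0.0
    for eps in product((-1, 1), repeat=n):
        sx = sum(e * v[0] for e, v in zip(eps, vectors))
        sy = sum(e * v[1] for e, v in zip(eps, vectors))
        total += hypot(sx, sy)
    return total / 2 ** n


def khintchine_chain(vectors):
    """Return the three quantities compared in the proof, in order.

    lhs = E_eps || sum eps_i x_i ||
    mid = (1 / sqrt(2)) * (sum ||x_i||^2)^(1/2)
    rhs = (1 / sqrt(2 n)) * sum ||x_i||
    The proof asserts lhs >= mid >= rhs.
    """
    n = len(vectors)
    lhs = rademacher_mean_norm(vectors)
    mid = sqrt(sum(v[0] ** 2 + v[1] ** 2 for v in vectors) / 2)
    rhs = sum(hypot(v[0], v[1]) for v in vectors) / sqrt(2 * n)
    return lhs, mid, rhs


if __name__ == "__main__":
    # Illustrative inputs: two equal vectors, and three vectors of mixed size.
    for vectors in ([(1.0, 0.0), (1.0, 0.0)], [(3.0, 0.0), (0.0, 4.0), (1.0, 1.0)]):
        lhs, mid, rhs = khintchine_chain(vectors)
        print(vectors, lhs, mid, rhs, lhs >= mid - 1e-12 >= rhs - 2e-12)
```

For two equal vectors all three quantities coincide, so the constant $\frac{1}{\sqrt{2}}$ cannot be improved in this chain.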

3. Max-sliced 2-Wasserstein distance

The following lemma is known. See e.g., [32].

Lemma 3.1.

Let $r>0$. Suppose that $\mu$ is a probability measure on $\mathbb{R}^{d}$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random vectors in $\mathbb{R}^{d}$ sampled according to $\mu$. Let $g_{1},\ldots,g_{n}$ be i.i.d. standard Gaussian random variables that are independent from $X_{1},\ldots,X_{n}$. Then

\[
\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\leq 2n\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}+Cr^{2}\ln n,
\]

and

\[
\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\leq Cr\sqrt{n\ln n}\,\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}^{\frac{1}{2}}+Cr^{2}\ln n.
\]
Proof.

Fix $x_{1},\ldots,x_{n}\in\mathbb{R}^{d}$ with $\|x_{i}\|_{2}\leq r$ for all $i$. By the noncommutative Khintchine inequality (see [20, 29, 8]), for $p\in\mathbb{N}$,

\[
\begin{aligned}
\mathbb{E}\,\mathrm{Tr}\left(\sum_{i=1}^{n}g_{i}x_{i}x_{i}^{T}\right)^{2p}
&\leq (C\sqrt{p})^{2p}\,\mathrm{Tr}\left[\left(\sum_{i=1}^{n}(x_{i}x_{i}^{T})^{2}\right)^{p}\,\right]\\
&= (C\sqrt{p})^{2p}\,\mathrm{Tr}\left[\left(\sum_{i=1}^{n}\|x_{i}\|_{2}^{2}\,x_{i}x_{i}^{T}\right)^{p}\,\right]\\
&\leq (C\sqrt{p})^{2p}\,n\,\left\|\left(\sum_{i=1}^{n}\|x_{i}\|_{2}^{2}\,x_{i}x_{i}^{T}\right)^{p}\,\right\|_{\mathrm{op}}\\
&\leq (Cr\sqrt{p})^{2p}\,n\left\|\sum_{i=1}^{n}x_{i}x_{i}^{T}\right\|_{\mathrm{op}}^{p},
\end{aligned}
\]

where the second-to-last inequality follows from the fact that $\sum_{i=1}^{n}\|x_{i}\|_{2}^{2}\,x_{i}x_{i}^{T}$ has rank at most $n$. Taking $p=\lceil\ln n\rceil$, so that $n^{1/(2p)}\leq\sqrt{e}$, and using $\|A\|_{\mathrm{op}}^{2p}\leq\mathrm{Tr}(A^{2p})$ for symmetric $A$ together with Jensen's inequality, we obtain

\[\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}x_{i}x_{i}^{T}\right\|_{\mathrm{op}}\leq Cr\sqrt{\ln n}\left\|\sum_{i=1}^{n}x_{i}x_{i}^{T}\right\|_{\mathrm{op}}^{\frac{1}{2}}.\]
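The deterministic steps of the trace chain above (the rank bound and the bound by $r^{2p}$) can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `trace_chain` and the Gaussian test points are illustrative assumptions, and $r$ is taken to be the largest norm among the given points. The probabilistic Khintchine step is not tested here.

```python
import numpy as np

def trace_chain(xs: np.ndarray, p: int):
    """Return (Tr[A^p], n*||A^p||_op, n*r^(2p)*||S||_op^p), where
    A = sum_i ||x_i||^2 x_i x_i^T and S = sum_i x_i x_i^T.
    The rows of xs are the points x_1, ..., x_n (hypothetical test data)."""
    n = xs.shape[0]
    sq_norms = np.sum(xs ** 2, axis=1)          # ||x_i||_2^2 for each i
    r = np.sqrt(sq_norms.max())                 # assumed radius: max_i ||x_i||_2
    S = xs.T @ xs                               # sum_i x_i x_i^T
    A = (xs * sq_norms[:, None]).T @ xs         # sum_i ||x_i||^2 x_i x_i^T
    A_p = np.linalg.matrix_power(A, p)
    trace_term = np.trace(A_p)
    rank_term = n * np.linalg.norm(A_p, 2)      # uses rank(A) <= n
    final_term = n * r ** (2 * p) * np.linalg.norm(S, 2) ** p
    return trace_term, rank_term, final_term

rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 20))               # n = 5 points in R^20
trace_term, rank_term, final_term = trace_chain(xs, p=3)
```

The three returned quantities should appear in nondecreasing order, up to floating-point error, matching the last three lines of the display.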

Now we randomize $x_{1},\ldots,x_{n}$. By Jensen's inequality, we get

\[\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\leq Cr\sqrt{\ln n}\left(\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\right)^{\frac{1}{2}}.\tag{3.1}\]

By symmetrization,

\[
\begin{aligned}
\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}
&\leq \|n\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}+C\cdot\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\\
&\leq n\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}+Cr\sqrt{\ln n}\left(\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\right)^{\frac{1}{2}}.
\end{aligned}
\]

So

\[\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}\leq 2n\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}+Cr^{2}\ln n.\]

This proves the first inequality. Combining this with (3.1), we obtain the second inequality. ∎
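For readers who want the last two steps of this proof spelled out, the following is a sketch of the elementary algebra. The shorthand $a$, $b$, $c$ is introduced here only for illustration and is not notation from the paper; as usual, $C$ denotes a universal constant that may change from line to line.

```latex
% Shorthand (assumed for this sketch only):
%   a = E || sum_i X_i X_i^T ||_op,   b = n || E X_1 X_1^T ||_op,   c = C r sqrt(ln n).
% Step 1: from a <= b + c sqrt(a) to the first inequality.
\begin{align*}
a \le b + c\sqrt{a}
  &\;\Longrightarrow\; a \le b + \tfrac{1}{2}a + \tfrac{1}{2}c^{2}
     && \text{since } c\sqrt{a}\le \tfrac{1}{2}a+\tfrac{1}{2}c^{2},\\
  &\;\Longrightarrow\; a \le 2b + c^{2}
     = 2n\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}} + Cr^{2}\ln n .
\end{align*}
% Step 2: insert this into (3.1) and use sqrt(s+t) <= sqrt(s) + sqrt(t).
\begin{align*}
\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}
  \le c\sqrt{a}
  \le c\sqrt{2b+c^{2}}
  \le c\sqrt{2b}+c^{2}
  = Cr\sqrt{n\ln n}\,\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}^{\frac{1}{2}} + Cr^{2}\ln n .
\end{align*}
```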

Lemma 3.2.

Suppose that $\mu_{1},\mu_{2}$ are symmetric probability measures on $(\mathbb{R}^{d},\|\,\|_{2})$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$. Consider the map $\eta(x):=xx^{T}$ from the Hilbert space $(\mathbb{R}^{d},\|\,\|_{2})$ to the Banach space $(\mathbb{R}^{d\times d},\|\,\|_{\mathrm{op}})$. Let $\eta_{\#}\mu_{1}$ and $\eta_{\#}\mu_{2}$ be the pushforward measures of $\mu_{1}$ and $\mu_{2}$ by $\eta$, respectively. Then

\[W_{2,1}(\mu_{1},\mu_{2})^{2}\leq W_{1,1}(\eta_{\#}\mu_{1},\eta_{\#}\mu_{2}).\]
Proof.

Define $\mathrm{abs}:\mathbb{R}\to\mathbb{R}$ and $\mathrm{sq}:\mathbb{R}\to\mathbb{R}$ by $\mathrm{abs}(t)=|t|$ and $\mathrm{sq}(t)=t^{2}$. Observe that if $\nu_{1}$ and $\nu_{2}$ are symmetric probability measures on the interval $[-r,r]$, then

\[
\begin{aligned}
W_{2}(\nu_{1},\nu_{2})^{2}
&= W_{2}(\mathrm{abs}_{\#}\nu_{1},\mathrm{abs}_{\#}\nu_{2})^{2}\\
&= \inf_{\gamma}\int_{[0,r]\times[0,r]}|t-s|^{2}\,d\gamma(t,s)\\
&\leq \inf_{\gamma}\int_{[0,r]\times[0,r]}|t^{2}-s^{2}|\,d\gamma(t,s)\\
&= W_{1}(\mathrm{sq}_{\#}\mathrm{abs}_{\#}\nu_{1},\mathrm{sq}_{\#}\mathrm{abs}_{\#}\nu_{2})\\
&= W_{1}(\mathrm{sq}_{\#}\nu_{1},\mathrm{sq}_{\#}\nu_{2}),
\end{aligned}
\]

where the infimum is over all couplings $\gamma$ of the pushforward measures $\mathrm{abs}_{\#}\nu_{1}$ and $\mathrm{abs}_{\#}\nu_{2}$ on $[0,r]$, and the inequality holds because $|t-s|^{2}\leq|t-s|(t+s)=|t^{2}-s^{2}|$ for $t,s\geq 0$.
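The one-dimensional observation above can be illustrated on empirical measures. The sketch below assumes uniform empirical measures with the same number of atoms, for which the monotone (sorted) coupling is optimal on the real line for the costs $|t-s|^{p}$, $p\geq 1$; the helper names `wasserstein_pow` and `symmetric_check` and the uniform test data are hypothetical.

```python
import numpy as np

def wasserstein_pow(a: np.ndarray, b: np.ndarray, p: int) -> float:
    """W_p(.,.)^p between the uniform empirical measures on the equal-size
    1D samples a and b, computed via the sorted (monotone) coupling."""
    a_sorted, b_sorted = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a_sorted - b_sorted) ** p))

def symmetric_check(a_half: np.ndarray, b_half: np.ndarray):
    """Symmetrize both samples, then return
    (W_2(nu_1, nu_2)^2, W_1(sq_# nu_1, sq_# nu_2))."""
    nu1 = np.concatenate([a_half, -a_half])   # atoms of a symmetric measure
    nu2 = np.concatenate([b_half, -b_half])
    w2_squared = wasserstein_pow(nu1, nu2, 2)
    w1_of_squares = wasserstein_pow(nu1 ** 2, nu2 ** 2, 1)
    return w2_squared, w1_of_squares

rng = np.random.default_rng(0)
a_half = rng.uniform(0.0, 2.0, size=50)       # hypothetical data in [0, r], r = 2
b_half = rng.uniform(0.0, 2.0, size=50)
w2_squared, w1_of_squares = symmetric_check(a_half, b_half)
```

In line with the display, `w2_squared` should not exceed `w1_of_squares`.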

For $u\in\mathbb{R}^{d}$ with $\|u\|_{2}=1$, let $u_{\#}\mu_{i}$ be the pushforward measure of $\mu_{i}$ by the map $\langle\cdot,u\rangle$. Taking $\nu_{i}=u_{\#}\mu_{i}$ in the above, we obtain

\[
\begin{aligned}
W_{2,1}(\mu_{1},\mu_{2})^{2}
&= \sup_{u\in\mathbb{R}^{d},\,\|u\|_{2}=1}W_{2}(u_{\#}\mu_{1},\,u_{\#}\mu_{2})^{2}\\
&\leq \sup_{u\in\mathbb{R}^{d},\,\|u\|_{2}=1}W_{1}(\mathrm{sq}_{\#}u_{\#}\mu_{1},\,\mathrm{sq}_{\#}u_{\#}\mu_{2}).
\end{aligned}
\]

Observe that $\mathrm{sq}_{\#}u_{\#}\mu_{i}$ is the pushforward measure of $\mu_{i}$ by the map

\[x\mapsto\langle x,u\rangle^{2}=\mathrm{Tr}(uu^{T}xx^{T})=\mathrm{Tr}(uu^{T}\eta(x)).\]

Moreover, since the trace class norm of $uu^{T}$ is equal to $1$, we can identify $uu^{T}$ with an element of the unit ball of the dual of the Banach space $(\mathbb{R}^{d\times d},\|\,\|_{\mathrm{op}})$. Thus the result follows. ∎
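The two facts used at the end of this proof, the identity $\langle x,u\rangle^{2}=\mathrm{Tr}(uu^{T}\eta(x))$ and the unit trace class norm of $uu^{T}$, can be confirmed numerically. This is a small sketch with hypothetical helper names (`slice_identity`, `trace_norm`) and random test vectors; it illustrates the identities and is not a substitute for the proof.

```python
import numpy as np

def slice_identity(x: np.ndarray, u: np.ndarray):
    """Return (<x,u>^2, Tr(u u^T x x^T)); the two values should agree."""
    lhs = float(np.dot(x, u) ** 2)
    rhs = float(np.trace(np.outer(u, u) @ np.outer(x, x)))
    return lhs, rhs

def trace_norm(M: np.ndarray) -> float:
    """Trace class (nuclear) norm: the sum of the singular values of M."""
    return float(np.linalg.norm(M, ord="nuc"))

rng = np.random.default_rng(1)
x = rng.standard_normal(6)
u = rng.standard_normal(6)
u /= np.linalg.norm(u)                     # unit vector, ||u||_2 = 1

lhs, rhs = slice_identity(x, u)
nuc = trace_norm(np.outer(u, u))           # equals ||u||_2^2 = 1 for rank one
M = rng.standard_normal((6, 6))
pairing = abs(float(np.trace(np.outer(u, u) @ M)))
op_norm = float(np.linalg.norm(M, 2))      # duality: |Tr(u u^T M)| <= ||M||_op
```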

Below we restate and prove Theorem 1.4.

Theorem 3.3.

Let $r>0$. Suppose that $\mu$ is a symmetric probability measure on $\mathbb{R}^{d}$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}\leq r\}$. Let $X_{1},\ldots,X_{n}$ be i.i.d. random vectors sampled according to $\mu$. Then

\[\mathbb{E}\left[W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}\right]\leq C\|\Sigma\|_{\mathrm{op}}\left(\frac{r^{2}\ln n}{n\|\Sigma\|_{\mathrm{op}}}+\sqrt{\frac{r^{2}\ln n}{n\|\Sigma\|_{\mathrm{op}}}}\,\right),\]

where $\Sigma=\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)$.
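To read the bound more easily, it may help to multiply out the right-hand side. The following is a sketch of that algebra; the abbreviation $a$ is introduced here for illustration only.

```latex
% Abbreviation (assumed for this sketch only): a = r^2 ln n / (n ||Sigma||_op).
\begin{align*}
C\|\Sigma\|_{\mathrm{op}}\left(a+\sqrt{a}\right)
  &= C\|\Sigma\|_{\mathrm{op}}\left(\frac{r^{2}\ln n}{n\|\Sigma\|_{\mathrm{op}}}
     +\sqrt{\frac{r^{2}\ln n}{n\|\Sigma\|_{\mathrm{op}}}}\,\right)\\
  &= C\left(\frac{r^{2}\ln n}{n}
     + r\sqrt{\frac{\|\Sigma\|_{\mathrm{op}}\ln n}{n}}\,\right).
\end{align*}
% When a <= 1, i.e. n ||Sigma||_op >= r^2 ln n, we have a <= sqrt(a),
% so the square-root term is the larger of the two.
```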

Proof.

Since $\mu$ and $\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})$ are symmetric, by Lemma 3.2,

\[
\begin{aligned}
W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}
&\leq W_{1,1}\left(\eta_{\#}\mu,\,\eta_{\#}\left[\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right]\right)\\
&= W_{1,1}\left(\eta_{\#}\mu,\,\frac{1}{n}\sum_{i=1}^{n}\delta_{\eta(X_{i})}\right).
\end{aligned}
\]
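The equality in the last line rests on the fact that $\eta$ is even. A short sketch of this step, using only the definition $\eta(x)=xx^{T}$ and the linearity of the pushforward:

```latex
\begin{align*}
\eta(-x) &= (-x)(-x)^{T} = xx^{T} = \eta(x),
  \qquad\text{so}\qquad
  \eta_{\#}\delta_{-X_{i}} = \delta_{\eta(-X_{i})} = \delta_{\eta(X_{i})} = \eta_{\#}\delta_{X_{i}},\\
\eta_{\#}\left[\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right]
  &= \frac{1}{2n}\sum_{i=1}^{n}\left(\delta_{\eta(X_{i})}+\delta_{\eta(X_{i})}\right)
   = \frac{1}{n}\sum_{i=1}^{n}\delta_{\eta(X_{i})}.
\end{align*}
```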

Note that $\eta(X_{i})=X_{i}X_{i}^{T}$ are i.i.d. random matrices with distribution $\eta_{\#}\mu$. Taking $E=(\mathbb{R}^{d\times d},\|\,\|_{\mathrm{op}})$ in Corollary 2.9, we obtain

\[
\begin{aligned}
&\mathbb{E}W_{1,1}\left(\eta_{\#}\mu,\,\frac{1}{n}\sum_{i=1}^{n}\delta_{\eta(X_{i})}\right)\\
&\qquad\leq \frac{C}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}X_{i}^{T}\right\|+\frac{C\sqrt{\ln n}}{n}\cdot\mathbb{E}\sup_{V^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|V^{*}(X_{i}X_{i}^{T})|^{2}\right)^{\frac{1}{2}}.
\end{aligned}
\]

Since $B_{E^{*}}$ coincides with the convex hull of $\{\pm vv^{T}:\,v\in\mathbb{R}^{d},\,\|v\|_{2}\leq 1\}$,

supVBE(i=1n|V(XiXiT)|2)12subscriptsupremumsuperscript𝑉subscript𝐵superscript𝐸superscriptsuperscriptsubscript𝑖1𝑛superscriptsuperscript𝑉subscript𝑋𝑖superscriptsubscript𝑋𝑖𝑇212\displaystyle\sup_{V^{*}\in B_{E^{*}}}\left(\sum_{i=1}^{n}|V^{*}(X_{i}X_{i}^{T% })|^{2}\right)^{\frac{1}{2}}roman_sup start_POSTSUBSCRIPT italic_V start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_E start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | italic_V start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT =\displaystyle== supvd,v21(i=1n|Tr(vvTXiXiT)|2)12subscriptsupremumformulae-sequence𝑣superscript𝑑subscriptnorm𝑣21superscriptsuperscriptsubscript𝑖1𝑛superscriptTr𝑣superscript𝑣𝑇subscript𝑋𝑖superscriptsubscript𝑋𝑖𝑇212\displaystyle\sup_{v\in\mathbb{R}^{d},\,\|v\|_{2}\leq 1}\left(\sum_{i=1}^{n}|% \mathrm{Tr}(vv^{T}X_{i}X_{i}^{T})|^{2}\right)^{\frac{1}{2}}roman_sup start_POSTSUBSCRIPT italic_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , ∥ italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ 1 end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | roman_Tr ( italic_v italic_v start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT
=\displaystyle== supvd,v21(i=1nXi,v4)12subscriptsupremumformulae-sequence𝑣superscript𝑑subscriptnorm𝑣21superscriptsuperscriptsubscript𝑖1𝑛superscriptsubscript𝑋𝑖𝑣412\displaystyle\sup_{v\in\mathbb{R}^{d},\,\|v\|_{2}\leq 1}\left(\sum_{i=1}^{n}% \langle X_{i},v\rangle^{4}\right)^{\frac{1}{2}}roman_sup start_POSTSUBSCRIPT italic_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , ∥ italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ 1 end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ⟨ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_v ⟩ start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT
\displaystyle\leq rsupvd,v2=1(i=1nXi,v2)12=ri=1nXiXiTop12.𝑟subscriptsupremumformulae-sequence𝑣superscript𝑑subscriptnorm𝑣21superscriptsuperscriptsubscript𝑖1𝑛superscriptsubscript𝑋𝑖𝑣212𝑟superscriptsubscriptnormsuperscriptsubscript𝑖1𝑛subscript𝑋𝑖superscriptsubscript𝑋𝑖𝑇op12\displaystyle r\sup_{v\in\mathbb{R}^{d},\,\|v\|_{2}=1}\left(\sum_{i=1}^{n}% \langle X_{i},v\rangle^{2}\right)^{\frac{1}{2}}=r\left\|\sum_{i=1}^{n}X_{i}X_{% i}^{T}\right\|_{\mathrm{op}}^{\frac{1}{2}}.italic_r roman_sup start_POSTSUBSCRIPT italic_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , ∥ italic_v ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 1 end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ⟨ italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_v ⟩ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT = italic_r ∥ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT roman_op end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT .
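The last inequality in the display above can be checked numerically. The following is a minimal sketch in the plane, assuming (as the passage from the fourth power to the second power suggests) that every sample satisfies $\|X_{i}\|_{2}\leq r$. The helper name `check_fourth_moment_bound`, the grid size and the sample generator are illustrative choices, not part of the paper.

```python
import math
import random

def check_fourth_moment_bound(points, r, grid=2000):
    """Compare sup_v (sum_i <x_i, v>^4)^(1/2) with r * ||sum_i x_i x_i^T||_op^(1/2) in R^2."""
    # Second-moment matrix S = sum_i x_i x_i^T, stored as [[a, b], [b, c]].
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    op_norm = 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)
    rhs = r * math.sqrt(op_norm)
    # The supremum over the unit ball is attained on the unit circle,
    # so scan unit vectors v = (cos t, sin t); v and -v give the same value.
    lhs = 0.0
    for k in range(grid):
        t = math.pi * k / grid
        vx, vy = math.cos(t), math.sin(t)
        lhs = max(lhs, math.sqrt(sum((x * vx + y * vy) ** 4 for x, y in points)))
    return lhs, rhs

# Hypothetical data: 50 points drawn from the disc of radius r (assumption: ||x||_2 <= r).
rng = random.Random(0)
r = 2.0
points = []
while len(points) < 50:
    x, y = rng.uniform(-r, r), rng.uniform(-r, r)
    if x * x + y * y <= r * r:
        points.append((x, y))

lhs, rhs = check_fourth_moment_bound(points, r)
print(lhs, rhs)
```

The grid only approximates the supremum from below, but the bound holds for each fixed unit vector $v$, so the comparison `lhs <= rhs` is not affected by the discretization.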

Therefore,

\[
\mathbb{E}W_{1,1}\left(\eta_{\#}\mu,\,\frac{1}{n}\sum_{i=1}^{n}\delta_{\eta(X_{i})}\right)\leq\frac{C}{n}\mathbb{E}\left\|\sum_{i=1}^{n}g_{i}X_{i}X_{i}^{T}\right\|+\frac{Cr\sqrt{\ln n}}{n}\cdot\mathbb{E}\left\|\sum_{i=1}^{n}X_{i}X_{i}^{T}\right\|_{\mathrm{op}}^{\frac{1}{2}}.
\]

So by Lemma 3.1,

\begin{align*}
\mathbb{E}W_{1,1}\left(\eta_{\#}\mu,\,\frac{1}{n}\sum_{i=1}^{n}\delta_{\eta(X_{i})}\right)
&\leq \frac{Cr\sqrt{\ln n}}{\sqrt{n}}\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}^{\frac{1}{2}}+\frac{Cr^{2}\ln n}{n}+\frac{Cr\sqrt{\ln n}}{\sqrt{n}}\|\mathbb{E}X_{1}X_{1}^{T}\|_{\mathrm{op}}^{\frac{1}{2}}+\frac{Cr^{2}\ln n}{n}\\
&= \frac{Cr\sqrt{\ln n}}{\sqrt{n}}\|\Sigma\|_{\mathrm{op}}^{\frac{1}{2}}+\frac{Cr^{2}\ln n}{n}.
\end{align*}

So by (3), the result follows. ∎

Below we restate and prove Proposition 1.5.

Proposition 3.4.

Let $\Sigma$ be a $d\times d$ positive semidefinite matrix such that $\|\Sigma\|_{\mathrm{op}}\leq\frac{1}{2}\mathrm{Tr}(\Sigma)$. Then there exists a symmetric probability measure $\mu$ on $\mathbb{R}^{d}$ supported on $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}^{2}=\mathrm{Tr}(\Sigma)\}$ such that $\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)=\Sigma$ and for every $n\in\mathbb{N}$,

\begin{equation}\tag{3.3}
\mathbb{E}\left[W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}\right]\geq\frac{1}{16}\|\Sigma\|_{\mathrm{op}}\left(\frac{\mathrm{Tr}(\Sigma)}{n\|\Sigma\|_{\mathrm{op}}}+\sqrt{\frac{\mathrm{Tr}(\Sigma)}{n\|\Sigma\|_{\mathrm{op}}}}\,\right),
\end{equation}

where $X_{1},\ldots,X_{n}$ are i.i.d. random vectors sampled according to $\mu$.

Proof.

Without loss of generality, we may assume that $\Sigma$ is a diagonal matrix with diagonal entries $\lambda_{1}\geq\ldots\geq\lambda_{d}\geq 0$. We may also assume that $\mathrm{Tr}(\Sigma)=\lambda_{1}+\ldots+\lambda_{d}=1$. Let $\{e_{1},\ldots,e_{d}\}$ be the unit vector basis for $\mathbb{R}^{d}$. Take

\[
\mu(\{e_{j}\})=\mu(\{-e_{j}\})=\frac{1}{2}\lambda_{j}\quad\text{for }j=1,\ldots,d.
\]

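Then $\mu$ is symmetric and supported on the unit sphere $\{x\in\mathbb{R}^{d}:\,\|x\|_{2}^{2}=1=\mathrm{Tr}(\Sigma)\}$, and $\int_{\mathbb{R}^{d}}xx^{T}\,d\mu(x)=\sum_{j=1}^{d}\lambda_{j}e_{j}e_{j}^{T}=\Sigma$.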
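The defining properties of this $\mu$ can be confirmed exactly on a small example. The sketch below uses rational arithmetic; the eigenvalues `lams` and the helper names `build_atoms`, `mean` and `second_moment` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def build_atoms(lams):
    """Atoms of mu as (point, mass) pairs: mass lam_j / 2 at each of +e_j and -e_j."""
    d = len(lams)
    atoms = []
    for j, lam in enumerate(lams):
        for sign in (1, -1):
            point = tuple(sign if k == j else 0 for k in range(d))
            atoms.append((point, Fraction(lam) / 2))
    return atoms

def mean(atoms):
    """First moment: the integral of x with respect to mu."""
    d = len(atoms[0][0])
    return [sum(m * x[k] for x, m in atoms) for k in range(d)]

def second_moment(atoms):
    """Second-moment matrix: the integral of x x^T with respect to mu."""
    d = len(atoms[0][0])
    return [[sum(m * x[a] * x[b] for x, m in atoms) for b in range(d)]
            for a in range(d)]

# Illustrative eigenvalues with trace 1 and largest entry at most 1/2.
lams = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
atoms = build_atoms(lams)
print(mean(atoms))           # the zero vector, as expected for a symmetric measure
print(second_moment(atoms))  # the diagonal matrix with entries lams
```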

It suffices to show that the left hand side of (3.3) is at least twice each of the two terms on the right hand side, since averaging the two resulting bounds gives (3.3). So the proof has two parts. The first part of the proof is similar to the proofs of Proposition 2.10 and Corollary 2.11. Let $\mathbb{R}_{+}^{d}=\{(v_{1},\ldots,v_{d}):\,v_{1},\ldots,v_{d}\geq 0\}$. By considering the 1-Lipschitz function $f(t)=|t|$, we have

\begin{align*}
W_{1,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)
&\geq \sup_{v\in\mathbb{R}_{+}^{d},\,\|v\|_{2}=1}\left|\int_{\mathbb{R}^{d}}|\langle x,v\rangle|\,d\mu(x)-\frac{1}{2n}\sum_{i=1}^{n}(|\langle X_{i},v\rangle|+|\langle -X_{i},v\rangle|)\right|\\
&= \sup_{v\in\mathbb{R}_{+}^{d},\,\|v\|_{2}=1}\left|\sum_{i=1}^{d}\lambda_{i}v_{i}-\frac{1}{n}\sum_{i=1}^{n}|\langle X_{i},v\rangle|\right|\\
&= \sup_{v\in\mathbb{R}_{+}^{d},\,\|v\|_{2}=1}\left|\sum_{i=1}^{d}\lambda_{i}v_{i}-\frac{1}{n}\sum_{i=1}^{n}\langle\mathrm{abs}(X_{i}),v\rangle\right|,
\end{align*}

where $\mathrm{abs}(X_{i})$ is the vector obtained by taking the absolute value of each entry of $X_{i}$. (Since $X_{i}$ is distributed according to $\mu$, the vector $X_{i}$ actually has only one nonzero entry.) So

\[
W_{1,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)\geq\frac{1}{2}\left\|\mathrm{diag}(\Sigma)-\frac{1}{n}\sum_{i=1}^{n}\mathrm{abs}(X_{i})\right\|_{2},
\]

where $\mathrm{diag}(\Sigma)=(\lambda_{1},\ldots,\lambda_{d})\in\mathbb{R}^{d}$.
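One way to see where the factor $\frac{1}{2}$ comes from is sketched below. The paper defers to Proposition 2.10 and Corollary 2.11, whose argument may differ. The symbols $a$, $a_{+}$ and $a_{-}$ are notation introduced only for this sketch: $a=\mathrm{diag}(\Sigma)-\frac{1}{n}\sum_{i=1}^{n}\mathrm{abs}(X_{i})$, and $a_{+}$, $a_{-}$ are its entrywise positive and negative parts, so that $a=a_{+}-a_{-}$ with $a_{+},a_{-}\in\mathbb{R}_{+}^{d}$ having disjoint supports.

\begin{align*}
\sup_{v\in\mathbb{R}_{+}^{d},\,\|v\|_{2}=1}|\langle a,v\rangle|
&\geq \max\left(\|a_{+}\|_{2},\,\|a_{-}\|_{2}\right)
&&\text{(take } v=a_{+}/\|a_{+}\|_{2}\text{ or } v=a_{-}/\|a_{-}\|_{2}\text{)}\\
&\geq \frac{1}{\sqrt{2}}\left(\|a_{+}\|_{2}^{2}+\|a_{-}\|_{2}^{2}\right)^{\frac{1}{2}}
=\frac{1}{\sqrt{2}}\|a\|_{2}
\geq\frac{1}{2}\|a\|_{2}.
\end{align*}

The restriction to nonnegative directions $v$ therefore costs at most a constant factor compared with the full Euclidean norm of $a$.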

Since $X_{1},\ldots,X_{n}$ are i.i.d. with distribution $\mu$, the random vectors $\mathrm{abs}(X_{1}),\ldots,\mathrm{abs}(X_{n})$ are i.i.d. with the following distribution

\[
\mathrm{abs}_{\#}\mu(\{e_{j}\})=\lambda_{j}\quad\text{for }j=1,\ldots,d.
\]

In particular, $\mathbb{E}[\mathrm{abs}(X_{1})]=\sum_{j=1}^{d}\lambda_{j}e_{j}=\mathrm{diag}(\Sigma)$. So

\begin{align*}
\mathbb{E}\left[W_{1,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}\right]
&\geq \frac{1}{4}\mathbb{E}\left\|\mathrm{diag}(\Sigma)-\frac{1}{n}\sum_{i=1}^{n}\mathrm{abs}(X_{i})\right\|_{2}^{2}\\
&= \frac{1}{4n}\left[\mathbb{E}\|\mathrm{abs}(X_{1})\|_{2}^{2}-\|\mathrm{diag}(\Sigma)\|_{2}^{2}\right]\\
&= \frac{1}{4n}\left(1-(\lambda_{1}^{2}+\ldots+\lambda_{d}^{2})\right).
\end{align*}
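The variance identity behind the last two equalities can be verified exactly for small parameters. The sketch below enumerates every possible sample of indices, using the fact that $\mathrm{abs}(X_{i})=e_{J}$ with $\mathbb{P}(J=j)=\lambda_{j}$. The function name `expected_sq_deviation` and the values of `lams` and `n` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def expected_sq_deviation(lams, n):
    """E || diag(Sigma) - (1/n) sum_i abs(X_i) ||_2^2, computed by exact enumeration."""
    d = len(lams)
    total = Fraction(0)
    # Each abs(X_i) equals e_j with probability lams[j]; enumerate all d**n index sequences.
    for seq in product(range(d), repeat=n):
        prob = Fraction(1)
        counts = [0] * d
        for j in seq:
            prob *= lams[j]
            counts[j] += 1
        # Squared Euclidean distance between diag(Sigma) and the empirical average.
        sq = sum((lams[j] - Fraction(counts[j], n)) ** 2 for j in range(d))
        total += prob * sq
    return total

lams = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4
lhs = expected_sq_deviation(lams, n)
rhs = (1 - sum(l * l for l in lams)) / n
print(lhs, rhs)
```

Since the enumeration has $d^{n}$ terms, this check is only practical for very small $d$ and $n$.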

Since by assumption $\|\Sigma\|_{\mathrm{op}}\leq\frac{1}{2}\mathrm{Tr}(\Sigma)=\frac{1}{2}$, we have $\lambda_{j}\leq\frac{1}{2}$ for all $j$. So $\lambda_{1}^{2}+\ldots+\lambda_{d}^{2}\leq\frac{1}{2}(\lambda_{1}+\ldots+\lambda_{d})=\frac{1}{2}$. So

\begin{equation}\tag{3.4}
\mathbb{E}\left[W_{1,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)^{2}\right]\geq\frac{1}{8n}.
\end{equation}

Since $W_{2,1}\geq W_{1,1}$, this proves that the left hand side of (3.3) is at least twice the first term on the right hand side. We now move to the second part of the proof. The second term on the right hand side of (3.3) is larger than the first term precisely when $\frac{\mathrm{Tr}(\Sigma)}{n\|\Sigma\|_{\mathrm{op}}}<1$, or equivalently, $\frac{1}{n}<\lambda_{1}$. So we may assume this in the rest of the proof.

Consider the pushforward measure $(e_{1})_{\#}\mu$ of $\mu$ by the map $\langle\cdot,e_{1}\rangle$. Note that

\[
(e_{1})_{\#}\mu(\{-1\})=\frac{1}{2}\lambda_{1},\quad(e_{1})_{\#}\mu(\{1\})=\frac{1}{2}\lambda_{1},\quad(e_{1})_{\#}\mu(\{0\})=1-\lambda_{1}.
\]

We have

\begin{align*}
W_{2,1}\left(\mu,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{X_{i}}+\delta_{-X_{i}})\right)
&\geq W_{2}\left((e_{1})_{\#}\mu,\,\frac{1}{2n}\sum_{i=1}^{n}(\delta_{\langle X_{i},e_{1}\rangle}+\delta_{-\langle X_{i},e_{1}\rangle})\right)\\
&= W_{2}\left(\mathrm{abs}_{\#}(e_{1})_{\#}\mu,\,\frac{1}{n}\sum_{i=1}^{n}\delta_{|\langle X_{i},e_{1}\rangle|}\right),
\end{align*}

where $\mathrm{abs}_{\#}(e_{1})_{\#}\mu(\{1\})=\lambda_{1}$ and $\mathrm{abs}_{\#}(e_{1})_{\#}\mu(\{0\})=1-\lambda_{1}$. (See the beginning of the proof of Lemma 3.2.) Moreover, the random variables $|\langle X_{i},e_{1}\rangle|$, for $i=1,\ldots,n$, are i.i.d. with this distribution. Thus, the probability measure $\frac{1}{n}\sum_{i=1}^{n}\delta_{|\langle X_{i},e_{1}\rangle|}$ is supported on the two points $0$ and $1$, with the mass at $1$ being $\frac{1}{n}$ times a $\mathrm{binom}(n,\lambda_{1})$ random variable, which we denote by $Y$. So we have

\[
W_{2}\left(\mathrm{abs}_{\#}(e_{1})_{\#}\mu,\,\frac{1}{n}\sum_{i=1}^{n}\delta_{|\langle X_{i},e_{1}\rangle|}\right)^{2}=\left|\frac{1}{n}Y-\lambda_{1}\right|.
\]
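The identity above is an instance of a general fact: for two measures supported on $\{0,1\}$, the optimal plan moves exactly the excess mass across distance $1$. The following sketch computes $W_{p}^{p}$ on the real line via the monotone (quantile) coupling, which is optimal in one dimension for $p\geq 1$, and compares it with $\left|\frac{1}{n}Y-\lambda_{1}\right|$. The function name `wasserstein_p_pow` and the values of `lam`, `n` and `y` are hypothetical.

```python
from fractions import Fraction

def wasserstein_p_pow(atoms_a, atoms_b, p=2):
    """W_p^p between two finitely supported probability measures on the real line.

    Each argument is a list of (location, mass) pairs with total mass 1.
    Mass is matched in increasing order of location (monotone coupling).
    """
    a = sorted((x, Fraction(m)) for x, m in atoms_a if m > 0)
    b = sorted((x, Fraction(m)) for x, m in atoms_b if m > 0)
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]
    cost = Fraction(0)
    while True:
        moved = min(rem_a, rem_b)  # mass transported between the current atoms
        cost += moved * abs(a[i][0] - b[j][0]) ** p
        rem_a -= moved
        rem_b -= moved
        if rem_a == 0:
            i += 1
            if i == len(a):
                break
            rem_a = a[i][1]
        if rem_b == 0:
            j += 1
            if j == len(b):
                break
            rem_b = b[j][1]
    return cost

lam = Fraction(2, 5)   # plays the role of lambda_1
n, y = 10, 7           # y is one possible value of the binomial variable Y
target = [(0, 1 - lam), (1, lam)]
empirical = [(0, 1 - Fraction(y, n)), (1, Fraction(y, n))]
print(wasserstein_p_pow(target, empirical), abs(Fraction(y, n) - lam))
```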

As explained above, we may assume that 1nλ11𝑛subscript𝜆1\frac{1}{n}\leq\lambda_{1}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ≤ italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. Also by assumption, Σop12Tr(Σ)subscriptnormΣop12TrΣ\|\Sigma\|_{\mathrm{op}}\leq\frac{1}{2}\mathrm{Tr}(\Sigma)∥ roman_Σ ∥ start_POSTSUBSCRIPT roman_op end_POSTSUBSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG 2 end_ARG roman_Tr ( roman_Σ ) so λ11211nsubscript𝜆11211𝑛\lambda_{1}\leq\frac{1}{2}\leq 1-\frac{1}{n}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG 2 end_ARG ≤ 1 - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG. Therefore, 1nλ111n1𝑛subscript𝜆111𝑛\frac{1}{n}\leq\lambda_{1}\leq 1-\frac{1}{n}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ≤ italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≤ 1 - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG. With λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT in this range, by [5, Theorem 1],

\[ \mathbb{E}\left|\frac{1}{n}Y-\lambda_{1}\right|\geq\frac{1}{\sqrt{2}}\left(\mathbb{E}\left|\frac{1}{n}Y-\lambda_{1}\right|^{2}\right)^{\frac{1}{2}}=\frac{\sqrt{n\lambda_{1}(1-\lambda_{1})}}{n\sqrt{2}}=\sqrt{\frac{\lambda_{1}(1-\lambda_{1})}{2n}}\geq\frac{1}{2}\sqrt{\frac{\lambda_{1}}{n}}. \]

This proves that the left-hand side of (3.3) is at least twice the second term on the right-hand side. Together with (3.4), this completes the proof. ∎
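
The chain of inequalities in the last display can be checked numerically. The Python sketch below is only an illustration, not part of the proof: it assumes that $Y$ is a Binomial$(n,\lambda_{1})$ random variable (inferred from the variance $n\lambda_{1}(1-\lambda_{1})$ used above), and the function names `binomial_mad`, `lower_bounds` and `check` are hypothetical. It computes $\mathbb{E}\left|\frac{1}{n}Y-\lambda_{1}\right|$ exactly from the binomial probabilities and compares it with the two lower bounds for $\frac{1}{n}\leq\lambda_{1}\leq\frac{1}{2}$.

```python
from math import comb, sqrt


def binomial_mad(n: int, p: float) -> float:
    """Exact E|Y/n - p| for Y ~ Binomial(n, p), summed over the pmf."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * abs(k / n - p)
        for k in range(n + 1)
    )


def lower_bounds(n: int, p: float) -> tuple[float, float]:
    """Return (sqrt(p(1-p)/(2n)), (1/2) sqrt(p/n)), the two bounds in the display."""
    return sqrt(p * (1 - p) / (2 * n)), 0.5 * sqrt(p / n)


def check(n: int, p: float, tol: float = 1e-12) -> bool:
    """True if E|Y/n - p| >= sqrt(p(1-p)/(2n)) >= (1/2) sqrt(p/n), up to tol.

    The tolerance is needed because equality can occur (e.g. n = 2, p = 1/2).
    """
    mad = binomial_mad(n, p)
    first, second = lower_bounds(n, p)
    return mad >= first - tol and first >= second - tol


if __name__ == "__main__":
    # Scan p = k/n over the range 1/n <= p <= 1/2 used in the proof.
    for n in (2, 5, 10, 50, 200):
        ok = all(check(n, k / n) for k in range(1, n // 2 + 1))
        print(n, ok)
```

The sum is evaluated in floating point, so this sketch is only meant for moderate $n$ (a few hundred); for much larger $n$ the binomial coefficients no longer fit in a float.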

Acknowledgement: The author is grateful to Ramon van Handel, Nikita Zhivotovskiy and Sloan Nietert for some useful discussions.

References

  • [1] P. Abdalla and N. Zhivotovskiy, Covariance estimation: Optimal dimension-free guarantees for adversarial corruption and heavy tails, Journal of the European Mathematical Society, to appear.
  • [2] R. Adamczak, A. E. Litvak, A. Pajor and N. Tomczak-Jaegermann, Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles, Journal of the American Mathematical Society, 23:535-561, 2009.
  • [3] R. Adamczak, A. E. Litvak, A. Pajor and N. Tomczak-Jaegermann, Sharp bounds on the rate of convergence of the empirical covariance matrix, Comptes Rendus. Mathématique 349.3-4 (2011): 195-200.
  • [4] D. Bartl and S. Mendelson, Structure preservation via the Wasserstein distance, arXiv preprint arXiv:2209.07058 (2022).
  • [5] D. Berend and A. Kontorovich, A sharp estimate of the binomial mean absolute deviation with applications, Statistics & Probability Letters 83.4 (2013): 1254-1259.
  • [6] S. Bobkov and M. Ledoux, One-dimensional empirical measures, order statistics, and Kantorovich transport distances, Vol. 261. No. 1259. American Mathematical Society, 2019.
  • [7] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, Sliced and Radon Wasserstein barycenters of measures, Journal of Mathematical Imaging and Vision, 51(1):22-45, 2015.
  • [8] A. Buchholz, Operator Khintchine inequality in non-commutative probability, Math. Ann., 319(1):1-16, 2001.
  • [9] M. Carrière, M. Cuturi, and S. Oudot, Sliced Wasserstein kernel for persistence diagrams, In International Conference on Machine Learning (ICML), 2017.
  • [10] I. Deshpande, Z. Zhang, and A. G. Schwing, Generative modeling using the sliced Wasserstein distance, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [11] I. Deshpande, Y-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and A. G. Schwing, Max-sliced Wasserstein distance and its use for GANs, In CVPR, pages 10648-10656, 2019.
  • [12] N. Fournier and A. Guillin, On the rate of convergence in Wasserstein distance of the empirical measure, Probability theory and related fields 162.3-4 (2015): 707-738.
  • [13] S. Kolouri, P. E. Pope, C. E. Martin, and G. K. Rohde. Sliced Wasserstein autoencoders. In International Conference on Learning Representations, 2018.
  • [14] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau and G. Rohde, Generalized sliced Wasserstein distances, Advances in Neural Information Processing Systems, 32 (2019).
  • [15] V. Koltchinskii and K. Lounici, Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 23:110-133, 2014.
  • [16] R. Latała and K. Oleszkiewicz, On the best constant in the Khinchin-Kahane inequality, Studia Mathematica 109.1 (1994): 101-104.
  • [17] M. Ledoux and M. Talagrand. Probability in Banach Spaces: isoperimetry and processes, Springer Science & Business Media, 2013.
  • [18] T. Lin, C. Fan, N. Ho, M. Cuturi and M. Jordan, Projection robust Wasserstein distance and Riemannian optimization. Advances in neural information processing systems, 33 (2020): 9383-9397.
  • [19] T. Lin, Z. Zheng, E. Chen, M. Cuturi and M. I. Jordan, On projection robust optimal transport: Sample complexity and model misspecification, In International Conference on Artificial Intelligence and Statistics, PMLR, 2021.
  • [20] F. Lust-Piquard, Inégalités de Khintchine dans $C_{p}$ ($1<p<\infty$), C. R. Acad. Sci. Paris Sér. I Math., 303(7):289-292, 1986.
  • [21] S. Mendelson and G. Paouris, On the singular values of random matrices, Journal of the European Mathematical Society, 16:823-834, 2014.
  • [22] K. Nadjahi, A. Durmus, U. Simsekli and R. Badeau, Asymptotic guarantees for learning generative models with the sliced-Wasserstein distance. Advances in Neural Information Processing Systems, 32 (2019).
  • [23] K. Nguyen and N. Ho, Amortized projection optimization for sliced Wasserstein generative models, Advances in Neural Information Processing Systems 35 (2022): 36985-36998.
  • [24] K. Nguyen and N. Ho, Energy-based sliced Wasserstein distance, Advances in Neural Information Processing Systems 36 (2024).
  • [25] S. Nietert, Z. Goldfeld, R. Sadhu and K. Kato, Statistical, robustness, and computational guarantees for sliced Wasserstein distances, Advances in Neural Information Processing Systems, 35, 28179-28193 (2022).
  • [26] J. Niles-Weed and P. Rigollet, Estimation of Wasserstein distances in the spiked transport model, Bernoulli 28.4 (2022): 2663-2688.
  • [27] J. Olea, C. Rush, A. Velez, and J. Wiesel, On the generalization error of norm penalty linear regression models, arXiv preprint arXiv:2211.07608, 2022.
  • [28] F.-P. Paty and M. Cuturi, Subspace robust Wasserstein distances, In ICML, pages 5072-5081, 2019.
  • [29] G. Pisier, Introduction to operator space theory, volume 294 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2003.
  • [30] M. Raab and A. Steger, “Balls into bins”-A simple and tight analysis, International Workshop on Randomization and Approximation Techniques in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998.
  • [31] J. Rabin, G. Peyré, J. Delon, and M. Bernot, Wasserstein barycenter and its application to texture mixing, In International Conference on Scale Space and Variational Methods in Computer Vision, pages 435-446. Springer, 2011.
  • [32] M. Rudelson, Random vectors in the isotropic position, Journal of Functional Analysis 164.1 (1999): 60-72.
  • [33] N. Srivastava and R. Vershynin, Covariance estimation for distributions with $2+\epsilon$ moments, Annals of Probability 41 (2013), 3081-3111.
  • [34] M. Talagrand, The generic chaining: upper and lower bounds of stochastic processes. Springer Science & Business Media, 2005.
  • [35] K. Tikhomirov, Sample covariance matrices of heavy-tailed distributions, International Mathematics Research Notices 2018.20 (2018): 6254-6289.
  • [36] J. A. Tropp, An introduction to matrix concentration inequalities, Foundations and Trends in Machine Learning 8.1-2 (2015): 1-230.
  • [37] R. van Handel, Probability in high dimension, Lecture Notes (Princeton University) (2014).
  • [38] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. Compressed sensing, 210–268, Cambridge Univ. Press, Cambridge, 2012.
  • [39] R. Vershynin, How close is the sample covariance matrix to the actual covariance matrix?, Journal of Theoretical Probability 25.3 (2012): 655-686.
  • [40] R. Vershynin, High-dimensional probability: An introduction with applications in data science, Vol. 47. Cambridge university press, 2018.
  • [41] M. J. Wainwright, High-dimensional statistics: A non-asymptotic viewpoint. Vol. 48. Cambridge university press, 2019.
  • [42] J. Wang, R. Gao, and Y. Xie, Two-sample test using projected Wasserstein distance, IEEE International Symposium on Information Theory (ISIT), 2021.
    Updated version: arXiv:2010.11970.
  • [43] J. Wu, Z. Huang, D. Acharya, W. Li, J. Thoma, D. P. Paudel, and L. V. Gool, Sliced Wasserstein generative models, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3713-3722, 2019.
  • [44] N. Zhivotovskiy, Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle, Electronic Journal of Probability 29 (2024): 1-28.