Optimal Rate of Kernel Regression
in Large Dimensions

Weihao Lu, Haobo Zhang, Yicheng Li, Manyun Xu
Center for Statistical Science, Department of Industrial Engineering, Tsinghua University
100084, Beijing, China
{luwh19, zhang-hb21, liyc22, xumy20}@mails.tsinghua.edu.cn
Qian Lin
Center for Statistical Science, Department of Industrial Engineering, Tsinghua University
100084, Beijing, China
[email protected]
Co-first author. Corresponding author.
Abstract

We study kernel regression for large-dimensional data, where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n\asymp d^{\gamma}$ for some $\gamma>0$. We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$, respectively. When the target function falls into the RKHS associated with a (general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n\asymp d^{\gamma}$ for $\gamma=2,4,6,8,\cdots$. We then further determine the optimal rate of the excess risk of kernel regression for all $\gamma>0$ and find that the curve of the optimal rate varying along $\gamma$ exhibits several new phenomena, including the multiple descent behavior and the periodic plateau behavior. As an application, for the neural tangent kernel (NTK), we also provide a similar explicit description of the curve of the optimal rate. As a direct corollary, these claims hold for wide neural networks as well.

Keywords: kernel regression · neural network · high-dimensional statistics · minimax rates

1 Introduction

Suppose we have observed $n$ i.i.d. samples $(X_{i},Y_{i})$ from a joint distribution $(X,Y)$ supported on $\mathbb{R}^{d+1}\times\mathbb{R}$. The regression problem, one of the most fundamental problems in statistics, aims to find a function $\hat{f}_{n}$ based on these samples such that the excess risk,

$$\left\|\hat{f}_{n}-f_{\star}\right\|_{L^{2}}^{2}=\mathbb{E}_{X}\left[\left(f_{\star}(X)-\hat{f}_{n}(X)\right)^{2}\right],$$

is small, where $f_{\star}(x)=\mathbb{E}[Y\mid x]$ is the regression function. Many non-parametric regression methods have been proposed to solve the regression problem, such as polynomial splines [75], local polynomials [20, 73], kernel methods [14, 15, 16], etc. When the dimension $d$ of the data is small, these methods produce reasonable results; however, when $d$ is relatively large, the convergence rate of the excess risk can be extremely slow. What's worse, though some additional assumptions such as low intrinsic dimensionality (the data fall into a subspace with dimension far smaller than $d$) and sparsity of features can improve the theoretical performance of certain non-parametric regression problems [2, 27], few successful real-world examples/applications have been reported. On the other hand, neural network methods have achieved tremendous success in many large-dimensional problems, such as computer vision [35, 43] and natural language processing [23]. For example, the ILSVRC competition [65] has a dataset of 1.2 million samples with a dimensionality of approximately 200K, while the pre-training dataset of the well-known language representation model, Bidirectional Encoder Representations from Transformers (BERT) [23], consists of 13 million samples with a dimensionality of approximately 400K.
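As an illustration of the excess risk defined above, the following minimal sketch approximates $\mathbb{E}_{X}[(f_{\star}(X)-\hat{f}_{n}(X))^{2}]$ by Monte Carlo. It assumes, purely for the example, that $X$ is uniform on the unit sphere in $\mathbb{R}^{d+1}$; the helper names `sample_sphere` and `excess_risk`, as well as the toy choices of `f_star` and `f_hat`, are hypothetical and not taken from the paper.

```python
import math
import random


def sample_sphere(d, rng):
    """Draw one point uniformly from the unit sphere S^d in R^(d+1).

    Normalizing a standard Gaussian vector gives the uniform distribution.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d + 1)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def excess_risk(f_star, f_hat, d, num_draws=20000, seed=0):
    """Monte Carlo estimate of E_X[(f_star(X) - f_hat(X))^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        x = sample_sphere(d, rng)
        total += (f_star(x) - f_hat(x)) ** 2
    return total / num_draws


# Hypothetical toy case: the target is the first coordinate and the
# estimator is identically zero, so the excess risk equals E[x_1^2],
# which is 1 / (d + 1) for the uniform distribution on S^d.
d = 9
risk = excess_risk(lambda x: x[0], lambda x: 0.0, d)
print(f"estimated excess risk: {risk:.4f}, exact value: {1 / (d + 1):.4f}")
```

The estimate only approximates the population quantity; its accuracy depends on `num_draws` and on the assumed distribution of $X$.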

Several groups of researchers have tried to explain the superior performance of neural networks on large-dimensional data from various aspects. However, the highly non-linear dynamics of the differential equation associated with the gradient descent/flow used to train the neural network [33, 45, 63] make the analysis of the training dynamics notoriously hard. When the width of a neural network is sufficiently large, the training process falls into the ‘lazy regime’, i.e., its parameters/weights stay in a small neighborhood of their initial position during the training process [3, 25, 26, 48]. Since [39] observed that the time-varying neural network kernel (NNK) converges pointwise to a time-invariant neural tangent kernel (NTK) as the width $m$ of the neural network tends to infinity, it has been widely believed that the generalization ability of early-stopping kernel regression with the NTK could serve as a proper surrogate for the generalization ability of neural networks in the ‘lazy regime’ [4, 37, 76]. Recently, a sequence of works [44, 46] further showed that the NNK converges uniformly to the NTK as the width $m\rightarrow\infty$, which rigorously justified this belief. Thus, understanding the generalization ability of kernel regression (with respect to the NTK) in large dimensions will help us understand the superior performance of (wide) neural networks.

Kernel regression (or regression over an RKHS), as a classical topic, has been studied since the 1990s. Most work imposes the polynomial eigenvalue decay assumption on a kernel $K$ (i.e., there exist constants $0<\mathfrak{c}\leq\mathfrak{C}<\infty$ such that the eigenvalues of the kernel satisfy $\mathfrak{c}j^{-\beta}\leq\lambda_{j}\leq\mathfrak{C}j^{-\beta}$ for some constant $\beta>1$) and assumes that the target function $f_{\star}$ belongs to the RKHS associated with $K$ [14, 15, 51, 64]. These works then showed that the minimax rate of the excess risk of regression over the corresponding RKHS is lower bounded by $n^{-\beta/(\beta+1)}$ and that some kernel methods (e.g., kernel ridge regression and early-stopping kernel regression) can produce estimators achieving this optimal rate. Thus, verifying whether an NTK satisfies the polynomial eigenvalue decay assumption and determining its eigenvalue decay rate becomes a natural strategy for discussing the generalization ability of NTK (or equivalently, wide neural network) regression. When the NTK is defined on the sphere $\mathbb{S}^{d}$, it is an inner product kernel. Hence, the eigenvalues of the NTK can be obtained through a detailed calculation with the help of the spherical harmonic polynomials.
It is shown in [10, 11] that when $d$ is fixed, the eigenvalues of the NTK defined on $\mathbb{S}^{d}$ decay polynomially at rate $(d+1)/d$. When the domain is other than a sphere, [44, 46] further showed that the eigenvalue decay rate of the NTK on any bounded open set in $\mathbb{R}^{d}$ is still $(d+1)/d$. Some works then claimed that the optimally tuned neural network on $\mathbb{S}^{d}$ or on any bounded open set in $\mathbb{R}^{d}$ can achieve the optimal rate $n^{-(d+1)/(2d+1)}$ [37, 44, 46].
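The fixed-dimension rate quoted above follows by substituting the decay rate $\beta=(d+1)/d$ into the exponent $\beta/(\beta+1)$. The short sketch below checks this arithmetic exactly with rational numbers; the function names `minimax_exponent` and `ntk_decay_rate` are hypothetical labels chosen for this example.

```python
from fractions import Fraction


def minimax_exponent(beta):
    """Exponent r of the rate n^(-r), with r = beta / (beta + 1)."""
    beta = Fraction(beta)
    return beta / (beta + 1)


def ntk_decay_rate(d):
    """Eigenvalue decay rate (d + 1) / d quoted for the NTK in fixed dimension d."""
    return Fraction(d + 1, d)


for d in (1, 2, 10, 100):
    r = minimax_exponent(ntk_decay_rate(d))
    # beta / (beta + 1) with beta = (d + 1) / d simplifies to (d + 1) / (2d + 1).
    assert r == Fraction(d + 1, 2 * d + 1)
    print(f"d = {d:3d}: exponent = {r} ~ {float(r):.4f}")
```

The exponent $(d+1)/(2d+1)$ decreases toward $1/2$ as $d$ grows, although the fixed-dimension argument itself does not cover the regime where $d$ grows with $n$.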

When the dimension $d$ is large, much less is known about the convergence rate of the excess risk of kernel methods. Several works are devoted to the high-dimensional setting where $n\asymp d$. For example, motivated by the linear approximation of kernel matrices for high-dimensional data proposed by [40], [49] provided an upper bound on the excess risk of kernel interpolation and claimed that kernel interpolation generalizes well in high dimensions. Similar results for kernel ridge regression are proven in [52]. These results are widely interpreted as evidence of the benign overfitting phenomenon (e.g., [9, 11, 53, 67]): overfitted models can still generalize well. Building on the work of [49], the benign overfitting phenomenon has been extensively investigated in the literature, and we refer to [7, 13, 34, 62, 78] for details. Another line of research considers the large-dimensional setting where $n\asymp d^{\gamma}$ for some $\gamma>0$. For example, [30] studied the square-integrable function space on the sphere $\mathbb{S}^{d}$ and proved that when $\gamma$ is a non-integer, kernel ridge regression is consistent if and only if the regression function is a polynomial with a fixed degree $\leq\gamma$. Inspired by the techniques presented in [30], several follow-up works extended the results to different settings [1, 29, 31, 55, 56, 61]. Additionally, [24] established an upper bound for kernel methods with specific kernels when $\gamma$ is an integer.
Surprisingly, a recent work [12] numerically reported a ‘periodic plateau behavior’ in Figure 5 (b) of their paper: when $\gamma$ varies within certain specific ranges, the excess risk of kernel regression decays very slowly. All these inspirational works hint that determining the convergence rate of kernel regression in large dimensions is a hard but fruitful question, and that many new phenomena are likely to emerge once this rate is determined.

In this paper, we consider the generalization ability of kernel regression, especially kernel regression with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, with respect to large-dimensional data where $n\asymp d^{\gamma}$. More precisely, assuming the target function $f_{\star}\in\mathcal{H}^{\mathtt{in}}$, the RKHS associated with an inner product kernel defined on $\mathbb{S}^{d}$, we will provide a sharp convergence rate of the excess risk of kernel regression with respect to data of large dimension. We will further show that this rate is actually (nearly) minimax optimal for any $\gamma>0$.

1.1 Related works

The generalization ability of high-dimensional kernel regression has attracted increasing attention recently. When $n\asymp d$, [40] discovered a linear approximation of the empirical kernel matrix,

$$K(\boldsymbol{X},\boldsymbol{X})\approx\alpha_{1}\boldsymbol{X}\boldsymbol{X}^{\tau}+\alpha_{2}\mathbf{1}_{n}\mathbf{1}_{n}^{\tau}+\alpha_{3}\mathbf{I}_{n},$$

where the coefficients $\alpha_{1}$, $\alpha_{2}$, and $\alpha_{3}$ depend on the dimension $d$ and the inner-product kernel $K$. Inspired by this approximation, [49] subsequently provided an upper bound $\mathbf{V}$ on the excess risk of kernel interpolation when $n\approx d$. They further demonstrated that $\mathbf{V}\to 0$ when the data exhibit a low-dimensional structure. Under the same setting, [52] extended the upper bound on the excess risk to kernel ridge regression with other choices of the regularization parameter. Furthermore, [66] demonstrated that the fitted function of kernel ridge regression converges pointwise to that of a linear model with two penalty terms when $n\asymp d$.

In the large-dimensional setting where $n\asymp d^{\gamma}$ for some non-integer $\gamma>0$, [30] developed a higher-order approximation of the empirical kernel matrix of the following form:

$$K(\boldsymbol{X},\boldsymbol{X})\approx\underbrace{\sum_{k<r}\mu_{k}\boldsymbol{Y}_{k}(\boldsymbol{X})\boldsymbol{Y}_{k}(\boldsymbol{X})^{\tau}}_{\mathbf{I}}+\underbrace{\vphantom{\sum_{k=r}}\mu_{r}\boldsymbol{Y}_{r}(\boldsymbol{X})\boldsymbol{Y}_{r}(\boldsymbol{X})^{\tau}}_{\mathbf{II}}+\underbrace{\sum_{k>r}\mu_{k}\boldsymbol{Y}_{k}(\boldsymbol{X})\boldsymbol{Y}_{k}(\boldsymbol{X})^{\tau}}_{\mathbf{III}},\tag{1}$$

where $r$ is an integer $\leq\gamma$, the $\mu_{k}$'s are the eigenvalues of $K$, and $\boldsymbol{Y}_{k}(\boldsymbol{X})$ consists of the spherical harmonics of degree $k$. They demonstrated that the term $\mathbf{III}$ in (1) can be approximated by an identity matrix. By assuming that the regression function $f_{\star}$ is square-integrable on the sphere $\sqrt{d}\,\mathbb{S}^{d}$ with non-vanishing $L^{2}$ norm as $d\to\infty$, [30] then proved two results: (1) if $f_{\star}$ is a polynomial, then kernel ridge regression is consistent, and (2) if $f_{\star}$ is not a polynomial and the model is noiseless, then all kernel methods are inconsistent. Several follow-up works have extended the results presented in [30], and all of them adopted the square-integrable function space assumption. For example, [29] considers the low-intrinsic-dimensional case; [56] allows the degrees of the polynomials to diverge with $d$; [1, 55, 61] analyze kernel ridge regression with invariance kernels and convolution kernels rather than inner-product kernels; [31] discusses the performance of early-stopping kernel regression; while [81] approximates the term $\mathbf{II}$ in (1) by $\boldsymbol{X}\boldsymbol{X}^{\tau}$ using the Marchenko–Pastur law when $\gamma\geq 1$ is an integer.
We discuss their assumptions regarding the function space and their results in Section 7.1.
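To give a sense of the size of the blocks $\boldsymbol{Y}_{k}(\boldsymbol{X})$ in (1), the sketch below counts the linearly independent spherical harmonics of degree $k$ on $\mathbb{S}^{d}\subset\mathbb{R}^{d+1}$ using the standard dimension formula $\binom{d+k}{k}-\binom{d+k-2}{k-2}$. This formula is a classical fact that is not stated in the text above, and the function name `harmonic_dim` is a hypothetical label. For fixed $k$ the count is of order $d^{k}$, which is one way to read why the sample size $n\asymp d^{\gamma}$ separates the degrees $k<\gamma$ from the degrees $k>\gamma$.

```python
from math import comb


def harmonic_dim(d, k):
    """Number of linearly independent degree-k spherical harmonics on S^d in R^(d+1).

    Equals dim(homogeneous polynomials of degree k in d + 1 variables)
    minus dim(homogeneous polynomials of degree k - 2 in d + 1 variables).
    """
    if k == 0:
        return 1
    if k == 1:
        return d + 1
    return comb(d + k, k) - comb(d + k - 2, k - 2)


# Compare the count with d^k for a few degrees in a moderately large dimension.
d = 50
for k in range(5):
    print(f"k = {k}: N(d, k) = {harmonic_dim(d, k)}, d^k = {d ** k}")
```

On the ordinary sphere $\mathbb{S}^{2}$ the function returns the familiar $2k+1$ harmonics of degree $k$.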

In [50], an upper bound on the convergence rate of the excess risk of kernel interpolation is provided when $n\asymp d^{\gamma}$, assuming $\gamma>1$ is fixed. The authors of [50] assume that the regression function can be expressed as $f_{\star}(x)=\langle K(x,\cdot),\rho_{\star}(\cdot)\rangle_{L^{2}}$, with $\|\rho_{\star}\|_{L^{4}}^{4}\leq C$ for some constant $C>0$. Then, they obtain the convergence rate $n^{-\beta(\gamma)}$, where $0\leq\beta(\gamma):=\min\left\{\lceil\gamma\rceil/\gamma-1,\,1-\lfloor\gamma\rfloor/\gamma\right\}\leq 1/(2\lfloor\gamma\rfloor+1)$. However, it remains uncertain whether other kernel methods with regularization, including early-stopping kernel regression, can achieve significantly better convergence rates than $n^{-\beta(\gamma)}$ in large dimensions. As recently reported in [47], kernel interpolation generalizes much more poorly than early-stopping kernel regression in fixed dimensions.
Therefore, it cannot be assumed that other kernel methods perform similarly to kernel interpolation in large dimensions. Moreover, the results provided by [50] are not sufficient to assert that kernel interpolation is optimal due to the absence of a corresponding minimax lower bound. A detailed comparison of [50] with our results and corresponding experiments are deferred to Section 7.2.
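The exponent $\beta(\gamma)$ quoted from [50] can be evaluated directly. The sketch below implements the formula $\min\{\lceil\gamma\rceil/\gamma-1,\,1-\lfloor\gamma\rfloor/\gamma\}$ and numerically checks the stated bound $\beta(\gamma)\leq 1/(2\lfloor\gamma\rfloor+1)$ on a grid; the function name `beta` and the grid are choices made for this example only.

```python
import math


def beta(gamma):
    """Exponent beta(gamma) = min(ceil(gamma)/gamma - 1, 1 - floor(gamma)/gamma)."""
    return min(math.ceil(gamma) / gamma - 1.0, 1.0 - math.floor(gamma) / gamma)


# The exponent vanishes at integer gamma and is largest at half-integers.
for gamma in (1.25, 1.5, 2.0, 2.5, 3.0, 3.5):
    bound = 1.0 / (2 * math.floor(gamma) + 1)
    print(f"gamma = {gamma}: beta = {beta(gamma):.4f}, bound = {bound:.4f}")

# Numerical check of 0 <= beta(gamma) <= 1 / (2 * floor(gamma) + 1) on (1, 6].
grid = [1.0 + 0.01 * i for i in range(1, 501)]
assert all(-1e-12 <= beta(g) <= 1.0 / (2 * math.floor(g) + 1) + 1e-12 for g in grid)
```

In particular, $\beta(\gamma)=0$ whenever $\gamma$ is an integer, so the bound of [50] gives no decay in $n$ at those values.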

1.2 Our contribution

Theories for kernel regression with a polynomial eigenvalue decay rate have been well studied over the last several decades (e.g., [15, 47, 64, 71, 83, 84]). When the dimension of the data is large, the eigenvalues of the kernel may depend on $d$ and the polynomial eigendecay property may no longer hold; as a result, few results about the optimality of kernel regression for large-dimensional data have been obtained. We list our contributions to the optimality of kernel regression on large-dimensional data below.

The upper and lower bounds for the excess risk of kernel regression for large-dimensional data. Suppose that $K$ is a kernel defined on a $d$-dimensional space where $d$ is large. Since the eigenvalues $\lambda_{j}$'s of $K$ may depend on $d$, the existing arguments for the optimality of kernel regression are no longer applicable. We first find that the Mendelson complexity $\varepsilon_{n}^{2}$ (defined in Definition 6.1) and the metric entropy $\bar{\varepsilon}_{n}^{2}$ depend only on the eigenvalues of the kernel $K$. Under the assumption that $f_{\star}$ is in the unit ball of $\mathcal{H}$, where $\mathcal{H}$ is the reproducing kernel Hilbert space associated with $K$, we further prove that the minimax rate of the excess risk is upper bounded by the Mendelson complexity $\varepsilon_{n}^{2}$ and lower bounded by the metric entropy $\bar{\varepsilon}_{n}^{2}$ (Theorem 6.3 and Theorem 6.10).

As an application, when $f_{\star}\in\mathcal{H}^{\mathtt{in}}$, the reproducing kernel Hilbert space associated with an inner product kernel $K^{\mathtt{in}}$ defined on $\mathbb{S}^{d}$, and the marginal distribution of $X$ is uniform on the sphere $\mathbb{S}^{d}$, we can show that if $n\propto d^{\gamma}$, the following statements hold: 1. For any $\gamma>0$, the excess risk of the properly early-stopped gradient descent algorithm is upper bounded by $n^{-1/2}$; 2. If $\gamma=2,4,6,\cdots$, the minimax expected excess risk over $\mathcal{H}^{\mathtt{in}}$ is lower bounded by $n^{-1/2}$ (Theorem 3.3 and Theorem 3.5).

Optimality of kernel regression for large-dimensional data. When $n\asymp d^{\gamma}$ for $\gamma\neq 2,4,6,\cdots$, the upper bound and lower bound provided by the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$ no longer match. We first resort to a new technical observation to derive a new upper bound on the excess risk that is tighter than the Mendelson complexity. We then find that the richness condition proposed in [82] no longer holds, and propose a modification to derive a new minimax lower bound. Together, these efforts yield the minimax rate of kernel regression in large dimensions (i.e., $n\propto d^{\gamma}$) for all $\gamma>0$ (Theorem 4.2 and Theorem 4.3).

New phenomena in large-dimensional kernel regression. The results obtained from Theorem 4.2 and Theorem 4.3 are visually illustrated in Figure 1. This figure reveals two intriguing phenomena only observed in large-dimensional kernel regression. i) The first phenomenon is referred to as the multiple descent behavior. We plot the curve of the convergence rate (with respect to $n$) of the optimal excess risk of kernel regression. This curve achieves its peaks at $\gamma=2,4,6,\cdots$ and its isolated valleys at $\gamma=3,5,7,\cdots$. ii) We also report another noteworthy phenomenon, the ‘periodic plateau behavior’. We plot the curve of the convergence rate (with respect to $d$) of the optimal excess risk of kernel regression. When $\gamma$ varies within certain specific ranges, we find that the value of this curve does not change. This indicates that, in order to improve the rate of the excess risk, one has to increase the sample size above a certain threshold. We believe that these interesting phenomena are worth further investigation.

Figure 1: (a) Multiple descent behavior; (b) Periodic plateau behavior. A graphical representation of the minimax optimal rate of the excess risk of kernel regression with inner product kernels obtained from Theorem 4.2 and Theorem 4.3. The solid black line represents the upper bound that matches the minimax lower bound up to a constant factor. The dashed blue line indicates that, for any $\epsilon>0$, the ratio between the upper and lower bounds differs by at most $n^{-\epsilon}$.

1.3 Notations

For a real number $x\in\mathbb{R}$, denote by $\lceil x\rceil$ the smallest integer that is greater than or equal to $x$ and by $\lfloor x\rfloor$ the greatest integer that is less than or equal to $x$. For $\boldsymbol{v}\in\mathbb{R}^{d}$, denote by $\boldsymbol{v}_{(j)}$ the $j$-th component of $\boldsymbol{v}$ and denote the $\ell_{2}$ norm and supremum norm of $\boldsymbol{v}$ by $\|\boldsymbol{v}\|_{2}=(\sum_{j\in[d]}\boldsymbol{v}_{(j)}^{2})^{1/2}$ and $\|\boldsymbol{v}\|_{\infty}=\max_{j\in[d]}|\boldsymbol{v}_{(j)}|$, respectively.
For a matrix $\boldsymbol{A}\in\mathbb{R}^{m\times n}$, denote by $a_{ij}$ the $(i,j)$-th component of $\boldsymbol{A}$ and denote the operator norm and the Frobenius norm of $\boldsymbol{A}$ by $\|\boldsymbol{A}\|_{\mathrm{op}}=\sup_{\boldsymbol{v}\in\mathbb{R}^{n}}\|\boldsymbol{A}\boldsymbol{v}\|_{2}/\|\boldsymbol{v}\|_{2}$ and $\|\boldsymbol{A}\|_{\mathrm{F}}=(\sum_{i\in[m],j\in[n]}a_{ij}^{2})^{1/2}$, respectively. Denote the $j$-th largest eigenvalue of the matrix $\boldsymbol{A}$ by $\lambda_{j}(\boldsymbol{A})$. For a set $A$, denote by $|A|$ the number of elements $A$ contains.
For a marginal distribution $\rho_{\mathcal{X}}$ on $\mathcal{X}\subset\mathbb{R}^{d+1}$, we define the space $L^{2}(\mathcal{X},\rho_{\mathcal{X}})=\{f:\mathcal{X}\to\mathbb{R}:\int_{\mathcal{X}}|f(\boldsymbol{x})|^{2}\,\mathrm{d}\rho_{\mathcal{X}}<\infty\}$, and we denote $L^{2}=L^{2}(\mathcal{X},\rho_{\mathcal{X}})$ for simplicity.
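The vector and matrix norms above can be made concrete with a small standard-library sketch. The helper names are hypothetical, and the operator norm is only approximated here by power iteration on $\boldsymbol{A}^{\tau}\boldsymbol{A}$, a swapped-in numerical technique that is not part of the paper and that can fail if the starting vector happens to be orthogonal to the top singular direction.

```python
import math


def l2_norm(v):
    """Euclidean norm (sum_j v_j^2)^(1/2)."""
    return math.sqrt(sum(x * x for x in v))


def sup_norm(v):
    """Supremum norm max_j |v_j|."""
    return max(abs(x) for x in v)


def frobenius_norm(A):
    """Frobenius norm (sum_{i,j} a_ij^2)^(1/2) of a matrix given as a list of rows."""
    return math.sqrt(sum(a * a for row in A for a in row))


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def operator_norm(A, iters=500):
    """Approximate sup_v ||A v||_2 / ||v||_2 by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-norm starting vector
    for _ in range(iters):
        Av = matvec(A, v)
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T (A v)
        norm_w = l2_norm(w)
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    return l2_norm(matvec(A, v))


A = [[3.0, 0.0], [0.0, 4.0]]
print(operator_norm(A), frobenius_norm(A))
```

For this diagonal example the operator norm is the largest absolute diagonal entry, while the Frobenius norm aggregates all entries, so the former never exceeds the latter.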

Throughout this paper, we will use the symbols $C,C_{1},C_{2},\dots$ to denote absolute constants, i.e., constants that have a fixed value and do not depend on any other parameters. Unless specified, the symbols $\mathfrak{C},\mathfrak{C}_{1},\mathfrak{C}_{2},\cdots$ will denote constants that depend only on the variance $\sigma^{2}$ of the noise in (2), $\kappa$ defined in Assumption 1, and the constants in the asymptotic framework (7), i.e., $c_{1}$, $c_{2}$, and $\gamma$. In different conclusions, we may use the same symbols, such as $C_{1}$, to represent different constants.

2 Preliminaries

Traditional technical tools for kernel regression are developed implicitly under the assumption that the dimension $d$ of the domain $\mathcal{X}$ is fixed or bounded. The recent successes of neural networks on high-dimensional data urge us to investigate the convergence rate of the excess risk of NTK regression for data with large $d$.

Suppose that we have observed $n$ i.i.d. samples $(X_{i},Y_{i}),i\in[n]$ from the model:

$$y=f_{\star}(\boldsymbol{x})+\epsilon, \tag{2}$$

where the $X_{i}$’s are sampled from $\rho_{\mathcal{X}}$, $\rho_{\mathcal{X}}$ is the marginal distribution on $\mathcal{X}\subset\mathbb{R}^{d+1}$, $f_{\star}$ is some function defined on a compact set $\mathcal{X}$, and $\epsilon\sim\mathcal{N}(0,\sigma^{2})$ for some fixed $\sigma>0$. Denote the $n\times 1$ data vector of $Y_{i}$’s and the $n\times d$ data matrix of $X_{i}$’s by $\boldsymbol{y}$ and $\boldsymbol{X}$ respectively.

Let us make the following assumptions on the kernel $K$ and the candidate function class $\mathcal{B}$ throughout this paper.

Assumption 1.

Suppose that $K$ is a continuous positive definite kernel function defined on $\mathcal{X}\subset\mathbb{R}^{d}$ satisfying $\max_{x\in\mathcal{X}}K(x,x)\leq\kappa$ for an absolute constant $\kappa>0$.

Assumption 2.

Let us assume that $f_{\star}$ is in the following family of candidate functions,

$$\mathcal{B}=\left\{f\in\mathcal{H}~\bigg|~\|f\|_{\mathcal{H}}\leq 1\right\}, \tag{3}$$

where $\mathcal{H}$ is the RKHS associated with the kernel $K$.

Remark 2.1.

Assumption 1 holds for a large class of kernels (e.g., the spherical NTK, the Gaussian kernel, the Laplace kernel, etc.). Assumption 2 is merely a compactness condition that is quite common and necessary regardless of the dimension $d$. Both assumptions are standard in the literature on kernel methods [14, 15, 64] when the dimension $d$ of the domain is fixed or bounded.

Given a positive definite kernel function $K$ and a positive measure $\rho_{\mathcal{X}}$ on $\mathcal{X}$, the integral operator $T_{K}$ defined by

$$T_{K}(f)(x)=\int K(x,y)f(y)~\mathsf{d}\rho_{\mathcal{X}}(y)$$

is a self-adjoint compact operator. The celebrated Mercer’s decomposition theorem further assures that

$$K(x,y)=\sum\nolimits_{j}\lambda_{j}\phi_{j}(x)\phi_{j}(y), \tag{4}$$

where $\{\lambda_{j},j=1,2,\dots\}$ are the eigenvalues of $T_{K}$ in non-increasing order and $\{\phi_{j}(x),j=1,2,\dots\}$ are the corresponding orthonormal eigen-functions. With a slight abuse of notation, we also call $\{\lambda_{j},j=1,2,\dots\}$ and $\{\phi_{j}(x),j=1,2,\dots\}$ the eigenvalues and eigenvectors (or eigen-functions) of the kernel function $K$.

Suppose that $f_{\star}\in\mathcal{H}$, a reproducing kernel Hilbert space (RKHS) [21, 41, 70] associated with a positive definite kernel function $K(\cdot,\cdot)$ defined on $\mathcal{X}$. The loss function $\mathcal{L}=\frac{1}{2n}\sum_{j}(y_{j}-f(X_{j}))^{2}$ induces a gradient flow in $\mathcal{H}$, which is given by

$$\frac{\mathsf{d}}{\mathsf{d}t}f_{t}(\boldsymbol{x})=-\frac{1}{n}K(\boldsymbol{x},\boldsymbol{X})(f_{t}(\boldsymbol{X})-\boldsymbol{y}), \tag{5}$$

where $\boldsymbol{X}=(X_{1},\cdots,X_{n})$ and $\boldsymbol{y}=(Y_{1},\cdots,Y_{n})^{\tau}$. If we further assume that $f_{0}(\boldsymbol{x})=0$, then we have

$$f_{t}(\boldsymbol{x})=K(\boldsymbol{x},\boldsymbol{X})K(\boldsymbol{X},\boldsymbol{X})^{-1}(\boldsymbol{I}_{n}-e^{-\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})t})\boldsymbol{y}. \tag{6}$$

This $f_{t}(\boldsymbol{x})$ is referred to as the estimator given by kernel regression stopped at time $t$.
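As an illustration only, the closed form (6) can be evaluated numerically through an eigendecomposition of the kernel matrix, which avoids forming $K(\boldsymbol{X},\boldsymbol{X})^{-1}$ explicitly. The sketch below is not taken from the paper: the function names are hypothetical, and the kernel $\Phi(t)=e^{t}$ is an assumed example of an inner product kernel with positive Taylor coefficients.

```python
import numpy as np


def inner_product_kernel(A, B):
    # Assumed example kernel: Phi(t) = exp(t) applied to pairwise inner products.
    return np.exp(A @ B.T)


def gradient_flow_predict(X, y, X_test, t, kernel=inner_product_kernel):
    """Evaluate f_t(x) = K(x,X) K(X,X)^{-1} (I - exp(-K(X,X) t / n)) y, as in (6)."""
    n = X.shape[0]
    K = kernel(X, X)
    # K is symmetric, so K = V diag(lam) V^T with orthonormal columns in V.
    lam, V = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off
    # Spectral filter g(lam) = (1 - exp(-lam t / n)) / lam, with limit t / n at lam = 0.
    positive = lam > 1e-12
    safe_lam = np.where(positive, lam, 1.0)
    g = np.where(positive, -np.expm1(-lam * t / n) / safe_lam, t / n)
    alpha = V @ (g * (V.T @ y))
    return kernel(X_test, X) @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 6))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # rows lie on the unit sphere
    y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=50)
    for t in (0.0, 1.0, 10.0, 100.0):
        residual = np.linalg.norm(y - gradient_flow_predict(X, y, X, t))
        print(f"t = {t:6.1f}  training residual = {residual:.4f}")
```

In this sketch the training residual equals $e^{-\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})t}\boldsymbol{y}$, so it shrinks as $t$ grows; the stopping time $t$ plays the role of the regularization parameter.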

3 Warm-ups: optimality of kernel regression with inner product kernels in large dimensions for $\gamma=2,4,6,\cdots$

In this section, as a warm-up, we will show that the optimal rate of kernel regression with respect to the inner product kernel is $n^{-1/2}$ when $n\propto d^{\gamma},\gamma=2,4,6,\cdots$.

We first specify the following large-dimensional scenario for kernel regression where we perform our analysis:

Assumption 3.

Suppose that there exist three positive constants $c_{1}$, $c_{2}$ and $\gamma$, such that

$$c_{1}d^{\gamma}\leq n\leq c_{2}d^{\gamma}, \tag{7}$$

and we often assume that $d$ is sufficiently large.

In this paper, we only consider the inner product kernels defined on the sphere. An inner product kernel is a kernel function $K$ defined on $\mathbb{S}^{d}$ such that there exists a function $\Phi:[-1,1]\to\mathbb{R}$ satisfying that for any $x,x^{\prime}\in\mathbb{S}^{d}$, we have $K(x,x^{\prime})=\Phi(\left\langle x,x^{\prime}\right\rangle)$. If we further assume that the marginal distribution $\rho_{\mathcal{X}}$ is the uniform distribution on $\mathcal{X}=\mathbb{S}^{d}$, then the Mercer’s decomposition for $K$ can be rewritten as

$$K(x,x^{\prime})=\sum_{k=0}^{\infty}\mu_{k}\sum_{j=1}^{N(d,k)}Y_{k,j}(x)Y_{k,j}\left(x^{\prime}\right), \tag{8}$$

where $Y_{k,j}$ for $j=1,\cdots,N(d,k)$ are spherical harmonic polynomials of degree $k$ and $\mu_{k}$’s are the eigenvalues of $K$ with multiplicity $N(d,0)=1$; $N(d,k)=\frac{2k+d-1}{k}\cdot\frac{(k+d-2)!}{(d-1)!(k-1)!},k=1,2,\cdots$. For more details of the inner product kernels, readers can refer to [28].
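To make the multiplicity formula concrete, the following sketch computes $N(d,k)$ exactly with integer arithmetic and compares it with $d^{k}/k!$, which is the leading-order behavior of the formula for fixed $k$ as $d$ grows. The function name is hypothetical, and the rewriting $\frac{(k+d-2)!}{(d-1)!(k-1)!}=\binom{k+d-2}{k-1}$ is used only for convenience.

```python
from math import comb, factorial


def num_spherical_harmonics(d, k):
    """Multiplicity N(d, k) of the degree-k spherical harmonics on S^d."""
    if k == 0:
        return 1
    # (2k+d-1)/k * (k+d-2)! / ((d-1)! (k-1)!) = (2k+d-1)/k * C(k+d-2, k-1);
    # the product is an exact multiple of k, so integer division is safe.
    return (2 * k + d - 1) * comb(k + d - 2, k - 1) // k


if __name__ == "__main__":
    for k in (1, 2, 3):
        for d in (10, 100, 1000):
            ratio = num_spherical_harmonics(d, k) * factorial(k) / d**k
            print(f"k = {k}, d = {d:5d}: N(d,k) * k! / d^k = {ratio:.4f}")
```

For fixed $k$ the printed ratio approaches 1 as $d$ increases, which is the polynomial growth of order $d^{k}$ that the two-sided bounds on $N(d,k)$ in Lemma 3.2 quantify.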

Remark 3.1.

We consider the inner product kernels on the sphere mainly because the harmonic analysis is clear on the sphere (e.g., properties of spherical harmonic polynomials are more concise than those of orthogonal series on general domains). This makes Mercer’s decomposition of the inner product kernel explicit, rather than relying on several abstract assumptions (e.g., [57]). We also notice that very few results are available for Mercer’s decomposition of a kernel defined on a general domain, especially when the dimension of the domain is taken into consideration; e.g., even the eigen-decay rate of the neural tangent kernels is only determined for the spheres. Restricted by this technical reason, most works analyzing the spectral algorithm in large-dimensional settings focus on the inner product kernels on spheres [50, 30, 60, 81, etc.]. Though there might be several works that tried to relax the spherical assumption (e.g., [50, 1, 8]), we can find that most of them (i) adopted a near-spherical assumption; (ii) adopted strong assumptions on the regression function, e.g., $f_{\star}(x)=x[1]x[2]\cdots x[L]$ for an integer $L>0$; or (iii) cannot determine the convergence rate of the excess risk of the spectral algorithm.

To avoid unnecessary notation, let us make the following assumption on the inner product kernel $K$.

Assumption 4.

$\Phi(t)\in\mathcal{C}^{\infty}\left([-1,1]\right)$ is a fixed function independent of $d$ and there exists a sequence of absolute constants $\{a_{j}\}_{j\geq 0}$, such that we have

$$\Phi(t)=\sum_{j=0}^{\infty}a_{j}t^{j},~a_{j}>0,~\text{for any}~j=0,1,2,\dots.$$

The purpose of Assumption 4 is to keep the main results and proofs clean. Notice that, by Theorem 1.b in [32], the inner product kernel $K$ on the sphere is positive semi-definite for all dimensions if and only if all coefficients $\{a_{j},j=0,1,2,\dots\}$ are non-negative. One can easily extend our results in this paper when certain coefficients $a_{k}$’s are zero (e.g., one can consider the two-layer NTK defined as in Section 5, with $a_{i}=0$ for any $i=3,5,7,\cdots$).

With this assumption, we have the following lemma which is borrowed from [30].

Lemma 3.2.

Suppose that Assumptions 1-4 hold. Suppose that $p\geq 0$ is any integer. There exist positive constants $\mathfrak{C}_{1}$, $\mathfrak{C}_{2}$, $\mathfrak{C}_{3}$, and $\mathfrak{C}_{4}$, such that for any $d\geq\mathfrak{C}$, we have

$$\begin{aligned}\mathfrak{C}_{1}d^{-k}&\leq\mu_{k}\leq\mathfrak{C}_{2}d^{-k},\quad k=0,1,2,\cdots,p,p+1\\ \mathfrak{C}_{3}d^{k}&\leq N(d,k)\leq\mathfrak{C}_{4}d^{k},\quad k=0,1,2,\cdots,p,p+1.\end{aligned} \tag{9}$$

Thanks to Lemma 3.2, we can now use Theorem 6.3 to provide an upper bound on the excess risk of kernel regression with the inner product kernel $K^{\mathtt{in}}$ in large dimensions.

Theorem 3.3 (Upper bound).

Suppose that $\mathcal{H}^{\mathtt{in}}$ is an RKHS associated with $K^{\mathtt{in}}$ defined on $\mathbb{S}^{d}$. Let $f_{\widehat{T}}^{\mathtt{in}}$ be the function defined in (6), where $\widehat{T}^{-1}=\widehat{\varepsilon}_{n}^{2}$ is defined in (28) and $K=K^{\mathtt{in}}$. Suppose further that Assumptions 1-4 hold with $\mathcal{H}=\mathcal{H}^{\mathtt{in}}$. Then, there exist constants $\mathfrak{C}_{i}$, $i=1,2,3$, such that for any $d\geq\mathfrak{C}$, we have

$$\left\|f_{\widehat{T}}^{\mathtt{in}}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{1}{2}}, \tag{10}$$

with probability at least $1-\mathfrak{C}_{2}\exp\left\{-\mathfrak{C}_{3}n^{1/2}\right\}$.

Recall that the eigenvalues $\lambda_{j}$’s in (4) are of non-increasing order, while the eigenvalues $\mu_{k}$’s in (8) are not necessarily non-increasing. However, the minimax lower bound on the excess risk with respect to the RKHS is determined by large eigenvalues. Therefore, the following property of the eigenvalues $\{\mu_{k}\}_{k\geq 0}$ is crucial to determining the minimax lower bound of large-dimensional kernel regression.

Lemma 3.4.

Suppose that Assumptions 1-4 hold. Fix an integer $p\geq 0$. Then, for any $d\geq\mathfrak{C}$, we have

$$\mu_{j}\leq\frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}d^{-1}\mu_{p},\quad j=p+1,p+2,\cdots,$$

where $\mathfrak{C}_{1}$ and $\mathfrak{C}_{2}$ are given in Lemma 3.2.
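As a sanity check on the constants, the case $j=p+1$ of Lemma 3.4 can be read off directly from the two-sided bound (9) in Lemma 3.2. The display below is only a sketch of that single case, using the upper bound on $\mu_{p+1}$ and the lower bound on $\mu_{p}$; it does not cover the indices $j>p+1$, which lie outside the range of $k$ in (9) and need a separate argument.

```latex
\mu_{p+1}
  \;\leq\; \mathfrak{C}_{2}\, d^{-(p+1)}
  \;=\; \frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}\, d^{-1} \cdot \mathfrak{C}_{1}\, d^{-p}
  \;\leq\; \frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}\, d^{-1}\, \mu_{p}.
```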

We then use Theorem 6.10 to show that kernel regression with $K^{\mathtt{in}}$ achieves the optimal rate under specific asymptotic frameworks.

Theorem 3.5 (Minimax lower bound).

Let $\gamma\in\{2,4,6,\cdots\}$ be a fixed integer. There exist constants $\mathfrak{C}$ and $\mathfrak{C}_{1}$, such that for any $d\geq\mathfrak{C}$, we have

$$\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\mathfrak{C}_{1}n^{-\frac{1}{2}}, \tag{11}$$

where $\rho_{f_{\star}}$ is the joint-p.d.f. of $x,y$ given by (2) with $f=f_{\star}$, and $\mathcal{B}=\{f_{\star}\in\mathcal{H}^{\mathtt{in}}~\mid~\|f_{\star}\|_{\mathcal{H}^{\mathtt{in}}}\leq 1\}$.

Notice that Theorem 3.3 and Theorem 3.5 only show the optimality of kernel regression with $K^{\mathtt{in}}$ when $n\asymp d^{\gamma}$ for $\gamma=2,4,6,\cdots$. In the next section, we will modify the existing tools for bounding the excess risk of kernel regression with $K^{\mathtt{in}}$, and show the optimality of kernel regression with $K^{\mathtt{in}}$ when $n\asymp d^{\gamma}$ for any $\gamma>0$.

4 Main results: optimality of kernel regression in large dimensions for all $\gamma>0$

We have shown that in the large-dimensional setting where $n\asymp d^{\gamma},\gamma=2,4,6,\cdots$, the optimal rate of kernel regression with $K^{\mathtt{in}}$ is $n^{-1/2}$.

However, when $\gamma\neq 2,4,6,\cdots$, Theorem 6.10 cannot be applied to large-dimensional kernel regression. For example, when $\gamma\in(2p,2p+1]$ for some integer $p\geq 0$, we have $\mathfrak{C}_{4}d^{p-\gamma}\log(d)\leq\bar{\varepsilon}_{n}\leq\mathfrak{C}_{3}d^{p-\gamma}\log(d)\ll n^{-1/2}$, where $\mathfrak{C}_{3}$ and $\mathfrak{C}_{4}$ are constants only depending on $\gamma$, $c_{1}$, and $c_{2}$ (see, e.g., Remark C.2). However, the inequality (33) does not hold (see, e.g., Lemma C.1). Furthermore, the upper bound $n^{-1/2}$ provided by Theorem 6.3 no longer matches the metric entropy $\bar{\varepsilon}_{n}^{2}$.

The main focus of this section is to determine the optimal rate for all $\gamma>0$. To construct a minimax lower bound for regression over the unit ball $\mathcal{B}\subset\mathcal{H}^{\mathtt{in}}$, we need the following modification of Proposition 6.7.

Lemma 4.1.

Let 𝔠(0,1)𝔠01\mathfrak{c}\in(0,1)fraktur_c ∈ ( 0 , 1 ) be a constant only depending on c1subscript𝑐1c_{1}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, c2subscript𝑐2c_{2}italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and γ𝛾\gammaitalic_γ. For any 0<ε~1,ε~2<formulae-sequence0subscript~𝜀1subscript~𝜀20<\tilde{\varepsilon}_{1},\tilde{\varepsilon}_{2}<\infty0 < over~ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , over~ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT < ∞ only depending on n𝑛nitalic_n, d𝑑ditalic_d, {λj}subscript𝜆𝑗\{\lambda_{j}\}{ italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, c1subscript𝑐1c_{1}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, c2subscript𝑐2c_{2}italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and γ𝛾\gammaitalic_γ and satisfying

VK(ε~2,𝒟)+nε~22+log(2)VK(ε~1/2σ,𝒟)𝔠,subscript𝑉𝐾subscript~𝜀2𝒟𝑛superscriptsubscript~𝜀222subscript𝑉𝐾subscript~𝜀12𝜎𝒟𝔠\frac{V_{K}(\tilde{\varepsilon}_{2},\mathcal{D})+n\tilde{\varepsilon}_{2}^{2}+% \log(2)}{V_{K}(\tilde{\varepsilon}_{1}/\sqrt{2}\sigma,\mathcal{D})}\leq% \mathfrak{c},divide start_ARG italic_V start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( over~ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , caligraphic_D ) + italic_n over~ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + roman_log ( 2 ) end_ARG start_ARG italic_V start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( over~ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / square-root start_ARG 2 end_ARG italic_σ , caligraphic_D ) end_ARG ≤ fraktur_c , (12)

we have

minf^maxf𝔼(𝕏,𝕪)ρfnf^fL221𝔠4ε~12,subscript^𝑓subscriptsubscript𝑓subscript𝔼similar-to𝕏𝕪superscriptsubscript𝜌subscript𝑓tensor-productabsent𝑛superscriptsubscriptnorm^𝑓subscript𝑓superscript𝐿221𝔠4superscriptsubscript~𝜀12\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y}% )\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}% \geq\frac{1-\mathfrak{c}}{4}\tilde{\varepsilon}_{1}^{2},roman_min start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT roman_max start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ∈ caligraphic_B end_POSTSUBSCRIPT blackboard_E start_POSTSUBSCRIPT ( blackboard_X , blackboard_y ) ∼ italic_ρ start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊗ italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∥ over^ start_ARG italic_f end_ARG - italic_f start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≥ divide start_ARG 1 - fraktur_c end_ARG start_ARG 4 end_ARG over~ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (13)

where $V_{K}(\varepsilon,\mathcal{D})$ is the $\varepsilon$-covering entropy of $(\mathcal{D},d^{2}=\text{KL divergence})$, $\mathcal{D}$ is defined in (30), and $\rho_{f_{\star}}$ is the joint p.d.f. of $x,y$ given by (2) with $f=f_{\star}$.

We then have the following minimax lower bounds, which greatly extend the results given in Theorem 3.5:

Theorem 4.2.

Let $\rho_{f_{\star}}$ be the joint p.d.f. of $x,y$ given by (2) with $f_{\star}\in\mathcal{B}=\{f\in\mathcal{H}^{\mathtt{in}}\mid\|f\|_{\mathcal{H}^{\mathtt{in}}}\leq 1\}$. Let $\gamma>0$ be a fixed real number and $p=\lfloor\gamma/2\rfloor$. Then we have the following statements.

  • (i)

    If $\gamma\in\{2,4,6,\cdots\}$, then there exist constants $\mathfrak{C}_{1}>0$ and $\mathfrak{C}$, such that for any $d\geq\mathfrak{C}$, we have:

    $$\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\mathfrak{C}_{1}n^{-\frac{1}{2}};\tag{14}$$
  • (ii)

    If $\gamma\in\bigcup_{j=0}^{\infty}(2j,2j+1]$, then for any $\epsilon>0$, there exist constants $\mathfrak{C}_{1}>0$ and $\mathfrak{C}$ only depending on $c_{1}$, $c_{2}$, $\gamma$, and $\epsilon$, such that for any $d\geq\mathfrak{C}$, we have:

    $$\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\mathfrak{C}_{1}n^{-\left(\frac{\gamma-p}{\gamma}+\epsilon\right)};\tag{15}$$
  • (iii)

    If $\gamma\in\bigcup_{j=0}^{\infty}(2j+1,2j+2)$, then there exist constants $\mathfrak{C}_{1}>0$ and $\mathfrak{C}$, such that for any $d\geq\mathfrak{C}$, we have:

    $$\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\mathfrak{C}_{1}n^{-\frac{p+1}{\gamma}}.\tag{16}$$

Since the upper bound provided by the Mendelson complexity is no longer tight, we have to improve the claims in Theorem 3.3. Thanks to a nontrivial technical observation, we present new upper bounds on the excess risk of kernel regression in large dimensions that (nearly) match the minimax lower bounds given in Theorem 4.2.

Theorem 4.3.

Suppose that $\mathcal{H}^{\mathtt{in}}$ is an RKHS associated with $K^{\mathtt{in}}$ defined on $\mathbb{S}^{d}$. Let $f_{\widehat{T}}^{\mathtt{in}}$ be the function defined in (6), where $\widehat{T}^{-1}=\widehat{\varepsilon}_{n}^{2}$ is defined in (28) and $K=K^{\mathtt{in}}$. Let $\gamma>0$ be a fixed real number and $p=\lfloor\gamma/2\rfloor$. Suppose further that Assumptions 1-4 hold with $\mathcal{H}=\mathcal{H}^{\mathtt{in}}$. Then, we have the following statements:

  • (i)

    If $\gamma\in\{2,4,6,\cdots\}$, then there exist constants $\mathfrak{C}$ and $\mathfrak{C}_{i}$, where $i=1,2,3$, such that for any $d\geq\mathfrak{C}$, the bound

    $$\left\|{f}_{\widehat{T}}^{\mathtt{in}}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{1}{2}},\tag{17}$$

    holds with probability at least $1-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}n^{1/2}\}$.

  • (ii)

    If $\gamma\in\bigcup_{j=0}^{\infty}(2j,2j+1]$, then for any $\delta>0$, there exist constants $\mathfrak{C}$ and $\mathfrak{C}_{i}$, where $i=1,2,3$, only depending on $\gamma$, $\delta$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$, the bound

    $$\left\|{f}_{\widehat{T}}^{\mathtt{in}}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{\gamma-p}{\gamma}}\log(n),\tag{18}$$

    holds with probability at least $1-\delta-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}n^{p/\gamma}\log(n)\}$.

  • (iii)

    If $\gamma\in\bigcup_{j=0}^{\infty}(2j+1,2j+2)$, then for any $\delta>0$, there exist constants $\mathfrak{C}$ and $\mathfrak{C}_{i}$, where $i=1,2,3$, only depending on $\gamma$, $\delta$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$, the bound

    $$\left\|{f}_{\widehat{T}}^{\mathtt{in}}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{p+1}{\gamma}},\tag{19}$$

    holds with probability at least $1-\delta-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}n^{1-(p+1)/\gamma}\}$.

We notice that the above periodic behavior is closely linked to the spectral structure of inner product kernels for uniform data on a large-dimensional sphere. Recall from Lemma 3.2 that the eigenvalues of order $d^{-k}$ have multiplicity $d^{k}$, $k=0,\cdots,p+1$. Such a strong block structure of the spectrum makes both the bias and variance terms of the (empirical) excess risk decrease with gaps as $\gamma$ increases (see, e.g., Lemmas E.3 and E.4, and their modified versions, Lemmas C.6 and C.7).

Remark 4.4.

Denote by $\mathrm{P}_{>0}$ the projection onto the linear space of spherical harmonics of degree $>0$. In the region $n\ll d$ (i.e., $\gamma\in(0,1)$), since $\left\|\mathrm{P}_{>0}f_{\star}\right\|_{L^{2}}^{2}\leq\mu_{1}\left\|f_{\star}\right\|_{\mathcal{H}}^{2}\lesssim 1/d$, one is essentially fitting a trivial predictor (a constant), whose excess risk is indeed $O(1/n)$. Hence, this region is in fact not very interesting, even though the convergence rate is faster (as a function of $n$).

Remark 4.5.

In Theorem 4.3, a data-driven optimal stopping time is given, and we show that kernel regression stopped at $\widehat{T}$ is minimax optimal for any $\gamma>0$. Moreover, the order of $\widehat{T}$ is $\Theta(n^{1/2})$ (see, e.g., Lemma A.3 and Lemma B.4), which is independent of $\gamma$.

Figure 1 illustrates the results obtained by Theorem 4.2 and Theorem 4.3. From these figures, we observe some interesting phenomena.

Multiple descent behavior

The curve in Figure 1 (a) shows how the convergence rate (in terms of the sample size $n$) of the optimal excess risk of kernel regression fluctuates as $\gamma>0$ grows. We find that this curve is non-monotone and exhibits the following multiple descent behavior: it achieves its peaks at $\gamma=2,4,6,\cdots$ and its isolated valleys at $\gamma=3,5,7,\cdots$. A similar multiple descent phenomenon has been reported in [50], which considers the excess risk of kernel interpolation in large-dimensional settings. Although [50] only provides an upper bound on the excess risk of kernel interpolation, their results and our observation strongly suggest that there might be a significant difference between kernel regression on large-dimensional data and on fixed-dimensional data.
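To make the shape of this curve concrete, the following Python sketch evaluates the exponent $s(\gamma)$ for which the rates in Theorems 4.2 and 4.3 read $n^{-s(\gamma)}$, ignoring the $\epsilon$ and $\log(n)$ factors. The function name `rate_exponent` and the evaluation grid are illustrative choices of ours and do not appear in the theorems.

```python
import math


def rate_exponent(gamma: float) -> float:
    """Exponent s such that the optimal excess risk scales as n**(-s).

    Follows the three cases of Theorems 4.2 and 4.3 with p = floor(gamma / 2);
    the epsilon and log(n) factors of case (ii) are ignored.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    p = math.floor(gamma / 2)
    if gamma == 2 * p:          # case (i): gamma in {2, 4, 6, ...}
        return 0.5
    if gamma - 2 * p <= 1:      # case (ii): gamma in (2p, 2p + 1]
        return (gamma - p) / gamma
    return (p + 1) / gamma      # case (iii): gamma in (2p + 1, 2p + 2)


# Evaluate the exponent on a small, hypothetical grid of scalings.
for gamma in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0):
    print(f"gamma = {gamma:3.1f}  ->  n^(-{rate_exponent(gamma):.4f})")
```

Under these formulas the exponent equals $1/2$ at every even $\gamma$, which is the slowest rate in $n$, and it is locally largest at odd $\gamma$, so the curve rises and falls repeatedly rather than varying monotonically.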

Figure 1 (b) provides an alternative representation of our results, and the curve in it shows how the convergence rate (in terms of the dimension $d$) of the optimal excess risk of kernel regression fluctuates as $\gamma>0$ grows. From Figure 1 (b), we can find that the curve of this convergence rate decreases when the scaling $\gamma$ (recall that we have $n=d^{\gamma}$) increases, indicating that the performance of kernel regression becomes better when the sample size $n$ grows. Moreover, from Figure 1 (b), we observe another interesting phenomenon:

Periodic plateau behavior

In Figure 1 (b), when $\gamma$ varies within certain specific ranges, $\zeta$, the vertical axis in Figure 1 (b), does not change. In other words, if we fix a large dimension $d$ and increase $\gamma$ (or equivalently, increase the sample size $n$), the optimal rate of excess risk in kernel regression stays invariant in certain ranges (e.g., $\gamma\in(1,2)\cup(3,4)\cup(5,6)\cup(7,8)\cup\cdots$). This ‘periodic plateau behavior’ was numerically reported in Figure 5 (b) in [12]: when $\gamma$ varies within certain specific ranges, the excess risk of kernel regression decays very slowly.

Therefore, in order to improve the rate of excess risk, one has to increase the sample size above a certain threshold. For example, when $d=10$, even when the sample size $n$ ranges from ten million ($10^{7}$) to one hundred million ($10^{8}$), the convergence speed of excess risk stays invariant, and is proportional to $10^{-4}$.
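The numerical example above can be checked with a short Python sketch that rewrites the rates of Theorems 4.2 and 4.3 in terms of $d$ through $n=d^{\gamma}$. The helper `d_exponent` is a hypothetical name of ours: it returns $z$ such that the optimal rate reads $d^{-z}$, with constants, $\epsilon$, and $\log(n)$ factors ignored, and it makes no assumption about the sign convention of $\zeta$ in Figure 1 (b).

```python
import math


def d_exponent(gamma: float) -> float:
    """Return z such that the optimal rate n**(-s), with n = d**gamma, reads d**(-z)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    p = math.floor(gamma / 2)
    if gamma == 2 * p:          # gamma in {2, 4, 6, ...}: n**(-1/2) = d**(-gamma/2)
        return gamma / 2
    if gamma - 2 * p <= 1:      # gamma in (2p, 2p + 1]: n**(-(gamma - p)/gamma)
        return gamma - p
    return float(p + 1)         # gamma in (2p + 1, 2p + 2): n**(-(p + 1)/gamma), a plateau


d = 10
for n in (10**7, 3 * 10**7, 10**8):
    gamma = math.log(n) / math.log(d)   # scaling implied by n = d**gamma
    print(f"n = {n:>9d}, gamma = {gamma:.3f}, rate ~ {d ** (-d_exponent(gamma)):.2e}")
```

For $d=10$ and $n$ between $10^{7}$ and $10^{8}$ we have $\gamma\in[7,8]$ and $p+1=4$, so the computed rate stays at order $10^{-4}$ across the whole range, in line with the example in the text.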

5 Applications in Wide Neural Network

In this section, we apply our results to large-dimensional neural networks based on recent work ([46]). Most of the notations in this section follow those in [46].

Let us consider the square loss function

$$\mathcal{L}=\frac{1}{2n}\sum_{j=1}^{n}(Y_{j}-f(X_{j};{\boldsymbol{\theta}}))^{2},\tag{20}$$

where $f(\boldsymbol{x};{\boldsymbol{\theta}})$ is a ReLU neural network with $L\geq 2$ hidden layers defined as in Section 3.1 in [46], and we use $\boldsymbol{\theta}$ to represent the collection of all parameters flattened as a column vector. Furthermore, assume for simplicity that the widths of all layers of the neural network equal $m$.

The loss function $\mathcal{L}$ induces a gradient flow on $\mathcal{F}^{m}$, the space of all the two-layer neural networks with width $m$, which is given by

$$\frac{\mathsf{d}}{\mathsf{d}t}f(\boldsymbol{x};{\boldsymbol{\theta}(t)})=-\frac{1}{n}\nabla_{\boldsymbol{\theta}(t)}f(\boldsymbol{x};{\boldsymbol{\theta}(t)})\nabla_{\boldsymbol{\theta}(t)}f(\boldsymbol{X};{\boldsymbol{\theta}(t)})^{\top}(f(\boldsymbol{X};{\boldsymbol{\theta}(t)})-\boldsymbol{y}).\tag{21}$$

If we introduce a time-varying kernel function $K_{\boldsymbol{\theta}(t)}^{m}(\boldsymbol{x},\boldsymbol{x}^{\prime}):=\nabla_{\boldsymbol{\theta}(t)}f(\boldsymbol{x};{\boldsymbol{\theta}(t)})\nabla_{\boldsymbol{\theta}(t)}f(\boldsymbol{x}^{\prime};{\boldsymbol{\theta}(t)})^{\top}$, which is referred to as the neural network kernel (NNK) in this paper, the gradient flow on $\mathcal{F}^{m}$ can be written as

$$\frac{\mathsf{d}}{\mathsf{d}t}f(\boldsymbol{x};{\boldsymbol{\theta}(t)})=-\frac{1}{n}K_{\boldsymbol{\theta}(t)}^{m}(\boldsymbol{x},\boldsymbol{X})(f(\boldsymbol{X};{\boldsymbol{\theta}(t)})-\boldsymbol{y}).$$
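As an illustration of the neural network kernel, the following Python sketch computes the inner product of the parameter gradients at two inputs for a toy network. It makes several simplifying assumptions that are ours and not the setting of [46]: a single hidden ReLU layer, a $1/\sqrt{m}$ output scaling, standard Gaussian initialization, and the hypothetical helper names `init_params`, `grad_f`, and `nnk`.

```python
import math
import random


def init_params(d, m, seed=0):
    """Draw Gaussian weights for the toy network f(x) = m**-0.5 * sum_r a_r * relu(<w_r, x>)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return W, a


def grad_f(x, W, a):
    """Gradient of the toy network output with respect to all parameters, flattened."""
    scale = 1.0 / math.sqrt(len(a))
    grad = []
    for w_r, a_r in zip(W, a):
        pre = sum(wi * xi for wi, xi in zip(w_r, x))
        active = 1.0 if pre > 0 else 0.0
        grad.extend(scale * a_r * active * xi for xi in x)   # derivative w.r.t. w_r
        grad.append(scale * max(pre, 0.0))                    # derivative w.r.t. a_r
    return grad


def nnk(x, x_prime, W, a):
    """Neural network kernel: inner product of the parameter gradients at x and x_prime."""
    return sum(u * v for u, v in zip(grad_f(x, W, a), grad_f(x_prime, W, a)))


W, a = init_params(d=3, m=200, seed=0)
x, x_prime = (1.0, 0.0, 0.0), (0.6, 0.8, 0.0)   # two unit vectors
print(nnk(x, x_prime, W, a), nnk(x, x, W, a))
```

Because the kernel is a Gram matrix of gradients, it is symmetric and positive semi-definite for any fixed parameter value, which is the property the kernel form of the gradient flow relies on.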

The celebrated work [39] observed that as $m\rightarrow\infty$, the neural network kernel $K_{\boldsymbol{\theta}(t)}^{m}(\boldsymbol{x},\boldsymbol{x}^{\prime})$ converges pointwise to a time-invariant kernel $K^{\mathtt{NT}}(\boldsymbol{x},\boldsymbol{x}^{\prime})$, which is now referred to as the neural tangent kernel (NTK) in the literature (see, e.g., [39, 10]). Thus, they considered the regressor ${f}_{t}^{\mathtt{NT}}$, which is also known as the estimator produced by early-stopping kernel regression with the NTK, given by the following gradient flow

$$\frac{\mathsf{d}}{\mathsf{d}t}{f}_{t}^{\mathtt{NT}}(\boldsymbol{x})=-\frac{1}{n}K^{\mathtt{NT}}(\boldsymbol{x},\boldsymbol{X})(f_{t}^{\mathtt{NT}}(\boldsymbol{X})-\boldsymbol{y}).\tag{22}$$

In the remainder of this article, we will abbreviate early-stopping kernel regression with NTK to ‘NTK regression’ where it will not cause confusion.
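The gradient flow (22) can be imitated numerically by an explicit Euler discretization. The Python sketch below does this on the training inputs only, started from $f_{0}=0$. It is a minimal illustration under assumptions of ours: the kernel is a placeholder inner product kernel $\Phi(\langle x,x^{\prime}\rangle)$ with $\Phi(t)=(1+t)^{2}$ rather than the NTK, the step size and data are hypothetical, and the names `kernel`, `flow_on_sample`, and `residual_norm` are not from the paper.

```python
import math


def kernel(x, x_prime):
    # Placeholder inner product kernel Phi(<x, x'>) with Phi(t) = (1 + t)**2.
    # Assumption for illustration only; this is NOT the NTK.
    return (1.0 + sum(a * b for a, b in zip(x, x_prime))) ** 2


def flow_on_sample(X, y, t_stop, step=0.05):
    """Explicit Euler steps for d/dt f_t(X) = -(1/n) K(X, X) (f_t(X) - y), with f_0 = 0."""
    n = len(X)
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    f = [0.0] * n
    for _ in range(round(t_stop / step)):
        resid = [fi - yi for fi, yi in zip(f, y)]
        f = [f[i] - (step / n) * sum(K[i][j] * resid[j] for j in range(n))
             for i in range(n)]
    return f


def residual_norm(f, y):
    """Euclidean norm of the training residual f(X) - y."""
    return math.sqrt(sum((fi - yi) ** 2 for fi, yi in zip(f, y)))


# Four unit vectors on the sphere and hypothetical responses.
X = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.6, 0.8, 0.0)]
y = [1.0, -0.5, 0.25, 0.3]
for t in (0.0, 1.0, 10.0):
    print(t, residual_norm(flow_on_sample(X, y, t), y))
```

Stopping the iteration at a finite time plays the role of regularization: the training residual shrinks as the stopping time grows, and early stopping trades this fit against the variance of the estimator. For unit-norm inputs the chosen step size keeps the Euler iteration stable for this placeholder kernel; other kernels or data scalings may require a smaller step.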

Suppose that $f^{\mathtt{NT}}_{0}(\boldsymbol{x})=0$. Recently, [44, 46] demonstrated that, with the "mirror initialization" such that $f(\boldsymbol{X};{\boldsymbol{\theta}(0)})=0$ [19, 38], the excess risk of a wide multi-layer neural network uniformly converges to the excess risk of NTK regression for any values of $d$ and $n$. The following proposition reiterates their findings.

Proposition 5.1 (A direct result of Lemma 12 in [46]).

Suppose that $\mathcal{X}$ is a bounded subset of $\mathbb{R}^{d+1}$. If we further assume that $f^{\mathtt{NT}}_{0}=0$ and the neural network is initialized symmetrically, then for any $\epsilon,\delta>0$, there exists $M$ such that for any $m\geq M$, we have

$$\sup_{t\geq 0}\left|\left\|f(\boldsymbol{X};{\boldsymbol{\theta}(t)})-f_{\star}\right\|_{L^{2}}-\left\|{f}_{t}^{\mathtt{NT}}-f_{\star}\right\|_{L^{2}}\right|\leq\epsilon.\tag{23}$$

Thanks to Proposition 5.1, we can focus on the generalization ability of kernel regression with respect to the NTK in large dimensions instead of that of wide neural networks.

It can be shown that when the number of hidden layers $L\geq 2$, $K^{\mathtt{NT}}$ satisfies Assumptions 1 and 4 (see, e.g., Proposition D.1 and Proposition 9 in [46]). Therefore, an application of Proposition 5.1 and Theorem 4.3 provides an upper bound and minimax lower bound on the excess risk of NTK regression in large dimensions.

Theorem 5.2.

Suppose that 𝙽𝚃superscript𝙽𝚃\mathcal{H}^{\mathtt{NT}}caligraphic_H start_POSTSUPERSCRIPT typewriter_NT end_POSTSUPERSCRIPT is an RKHS associated with the neural tangent kernel K𝙽𝚃superscript𝐾𝙽𝚃K^{\mathtt{NT}}italic_K start_POSTSUPERSCRIPT typewriter_NT end_POSTSUPERSCRIPT defined on 𝕊dsuperscript𝕊𝑑\mathbb{S}^{d}blackboard_S start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT. Let f(𝐗;𝛉(T^))𝑓𝐗𝛉^𝑇f(\boldsymbol{X};{\boldsymbol{\theta}(\widehat{T})})italic_f ( bold_italic_X ; bold_italic_θ ( over^ start_ARG italic_T end_ARG ) ) be the function defined in (22) where T^1=ε^n2superscript^𝑇1superscriptsubscript^𝜀𝑛2\widehat{T}^{-1}=\widehat{\varepsilon}_{n}^{2}over^ start_ARG italic_T end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = over^ start_ARG italic_ε end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT defined in (28) and K=K𝙽𝚃𝐾superscript𝐾𝙽𝚃K=K^{\mathtt{NT}}italic_K = italic_K start_POSTSUPERSCRIPT typewriter_NT end_POSTSUPERSCRIPT. Let γ>0𝛾0\gamma>0italic_γ > 0 be a fixed real number and p=γ/2𝑝𝛾2p=\lfloor\gamma/2\rflooritalic_p = ⌊ italic_γ / 2 ⌋. Suppose further that Assumption  2 and  3 hold with =𝙽𝚃superscript𝙽𝚃\mathcal{H}=\mathcal{H}^{\mathtt{NT}}caligraphic_H = caligraphic_H start_POSTSUPERSCRIPT typewriter_NT end_POSTSUPERSCRIPT. Then, we have the following statements:

  • (i)

    If γ{2,4,6,}𝛾246\gamma\in\{2,4,6,\cdots\}italic_γ ∈ { 2 , 4 , 6 , ⋯ }, then, there exist constants \mathfrak{C}fraktur_C and isubscript𝑖\mathfrak{C}_{i}fraktur_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, where i=1,2,3𝑖123i=1,2,3italic_i = 1 , 2 , 3, such that for any d𝑑d\geq\mathfrak{C}italic_d ≥ fraktur_C, when m𝑚mitalic_m is sufficiently large, we have

    f(𝑿;𝜽(T^))fL221n12,superscriptsubscriptnorm𝑓𝑿𝜽^𝑇subscript𝑓superscript𝐿22subscript1superscript𝑛12\displaystyle\left\|f(\boldsymbol{X};{\boldsymbol{\theta}(\widehat{T})})-f_{% \star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{1}{2}},∥ italic_f ( bold_italic_X ; bold_italic_θ ( over^ start_ARG italic_T end_ARG ) ) - italic_f start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ fraktur_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_n start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT , (24)

    holds with probability at least 12exp{3n1/2}1subscript2subscript3superscript𝑛121-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}n^{1/2}\}1 - fraktur_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_exp { - fraktur_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT italic_n start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT }.

  • (ii)

    If $\gamma\in\bigcup_{j=0}^{\infty}(2j,2j+1]$, then, for any $\delta>0$, there exist constants $\mathfrak{C}$ and $\mathfrak{C}_{i}$, where $i=1,2,3$, only depending on $\gamma$, $\delta$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$, when $m$ is sufficiently large, we have

    \[ \left\|f(\boldsymbol{X};\boldsymbol{\theta}(\widehat{T}))-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{\gamma-p}{\gamma}}\log(n), \tag{25} \]

    holds with probability at least $1-\delta-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}n^{p/\gamma}\log(n)\}$.

  • (iii)

    If $\gamma\in\bigcup_{j=0}^{\infty}(2j+1,2j+2)$, then, for any $\delta>0$, there exist constants $\mathfrak{C}$ and $\mathfrak{C}_{i}$, where $i=1,2,3$, only depending on $\gamma$, $\delta$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$, when $m$ is sufficiently large, we have

    \[ \left\|f(\boldsymbol{X};\boldsymbol{\theta}(\widehat{T}))-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\frac{p+1}{\gamma}}, \tag{26} \]

    holds with probability at least $1-\delta-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}n^{1-(p+1)/\gamma}\}$.
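To make the shape of the three regimes concrete, the following is a minimal Python sketch that returns the exponent $s$ in the upper bound $n^{-s}$ (ignoring the $\log(n)$ factor in case (ii)). It rests on two assumptions that are not stated in this part of the text: that $p=\lfloor\gamma/2\rfloor$, and that case (i) corresponds to $\gamma=2,4,6,\cdots$ as described in the abstract. The function name `rate_exponent` is hypothetical.

```python
import math


def rate_exponent(gamma):
    """Exponent s such that the upper bound scales as n**(-s), up to log factors.

    Assumption (not taken from the theorem statement itself): p = floor(gamma / 2).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    p = math.floor(gamma / 2)
    if gamma == 2 * p:
        # Case (i), assumed to be gamma = 2, 4, 6, ...: rate n^{-1/2}.
        return 0.5
    if gamma <= 2 * p + 1:
        # Case (ii): gamma in (2p, 2p + 1], rate n^{-(gamma - p)/gamma} log(n).
        return (gamma - p) / gamma
    # Case (iii): gamma in (2p + 1, 2p + 2), rate n^{-(p + 1)/gamma}.
    return (p + 1) / gamma
```

Under these assumptions the exponent is continuous in $\gamma$ on each side of the integer breakpoints: it rises on $(2j,2j+1]$ and falls on $(2j+1,2j+2)$, returning to $1/2$ as $\gamma$ approaches each even integer.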

6 Bounds for large dimensional kernel regression

In this section, we present technical results that build upon the findings discussed in Section 3. These results pertain to general kernel learning rate bounds and are applicable to a continuous kernel $K$ defined on a compact space $\mathcal{X}$ (not necessarily $\mathbb{S}^{d}$). We believe these results may be of independent interest.

We first introduce the (population and empirical) Mendelson complexities, the key quantities in determining the minimax rate of regression over $\mathcal{B}$.

Definition 6.1 (Mendelson complexity).

Suppose that $K$ is a kernel function satisfying Assumption 1. We then introduce:

  • i)

    (Population Mendelson complexity) Let $\lambda_{i}$’s be the eigenvalues of $K$ given in (4) and $\mathcal{R}_{K}(\varepsilon):=\left[\frac{1}{n}\sum_{j=1}^{\infty}\min\left\{\lambda_{j},\varepsilon^{2}\right\}\right]^{1/2}$. The population Mendelson complexity is given by

    \[ \varepsilon_{n}:=\arg\min_{\varepsilon}\left\{\mathcal{R}_{K}(\varepsilon)\leq\varepsilon^{2}/(2e\sigma)\right\}. \tag{27} \]
  • ii)

    (Empirical Mendelson complexity) Let $\widehat{\lambda}_{i}$’s be the eigenvalues of $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})$ and $\widehat{\mathcal{R}}_{K}(\varepsilon):=\left[\frac{1}{n}\sum_{j=1}^{n}\min\{\widehat{\lambda}_{j},\varepsilon^{2}\}\right]^{1/2}$. The empirical Mendelson complexity is given by

    \[ \widehat{\varepsilon}_{n}:=\arg\min_{\varepsilon}\left\{\widehat{\mathcal{R}}_{K}(\varepsilon)\leq\varepsilon^{2}/(2e\sigma)\right\}. \tag{28} \]
Remark 6.2.

From the monotonicity of $\mathcal{R}_{K}(\cdot)$ and $\widehat{\mathcal{R}}_{K}(\cdot)$, one can show the existence and uniqueness of $\varepsilon_{n}$ and $\widehat{\varepsilon}_{n}$ (see also [64]).
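As an illustration of Definition 6.1, the empirical Mendelson complexity can be computed numerically from a kernel matrix. The sketch below is one possible approach, not the authors' procedure: it reads (28) as the smallest $\varepsilon>0$ satisfying the inequality and locates it by bisection, which is justified because $\widehat{\mathcal{R}}_{K}(\varepsilon)/\varepsilon^{2}$ is non-increasing in $\varepsilon$. The function name and the tolerance parameters are hypothetical.

```python
import numpy as np


def empirical_mendelson_complexity(K, sigma, tol=1e-10, max_iter=200):
    """Smallest eps > 0 with R_hat(eps) <= eps**2 / (2 * e * sigma), by bisection.

    K is the n x n kernel matrix K(X, X); sigma is the noise level.
    """
    n = K.shape[0]
    # Eigenvalues of K / n, clipped at zero to guard against round-off.
    lam = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)

    def r_hat(eps):
        return np.sqrt(np.sum(np.minimum(lam, eps ** 2)) / n)

    def feasible(eps):
        return r_hat(eps) <= eps ** 2 / (2.0 * np.e * sigma)

    # Grow the upper end until the inequality holds, then bisect.
    lo, hi = 0.0, 1.0
    while not feasible(hi):
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return hi
```

Given the returned value `eps_hat`, the stopping time used in Theorem 6.3 would be `1.0 / eps_hat ** 2`.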

Upper bound of the excess risk of kernel regression.  The Mendelson complexity is closely related to the upper bound of the excess risk of kernel regression.

Theorem 6.3 (Upper bound).

Suppose that Assumptions 1 and 2 hold. Let $f_{\widehat{T}}$ be the function defined in (6) where $\widehat{T}=1/\widehat{\varepsilon}_{n}^{2}$. Suppose for any absolute constant $C$, there exists a constant $\mathfrak{C}$, such that for any $n\geq\mathfrak{C}$, we have $n\varepsilon_{n}^{2}\geq C$. Then there exist absolute constants $C_{1}$, $C_{2}$, and $C_{3}$, and a constant $\mathfrak{C}_{0}$, such that for any $n\geq\mathfrak{C}_{0}$, we have

\[ \left\|f_{\widehat{T}}-f_{\star}\right\|_{L^{2}}^{2}\leq C_{1}\varepsilon_{n}^{2}, \tag{29} \]

with probability at least $1-C_{2}\exp\left\{-C_{3}n\varepsilon_{n}^{2}\right\}$.

Similar results have been claimed in [64] for fixed $d$ (see, e.g., Theorem 2 of [64]). Our contribution here is to demonstrate that the constants $C_{1}$, $C_{2}$, and $C_{3}$ are absolute constants, so that the result can be applied to the large-dimensional scenario.

Remark 6.4.

Since $\varepsilon_{n}$ should decay much more slowly than the typical parametric rate $n^{-1/2}$ [18, 36], previous works have commonly assumed the existence of constants $\mathfrak{C}$ and $C$, such that for any $n\geq\mathfrak{C}$, we have $n\varepsilon_{n}^{2}\geq C$ (e.g., [64]). However, most of these works implicitly assumed that $d$ is bounded and that the $\{\lambda_{j}\}$ decay polynomially, and ignored the dependence of the constant $\mathfrak{C}$ on $\{\lambda_{j}\}$ and $d$. Theorem 6.3 explicitly requires that $\mathfrak{C}$ only depends on $c_{1}$, $c_{2}$, and $\gamma$.

Lower bound of the excess risk of kernel regression. Suppose that $(\mathcal{Z},d)$ is a topological space with a compatible loss function $d$, which is a mapping from $\mathcal{Z}\times\mathcal{Z}$ to $\mathbb{R}_{\geq 0}$ with $d(f,f)=0$ and $d(f,f^{\prime})>0$ for $f\neq f^{\prime}$. We introduce the packing entropy and covering entropy below:

Definition 6.5 (Packing entropy).

A finite set $N_{\varepsilon}\subset\mathcal{Z}$ is said to be an $\varepsilon$-packing set in $\mathcal{Z}$ with separation $\varepsilon>0$, if for any $f,f^{\prime}\in N_{\varepsilon}$, $f\neq f^{\prime}$, we have $d\left(f,f^{\prime}\right)>\varepsilon$. The logarithm of the maximum cardinality of an $\varepsilon$-packing set is called the $\varepsilon$-packing entropy or Kolmogorov capacity of $\mathcal{Z}$ with distance $d$ and is denoted by $M_{d}(\varepsilon,\mathcal{Z})$.

Definition 6.6 (Covering entropy).

A set $G_{\varepsilon}\subset\mathcal{Z}$ is said to be an $\varepsilon$-net for $\mathcal{Z}$ if for any $\tilde{f}\in\mathcal{Z}$, there exists an $f_{0}\in G_{\varepsilon}$ such that $d(\tilde{f},f_{0})\leq\varepsilon$. The logarithm of the minimum cardinality of an $\varepsilon$-net is called the $\varepsilon$-covering entropy of $\mathcal{Z}$ and is denoted by $V_{d}(\varepsilon,\mathcal{Z})$.
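The two definitions above are linked by a simple observation: a packing set that cannot be enlarged is automatically an $\varepsilon$-net, so the covering entropy never exceeds the packing entropy at the same scale. The following Python sketch illustrates this on a finite point cloud with the Euclidean distance, which is only a stand-in for the function-space setting of the text; the helper names `greedy_packing` and `is_net` are hypothetical.

```python
import numpy as np


def greedy_packing(points, eps):
    """Return indices of a maximal eps-packing of the rows of `points`."""
    centers = []
    for i, x in enumerate(points):
        # Keep x only if it is more than eps away from every center kept so far.
        if all(np.linalg.norm(x - points[j]) > eps for j in centers):
            centers.append(i)
    return centers


def is_net(points, centers, eps):
    """Check that every row of `points` lies within eps of some center."""
    diffs = points[:, None, :] - points[centers][None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return bool(np.all(dists.min(axis=1) <= eps))
```

Every point rejected by the greedy pass lies within `eps` of a kept center, which is why `is_net` holds for the returned indices.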

Let $M_{2}(\varepsilon,\mathcal{B})$ be the $\varepsilon$-packing entropy of $(\mathcal{B},d^{2}=\|\cdot\|_{L^{2}}^{2})$ and $V_{2}(\varepsilon,\mathcal{B})$ be the $\varepsilon$-covering entropy of $(\mathcal{B},d^{2}=\|\cdot\|_{L^{2}}^{2})$. It is easy to verify that $M_{2}(2\varepsilon,\mathcal{B})\leq V_{2}(\varepsilon,\mathcal{B})\leq M_{2}(\varepsilon,\mathcal{B})$ (see, e.g., Lemma A.7). If we further introduce

\[ \mathcal{D}=\left\{\rho_{f}~\bigg|~\mbox{joint distribution of $(y,x)$ where }x\sim\rho_{\mathcal{X}},~y=f(x)+\epsilon,~\epsilon\sim N(0,\sigma^{2}),~f\in\mathcal{B}\right\}, \tag{30} \]

and let $V_{K}(\varepsilon,\mathcal{D})$ be the $\varepsilon$-covering entropy of $(\mathcal{D},d^{2}=\text{KL divergence})$. Then it is easy to verify that $V_{2}(\varepsilon,\mathcal{B})=V_{K}(\varepsilon/(\sqrt{2}\sigma),\mathcal{D})$ (see, e.g., Lemma A.8).
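The rescaling by $\sqrt{2}\sigma$ in the last identity can be traced to the Gaussian noise model. The short derivation below is a sketch under the assumption that $\|\cdot\|_{L^{2}}$ denotes the norm of $L^{2}(\rho_{\mathcal{X}})$; the full argument is the one in Lemma A.8.

```latex
\begin{align*}
\mathrm{KL}(\rho_{f}\,\|\,\rho_{g})
  &= \mathbb{E}_{x\sim\rho_{\mathcal{X}}}\Big[\mathrm{KL}\big(N(f(x),\sigma^{2})\,\big\|\,N(g(x),\sigma^{2})\big)\Big] \\
  &= \mathbb{E}_{x\sim\rho_{\mathcal{X}}}\left[\frac{(f(x)-g(x))^{2}}{2\sigma^{2}}\right]
   = \frac{\|f-g\|_{L^{2}}^{2}}{2\sigma^{2}}, \\
\sqrt{\mathrm{KL}(\rho_{f}\,\|\,\rho_{g})}\leq\frac{\varepsilon}{\sqrt{2}\sigma}
  &\iff \|f-g\|_{L^{2}}\leq\varepsilon .
\end{align*}
```

Hence an $\varepsilon$-net of $\mathcal{B}$ in $L^{2}$ corresponds exactly to an $\varepsilon/(\sqrt{2}\sigma)$-net of $\mathcal{D}$ under the square root of the KL divergence, and the two covering entropies coincide.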

The following minimax lower bound is introduced in [82].

Proposition 6.7 (Theorem 1 and Corollary 1 in [82]).

Let $\bar{\varepsilon}_{n}$ and $\underline{\varepsilon}_{n}$ be given by $n\bar{\varepsilon}_{n}^{2}=V_{K}(\bar{\varepsilon}_{n},\mathcal{D})$ and $M_{2}(\underline{\varepsilon}_{n},\mathcal{B})=4n\bar{\varepsilon}_{n}^{2}+2\log 2$. Suppose further that $M_{2}(\varepsilon,\mathcal{B})\geq 2\log 2$ for sufficiently small $\varepsilon$. Then we have the following statements.

  • i)

    For sufficiently large $n$, we have $\underline{\varepsilon}_{n}<\infty$ and the minimax risk for estimating $f_{\star}\in\mathcal{B}$ satisfies

    \[ \min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq(1/8)\underline{\varepsilon}_{n}^{2}; \tag{31} \]
  • ii)

    If the richness condition $\liminf_{\varepsilon\rightarrow 0}M_{2}(\alpha\varepsilon,\mathcal{B})/M_{2}(\varepsilon,\mathcal{B})=1+\delta$ holds for some $0<\alpha<1$ and some $\delta>0$, then we have

    \[ \mathfrak{c}_{1}\bar{\varepsilon}_{n}^{2}\leq(1/8)\underline{\varepsilon}_{n}^{2}\leq\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{c}_{2}\bar{\varepsilon}_{n}^{2}, \tag{32} \]

    where $\mathfrak{c}_{1}$ and $\mathfrak{c}_{2}$ are constants only depending on $\alpha$ and $\delta$.

Remark 6.8.

From the monotonicity of $V_{K}$ and $M_{2}$, one can show the existence and uniqueness of $\bar{\varepsilon}_{n}$ and $\underline{\varepsilon}_{n}$.

If the richness condition holds, [82] has shown that $\underline{\varepsilon}_{n}\geq 8\mathfrak{c}_{1}\bar{\varepsilon}_{n}$ and demonstrated that $\bar{\varepsilon}_{n}^{2}$ can serve as a minimax lower bound for several function classes. The constant $\mathfrak{c}_{1}$, which depends on $\delta$ and $\alpha$, will be very small provided that $\delta$ is small enough (see Lemma 4 in [82]). Unfortunately, this is what happens when one applies Proposition 6.7 to an RKHS with large $d$: the following proposition shows that, for the RKHS associated with inner product kernels, $\delta$ can be arbitrarily small when $d$ is large:

Proposition 6.9.

Let $\mathcal{B}=\{f_{\star}\in\mathcal{H}^{\mathtt{in}}~|~\|f_{\star}\|_{\mathcal{H}^{\mathtt{in}}}\leq 1\}$, where $\mathcal{H}^{\mathtt{in}}$ is the RKHS associated with $K^{\mathtt{in}}$. For any $0<\alpha<1$ and any $\delta>0$, there exists a sequence $\{\tilde{\varepsilon}_{d}\}_{d=1}^{\infty}$, such that $\liminf_{d\rightarrow\infty}\tilde{\varepsilon}_{d}=0$, and we have

\[ \liminf_{d\rightarrow\infty}\frac{M_{2}(\alpha\tilde{\varepsilon}_{d},\mathcal{B})}{M_{2}(\tilde{\varepsilon}_{d},\mathcal{B})}\leq 1+\delta. \]

Proposition 6.9 reveals an essential difficulty in determining the minimax lower bound of kernel regression with large-dimensional data: when $d$ is very large, the lower bound in Proposition 6.7 may become vague. To avoid potential confusion, we specify the large-dimensional scenario for kernel regression in which we perform our analysis in Assumption 3. The following theorem provides a minimax lower bound of kernel regression in large dimensions.

Theorem 6.10 (Minimax lower bound).

Let $\bar{\varepsilon}_{n}$ be given by $n\bar{\varepsilon}_{n}^{2}=V_{K}(\bar{\varepsilon}_{n},\mathcal{D})$. Assume that there exists a constant $\mathfrak{C}$, such that for any $n\geq\mathfrak{C}$, we have $n\bar{\varepsilon}_{n}^{2}\geq 2\log 2$. Then for any constant $\mathfrak{c}_{2}>0$ such that the inequality

\[ V_{K}(\bar{\varepsilon}_{n},\mathcal{D})\leq\frac{1}{5}V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B}) \tag{33} \]

holds for any $n\geq\mathfrak{C}$, we have

\[ \min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\frac{1}{2}\left(\frac{\mathfrak{c}_{2}}{12}\right)^{2}\bar{\varepsilon}_{n}^{2}, \tag{34} \]

where $\rho_{f_{\star}}$ is the joint p.d.f. of $x,y$ given by (2) with $f=f_{\star}$.

Remark 6.11.

When the richness condition holds for some constants $\delta>0$ and $0<\alpha<1$, let $N$ be the smallest integer satisfying $(1+\delta)^{N}>5$. One can show that (33) holds for $\mathfrak{c}_{2}=(1+\delta)^{N}\sigma/\sqrt{2}$ (see, e.g., Proposition A.9). In other words, the scope in which Theorem 6.10 can be applied is larger than that of Proposition 6.7.
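For a sense of scale, the integer $N$ in Remark 6.11 grows quickly as $\delta$ shrinks. The Python sketch below computes it by direct iteration; the function name `smallest_N` and the `target` parameter are hypothetical, and the code computes only $N$, not the constant $\mathfrak{c}_{2}$.

```python
def smallest_N(delta, target=5.0):
    """Smallest integer N with (1 + delta)**N > target."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    N, value = 0, 1.0
    # Multiply by (1 + delta) until the running power exceeds the target.
    while value <= target:
        value *= 1.0 + delta
        N += 1
    return N
```

For instance, $\delta=1$ gives $N=3$, whereas $\delta=0.01$ already requires $N=162$.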

7 What Can We Expect from Kernel Regression for Large Dimensional Data

Since [39] introduced the NTK, studying the generalization performance of kernel methods has become a natural surrogate for studying the generalization performance of neural networks. In the past several years, much work has been done on kernel regression in fixed dimensions (e.g., [15, 47, 64, 71, 83, 84]). Though these works greatly extend our understanding of kernel regression, they also raise further natural questions. For example, [47] showed that fixed-dimensional kernel interpolation generalizes poorly, which conflicts with the widely observed ‘benign overfitting’ phenomenon. Some researchers then speculated that, in certain scenarios, the ‘benign overfitting’ phenomenon might be due to the large dimensionality of the data. This has urged researchers to study kernel regression over large-dimensional data (i.e., $n\asymp d^{\gamma}$ for some $\gamma>0$) (see, e.g., [24, 30, 49, 50, 52, 66, 81]).

In this section, we gather some recent findings and compare them with Theorem 4.2 and Theorem 4.3. These great works and our results strongly suggest that there might be other deeper structures hidden in the kernel regression on large dimensional data.

7.1 Consistency of kernel regression when ndγasymptotically-equals𝑛superscript𝑑𝛾n\asymp d^{\gamma}italic_n ≍ italic_d start_POSTSUPERSCRIPT italic_γ end_POSTSUPERSCRIPT, γ>0𝛾0\gamma>0italic_γ > 0

We term a non-parametric regression method consistent if the excess risk of its estimator converges to zero as $n\to\infty$, and inconsistent otherwise. Some of the literature has discussed the inconsistency of kernel methods with inner product kernels when $n\asymp d^{\gamma}$ for some non-integer $\gamma$ ([29, 30, 31, 56, 60]). Let us first recall some notation from [30]. Denote by $R_{\mathrm{KR}}\left(f_{\star},\boldsymbol{X},\lambda\right)$ the excess risk of kernel ridge regression, by $R_{K}\left(g\right):=\min_{a}\mathbb{E}_{x}\left\{\left(g(x)-\sum_{i=1}^{n}a_{i}K\left(x_{i},x\right)\right)^{2}\right\}$ a lower bound on the prediction error of general kernel methods with regression function $g$, by $\mathrm{P}_{\leq\ell}$ the projection onto polynomials with degree $\leq\ell$, and by $\mathrm{P}_{>\ell}$ the projection onto polynomials with degree $>\ell$.

Remark 7.1.

For functions defined on $\mathbb{S}^{d}$, $\mathrm{P}_{\leq\ell}$ is the projection onto the linear space of spherical harmonics with degree $\leq\ell$ (see, e.g., Definition 1.1.1 in [22]). These spherical harmonics form an orthonormal basis of $L^{2}\left(\mathbb{S}^{d},\rho_{\mathcal{X}}\right)$, and thus can represent functions in any RKHS $\mathcal{H}\subset L^{2}\left(\mathbb{S}^{d},\rho_{\mathcal{X}}\right)$. For example, $\mathcal{H}^{\mathtt{NT}}$ is spanned by spherical polynomials with degree $\ell=0,1,2,4,\cdots$ ([11]).

The following two propositions restate the results in [30]:

Proposition 7.2 (Restate Theorem 3 in [30]).

Suppose there exists an integer $\ell\in\{0,1,\cdots\}$, and a constant $0<\delta<1$, such that $n\asymp d^{\ell+1-\delta}$. Assume that $f_{\star}$ is square-integrable in $\sqrt{d}\mathbb{S}^{d}$ with bounded $L^{2}$ norm. Suppose further that $\sigma^{2}=0$. Then, for any $\varepsilon>0$, with high probability we have

$$\left|R_{K}\left(f_{\star}\right)-R_{K}\left(\mathrm{P}_{\leq\ell}f_{\star}\right)-\left\|\mathrm{P}_{>\ell}f_{\star}\right\|_{L^{2}}^{2}\right|\leq\varepsilon\left\|f_{\star}\right\|_{L^{2}}\left\|\mathrm{P}_{>\ell}f_{\star}\right\|_{L^{2}}. \tag{35}$$
Proposition 7.3 (Restate Theorem 4 in [30]).

Suppose there exists an integer $\ell\in\{0,1,\cdots\}$, and a constant $0<\delta<1$, such that $n\asymp d^{\ell+1-\delta}$. Assume that $f_{\star}$ is square-integrable in $\sqrt{d}\mathbb{S}^{d}$ with bounded $L^{2}$ norm. Suppose further that Assumption 3 in [30] holds for the kernel $K$. Then, for any $\varepsilon>0$, and any regularization parameter $0<\lambda<\lambda^{*}$, with high probability we have

$$\left|R_{\mathrm{KR}}\left(f_{\star},\boldsymbol{X},\lambda\right)-\left\|\mathrm{P}_{>\ell}f_{\star}\right\|_{L^{2}}^{2}\right|\leq\varepsilon\left(\left\|f_{\star}\right\|_{L^{2}}^{2}+\sigma^{2}\right), \tag{36}$$

where $\lambda^{*}$ is defined as (20) in [30].

By assuming that the regression function falls into the square-integrable function space, we can summarize their results (and what they claimed as their main contributions) in the following three points:

  • (1)

    When $f_{\star}$ is a polynomial with degree at most $\ell\geq 0$, Proposition 7.3 demonstrates that under specific regularization parameters, kernel ridge regression is consistent when $n\asymp d^{\gamma}$ for some non-integer $\gamma>\ell$.

  • (2)

    When $f_{\star}$ is not a polynomial with degree at most $\ell\geq 0$, if the noise term is always zero, then Proposition 7.2 shows that all kernel methods are inconsistent when $n\asymp d^{\gamma}$ for some non-integer $\gamma<\ell+1$.

  • (3)

    They claimed that "kernel methods can fit at most a degree-$\ell$ polynomial".

We notice that they merely assume the regression function falls into the square-integrable function space, which is too large and seldom considered in most non-parametric regression problems. In practice, researchers often consider sub-spaces of the square-integrable function space that possess better properties. For instance, [74] and [75] prove the optimality of additive regression and polynomial splines by assuming that the regression functions are square-integrable with specific smoothness conditions. Moreover, when dealing with kernel methods, researchers often assume that the regression function falls into the RKHS associated with the kernel [14, 15, 16], instead of merely assuming that the regression function is square-integrable.

In our study, we also adopt the more reasonable assumption that the regression function falls into the RKHS $\mathcal{H}^{\mathtt{in}}$. By modifying tools of the empirical process and calculating the covering number of $\mathcal{H}^{\mathtt{in}}$, we attained the optimality, and thus consistency, of kernel regression when $n\asymp d^{\gamma}$ for $\gamma>0$. In contrast, tools of the empirical process do not apply to the square-integrable function class, since the covering number of the square-integrable function class is unbounded. Therefore, for the square-integrable function class, it is difficult to attain optimality results of kernel regression in large dimensions.

Remark 7.4.

Notice that Proposition 7.3 can be applied to functions in $\mathcal{B}$ since $\left\|\mathrm{P}_{>\ell}f_{\star}\right\|_{L^{2}}^{2}\leq\mu_{\ell+1}\left\|f_{\star}\right\|_{[\mathcal{H}]^{s}}^{2}$. However, we have that $\left\|\mathrm{P}_{>\ell}f_{\star}\right\|_{L^{2}}^{2}=o_{d}(1)$, hence Proposition 7.3 is not precise enough to provide a convergence rate (the r.h.s. is basically $\Theta_{d}(1)$). In fact, $\left\|\mathrm{P}_{>\lfloor\gamma\rfloor}f_{\star}\right\|_{L^{2}}^{2}$ is not the right quantity determining the convergence; the analysis in the paper rather suggests that $\left\|\mathrm{P}_{>q}f_{\star}\right\|_{L^{2}}^{2}$, $q=p-1,p$, plays the pivotal role.

Figure 2: (a) A cartoon of the excess risk of kernel ridge regression when $f_{\star}$ is square-integrable. Borrowed from [30]. (b) The excess risk of early-stopping kernel regression when $f_{\star}\in\mathcal{H}^{\mathtt{in}}$. Obtained from Theorem 4.2 and Theorem 4.3. (c) The excess risk of kernel interpolation when $f_{\star}\in\mathcal{H}^{\mathtt{in}}$. Obtained from results in [50].

7.2 Kernel regressions generalize better than kernel interpolation in large dimensions

Recent findings reported in [47] indicate that kernel interpolation exhibits poorer generalization compared to early-stopping kernel regression in fixed dimensions. In this subsection, we show that kernel interpolation also generalizes more poorly than kernel regression in large dimensions.

We notice that [50] have obtained an upper bound on the convergence rate of the excess risk of kernel interpolation. The following proposition restates their main results:

Proposition 7.5 (Restate Theorem 1 in [50]).

Suppose there exists a constant $\gamma>1$, such that $n\asymp d^{\gamma}$. Suppose further that the regression function can be expressed as $f_{\star}(x)=\langle K(x,\cdot),\rho_{\star}(\cdot)\rangle_{L^{2}}$, with $\|\rho_{\star}\|_{L^{4}}^{4}\leq C$ for some constant $C>0$. Let $f_{\infty}$ be the function defined in (6) with $t=\infty$. Define $\ell=\lfloor\gamma\rfloor$, and $\eta(\gamma)=\min\left\{(\ell+1)/\gamma-1,\,1-\ell/\gamma\right\}$. Then, under some specific conditions on the distribution of the samples and the kernel $K$, there exists a constant $\mathfrak{C}_{1}$ not depending on $n$ and $d$, such that we have

$$\left\|f_{\infty}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\eta(\gamma)} \tag{37}$$

with probability at least $1-\delta-\exp\{n/d^{\ell}\}$.

Let’s compare the results presented in Proposition 7.5 with the findings stated in Theorem 4.3:

  • (1)

    It is clear that $\eta(\gamma)\leq\eta(3/2)=1/3<1/2$. Therefore, when $n\asymp d^{\gamma}$ for $\gamma>1$, the convergence rate of kernel interpolation, which is $n^{-\eta(\gamma)}$, is slower than the convergence rate of kernel regression given in Theorem 4.3.

  • (2)

    For $0<\gamma\leq 1$, the convergence rate of the estimators produced by kernel regression is $n^{-1}$, while the convergence rate of the estimators produced by kernel interpolation is not given in [50].
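The inequality $\eta(\gamma)\leq\eta(3/2)=1/3$ used in item (1) can be checked numerically. The following is a minimal sketch that evaluates the exponent from Proposition 7.5 on a grid; the function name `eta`, the grid range $(1,6]$, and the grid spacing are illustrative choices, not part of the original analysis.

```python
import math

def eta(gamma: float) -> float:
    """Exponent eta(gamma) = min{(l+1)/gamma - 1, 1 - l/gamma}, l = floor(gamma).

    Taken from the statement of Proposition 7.5; only meaningful for gamma > 1.
    """
    ell = math.floor(gamma)
    return min((ell + 1) / gamma - 1.0, 1.0 - ell / gamma)

# Illustrative grid over gamma in (1, 6] with spacing 0.001 (an assumption).
grid = [1.0 + k / 1000.0 for k in range(1, 5001)]

# Grid point where the interpolation exponent is largest.
best_gamma = max(grid, key=eta)
best_eta = eta(best_gamma)

print(f"largest exponent on the grid: eta({best_gamma}) = {best_eta:.6f}")
```

Within each interval $[\ell,\ell+1)$ the two branches of the minimum cross at $\gamma=\ell+1/2$, where both equal $1/(2\ell+1)$, so the largest value is attained for $\ell=1$.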

Moreover, in Figure 2, we plot the upper bound results in [50] for kernel interpolation, represented by the orange line, together with the upper and lower bound results in Theorem 4.2 and Theorem 4.3 for kernel regression, represented by the blue line. We can observe that the rate of the blue line is significantly faster than that of the orange line for all $\gamma>1$. From the above discussion, we can conclude that kernel interpolation ($t=\infty$) generalizes much more poorly than early-stopping kernel regression ($t=\widehat{T}<\infty$) in large dimensions.

7.3 Numerical Experiments

In this subsection, our objective is to verify experimentally that when $n\asymp d^{\gamma}$ for some fixed $\gamma>0$, and for functions $f_{\star}$ in $\mathcal{H}$ with bounded norms, the early-stopping kernel regression algorithms defined in (6) can achieve the convergence rate given in Theorem 4.2 and Theorem 4.3, while the kernel interpolation algorithms cannot (when $\gamma>1$).

We consider the following two inner product kernels:

  • The neural tangent kernel of a two-layer ReLU neural network:

    $$K^{\mathtt{NT}}(x,x^{\prime}):=\Phi(\langle x,x^{\prime}\rangle),~~x,x^{\prime}\sim\mathbb{S}^{d},$$

    where $\Phi(t)=\left[\sin{(\arccos t)}+2(\pi-\arccos t)t\right]/(2\pi)$.

  • The RBF kernel with a fixed bandwidth:

    $$K^{\mathrm{rbf}}(x,x^{\prime})=\exp{\left(-\frac{\|x-x^{\prime}\|_{2}^{2}}{2}\right)},~~x,x^{\prime}\sim\mathbb{S}^{d}.$$
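For concreteness, the two kernels above can be written in a few lines of NumPy. This is a minimal sketch rather than the code used for the experiments: the function names are hypothetical, and it assumes that $\mathbb{S}^{d}$ is represented as the unit sphere in $\mathbb{R}^{d+1}$ and that uniform samples are obtained by normalizing Gaussian vectors.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from S^d, stored as unit vectors in R^{d+1} (assumed embedding)."""
    G = rng.standard_normal((n, d + 1))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

def ntk_two_layer_relu(X, Z):
    """Phi(<x, z>) with Phi(t) = [sin(arccos t) + 2 (pi - arccos t) t] / (2 pi)."""
    t = np.clip(X @ Z.T, -1.0, 1.0)  # guard arccos against rounding outside [-1, 1]
    theta = np.arccos(t)
    return (np.sin(theta) + 2.0 * (np.pi - theta) * t) / (2.0 * np.pi)

def rbf_kernel(X, Z):
    """exp(-||x - z||_2^2 / 2) computed from pairwise squared distances."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / 2.0)  # clamp tiny negative distances to zero

rng = np.random.default_rng(0)
X = sample_sphere(5, 10, rng)
K_nt = ntk_two_layer_relu(X, X)
K_rbf = rbf_kernel(X, X)
```

Both kernels depend on $x$ and $x^{\prime}$ only through $\langle x,x^{\prime}\rangle$ on the sphere, since $\|x-x^{\prime}\|_{2}^{2}=2-2\langle x,x^{\prime}\rangle$ for unit vectors, which is why the RBF kernel counts as an inner product kernel here.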

For any dimension $d$, let $\rho_{\mathcal{X}}$ be the uniform distribution on $\mathcal{X}=\mathbb{S}^{d}$. We construct a function $f_{\star}$ in $\mathcal{H}$ as follows:

$$f_{\star}(x)=K(x,u_{1})+K(x,u_{2})+K(x,u_{3}), \tag{38}$$

where $u_{1}$, $u_{2}$, and $u_{3}$ are sampled from $\rho_{\mathcal{X}}$. Then, we consider the data generation process with the model given by Equation (2), which can be expressed as:

$$y=f_{\star}(\boldsymbol{x})+\epsilon, \tag{39}$$

where $\epsilon\sim\mathcal{N}(0,1)$. We construct the estimators of the kernel regression and kernel interpolation (KI), $f_{\widehat{T}}$ and $f_{t_{\infty}}$, using Equation (6), where the stopping time $\widehat{T}$ is set to $Cn^{-1/2}$ with a constant $C$ and $t_{\infty}=\infty$. We consider four different settings to simulate results under different asymptotic frameworks of $n\asymp d^{\gamma}$, where $\gamma>0$:

  • $\gamma=0.5$: $n$ from 100 to 200, with intervals 5, $d=n^{2}$.

  • $\gamma=0.8$: $n$ from 500 to 1000, with intervals 10, $d=n^{5/4}$.

  • $\gamma=1.5$: $n$ from 1000 to 5000, with intervals 200, $d=n^{2/3}$.

  • $\gamma=1.8$: $n$ from 1000 to 5000, with intervals 200, $d=n^{5/9}$.
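The four settings above can be enumerated programmatically. The sketch below is only illustrative: it assumes that both endpoints of each range of $n$ are included and that $d=n^{1/\gamma}$ is rounded to the nearest integer, neither of which is specified in the text.

```python
# Sample-size ranges for each gamma, as listed above (inclusive endpoints assumed).
settings = {
    0.5: range(100, 201, 5),
    0.8: range(500, 1001, 10),
    1.5: range(1000, 5001, 200),
    1.8: range(1000, 5001, 200),
}

def dimension_for(n: int, gamma: float) -> int:
    """Return d = n^{1/gamma}, rounded to the nearest integer (rounding is an assumption)."""
    return int(round(n ** (1.0 / gamma)))

# For each gamma, the list of (n, d) pairs used in one sweep.
grid = {
    gamma: [(n, dimension_for(n, gamma)) for n in ns]
    for gamma, ns in settings.items()
}

for gamma, pairs in grid.items():
    print(f"gamma={gamma}: {len(pairs)} pairs, first={pairs[0]}, last={pairs[-1]}")
```

Note that for $\gamma<1$ the dimension exceeds the sample size (for example, $d=n^{2}$ when $\gamma=0.5$), so the kernel matrices stay small while the data vectors become long.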

We numerically approximate the excess risk $\|f_{t}-f_{\star}\|_{L^{2}}^{2}$ by $\sum_{i=1}^{N}(f_{t}(z_{i})-f_{\star}(z_{i}))^{2}/N$, where $N=1000$ and the $z_{i}$'s are test data drawn i.i.d. from $\rho_{\mathcal{X}}$. For each combination of $(n,d)$, we repeat the experiments $20$ times and compute the average excess risk. To visualize the convergence rate $r$, we perform a logarithmic least-squares fit $\log\text{risk}=r\log n+b$ of the excess risk with respect to the sample size and display the value of $r$.
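The logarithmic least-squares step can be sketched as follows. The helper name `fit_rate` is hypothetical, and the risks in the usage example are synthetic values generated from a known power law purely to illustrate the fit; they are not experimental results.

```python
import numpy as np

def fit_rate(ns, risks):
    """Fit log(risk) = r * log(n) + b by ordinary least squares; return (r, b)."""
    log_n = np.log(np.asarray(ns, dtype=float))
    log_risk = np.log(np.asarray(risks, dtype=float))
    r, b = np.polyfit(log_n, log_risk, deg=1)  # coefficients: slope first, then intercept
    return float(r), float(b)

# Synthetic illustration only: risks that follow 3 * n^{-2/3} exactly.
ns = np.arange(1000, 5001, 200)
synthetic_risks = 3.0 * ns ** (-2.0 / 3.0)
r_hat, b_hat = fit_rate(ns, synthetic_risks)
print(f"fitted slope r = {r_hat:.4f}, intercept b = {b_hat:.4f}")
```

In the experiments, `risks` would be the excess risks averaged over the 20 repetitions for each $n$, and the fitted slope $r$ is the quantity compared against the theoretical rate.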

Figure 3: Log excess risk decay curves of kernel regression and kernel interpolation with NTK under different asymptotic frameworks $n\asymp d^{\gamma}$. Panels: (a) $n=d^{0.5}$, kernel regression theoretical rate $=n^{-1}$; (b) $n=d^{0.8}$, kernel regression theoretical rate $=n^{-1}$; (c) $n=d^{1.5}$, kernel regression theoretical rate $=n^{-2/3}$; (d) $n=d^{1.8}$, kernel regression theoretical rate $=n^{-5/9}$. The blue curves represent the average excess risks computed from 20 trials. The dashed black lines are obtained through logarithmic least-squares regression, with the slopes indicating the convergence rates denoted as $r$. The four sub-figures from left to right and from top to bottom correspond to the settings where $n$ is set to be equal to $d^{0.5}$, $d^{0.8}$, $d^{1.5}$, and $d^{1.8}$ respectively. In each setting, the constant $C$ is chosen from $\{0.001,0.01,0.1,1,10,100,1000\}$, and we report our numerical results under the best choice of $C$.
Figure 4: A similar plot as Figure 3, but with the RBF kernel. Panels: (a) $n=d^{0.5}$, kernel regression theoretical rate $=n^{-1}$; (b) $n=d^{0.8}$, kernel regression theoretical rate $=n^{-1}$; (c) $n=d^{1.5}$, kernel regression theoretical rate $=n^{-2/3}$; (d) $n=d^{1.8}$, kernel regression theoretical rate $=n^{-5/9}$.

We try different values of the constant $C\in\{0.001,0.01,0.1,1,10,100,1000\}$ for the stopping time $\widehat{T}$, and we report our numerical results in Figure 3 and Figure 4 under the best choice of $C$. For each setting, we observe that the convergence rates of the excess risk in kernel regression algorithms are consistently close to the theoretical rate as given in Theorem 4.2 and Theorem 4.3. Moreover, we find that KI is comparable to kernel regression when $\gamma=0.5$, and is worse than kernel regression when $\gamma=0.8$, $1.5$, or $1.8$.
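A single trial of this procedure might look like the sketch below. Equation (6) is not reproduced in this section, so the estimator here is an assumption: it uses the standard kernel gradient flow form $f_{t}(x)=K(x,X)K(X,X)^{-1}(I-e^{-tK(X,X)/n})y$, which reduces to kernel interpolation at $t=\infty$. All function names, the eigenvalue floor, the seed handling, and the small problem sizes are hypothetical choices for illustration; only the RBF kernel, the construction of $f_{\star}$, the noise level, $N=1000$, and the candidate set for $C$ come from the text.

```python
import numpy as np

def sphere(n, d, rng):
    """Draw n points uniformly from S^d, stored as unit vectors in R^{d+1} (assumed embedding)."""
    G = rng.standard_normal((n, d + 1))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

def rbf(X, Z):
    """exp(-||x - z||^2 / 2); for unit vectors ||x - z||^2 = 2 - 2 <x, z>."""
    return np.exp(-(1.0 - np.clip(X @ Z.T, -1.0, 1.0)))

def gradient_flow_predict(K_train, K_eval, y, t):
    """ASSUMED form of the estimator: f_t(x) = K(x, X) K^{-1} (I - exp(-t K / n)) y.

    Passing t = np.inf gives kernel interpolation K(x, X) K^{-1} y.
    """
    n = len(y)
    evals, evecs = np.linalg.eigh(K_train)
    evals = np.maximum(evals, 1e-12)  # numerical floor (hypothetical choice)
    if np.isinf(t):
        spectral_filter = 1.0 / evals
    else:
        spectral_filter = -np.expm1(-t * evals / n) / evals
    alpha = evecs @ (spectral_filter * (evecs.T @ y))
    return K_eval @ alpha

def one_trial(n, d, t, seed, n_test=1000):
    """Monte Carlo estimate of ||f_t - f_star||_{L^2}^2 on one simulated data set."""
    rng = np.random.default_rng(seed)
    u = sphere(3, d, rng)                      # centres u_1, u_2, u_3 from (38)
    f_star = lambda X: rbf(X, u).sum(axis=1)
    X, Z = sphere(n, d, rng), sphere(n_test, d, rng)
    y = f_star(X) + rng.standard_normal(n)     # noise ~ N(0, 1) as in (39)
    pred = gradient_flow_predict(rbf(X, X), rbf(Z, X), y, t)
    return float(np.mean((pred - f_star(Z)) ** 2))

n, d = 200, 34                                 # roughly n = d^{1.5}; much smaller than in the paper
candidates = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
# Stopping time C * n^{-1/2}, as stated in the text; same seed so all C see the same data.
risks = {C: one_trial(n, d, C * n ** (-0.5), seed=0) for C in candidates}
best_C = min(risks, key=risks.get)
risk_interp = one_trial(n, d, np.inf, seed=0)
print(f"best C = {best_C}, early-stopping risk = {risks[best_C]:.4f}, "
      f"interpolation risk = {risk_interp:.4f}")
```

Selecting $C$ by the smallest test risk mirrors the "best choice of $C$" reported in the figures; averaging over 20 seeds and sweeping $n$ would be needed to reproduce the decay curves.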

8 Conclusion and Future Works

In this paper, we built a set of technical tools to study kernel regression in large dimensions (where the sample size $n$ depends polynomially on the dimensionality $d$, i.e., $n\asymp d^{\gamma}$ for some $\gamma>0$). We have shown that a properly chosen early stopping rule results in a fitting function whose excess risk (generalization error) is upper bounded by the Mendelson complexity $\varepsilon_{n}$, and that the minimax rate of the generalization error is bounded below by the metric entropy $\bar{\varepsilon}_{n}$. We then examined the spherical data. Provided that $f_{\star}$ fell into the unit ball of $\mathcal{H}^{\mathtt{in}}$, the RKHS associated with an inner product kernel $K^{\mathtt{in}}$, we showed in Theorem 3.3 and Theorem 3.5 that the minimax rate of the excess risk of kernel regression with $K^{\mathtt{in}}$ is $n^{-1/2}$ when $n\asymp d^{\gamma}$ for any $\gamma=2,4,6,\cdots$. Then, in Section 4, we determined the minimax rate of kernel regression with $K^{\mathtt{in}}$ when $n\asymp d^{\gamma}$ for any $\gamma>0$. We also found some intriguing phenomena exhibited in large-dimensional kernel regression, which were referred to as the ‘multiple descent behavior’ and the ‘periodic plateau behavior’.

This periodic behavior has been observed in a variety of research. For example, several works discuss the inconsistency of kernel methods with inner product kernels when $n\asymp d^{\gamma}$ for some non-integer $\gamma$ (see, e.g., [29, 30, 31, 56, 60]). Denote by $R_{\mathrm{KR}}\left(f_{\star},\boldsymbol{X},\lambda\right)$ the excess risk of kernel ridge regression and by $\mathrm{P}_{>\ell}$ the projection onto polynomials of degree $>\ell$. [30] showed that for any $\varepsilon>0$ and any regularization parameter $0<\lambda<\lambda^{*}$, with high probability, one has

$$\left|R_{\mathrm{KR}}\left(f_{\star},\boldsymbol{X},\lambda\right)-\left\|\mathrm{P}_{>\ell}f_{\star}\right\|_{L^{2}}^{2}\right|\leq\varepsilon\left(\left\|f_{\star}\right\|_{L^{2}}^{2}+\sigma^{2}\right), \tag{40}$$

where $\ell=\lfloor\gamma\rfloor$ and $\lambda^{*}$ is defined as (20) in [30]. They provided a cartoon representation of their results (we replicated it in Figure 2 (a)).
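To make the staircase implied by (40) concrete, the following minimal sketch computes the limiting quantity $\|\mathrm{P}_{>\ell}f_{\star}\|_{L^{2}}^{2}$ from the squared $L^{2}$ norms of the degree-$k$ components of $f_{\star}$. It assumes that these components are mutually orthogonal (as for spherical harmonics of different degrees) and it ignores the $\varepsilon$ slack in (40). The function name `limiting_risk` and the energy values are hypothetical and chosen only for illustration.

```python
import math

def limiting_risk(degree_energies, gamma):
    """Sum of the squared L2 norms of the components of f_star of degree > floor(gamma).

    degree_energies[k] is the (hypothetical) squared L2 norm of the degree-k
    component of f_star. Under the orthogonality assumption stated above, the
    squared norm of the projection P_{>l} f_star is the sum of the tail energies.
    """
    ell = math.floor(gamma)
    return sum(degree_energies[ell + 1:])

# Hypothetical energies for degrees 0, 1, 2, 3 (illustrative values only).
energies = [0.5, 0.25, 0.125, 0.125]

# Non-integer gamma only, matching the regime discussed in the text.
# The value is constant on each interval (l, l+1) and drops at the next integer.
for gamma in (0.5, 1.5, 2.5, 3.5):
    print(gamma, limiting_risk(energies, gamma))
```

In this toy setting the value stays flat between consecutive integers and decreases each time $\gamma$ passes an integer, which is the plateau pattern the cartoon in Figure 2 (a) depicts.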

Furthermore, another line of work obtained an upper bound on the convergence rate of the excess risk of kernel interpolation [50]. Under the assumption that the regression function can be expressed as $f_{\star}(x)=\langle K(x,\cdot),\rho_{\star}(\cdot)\rangle_{L^{2}}$, with $\|\rho_{\star}\|_{L^{4}}^{4}\leq C$ for some constant $C>0$, they showed that with probability at least $1-\delta-\exp\{n/d^{\ell}\}$,

$$\left\|f_{\infty}^{\mathtt{in}}-f_{\star}\right\|_{L^{2}}^{2}\leq\mathfrak{C}_{1}n^{-\eta(\gamma)}, \tag{41}$$

where $\ell=\lfloor\gamma\rfloor$, and $\eta(\gamma)=\min\left\{(\ell+1)/\gamma-1,\,1-\ell/\gamma\right\}$. In Figure 2 (c), we plot the upper bound results in [50] for kernel interpolation, represented by the orange line. It is clear that this curve also exhibits similar periodic behavior.
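The periodic shape of the exponent $\eta(\gamma)$ can be checked numerically. The sketch below only evaluates the formula stated above; the function name `eta` and the sample values of $\gamma$ are illustrative choices, not part of the source.

```python
import math

def eta(gamma):
    """Exponent eta(gamma) = min{(l + 1)/gamma - 1, 1 - l/gamma} with l = floor(gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    ell = math.floor(gamma)
    return min((ell + 1) / gamma - 1, 1 - ell / gamma)

# Evaluate the exponent on a few illustrative values of gamma.
for gamma in (1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0):
    print(f"gamma={gamma:.2f}  eta={eta(gamma):.4f}")
```

By direct arithmetic on the formula, the two terms inside the minimum coincide at $\gamma=\ell+1/2$, where $\eta(\gamma)=1/(2\ell+1)$, and $\eta(\gamma)=0$ at every integer $\gamma$, so the bound (41) is strongest midway between integers and uninformative at the integers.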

The new periodic phenomena exhibited in kernel regression with large-dimensional data might be an interesting research direction. Motivated by recent work on kernel regression in fixed dimensions, we believe that there might be a unified explanation for this periodic behavior of kernel regression with inner product kernels. In particular, whether the periodic plateau behavior holds for more general classes of kernels defined on domains other than $\mathbb{S}^{d}$ would be of great interest.

Acknowledgements

The authors gratefully acknowledge the National Natural Science Foundation of China (Grant 11971257), Beijing Natural Science Foundation (Grant Z190001), National Key R&D Program of China (2020AAA0105200), and Beijing Academy of Artificial Intelligence. Part of the work in this paper was done while the authors visited the Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics. The authors would like to thank the anonymous referees, the Associate Editor, and the Editor for their constructive comments that improved the quality of this paper.

References

  • [1] Michael Aerni, Marco Milanta, Konstantin Donhauser, and Fanny Yang. Strong inductive biases provably prevent harmless interpolation. arXiv preprint arXiv:2301.07605, 2023.
  • [2] Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E Houle, Ken-ichi Kawarabayashi, and Michael Nett. Estimating local intrinsic dimensionality. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 29–38, 2015.
  • [3] Sanjeev Arora, Simon Du, Wei Hu, Zhiyuan Li, and Ruosong Wang. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In International Conference on Machine Learning, pages 322–332. PMLR, 2019.
  • [4] Sanjeev Arora, Simon S Du, Wei Hu, Zhiyuan Li, Russ R Salakhutdinov, and Ruosong Wang. On exact computation with an infinitely wide neural net. Advances in Neural Information Processing Systems, 32, 2019.
  • [5] Douglas Azevedo and Valdir A Menegatto. Eigenvalues of dot-product kernels on the sphere. Proceeding Series of the Brazilian Society of Computational and Applied Mathematics, 3(1), 2015.
  • [6] Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. The Annals of Statistics, 33(4):1497 – 1537, 2005.
  • [7] Peter L Bartlett, Philip M Long, Gábor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070, 2020.
  • [8] Daniel Barzilai and Ohad Shamir. Generalization in kernel regression under realistic assumptions. arXiv preprint arXiv:2312.15995, 2023.
  • [9] Daniel Beaglehole, Mikhail Belkin, and Parthe Pandit. Kernel ridgeless regression is inconsistent for low dimensions. arXiv preprint arXiv:2205.13525, 2022.
  • [10] Alberto Bietti and Francis Bach. Deep equals shallow for relu networks in kernel regimes. arXiv preprint arXiv:2009.14397, 2020.
  • [11] Alberto Bietti and Julien Mairal. On the inductive bias of neural tangent kernels. Advances in Neural Information Processing Systems, 32, 2019.
  • [12] Abdulkadir Canatar, Blake Bordelon, and Cengiz Pehlevan. Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks. Nature communications, 12(1):2914, 2021.
  • [13] Yuan Cao, Zixiang Chen, Misha Belkin, and Quanquan Gu. Benign overfitting in two-layer convolutional neural networks. Advances in Neural Information Processing Systems, 35:25237–25250, 2022.
  • [14] Andrea Caponnetto. Optimal rates for regularization operators in learning theory. Technical Report CBCL Paper #264/AI Technical Report #062, Massachusetts Institute of Technology, September 2006.
  • [15] Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.
  • [16] Andrea Caponnetto and Yuan Yao. Cross-validation based adaptation for regularization operators in learning theory. Analysis and Applications, 8(02):161–183, 2010.
  • [17] Bernd Carl and Irmtraud Stephani. Entropy, Compactness and the Approximation of Operators. Cambridge Tracts in Mathematics. Cambridge University Press, 1990.
  • [18] Hung Chen. Convergence Rates for Parametric Components in a Partly Linear Model. The Annals of Statistics, 16(1):136 – 146, 1988.
  • [19] Lenaic Chizat, Edouard Oyallon, and Francis Bach. On lazy training in differentiable programming. Advances in Neural Information Processing Systems, 32, 2019.
  • [20] William S. Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368):829–836, 1979.
  • [21] Felipe Cucker and Steve Smale. On the mathematical foundations of learning. Bulletin of the American mathematical society, 39(1):1–49, 2002.
  • [22] Feng Dai and Yuan Xu. Approximation theory and harmonic analysis on spheres and balls, volume 23. Springer, 2013.
  • [23] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
  • [24] Konstantin Donhauser, Mingqi Wu, and Fanny Yang. How rotational invariance of common kernels prevents generalization in high dimensions. In International Conference on Machine Learning, pages 2804–2814. PMLR, 2021.
  • [25] Simon Du, Jason Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning, pages 1675–1685. PMLR, 2019.
  • [26] Simon S Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054, 2018.
  • [27] Keinosuke Fukunaga and David R Olsen. An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 100(2):176–183, 1971.
  • [28] Jean Gallier. Notes on spherical harmonics and linear representations of lie groups. preprint, 2009.
  • [29] Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, and Andrea Montanari. When do neural networks outperform kernel methods? Advances in Neural Information Processing Systems, 33:14820–14830, 2020.
  • [30] Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Linearized two-layers neural networks in high dimension. The Annals of Statistics, 49(2):1029 – 1054, 2021.
  • [31] Nikhil Ghosh, Song Mei, and Bin Yu. The three stages of learning dynamics in high-dimensional kernel methods. arXiv preprint arXiv:2111.07167, 2021.
  • [32] Tilmann Gneiting. Strictly and non-strictly positive definite functions on spheres. Bernoulli, 19(4):1327 – 1349, 2013.
  • [33] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
  • [34] Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics, 50(2):949 – 986, 2022.
  • [35] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [36] Nancy E. Heckman. Spline smoothing in a partly linear model. Journal of the Royal Statistical Society: Series B (Methodological), 48(2):244–248, 1986.
  • [37] Tianyang Hu, Wenjia Wang, Cong Lin, and Guang Cheng. Regularization matters: A nonparametric perspective on overparametrized neural network. In International Conference on Artificial Intelligence and Statistics, pages 829–837. PMLR, 2021.
  • [38] Wei Hu, Zhiyuan Li, and Dingli Yu. Simple and effective regularization methods for training on noisily labeled data with generalization guarantee. arXiv preprint arXiv:1905.11368, 2019.
  • [39] Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems, 31, 2018.
  • [40] Noureddine El Karoui. The spectrum of kernel random matrices. The Annals of Statistics, 38(1):1 – 50, 2010.
  • [41] Michael Kohler and Adam Krzyzak. Nonparametric regression estimation using penalized least squares. IEEE Transactions on Information Theory, 47(7):3054–3058, 2001.
  • [42] Vladimir Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6):2593 – 2656, 2006.
  • [43] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
  • [44] Jianfa Lai, Manyun Xu, Rui Chen, and Qian Lin. Generalization ability of wide neural networks on $\mathbb{R}$, 2023.
  • [45] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436–444, 2015.
  • [46] Yicheng Li, Zixiong Yu, Guhan Chen, and Qian Lin. On the eigenvalue decay rates of a class of neural-network related kernel functions defined on general domains. Journal of Machine Learning Research, 25(82):1–47, 2024.
  • [47] Yicheng Li, Haobo Zhang, and Qian Lin. Kernel interpolation generalizes poorly. arXiv preprint arXiv:2303.15809, 2023.
  • [48] Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. Advances in Neural Information Processing Systems, 31, 2018.
  • [49] Tengyuan Liang and Alexander Rakhlin. Just interpolate: Kernel “Ridgeless” regression can generalize. The Annals of Statistics, 48(3):1329 – 1347, 2020.
  • [50] Tengyuan Liang, Alexander Rakhlin, and Xiyu Zhai. On the multiple descent of minimum-norm interpolants and restricted lower isometry of kernels. In Conference on Learning Theory, pages 2683–2711. PMLR, 2020.
  • [51] Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, and Volkan Cevher. Optimal rates for spectral algorithms with least-squares regression over hilbert spaces. Applied and Computational Harmonic Analysis, 48(3):868–890, may 2020.
  • [52] Fanghui Liu, Zhenyu Liao, and Johan Suykens. Kernel regression in high dimensions: Refined analysis beyond double descent. In International Conference on Artificial Intelligence and Statistics, pages 649–657. PMLR, 2021.
  • [53] Neil Mallinar, James B Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, and Preetum Nakkiran. Benign, tempered, or catastrophic: A taxonomy of overfitting. arXiv preprint arXiv:2207.06569, 2022.
  • [54] Pascal Massart. About the constants in Talagrand’s concentration inequalities for empirical processes. The Annals of Probability, 28(2):863 – 884, 2000.
  • [55] Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Learning with invariances in random features and kernel models. In Conference on Learning Theory, pages 3351–3418. PMLR, 2021.
  • [56] Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Generalization error of random feature and kernel methods: Hypercontractivity and kernel matrix concentration. Applied and Computational Harmonic Analysis, 59:3–84, 2022.
  • [57] Song Mei and Andrea Montanari. The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics, 75(4):667–766, 2022.
  • [58] Shahar Mendelson. Geometric parameters of kernel machines. In Computational Learning Theory, volume 2375 of Lecture Notes in Artificial Intelligence, pages 29–43, Berlin, 2002. Springer.
  • [59] Vitali D Milman and Gideon Schechtman. Asymptotic theory of finite dimensional normed spaces: Isoperimetric inequalities in riemannian manifolds, volume 1200. Springer, 2009.
  • [60] Theodor Misiakiewicz. Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression. arXiv preprint arXiv:2204.10425, 2022.
  • [61] Theodor Misiakiewicz and Song Mei. Learning with convolution and pooling operations in kernel methods. Advances in Neural Information Processing Systems, 35:29014–29025, 2022.
  • [62] Vidya Muthukumar, Kailas Vodrahalli, Vignesh Subramanian, and Anant Sahai. Harmless interpolation of noisy data in regression. IEEE Journal on Selected Areas in Information Theory, 1(1):67–83, 2020.
  • [63] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318. PMLR, 2013.
  • [64] Garvesh Raskutti, Martin J. Wainwright, and Bin Yu. Early stopping and non-parametric regression: An optimal data-dependent stopping rule. Journal of Machine Learning Research, 15(11):335–366, 2014.
  • [65] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015.
  • [66] Mojtaba Sahraee-Ardakan, Melikasadat Emami, Parthe Pandit, Sundeep Rangan, and Alyson K Fletcher. Kernel methods and multi-layer perceptrons learn linear models in high dimensions. arXiv preprint arXiv:2201.08082, 2022.
  • [67] Amartya Sanyal, Puneet K Dokania, Varun Kanade, and Philip HS Torr. How benign is benign overfitting? arXiv preprint arXiv:2007.04028, 2020.
  • [68] Bernhard Schölkopf, Alexander J Smola, Francis Bach, et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2002.
  • [69] Alex Smola, Zoltán Ovári, and Robert C Williamson. Regularization with dot-product kernels. Advances in Neural Information Processing Systems, 13, 2000.
  • [70] Ingo Steinwart and Andreas Christmann. Support vector machines. Springer Science & Business Media, 2008.
  • [71] Ingo Steinwart, Don Hush, and Clint Scovel. Optimal rates for regularized least squares regression. In Conference on Learning Theory, pages 79–93. PMLR, 2009.
  • [72] Ingo Steinwart and Clint Scovel. Mercer’s theorem on general domains: On the interaction between measures, kernels, and rkhss. Constructive Approximation, 35:363–417, 2012.
  • [73] Charles J. Stone. Consistent Nonparametric Regression. The Annals of Statistics, 5(4):595 – 620, 1977.
  • [74] Charles J. Stone. Additive Regression and Other Nonparametric Models. The Annals of Statistics, 13(2):689 – 705, 1985.
  • [75] Charles J. Stone. The Use of Polynomial Splines and Their Tensor Products in Multivariate Function Estimation. The Annals of Statistics, 22(1):118 – 171, 1994.
  • [76] Namjoon Suh, Hyunouk Ko, and Xiaoming Huo. A non-parametric regression viewpoint : Generalization of overparametrized deep RELU network under noisy observations. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
  • [77] Terence Tao. 254a, notes 1: Concentration of measure. https://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/, 2010.
  • [78] Alexander Tsigler and Peter L Bartlett. Benign overfitting in ridge regression. arXiv preprint arXiv:2009.14286, 2020.
  • [79] Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019.
  • [80] F. T. Wright. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables Whose Distributions are not Necessarily Symmetric. The Annals of Probability, 1(6):1068 – 1070, 1973.
  • [81] Lechao Xiao and Jeffrey Pennington. Precise learning curves and higher-order scaling limits for dot product kernel regression. arXiv preprint arXiv:2205.14846, 2022.
  • [82] Yuhong Yang and Andrew Barron. Information-theoretic determination of minimax rates of convergence. The Annals of Statistics, 27(5):1564 – 1599, 1999.
  • [83] Haobo Zhang, Yicheng Li, and Qian Lin. On the optimality of misspecified spectral algorithms. arXiv preprint arXiv:2303.14942, 2023.
  • [84] Haobo Zhang, Yicheng Li, Weihao Lu, and Qian Lin. On the optimality of misspecified kernel ridge regression. arXiv preprint arXiv:2305.07241, 2023.

Supplement to "Optimal Rate of Kernel Regression in Large Dimensions"

Appendix A Proof of Theorems in Section 6

A.1 Proof of Theorem 6.3

The proof is divided into four lemmas below:

Lemma A.1.

Let $\widehat{\varepsilon}_{n}$ be the empirical Mendelson complexity defined in (28). There exist absolute constants $C_{2}$ and $C_{3}$ such that we have

$$\left\|f_{\widehat{T}}-f_{\star}\right\|_{n}^{2}\leq\frac{\sigma^{2}+1}{\sigma^{2}}\widehat{\varepsilon}_{n}^{2}, \tag{42}$$

with probability at least $1-C_{2}\exp\left(-C_{3}n\widehat{\varepsilon}_{n}^{2}\right)$, where $\|g\|^{2}_{n}=\frac{1}{n}\sum_{j\leq n}g(x_{j})^{2}$, and the randomness comes from the noise term $\boldsymbol{y}-f_{\star}(\boldsymbol{X})$.
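For readers who want to see the empirical norm $\|g\|_{n}^{2}$ in computational form, here is a minimal sketch. The function name `empirical_sq_norm`, the test function, and the sample points are hypothetical, and a one-dimensional input is used only to keep the example short.

```python
def empirical_sq_norm(g, xs):
    """Return ||g||_n^2 = (1/n) * sum_j g(x_j)^2 over the sample points xs."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample point")
    return sum(g(x) ** 2 for x in xs) / n

# Hypothetical one-dimensional example: g(x) = 2x on four sample points.
xs = [-1.0, -0.5, 0.5, 1.0]
print(empirical_sq_norm(lambda x: 2.0 * x, xs))  # (4 + 1 + 1 + 4) / 4 = 2.5
```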

Lemma A.2.

Let $\varepsilon_{n}$ be the population Mendelson complexity defined in (27). There exist absolute constants $C_{1}$, $C_{2}$, and $C_{3}$, such that for any $M>0$, letting $M\mathcal{B}:=\left\{g\in\mathcal{H}\mid\|g\|_{\mathcal{H}}\leq M\right\}$, we have that

$$\left|\|g\|_{n}^{2}-\|g\|_{L^{2}}^{2}\right|\leq\frac{1}{2}\|g\|_{L^{2}}^{2}+C_{1}M^{2}\kappa\varepsilon_{n}^{2}\quad\text{ for all }g\in M\mathcal{B}, \tag{43}$$

holds with probability at least $1-C_{2}e^{-C_{3}n\varepsilon_{n}^{2}}$, where the randomness comes from the $n$ samples $x_{1},\cdots,x_{n}$.
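As a reading aid, the absolute-value bound (43) can be unpacked into a two-sided comparison between the empirical and population norms. The rearrangement below is elementary algebra on (43) and introduces no new assumptions:

```latex
\begin{align*}
\tfrac{1}{2}\|g\|_{L^{2}}^{2}-C_{1}M^{2}\kappa\varepsilon_{n}^{2}
\;\leq\;\|g\|_{n}^{2}\;\leq\;
\tfrac{3}{2}\|g\|_{L^{2}}^{2}+C_{1}M^{2}\kappa\varepsilon_{n}^{2}
\qquad\text{for all } g\in M\mathcal{B},
\end{align*}
```

so that, in particular, $\|g\|_{L^{2}}^{2}\leq 2\|g\|_{n}^{2}+2C_{1}M^{2}\kappa\varepsilon_{n}^{2}$ on the same event.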

Lemma A.3.

Let $\widehat{\varepsilon}_{n}$ be the empirical Mendelson complexity defined in (28) and $\varepsilon_{n}$ be the population Mendelson complexity defined in (27). Under the same assumptions as Theorem 6.3, there exist absolute constants $C_{1}$, $C_{2}$, $C_{3}$, $C_{4}$, and a constant $\mathfrak{C}_{0}$, such that for any $n\geq\mathfrak{C}_{0}$, we have that

$$C_{1}\varepsilon_{n}\leq\widehat{\varepsilon}_{n}\leq C_{2}\varepsilon_{n}, \tag{44}$$

holds with probability at least $1-C_{3}\exp\left\{-C_{4}n\varepsilon_{n}^{2}\right\}$, where the randomness comes from the $n$ samples $x_{1},\cdots,x_{n}$.

Lemma A.4.

There exists an absolute constant $C_{1}$, such that

$$\|f_{\widehat{T}}-f_{\star}\|_{\mathcal{H}}\leq 3, \tag{45}$$

holds with probability at least $1-C_{1}\exp\left\{-n(\min\{\widehat{\varepsilon}_{n},\varepsilon_{n}\})^{2}\right\}$, where the randomness comes from the noise term $\boldsymbol{y}-f_{\star}(\boldsymbol{X})$.

Showing that these constants are absolute is tedious, so we defer the details of the proofs to Appendix E.2. We now begin the proof of Theorem 6.3. Thanks to Lemma A.3, the following three statements hold with probability at least $1-C_{2}\exp\{-C_{3}n\varepsilon_{n}^{2}\}$ for some absolute constants $C_{1}$, $C_{2}$, and $C_{3}$.

  • a) Lemmas A.1 and A.3 imply that $\|f_{\widehat{T}}-f_{\star}\|_{n}^{2}\leq\frac{\sigma^{2}+1}{\sigma^{2}}\widehat{\varepsilon}_{n}^{2}\leq C_{1}\varepsilon_{n}^{2}$.

  • b) Lemma A.4 guarantees $\frac{1}{3}\left(f_{\widehat{T}}-f_{\star}\right)\in\mathcal{B}$.

  • c) Lemma A.2 then guarantees

    $$\frac{1}{2}\|f_{\widehat{T}}-f_{\star}\|_{L^{2}}^{2}\leq\|f_{\widehat{T}}-f_{\star}\|_{n}^{2}+9C_{1}\kappa\varepsilon_{n}^{2}. \tag{46}$$

On the event that (42), (44), (45), and (46) all hold, we have

$$\|f_{\widehat{T}}-f_{\star}\|_{L^{2}}^{2}\leq 2\|f_{\widehat{T}}-f_{\star}\|_{n}^{2}+C_{1}\varepsilon_{n}^{2}\leq 3C_{1}\varepsilon_{n}^{2},$$

which therefore holds with probability at least $1-C_{2}\exp\left\{-C_{3}n\varepsilon_{n}^{2}\right\}$. $\square$

A.2 Proof of Proposition 6.9

Recall that each eigenvalue $\mu_{k}$ has multiplicity $N(d,k)$ (see, e.g., Appendix D).

For each $d\geq 1$, let $\tilde{\varepsilon}_{d}=13\sqrt{\mu_{2}}/\alpha$, where $\mu_{2}=\mu_{2}(d)$ is the eigenvalue of $K^{\mathtt{in}}$ defined on $\mathbb{S}^{d}$. Then we have $(\alpha\tilde{\varepsilon}_{d}/12)^{2}>\mu_{2}$ for any $d\geq 1$. From the results in Appendix D, when $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $\alpha$ and $\delta$, we further have

$$\tilde{\varepsilon}_{d}^{2}=\frac{169}{\alpha^{2}}\mu_{2}<\frac{\mu_{1}}{2},\quad K\left(\sqrt{\mu_{1}/2}\right)\geq\frac{1}{2}N(d,1)\log(2)>\frac{1}{\delta}\log\left(\frac{12}{\alpha}\right),$$

where $K(\varepsilon)=\frac{1}{2}\sum_{k:\mu_{k}>\varepsilon^{2}}N(d,k)\log\left(\mu_{k}/\varepsilon^{2}\right)$.

From the monotonicity of $K(\cdot)$, when $d\geq\mathfrak{C}$, we have $K(\tilde{\varepsilon}_{d})\geq K(\sqrt{\mu_{1}/2})$. Therefore, from Lemma A.5 and Lemma A.7, we have

$$\liminf_{d\rightarrow\infty}\frac{M_{2}(\alpha\tilde{\varepsilon}_{d},\mathcal{B})}{M_{2}(\tilde{\varepsilon}_{d},\mathcal{B})}\leq\sup_{d\geq\mathfrak{C}}\frac{K(\alpha\tilde{\varepsilon}_{d}/12)}{K(\tilde{\varepsilon}_{d})}=\sup_{d\geq\mathfrak{C}}\left(\frac{K(\tilde{\varepsilon}_{d})+\log\left(\frac{12}{\alpha}\right)}{K(\tilde{\varepsilon}_{d})}\right)\leq 1+\delta.$$

$\square$
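The quantity $K(\varepsilon)$ with multiplicities is easy to evaluate numerically once the distinct eigenvalues are known. The following is a minimal Python sketch under stated assumptions: the lists `mus` and `mults` are hypothetical toy values standing in for $\mu_{k}$ and $N(d,k)$ (they are not computed from any kernel in this paper), and the name `K_mult` is ours. It checks the lower bound $K(\sqrt{\mu_{1}/2})\geq\frac{1}{2}N(d,1)\log 2$ and the monotonicity step $K(\tilde{\varepsilon}_{d})\geq K(\sqrt{\mu_{1}/2})$, which applies whenever $\tilde{\varepsilon}_{d}^{2}<\mu_{1}/2$.

```python
import math

def K_mult(eps, mus, mults):
    """K(eps) = 1/2 * sum over k with mu_k > eps^2 of N_k * log(mu_k / eps^2)."""
    return 0.5 * sum(
        n_k * math.log(mu_k / eps**2)
        for mu_k, n_k in zip(mus, mults)
        if mu_k > eps**2
    )

# Hypothetical toy spectrum (assumption): mu_0 > mu_1 > mu_2 > mu_3.
mus = [1.0, 1e-2, 1e-6, 1e-9]
# Hypothetical multiplicities standing in for N(d, k) (assumption).
mults = [1, 50, 1000, 20000]

alpha = 0.5
eps_tilde = 13 * math.sqrt(mus[2]) / alpha  # 13 * sqrt(mu_2) / alpha
eps_ref = math.sqrt(mus[1] / 2)             # sqrt(mu_1 / 2)

k_tilde = K_mult(eps_tilde, mus, mults)
k_ref = K_mult(eps_ref, mus, mults)
lower = 0.5 * mults[1] * math.log(2)        # (1/2) * N_1 * log 2
```

Since $K$ is non-increasing in $\varepsilon$, a smaller radius can only add terms or enlarge the existing logarithms, which is why `k_tilde` dominates `k_ref` in this toy setting.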

A.3 Proof of Theorem 6.10

Suppose that there exists a constant $\mathfrak{C}$ only depending on $c_{1},c_{2}$ and $\gamma$, such that for any $n\geq\mathfrak{C}$, we have $n\bar{\varepsilon}_{n}^{2}\geq 2\log 2$. Then for any $n\geq\mathfrak{C}$, (1) for any $\varepsilon\leq\sqrt{2}\sigma\bar{\varepsilon}_{n}$, we have $M_{2}(\varepsilon,\mathcal{B})\geq V_{K}(\varepsilon/(\sqrt{2}\sigma),\mathcal{D})\geq V_{K}(\bar{\varepsilon}_{n},\mathcal{D})=n\bar{\varepsilon}_{n}^{2}\geq 2\log 2$ (see, e.g., Appendix A.3.1), and (2) we have $\underline{\varepsilon}_{n}<\infty$ since $M_{2}(\underline{\varepsilon}_{n},\mathcal{B})=4n\bar{\varepsilon}_{n}^{2}+2\log 2\geq 10\log 2$. Therefore, we have actually verified that all conditions in Proposition 6.7 hold.

Thanks to Proposition 6.7, we now only need to verify that $\underline{\varepsilon}_{n}\geq\mathfrak{c}_{2}\bar{\varepsilon}_{n}/6$. In fact, thanks to the properties of the metric entropy of $\mathcal{B}$ in Subsection A.3.1, we have

$$n\bar{\varepsilon}_{n}^{2}=V_{K}(\bar{\varepsilon}_{n},\mathcal{D})\overset{(33)}{\leq}\frac{1}{5}V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B})\overset{(50)}{\leq}\frac{1}{10}\sum_{j:\lambda_{j}>\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}\log\left(\frac{\lambda_{j}}{\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}\right). \tag{47}$$

Therefore,

$$\begin{aligned}
V_{2}(\underline{\varepsilon}_{n},\mathcal{B})&\overset{\text{Lemma A.7}}{\leq}M_{2}(\underline{\varepsilon}_{n},\mathcal{B})=4n\bar{\varepsilon}_{n}^{2}+2\log 2\leq 5n\bar{\varepsilon}_{n}^{2}\\
&\overset{(47)}{\leq}\frac{1}{2}\sum_{j:\lambda_{j}>\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}\log\left(\frac{\lambda_{j}}{\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}\right)\overset{(50)}{\leq}V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n}/6,\mathcal{B}).
\end{aligned} \tag{48}$$

Since $V_{2}$ is monotone decreasing, we know that $\underline{\varepsilon}_{n}\geq\mathfrak{c}_{2}\bar{\varepsilon}_{n}/6$. From Proposition 6.7, we have

$$\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\frac{1}{2}\left(\frac{\mathfrak{c}_{2}}{12}\right)^{2}\bar{\varepsilon}_{n}^{2}, \tag{49}$$

and we get the desired result. $\square$

A.3.1 Properties of the metric entropy of $\mathcal{B}$

It is clear that $V_{2}(\varepsilon,\mathcal{B})$ is also the logarithm of the $\varepsilon$-covering number of $\{(\sqrt{\lambda_{i}}a_{i})_{i\geq 1}\mid\sum_{i}a_{i}^{2}\leq 1\}\subset\ell^{2}$ (with respect to the $\ell^{2}$ distance), where the $\lambda_{i}$'s are given in (4). Then we have the following lemmas about the metric entropy of $\mathcal{B}$.

Lemma A.5.

For any $\varepsilon>0$, let $K(\varepsilon)=\frac{1}{2}\sum_{j:\lambda_{j}>\varepsilon^{2}}\log\left(\lambda_{j}/\varepsilon^{2}\right)$. We have

$$V_{2}(6\varepsilon,\mathcal{B})\leq K(\varepsilon)\leq V_{2}(\varepsilon,\mathcal{B}). \tag{50}$$
Proof.

We need the following lemma:

Lemma A.6 (Proposition 1.3.2 in [17]).

For a non-increasing sequence $\{\lambda_{i}\}$ of positive numbers, let $S$ be an operator from $\ell^{2}$ to itself which is given by

$$S:\ell^{2}\to\ell^{2},\quad(a_{i})_{i\geq 1}\mapsto(\sqrt{\lambda_{i}}a_{i})_{i\geq 1}. \tag{51}$$

Let us denote the unit ball in $\ell^{2}$ by $U_{E}$. Then, we have

$$\sup_{1\leq k<\infty}\left(\frac{\prod_{j=1}^{k}\sqrt{\lambda_{j}}}{q}\right)^{\frac{1}{k}}\leq\varepsilon_{q}(S)\leq 6\sup_{1\leq k<\infty}\left(\frac{\prod_{j=1}^{k}\sqrt{\lambda_{j}}}{q}\right)^{\frac{1}{k}}, \tag{52}$$

where

$$\varepsilon_{q}(S)=\inf\left\{\varepsilon>0\mid\text{there exist }q\text{ points }a_{1},\cdots,a_{q}\in\ell^{2}\text{ such that }S(U_{E})\subset\cup_{j=1}^{q}B(a_{j},\varepsilon)\right\}.$$

We now prove Lemma A.5.

For any $\varepsilon>0$, let $m=\min\{k:\lambda_{k+1}\leq\varepsilon^{2}\}$ and $q=\prod_{j=1}^{m}(\sqrt{\lambda_{j}}/\varepsilon)$. Note that $q$ is exactly the $\varepsilon_{q}(S)$-covering number of $S(U_{E})$. Lemma A.6 implies that

$$\exp\{V_{2}(6\varepsilon,\mathcal{B})\}\leq q\leq\exp\{V_{2}(\varepsilon,\mathcal{B})\}. \tag{53}$$

Taking the logarithm, and noting that $\log q=\frac{1}{2}\sum_{j=1}^{m}\log(\lambda_{j}/\varepsilon^{2})=K(\varepsilon)$, we know that

$$V_{2}(6\varepsilon,\mathcal{B})\leq K(\varepsilon)\leq V_{2}(\varepsilon,\mathcal{B}). \tag{54}$$

$\square$
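The identity $\log q=K(\varepsilon)$ behind the last step of this proof can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical non-increasing spectrum $\lambda_{j}=j^{-2}$ truncated to 200 terms (an illustration only, not the spectrum of any kernel in this paper); the function names are ours.

```python
import math

def K(eps, lams):
    """K(eps) = 1/2 * sum over j with lambda_j > eps^2 of log(lambda_j / eps^2)."""
    return 0.5 * sum(math.log(lam / eps**2) for lam in lams if lam > eps**2)

def cutoff_and_log_q(eps, lams):
    """Return (m, log q) for a non-increasing list lams, where
    m = min{k : lambda_{k+1} <= eps^2} and q = prod_{j<=m} sqrt(lambda_j) / eps."""
    m = len(lams)  # used only if every listed eigenvalue exceeds eps^2
    for k, lam in enumerate(lams):  # lams[k] plays the role of lambda_{k+1}
        if lam <= eps**2:
            m = k
            break
    log_q = sum(math.log(math.sqrt(lam) / eps) for lam in lams[:m])
    return m, log_q

# Hypothetical spectrum (assumption): lambda_j = 1 / j^2 for j = 1, ..., 200.
lams = [1.0 / j**2 for j in range(1, 201)]
eps = 0.07
m, log_q = cutoff_and_log_q(eps, lams)
gap = abs(log_q - K(eps, lams))  # should vanish up to rounding error
```

Working with $\log q$ rather than $q$ avoids overflow when many eigenvalues exceed $\varepsilon^{2}$.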

Lemma A.7.

For any $\varepsilon>0$, we have $M_{2}(2\varepsilon,\mathcal{B})\leq V_{2}(\varepsilon,\mathcal{B})\leq M_{2}(\varepsilon,\mathcal{B})$.

Proof.

Suppose $E=\left\{f_{1},\ldots,f_{M}\right\}$ is a maximal $\varepsilon$-packing of $\mathcal{B}$. Then for all $f\in\mathcal{B}\backslash E$, we can find $f_{i}$ such that $\left\|f-f_{i}\right\|\leq\varepsilon$, since otherwise $E\cup\{f\}$ would be a larger $\varepsilon$-packing. Hence $E$ is an $\varepsilon$-net. Therefore, we have $V_{2}(\varepsilon,\mathcal{B})\leq M_{2}(\varepsilon,\mathcal{B})$.

On the other hand, suppose there exist a $2\varepsilon$-packing $\left\{f_{1},\ldots,f_{M}\right\}$ and an $\varepsilon$-net $\left\{g_{1},\ldots,g_{N}\right\}$ such that $M\geq N+1$. Then we must have $f_{i}$ and $f_{j}$ belonging to the same $\varepsilon$-ball $B\left(g_{k},\varepsilon\right)$ for some $i\neq j$ and $k$. This means that we have $\left\|f_{i}-f_{j}\right\|\leq 2\varepsilon$, which contradicts the fact that $\left\{f_{1},\ldots,f_{M}\right\}$ is a $2\varepsilon$-packing. Therefore, we have $M_{2}(2\varepsilon,\mathcal{B})\leq V_{2}(\varepsilon,\mathcal{B})$.

$\square$
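The first half of this argument (a packing that cannot be enlarged is automatically a net) can be illustrated on a finite point cloud. The following is a minimal Python sketch under stated assumptions: random points in the plane with the Euclidean distance serve as a toy stand-in for $\mathcal{B}$ with the $L^{2}$ distance, and the function names `greedy_packing` and `is_net` are ours.

```python
import math
import random

def greedy_packing(points, eps):
    """Greedily select points that are pairwise more than eps apart.
    No remaining point can be added, so the result is a maximal
    eps-packing within `points`."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

def is_net(centers, points, eps):
    """Check that every point lies within distance eps of some center."""
    return all(any(math.dist(p, c) <= eps for c in centers) for p in points)

rng = random.Random(0)  # fixed seed so the run is reproducible
# Toy point cloud in the square [-1, 1]^2 (assumption, illustration only).
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
eps = 0.3

centers = greedy_packing(points, eps)
covered = is_net(centers, points, eps)
```

Any point rejected by the greedy pass lies within `eps` of an already selected center, which is exactly the covering property used in the proof.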

Lemma A.8.

$V_{2}\left(\varepsilon,\mathcal{B}\right)=V_{K}\left(\frac{\varepsilon}{\sqrt{2}\sigma},\mathcal{D}\right)$.

Proof.

Denote the p.d.f. of $x$ as $\mu(x)$, and the p.d.f. of $y$ given $x$ as $\rho(y\mid x)$. Since $y\mid x\sim N(f(x),\sigma^{2})$, we then have

$$\rho(y\mid x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left\{-\frac{(y-f(x))^{2}}{2\sigma^{2}}\right\}. \tag{55}$$

Therefore, for any $f,g\in\mathcal{H}$, we have

$$\begin{aligned}
\ell_{K}^{2}(f,g)&=\int\rho_{f}(x,y)\log\frac{\rho_{f}(x,y)}{\rho_{g}(x,y)}\,\mathsf{d}x\,\mathsf{d}y\\
&=\int\rho_{f}(x,y)\frac{y}{\sigma^{2}}[f(x)-g(x)]\,\mathsf{d}x\,\mathsf{d}y+\int\rho_{f}(x,y)\frac{1}{2\sigma^{2}}[g^{2}(x)-f^{2}(x)]\,\mathsf{d}x\,\mathsf{d}y\\
&=\int\left(\int y\rho(y\mid x)\,\mathsf{d}y\right)\frac{f(x)-g(x)}{\sigma^{2}}\mu(x)\,\mathsf{d}x+\int\frac{g^{2}(x)-f^{2}(x)}{2\sigma^{2}}\mu(x)\,\mathsf{d}x\\
&=\frac{1}{2\sigma^{2}}\int\left(2f^{2}(x)-2f(x)g(x)+g^{2}(x)-f^{2}(x)\right)\mu(x)\,\mathsf{d}x=\frac{1}{2\sigma^{2}}\ell_{2}^{2}(f,g).
\end{aligned} \tag{56}$$

Therefore, from the definition of $V_{2}$ and $V_{K}$, we have $V_{2}\left(\varepsilon,\mathcal{B}\right)=V_{K}\left(\frac{\varepsilon}{\sqrt{2}\sigma},\mathcal{D}\right)$. $\square$
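The identity above reduces, pointwise in $x$, to the standard fact that the KL divergence between $N(a,\sigma^{2})$ and $N(b,\sigma^{2})$ equals $(a-b)^{2}/(2\sigma^{2})$. The following is a minimal numerical sanity check of that pointwise fact, not part of the proof. The function names and the test values of `a`, `b`, and `sigma` are illustrative assumptions.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kl_numeric(a, b, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of KL(N(a, sigma^2) || N(b, sigma^2))."""
    lo = a - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        p = gaussian_pdf(y, a, sigma)
        # log(p/q) written in closed form, which avoids log(0) in the tails
        log_ratio = ((y - b) ** 2 - (y - a) ** 2) / (2.0 * sigma ** 2)
        total += p * log_ratio * h
    return total


def kl_closed_form(a, b, sigma):
    """The value (a - b)^2 / (2 sigma^2) appearing in the derivation."""
    return (a - b) ** 2 / (2.0 * sigma ** 2)


if __name__ == "__main__":
    a, b, sigma = 0.3, -1.1, 0.7  # hypothetical values of f(x), g(x), and the noise level
    print(kl_numeric(a, b, sigma), kl_closed_form(a, b, sigma))
```

Integrating the pointwise identity against $\mu$ gives the factor $\frac{1}{2\sigma^{2}}\ell_{2}^{2}(f,g)$, which is why an $\varepsilon$-ball in $\ell_{2}$ corresponds to a ball of radius $\varepsilon/(\sqrt{2}\sigma)$ in the KL-based distance.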

The following proposition proves the claim in Remark 6.11.

Proposition A.9.

Suppose there is an $\alpha_{1}>1$ such that $\liminf_{\varepsilon\rightarrow 0}M_{2}(\alpha\varepsilon,\mathcal{B})/M_{2}(\varepsilon,\mathcal{B})=\alpha_{1}$. Let $\mathfrak{c}_{2}=\alpha^{N}\sigma/\sqrt{2}$, then we have

$$V_{K}(\bar{\varepsilon}_{n},\mathcal{D})\leq\frac{1}{5}V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B}). \tag{57}$$
Proof.

We have

\begin{align*}
V_{2}\left(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B}\right) &\geq M_{2}\left(2\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B}\right)=M_{2}\left(\alpha^{N}\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B}\right) \tag{58}\\
&\geq \alpha_{1}^{N}M_{2}\left(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B}\right)>5M_{2}\left(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B}\right)\\
&\geq 5V_{2}\left(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B}\right)=5V_{K}\left(\bar{\varepsilon}_{n},\mathcal{D}\right).
\end{align*}

$\blacksquare$

Appendix B Proof of Claims and Theorems in Section 3

B.1 Proof of Lemma 3.2

Fix an integer $p\geq 0$. From (22) in [30], for any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $p$, we have

$$\frac{\Phi^{(k)}(0)}{d^{k}}\leq\mu_{k}\leq\frac{2\Phi^{(k)}(0)}{d^{k}},\quad k\leq p+1. \tag{59}$$

Observe that for any $k\geq 0$, we have $k!a_{k}=\Phi^{(k)}(0)$. Therefore, if we let $\mathfrak{C}_{1}:=\min_{k\leq p+1}\{k!a_{k}\}>0$ and $\mathfrak{C}_{2}:=2\max_{k\leq p+1}\{k!a_{k}\}<\infty$, then we get the desired results.

The second part of Lemma 3.2 is a direct result of Lemma D.2. $\square$

Remark B.1.

The results in [30] consider data uniformly distributed on $\sqrt{d}\,\mathbb{S}^{d}$, while we consider the unit sphere. However, the spectrum estimates borrowed from [30] are invariant with respect to this scaling, hence we can directly use (22) in [30] in the above proof.
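The second part of Lemma 3.2 says that $N(d,k)$ grows like $d^{k}$ for fixed $k$. A minimal sketch of this growth follows. It assumes that $N(d,k)$ is the dimension of the space of degree-$k$ spherical harmonics on $\mathbb{S}^{d}\subset\mathbb{R}^{d+1}$, given by the classical formula $N(d,k)=\frac{2k+d-1}{k}\binom{k+d-2}{d-1}$ for $k\geq 1$ and $N(d,0)=1$. The function names are illustrative.

```python
import math


def harmonic_dim(d, k):
    """Dimension of degree-k spherical harmonics on S^d (unit sphere in R^{d+1}).

    Assumption: this classical formula is the N(d, k) of the text.
    """
    if k == 0:
        return 1
    # (2k + d - 1) / k * C(k + d - 2, d - 1); the division is exact
    return (2 * k + d - 1) * math.comb(k + d - 2, d - 1) // k


def normalized_dim(d, k):
    """N(d, k) * k! / d^k, which tends to 1 as d grows with k fixed."""
    return harmonic_dim(d, k) * math.factorial(k) / d ** k


if __name__ == "__main__":
    for d in (10, 100, 1000, 10000):
        print(d, [round(normalized_dim(d, k), 4) for k in range(1, 5)])
```

Under this assumption the ratio $N(d,k)\,k!/d^{k}$ approaches 1, so for $k\leq p+1$ and $d$ large enough two-sided bounds of the form $\mathfrak{C}_{1}d^{k}\leq N(d,k)\leq\mathfrak{C}_{2}d^{k}$ hold with constants depending only on $p$.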

B.2 Proof of Theorem 3.3

Let $\varepsilon_{n}$ be the population Mendelson complexity defined in (27) with $K=K^{\mathtt{in}}$. We need the following lemmas.

Lemma B.2 (Restatement of Lemma 3.2).

Suppose that $p\in\{1,2,3,\cdots\}$ and $k\in\{1,2,3,\cdots,p,p+1\}$. There exist constants $\mathfrak{C}_{1}$ and $\mathfrak{C}_{2}$ only depending on $p$, such that for any $d\geq\mathfrak{C}$, a sufficiently large constant only depending on $p$, we have

\begin{align*}
\frac{\mathfrak{C}_{1}}{d^{k}} &\leq \mu_{k}\leq\frac{\mathfrak{C}_{2}}{d^{k}}, \tag{60}\\
\mathfrak{C}_{1}d^{k} &\leq N(d,k)\leq\mathfrak{C}_{2}d^{k}.
\end{align*}
Lemma B.3.

Suppose that $q\in\{1,2,3,\cdots\}$. There exists a constant $\mathfrak{C}_{3}$ only depending on $q$, such that for any $d\geq\mathfrak{C}$, a sufficiently large constant only depending on $q$, we have

$$\mathfrak{C}_{3}\leq\sum_{k=0}^{\infty}N(d,k)\min\{\mu_{k},\mu_{q}\}\leq 1. \tag{61}$$
Proof.

From Assumption 1 we have $\sum_{k}N(d,k)\min\{\mu_{k},\mu_{q}\}\leq\sum_{k}N(d,k)\mu_{k}\leq 1$; from Lemma B.2 we have $\sum_{k}N(d,k)\min\{\mu_{k},\mu_{q}\}\geq N(d,q)\mu_{q}\geq\mathfrak{C}_{1}^{2}$. $\square$

Lemma B.4.

Suppose that $\gamma>0$ is a real number and $p$ is an integer satisfying that $\gamma\in[2p,2p+2)$. Then, there exist constants $\mathfrak{C}_{1}$ and $\mathfrak{C}_{2}$ only depending on $p$ satisfying that for any constants $0<c_{1}\leq c_{2}<\infty$, there exists a sufficiently large constant $\mathfrak{C}$ only depending on $\gamma$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$ and any $n\in[c_{1}d^{\gamma},c_{2}d^{\gamma}]$, we have

$$\mathfrak{C}_{1}n^{-1/2}\leq\varepsilon_{n}^{2}\leq\mathfrak{C}_{2}n^{-1/2}. \tag{62}$$

Now we begin to prove Theorem 3.3 by using Theorem 6.3.

  • From Lemma B.4, it is easy to check that for any absolute constant $C$, there exists a sufficiently large constant $\mathfrak{C}$ only depending on $c_{1}$, $c_{2}$, and $\gamma$, such that for any $n\geq\mathfrak{C}$, we have $C^{2}\varepsilon_{n}^{2}\geq 1/n$.

  • It is assumed that $K^{\mathtt{in}}$ satisfies Assumption 1.

  • Therefore, all requirements in Theorem 6.3 are satisfied.

From Lemma B.4, we get the desired results. $\square$

B.2.1 Proof of Lemma B.4

We need to apply Lemma E.1 and Remark E.2. Suppose that $\gamma\in[2p,2p+2)$ for some integer $p$. Let $C(p)=\left[4e^{2}\sigma^{2}\mathfrak{C}_{3}/\mathfrak{C}_{2}^{2}\right]^{1/2}$, where $\mathfrak{C}_{2}$ and $\mathfrak{C}_{3}$ are two constants only depending on $p$ given in Lemma B.2 and Lemma B.3 respectively. It is clear that

$$\varepsilon_{low}^{2}\triangleq C(p)\mu_{p}\sqrt{d^{2p}/n}\overset{\text{Lemma B.2}}{\geq}C(p)\mathfrak{C}_{1}\sqrt{\frac{1}{n}}, \tag{63}$$

and for any $d\geq\mathfrak{C}$, a sufficiently large constant only depending on $p$ and $c_{2}$, we have

$$\frac{\varepsilon_{low}^{2}}{\mu_{p+1}}\overset{\text{Lemma B.2}}{\geq}\frac{\mathfrak{C}_{1}}{\mathfrak{C}_{2}}C(p)\sqrt{\frac{d^{2p+2}}{c_{2}d^{\gamma}}}\geq 1. \tag{64}$$

Therefore, we have

\begin{align*}
\sum_{k=0}^{\infty}N(d,k)\min\{\mu_{k},\varepsilon_{low}^{2}\} &\overset{(64)}{\geq}\sum_{k=0}^{\infty}N(d,k)\min\{\mu_{k},\mu_{p+1}\} \tag{65}\\
&\overset{\text{Lemma B.3}}{\geq}\mathfrak{C}_{3}\overset{\text{Definition of }C(p)}{=}\frac{[C(p)]^{2}}{4e^{2}\sigma^{2}}\mathfrak{C}_{2}^{2}\\
&\overset{\text{Lemma B.2}}{\geq}\frac{[C(p)]^{2}}{4e^{2}\sigma^{2}}\mu_{p}^{2}d^{2p}=\frac{n\varepsilon_{low}^{4}}{4e^{2}\sigma^{2}}.
\end{align*}

Thus, we know that $\varepsilon_{n}^{2}\geq\varepsilon_{low}^{2}\geq C(p)\mathfrak{C}_{1}\sqrt{1/n}$.

We then derive the upper bound on $\varepsilon_{n}^{2}$ in a similar way. Let $\tilde{C}(p)=\left[4e^{2}\sigma^{2}/\mathfrak{C}_{1}^{2}\right]^{1/2}$, where $\mathfrak{C}_{1}$ is a constant only depending on $p$ given in Lemma B.2. It is clear that

$$\varepsilon_{upp}^{2}\triangleq\tilde{C}(p)\mu_{p}\sqrt{d^{2p}/n}\overset{\text{Lemma B.2}}{\leq}\tilde{C}(p)\mathfrak{C}_{2}\sqrt{\frac{1}{n}}, \tag{66}$$

and for any $d\geq\mathfrak{C}$, a sufficiently large constant only depending on $p\geq 2$ and $c_{1}$, we have

$$\frac{\varepsilon_{upp}^{2}}{\mu_{p-1}}\overset{\text{Lemma B.2}}{\leq}\frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}\tilde{C}(p)\sqrt{\frac{d^{2p-2}}{c_{1}d^{\gamma}}}\leq 1. \tag{67}$$

Therefore, we have

\begin{align*}
\sum_{k=0}^{\infty}N(d,k)\min\{\mu_{k},\varepsilon_{upp}^{2}\} &\leq\sum_{k=0}^{\infty}N(d,k)\min\{\mu_{k},\mu_{p-1}\} \tag{68}\\
&\overset{\text{Lemma B.3}}{\leq}1\overset{\text{Definition of }\tilde{C}(p)}{=}\frac{[\tilde{C}(p)]^{2}}{4e^{2}\sigma^{2}}\mathfrak{C}_{1}^{2}\\
&\overset{\text{Lemma B.2}}{\leq}\frac{[\tilde{C}(p)]^{2}}{4e^{2}\sigma^{2}}\mu_{p}^{2}d^{2p}=\frac{n\varepsilon_{upp}^{4}}{4e^{2}\sigma^{2}}.
\end{align*}

Thus, we know that $\varepsilon_{n}^{2}\leq\varepsilon_{upp}^{2}\leq\tilde{C}(p)\mathfrak{C}_{2}\sqrt{1/n}$. $\square$
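The window $\gamma\in[2p,2p+2)$ enters the proof only through the $d$-dependent factors in (64) and (67): $\sqrt{d^{2p+2}/(c_{2}d^{\gamma})}$ has a positive exponent $2p+2-\gamma$ and therefore grows with $d$, while $\sqrt{d^{2p-2}/(c_{1}d^{\gamma})}$ has a negative exponent $2p-2-\gamma$ and therefore vanishes. The sketch below tabulates both factors. The helper names and the sample values of `p`, `gamma`, `c1`, and `c2` are hypothetical.

```python
import math


def lower_factor(d, p, gamma, c2):
    """sqrt(d^(2p+2) / (c2 * d^gamma)), the d-dependent factor in (64)."""
    return math.sqrt(d ** (2 * p + 2 - gamma) / c2)


def upper_factor(d, p, gamma, c1):
    """sqrt(d^(2p-2) / (c1 * d^gamma)), the d-dependent factor in (67)."""
    return math.sqrt(d ** (2 * p - 2 - gamma) / c1)


if __name__ == "__main__":
    p, gamma, c1, c2 = 1, 3.5, 0.5, 2.0  # illustrative values with gamma in [2p, 2p+2)
    for d in (10, 100, 1000, 10000):
        print(d, lower_factor(d, p, gamma, c2), upper_factor(d, p, gamma, c1))
```

Since the first factor diverges and the second tends to zero, any fixed constants multiplying them can be absorbed by taking $d$ large enough, which is how the thresholds on $d$ in (64) and (67) arise.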

B.3 Proof of Lemma 3.4

We need the following lemma:

Lemma B.5.

For any integer $q\geq 0$, we have $\mu_{q}>\mu_{q+2}$.

Proof.

Deferred to the end of this subsection.

We now prove Lemma 3.4. From Lemma 3.2, for any $d\geq\mathfrak{C}$, a sufficiently large constant only depending on $p$, we have

$$\mu_{p+1}\leq\frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}d^{-1}\mu_{p},\quad\mu_{p+2}\leq\frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}d^{-2}\mu_{p}.$$

Then, from Lemma B.5, we further have

$$\mu_{j}\leq\max\{\mu_{p+1},\mu_{p+2}\}\leq\frac{\mathfrak{C}_{2}}{\mathfrak{C}_{1}}d^{-1}\mu_{p},\quad j=p+1,p+2,\cdots.$$

$\square$

Proof of Lemma B.5: From [5], we have

$$
\begin{aligned}
\frac{\mu_{k+2}}{\mu_{k}}
&= \frac{1}{4}\cdot\frac{\sum_{s=0}^{\infty} a_{2s+k+2}\,\frac{(2s+k+2)!}{(2s)!}\,\frac{\Gamma(s+\frac{1}{2})}{\Gamma(s+k+2+\frac{d+1}{2})}}{\sum_{s=0}^{\infty} a_{2s+k}\,\frac{(2s+k)!}{(2s)!}\,\frac{\Gamma(s+\frac{1}{2})}{\Gamma(s+k+\frac{d+1}{2})}} \\
&= \frac{1}{4}\cdot\frac{\sum_{s=1}^{\infty} a_{2s+k}\,\frac{(2s+k)!}{(2s-2)!}\,\frac{\Gamma(s-\frac{1}{2})}{\Gamma(s+k+1+\frac{d+1}{2})}}{\sum_{s=0}^{\infty} a_{2s+k}\,\frac{(2s+k)!}{(2s)!}\,\frac{\Gamma(s+\frac{1}{2})}{\Gamma(s+k+\frac{d+1}{2})}} \\
&= \frac{\sum_{s=1}^{\infty} a_{2s+k}\,\frac{(2s+k)!}{(2s)!}\,\frac{\Gamma(s+\frac{1}{2})}{\Gamma(s+k+\frac{d+1}{2})}\cdot\frac{s}{s+k+\frac{d+1}{2}}}{\sum_{s=0}^{\infty} a_{2s+k}\,\frac{(2s+k)!}{(2s)!}\,\frac{\Gamma(s+\frac{1}{2})}{\Gamma(s+k+\frac{d+1}{2})}} \\
&< 1.
\end{aligned}
$$

\square
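The passage from the second to the third line of the display above is a term-by-term identity, obtained from $(2s)! = 2s(2s-1)\,(2s-2)!$, $\Gamma(s+\frac{1}{2}) = (s-\frac{1}{2})\Gamma(s-\frac{1}{2})$, and $\Gamma(x+1) = x\Gamma(x)$. The following minimal Python sketch checks that identity numerically on a small grid using log-Gamma values. The function names are illustrative and not from the paper. The final bound $<1$ additionally needs the coefficients $a_j$ to be nonnegative and the denominator to be positive, which is assumed here to follow from the paper's kernel conditions.

```python
import math

def log_shifted_term(s, k, d):
    # log of (1/4) * (2s+k)!/(2s-2)! * Gamma(s - 1/2) / Gamma(s + k + 1 + (d+1)/2), for s >= 1
    return (math.log(0.25)
            + math.lgamma(2 * s + k + 1) - math.lgamma(2 * s - 1)
            + math.lgamma(s - 0.5) - math.lgamma(s + k + 1 + (d + 1) / 2))

def log_weighted_term(s, k, d):
    # log of (2s+k)!/(2s)! * Gamma(s + 1/2) / Gamma(s + k + (d+1)/2) * s / (s + k + (d+1)/2)
    return (math.lgamma(2 * s + k + 1) - math.lgamma(2 * s + 1)
            + math.lgamma(s + 0.5) - math.lgamma(s + k + (d + 1) / 2)
            + math.log(s) - math.log(s + k + (d + 1) / 2))

def max_log_gap(max_s=20, max_k=6, dims=(3, 10, 100)):
    # Largest absolute difference between the two log-terms over a small grid.
    return max(abs(log_shifted_term(s, k, d) - log_weighted_term(s, k, d))
               for d in dims
               for k in range(max_k + 1)
               for s in range(1, max_s + 1))

print(max_log_gap())  # should sit at floating-point rounding level
```

Since every weight $s/(s+k+\frac{d+1}{2})$ is strictly below $1$ and the denominator carries the extra $s=0$ term, the ratio is below $1$ under the stated assumptions.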

B.4 Proof of Theorem 3.5

Let $\bar{\varepsilon}_{n}$ be the covering radius defined in Proposition 6.7 with $\mathcal{H}=\mathcal{H}^{\mathtt{in}}$. We need the following lemma.

Lemma B.6.

Suppose that $\gamma\in\{2,4,6,\cdots\}$ is an integer and $p=\gamma/2$. Then, for any constants $0<c_{1}\leq c_{2}<\infty$, there exist constants $\mathfrak{C}$, $\mathfrak{C}_{1}$, and $\mathfrak{C}_{2}$ only depending on $\gamma$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$ and any $n\in[c_{1}d^{\gamma},c_{2}d^{\gamma}]$, we have

$$
\mathfrak{C}_{1}\sqrt{\frac{1}{n}}<\bar{\varepsilon}_{n}^{2}<\mathfrak{C}_{2}\sqrt{\frac{1}{n}}. \tag{69}
$$
Proof.

Deferred to the end of this subsection.

We now prove Theorem 3.5 by using Theorem 6.10.

From Lemma B.6, it is easy to check that there exists a sufficiently large constant $\mathfrak{C}$ only depending on $c_{1}$, $c_{2}$, and $\gamma$, such that for any $n\geq\mathfrak{C}$, we have $n\bar{\varepsilon}_{n}^{2}\geq 2\log 2$.
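The check is short. As a sketch, taking the lower bound in (69) as given:

$$
n\bar{\varepsilon}_{n}^{2} > n\cdot\mathfrak{C}_{1}\sqrt{\frac{1}{n}} = \mathfrak{C}_{1}\sqrt{n} \geq 2\log 2
\quad\text{whenever}\quad n\geq\left(\frac{2\log 2}{\mathfrak{C}_{1}}\right)^{2}.
$$

Because $n\leq c_{2}d^{\gamma}$ forces $d\geq (n/c_{2})^{1/\gamma}$, taking $n$ large also ensures that the requirement on $d$ in Lemma B.6 is met.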

We also assert that there exist constants $\mathfrak{c}_{2}$ and $\mathfrak{C}$ only depending on $\gamma$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$, the following inequality holds:

$$
\sum_{k:\mu_{k}>\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}N(d,k)\log\left(\frac{\mu_{k}}{\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}\right)\geq 10n\bar{\varepsilon}_{n}^{2}. \tag{70}
$$

We prove (70) at the end of this subsection.

From (48), we know that (70) implies $V_{K}(\bar{\varepsilon}_{n},\mathcal{D})\leq V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B})/5$, and hence from Theorem 6.10 we get the desired results. $\square$

Proof of Lemma B.6: Suppose that $p$ is a fixed integer. Let $C(p)=\min\left\{\sqrt{c_{1}}/(4\sigma^{2}),\ \frac{1}{2}\mathfrak{C}_{1}\log(2)/\left(\sqrt{c_{2}}\mathfrak{C}_{2}\right)\right\}$. It is clear that

$$
\bar{\varepsilon}_{low}^{2}\triangleq C(p)\mu_{p}\sqrt{d^{2p}/n}<\mu_{p}/(2\sigma^{2}). \tag{71}
$$

Therefore, for any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $p$ and $c_{2}$, we have

$$
\begin{aligned}
V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{low},\mathcal{B})
&\overset{\text{Lemma A.5}}{\geq} K\left(\sqrt{2}\sigma\bar{\varepsilon}_{low}\right)\geq\frac{1}{2}N(d,p)\log\left(\frac{\mu_{p}}{2\sigma^{2}\bar{\varepsilon}_{low}^{2}}\right) \\
&\overset{\text{Lemma B.2 and Definition of }\bar{\varepsilon}_{low}^{2}}{\geq} \frac{1}{2}\mathfrak{C}_{1}d^{p}\log\left(\frac{\sqrt{c_{1}}}{2\sigma^{2}C(p)}\right)\overset{\text{Definition of }C(p)}{\geq}\frac{1}{2}\mathfrak{C}_{1}d^{p}\log\left(2\right) \\
&\overset{\text{Definition of }C(p)}{\geq} C(p)\sqrt{c_{2}}\mathfrak{C}_{2}d^{p}\overset{\text{Lemma B.2}}{\geq}C(p)\mu_{p}\sqrt{nd^{2p}}=n\bar{\varepsilon}_{low}^{2}.
\end{aligned} \tag{72}
$$
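Two of the steps above reduce to one-line computations. As a sketch, using only $n\in[c_{1}d^{2p},c_{2}d^{2p}]$, the definition of $C(p)$, and $\mu_{p}>0$:

$$
\begin{aligned}
\bar{\varepsilon}_{low}^{2} &= C(p)\mu_{p}\sqrt{\frac{d^{2p}}{n}} \leq \frac{\sqrt{c_{1}}}{4\sigma^{2}}\cdot\frac{\mu_{p}}{\sqrt{c_{1}}} = \frac{\mu_{p}}{4\sigma^{2}} < \frac{\mu_{p}}{2\sigma^{2}}, \\
\frac{\mu_{p}}{2\sigma^{2}\bar{\varepsilon}_{low}^{2}} &= \frac{1}{2\sigma^{2}C(p)}\sqrt{\frac{n}{d^{2p}}} \geq \frac{\sqrt{c_{1}}}{2\sigma^{2}C(p)} \geq 2.
\end{aligned}
$$

The first line is (71). The second shows that the argument of the logarithm in (72) is at least $2$, so the corresponding eigenvalue block contributes at least $\frac{1}{2}N(d,p)\log 2$.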

Recalling the definition of $\bar{\varepsilon}_{n}$ as well as Lemma A.8, we have $n\bar{\varepsilon}_{n}^{2}=V_{K}(\bar{\varepsilon}_{n},\mathcal{D})=V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B})$. From the monotonicity of $V_{2}(\cdot,\mathcal{B})$, we then have $\bar{\varepsilon}_{n}^{2}\geq\bar{\varepsilon}_{low}^{2}$.

On the other hand, let $\tilde{C}(p)=\max\left\{36\sqrt{c_{2}}/(2\sigma^{2}),\ p\mathfrak{C}_{2}\log(2)/\left(\sqrt{c_{1}}\mathfrak{C}_{1}\right)\right\}$. It is clear that

$$
\bar{\varepsilon}_{upp}^{2}\triangleq\tilde{C}(p)\mu_{p}\sqrt{d^{2p}/n}>36\mu_{p}/(2\sigma^{2}). \tag{73}
$$

Furthermore, from Lemma B.2 and Lemma 3.4, one can check the following claim:

Claim 1.

For any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $c_{1}$, $c_{2}$, and $p$, we have

$$
\begin{aligned}
&K\left(\sqrt{2}\sigma\bar{\varepsilon}_{upp}/6\right)=0, && \text{if } p=0, \\
&2\sigma^{2}\bar{\varepsilon}_{upp}^{2}/36<\mu_{p-1}, && \text{if } p=1,2,\cdots, \\
&K\left(\sqrt{2}\sigma\bar{\varepsilon}_{upp}/6\right)\leq pN(d,p)\log\left(\frac{18\mu_{p}}{\sigma^{2}\bar{\varepsilon}_{upp}^{2}}\right), && \text{if } p=1,2,\cdots.
\end{aligned} \tag{74}
$$

Therefore, for any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $c_{1}$, $c_{2}$, and $p$, we have

$$
\begin{aligned}
V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{upp},\mathcal{B})
&\overset{\text{Lemma A.5}}{\leq} K\left(\sqrt{2}\sigma\bar{\varepsilon}_{upp}/6\right) \\
&\overset{\text{Claim 1}}{\leq}
\begin{cases}
0, & p=0, \\
pN(d,p)\log\left(\frac{18\mu_{p}}{\sigma^{2}\bar{\varepsilon}_{upp}^{2}}\right), & p=1,2,\cdots
\end{cases} \\
&\overset{\text{Lemma B.2 and Definition of }\bar{\varepsilon}_{upp}^{2}}{\leq}
\begin{cases}
0, & p=0, \\
p\mathfrak{C}_{2}d^{p}\log\left(\frac{18\sqrt{c_{2}}}{\sigma^{2}\tilde{C}(p)}\right), & p=1,2,\cdots
\end{cases} \\
&\overset{\text{Definition of }\tilde{C}(p)}{\leq}
\begin{cases}
\tilde{C}(p)\mu_{0}\sqrt{n}, & p=0, \\
\tilde{C}(p)\sqrt{c_{1}}\mathfrak{C}_{1}d^{p}, & p=1,2,\cdots
\end{cases} \\
&\overset{\text{Lemma B.2}}{\leq} n\bar{\varepsilon}_{upp}^{2}.
\end{aligned} \tag{75}
$$

Recalling the definition of $\bar{\varepsilon}_{n}$ as well as Lemma A.8, we have $n\bar{\varepsilon}_{n}^{2}=V_{K}(\bar{\varepsilon}_{n},\mathcal{D})=V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B})$. From the monotonicity of $V_{2}(\cdot,\mathcal{B})$, we then have $\bar{\varepsilon}_{upp}^{2}\geq\bar{\varepsilon}_{n}^{2}$. $\square$
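The two appeals to monotonicity in this proof follow the same pattern. A brief sketch, assuming only that $V_{2}(\cdot,\mathcal{B})$ is nonincreasing in its first argument (as a metric entropy is) and using the identity $n\bar{\varepsilon}_{n}^{2}=V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B})$: if $\bar{\varepsilon}_{n}<\bar{\varepsilon}_{low}$ held, then

$$
n\bar{\varepsilon}_{n}^{2}=V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B})\geq V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{low},\mathcal{B})\geq n\bar{\varepsilon}_{low}^{2}>n\bar{\varepsilon}_{n}^{2},
$$

a contradiction. Symmetrically, if $\bar{\varepsilon}_{n}>\bar{\varepsilon}_{upp}$ held, then

$$
n\bar{\varepsilon}_{n}^{2}=V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B})\leq V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{upp},\mathcal{B})\leq n\bar{\varepsilon}_{upp}^{2}<n\bar{\varepsilon}_{n}^{2},
$$

again a contradiction. Hence $\bar{\varepsilon}_{low}^{2}\leq\bar{\varepsilon}_{n}^{2}\leq\bar{\varepsilon}_{upp}^{2}$.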

Proof of (70): From (73), there exist constants $\mathfrak{C}$ and $\mathfrak{c}_{1}$ only depending on $p$, $c_{1}$, and $c_{2}$, such that for any $d\geq\mathfrak{C}$ and any $n\in[c_{1}d^{2p},c_{2}d^{2p}]$ (recall that we have $\gamma=2p$), we have

$$
\mu_{p}>\mathfrak{c}_{1}^{2}\bar{\varepsilon}_{n}^{2}/36.
$$

Let $\mathfrak{c}_{2}\leq\mathfrak{c}_{1}$ be a sufficiently small constant satisfying $\mathfrak{C}_{1}\log\left(\frac{36\mathfrak{C}_{1}}{\mathfrak{c}_{2}^{2}\mathfrak{C}_{2}}\right)>10\mathfrak{C}_{2}\sqrt{c_{2}}$, where $\mathfrak{C}_{1}$ and $\mathfrak{C}_{2}$ are two constants only depending on $p$ given in Lemma B.2. Then, we have

$$
\begin{aligned}
&\sum_{k:\mu_{k}>\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}N(d,k)\log\left(\frac{\mu_{k}}{\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}/36}\right)-10n\bar{\varepsilon}_{n}^{2} \\
&\qquad\overset{\text{Lemma B.6}}{\geq} N(d,p)\log\left(\frac{36\mu_{p}}{\mathfrak{c}_{2}^{2}\mathfrak{C}_{2}\sqrt{1/n}}\right)-10\mathfrak{C}_{2}\sqrt{n} \\
&\qquad\overset{\text{Lemma B.2}}{\geq} \mathfrak{C}_{1}d^{p}\log\left(\frac{36\mathfrak{C}_{1}}{\mathfrak{c}_{2}^{2}\mathfrak{C}_{2}}\right)-10\mathfrak{C}_{2}\sqrt{c_{2}}d^{p} \\
&\qquad\overset{\text{Definition of }\mathfrak{c}_{2}}{>} 0.
\end{aligned} \tag{76}
$$

\square

Appendix C Proof of Claims and Theorems in Section 4

C.1 The inequality (33) does not hold when $\gamma\in(2p,2p+1]$ for some integer $p\geq 0$

Lemma C.1.

Suppose that $\gamma\in(2p,2p+1]$ for some integer $p$. Then for any constant $\mathfrak{c}_{2}>0$ only depending on $c_{1}$, $c_{2}$, and $\gamma$, when $n\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $c_{1}$, $c_{2}$, and $\gamma$ defined in (7), we have

$$
V_{K}(\bar{\varepsilon}_{n},\mathcal{D})>\frac{1}{5}V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B}). \tag{77}
$$
Proof.

Recall that $K(\varepsilon)=\frac{1}{2}\sum_{j:\lambda_{j}>\varepsilon^{2}}\log\left(\lambda_{j}/\varepsilon^{2}\right)$. Hence, when $n\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $c_{1}$, $c_{2}$, and $\gamma$, we have

$$
\frac{V_{K}(\bar{\varepsilon}_{n},\mathcal{D})}{V_{2}(\mathfrak{c}_{2}\bar{\varepsilon}_{n},\mathcal{B})}\geq\frac{K(\sqrt{2}\sigma\bar{\varepsilon}_{n})}{K(\mathfrak{c}_{2}\bar{\varepsilon}_{n}/6)}\overset{\text{Claim 2}}{\geq}\frac{\frac{1}{2}N(d,p)\log\left(\frac{\mu_{p}}{2\sigma^{2}\bar{\varepsilon}_{n}^{2}}\right)}{\left(1+\frac{1}{4}\right)\frac{1}{2}N(d,p)\log\left(\frac{36\mu_{p}}{\mathfrak{c}_{2}^{2}\bar{\varepsilon}_{n}^{2}}\right)}\geq\frac{3}{5}>\frac{1}{5}. \tag{78}
$$

\square

From Lemma B.2 and Lemma 3.4, one can check the following claim:

Claim 2.

Suppose that $\gamma\in(2p,2p+1]$ for some integer $p$. Then, for any $\delta'>0$ and for any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $\gamma$, $\delta'$, $c_{1}$, and $c_{2}$, we have

$$
K\left(\sqrt{2}\sigma\tilde{\varepsilon}_{2}/6\right)\leq\left(1+\frac{\delta'}{4}\right)\frac{1}{2}N(d,p)\log\left(\frac{18\mu_{p}}{\sigma^{2}\tilde{\varepsilon}_{2}^{2}}\right).
$$

C.2 Proof of Lemma 4.1

The proof of Lemma 4.1 can be obtained by slightly modifying the proof of Theorem 1 in [82], where $\underline{\varepsilon}_{n,d}$ and $\varepsilon_{n}$ in [82] are replaced by $\tilde{\varepsilon}_{1}$ and $\tilde{\varepsilon}_{2}$, respectively. For the reader's convenience, we present the proof below.

Let $N_{\tilde{\varepsilon}_{1}}$ be an $\tilde{\varepsilon}_{1}$-packing set of $(\mathcal{B},d^{2}=\|\cdot\|_{L^{2}}^{2})$ and let $G_{\tilde{\varepsilon}_{2}}$ be an $\tilde{\varepsilon}_{2}$-net of $(\mathcal{D},d^{2}=\text{KL divergence})$. The proof of Theorem 1 in [82] showed that

$$
\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{P}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left(\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\frac{1}{4}\tilde{\varepsilon}_{1}^{2}\right)
\geq 1-\frac{V_{K}(\tilde{\varepsilon}_{2},\mathcal{D})+n\tilde{\varepsilon}_{2}^{2}+\log 2}{V_{2}(\tilde{\varepsilon}_{1},\mathcal{B})}.
$$

Since $\frac{V_{K}(\tilde{\varepsilon}_{2},\mathcal{D})+n\tilde{\varepsilon}_{2}^{2}+\log(2)}{V_{2}(\tilde{\varepsilon}_{1},\mathcal{B})}\leq\mathfrak{c}$, we have

$$
\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\frac{1-\mathfrak{c}}{4}\tilde{\varepsilon}_{1}^{2}.
$$
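The passage from the probability bound to the expectation bound is, as we read it, an application of Markov's inequality. The sketch below spells this out; the shorthand $Z$ and $t$ is introduced here for illustration only and does not appear in [82].

```latex
% Shorthand (ours): Z = squared L^2 error, t = threshold from the probability bound.
\[
Z=\bigl\|\hat{f}-f_{\star}\bigr\|_{L^{2}}^{2},\qquad
t=\tfrac{1}{4}\tilde{\varepsilon}_{1}^{2},\qquad
\mathbb{E}\,Z\;\ge\;t\,\mathbb{P}(Z\ge t)\quad\text{(Markov, since } Z\ge 0\text{)}.
\]
% For every estimator \hat f, the probability bound gives max_{f_*} P(Z >= t) >= 1 - c, hence
\[
\max_{f_{\star}\in\mathcal{B}}\mathbb{E}\,Z
\;\ge\;t\,\max_{f_{\star}\in\mathcal{B}}\mathbb{P}(Z\ge t)
\;\ge\;\frac{1-\mathfrak{c}}{4}\,\tilde{\varepsilon}_{1}^{2},
\]
% and taking the minimum over \hat f preserves the inequality.
```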

\square

C.3 Proof of Theorem 4.2

We will use Lemma 4.1 to prove Theorem 4.2, and the proof will be divided into three parts:

  • (i) $\gamma\in\{2,4,6,\cdots\}$,

  • (ii) $\gamma\in\bigcup_{j=0}^{\infty}(2j,2j+1]$,

  • (iii) $\gamma\in\bigcup_{j=0}^{\infty}(2j+1,2j+2)$.
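The three cases above partition $\gamma>0$. A minimal Python sketch of that partition follows; the function name `classify_gamma` is hypothetical and introduced only for illustration, and exact floating-point comparison is assumed to be adequate for the inputs of interest.

```python
import math
from typing import Tuple


def classify_gamma(gamma: float) -> Tuple[str, int]:
    """Return (case label, p) with p = floor(gamma / 2), for gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    p = math.floor(gamma / 2)
    offset = gamma - 2 * p  # lies in [0, 2)
    if offset == 0:
        return "i", p       # gamma in {2, 4, 6, ...}
    if offset <= 1:
        return "ii", p      # gamma in (2p, 2p + 1]
    return "iii", p         # gamma in (2p + 1, 2p + 2)
```

For instance, `classify_gamma(3.0)` falls in case (ii) with $p=1$, since the interval $(2p,2p+1]$ is closed on the right.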

Proof of Theorem 4.2 (i)

This part is a direct corollary of Theorem 3.5.

Proof of Theorem 4.2 (ii)

Suppose that $\gamma\in\bigcup_{j=0}^{\infty}(2j,2j+1]$. Let $p=\lfloor\gamma/2\rfloor$.

Let $\delta=\epsilon+(\gamma-2p)/(2\gamma)$. Then we have $\delta>(\gamma-2p)/(2\gamma)$ and $(\gamma-2p)/[(\gamma-2p+2\gamma\delta)/2]<1$. Thus, it is possible to find $\delta'>0$ only depending on $\gamma$ and $\delta$, such that

$$
(\gamma-2p)/[(\gamma-2p+2\gamma\delta)/2]<(1-\delta')^{2}/(1+\delta')<1.
$$
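The existence of such a $\delta'$ follows because $g(\delta')=(1-\delta')^{2}/(1+\delta')$ is continuous with $g(0)=1$, so $g(\delta')$ exceeds any fixed ratio below $1$ once $\delta'$ is small enough. The following Python sketch finds one admissible value by repeated halving; the names `contraction` and `find_delta_prime` and the starting value $0.5$ are our own illustrative choices, not part of the proof.

```python
def contraction(delta_prime: float) -> float:
    """g(delta') = (1 - delta')^2 / (1 + delta'); g(0) = 1 and g < 1 on (0, 1)."""
    return (1.0 - delta_prime) ** 2 / (1.0 + delta_prime)


def find_delta_prime(gamma: float, p: int, delta: float) -> float:
    """Halve a candidate delta' until ratio < g(delta') < 1 holds."""
    ratio = (gamma - 2 * p) / ((gamma - 2 * p + 2 * gamma * delta) / 2)
    if not 0 < ratio < 1:
        raise ValueError("need delta > (gamma - 2p) / (2 * gamma)")
    delta_prime = 0.5  # arbitrary starting guess in (0, 1)
    while contraction(delta_prime) <= ratio:
        delta_prime /= 2
    return delta_prime
```

With $\gamma=2.5$, $p=1$, and $\delta=0.15$ the ratio on the left is about $0.8$, and the search stops at $\delta'=0.0625$, where $g(\delta')\approx 0.827$.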

Let $C(p)=\frac{\delta'}{4}(\gamma-2p)\cdot\frac{1}{2}\left[C_{1}e^{p}p^{-p-1/2}\right]$ be a constant only depending on $\gamma$ and $\delta'$. Then we introduce

$$
\tilde{\varepsilon}_{1}^{2}\triangleq n^{-1/2-\delta}\quad\text{and}\quad\tilde{\varepsilon}_{2}^{2}\triangleq C(p)\frac{d^{p}}{n}\log(d). \tag{79}
$$

Let us further assume that $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $\gamma$ and $c_{1}$. By Lemma B.2 we have

$$
\begin{aligned}
\tilde{\varepsilon}_{1}^{2}&=n^{-1/2-\delta}\leq\left(c_{1}d^{\gamma}\right)^{-1/2-\delta}<\frac{\mathfrak{C}_{1}}{d^{p}}\leq\mu_{p}\\
\mu_{p+1}<\tilde{\varepsilon}_{2}^{2}&=C(p)\frac{d^{p}}{n}\log(d)\leq\frac{C(p)}{c_{1}}d^{p-\gamma}\log(d)<\mu_{p}\\
n\tilde{\varepsilon}_{2}^{2}&\overset{\text{Definition of }\mathfrak{C}_{2}}{\leq}\frac{\delta'}{4}(\gamma-2p)\cdot\frac{1}{2}N(d,p)\log(d).
\end{aligned} \tag{80}
$$

Therefore, for any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $\gamma$, $\delta$, and $c_{1}$, we have

$$
\begin{aligned}
V_{2}(\tilde{\varepsilon}_{1},\mathcal{B})\overset{\text{Lemma A.5}}{\geq}K\left(\tilde{\varepsilon}_{1}\right)
&\geq\frac{1}{2}N(d,p)\log\left(\frac{\mu_{p}}{\tilde{\varepsilon}_{1}^{2}}\right)\\
&\overset{\text{Definition of }\tilde{\varepsilon}_{1}^{2}}{\geq}\frac{1}{2}N(d,p)\log\left(\mathfrak{C}_{1}c_{1}^{1/2+\delta}d^{\frac{\gamma-2p+2\gamma\delta}{2}}\right)\\
&\geq(1-\delta')\frac{\gamma-2p+2\gamma\delta}{2}\cdot\frac{1}{2}N(d,p)\log(d).
\end{aligned} \tag{81}
$$

Therefore, for any $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $\gamma$, $\delta$, $c_{1}$, and $c_{2}$, we have

$$
\begin{aligned}
V_{K}(\tilde{\varepsilon}_{2},\mathcal{D})=V_{2}(\sqrt{2}\sigma\tilde{\varepsilon}_{2},\mathcal{B})
&\overset{\text{Lemma A.5}}{\leq}K\left(\sqrt{2}\sigma\tilde{\varepsilon}_{2}/6\right)\\
&\overset{\text{Claim 2}}{\leq}\left(1+\frac{\delta'}{4}\right)\frac{1}{2}N(d,p)\log\left(\frac{18\mu_{p}}{\sigma^{2}\tilde{\varepsilon}_{2}^{2}}\right)\\
&\overset{\text{Definition of }\tilde{\varepsilon}_{2}^{2}}{\leq}\left(1+\frac{\delta'}{4}\right)\frac{1}{2}N(d,p)\log\left(18\mathfrak{C}_{2}\sigma^{-2}[C(p)]^{-1}c_{2}[\log(d)]^{-1}d^{\gamma-2p}\right)\\
&\leq\left(1+\frac{\delta'}{2}\right)(\gamma-2p)\cdot\frac{1}{2}N(d,p)\log(d).
\end{aligned} \tag{82}
$$

Combining (80), (81), and (82), we finally have:

$$
\begin{aligned}
\frac{V_{K}(\tilde{\varepsilon}_{2},\mathcal{D})+n\tilde{\varepsilon}_{2}^{2}+\log(2)}{V_{2}(\tilde{\varepsilon}_{1},\mathcal{B})}
&\leq\frac{\left(1+\delta'\right)(\gamma-2p)\cdot\frac{1}{2}N(d,p)\log(d)}{(1-\delta')\frac{\gamma-2p+2\gamma\delta}{2}\cdot\frac{1}{2}N(d,p)\log(d)}\\
&\overset{\text{Definition of }\delta'}{<}(1-\delta')<1,
\end{aligned}
$$

and from Lemma 4.1, we get

$$
\min_{\hat{f}}\max_{f_{\star}\in\mathcal{B}}\mathbb{E}_{(\mathbb{X},\mathbb{y})\sim\rho_{f_{\star}}^{\otimes n}}\left\|\hat{f}-f_{\star}\right\|_{L^{2}}^{2}\geq\frac{\delta'}{4}\tilde{\varepsilon}_{1}^{2},
$$

finishing the proof. \square
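For readers who want the bookkeeping behind the final ratio in part (ii), the following worked block is one way to read how (80), (81), and (82) combine. The shorthand $A$ is introduced here for illustration; the step $\log(2)\leq\frac{\delta'}{4}A$ is our assumption, justified for large $d$ because $A\to\infty$ as $d\to\infty$.

```latex
% Shorthand (ours): the common factor in (80), (81), and (82).
\[
A\;\triangleq\;(\gamma-2p)\cdot\tfrac{1}{2}N(d,p)\log(d).
\]
% (82): V_K <= (1 + delta'/2) A;  third line of (80): n eps_2^2 <= (delta'/4) A;
% assumed for d large: log(2) <= (delta'/4) A.
\[
V_{K}(\tilde{\varepsilon}_{2},\mathcal{D})+n\tilde{\varepsilon}_{2}^{2}+\log(2)
\;\le\;\Bigl(1+\tfrac{\delta'}{2}\Bigr)A+\tfrac{\delta'}{4}A+\tfrac{\delta'}{4}A
\;=\;(1+\delta')A.
\]
% Dividing by the lower bound (81) and using the choice of delta':
\[
\frac{(1+\delta')(\gamma-2p)}{(1-\delta')\frac{\gamma-2p+2\gamma\delta}{2}}
\;<\;\frac{1+\delta'}{1-\delta'}\cdot\frac{(1-\delta')^{2}}{1+\delta'}
\;=\;1-\delta'.
\]
% Lemma 4.1 with c = 1 - delta' then yields the constant (1 - c)/4 = delta'/4.
```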

Remark C.2.

Suppose that $\gamma\in(2p,2p+1]$ for some integer $p$. In (80) and (82), if we let $\delta'=1$ and $\tilde{\varepsilon}_{2}=\sqrt{\mathfrak{c}_{3}\frac{d^{p}}{n}\log(d)}$, one can further show that $V_{K}\left(\sqrt{\mathfrak{c}_{3}\frac{d^{p}}{n}\log(d)},\mathcal{D}\right)\leq\mathfrak{c}_{3}d^{p}\log(d)$, and thus $\bar{\varepsilon}_{n}^{2}\leq\mathfrak{c}_{3}\frac{d^{p}}{n}\log(d)\leq\mathfrak{C}_{3}d^{p-\gamma}\log(d)$, where $\mathfrak{c}_{3}$ and $\mathfrak{C}_{3}$ are constants only depending on $\gamma$ and $c_{1}$. Similarly, if we let $\delta'=1$ and $\tilde{\varepsilon}_{2}=\sqrt{\mathfrak{c}_{4}\frac{d^{p}}{n}\log(d)}$, one can further show that $V_{K}\left(\sqrt{\mathfrak{c}_{4}\frac{d^{p}}{n}\log(d)},\mathcal{D}\right)\geq\mathfrak{c}_{4}d^{p}\log(d)$, and thus $\bar{\varepsilon}_{n}^{2}\geq\mathfrak{c}_{4}\frac{d^{p}}{n}\log(d)\geq\mathfrak{C}_{4}d^{p-\gamma}\log(d)$, where $\mathfrak{c}_{4}$ and $\mathfrak{C}_{4}$ are constants only depending on $\gamma$ and $c_{2}$.

Proof of Theorem 4.2 (iii)

Suppose that $\gamma\in\bigcup_{j=0}^{\infty}(2j+1,2j+2)$. Let $p=\lfloor\gamma/2\rfloor$.

We further introduce

$$
\tilde{\varepsilon}_{1}^{2}\triangleq\frac{1}{2}\mathfrak{C}_{1}d^{-(p+1)}\quad\text{and}\quad\tilde{\varepsilon}_{2}^{2}\triangleq\tilde{C}(p)\frac{d^{p+1}}{n}
$$

where $\mathfrak{C}_{1}$ is a constant only depending on $p$ given in Lemma B.2, and $\tilde{C}(p)=\frac{\log(2)}{12}\mathfrak{C}_{1}$ is a constant only depending on $p$.

Suppose further that $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant only depending on $\gamma$ and $c_{1}$. By Lemma B.2, we have

$$
\begin{aligned}
\tilde{\varepsilon}_{1}^{2}&=\frac{1}{2}\mathfrak{C}_{1}d^{-(p+1)}\leq\mu_{p+1}\\
\mu_{p+1}<\tilde{\varepsilon}_{2}^{2}&=\tilde{C}(p)\frac{d^{p+1}}{n}\leq\frac{\tilde{C}(p)}{c_{1}}d^{p+1-\gamma}<\mu_{p}\\
n\tilde{\varepsilon}_{2}^{2}&\overset{\text{Definition of }\mathfrak{C}_{2}}{\leq}\frac{\log(2)}{12}N(d,p+1).
\end{aligned} \tag{83}
$$

Therefore, for any $d \geq \mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $\gamma$ and $c_1$, we have

\begin{align}
V_2(\tilde{\varepsilon}_1, \mathcal{B}) \overset{\text{Lemma A.5}}{\geq} K\left(\tilde{\varepsilon}_1\right) \geq{} & \frac{1}{2} N(d,p+1) \log\left(\frac{\mu_{p+1}}{\tilde{\varepsilon}_1^2}\right) \tag{84}\\
\overset{\text{Definition of }\tilde{\varepsilon}_1^2}{\geq}{} & \frac{\log(2)}{2} N(d,p+1). \notag
\end{align}

On the other hand, from Lemma B.2 and Lemma 3.4, one can check the following claim:

Claim 3.

Suppose that $\gamma \in (2p+1, 2p+2)$ for some integer $p$. For any $d \geq \mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $\gamma$, $c_1$, and $c_2$, we have

$$
K\left(\sqrt{2}\sigma\tilde{\varepsilon}_2/6\right) \leq N(d,p) \log\left(\frac{18\mu_p}{\sigma^2\tilde{\varepsilon}_2^2}\right).
$$

Therefore, for any $d \geq \mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $\gamma$, $c_1$, and $c_2$, we have

\begin{align}
V_K(\tilde{\varepsilon}_2, \mathcal{D}) ={} & V_2(\sqrt{2}\sigma\tilde{\varepsilon}_2, \mathcal{B}) \overset{\text{Lemma A.5}}{\leq} K\left(\sqrt{2}\sigma\tilde{\varepsilon}_2/6\right) \tag{85}\\
\overset{\text{Claim 3}}{\leq}{} & N(d,p) \log\left(\frac{18\mu_p}{\sigma^2\tilde{\varepsilon}_2^2}\right) \notag\\
\overset{\text{Definition of }\tilde{\varepsilon}_2^2}{\leq}{} & N(d,p) \log\left(18[\tilde{C}(p)]^{-1}\sigma^{-2}\mathfrak{C}_2 c_2 d^{\gamma-2p-1}\right) \notag\\
\overset{\text{Lemma B.2}}{\leq}{} & n\tilde{\varepsilon}_2^2. \notag
\end{align}

Combining (83), (84), and (85), we finally have:

$$
\frac{V_K(\tilde{\varepsilon}_2, \mathcal{D}) + n\tilde{\varepsilon}_2^2 + \log(2)}{V_2(\tilde{\varepsilon}_1, \mathcal{B})} \leq \frac{\frac{\log(2)}{4} N(d,p+1)}{\frac{\log(2)}{2} N(d,p+1)} = \frac{1}{2},
$$

and from Lemma 4.1, we get

$$
\min_{\hat{f}} \max_{f_{\star} \in \mathcal{B}} \mathbb{E}_{(\mathbb{X}, \mathbb{y}) \sim \rho_{f_{\star}}^{\otimes n}} \left\|\hat{f} - f_{\star}\right\|_{L^2}^2 \geq \frac{1}{8}\tilde{\varepsilon}_1^2,
$$

finishing the proof. $\square$
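As a check on the constant $\frac{\log(2)}{4}$ in the numerator of the ratio above, the following worked step traces it back to (83) and (85). It assumes, in addition to what is written in the proof, that $d$ is large enough for $N(d,p+1) \geq 12$; it is a sketch of the bookkeeping rather than part of the original argument.

```latex
\begin{align*}
V_K(\tilde{\varepsilon}_2, \mathcal{D}) + n\tilde{\varepsilon}_2^2 + \log(2)
  &\leq 2n\tilde{\varepsilon}_2^2 + \log(2)
  && \text{by (85)} \\
  &\leq \frac{\log(2)}{6} N(d,p+1) + \log(2)
  && \text{by the last line of (83)} \\
  &\leq \frac{\log(2)}{4} N(d,p+1)
  && \text{whenever } N(d,p+1) \geq 12.
\end{align*}
```

The last step is equivalent to $\log(2) \leq \frac{\log(2)}{12} N(d,p+1)$, which is where the requirement $N(d,p+1) \geq 12$ comes from.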

Remark C.3.

Suppose that $\gamma \in (2p+1, 2p+2)$ for some integer $p$. In (83) and (85), if we let $\tilde{\varepsilon}_2 = \sqrt{\mathfrak{C_3} d^{-(p+1)}}$, we can further show that $V_K\left(\sqrt{\mathfrak{C_3} d^{-(p+1)}}, \mathcal{D}\right) \leq \mathfrak{C_3} n d^{-(p+1)}$, and thus $\bar{\varepsilon}_n^2 \leq \mathfrak{C_3} d^{-(p+1)}$, where $\mathfrak{C_3}$ is a constant depending only on $\gamma$.

C.4 Proof of Theorem 4.3

Let $\gamma > 0$ be a fixed real number and $p = \lfloor \gamma/2 \rfloor$. Recall that the empirical eigenvalues $\widehat{\lambda}_i$ are defined in Definition 6.1. The following lemma shows that there is a gap between the two empirical eigenvalues $\widehat{\lambda}_{N(p)+1}$ and $\widehat{\lambda}_{N(p)}$ in large dimensions.

Lemma C.4.

Adopt all notations and conditions in Theorem 4.3. Further suppose that $\gamma \neq 2, 4, 6, \cdots$. For any constants $0 < c_1 \leq c_2 < \infty$ and any $\delta > 0$, there exist constants $\mathfrak{C}''$ and $\mathfrak{C}_1$ depending only on $c_1$, $c_2$, $\delta$, and $\gamma$, such that for any $d \geq \mathfrak{C}''$, when $c_1 d^{\gamma} \leq n < c_2 d^{\gamma}$, we have

\begin{align}
\widehat{\lambda}_{N(0)+1} &< \frac{\mathfrak{C}_1}{n}, \quad \mu_0/4 < \widehat{\lambda}_{N(0)}, \quad \text{if } \gamma \in (0,1] \tag{86}\\
\widehat{\lambda}_{N(p)+1} &< 4\mu_{p+1} < \mu_p/4 < \widehat{\lambda}_{N(p)}, \quad \text{if } \gamma > 1, \tag{87}
\end{align}

with probability at least $1-\delta$, where $N(p) = \sum_{k=0}^{p} N(d,k)$.

Proof.

Deferred to the end of this subsection.
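To make the index $N(p)$ in Lemma C.4 concrete, the sketch below computes it numerically. It assumes that $N(d,k)$ denotes the multiplicity of $\mu_k$, taken here to be the dimension of the space of degree-$k$ spherical harmonics on $\mathbb{S}^d$ (the unit sphere in $\mathbb{R}^{d+1}$); that definition lies outside this subsection, so the closed form used is the standard one and not a quotation from the paper. The function names `harmonic_dim` and `cumulative_dim` are hypothetical.

```python
from math import comb


def harmonic_dim(d: int, k: int) -> int:
    """Dimension of degree-k spherical harmonics on S^d (sphere in R^{d+1}).

    Assumed to coincide with N(d, k); uses the standard closed form
    C(d + k, k) - C(d + k - 2, k - 2) for k >= 2.
    """
    if d < 1 or k < 0:
        raise ValueError("need d >= 1 and k >= 0")
    if k == 0:
        return 1
    if k == 1:
        return d + 1
    return comb(d + k, k) - comb(d + k - 2, k - 2)


def cumulative_dim(d: int, p: int) -> int:
    """N(p) = sum_{k=0}^{p} N(d, k): number of eigenvalues up to level p."""
    return sum(harmonic_dim(d, k) for k in range(p + 1))


if __name__ == "__main__":
    # For fixed k, N(d, k) grows like d^k / k! as d increases.
    for d in (10, 100, 1000):
        print(d, [harmonic_dim(d, k) for k in range(4)], cumulative_dim(d, 2))
```

Under this assumption, the gap in (87) sits between the $N(p)$-th and $(N(p)+1)$-th empirical eigenvalues, that is, exactly at the boundary between the levels $p$ and $p+1$.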

The proof of Theorem 4.3 largely follows the proof of Theorem 6.3, but Lemma A.2, Lemma E.3, and Lemma E.4 have to be replaced by the following lemmas, respectively.

Lemma C.5 (Proposition A.4 in [47]).

Let $\mu$ be a probability measure on $\mathcal{X}$, and suppose we have $x_1, \ldots, x_n$ sampled i.i.d. from $\mu$. For any $M > 0$, suppose $g \in M\mathcal{B} := \left\{g \in \mathcal{H} \mid \|g\|_{\mathcal{H}} \leq M\right\}$. Then, the following holds with probability at least $1-\delta_1$:

$$
\frac{1}{2}\|g\|_{L^2}^2 - \frac{5M^2}{3n}\ln\frac{2}{\delta_1} \leq \|g\|_n^2 \leq \frac{3}{2}\|g\|_{L^2}^2 + \frac{5M^2}{3n}\ln\frac{2}{\delta_1}. \tag{88}
$$
Lemma C.6.

For any $J \geq 1$, if $t^{-1} \in [\widehat{\lambda}_{J+1}, \widehat{\lambda}_J)$, then we have

$$
\mathbf{B}_t^2 \leq \frac{1}{t^2\widehat{\lambda}_J} + \widehat{\lambda}_{J+1}. \tag{89}
$$
Proof.

Deferred to the end of this subsection.

Lemma C.7.

For any $\delta_2 > 0$ and any $J \geq 1$, if $t^{-1} \in [\widehat{\lambda}_{J+1}, \widehat{\lambda}_J)$, then we have

$$
\mathbf{V}_t \leq 2\sigma^2 \frac{t^2}{n}\left(\frac{J}{t^2} + \widehat{\lambda}_{J+1}\right) + \delta_2, \tag{90}
$$

with probability at least $1 - \exp\left(-C\min\left\{\frac{n\delta_2}{2}, \frac{n^2\delta_2^2}{4t^2\left(\frac{J}{t^2} + \widehat{\lambda}_{J+1}\right)}\right\}\right)$.

Proof.

Deferred to the end of this subsection.

We now prove Theorem 4.3. The proof is divided into three parts:

  • (i) $\gamma \in \{2, 4, 6, \cdots\}$,

  • (ii) $\gamma \in \bigcup_{j=0}^{\infty} (2j, 2j+1]$,

  • (iii) $\gamma \in \bigcup_{j=0}^{\infty} (2j+1, 2j+2)$.

Proof of Theorem 4.3 (i)

This is a direct corollary of Theorem 3.3.

Proof of Theorem 4.3 (ii)

Suppose that $\gamma \in \bigcup_{j=0}^{\infty} (2j, 2j+1]$ is a real number, and let $p = \lfloor \gamma/2 \rfloor$.

For any given $\delta > 0$, let $d \geq \mathfrak{C} = \max\{\mathfrak{C}', \mathfrak{C}''\}$, where $\mathfrak{C}'$ is the constant (depending only on $c_1$, $c_2$, and $\gamma$) introduced in Theorem 3.3 and $\mathfrak{C}''$ is the constant (depending only on $c_1$, $c_2$, $\gamma$, and $\delta$) introduced in Lemma C.4.

Note that Theorem 3.3, Lemma A.3, and Lemma B.4 imply that

$$
4\mu_{p+1} \leq \mathfrak{C}_1 n^{-1/2} \leq \widehat{T}^{-1} = \widehat{\varepsilon}_n^2 \leq \mathfrak{C}_2 n^{-1/2} \leq \mu_p/4
$$

holds with probability at least $1 - \mathfrak{C}_3 \exp\left\{-\mathfrak{C}_4 n^{1/2}\right\}$, and Lemma C.4 implies that

$$
\widehat{\lambda}_{J+1} < 4\mu_{p+1} < \mu_p/4 < \widehat{\lambda}_J
$$

holds with probability at least $1-\delta$, where $J = N(p)$. Thus, we know that $\widehat{\lambda}_{J+1} < 4\mu_{p+1} \leq \widehat{T}^{-1} \leq \mu_p/4 < \widehat{\lambda}_J$ with probability at least $1 - \delta - \mathfrak{C}_3 \exp\left\{-\mathfrak{C}_4 n^{1/2}\right\}$.

Let $\delta_2 = 2\mathfrak{C_3} d^{p-\gamma}\log(d)$, where $\mathfrak{C_3}$, given in Remark C.2, is a constant depending only on $\gamma$ and $c_1$. Conditioning on the event $\Omega = \left\{\widehat{\lambda}_{J+1} < 4\mu_{p+1} \leq \widehat{T}^{-1} \leq \mu_p/4 < \widehat{\lambda}_J\right\}$, we have

\begin{align*}
\left\|f_{\widehat{T}}^{\mathtt{in}} - f_{\star}\right\|_n^2 &\leq \mathbf{B}_{\widehat{T}}^2 + \mathbf{V}_{\widehat{T}} \\
&\leq \frac{1}{\widehat{T}^2\widehat{\lambda}_J} + \widehat{\lambda}_{J+1} + 2\sigma^2\frac{J}{n} + 2\sigma^2\frac{\widehat{T}^2\widehat{\lambda}_{J+1}}{n} + \delta_2 \\
&\leq \frac{1}{n\mu_p} + \mu_{p+1} + 2\sigma^2\frac{J}{n} + 2\sigma^2\mu_{p+1} + \delta_2 \\
&\leq \mathfrak{C}_4\left(d^{p-\gamma} + d^{-p-1} + 2\sigma^2 d^{p-\gamma} + 2\sigma^2 d^{-p-1}\right) + \delta_2 \\
&\leq \frac{3}{2}\delta_2,
\end{align*}

holds with probability at least $1 - \mathfrak{C}_2\exp\left(-\mathfrak{C}_3 d^{p}\log(d)\right)$, where the second inequality follows from Lemma C.6 and Lemma C.7, and the second-to-last inequality follows from Lemma B.2 with a constant $\mathfrak{C}_4$ depending only on $c_1$, $c_2$, $\mathfrak{C}_1$, and $\mathfrak{C}_2$.
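The final step of the chain above absorbs the terms of order $d^{p-\gamma}$ and $d^{-p-1}$ into $\delta_2 \propto d^{p-\gamma}\log(d)$, which works because $-p-1 \leq p-\gamma$ exactly when $\gamma \leq 2p+1$. The sketch below compares the two exponents for a given $\gamma$. It relies only on the scalings visible in the display (terms of order $d^{p-\gamma}$ and $d^{-p-1}$) and ignores constants and logarithmic factors; the function names are hypothetical.

```python
from math import floor


def risk_exponents(gamma: float) -> tuple[int, float, int]:
    """Return (p, p - gamma, -(p + 1)) with p = floor(gamma / 2).

    p - gamma is the exponent of the terms 1/(n mu_p) and J/n;
    -(p + 1) is the exponent of the terms involving mu_{p+1}.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    p = floor(gamma / 2)
    return p, p - gamma, -(p + 1)


def dominant_exponent(gamma: float) -> float:
    """Exponent a such that the bound scales like d**a (logs ignored)."""
    _, variance_exp, bias_exp = risk_exponents(gamma)
    return max(variance_exp, bias_exp)


if __name__ == "__main__":
    for gamma in (0.5, 2.5, 3.0, 3.5):
        print(gamma, risk_exponents(gamma), dominant_exponent(gamma))
```

For $\gamma \in (2p, 2p+1]$ the larger exponent is $p-\gamma$, matching the choice of $\delta_2$ in case (ii); for $\gamma \in (2p+1, 2p+2)$ it is $-(p+1)$, consistent with the bound on $\bar{\varepsilon}_n^2$ in Remark C.3.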

Let $\bar{\mathcal{F}} = \{\bar{f}_1, \cdots, \bar{f}_N\}$ be a $3\sqrt{2}\sigma\bar{\varepsilon}_n$-net of $\mathcal{B}' := 3\mathcal{B} \cap \{g \in \mathcal{H} \mid \|g\|_n^2 \leq \frac{3}{2}\delta_2\}$. By Definition 6.6 and Lemma A.8, the $3\sqrt{2}\sigma\bar{\varepsilon}_n$ covering-entropy of $3\mathcal{B}$ is

$$
V_2(3\sqrt{2}\sigma\bar{\varepsilon}_n, 3\mathcal{B}) = V_2(\sqrt{2}\sigma\bar{\varepsilon}_n, \mathcal{B}) = V_K(\bar{\varepsilon}_n, \mathcal{D}) = n\bar{\varepsilon}_n^2. \tag{91}
$$

Thus, we have $\log N \leq n\bar{\varepsilon}_n^2 \leq n\delta_2/2$ (Remark C.2).

Denote another event $\Omega_1 = \{\omega \mid \|\bar{f}_j\|_{L^2}^2/2 - 15\delta_2 \leq \|\bar{f}_j\|_n^2 \leq 3\|\bar{f}_j\|_{L^2}^2/2 + 15\delta_2,\ 1 \leq j \leq N\}$. Applying Lemma C.5 with $M = 3$ and $\delta_1 = 2\exp\{-n\delta_2\}$, we have

(Ω1)12Nexp{nδ2}12exp{nδ2/2}subscriptΩ112𝑁𝑛subscript𝛿212𝑛subscript𝛿22\mathbb{P}(\Omega_{1})\geq 1-2N\exp\{-n\delta_{2}\}\geq 1-2\exp\{-n\delta_{2}/2\}blackboard_P ( roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ≥ 1 - 2 italic_N roman_exp { - italic_n italic_δ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT } ≥ 1 - 2 roman_exp { - italic_n italic_δ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / 2 }
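The second inequality combines a union bound over the $N$ net elements with the entropy bound $\log N\leq n\delta_{2}/2$. A short sketch, assuming (as the choice of $\delta_{1}$ suggests) that Lemma C.5 controls each $\bar{f}_{j}$ individually with failure probability at most $\delta_{1}=2\exp\{-n\delta_{2}\}$:

```latex
% Union bound over the net, then absorb N using \log N \le n\delta_2/2.
\begin{align*}
\mathbb{P}(\Omega_{1}^{c})
  &\leq \sum_{j=1}^{N}\delta_{1}
   = 2N\exp\{-n\delta_{2}\}
   = 2\exp\{\log N-n\delta_{2}\} \\
  &\leq 2\exp\{n\delta_{2}/2-n\delta_{2}\}
   = 2\exp\{-n\delta_{2}/2\}.
\end{align*}
```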

Conditioning on the event $\Omega\cap\Omega_{1}$, for any $f\in\mathcal{B}^{\prime}:=3\mathcal{B}\cap\{g\in\mathcal{H}\mid\|g\|_{n}^{2}\leq\frac{3}{2}\delta_{2}\}$, we have

$$
\begin{aligned}
\|f\|_{L^{2}} &\leq\|\bar{f}_{j}\|_{L^{2}}+\|f-\bar{f}_{j}\|_{L^{2}}\leq\sqrt{2\|\bar{f}_{j}\|_{n}^{2}+30\delta_{2}}+3\sqrt{2}\sigma\bar{\varepsilon}_{n} \\
&\leq\sqrt{3\delta_{2}+30\delta_{2}}+3\sqrt{2}\sigma\sqrt{\delta_{2}/2}=(\sqrt{33}+3\sigma)\sqrt{\delta_{2}}.
\end{aligned}
\tag{92}
$$

Since $f_{\widehat{T}}^{\mathtt{in}}-f_{\star}\in\mathcal{B}^{\prime}$, we have

$$\|f_{\widehat{T}}^{\mathtt{in}}-f_{\star}\|^{2}_{L^{2}}\leq\mathfrak{C}_{4}d^{p-\gamma}\log(d),$$

holds with probability at least $1-\delta-\mathfrak{C}_{2}\exp\{-\mathfrak{C}_{3}d^{p}\log(d)\}$, where $\mathfrak{C}_{2}$, $\mathfrak{C}_{3}$, and $\mathfrak{C}_{4}$ are constants only depending on $\gamma$, $c_{1}$, and $c_{2}$. $\square$

Proof of Theorem 4.3 (iii)

Suppose that $\gamma\in\bigcup_{j=0}^{\infty}(2j+1,2j+2)$ is a real number. Let $p=\lfloor\gamma/2\rfloor$.

Similar to the above, we can show that $\widehat{\lambda}_{J+1}<4\mu_{p+1}\leq\widehat{T}^{-1}\leq\mu_{p}/4<\widehat{\lambda}_{J}$ holds with probability at least $1-\delta-\mathfrak{C}_{3}\exp\left\{-\mathfrak{C}_{4}n^{1/2}\right\}$, where $J=N(p)$.

Let $\delta_{2}=2\mathfrak{C}_{3}d^{-(p+1)}$, where $\mathfrak{C}_{3}$ given in Remark C.3 is a constant only depending on $\gamma$. Conditioning on the event $\Omega=\left\{\widehat{\lambda}_{J+1}<4\mu_{p+1}\leq\widehat{T}^{-1}\leq\mu_{p}/4<\widehat{\lambda}_{J}\right\}$, we have

$$
\begin{aligned}
\left\|f_{\widehat{T}}^{\mathtt{in}}-f_{\star}\right\|_{n}^{2} &\leq\mathbf{B}_{\widehat{T}}^{2}+\mathbf{V}_{\widehat{T}} \\
&\leq\frac{1}{\widehat{T}^{2}\widehat{\lambda}_{J}}+\widehat{\lambda}_{J+1}+2\sigma^{2}\frac{J}{n}+2\sigma^{2}\frac{\widehat{T}^{2}\widehat{\lambda}_{J+1}}{n}+\delta_{2} \\
&\leq\frac{1}{n\mu_{p}}+\mu_{p+1}+2\sigma^{2}\frac{J}{n}+2\sigma^{2}\mu_{p+1}+\delta_{2} \\
&\leq\mathfrak{C}_{4}\left(d^{p-\gamma}+d^{-p-1}+2\sigma^{2}d^{p-\gamma}+2\sigma^{2}d^{-p-1}\right)+\delta_{2} \\
&\leq\mathfrak{C}_{5}\delta_{2},
\end{aligned}
$$

holds with probability at least $1-\mathfrak{C}_{2}\exp\left(-\mathfrak{C}_{3}d^{-(p+1)}\right)$, where the second inequality follows from Lemma C.6 and Lemma C.7, the second-to-last inequality follows from Lemma B.2 with a constant $\mathfrak{C}_{4}$ only depending on $c_{1}$, $c_{2}$, $\mathfrak{C}_{1}$, and $\mathfrak{C}_{2}$, and $\mathfrak{C}_{5}=\mathfrak{C}_{4}(1+2\sigma^{2})/(2\mathfrak{C}_{3})+2$.
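The value of $\mathfrak{C}_{5}$ can be checked by comparing exponents. The following sketch assumes $d$ is large enough that $\mathfrak{C}_{4}(1+2\sigma^{2})d^{p-\gamma}\leq\delta_{2}$; this is an assumption made here for illustration, and it holds eventually because $\gamma>2p+1$ implies $d^{p-\gamma}/d^{-(p+1)}=d^{2p+1-\gamma}\to 0$.

```latex
% p = \lfloor\gamma/2\rfloor with \gamma\in(2p+1,2p+2), hence p-\gamma < -(p+1).
% Also \delta_2 = 2\mathfrak{C}_3 d^{-(p+1)}, i.e. d^{-(p+1)} = \delta_2/(2\mathfrak{C}_3).
\begin{align*}
\mathfrak{C}_{4}(1+2\sigma^{2})\left(d^{p-\gamma}+d^{-(p+1)}\right)+\delta_{2}
 &\leq \delta_{2}
   +\mathfrak{C}_{4}(1+2\sigma^{2})\frac{\delta_{2}}{2\mathfrak{C}_{3}}
   +\delta_{2} \\
 &= \left(\frac{\mathfrak{C}_{4}(1+2\sigma^{2})}{2\mathfrak{C}_{3}}+2\right)\delta_{2}
  = \mathfrak{C}_{5}\delta_{2}.
\end{align*}
```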

Let $\bar{\mathcal{F}}=\{\bar{f}_{1},\cdots,\bar{f}_{N}\}$ be a $3\sqrt{2}\sigma\bar{\varepsilon}_{n}$-net of $\mathcal{B}^{\prime}:=3\mathcal{B}\cap\{g\in\mathcal{H}\mid\|g\|_{n}^{2}\leq\mathfrak{C}_{5}\delta_{2}\}$. By Definition 6.6 and Lemma A.8, the $3\sqrt{2}\sigma\bar{\varepsilon}_{n}$ covering-entropy of $3\mathcal{B}$ is

$$V_{2}(3\sqrt{2}\sigma\bar{\varepsilon}_{n},3\mathcal{B})=V_{2}(\sqrt{2}\sigma\bar{\varepsilon}_{n},\mathcal{B})=V_{K}(\bar{\varepsilon}_{n},\mathcal{D})=n\bar{\varepsilon}_{n}^{2}. \tag{93}$$

Thus, we have $\log N\leq n\bar{\varepsilon}_{n}^{2}\leq n\delta_{2}/2$ (Remark C.3).

Denote the event $\Omega_{2}=\{\omega\mid\|\bar{f}_{j}\|_{L^{2}}^{2}/2-15\mathfrak{C}_{5}\delta_{2}\leq\|\bar{f}_{j}\|_{n}^{2}\leq 3\|\bar{f}_{j}\|_{L^{2}}^{2}/2+15\mathfrak{C}_{5}\delta_{2},\ 1\leq j\leq N\}$. Applying Lemma C.5 with $M=3$ and $\delta_{1}=2\exp\{-n\delta_{2}\}$, we have

$$\mathbb{P}(\Omega_{2})\geq 1-2N\exp\{-n\delta_{2}\}\overset{\text{Remark C.3}}{\geq}1-2\exp\{-n\delta_{2}/2\}.$$

Conditioning on the event $\Omega\cap\Omega_{2}$, for any $f\in\mathcal{B}^{\prime}=3\mathcal{B}\cap\{g\in\mathcal{H}\mid\|g\|_{n}^{2}\leq\mathfrak{C}_{5}\delta_{2}\}$, we have

$$
\begin{aligned}
\|f\|_{L^{2}} &\leq\|\bar{f}_{j}\|_{L^{2}}+\|f-\bar{f}_{j}\|_{L^{2}}\leq\sqrt{2\|\bar{f}_{j}\|_{n}^{2}+30\delta_{2}}+3\sqrt{2}\sigma\bar{\varepsilon}_{n} \\
&\leq\sqrt{2\mathfrak{C}_{5}\delta_{2}+30\delta_{2}}+3\sqrt{2}\sigma\sqrt{\delta_{2}/2}.
\end{aligned}
\tag{94}
$$

Since $f_{\widehat{T}}^{\mathtt{in}}-f_{\star}\in\mathcal{B}^{\prime}$, we have

$$\|f_{\widehat{T}}^{\mathtt{in}}-f_{\star}\|^{2}_{L^{2}}\leq\mathfrak{C}_{4}d^{-(p+1)},$$

holds with probability at least $1-\delta-\mathfrak{C}_{2}\exp\left(-\mathfrak{C}_{3}d^{\gamma-(p+1)}\right)$, where $\mathfrak{C}_{2}$, $\mathfrak{C}_{3}$, and $\mathfrak{C}_{4}$ are constants only depending on $\gamma$, $c_{1}$, and $c_{2}$. $\square$

Proof of Lemma C.4: First, consider the case $\gamma>1$, and let us prove (87). From Mercer's decomposition, we have the following decomposition:

$$
\begin{aligned}
\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X}) &=\frac{1}{n}\boldsymbol{Y}_{\leq p+1}\boldsymbol{D}_{\leq p+1}\boldsymbol{Y}_{\leq p+1}^{\tau}+\frac{1}{n}\sum_{k=2}^{\infty}\boldsymbol{Y}_{p+k}\boldsymbol{D}_{p+k}\boldsymbol{Y}_{p+k}^{\tau} \\
&=K_{\text{main}}+K_{\text{residual}},
\end{aligned}
\tag{95}
$$

where $Y_{q,j}(\cdot)$ for $j=1,\cdots,N(d,q)$ are spherical harmonic polynomials of degree $q\in\{0,1,2,\cdots\}$, $\boldsymbol{Y}_{q}=\left(Y_{ql}\left(\boldsymbol{x}_{i}\right)\right)_{i\in[n],l\in[N(d,q)]}\in\mathbb{R}^{n\times N(d,q)}$, $\boldsymbol{Y}_{\leq p+1}=\left(\boldsymbol{Y}_{0},\ldots,\boldsymbol{Y}_{p+1}\right)\in\mathbb{R}^{n\times N(p+1)}$, $\boldsymbol{D}_{\leq p+1}=\text{diag}(\mu_{0}\mathbf{I}_{N(d,0)},\cdots,\mu_{p+1}\mathbf{I}_{N(d,p+1)})$, and $\boldsymbol{D}_{p+k}=\mu_{p+k}\mathbf{I}_{N(d,p+k)}$.

We replicate some results from [30] and [81].

Proposition C.8 (Lemma 11 in [30]).

For any fixed integer $q\geq 0$, let $N(q)=\sum_{k=0}^{q}N(d,k)\mathbf{1}\{\mu_{k}>0\}$ be defined as in Lemma C.4. Then, when $n\gg N(q)\log(N(q))$, we have

$$\frac{\boldsymbol{Y}_{\leq q}^{\tau}\boldsymbol{Y}_{\leq q}}{n}=\mathbf{I}_{N(q)}+\boldsymbol{\Delta}_{\leq q},$$

where $\mathbb{E}\left[\|\boldsymbol{\Delta}_{\leq q}\|_{\mathrm{op}}\right]=o_{d}(1)$.

Proposition C.9 (Equations (67) and (72) in [30]).

For any fixed integer $q$, there exist constants $\mathfrak{C}_{0}$ and $\mathfrak{C}$ only depending on $q$, such that for any $n,d\geq\mathfrak{C}$, we have

$$\mathbb{E}\left[\left\|\frac{1}{N(d,q)}\boldsymbol{Y}_{q}\boldsymbol{Y}_{q}^{\tau}-\mathbf{I}_{n}\right\|_{\mathrm{op}}\right]\leq\mathfrak{C}_{0}\left\{n^{1/4}\sqrt{\frac{n}{d^{q}}}+\left(\sum_{v=2}^{4}\left(\frac{n}{d^{q}}\right)^{v}\right)^{1/4}\right\}. \tag{96}$$

Proposition C.10 (Proposition 3 in [30]).

If $n\ll d^{q-\delta_{1}}$ for a fixed integer $q$ and a fixed constant $\delta_{1}>0$, then we have

$$\lim_{d,n\rightarrow\infty}\mathbb{E}\left[\left\|\frac{1}{N(d,q)}\boldsymbol{Y}_{q}\boldsymbol{Y}_{q}^{\tau}-\mathbf{I}_{n}\right\|_{\mathrm{op}}\right]=0. \tag{97}$$

Proposition C.11 (Theorem 1 in [81]).

If $N(d,1)/n\to\alpha\in(0,\infty)$, then the empirical spectral distribution of $\boldsymbol{Y}_{1}^{\tau}\boldsymbol{Y}_{1}/n$ converges in distribution to the Marchenko-Pastur distribution $\mu_{MP}(\alpha)$ defined as (5) in [81].

The following proofs aim at bounding the eigenvalues of $K_{\text{main}}$ and $K_{\text{residual}}$. Then, the bounds on the eigenvalues of $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})$ can be obtained by Weyl's inequality. Therefore, we split the remaining proofs into three parts.
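For reference, the form of Weyl's inequality relevant to this step is recalled below. It is the standard statement for real symmetric matrices with eigenvalues sorted in decreasing order; the symbols $A$ and $B$ are placeholders introduced here and do not appear in the original argument.

```latex
% A, B \in \mathbb{R}^{n\times n} symmetric,
% \lambda_1(\cdot)\geq\cdots\geq\lambda_n(\cdot) their ordered eigenvalues.
\begin{equation*}
\lambda_{i}(A)+\lambda_{n}(B)\;\leq\;\lambda_{i}(A+B)\;\leq\;\lambda_{i}(A)+\lambda_{1}(B),
\qquad 1\leq i\leq n.
\end{equation*}
% Taking A = K_{\text{main}} and B = K_{\text{residual}}, two-sided bounds on the
% spectra of the two pieces transfer to \frac{1}{n}K(\boldsymbol{X},\boldsymbol{X}).
```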

Part I: bounding Kmainsubscript𝐾mainK_{\text{main}}italic_K start_POSTSUBSCRIPT main end_POSTSUBSCRIPT

Let us consider the singular value decomposition of 𝕐p+1subscript𝕐absent𝑝1\mathbb{Y}_{\leq p+1}blackboard_Y start_POSTSUBSCRIPT ≤ italic_p + 1 end_POSTSUBSCRIPT. That is, 𝒀p+1=n𝑶𝑺𝑽τsubscript𝒀absent𝑝1𝑛𝑶𝑺superscript𝑽𝜏\boldsymbol{Y}_{\leq p+1}=\sqrt{n}\boldsymbol{O}\boldsymbol{S}\boldsymbol{V}^{\tau}bold_italic_Y start_POSTSUBSCRIPT ≤ italic_p + 1 end_POSTSUBSCRIPT = square-root start_ARG italic_n end_ARG bold_italic_O bold_italic_S bold_italic_V start_POSTSUPERSCRIPT italic_τ end_POSTSUPERSCRIPT where 𝑶n×n𝑶superscript𝑛𝑛\boldsymbol{O}\in\mathbb{R}^{n\times n}bold_italic_O ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT and 𝑽N(p+1)×N(p+1)𝑽superscript𝑁𝑝1𝑁𝑝1\boldsymbol{V}\in\mathbb{R}^{N(p+1)\times N(p+1)}bold_italic_V ∈ blackboard_R start_POSTSUPERSCRIPT italic_N ( italic_p + 1 ) × italic_N ( italic_p + 1 ) end_POSTSUPERSCRIPT are orthogonal matrices, and 𝑺=[𝑺;𝟎]τ[𝐈N(p+1)+𝚫s;𝟎]τn×N(p+1)𝑺superscriptsubscript𝑺0𝜏superscriptsubscript𝐈𝑁𝑝1subscript𝚫𝑠0𝜏superscript𝑛𝑁𝑝1\boldsymbol{S}=\left[\boldsymbol{S}_{\star};\mathbf{0}\right]^{\tau}\equiv% \left[\mathbf{I}_{N(p+1)}+\boldsymbol{\Delta}_{s};\mathbf{0}\right]^{\tau}\in% \mathbb{R}^{n\times N(p+1)}bold_italic_S = [ bold_italic_S start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ; bold_0 ] start_POSTSUPERSCRIPT italic_τ end_POSTSUPERSCRIPT ≡ [ bold_I start_POSTSUBSCRIPT italic_N ( italic_p + 1 ) end_POSTSUBSCRIPT + bold_Δ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ; bold_0 ] start_POSTSUPERSCRIPT italic_τ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_N ( italic_p + 1 ) end_POSTSUPERSCRIPT.

Notice that we have $n\gg(p+1)d^{p+1}\log(d)$ when $\gamma>1$ and $\gamma\neq 2,4,6,\cdots$. From Lemma B.2, we further have $n\gg N(p+1)\log(N(p+1))$. Hence, from Proposition C.8 with $q=p+1$, we have $\boldsymbol{Y}_{\leq p+1}^{\tau}\boldsymbol{Y}_{\leq p+1}/n=\mathbf{I}_{N(p+1)}+\boldsymbol{\Delta}_{\leq p+1}$, where $\mathbb{E}\left[\|\boldsymbol{\Delta}_{\leq p+1}\|_{\mathrm{op}}\right]=o_{d}(1)$. Therefore, we have $\left\|\boldsymbol{\Delta}_{s}\right\|_{\mathrm{op}}=o_{d,\mathbb{P}}(1)$.

Conditioning on the event $\Omega_{1}=\{\left\|\boldsymbol{\Delta}_{s}\right\|_{\mathrm{op}}\leq 1/4\}$, we have

$$
\begin{aligned}
\lambda_{N(p)}\left(K_{\text{main}}\right)
&= \lambda_{N(p)}\left(\frac{1}{n}\boldsymbol{Y}_{\leq p+1}\boldsymbol{D}_{\leq p+1}\boldsymbol{Y}_{\leq p+1}^{\tau}\right) \\
&\overset{\text{Definition of }\boldsymbol{Y}_{\leq p+1}}{=} \lambda_{N(p)}\left(\boldsymbol{V}^{\tau}\boldsymbol{D}_{\leq p+1}\boldsymbol{V}\left(\mathbf{I}_{N(p+1)}+\boldsymbol{\Delta}_{s}+\boldsymbol{\Delta}_{s}^{\tau}+\boldsymbol{\Delta}_{s}\boldsymbol{\Delta}_{s}^{\tau}\right)\right) \\
&\overset{\text{Weyl's inequality}}{\geq} \frac{7}{16}\lambda_{N(p)}\left(\boldsymbol{D}_{\leq p+1}\right)=\frac{7}{16}\mu_{p}.
\end{aligned}
\tag{98}
$$

Similarly, we have

$$
\lambda_{N(p)+1}\left(K_{\text{main}}\right)=\lambda_{N(p)+1}\left(\boldsymbol{V}^{\tau}\boldsymbol{D}_{\leq p+1}\boldsymbol{V}\left(\mathbf{I}_{N(p+1)}+\boldsymbol{\Delta}_{s}+\boldsymbol{\Delta}_{s}^{\tau}+\boldsymbol{\Delta}_{s}\boldsymbol{\Delta}_{s}^{\tau}\right)\right)\overset{\text{Weyl's inequality}}{\leq}\frac{25}{16}\mu_{p+1}.
\tag{99}
$$
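The constants $7/16$ and $25/16$ in (98) and (99) can be traced to the size of the perturbation around the identity. The following is a sketch of that arithmetic, not a restatement of the authors' full argument. It assumes only that $\|\boldsymbol{\Delta}_{s}\|_{\mathrm{op}}\leq 1/4$ on $\Omega_{1}$, and that the diagonal entries of $\boldsymbol{D}_{\leq p+1}$ are the eigenvalues $\mu_{k}$, each repeated $N(d,k)$ times and arranged in decreasing order (an assumption about notation introduced earlier in the paper).

```latex
\begin{align*}
\left\|\boldsymbol{\Delta}_{s}+\boldsymbol{\Delta}_{s}^{\tau}
      +\boldsymbol{\Delta}_{s}\boldsymbol{\Delta}_{s}^{\tau}\right\|_{\mathrm{op}}
  &\leq 2\left\|\boldsymbol{\Delta}_{s}\right\|_{\mathrm{op}}
       +\left\|\boldsymbol{\Delta}_{s}\right\|_{\mathrm{op}}^{2}
   \leq \frac{1}{2}+\frac{1}{16}=\frac{9}{16}, \\
\frac{7}{16}\,\mathbf{I}_{N(p+1)}
  &\preceq \mathbf{I}_{N(p+1)}+\boldsymbol{\Delta}_{s}+\boldsymbol{\Delta}_{s}^{\tau}
          +\boldsymbol{\Delta}_{s}\boldsymbol{\Delta}_{s}^{\tau}
   \preceq \frac{25}{16}\,\mathbf{I}_{N(p+1)}.
\end{align*}
```

Under the ordering assumption, the $N(p)$-th largest diagonal entry of $\boldsymbol{D}_{\leq p+1}$ is $\mu_{p}$ and the next one is $\mu_{p+1}$, which is how the factor $7/16$ attaches to $\mu_{p}$ in (98) and the factor $25/16$ attaches to $\mu_{p+1}$ in (99).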
Part II: bounding $K_{\text{residual}}$

For any $2\leq k\leq p+1$ and any $\delta$, when $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $c_{1}$, $c_{2}$, $\delta$, and $p$, from Proposition C.9, we have

$$
\begin{aligned}
&\mathbb{E}\left[\frac{1}{n}\left\|\boldsymbol{Y}_{p+k}\boldsymbol{D}_{p+k}\boldsymbol{Y}_{p+k}^{\tau}\right\|_{\mathrm{op}}\right] \\
\leq{}& \frac{\mu_{p+k}}{n}\mathbb{E}\left[\left\|\boldsymbol{Y}_{p+k}\boldsymbol{Y}_{p+k}^{\tau}-N(d,p+k)\mathbf{I}_{n}\right\|_{\mathrm{op}}\right]+\frac{\mu_{p+k}N(d,p+k)}{n} \\
\leq{}& \mathfrak{C}_{0}\frac{\mu_{p+k}N(d,p+k)}{n}\left\{n^{1/4}\sqrt{\frac{n}{d^{p+k}}}+\frac{n}{d^{p+k}}+1\right\} \\
\leq{}& \mathfrak{C}_{0}\frac{\mathfrak{C}_{2}^{2}}{n}\left\{n^{1/4}\sqrt{\frac{n}{d^{p+2}}}+\frac{n}{d^{p+2}}+1\right\} \\
\leq{}& \frac{\delta}{3p}\mu_{p+1},
\end{aligned}
\tag{100}
$$

where the second-to-last inequality comes from Lemma B.2.

For any $k\geq p+2$, if we denote $q=p+k\geq 2p+2$ and $\delta_{1}=(2p+2-\gamma)/2$, then we have $n\ll d^{q-\delta_{1}}$. Hence, from Proposition C.10, we have

$$
\mathbb{E}\left[\frac{1}{n}\left\|\boldsymbol{Y}_{p+k}\boldsymbol{D}_{p+k}\boldsymbol{Y}_{p+k}^{\tau}\right\|_{\mathrm{op}}\right]=\frac{\mu_{p+k}N(d,p+k)}{n}\left(1+o_{d}(1)\right).
\tag{101}
$$

Therefore, for any $\delta$, when $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $c_{1}$, $c_{2}$, $\delta$, and $\gamma$, from Markov's inequality we have

$$
\begin{aligned}
&\mathbb{P}\left(\|K_{\text{residual}}\|_{\mathrm{op}}>\mu_{p+1}\right) \\
={}& \mathbb{P}\left(\frac{1}{n}\left\|\sum_{k=2}^{\infty}\boldsymbol{Y}_{p+k}\boldsymbol{D}_{p+k}\boldsymbol{Y}_{p+k}^{\tau}\right\|_{\mathrm{op}}>\mu_{p+1}\right) \\
\leq{}& \left(\sum_{k=2}^{p+1}\frac{\delta}{3p}\mu_{p+1}+\frac{2}{n}\sum_{k=0}^{\infty}\mu_{k}N(d,k)\right)/(\mu_{p+1}) \\
\leq{}& \left(\frac{\delta}{3}\mu_{p+1}+\frac{2}{n}\right)/(\mu_{p+1})<\frac{2\delta}{3}.
\end{aligned}
\tag{102}
$$
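The chain in (102) compresses several small steps. As a reading aid, they can be spelled out as below. This sketch assumes, as the surrounding text suggests, that $K_{\text{residual}}=\frac{1}{n}\sum_{k\geq 2}\boldsymbol{Y}_{p+k}\boldsymbol{D}_{p+k}\boldsymbol{Y}_{p+k}^{\tau}$, that $\sum_{k\geq 0}\mu_{k}N(d,k)\leq 1$, and that $d$ is large enough for the $1+o_{d}(1)$ factor in (101) to be at most $2$.

```latex
\begin{align*}
\mathbb{P}\left(\|K_{\text{residual}}\|_{\mathrm{op}}>\mu_{p+1}\right)
 &\leq \frac{\mathbb{E}\left[\|K_{\text{residual}}\|_{\mathrm{op}}\right]}{\mu_{p+1}}
   && \text{(Markov's inequality)} \\
 &\leq \frac{1}{\mu_{p+1}}\sum_{k=2}^{\infty}
       \mathbb{E}\left[\frac{1}{n}\left\|\boldsymbol{Y}_{p+k}\boldsymbol{D}_{p+k}
       \boldsymbol{Y}_{p+k}^{\tau}\right\|_{\mathrm{op}}\right]
   && \text{(triangle inequality)} \\
 &\leq \frac{1}{\mu_{p+1}}\left(p\cdot\frac{\delta}{3p}\,\mu_{p+1}
       +\frac{2}{n}\sum_{k\geq p+2}\mu_{p+k}N(d,p+k)\right)
   && \text{((100) for } 2\leq k\leq p+1\text{, (101) otherwise)} \\
 &\leq \frac{\delta}{3}+\frac{2}{n\,\mu_{p+1}}.
\end{align*}
```

The final strict inequality in (102) is therefore equivalent to $2/(n\mu_{p+1})<\delta/3$, that is, $n\mu_{p+1}>6/\delta$, which is the condition that taking $d\geq\mathfrak{C}$ is meant to guarantee.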
Part III: bounding the empirical matrix

When $d\geq\mathfrak{C}$, where $\mathfrak{C}$ is a sufficiently large constant depending only on $c_{1}$, $c_{2}$, and $\gamma$, we have $\frac{7}{16}\mu_{p}-\mu_{p+1}\geq\frac{1}{4}\mu_{p}$.

Define the event $\Omega_{2}=\left\{\left\|K_{\text{residual}}\right\|_{\mathrm{op}}\leq\mu_{p+1}\right\}$. Conditioning on the event $\Omega_{1}\cap\Omega_{2}$, we have

$$
\begin{aligned}
\widehat{\lambda}_{N(p)}
&= \lambda_{N(p)}\left(\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})\right) \\
&\overset{\text{Weyl's inequality}}{\geq} \lambda_{N(p)}\left(K_{\text{main}}\right)-\|K_{\text{residual}}\|_{\mathrm{op}} \\
&\overset{(98)}{\geq} \frac{7}{16}\mu_{p}-\mu_{p+1}\geq\frac{1}{4}\mu_{p}.
\end{aligned}
\tag{103}
$$

Similarly, we have

$$
\begin{aligned}
\widehat{\lambda}_{N(p)+1}
&= \lambda_{N(p)+1}\left(\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})\right) \\
&\overset{\text{Weyl's inequality}}{\leq} \lambda_{N(p)+1}\left(K_{\text{main}}\right)+\|K_{\text{residual}}\|_{\mathrm{op}} \\
&\overset{(99)}{\leq} 4\mu_{p+1}.
\end{aligned}
\tag{104}
$$

Since $\mathbb{P}(\Omega_{1}\cap\Omega_{2})>1-\delta$, we then get (87).
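For completeness, the arithmetic behind the constant in (104) and behind the probability bound can be sketched as follows. The bound $\mathbb{P}(\Omega_{1}^{c})\leq\delta/3$ is an assumption made here for illustration: it is what $\|\boldsymbol{\Delta}_{s}\|_{\mathrm{op}}=o_{d,\mathbb{P}}(1)$ yields once $d$ is large enough, but the proof does not state the value $\delta/3$ explicitly.

```latex
\begin{align*}
\lambda_{N(p)+1}\left(K_{\text{main}}\right)+\|K_{\text{residual}}\|_{\mathrm{op}}
  &\leq \frac{25}{16}\mu_{p+1}+\mu_{p+1}
   = \frac{41}{16}\mu_{p+1}\leq 4\mu_{p+1}, \\
\mathbb{P}\left(\Omega_{1}\cap\Omega_{2}\right)
  &\geq 1-\mathbb{P}\left(\Omega_{1}^{c}\right)-\mathbb{P}\left(\Omega_{2}^{c}\right)
   > 1-\frac{\delta}{3}-\frac{2\delta}{3}=1-\delta.
\end{align*}
```

The first line combines (99) with the definition of $\Omega_{2}$; the second is a union bound that uses (102) for $\mathbb{P}(\Omega_{2}^{c})<2\delta/3$.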

Next, we consider the case where $\gamma\in(0,1)$. Recall that we have $p=0$ and $\gamma\in(p,p+1)$. For any integer $q=1,2,\cdots$, if we denote $\delta_{1}=(1-\gamma)/2$, then we have $n\ll d^{q-\delta_{1}}$. Hence, from Proposition C.10, we have

$$
\lim_{d,n\rightarrow\infty}\mathbb{E}\left[\left\|\boldsymbol{Y}_{q}\boldsymbol{D}_{q}\boldsymbol{Y}_{q}^{\tau}-\mu_{q}N(d,q)\mathbf{I}_{n}\right\|_{\mathrm{op}}\right]=0,\quad q=1,2,\cdots.
\tag{105}
$$

Hence, Equation (95) can be rewritten as

$$
\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})=\frac{1}{n}\boldsymbol{Y}_{0}\boldsymbol{D}_{0}\boldsymbol{Y}_{0}^{\tau}+\frac{\kappa_{1}}{n}\left(\mathbf{I}_{n}+\boldsymbol{\Delta}_{h}\right),
\tag{106}
$$

where $\kappa_{q}:=\sum_{k=q}^{\infty}\mu_{k}N(d,k)\leq 1$, and $\left\|\boldsymbol{\Delta}_{h}\right\|_{\mathrm{op}}=o_{d,\mathbb{P}}(1)$. Similarly to the case $\gamma>1$, we can get

$$
\widehat{\lambda}_{N(0)+1}<\frac{4}{n},\quad\mu_{0}/4<\widehat{\lambda}_{N(0)},
\tag{107}
$$

with probability at least $1-\delta$.
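The appeal to the case $\gamma>1$ hides a short Weyl-type computation, sketched below. The sketch rests on assumptions not restated in this part of the proof: that $N(0)=N(d,0)=1$, so that $\frac{1}{n}\boldsymbol{Y}_{0}\boldsymbol{D}_{0}\boldsymbol{Y}_{0}^{\tau}$ has rank one; that its only nonzero eigenvalue equals $\mu_{0}$ (the degree-zero harmonic being the constant function $1$); and that $\|\boldsymbol{\Delta}_{h}\|_{\mathrm{op}}\leq 1$ on the event considered.

```latex
\begin{align*}
\widehat{\lambda}_{N(0)+1}
  &\leq \lambda_{2}\left(\frac{1}{n}\boldsymbol{Y}_{0}\boldsymbol{D}_{0}\boldsymbol{Y}_{0}^{\tau}\right)
       +\frac{\kappa_{1}}{n}\left\|\mathbf{I}_{n}+\boldsymbol{\Delta}_{h}\right\|_{\mathrm{op}}
   \leq 0+\frac{\kappa_{1}}{n}\left(1+\|\boldsymbol{\Delta}_{h}\|_{\mathrm{op}}\right)
   \leq \frac{2}{n}<\frac{4}{n}, \\
\widehat{\lambda}_{N(0)}
  &\geq \lambda_{1}\left(\frac{1}{n}\boldsymbol{Y}_{0}\boldsymbol{D}_{0}\boldsymbol{Y}_{0}^{\tau}\right)
       -\frac{\kappa_{1}}{n}\left\|\mathbf{I}_{n}+\boldsymbol{\Delta}_{h}\right\|_{\mathrm{op}}
   \geq \mu_{0}-\frac{2}{n}.
\end{align*}
```

Both lines use Weyl's inequality on the decomposition (106) together with $\kappa_{1}\leq 1$. The lower bound exceeds $\mu_{0}/4$ as soon as $n>8/(3\mu_{0})$, which holds for large $n$ provided $\mu_{0}$ stays bounded away from zero.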

Finally, let us consider the case where $\gamma=1$. For any integer $q=2,3,\cdots$, if we denote $\delta_{1}=1/2$, then we have $n\ll d^{q-\delta_{1}}$. Hence, from Proposition C.10, we have

$$
\lim_{d,n\rightarrow\infty}\mathbb{E}\left[\left\|\boldsymbol{Y}_{q}\boldsymbol{D}_{q}\boldsymbol{Y}_{q}^{\tau}-\mu_{q}N(d,q)\mathbf{I}_{n}\right\|_{\mathrm{op}}\right]=0,\quad q=2,4,6,\cdots.
\tag{108}
$$

Hence, Equation (95) can be rewritten as

$$
\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})=\frac{1}{n}\boldsymbol{Y}_{0}\boldsymbol{D}_{0}\boldsymbol{Y}_{0}^{\tau}+\frac{1}{n}\boldsymbol{Y}_{1}\boldsymbol{D}_{1}\boldsymbol{Y}_{1}^{\tau}+\frac{\kappa_{2}}{n}\left(\mathbf{I}_{n}+\boldsymbol{\Delta}_{h}\right).
\tag{109}
$$

Furthermore, from Proposition C.11, for any $\delta$, there exist two constants $\mathfrak{C}^{\prime\prime}$ and $\mathfrak{C}_{1}$ depending only on $c_{1}$, $c_{2}$, and $\delta$, such that when $d\geq\mathfrak{C}^{\prime\prime}$, we have

$$
\mathbb{P}\left(\frac{1}{n}\left\|\boldsymbol{Y}_{1}\boldsymbol{D}_{1}\boldsymbol{Y}_{1}^{\tau}\right\|_{\mathrm{op}}\geq\mathfrak{C}_{1}\mu_{1}\right)\leq\frac{\delta}{2}.
$$

For any given $\delta>0$, let $d\geq\mathfrak{C}=\max\{\mathfrak{C}^{\prime},\mathfrak{C}^{\prime\prime}\}$, where $\mathfrak{C}^{\prime}$ is the constant (depending only on $c_{1}$ and $c_{2}$) introduced in Lemma B.2 and $\mathfrak{C}^{\prime\prime}$ is the constant (depending only on $c_{1}$, $c_{2}$, and $\delta$) introduced in the previous paragraph.

Since $n\asymp d$, from Lemma B.2, we have $\mu_{1}\leq\mathfrak{C}_{2}c_{2}n^{-1}$. Similarly to the case $\gamma>1$, we can get

$$
\widehat{\lambda}_{N(0)+1}<\frac{\mathfrak{C}_{3}}{n},\quad\mu_{0}/4<\widehat{\lambda}_{N(0)},
\tag{110}
$$

with probability at least $1-\delta$, where $\mathfrak{C}_{3}$ is a constant depending only on $c_{1}$, $c_{2}$, and $\delta$. $\square$

Proof of Lemma C.6: The proof is a simple modification of the proof of Lemma E.3:

$$
\begin{aligned}
\mathbf{B}_{t}^{2} &= \frac{2}{n}\left\|e^{-t\Sigma}U^{\tau}f_{\star}(\boldsymbol{X})\right\|^{2} \overset{(133)}{\leq} \frac{2}{n}\sum_{i=1}^{J}\frac{[U^{\tau}f_{\star}(\boldsymbol{X})]_{i}^{2}}{(t\widehat{\lambda}_{i})^{2}}+\frac{1}{n}\sum_{i=J+1}^{n}z_{i}^{2} \\
&= \frac{1}{nt^{2}}\sum_{i=1}^{J}\frac{z_{i}^{2}}{\widehat{\lambda}_{i}^{2}}+\frac{1}{n}\sum_{i=J+1}^{n}z_{i}^{2} = \frac{1}{t^{2}}\sum_{i=1}^{J}\frac{\widehat{\lambda}_{i}[\Psi^{*}a]_{i}^{2}}{\widehat{\lambda}_{i}^{2}}+\sum_{i=J+1}^{n}\widehat{\lambda}_{i}[\Psi^{*}a]_{i}^{2} \\
&\leq \left(\frac{1}{t^{2}\widehat{\lambda}_{J}}+\widehat{\lambda}_{J+1}\right)\|\Psi^{*}a\|_{2}^{2} \leq \frac{1}{t^{2}\widehat{\lambda}_{J}}+\widehat{\lambda}_{J+1}.
\end{aligned}
\tag{111}
$$

$\square$

Proof of Lemma C.7: Let $H=\boldsymbol{I}-e^{-t\Sigma}$ and $P=\sqrt{\frac{2}{n}}H$. Then $\boldsymbol{V}_{t}=\boldsymbol{e}^{\tau}UP^{2}U^{\tau}\boldsymbol{e}\overset{d}{=}\boldsymbol{e}^{\tau}P^{2}\boldsymbol{e}$, where $\boldsymbol{e}=(\boldsymbol{e}_{1},\cdots,\boldsymbol{e}_{n})^{\tau}$ and $\boldsymbol{e}_{i}=\boldsymbol{y}_{i}-f_{\star}(X_{i})\sim N(0,\sigma^{2})$ for any $1\leq i\leq n$.
Applying Lemma F.10 with $A=P^{2}$, $\delta=\delta_{2}$, and $Q=\sum_{i,j=1}^{n}a_{ij}\boldsymbol{e}_{i}\boldsymbol{e}_{j}\overset{d}{=}\boldsymbol{V}_{t}$, we then have that

$$
|Q-\mathbb{E}[Q]|\leq\delta_{2}, \tag{112}
$$

holds with probability at least $1-\exp\left(-\mathfrak{c}_{1}\min\left\{\frac{\delta_{2}}{\|A\|_{op}},\frac{\delta_{2}^{2}}{\|A\|_{F}^{2}}\right\}\right)$, where $\mathfrak{c}_{1}$ is a constant depending only on $\sigma$, and the randomness comes from the noise term $\boldsymbol{e}$.

It is easy to verify that $\|H\|_{op}\leq 1$, $\|A\|_{op}\leq\frac{2}{n}$, and

$$
\operatorname{tr}(H^{2})\overset{(133)}{\leq}\sum_{j}\left(1\wedge t\widehat{\lambda}_{j}\right)^{2}\leq t^{2}\left(\frac{J}{t^{2}}+\widehat{\lambda}_{J+1}\sum_{j=J+1}^{n}\widehat{\lambda}_{j}\right)\leq t^{2}\left(\frac{J}{t^{2}}+\widehat{\lambda}_{J+1}\right). \tag{113}
$$

Thus, we have

$$
\begin{align}
\|A\|_{F}^{2} &= \operatorname{tr}(P^{4})=\frac{4}{n^{2}}\operatorname{tr}(H^{4})\leq\frac{4}{n^{2}}\operatorname{tr}(H^{2})\leq\frac{4t^{2}}{n^{2}}\left(\frac{J}{t^{2}}+\widehat{\lambda}_{J+1}\right), \tag{114} \\
\mathbb{E}[Q] &= \mathbb{E}[\mathbf{V}_{t}]=\frac{2\sigma^{2}}{n}\operatorname{tr}\left(\left(\mathbf{I}-e^{-t\Sigma}\right)^{2}\right)\leq\frac{2\sigma^{2}}{n}t^{2}\left(\frac{J}{t^{2}}+\widehat{\lambda}_{J+1}\right). \notag
\end{align}
$$

From (112), we know that there exists an absolute constant $C$ such that

$$
\mathbf{V}_{t}\leq\mathbb{E}[Q]+\delta_{2}\leq\frac{2\sigma^{2}}{n}t^{2}\left(\frac{J}{t^{2}}+\widehat{\lambda}_{J+1}\right)+\delta_{2}, \tag{115}
$$

with probability at least $1-\exp\left(-C\min\left\{\frac{n\delta_{2}}{2},\frac{n^{2}\delta_{2}^{2}}{4t^{2}\left(\frac{J}{t^{2}}+\widehat{\lambda}_{J+1}\right)}\right\}\right)$. $\square$
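As a numerical sanity check of the trace bound (113), the following Python sketch evaluates the quantities in that chain for a diagonal $\Sigma$. The eigenvalue sequence, the value of $t$, and the helper name `trace_bounds` are illustrative assumptions and do not come from the paper. The last step of (113) appears to rely on $\sum_{j}\widehat{\lambda}_{j}\leq 1$, which the toy spectrum satisfies by construction.

```python
import math

def trace_bounds(eigs, t, J):
    """Return (tr(H^2), sum_j (1 ∧ t*lam_j)^2, J + t^2 * lam_{J+1} * tail sum).

    eigs is a hypothetical list of empirical eigenvalues; H = I - exp(-t * Sigma)
    with Sigma = diag(eigs) sorted in decreasing order.
    """
    lam = sorted(eigs, reverse=True)
    tr_h2 = sum((1.0 - math.exp(-t * l)) ** 2 for l in lam)
    middle = sum(min(1.0, t * l) ** 2 for l in lam)
    tail = sum(lam[J:])                        # sum_{j > J} lam_j
    lam_next = lam[J] if J < len(lam) else 0.0  # lam_{J+1} in 1-based indexing
    rhs = J + t ** 2 * lam_next * tail
    return tr_h2, middle, rhs

# Toy spectrum (assumption): lam_j = 1 / (j (j + 1)), whose sum is below 1.
n = 200
eigs = [1.0 / (j * (j + 1)) for j in range(1, n + 1)]
t = 50.0
checks = []
for J in range(0, n + 1):
    tr_h2, middle, rhs = trace_bounds(eigs, t, J)
    lam_next = eigs[J] if J < n else 0.0
    final = J + t ** 2 * lam_next              # t^2 (J / t^2 + lam_{J+1})
    checks.append(tr_h2 <= middle + 1e-12 <= rhs + 2e-12 <= final + 3e-12)
print(all(checks))
```

The bound holds for every cut-off $J$ in this example, so in practice one would pick the $J$ that makes $J+t^{2}\widehat{\lambda}_{J+1}$ smallest.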

Appendix D Properties of the inner product kernels

D.1 Mercer decomposition of the inner product kernels on the sphere

For inner product kernels on the sphere, Mercer’s decomposition (4) can be expressed in the basis of spherical harmonics [68, 69]. This allows for the eigenvalues of such kernels to be computed. In this subsection, we will briefly review the Mercer decomposition corresponding to inner product kernels on the sphere. See [28, 11] for references.

Let $\rho_{\mathcal{X}}$ be the uniform measure on $\mathbb{S}^{d}$, and assume that $K^{\mathtt{in}}$ is an inner product kernel defined on $\mathbb{S}^{d}$; that is, there exists a function $\Phi:[-1,1]\to\mathbb{R}$ such that for any $\boldsymbol{x},\boldsymbol{x}^{\prime}\in\mathbb{S}^{d}$, we have $K^{\mathtt{in}}(\boldsymbol{x},\boldsymbol{x}^{\prime})=\Phi(\left\langle\boldsymbol{x},\boldsymbol{x}^{\prime}\right\rangle)$.

Similar to (4), Mercer’s decomposition for the inner product kernel $K^{\mathtt{in}}$ is given in the basis of spherical harmonics:

$$
K^{\mathtt{in}}(\boldsymbol{x},\boldsymbol{x}^{\prime})=\sum_{k=0}^{\infty}\mu_{k}\sum_{j=1}^{N(d,k)}Y_{k,j}(\boldsymbol{x})Y_{k,j}\left(\boldsymbol{x}^{\prime}\right), \tag{116}
$$

where $Y_{k,j}$ for $j=1,\cdots,N(d,k)$ are spherical harmonic polynomials of degree $k$, and the $\mu_{k}$’s are the eigenvalues of $K^{\mathtt{in}}$ with multiplicity $N(d,k)$, where $N(d,0)=1$ and $N(d,k)=\frac{2k+d-1}{k}\cdot\frac{(k+d-2)!}{(d-1)!(k-1)!}$ for any $k=1,2,\cdots$.

By known results on spherical harmonics, the eigenvalues $\mu_{k}$ have the following explicit expression [11]:

$$
\mu_{k}=\frac{\omega_{d-1}}{\omega_{d}}\int_{-1}^{1}\Phi(t)P_{k}(t)\left(1-t^{2}\right)^{(d-2)/2}~\mathsf{d}t, \tag{117}
$$

where $P_{k}$ is the $k$-th Legendre polynomial in dimension $d+1$, and $\omega_{d}$ denotes the surface area of the sphere $\mathbb{S}^{d}$.
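To make formula (117) concrete, here is a minimal Python sketch that evaluates $\mu_{k}$ numerically. It rests on two convention assumptions that are not spelled out in this section: the Legendre polynomial in dimension $d+1$ is taken to be the Gegenbauer polynomial $C_{k}^{(d-1)/2}$ normalised to equal $1$ at $t=1$, and $\omega_{d}$ is the area of $\mathbb{S}^{d}\subset\mathbb{R}^{d+1}$, so that $\omega_{d-1}/\omega_{d}=\Gamma((d+1)/2)/(\sqrt{\pi}\,\Gamma(d/2))$. The function names `legendre_sphere` and `mu` and the use of Simpson's rule are choices made for illustration only.

```python
import math

def legendre_sphere(k, d, t):
    """k-th Legendre polynomial in dimension d+1, normalised so P_k(1) = 1.

    Assumed convention: ratio of Gegenbauer polynomials with
    alpha = (d - 1) / 2, which needs d >= 2 so that alpha > 0.
    """
    alpha = (d - 1) / 2.0

    def gegenbauer(x):
        c_prev, c_curr = 1.0, 2.0 * alpha * x
        if k == 0:
            return c_prev
        # n C_n = 2x (n + alpha - 1) C_{n-1} - (n + 2 alpha - 2) C_{n-2}
        for n in range(2, k + 1):
            c_prev, c_curr = c_curr, (2.0 * x * (n + alpha - 1.0) * c_curr
                                      - (n + 2.0 * alpha - 2.0) * c_prev) / n
        return c_curr

    return gegenbauer(t) / gegenbauer(1.0)

def mu(k, d, phi, steps=2000):
    """Approximate (117) with composite Simpson's rule on [-1, 1]."""
    ratio = math.gamma((d + 1) / 2.0) / (math.sqrt(math.pi) * math.gamma(d / 2.0))

    def integrand(t):
        weight = (1.0 - t * t) ** ((d - 2) / 2.0)
        return phi(t) * legendre_sphere(k, d, t) * weight

    h = 2.0 / steps  # steps must be even for Simpson's rule
    total = integrand(-1.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(-1.0 + i * h)
    return ratio * total * h / 3.0

# Linear kernel Phi(t) = t on S^4: only the degree-1 eigenvalue is non-zero.
print(mu(0, 4, lambda t: t), mu(1, 4, lambda t: t))
```

For the linear kernel the degree-1 eigenvalue should equal $1/(d+1)$, because the coordinate functions scaled by $\sqrt{d+1}$ are orthonormal under the uniform measure. This gives an independent check of the normalisation $\omega_{d-1}/\omega_{d}$.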

D.2 Maximum value of NTK

The following proposition is a direct result of (10) and (11) in [46].

Proposition D.1.

We have

$$
\max_{\boldsymbol{x}\in\mathbb{S}^{d}}K^{\mathtt{NT}}(\boldsymbol{x},\boldsymbol{x})\leq\kappa, \tag{118}
$$

where $\kappa$ is a constant depending only on the number of hidden layers $L$.

D.3 Calculation of $N(d,k)$

Lemma D.2.

Let $N(d,k)$ be defined as in (8). Then there exist absolute constants $C_{1},C_{2}$ such that for any $k=1,2,3,\cdots$ and any $d$, we have

$$
C_{1}\cdot(2k+d)\frac{(k+d)^{k+d-3/2}}{k^{k+1/2}d^{d-1/2}}\leq N(d,k)\leq C_{2}\cdot(2k+d)\frac{(k+d)^{k+d-3/2}}{k^{k+1/2}d^{d-1/2}}. \tag{119}
$$
Proof.

From Section 1.6 in [28], when $k\geq 2$, we have

$$
\begin{aligned}
N(d,k) &= \frac{2k+d-1}{k(k+d-1)}\cdot\frac{(k+d-1)!}{(d-1)!(k-1)!} \\
&\triangleq \frac{2k+d-1}{k(k+d-1)}\cdot\frac{1}{B(k,d)}.
\end{aligned}
\tag{120}
$$

From Stirling’s approximation we have $x!\sim\sqrt{2\pi}x^{x+1/2}e^{-x}$ (meaning that $\lim_{x\to\infty}x!/(\sqrt{2\pi}x^{x+1/2}e^{-x})=1$). Moreover, we further have

$$
\lim_{x\to\infty}\frac{(x+1)^{x+1/2}}{x^{x+1/2}}=\lim_{x\to\infty}\left(1+\frac{1}{x}\right)^{x}\lim_{x\to\infty}\left(1+\frac{1}{x}\right)^{1/2}=e.
$$

Therefore, when both $k$ and $d$ are large, we have

$$
\begin{aligned}
\frac{1}{B(k,d)} &= \frac{(k+d-1)!}{(d-1)!(k-1)!} \\
&\sim \frac{(k+d-1)^{k+d-1/2}e^{-(k+d-1)}}{\sqrt{2\pi}(d-1)^{d-1/2}e^{-(d-1)}(k-1)^{k-1/2}e^{-(k-1)}} \\
&\sim \frac{(k+d)^{k+d-1/2}}{\sqrt{2\pi}d^{d-1/2}k^{k-1/2}}.
\end{aligned}
\tag{121}
$$

Combining (120) and (121), there exist absolute constants $C_{1},C_{2}$ such that for any $k\geq 2$ and any $d$, we have

$$
C_{1}\cdot(2k+d)\frac{(k+d)^{k+d-3/2}}{k^{k+1/2}d^{d-1/2}}\leq N(d,k)\leq C_{2}\cdot(2k+d)\frac{(k+d)^{k+d-3/2}}{k^{k+1/2}d^{d-1/2}}. \tag{122}
$$

When $k=1$, from Section 1.6 in [28] we have $N(d,1)=d+1$; hence (122) also holds when $k=1$. $\square$
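The two-sided bound of Lemma D.2 can be probed numerically. The sketch below computes $N(d,k)$ exactly from the closed form in Section D.1 and compares it, on a logarithmic scale to avoid overflow, with the Stirling-type expression in (119). The function names and the grid of $(d,k)$ values are illustrative assumptions. A finite grid does not prove the lemma; it only suggests plausible values for $C_{1}$ and $C_{2}$.

```python
import math

def n_dk(d, k):
    """Exact multiplicity N(d, k) from the closed form in Section D.1."""
    if k == 0:
        return 1
    num = (2 * k + d - 1) * math.factorial(k + d - 2)
    den = k * math.factorial(d - 1) * math.factorial(k - 1)
    return num // den  # the quotient is an exact integer

def log_stirling_form(d, k):
    """log of (2k + d) (k + d)^{k + d - 3/2} / (k^{k + 1/2} d^{d - 1/2})."""
    return (math.log(2 * k + d) + (k + d - 1.5) * math.log(k + d)
            - (k + 0.5) * math.log(k) - (d - 0.5) * math.log(d))

def ratio(d, k):
    """N(d, k) divided by the Stirling-type expression in (119)."""
    return math.exp(math.log(n_dk(d, k)) - log_stirling_form(d, k))

# Hypothetical grid; the observed min and max hint at C_1 and C_2.
ratios = [ratio(d, k) for d in range(1, 61) for k in range(1, 61)]
print(min(ratios), max(ratios))
```

When both $k$ and $d$ are large the ratio should approach $1/\sqrt{2\pi}$, the constant left over from Stirling's formula in (121).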

Appendix E Supplementary proofs of Theorem 6.3

E.1 An elementary lemma

Lemma E.1.

Let $\varepsilon_{n}=\min\left\{\varepsilon~\mid~R_{K}(\varepsilon)=\frac{\varepsilon^{2}}{2e\sigma}\right\}$. Then we have

  • i)

    For any $\varepsilon$ satisfying $R_{K}(\varepsilon)\geq\frac{\varepsilon^{2}}{2e\sigma}$, we have $\varepsilon_{n}\geq\varepsilon$.

  • ii)

    For any $\varepsilon$ satisfying $R_{K}(\varepsilon)\leq\frac{\varepsilon^{2}}{2e\sigma}$, we have $\varepsilon_{n}\leq\varepsilon$.

Similarly, let $\widehat{\varepsilon}_{n}=\min\left\{\varepsilon~\mid~\widehat{R}_{K}(\varepsilon)=\frac{\varepsilon^{2}}{2e\sigma}\right\}$. Then we have

  • i)

    For any $\varepsilon>0$, the inequality $\varepsilon\leq\widehat{\varepsilon}_{n}$ holds if the following event occurs:

    $$
    \Omega_{1}(\varepsilon)=\left\{\omega~\big|~\widehat{\mathcal{R}}_{K}\left(\varepsilon\right)\geq\frac{\varepsilon^{2}}{2e\sigma}\right\}. \tag{123}
    $$
  • ii)

    For any $\varepsilon>0$, the inequality $\widehat{\varepsilon}_{n}\leq\varepsilon$ holds if the following event occurs:

    $$
    \Omega_{2}(\varepsilon)=\left\{\omega~\big|~\widehat{\mathcal{R}}_{K}\left(\varepsilon\right)\leq\frac{\varepsilon^{2}}{2e\sigma}\right\}. \tag{124}
    $$
Proof.

It is clear that $R_{K}(\varepsilon)/\varepsilon=\left(\sum_{i}\frac{\lambda_{i}}{\varepsilon^{2}}\wedge 1\right)^{1/2}$ is a non-increasing function of $\varepsilon$ and $\frac{\varepsilon}{2e\sigma}$ is a strictly increasing function of $\varepsilon$.

If $R_{K}(\varepsilon)\geq\frac{\varepsilon^{2}}{2e\sigma}$, then for any $\delta<\varepsilon$, we have

$$
\frac{R_{K}(\delta)}{\delta}\geq\frac{R_{K}(\varepsilon)}{\varepsilon}\geq\frac{\varepsilon}{2e\sigma}>\frac{\delta}{2e\sigma}. \tag{125}
$$

Hence no $\delta<\varepsilon$ satisfies $R_{K}(\delta)=\frac{\delta^{2}}{2e\sigma}$, and thus we have $\varepsilon_{n}\geq\varepsilon$.

If $R_{K}(\varepsilon)\leq\frac{\varepsilon^{2}}{2e\sigma}$, then for any $\delta>\varepsilon$, we have

$$
\frac{R_{K}(\delta)}{\delta}\leq\frac{R_{K}(\varepsilon)}{\varepsilon}\leq\frac{\varepsilon}{2e\sigma}<\frac{\delta}{2e\sigma}. \tag{126}
$$

Hence no $\delta>\varepsilon$ satisfies $R_{K}(\delta)=\frac{\delta^{2}}{2e\sigma}$, and thus we have $\varepsilon_{n}\leq\varepsilon$.

The empirical version can be proved similarly. $\square$

Remark E.2.

Lemma E.1 provides an easy way to bound the Mendelson complexity $\varepsilon_{n}$ and the empirical Mendelson complexity $\widehat{\varepsilon}_{n}$. For example, if we can find $\varepsilon_{low}$ and $\varepsilon_{upp}$ satisfying

$$
R_{K}(\varepsilon_{low})\geq\frac{\varepsilon_{low}^{2}}{2e\sigma}~\mbox{ and }~R_{K}(\varepsilon_{upp})\leq\frac{\varepsilon_{upp}^{2}}{2e\sigma}, \tag{127}
$$

then we have $\varepsilon_{low} \le \varepsilon_n \le \varepsilon_{upp}$.
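The bracketing idea in Remark E.2 can be tried numerically. The following is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the empirical form $\widehat{R}_K(\varepsilon)=\sqrt{\frac{1}{n}\sum_j \min(\widehat{\lambda}_j,\varepsilon^2)}$ (suggested by the first equality in (140); the definition (28) is not reproduced in this appendix), the threshold $\varepsilon^2/(2e\sigma)$ from (127), and a toy spectrum. The names `empirical_R` and `mendelson_eps` are hypothetical.

```python
import numpy as np

def empirical_R(eps, lam):
    """Assumed form: sqrt( (1/n) * sum_j min(lam_j, eps^2) )."""
    lam = np.asarray(lam, dtype=float)
    return np.sqrt(np.mean(np.minimum(lam, eps ** 2)))

def mendelson_eps(lam, sigma, lo=1e-12, hi=1e6, iters=200):
    """Bisection on a log scale for the crossing R(eps) = eps^2 / (2 e sigma).

    R(eps)/eps is non-increasing and eps/(2 e sigma) is strictly increasing,
    so the crossing point is unique.
    """
    c = 2.0 * np.e * sigma
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if empirical_R(mid, lam) / mid > mid / c:
            lo = mid      # still below the crossing
        else:
            hi = mid      # at or above the crossing
    return hi

lam = 1.0 / np.arange(1, 201) ** 2   # toy spectrum lam_j = j^{-2} (hypothetical)
sigma = 1.0
eps_hat = mendelson_eps(lam, sigma)
eps_low, eps_upp = 0.5 * eps_hat, 2.0 * eps_hat   # a bracket as in (127)
```

In this toy setting `eps_low` and `eps_upp` satisfy the two inequalities in (127) with the assumed empirical quantity, so they bracket the crossing point from below and above.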

E.2 Detailed proofs of the Lemmas A.1, A.2, A.3 and A.4

The purpose of these proofs is to show that the constants appearing in Lemmas A.1, A.2, A.3 and A.4 are absolute constants. We include them here for completeness.

E.2.1 Proof of Lemma A.1

Proof.

From (5) we have

$$
\begin{aligned}
f_t(\boldsymbol{X}) - f_\star(\boldsymbol{X}) &= \left(\mathbf{I} - e^{-\frac{1}{n} t K(\boldsymbol{X},\boldsymbol{X})}\right)\boldsymbol{y} - f_\star(\boldsymbol{X}) \\
&= -e^{-\frac{1}{n} t K(\boldsymbol{X},\boldsymbol{X})} f_\star(\boldsymbol{X}) + \left(\mathbf{I} - e^{-\frac{1}{n} t K(\boldsymbol{X},\boldsymbol{X})}\right)(\boldsymbol{y} - f_\star(\boldsymbol{X})).
\end{aligned} \tag{128}
$$

Let $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X}) = U\Sigma U^{\tau}$, where $U$ is an orthogonal matrix and $\Sigma = \mathrm{diag}\{\widehat{\lambda}_1, \cdots, \widehat{\lambda}_n\}$. Let $g_t = U^{\tau} f_t(\boldsymbol{X})$, $g^* = U^{\tau} f_\star(\boldsymbol{X})$, and $\boldsymbol{e} = \boldsymbol{y} - f_\star(\boldsymbol{X})$; then we have

$$
\begin{aligned}
\left\|f_t - f_\star\right\|_n^2 &= \frac{1}{n}\left\|g_t - g^*\right\|^2 = \frac{1}{n}\left\| -e^{-t\Sigma} g^* + \left(\mathbf{I} - e^{-t\Sigma}\right) U^{\tau}\boldsymbol{e}\right\|^2 \\
&\le \frac{2}{n}\left\|e^{-t\Sigma} g^*\right\|^2 + \frac{2}{n}\left\|\left(\mathbf{I} - e^{-t\Sigma}\right) U^{\tau}\boldsymbol{e}\right\|^2 := \mathbf{B}_t^2 + \mathbf{V}_t.
\end{aligned} \tag{129}
$$
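The decomposition in (128) and (129) can be checked numerically. The sketch below is illustrative only: the kernel $\exp(\langle x, x'\rangle)$ on the unit sphere, the toy target, and all variable names are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, t, sigma = 40, 5, 3.0, 0.5

# Toy design on the unit sphere with the inner-product kernel exp(<x, x'>).
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
K = np.exp(X @ X.T)

f_star = K @ rng.standard_normal(n) / n          # values f_star(X) of a toy target
y = f_star + sigma * rng.standard_normal(n)      # noisy responses

# Eigendecomposition (1/n) K = U Sigma U^T.
lam, U = np.linalg.eigh(K / n)
lam = np.clip(lam, 0.0, None)                    # guard against round-off
shrink = np.exp(-t * lam)                        # diagonal of e^{-t Sigma}

f_t = U @ ((1.0 - shrink) * (U.T @ y))           # (I - e^{-tK/n}) y, up to round-off
g_star = U.T @ f_star
noise = U.T @ (y - f_star)

lhs = np.mean((f_t - f_star) ** 2)                    # ||f_t - f_star||_n^2
B2 = 2.0 / n * np.sum((shrink * g_star) ** 2)         # bias term B_t^2
V = 2.0 / n * np.sum(((1.0 - shrink) * noise) ** 2)   # variance term V_t
```

Here `lhs` is bounded by `B2 + V`, which is just the elementary inequality $\|a+b\|^2 \le 2\|a\|^2 + 2\|b\|^2$ applied in the eigenbasis.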

We then bound the terms $\mathbf{B}_t^2$ and $\mathbf{V}_t$ based on the proof of Theorem 1 in [64]. We need the following two lemmas:

Lemma E.3.

For any $t > 0$, we have

$$
\mathbf{B}_t^2 \le \frac{1}{t}. \tag{130}
$$
Proof.

Deferred to the end of this subsection.

Recall that $\widehat{T} = (\widehat{\varepsilon}_n)^{-2}$, where $\widehat{\varepsilon}_n$ is the empirical Mendelson complexity defined by (28).

Lemma E.4.

There exists an absolute constant $C$ such that, for $\widehat{T} = (\widehat{\varepsilon}_n)^{-2}$, we have

$$
\mathbf{V}_{\widehat{T}} \le \frac{\widehat{\varepsilon}_n^2}{e^2\sigma^2}, \tag{131}
$$

with probability at least $1 - \exp\left(-Cn\widehat{\varepsilon}_n^2\right)$, where the randomness comes from the noise term $\boldsymbol{e}$.

Proof.

Deferred to the end of this subsection.

From the above lemmas, when $t = \widehat{T}$ (which is $(\widehat{\varepsilon}_n)^{-2}$), there exist absolute constants $C_2$ and $C_3$ such that we have

$$
\left\|f_{\widehat{T}} - f_\star\right\|_n^2 \le \widehat{\varepsilon}_n^2 + \frac{\widehat{\varepsilon}_n^2}{e^2\sigma^2} \le \frac{\sigma^2 + 1}{\sigma^2}\,\widehat{\varepsilon}_n^2, \tag{132}
$$

with probability at least $1 - C_2\exp\left(-C_3 n\widehat{\varepsilon}_n^2\right)$. $\square$
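For readers who want the intermediate steps behind (132) spelled out, the chain below is one way to write them. It is a sketch that only combines (129), Lemma E.3 at $t = \widehat{T}$, and Lemma E.4, together with the elementary fact $e^{-2} \le 1$.

```latex
\begin{align*}
\left\|f_{\widehat{T}} - f_\star\right\|_n^2
  &\le \mathbf{B}_{\widehat{T}}^2 + \mathbf{V}_{\widehat{T}}
   \le \frac{1}{\widehat{T}} + \frac{\widehat{\varepsilon}_n^2}{e^2\sigma^2}
   = \widehat{\varepsilon}_n^2 + \frac{\widehat{\varepsilon}_n^2}{e^2\sigma^2} \\
  &\le \widehat{\varepsilon}_n^2 + \frac{\widehat{\varepsilon}_n^2}{\sigma^2}
   = \frac{\sigma^2 + 1}{\sigma^2}\,\widehat{\varepsilon}_n^2 .
\end{align*}
```

The bound on $\mathbf{B}_{\widehat{T}}^2$ is deterministic, so the probability statement is inherited from the bound on $\mathbf{V}_{\widehat{T}}$.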

Proof of Lemma E.3: For any $t > 0$ and $x > 0$, we have the following inequalities:

$$
e^{-tx} \le \frac{1}{tx} \text{ and } (1 \wedge tx)/2 \le 1 - e^{-tx} \le 1 \wedge tx. \tag{133}
$$

Define

$$
\begin{aligned}
\Phi_{\boldsymbol{X}} : \ell^2 &\rightarrow \mathbb{R}^n, \\
(a_j) &\rightarrow \Big(\sum_j a_j\phi_j(x_1), \cdots, \sum_j a_j\phi_j(x_n)\Big)^{\tau}.
\end{aligned} \tag{134}
$$

Similarly, we define a (diagonal) linear operator $D : \ell^2 \rightarrow \ell^2$ with entries $[D]_{jk} = \lambda_j\delta_{jk}$. Then we have $f_\star(\boldsymbol{X}) = \Phi_{\boldsymbol{X}} D^{1/2} a$ for some sequence $a \in \ell^2$. By Mercer's decomposition, we have

$$
\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X}) = U\Sigma U^{\tau} = \frac{1}{n}\Phi_{\boldsymbol{X}} D \Phi_{\boldsymbol{X}}^{\tau}, \tag{135}
$$

and hence there exists an operator $\Psi : \mathbb{R}^n \rightarrow \ell^2$ such that

$$
\frac{1}{\sqrt{n}}\Phi_{\boldsymbol{X}} D^{1/2} = U\Sigma^{1/2}\Psi^{*} \text{ and } \Psi^{*}\circ\Psi = I_n. \tag{136}
$$

Denote

$$
(z_1, \cdots, z_n)^{\tau} = U^{\tau} f_\star(\boldsymbol{X}) = U^{\tau}\Phi_{\boldsymbol{X}} D^{1/2} a = \sqrt{n}\,U^{\tau}U\Sigma^{1/2}\Psi^{*}a = \sqrt{n}\,\Sigma^{1/2}\Psi^{*}a, \tag{137}
$$

then from (133) we have

$$
\mathbf{B}_t^2 \le \frac{2}{n}\sum_{i=1}^{n}\frac{[U^{\tau} f_\star(\boldsymbol{X})]_i^2}{2t\widehat{\lambda}_i} = \frac{1}{nt}\sum_{i=1}^{n}\frac{z_i^2}{\widehat{\lambda}_i} = \frac{1}{t}\sum_{i=1}^{n}\frac{\widehat{\lambda}_i[\Psi^{*}a]_i^2}{\widehat{\lambda}_i} = \frac{1}{t}\|\Psi^{*}a\|_2^2 \le \frac{1}{t}. \tag{138}
$$

$\square$
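The bound $\mathbf{B}_t^2 \le 1/t$ can be sanity-checked in a finite-dimensional special case. The sketch below assumes a target of the form $f_\star = \sum_i \alpha_i K(x_i, \cdot)$ normalized so that $\alpha^{\tau} K(\boldsymbol{X},\boldsymbol{X})\alpha = 1$ (unit RKHS norm), and uses the toy kernel $\exp(\langle x, x'\rangle)$; both choices, and the helper name `bias_sq`, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 4

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # points on the unit sphere
K = np.exp(X @ X.T)                                # toy kernel exp(<x, x'>)

alpha = rng.standard_normal(n)
alpha /= np.sqrt(alpha @ K @ alpha)                # RKHS norm of f_star equals 1
f_star = K @ alpha                                 # f_star evaluated at the sample

lam, U = np.linalg.eigh(K / n)                     # (1/n) K = U Sigma U^T
lam = np.clip(lam, 0.0, None)                      # guard against round-off
g_star = U.T @ f_star

def bias_sq(t):
    """B_t^2 = (2/n) * || e^{-t Sigma} g_star ||^2, as in (129)."""
    return 2.0 / n * np.sum((np.exp(-t * lam) * g_star) ** 2)

ts = np.logspace(-2, 4, 25)
scaled = np.array([t * bias_sq(t) for t in ts])    # Lemma E.3 says each entry is <= 1
```

Every entry of `scaled` should stay below 1 across the grid of stopping times, in line with (130).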

Proof of Lemma E.4: Let $H = \left(\mathbf{I} - e^{-\widehat{T}\Sigma}\right)$ and $P = \sqrt{\frac{2}{n}}H$. Then, $\mathbf{V}_{\widehat{T}} = \boldsymbol{e}^{\tau}UP^2U^{\tau}\boldsymbol{e} \overset{d}{=} \boldsymbol{e}^{\tau}P^2\boldsymbol{e}$, where $\boldsymbol{e} = (\boldsymbol{e}_1, \cdots, \boldsymbol{e}_n)^{\tau}$ and $\boldsymbol{e}_i = \boldsymbol{y}_i - f_\star(X_i) \sim N(0, \sigma^2)$ for any $1 \le i \le n$.
Applying Lemma F.10 with $A = P^2$, $\delta = \widehat{\varepsilon}_n^2/(2e^2\sigma^2)$, and $Q = \sum_{i,j=1}^{n} a_{ij}\boldsymbol{e}_i\boldsymbol{e}_j \overset{d}{=} \mathbf{V}_{\widehat{T}}$, we then have that

$$
|Q - \mathbb{E}[Q]| \le \frac{\widehat{\varepsilon}_n^2}{2e^2\sigma^2} \tag{139}
$$

holds with probability at least $1 - \exp\left(-\mathfrak{c}_1\min\left\{\frac{\widehat{\varepsilon}_n^2}{2e^2\sigma^2\|A\|_{op}}, \frac{\widehat{\varepsilon}_n^4}{4e^4\sigma^4\|A\|_F^2}\right\}\right)$, where $\mathfrak{c}_1$ is a constant depending only on $\sigma$, and the randomness comes from the noise term $\boldsymbol{e}$.

It is easy to verify that $\|H\|_{op} \le 1$, $\|A\|_{op} \le \frac{2}{n}$, and

$$
tr(H) \overset{(133)}{\le} \sum_j\left(1 \wedge \widehat{T}\widehat{\lambda}_j\right) = n\widehat{T}\left(\widehat{\mathcal{R}}_K\big(1/\sqrt{\widehat{T}}\big)\right)^2 = \frac{n\widehat{\varepsilon}_n^2}{4e^2\sigma^4}. \tag{140}
$$

Thus, we have

$$
\begin{aligned}
\|A\|_F^2 &= tr(P^4) = \frac{4}{n^2}tr(H^4) \le \frac{4}{n^2}tr(H) \le \frac{\widehat{\varepsilon}_n^2}{e^2\sigma^4 n}, \\
\mathbb{E}[Q] &= \mathbb{E}[\mathbf{V}_{\widehat{T}}] = \frac{2\sigma^2}{n}tr\left(\left(\mathbf{I} - e^{-\widehat{T}\Sigma}\right)^2\right) \le \frac{2\sigma^2}{n}tr\left(\mathbf{I} - e^{-\widehat{T}\Sigma}\right) \le \frac{\widehat{\varepsilon}_n^2}{2e^2\sigma^2}.
\end{aligned} \tag{141}
$$

From (139), we know that there exists an absolute constant $C$ such that we have

$$
\mathbf{V}_{\widehat{T}} \le \mathbb{E}[Q] + \frac{\widehat{\varepsilon}_n^2}{2e^2\sigma^2} \le \frac{\widehat{\varepsilon}_n^2}{e^2\sigma^2}, \tag{142}
$$

with probability at least $1 - \exp\left(-Cn\widehat{\varepsilon}_n^2\right)$. $\square$
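The trace and norm relations used in this proof are easy to check on a toy spectrum. The sketch below works directly in the eigenbasis (which is legitimate because $U^{\tau}\boldsymbol{e} \overset{d}{=} \boldsymbol{e}$ for Gaussian noise); the eigenvalues $\widehat{\lambda}_j = j^{-2}$, the value of $\widehat{T}$, and all variable names are hypothetical, and the Monte Carlo step only illustrates $\mathbb{E}[Q] = \frac{2\sigma^2}{n}tr(H^2)$, not the concentration bound of Lemma F.10.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, T = 50, 0.7, 25.0

lam = 1.0 / np.arange(1, n + 1) ** 2        # toy eigenvalues of (1/n) K (hypothetical)
h = 1.0 - np.exp(-T * lam)                  # diagonal of H = I - e^{-T Sigma}
a = 2.0 / n * h ** 2                        # diagonal of A = P^2 = (2/n) H^2

tr_H = h.sum()
tr_H4 = np.sum(h ** 4)
cap = np.sum(np.minimum(1.0, T * lam))      # sum_j min(1, T * lam_j)

A_op = a.max()                              # operator norm of the diagonal matrix A
A_F2 = np.sum(a ** 2)                       # squared Frobenius norm, equals tr(P^4)
EQ = sigma ** 2 * a.sum()                   # E[Q] = (2 sigma^2 / n) tr(H^2)

# Monte Carlo estimate of E[Q] with Gaussian noise e_i ~ N(0, sigma^2).
e = sigma * rng.standard_normal((20000, n))
Q = (e ** 2) @ a                            # one draw of Q per row
```

Since the entries of `h` lie in $[0, 1)$, `tr_H4` cannot exceed `tr_H`, which is the step $tr(H^4) \le tr(H)$ used in (141).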

E.2.2 Proof of Lemma A.2

Proof.

For any $M > 0$ and any $g \in M\mathcal{B}$, it is clear that we have

$$
\|g\|_{\infty}^2 = \|\left\langle g, K(x, \cdot)\right\rangle\|_{\infty}^2 \le \|g\|_{\mathcal{H}}^2\sup_{x\in\mathcal{X}}K(x,x) \le M^2\kappa.
$$

Thus, if we choose $\varepsilon = \varepsilon_n$ in Lemma F.12, then we have

$$
Q_n(\varepsilon_n) \overset{\text{Lemma F.12}}{\le} \sqrt{2}R_K(\varepsilon_n) \overset{(27)}{=} \frac{\sqrt{2}\varepsilon_n^2}{2e\sigma},
$$

and from Lemma F.12, we have

$$
\left|\|f\|_n^2 - \|f\|_{L^2}^2\right| \le \frac{1}{2}\|f\|_{L^2}^2 + C_1 M^2\kappa\varepsilon^2 \quad \text{ for all } f \in M\mathcal{B},
$$

with probability at least $1 - C_2e^{-C_3n\varepsilon^2}$, where $C_1$, $C_2$, and $C_3$ are absolute constants. $\square$
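The sup-norm bound $\|g\|_{\infty}^2 \le M^2\kappa$ at the start of this proof follows from the reproducing property and Cauchy-Schwarz, and can be illustrated numerically. The sketch below assumes the toy kernel $\exp(\langle x, x'\rangle)$ on the unit sphere (so that $\kappa = \sup_x K(x,x) = e$) and a function $g = \sum_i \alpha_i K(x_i, \cdot)$ rescaled to have RKHS norm $M$; the helper `unit_rows` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, M = 25, 4, 2.0

def unit_rows(Z):
    """Project each row onto the unit sphere."""
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

X = unit_rows(rng.standard_normal((n, d)))        # centers x_1, ..., x_n
K = np.exp(X @ X.T)                                # toy kernel exp(<x, x'>)
kappa = np.e                                       # K(x, x) = e on the unit sphere

alpha = rng.standard_normal(n)
alpha *= M / np.sqrt(alpha @ K @ alpha)            # now ||g||_H = M

X_test = unit_rows(rng.standard_normal((5000, d)))
g_vals = np.exp(X_test @ X.T) @ alpha              # g(x) = sum_i alpha_i K(x_i, x)
sup_sq = np.max(g_vals ** 2)                       # empirical proxy for ||g||_inf^2
```

The maximum over the test points is only a lower estimate of the true supremum, but it must still respect the bound `M**2 * kappa`.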

E.2.3 Proof of Lemma A.3

Proof.

Before we start the proof, we need the following three lemmas.

Lemma E.5.

Suppose that $w_{i}$ are i.i.d. Rademacher random variables independent of $x_{i}$, and let

\[
\widehat{Z}_{n}(w,t) := \sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}} \left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g(x_{i})\right| \quad\text{and}\quad Z_{n}(w,t) := \mathbb{E}_{x_{1},x_{2},\ldots\sim\mu}\left[\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}} \left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g(x_{i})\right|\right],
\]

where $\|g\|_{n}^{2}=\frac{1}{n}\sum_{j}g(x_{j})^{2}$ and $\mathcal{B}=\left\{g\in\mathcal{H}\mid\|g\|_{\mathcal{H}}\leq 1\right\}$. For any $c>0$, the event

\[
\Omega_{3}(c)=\left\{\begin{aligned}
\widehat{Z}_{n}(w,c\varepsilon_{n}) &\leq \frac{3}{2}Z_{n}(w,2\max\{c,1\}\varepsilon_{n})+100\varepsilon_{n}^{2},\\
\widehat{Z}_{n}(w,c\varepsilon_{n}) &\geq \frac{1}{2}Z_{n}(w,\sqrt{2}c\varepsilon_{n}/\sqrt{5})-\sqrt{4.32}\,c\varepsilon_{n}^{2}-100\varepsilon_{n}^{2}
\end{aligned}\right\} \tag{143}
\]

occurs with probability at least $1-5\exp\{-\min\{c^{2},c^{-2}\}n\varepsilon_{n}^{2}\}$.

Proof.

Deferred to the end of this subsection.

Lemma E.6.

Suppose that $w_{i}$ are i.i.d. Rademacher random variables independent of $x_{i}$, and let

\[
\widehat{\mathcal{Q}}_{n}(t)=\mathbb{E}_{w}[\widehat{Z}_{n}(w,t)],\quad \mathcal{Q}_{n}(t)=\mathbb{E}_{w}[Z_{n}(w,t)].
\]

There exist absolute constants $C_{4}$ and $C_{5}$ such that, for any $t,t_{0}>0$, the event

\[
\Omega_{4}(t,t_{0})=\left\{\left|\widehat{Z}_{n}(w,t)-\widehat{\mathcal{Q}}_{n}(t)\right|\leq t_{0} \ \text{ and } \ \left|Z_{n}(w,t)-\mathcal{Q}_{n}(t)\right|\leq t_{0}\right\} \tag{144}
\]

occurs with probability at least $1-C_{4}\exp\left\{-C_{5}\frac{nt_{0}^{2}}{t^{2}}\right\}$.

Proof.

Deferred to the end of this subsection.

Lemma E.7.

There exists an absolute positive constant $C_{3}$ such that, for any $t^{2}\geq\frac{1}{n}$, one has

\[
C_{3}\mathcal{R}_{K}(t)\leq\mathcal{Q}_{n}(t)\leq\sqrt{2}\,\mathcal{R}_{K}(t). \tag{145}
\]

Moreover, as random variables, we have

\[
C_{3}\widehat{\mathcal{R}}_{K}(t)\leq\widehat{\mathcal{Q}}_{n}(t)\leq\sqrt{2}\,\widehat{\mathcal{R}}_{K}(t),\quad \text{a.e.} \tag{146}
\]

Proof.

Deferred to Appendix F.1.

Thanks to Lemma E.1 and Remark E.2, we only need to prove that there exist absolute constants $C_{1}$ and $C_{2}$ such that the event

\[
\Omega_{1}(C_{1}\varepsilon_{n})\cap\Omega_{2}(C_{2}\varepsilon_{n})=\left\{\omega \;\middle|\; \widehat{\mathcal{R}}_{K}(C_{1}\varepsilon_{n})\geq\frac{C_{1}^{2}\varepsilon_{n}^{2}}{2e\sigma} \ \text{ and } \ \widehat{\mathcal{R}}_{K}(C_{2}\varepsilon_{n})\leq\frac{C_{2}^{2}\varepsilon_{n}^{2}}{2e\sigma}\right\} \tag{147}
\]

occurs with high probability.

For any absolute constant $C$, there exists a constant $\mathfrak{C}$ depending only on $c_{1}$, $c_{2}$, and $\gamma$ such that, for any $n\geq\mathfrak{C}$, we have $C^{2}\varepsilon_{n}^{2}\geq 1/n$. Therefore, when $n\geq\mathfrak{C}$, we can use the results of Lemma E.7 to prove (147). For any absolute constant $C_{2}\geq 1$, conditioning on the event

\[
\Omega_{4}\left(C_{2}\varepsilon_{n},\sqrt{2}C_{2}\varepsilon_{n}^{2}\right)\cap\Omega_{3}\left(C_{2}\right)\cap\Omega_{4}\left(2C_{2}\varepsilon_{n},2\sqrt{2}C_{2}\varepsilon_{n}^{2}\right),
\]

we have

\begin{align}
\widehat{\mathcal{R}}_{K}(C_{2}\varepsilon_{n}) &\leq \frac{1}{C_{3}}\widehat{\mathcal{Q}}_{n}(C_{2}\varepsilon_{n}) && \text{by (146)} \tag{148}\\
&\leq \frac{1}{C_{3}}\widehat{Z}_{n}(w,C_{2}\varepsilon_{n})+\frac{\sqrt{2}C_{2}}{C_{3}}\varepsilon_{n}^{2} && \text{by (144) with } t=C_{2}\varepsilon_{n},\ t_{0}=\sqrt{2}C_{2}\varepsilon_{n}^{2} \notag\\
&\leq \frac{3}{2C_{3}}Z_{n}(w,2C_{2}\varepsilon_{n})+\frac{\sqrt{2}C_{2}+100}{C_{3}}\varepsilon_{n}^{2} && \text{by Lemma E.5} \notag\\
&\leq \frac{3}{2C_{3}}\mathcal{Q}_{n}(2C_{2}\varepsilon_{n})+\frac{\sqrt{2}C_{2}+100+3\sqrt{2}C_{2}}{C_{3}}\varepsilon_{n}^{2} && \text{by (144) with } t=2C_{2}\varepsilon_{n},\ t_{0}=2\sqrt{2}C_{2}\varepsilon_{n}^{2} \notag\\
&\leq \frac{3\sqrt{2}}{2C_{3}}\mathcal{R}_{K}(2C_{2}\varepsilon_{n})+\frac{\sqrt{2}C_{2}+100+3\sqrt{2}C_{2}}{C_{3}}\varepsilon_{n}^{2} && \text{by (145)} \notag\\
&\leq \frac{C_{2}^{2}\varepsilon_{n}^{2}}{2e\sigma} && \text{for } C_{2} \text{ large enough; see Remark E.8} \notag
\end{align}

Therefore, there exist three absolute constants $C_{2}$, $C_{6}$, and $C_{7}$ such that

\begin{align}
\mathbb{P}\left(\Omega_{2}\left(C_{2}\varepsilon_{n}\right)\right) &\geq \mathbb{P}\left(\Omega_{4}\left(C_{2}\varepsilon_{n},\sqrt{2}C_{2}\varepsilon_{n}^{2}\right)\cap\Omega_{3}\left(C_{2}\right)\cap\Omega_{4}\left(2C_{2}\varepsilon_{n},2\sqrt{2}C_{2}\varepsilon_{n}^{2}\right)\right) \tag{149}\\
&\geq 1-C_{6}\exp\left\{-C_{7}n\varepsilon_{n}^{2}\right\}. \notag
\end{align}
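
The constants in (149) can be made explicit by a union bound. The following is a sketch that uses only Lemma E.5 and Lemma E.6 as stated above; the particular choice of $C_{6}$ and $C_{7}$ is one admissible option, not necessarily the one intended by the authors. For both pairs $(t,t_{0})=(C_{2}\varepsilon_{n},\sqrt{2}C_{2}\varepsilon_{n}^{2})$ and $(t,t_{0})=(2C_{2}\varepsilon_{n},2\sqrt{2}C_{2}\varepsilon_{n}^{2})$ we have $nt_{0}^{2}/t^{2}=2n\varepsilon_{n}^{2}$, and $\min\{C_{2}^{2},C_{2}^{-2}\}=C_{2}^{-2}$ because $C_{2}\geq 1$. Hence

\[
\begin{aligned}
\mathbb{P}\left(\Omega_{4}\cap\Omega_{3}\cap\Omega_{4}\right)
&\geq 1-2C_{4}\exp\left\{-2C_{5}n\varepsilon_{n}^{2}\right\}-5\exp\left\{-C_{2}^{-2}n\varepsilon_{n}^{2}\right\}\\
&\geq 1-(2C_{4}+5)\exp\left\{-\min\{2C_{5},C_{2}^{-2}\}\,n\varepsilon_{n}^{2}\right\},
\end{aligned}
\]

so one may take, for instance, $C_{6}=2C_{4}+5$ and $C_{7}=\min\{2C_{5},C_{2}^{-2}\}$.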

Similarly, there exist three absolute constants $C_{1}$, $C_{8}$, and $C_{9}$ such that

\[
\mathbb{P}\left(\Omega_{1}\left(C_{1}\varepsilon_{n}\right)\right)\geq 1-C_{8}\exp\left\{-C_{9}n\varepsilon_{n}^{2}\right\}; \tag{150}
\]

and thus we obtain the desired result. $\square$

Remark E.8.

Here we give a detailed discussion of the last inequality in (148). Suppose $C_{2}\geq 1$. Since for any $j\geq 1$, $\min\left\{\lambda_{j},4C_{2}^{2}\varepsilon_{n}^{2}\right\}\leq\max\{4C_{2}^{2},1\}\min\left\{\lambda_{j},\varepsilon_{n}^{2}\right\}=4C_{2}^{2}\min\left\{\lambda_{j},\varepsilon_{n}^{2}\right\}$, we have

\[
\mathcal{R}_{K}(2C_{2}\varepsilon_{n})\leq 2C_{2}\mathcal{R}_{K}(\varepsilon_{n}). \tag{151}
\]
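
The passage from the eigenvalue inequality to (151) can be spelled out as follows. This is a sketch under the assumption that the population Mendelson complexity has the usual form $\mathcal{R}_{K}(t)=\left[\frac{1}{n}\sum_{j}\min\{\lambda_{j},t^{2}\}\right]^{1/2}$; the definition is not restated in this subsection, so this form should be checked against (27). The termwise bound holds because, when $\lambda_{j}\leq\varepsilon_{n}^{2}$, the left-hand minimum is at most $\lambda_{j}\leq 4C_{2}^{2}\lambda_{j}$, and otherwise it is at most $4C_{2}^{2}\varepsilon_{n}^{2}$. Summing over $j$ and taking square roots gives

\[
\mathcal{R}_{K}(2C_{2}\varepsilon_{n})
=\left[\frac{1}{n}\sum_{j}\min\left\{\lambda_{j},4C_{2}^{2}\varepsilon_{n}^{2}\right\}\right]^{1/2}
\leq\left[\frac{4C_{2}^{2}}{n}\sum_{j}\min\left\{\lambda_{j},\varepsilon_{n}^{2}\right\}\right]^{1/2}
=2C_{2}\mathcal{R}_{K}(\varepsilon_{n}).
\]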

If $C_{2}$ is sufficiently large such that

\begin{align}
C_{2}^{2} &\geq \frac{6\sqrt{2}C_{2}}{C_{3}}, \tag{152}\\
C_{2}^{2} &\geq 4e\sigma\,\frac{\sqrt{2}C_{2}+100+3\sqrt{2}C_{2}}{C_{3}}, \tag{153}
\end{align}

then we have

\[
\frac{3\sqrt{2}}{2C_{3}}\mathcal{R}_{K}(2C_{2}\varepsilon_{n})\overset{(151)}{\leq}\frac{3\sqrt{2}C_{2}}{C_{3}}\mathcal{R}_{K}(\varepsilon_{n})=\frac{3\sqrt{2}C_{2}}{2e\sigma C_{3}}\varepsilon_{n}^{2}\overset{(152)}{\leq}\frac{C_{2}^{2}\varepsilon_{n}^{2}}{4e\sigma}, \tag{154}
\]

and

\[
\frac{\sqrt{2}C_{2}+100+3\sqrt{2}C_{2}}{C_{3}}\varepsilon_{n}^{2}\overset{(153)}{\leq}\frac{C_{2}^{2}\varepsilon_{n}^{2}}{4e\sigma}. \tag{155}
\]
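
The remark leaves the final combination implicit. As a short sketch using only (148), (154), and (155): the fifth line of (148) is the sum of the left-hand sides of (154) and (155), so

\[
\frac{3\sqrt{2}}{2C_{3}}\mathcal{R}_{K}(2C_{2}\varepsilon_{n})+\frac{\sqrt{2}C_{2}+100+3\sqrt{2}C_{2}}{C_{3}}\varepsilon_{n}^{2}
\leq\frac{C_{2}^{2}\varepsilon_{n}^{2}}{4e\sigma}+\frac{C_{2}^{2}\varepsilon_{n}^{2}}{4e\sigma}
=\frac{C_{2}^{2}\varepsilon_{n}^{2}}{2e\sigma},
\]

which is the last inequality in (148). Conditions (152) and (153) can be met simultaneously because their left-hand sides grow quadratically in $C_{2}$ while their right-hand sides grow only linearly.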

Proof of Lemma E.5: From Lemma A.2, the event

\[
\Omega_{3,1}=\left\{\omega \;\middle|\; \text{for any } \tilde{g}\in\mathcal{B} \text{ with } \|\tilde{g}\|_{n}\leq c\varepsilon_{n}, \text{ we have } \|\tilde{g}\|_{L^{2}}^{2}\leq 4\max\{c^{2},1\}\varepsilon_{n}^{2}\right\}
\]

occurs with probability at least $1-C_{1}e^{-C_{2}n\varepsilon_{n}^{2}}$.

Conditioning on the event $\Omega_{3,1}$, we have

\[
\widehat{Z}_{n}(w,c\varepsilon_{n})=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq c\varepsilon_{n}}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g(x_{i})\right|\leq\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq 2\max\{c,1\}\varepsilon_{n}}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g(x_{i})\right|. \tag{156}
\]

For any $t>0$, denote $H_{n}(t):=\sup_{\substack{f\in\mathcal{B}\\ \|f\|_{L^{2}}\leq t}}\frac{1}{n}\sum_{i=1}^{n}f\left(x_{i}\right)$. For any $g\in\mathcal{B}$ with $\|g\|_{L^{2}}\leq t$, there exists $f\in\mathcal{B}$ with $\|f\|_{L^{2}}\leq t$ such that $\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)=\sum_{i=1}^{n}f\left(x_{i}\right)$. Therefore, we have

$$\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|=H_{n}(t),\quad \text{a.e.} \tag{157}$$

Similarly, we have

$$Z_{n}(w,t)=\mathbb{E}_{x_{1},x_{2},\ldots\sim\mu}H_{n}(t),\quad \text{a.e.} \tag{158}$$

Using the results of Lemma F.13 (and the remark below Lemma F.13) with $\mathcal{F}=\{f\in\mathcal{B},\ \|f\|_{L^{2}}\leq 2\max\{c,1\}\varepsilon_{n}\}$, $Z=nH_{n}(2\max\{c,1\}\varepsilon_{n})$, and $\delta=\min\{c^{-2},1\}n\varepsilon_{n}^{2}$, we have

$$\begin{aligned}
H_{n}(2\max\{c,1\}\varepsilon_{n}) &\leq\frac{3}{2}Z_{n}(w,2\max\{c,1\}\varepsilon_{n})+4\sqrt{2}\varepsilon_{n}^{2}+66.5\min\{c^{-2},1\}\varepsilon_{n}^{2}\\
&\leq\frac{3}{2}Z_{n}(w,2\max\{c,1\}\varepsilon_{n})+100\varepsilon_{n}^{2},
\end{aligned} \tag{159}$$

with probability at least $1-\exp\{-\min\{c^{-2},1\}n\varepsilon_{n}^{2}\}$, where the randomness comes from the $n$ samples $x_{1},\cdots,x_{n}$.
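The passage from the first to the second line of (159) is pure arithmetic on the constants. The following minimal sketch checks that the coefficient of $\varepsilon_{n}^{2}$ never exceeds 100, whatever the value of $c>0$. The helper name `remainder_constant` and the grid of test values for $c$ are illustrative assumptions, not part of the paper.

```python
import math

def remainder_constant(c):
    """Coefficient of eps_n^2 in the first line of (159):
    4*sqrt(2) + 66.5*min(c^{-2}, 1). (Hypothetical helper name.)"""
    return 4.0 * math.sqrt(2.0) + 66.5 * min(c ** -2, 1.0)

# Illustrative grid of values for c; min(c^{-2}, 1) <= 1 for every c > 0,
# so the coefficient is largest when c <= 1.
worst = max(remainder_constant(c) for c in (0.01, 0.5, 1.0, 2.0, 100.0))
print(worst <= 100.0)  # the bound used in the second line of (159)
```

Since $\min\{c^{-2},1\}\leq 1$, the coefficient is at most $4\sqrt{2}+66.5<100$, which is all the second line of (159) uses.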

Denote the event $\Omega_{3,2}=\left\{\omega~\big|~H_{n}(2\max\{c,1\}\varepsilon_{n})\leq(3/2)Z_{n}(w,2\max\{c,1\}\varepsilon_{n})+100\varepsilon_{n}^{2}\right\}$. Combining the results in (156), (157), and (159), conditioning on the event $\Omega_{3,1}\cap\Omega_{3,2}$, we have

$$\widehat{Z}_{n}(w,c\varepsilon_{n})\leq\frac{3}{2}Z_{n}(w,2\max\{c,1\}\varepsilon_{n})+100\varepsilon_{n}^{2}. \tag{160}$$

Since $\Omega_{3,1}\cap\Omega_{3,2}$ occurs with probability at least $1-3\exp\{-\min\{c^{-2},3/5\}n\varepsilon_{n}^{2}\}$, we obtain the first inequality in (143).

As for the second inequality in (143), from Lemma A.2, the event

$$\Omega_{3,3}=\left\{\omega~\big|~\text{for any }\tilde{g}\in\mathcal{B},\ \|\tilde{g}\|_{n}\geq c\varepsilon_{n},\text{ we have }\|\tilde{g}\|_{L^{2}}\geq\sqrt{2/5}\,c\varepsilon_{n}\right\},$$

occurs with probability at least $1-C_{1}e^{-C_{2}n\varepsilon_{n}^{2}}$.

Conditioning on the event $\Omega_{3,3}$, we have

$$\widehat{Z}_{n}(w,c\varepsilon_{n})=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq c\varepsilon_{n}}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|\geq\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq\sqrt{2}c\varepsilon_{n}/\sqrt{5}}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|. \tag{161}$$

Using the results of Lemma F.13 again with $\mathcal{F}=\{f\in\mathcal{B},\ \|f\|_{L^{2}}\leq\sqrt{2}c\varepsilon_{n}/\sqrt{5}\}$, $Z=nH_{n}(\sqrt{2}c\varepsilon_{n}/\sqrt{5})$, and $\delta=n\varepsilon_{n}^{2}$, we have

$$\begin{aligned}
H_{n}(\sqrt{2}c\varepsilon_{n}/\sqrt{5}) &\geq\frac{1}{2}Z_{n}(w,\sqrt{2}c\varepsilon_{n}/\sqrt{5})-\sqrt{4.32}\,c\varepsilon_{n}^{2}-88.9\varepsilon_{n}^{2}\\
&\geq\frac{1}{2}Z_{n}(w,\sqrt{2}c\varepsilon_{n}/\sqrt{5})-\sqrt{4.32}\,c\varepsilon_{n}^{2}-100\varepsilon_{n}^{2},
\end{aligned} \tag{162}$$

with probability at least $1-\exp\left\{-n\varepsilon_{n}^{2}\right\}$.

Denote the event $\Omega_{3,4}=\left\{\omega~\big|~H_{n}(\sqrt{2}c\varepsilon_{n}/\sqrt{5})\geq\frac{1}{2}Z_{n}(w,\sqrt{2}c\varepsilon_{n}/\sqrt{5})-\sqrt{4.32}\,c\varepsilon_{n}^{2}-100\varepsilon_{n}^{2}\right\}$. Combining the results in (161), (157), and (162), conditioning on the event $\Omega_{3,3}\cap\Omega_{3,4}$, we have

$$\widehat{Z}_{n}(w,c\varepsilon_{n})\geq\frac{1}{2}Z_{n}(w,\sqrt{2}c\varepsilon_{n}/\sqrt{5})-\sqrt{4.32}\,c\varepsilon_{n}^{2}-100\varepsilon_{n}^{2}. \tag{163}$$

Since $\Omega_{3,3}\cap\Omega_{3,4}$ occurs with probability at least $1-3\exp\left\{-6\min\{c^{2},1\}n\varepsilon_{n}^{2}/25\right\}$, we obtain the second inequality in (143), which finishes the proof. $\square$

Proof of Lemma E.6: We will use Lemma F.11 to prove Lemma E.6. Therefore, we need to show that for any $t>0$, both $\widehat{Z}_{n}(w,t)$ and $Z_{n}(w,t)$ are Lipschitz convex functions with respect to $w\in\{-1,1\}^{n}$.

Denote $\widehat{F}(w):=(\sqrt{n}/t)\,\widehat{Z}_{n}(w,t)$ and $F(w):=(\sqrt{n}/t)\,Z_{n}(w,t)$. Notice that we have

$$\widehat{Z}_{n}(w,t):=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right). \tag{164}$$
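The second equality in (164) drops the absolute value. One way to see why this is harmless, assuming the constraint set is symmetric (that is, $g$ belongs to it whenever $-g$ does, as is the case for a norm ball), is the small numerical sketch below. The finite family of value vectors and the names `sup_abs`, `sup_signed`, and `symmetric_family` are hypothetical stand-ins for $\{g\in\mathcal{B}:\|g\|_{n}\leq t\}$, not objects from the paper.

```python
import random

def sup_abs(w, family):
    """max over the family of |(1/n) * sum_i w_i g(x_i)|."""
    n = len(w)
    return max(abs(sum(wi * gi for wi, gi in zip(w, g))) / n for g in family)

def sup_signed(w, family):
    """max over the family of (1/n) * sum_i w_i g(x_i), without absolute value."""
    n = len(w)
    return max(sum(wi * gi for wi, gi in zip(w, g)) / n for g in family)

def symmetric_family(n, size, rng):
    """Random value vectors (g(x_1), ..., g(x_n)) together with their negatives,
    so that the family is closed under g -> -g (an assumption of this sketch)."""
    half = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(size)]
    return half + [[-v for v in g] for g in half]

rng = random.Random(1)
n = 12
family = symmetric_family(n, 30, rng)
w = [rng.choice((-1, 1)) for _ in range(n)]

# For a symmetric family the two suprema coincide.
gap = abs(sup_abs(w, family) - sup_signed(w, family))
print(gap)
```

For every $g$ in a symmetric family, either $g$ or $-g$ makes the signed sum nonnegative, so the supremum with and without the absolute value agree.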

Since $\max\{a-b,b-a\}=\left|a-b\right|$, we have

$$\begin{aligned}
\left|\widehat{Z}_{n}(w,t)-\widehat{Z}_{n}\left(w^{\prime},t\right)\right| &\leq\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\left|\sum_{i=1}^{n}\left(w_{i}-w_{i}^{\prime}\right)g\left(x_{i}\right)\right|\\
&\leq\frac{1}{n}\|w-w^{\prime}\|_{2}\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\sqrt{\sum_{i=1}^{n}g^{2}\left(x_{i}\right)}\\
&\leq\frac{t}{\sqrt{n}}\left\|w-w^{\prime}\right\|_{2},
\end{aligned} \tag{165}$$

and hence $\widehat{F}(w)=(\sqrt{n}/t)\,\widehat{Z}_{n}(w,t)$ is a 1-Lipschitz function. Similarly, we can show that $F(w)=(\sqrt{n}/t)\,Z_{n}(w,t)$ is a 1-Lipschitz function as follows:

$$\begin{aligned}
\left|Z_{n}(w,t)-Z_{n}\left(w^{\prime},t\right)\right| &\leq E_{x}\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}}\frac{1}{n}\left|\sum_{i=1}^{n}\left(w_{i}-w_{i}^{\prime}\right)g\left(x_{i}\right)\right|\\
&\leq\frac{1}{n}\|w-w^{\prime}\|_{2}\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}}E_{x}\sqrt{\sum_{i=1}^{n}g^{2}\left(x_{i}\right)}\\
&\leq\frac{1}{n}\|w-w^{\prime}\|_{2}\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}}\sqrt{\sum_{i=1}^{n}\|g\|_{L^{2}}^{2}}\\
&\leq\frac{t}{\sqrt{n}}\left\|w-w^{\prime}\right\|_{2}.
\end{aligned} \tag{166}$$
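The Lipschitz bound (165) combines the elementary fact $|\sup A-\sup B|\leq\sup|A-B|$ with the Cauchy-Schwarz inequality. It can be sanity-checked numerically with the minimal sketch below. The sketch assumes the usual empirical norm $\|g\|_{n}^{2}=\frac{1}{n}\sum_{i=1}^{n}g^{2}(x_{i})$ and replaces $\{g\in\mathcal{B}:\|g\|_{n}\leq t\}$ by a finite random family of value vectors; the names `empirical_norm`, `z_hat`, `make_family`, and `check_lipschitz`, and all parameter values, are hypothetical.

```python
import math
import random

def empirical_norm(values):
    """Empirical norm ||g||_n = sqrt((1/n) * sum_i g(x_i)^2) (assumed definition)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def z_hat(w, family):
    """max over the finite family of |(1/n) * sum_i w_i g(x_i)|."""
    n = len(w)
    return max(abs(sum(wi * gi for wi, gi in zip(w, g))) / n for g in family)

def make_family(n, size, t, rng):
    """Hypothetical stand-in for {g in B : ||g||_n <= t}: random value vectors
    (g(x_1), ..., g(x_n)) rescaled to have empirical norm at most t."""
    family = []
    for _ in range(size):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = t * rng.random() / empirical_norm(g)
        family.append([scale * v for v in g])
    return family

def check_lipschitz(n=20, size=50, t=0.7, trials=200, seed=0):
    """Largest observed ratio |Z(w) - Z(w')| / ((t / sqrt(n)) * ||w - w'||_2)."""
    rng = random.Random(seed)
    family = make_family(n, size, t, rng)
    worst = 0.0
    for _ in range(trials):
        w = [rng.choice((-1, 1)) for _ in range(n)]
        w2 = [rng.choice((-1, 1)) for _ in range(n)]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w2)))
        if dist == 0.0:
            continue  # identical sign vectors carry no information
        diff = abs(z_hat(w, family) - z_hat(w2, family))
        worst = max(worst, diff / ((t / math.sqrt(n)) * dist))
    return worst

print(check_lipschitz() <= 1.0)  # the ratio should never exceed 1
```

The population bound (166) follows the same pattern; its third line additionally uses Jensen's inequality, $E\sqrt{X}\leq\sqrt{E X}$, together with $E_{x}\,g^{2}(x_{i})=\|g\|_{L^{2}}^{2}$.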

From (164), for any $0<a<1$ and any $w,\tilde{w}\in\{-1,1\}^{n}$, we have

$$\begin{aligned}
\frac{t}{\sqrt{n}}\widehat{F}(aw+(1-a)\tilde{w}) &=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\sum_{i=1}^{n}(aw_{i}+(1-a)\tilde{w}_{i})g\left(x_{i}\right)\\
&\overset{(i)}{\leq}\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\sum_{i=1}^{n}aw_{i}g\left(x_{i}\right)+\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\sum_{i=1}^{n}(1-a)\tilde{w}_{i}g\left(x_{i}\right)\\
&=\frac{t}{\sqrt{n}}\left(\widehat{F}(aw)+\widehat{F}((1-a)\tilde{w})\right),
\end{aligned} \tag{167}$$

where inequality (i) follows by noticing that, for any $\tilde{g}\in\mathcal{B}$ with $\|\tilde{g}\|_{n}\leq t$, we have

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(aw_{i}+(1-a)\tilde{w}_{i})\tilde{g}\left(x_{i}\right)&=\frac{1}{n}\sum_{i=1}^{n}aw_{i}\tilde{g}\left(x_{i}\right)+\frac{1}{n}\sum_{i=1}^{n}(1-a)\tilde{w}_{i}\tilde{g}\left(x_{i}\right)\\
&\leq\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\sum_{i=1}^{n}aw_{i}g\left(x_{i}\right)+\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\frac{1}{n}\sum_{i=1}^{n}(1-a)\tilde{w}_{i}g\left(x_{i}\right).
\end{aligned}\tag{168}$$

Therefore, $\widehat{F}(w)$ is a convex function: the supremum is positively homogeneous, so $\widehat{F}(aw)=a\widehat{F}(w)$ and $\widehat{F}((1-a)\tilde{w})=(1-a)\widehat{F}(\tilde{w})$ for $a\in[0,1]$. Similarly, we can show that $F(w)$ is a convex function.
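The convexity argument above can be checked numerically on a toy problem. The following is a minimal sketch, assuming a finite list `G` of vectors that stands in for the values $(g(x_1),\dots,g(x_n))$ of the functions in $\{g\in\mathcal{B}:\|g\|_{n}\leq t\}$; the names `sup_linear` and `convexity_gap` are hypothetical and are not taken from the paper.

```python
import random

def sup_linear(w, G):
    """F(w) = max over g in G of (1/n) * sum_i w_i * g_i (finite stand-in for the sup)."""
    n = len(w)
    return max(sum(wi * gi for wi, gi in zip(w, g)) / n for g in G)

def convexity_gap(w, w_tilde, a, G):
    """a*F(w) + (1-a)*F(w_tilde) - F(a*w + (1-a)*w_tilde); nonnegative when F is convex."""
    mix = [a * x + (1 - a) * y for x, y in zip(w, w_tilde)]
    return a * sup_linear(w, G) + (1 - a) * sup_linear(w_tilde, G) - sup_linear(mix, G)

random.seed(0)
n = 5
# Assumption: 20 random vectors play the role of the function class on the sample.
G = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(20)]
gaps = []
for _ in range(200):
    w = [random.gauss(0, 1) for _ in range(n)]
    w_tilde = [random.gauss(0, 1) for _ in range(n)]
    gaps.append(convexity_gap(w, w_tilde, random.random(), G))
```

Every entry of `gaps` should be nonnegative up to floating-point rounding, which mirrors inequality (i) combined with positive homogeneity of the supremum.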

Applying Lemma F.11 with $G=\widehat{F}$ (and likewise with $G=F$) and $\delta=\sqrt{n}t_{0}/t$, we have

$$\begin{aligned}
\left|\widehat{Z}_{n}(w,t)-\widehat{\mathcal{Q}}_{n}(t)\right|&=\frac{t}{\sqrt{n}}\left|\widehat{F}(w)-\mathbb{E}\widehat{F}(w)\right|\leq t_{0},\\
\left|Z_{n}(w,t)-\mathcal{Q}_{n}(t)\right|&=\frac{t}{\sqrt{n}}\left|F(w)-\mathbb{E}F(w)\right|\leq t_{0},
\end{aligned}\tag{169}$$

with probability at least $1-C_{1}\exp\left(-C_{2}\frac{nt_{0}^{2}}{t^{2}}\right)$ for some absolute constants $C_{1},C_{2}>0$. $\square$

E.2.4 Proof of Lemma A.4

Proof.

The bound on the $\mathcal{H}$-norm can be obtained by modifying the proof of Lemma 9 in [64]. To make the proof self-contained, we reproduce a full proof below.

Let us write $f_{\widehat{T}}=\sum_{k=0}^{\infty}\sqrt{\lambda_{k}}\hat{a}_{k}\phi_{k}$. Thus, we have $\left\|f_{\widehat{T}}\right\|_{\mathcal{H}}^{2}=\sum_{k=0}^{\infty}\hat{a}_{k}^{2}$. Recall the linear operator $\Phi_{X}:\ell^{2}\rightarrow\mathbb{R}^{n}$ defined in (134). Similar to (137), we have

$$\begin{aligned}
\hat{a}&=\frac{1}{\sqrt{n}}(\Psi^{*})^{\tau}\Sigma^{-1/2}U^{\tau}f_{\widehat{T}}(\boldsymbol{X})\\
&=\frac{1}{\sqrt{n}}(\Psi^{*})^{\tau}\Sigma^{1/2}U^{\tau}U\Sigma^{-1}U^{\tau}f_{\widehat{T}}(\boldsymbol{X})\\
&\overset{(136)}{=}\frac{1}{n}D^{1/2}(\Phi_{\boldsymbol{X}})^{\tau}\left[\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})\right]^{-1}f_{\widehat{T}}(\boldsymbol{X});
\end{aligned}\tag{170}$$

therefore, from (135), we have

$$\left\|f_{\widehat{T}}\right\|_{\mathcal{H}}^{2}=\|\hat{a}\|_{2}^{2}=\frac{1}{n}f_{\widehat{T}}(\boldsymbol{X})^{\tau}\left[\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})\right]^{-1}f_{\widehat{T}}(\boldsymbol{X}).\tag{171}$$
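As a sanity check on the identity (171), the following minimal sketch compares two ways of computing the squared RKHS norm for $n=2$. It assumes a function of the form $f=\sum_{i}\alpha_{i}K(x_{i},\cdot)$, for which $\|f\|_{\mathcal{H}}^{2}=\alpha^{\tau}K\alpha$ by the reproducing property; the Gaussian kernel `rbf`, the sample points, and the coefficients are illustrative choices and do not come from the paper.

```python
import math

def rbf(x, z):
    """Gaussian kernel on the real line (an illustrative choice, not the paper's kernel)."""
    return math.exp(-0.5 * (x - z) ** 2)

def hnorm_sq_two_ways(xs, alpha):
    """Return (alpha^T K alpha, (1/n) f(X)^T [K/n]^{-1} f(X)) for n = 2 sample points."""
    n = 2
    K = [[rbf(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    # Squared RKHS norm of f = sum_i alpha_i K(x_i, .) via the reproducing property.
    direct = sum(alpha[i] * K[i][j] * alpha[j] for i in range(n) for j in range(n))
    # Fitted values f(X) = K alpha.
    f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    # Explicit inverse of the 2x2 matrix A = K / n.
    A = [[K[i][j] / n for j in range(n)] for i in range(n)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
    via_gram = sum(f[i] * A_inv[i][j] * f[j] for i in range(n) for j in range(n)) / n
    return direct, via_gram

direct, via_gram = hnorm_sq_two_ways([0.0, 1.0], [0.7, -1.3])
```

The two returned values should agree up to rounding, since $\frac{1}{n}f(\boldsymbol{X})^{\tau}\left[\frac{1}{n}K\right]^{-1}f(\boldsymbol{X})=\alpha^{\tau}K\alpha$ whenever $f(\boldsymbol{X})=K\alpha$ and $K$ is invertible.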

Recall the eigen-decomposition in (135) that $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})=U\Sigma U^{\tau}$, and the relation in (5) that $U^{\tau}f_{\widehat{T}}(\boldsymbol{X})=\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)U^{\tau}\boldsymbol{y}$. Substituting into Equation (171) yields

$$\begin{aligned}
\left\|f_{\widehat{T}}\right\|_{\mathcal{H}}^{2}&=\frac{1}{n}\boldsymbol{y}^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)^{2}\Sigma^{-1}U^{\tau}\boldsymbol{y}\\
&=\frac{1}{n}\left(f^{*}(\boldsymbol{X})+\boldsymbol{e}\right)^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)^{2}\Sigma^{-1}U^{\tau}\left(f^{*}(\boldsymbol{X})+\boldsymbol{e}\right)\\
&=\underbrace{\frac{2}{n}\boldsymbol{e}^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)^{2}\Sigma^{-1}U^{\tau}f^{*}(\boldsymbol{X})}_{A_{\widehat{T}}}+\underbrace{\frac{1}{n}\boldsymbol{e}^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)^{2}\Sigma^{-1}U^{\tau}\boldsymbol{e}}_{B_{\widehat{T}}}\\
&\quad+\underbrace{\frac{1}{n}f^{*}(\boldsymbol{X})^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)^{2}\Sigma^{-1}U^{\tau}f^{*}(\boldsymbol{X})}_{C_{\widehat{T}}};
\end{aligned}\tag{172}$$

where $\boldsymbol{e}=\boldsymbol{y}-f^{*}(\boldsymbol{X})$. From (133), we have

$$C_{\widehat{T}}\leq\frac{1}{n}f^{*}(\boldsymbol{X})^{\tau}U\Sigma^{-1}U^{\tau}f^{*}(\boldsymbol{X})\leq 1,\tag{173}$$

where the last inequality follows from (138). It remains to derive upper bounds on the random variables $A_{\widehat{T}}$ and $B_{\widehat{T}}$.

Bounding $A_{\widehat{T}}$

Since the elements of $\boldsymbol{e}$ are i.i.d. zero-mean Gaussian with variance $\sigma^{2}$, we have $\mathbb{P}\left[\left|A_{\widehat{T}}\right|\geq 1\right]\leq 2\exp\left(-\frac{n}{2\sigma^{2}\nu^{2}}\right)$, where

$$\nu^{2}:=\frac{4}{n}f^{*}(\boldsymbol{X})^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)^{4}\Sigma^{-2}U^{\tau}f^{*}(\boldsymbol{X}).\tag{174}$$

From (133) we have

$$\begin{aligned}
\nu^{2}&\leq\frac{4}{n}f^{*}(\boldsymbol{X})^{\tau}U\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)\Sigma^{-2}U^{\tau}f^{*}(\boldsymbol{X})\\
&\leq\frac{4}{n}\sum_{j=1}^{n}\frac{\left[U^{\tau}f^{*}(\boldsymbol{X})\right]_{j}^{2}}{\widehat{\lambda}_{j}^{2}}\min\left(1,\widehat{T}\widehat{\lambda}_{j}\right)\\
&\leq 4\frac{\widehat{T}}{n}\sum_{j=1}^{n}\frac{\left[U^{\tau}f^{*}(\boldsymbol{X})\right]_{j}^{2}}{\widehat{\lambda}_{j}}\\
&\leq 4\widehat{T}=4\widehat{\varepsilon}_{n}^{-2},
\end{aligned}\tag{175}$$

where the final inequality follows from (138).
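The first three inequalities in (175) reduce to scalar facts about each empirical eigenvalue: for $x\geq 0$, $(1-e^{-x})^{4}\leq 1-e^{-x}\leq\min(1,x)\leq x$. The following is a minimal sketch that evaluates the per-eigenvalue factors for given values of $\widehat{T}$ and $\widehat{\lambda}_{j}$; the function name `chain_175` and the test grid are hypothetical.

```python
import math

def chain_175(T, lam):
    """Return the four per-eigenvalue factors that appear along the chain (174)-(175)."""
    # s is the diagonal entry 1 - exp(-T * lam); expm1 keeps it accurate for small T * lam.
    s = -math.expm1(-T * lam)
    first = s ** 4 / lam ** 2             # factor in the definition (174)
    second = s / lam ** 2                 # uses s^4 <= s because 0 <= s <= 1
    third = min(1.0, T * lam) / lam ** 2  # uses 1 - e^{-x} <= min(1, x)
    fourth = T / lam                      # uses min(1, x) <= x
    return first, second, third, fourth

# Illustrative grid of stopping times and eigenvalues (assumed values).
results = [chain_175(T, lam) for T in (0.5, 3.0, 50.0) for lam in (1e-3, 0.1, 1.0, 10.0)]
```

Each tuple in `results` should be nondecreasing from left to right. The final step of (175), which sums $\left[U^{\tau}f^{*}(\boldsymbol{X})\right]_{j}^{2}/\widehat{\lambda}_{j}$ over $j$, relies on (138) and is not covered by this scalar check.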

Bounding $B_{\widehat{T}}$

We begin by noting that

$$B_{\widehat{T}}=\frac{1}{n}\sum_{j=1}^{n}\frac{\left[\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)\right]_{jj}^{2}}{\widehat{\lambda}_{j}}\left[U^{\tau}\boldsymbol{e}\right]_{j}^{2}=\frac{1}{n}\sum_{i,j=1}^{n}\left[UPU^{\tau}\right]_{ij}(\boldsymbol{e}_{i}\boldsymbol{e}_{j}),\tag{176}$$

where $P=\left(\boldsymbol{I}-e^{-\widehat{T}\Sigma}\right)^{2}\Sigma^{-1}$, so that the diagonal entries of $P$ match the weights in (176). Consequently, $B_{\widehat{T}}$ is a quadratic form in zero-mean Gaussian variables with variance $\sigma^{2}$, and using the tail bound in Lemma F.10, we have

$$\mathbb{P}\left[\left|B_{\widehat{T}}-\mathbb{E}\left[B_{\widehat{T}}\right]\right|\geq 1\right]\leq\exp\left(-C\min\left\{n\left\|UPU^{\tau}\right\|_{\mathrm{op}}^{-1},\,n^{2}\left\|UPU^{\tau}\right\|_{\mathrm{F}}^{-2}\right\}\right),\tag{177}$$

for an absolute constant $C$. It remains to bound $\mathbb{E}\left[B_{\widehat{T}}\right]$, $\left\|UPU^{\tau}\right\|_{\mathrm{op}}$, and $\left\|UPU^{\tau}\right\|_{\mathrm{F}}$. We first bound the mean. Since $\mathbb{E}\left[\boldsymbol{e}\boldsymbol{e}^{\tau}\right]=\sigma^{2}\boldsymbol{I}_{n}$, we have

$$\mathbb{E}\left[B_{\widehat{T}}\right]\leq\frac{\sigma^{2}}{n}\sum_{j=1}^{n}\frac{\left[\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)\right]_{jj}^{2}}{\widehat{\lambda}_{j}}\leq\frac{\sigma^{2}\widehat{T}}{n}\sum_{j=1}^{n}\min\left(\left(\widehat{T}\widehat{\lambda}_{j}\right)^{-1},\widehat{T}\widehat{\lambda}_{j}\right).\tag{178}$$

Since $\widehat{T}=\widehat{\varepsilon}_{n}^{-2}$, we have

$$\frac{\widehat{T}}{n}\sum_{j=1}^{n}\min\left(\left(\widehat{T}\widehat{\lambda}_{j}\right)^{-1},\widehat{T}\widehat{\lambda}_{j}\right)\leq\widehat{T}^{2}\widehat{\mathcal{R}}_{K}^{2}\left(1/\sqrt{\widehat{T}}\right)\leq\frac{1}{\sigma^{2}},\tag{179}$$

showing that $\mathbb{E}\left[B_{\widehat{T}}\right]\leq 1$.
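The two elementary scalar inequalities behind (178) and (179) can be checked numerically. The following is a minimal sketch, assuming that the $j$-th diagonal entry in (178) is read in the eigenbasis of $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})$, i.e. as $1-e^{-\widehat{T}\widehat{\lambda}_j}$. The function names and the grid of test values are illustrative and do not come from the paper.

```python
import math

def bias_summand(lam, T):
    """(1 - exp(-T*lam))^2 / lam: the j-th summand in (178) under the eigenbasis reading (assumption)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return (-math.expm1(-T * lam)) ** 2 / lam

def middle_bound(lam, T):
    """T * min((T*lam)^{-1}, T*lam): the summand on the right-hand side of (178)."""
    x = T * lam
    return T * min(1.0 / x, x)

def complexity_summand(lam, T):
    """T^2 * min(lam, 1/T): the summand of T^2 * R_K^2(1/sqrt(T)), up to the factor 1/n."""
    return T * T * min(lam, 1.0 / T)

def check_chain(lams, Ts, tol=1e-12):
    """Return True if bias <= middle <= complexity holds elementwise on the whole grid."""
    for T in Ts:
        for lam in lams:
            a = bias_summand(lam, T)
            b = middle_bound(lam, T)
            c = complexity_summand(lam, T)
            if a > b * (1 + tol) or b > c * (1 + tol):
                return False
    return True

# Hypothetical grid of eigenvalues and stopping times spanning several orders of magnitude.
GRID_LAMS = [10.0 ** k for k in range(-6, 3)]
GRID_TS = [10.0 ** k for k in range(-2, 5)]
```

Both steps reduce to $(1-e^{-x})^2\leq\min(1,x^2)$ and $\min(x^{-1},x)\leq\min(1,x)$ with $x=\widehat{T}\widehat{\lambda}_j$, which is what `check_chain` tests pointwise.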

Turning to the operator norm, we have

$$\left\|UPU^{\tau}\right\|_{\mathrm{op}}=\max_{j=1,\cdots,n}\left(\frac{\left[\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)\right]_{jj}^{2}}{\widehat{\lambda}_{j}}\right)\leq\max_{j=1,\cdots,n}\left[\min\left(\widehat{\lambda}_{j}^{-1},\widehat{T}^{2}\widehat{\lambda}_{j}\right)\right]\leq\widehat{T}. \tag{180}$$

As for the Frobenius norm, we have

$$\begin{aligned}
\frac{1}{n}\left\|UPU^{\tau}\right\|_{\mathrm{F}}^{2}
&=\frac{1}{n}\sum_{j=1}^{n}\left(\frac{\left[\left(\mathbf{I}-e^{-\frac{1}{n}\widehat{T}K(\boldsymbol{X},\boldsymbol{X})}\right)\right]_{jj}^{4}}{\widehat{\lambda}_{j}^{2}}\right)\\
&\leq\frac{1}{n}\sum_{j=1}^{n}\min\left(\widehat{\lambda}_{j}^{-2},\widehat{T}^{4}\widehat{\lambda}_{j}^{2}\right)\\
&\leq\frac{\widehat{T}^{3}}{n}\sum_{j=1}^{n}\min\left(\widehat{T}^{-3}\widehat{\lambda}_{j}^{-2},\widehat{T}\widehat{\lambda}_{j}^{2}\right).
\end{aligned} \tag{181}$$

Using the definition of empirical Mendelson complexity, we have

$$\frac{1}{n}\left\|UPU^{\tau}\right\|_{\mathrm{F}}^{2}\leq\widehat{T}^{3}\,\widehat{\mathcal{R}}_{K}^{2}\left(1/\sqrt{\widehat{T}}\right)\leq\frac{\widehat{T}}{\sigma^{2}}. \tag{182}$$
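The last steps of (180) and (182) rest on two pointwise facts: $\min(\lambda^{-1},T^{2}\lambda)\leq T$, and $\min(\lambda^{-2},T^{4}\lambda^{2})\leq T^{3}\min(\lambda,1/T)$. A small numerical sketch of both follows; the eigenvalue profile $\lambda_j=j^{-2}$ used in the test and all function names are hypothetical choices made only for illustration.

```python
def op_summand_bound(lam, T):
    """min(lam^{-1}, T^2 * lam): the per-eigenvalue bound used in (180)."""
    return min(1.0 / lam, T * T * lam)

def frob_summand_bound(lam, T):
    """min(lam^{-2}, T^4 * lam^2): the per-eigenvalue bound used in (181)."""
    return min(lam ** -2, T ** 4 * lam ** 2)

def check_norm_bounds(lams, T, tol=1e-12):
    """Return (op_ok, frob_ok), where
    op_ok   checks max_j min(1/lam_j, T^2 lam_j) <= T, and
    frob_ok checks (1/n) sum_j min(lam_j^-2, T^4 lam_j^2) <= T^3 * (1/n) sum_j min(lam_j, 1/T)."""
    n = len(lams)
    op_ok = max(op_summand_bound(l, T) for l in lams) <= T * (1 + tol)
    frob = sum(frob_summand_bound(l, T) for l in lams) / n
    r_sq = sum(min(l, 1.0 / T) for l in lams) / n  # empirical R_K^2 evaluated at 1/sqrt(T)
    frob_ok = frob <= T ** 3 * r_sq * (1 + tol)
    return op_ok, frob_ok
```

For example, `check_norm_bounds([1.0 / (j * j) for j in range(1, 51)], 10.0)` evaluates both comparisons for the illustrative spectrum with $T=10$.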

Putting the pieces together, we have shown that there exists an absolute constant $C$ such that

$$\mathbb{P}\left[\left|B_{\widehat{T}}\right|\geq 2\text{ or }\left|A_{\widehat{T}}\right|\geq 1\right]\leq\exp\left(-Cn/\widehat{T}\right). \tag{183}$$

Since $\widehat{T}=\widehat{\varepsilon}_{n}^{-2}$, the claim follows.

Appendix F Assisting Lemmas

F.1 Local Rademacher complexity

Suppose that $K$ is a kernel defined on $\mathcal{X}\subset\mathbb{R}^{d+1}$ and $\mathcal{H}$ is the RKHS associated with the kernel $K$. Let

$$K(x,y)=\sum_{j}\lambda_{j}\phi_{j}(x)\phi_{j}(y) \tag{184}$$

be the Mercer decomposition of $K$, where $\lambda_{1}\geq\lambda_{2}\geq\cdots\geq 0$ are non-increasing non-negative real numbers and $\{\phi_{j}\}$ are orthonormal functions in $L^{2}(\mathcal{X},\rho_{\mathcal{X}})$. Let $\Phi(x)^{\tau}=(\sqrt{\lambda_{1}}\phi_{1}(x),\sqrt{\lambda_{2}}\phi_{2}(x),\ldots)$. Then we introduce a natural isomorphism $i:\ell^{2}\to\mathcal{H}$ given by

$$a=(a_{1},a_{2},\cdots)\mapsto a^{\tau}\Phi=\sum_{j}a_{j}\sqrt{\lambda_{j}}\phi_{j}(x). \tag{185}$$

F.1.1 Population version

We introduce the following quantities:

$$R_{K}(t)=\left[\frac{1}{n}\sum_{j=1}^{\infty}\min\{\lambda_{j},t^{2}\}\right]^{1/2},\quad Q_{n}(t)=\mathbb{E}_{w}[Z_{n}(w,t)] \tag{186}$$

where $Z_{n}(w,t):=\mathbb{E}_{x_{1},x_{2},\ldots\sim\mu}\left[\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|\right]$, the $w_{i}$ are i.i.d. Rademacher random variables independent of the $x_{i}$, and $\mathcal{B}=\left\{g\in\mathcal{H}\mid\|g\|_{\mathcal{H}}\leq 1\right\}$.
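To make the quantity $R_{K}(t)$ concrete, here is a minimal sketch that evaluates it for a finite list of eigenvalues. The truncated spectrum $\lambda_j=j^{-2}$ is a hypothetical example (the infinite sum in the definition is replaced by a finite one, which is an assumption of this sketch), and the function name is illustrative.

```python
import math

def local_complexity(eigenvalues, n, t):
    """R_K(t) = sqrt((1/n) * sum_j min(lambda_j, t^2)) for a finite list of eigenvalues."""
    return math.sqrt(sum(min(lam, t * t) for lam in eigenvalues) / n)

# Hypothetical spectrum lambda_j = j^{-2}, truncated at 10_000 terms (assumption).
spectrum = [1.0 / (j * j) for j in range(1, 10_001)]
```

Two features of the definition are visible directly: $R_{K}(t)$ is non-decreasing in $t$, and once $t^{2}\geq\lambda_{1}$ every term equals $\lambda_j$, so $R_{K}(t)$ saturates at $\big(\tfrac{1}{n}\sum_j\lambda_j\big)^{1/2}$.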

The following lemma is adapted from Theorem 41 of [58], and its proof largely follows the proof of that theorem.

Lemma F.1.

For any $t>0$, we have

$$Q_{n}(t)\leq\sqrt{2}R_{K}(t). \tag{187}$$

Furthermore, there exists an absolute positive constant $c$ such that for any $t^{2}\geq\frac{1}{n}$, one has

$$Q_{n}(t)\geq cR_{K}(t). \tag{188}$$
Proof.

Let $T(t)=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{L^{2}}\leq t}}\left|\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|$. We need the following two lemmas:

Lemma F.2 (Lemma 42 in [58]).

For any $t>0$, we have

$$n^{2}R_{K}^{2}(t)\leq\mathbb{E}_{w,x_{1},\ldots,x_{n}}T(t)^{2}\leq 2n^{2}R_{K}^{2}(t). \tag{189}$$
Proof.

Denote $\mathcal{F}(t)=\left\{f\in\mathcal{B}\mid\|f\|_{L^{2}}\leq t\right\}$. Since for every $f\in\mathcal{H}$ there exists $\beta\in\ell^{2}$ such that $f(x)=\beta^{\tau}\Phi(x)$, we know that $\mathcal{F}(t)=i\left(\left\{\beta~|~\sum\beta_{j}^{2}\leq 1\text{ and }\sum\beta_{j}^{2}\lambda_{j}\leq t^{2}\right\}\right)$.

For any $s$, let $\mathcal{E}(s)=\left\{\beta\mid\sum_{j}\beta^{2}_{j}\mu_{j}\leq s\right\}$, where $\mu_{j}:=\mu_{j}(t)=\left(\min\{1,t^{2}/\lambda_{j}\}\right)^{-1}\leq\max\{1,\lambda_{j}/t^{2}\}$. Then

$$i\left(\mathcal{E}(1)\right)\subset\mathcal{F}(t)\subset i\left(\mathcal{E}(2)\right).$$
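The two inclusions follow from $\max\{1,\lambda_j/t^{2}\}\leq 1+\lambda_j/t^{2}\leq 2\max\{1,\lambda_j/t^{2}\}$ applied coordinatewise. The sketch below checks them on random coefficient vectors; it works with finitely many coordinates and a hypothetical spectrum, both of which are assumptions made only for the illustration, and the helper names are not from the paper.

```python
import random

def mu(lam, t):
    """mu_j(t) = 1 / min(1, t^2 / lam_j), which equals max(1, lam_j / t^2)."""
    return max(1.0, lam / (t * t))

def in_F(beta, lams, t, tol=1e-12):
    """beta lies in F(t): sum beta_j^2 <= 1 and sum beta_j^2 * lam_j <= t^2."""
    return (sum(b * b for b in beta) <= 1 + tol
            and sum(b * b * l for b, l in zip(beta, lams)) <= t * t * (1 + tol))

def weighted_norm_sq(beta, lams, t):
    """sum_j beta_j^2 * mu_j(t), the quantity that defines E(s)."""
    return sum(b * b * mu(l, t) for b, l in zip(beta, lams))

def check_sandwich(lams, t, trials=200, seed=0):
    """Randomly test E(1) subset F(t) and F(t) subset E(2) in coefficient space."""
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.gauss(0.0, 1.0) for _ in lams]
        # Scale onto the boundary of E(1); such a point must lie in F(t).
        s = weighted_norm_sq(raw, lams, t) ** 0.5
        beta = [b / s for b in raw]
        if not in_F(beta, lams, t):
            return False
        # Scale onto the boundary of F(t); such a point must lie in E(2).
        a = max(sum(b * b for b in raw),
                sum(b * b * l for b, l in zip(raw, lams)) / (t * t)) ** 0.5
        gamma = [b / a for b in raw]
        if not in_F(gamma, lams, t) or weighted_norm_sq(gamma, lams, t) > 2 + 1e-12:
            return False
    return True
```

Testing boundary points is enough here because each set is star-shaped around the origin in coefficient space, so scaling a boundary point down keeps it inside.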

Thus we have

$$\mathbb{E}T(t)^{2}\leq\mathbb{E}\sup_{\sum\beta^{2}_{j}\mu_{j}\leq 2}\left\langle\beta,\sum_{i=1}^{n}w_{i}\Phi(x_{i})\right\rangle^{2}=2n\sum_{i=1}^{\infty}\frac{\lambda_{i}}{\mu_{i}}=2n\sum_{i=1}^{\infty}\min\left\{\lambda_{i},t^{2}\right\}=2n^{2}R_{K}(t)^{2}.$$

Similarly, we can show that $\mathbb{E}T(t)^{2}\geq n^{2}R_{K}^{2}(t)$.

\square

Lemma F.3 (Theorem 43 in [58]).

There exists an absolute constant $c$ such that for any $t^{2}\geq\frac{1}{n}$, one has

$$\left(\mathbb{E}\frac{1}{n}T(t)\right)^{2}\geq c\,\mathbb{E}\left(\frac{1}{n}T(t)\right)^{2}. \tag{190}$$
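Inequality (190) is a reverse-Jensen comparison between the first and second moments of a supremum. As a loosely related illustration only (it is not the lemma and does not involve a supremum), the classical Khintchine inequality gives the same type of comparison for a single Rademacher sum $S=\sum_i a_i w_i$, where $(\mathbb{E}|S|)^{2}\geq\frac{1}{2}\mathbb{E}S^{2}$. The sketch below computes the ratio exactly by enumerating sign patterns; the function name and the coefficient vectors are hypothetical.

```python
from itertools import product

def moment_ratio(coeffs):
    """Return (E|S|)^2 / E[S^2] for S = sum_i coeffs[i] * w_i with i.i.d. Rademacher w_i.

    The expectations are computed exactly by enumerating all 2^n sign patterns,
    so this is only practical for short coefficient vectors.
    """
    n = len(coeffs)
    first = 0.0   # accumulates |S|
    second = 0.0  # accumulates S^2
    for signs in product((-1.0, 1.0), repeat=n):
        s = sum(a * w for a, w in zip(coeffs, signs))
        first += abs(s)
        second += s * s
    count = 2 ** n
    return (first / count) ** 2 / (second / count)
```

The ratio never exceeds 1 by the Cauchy-Schwarz inequality, and the two-term case `moment_ratio([1.0, 1.0])` attains the lower value 1/2.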

Note that $Q_{n}(t)=\frac{1}{n}\mathbb{E}_{w,x_{1},\ldots,x_{n}}T(t)$. For any $t>0$, the following holds:

$$Q_{n}(t)\leq\frac{1}{n}\left(\mathbb{E}_{w,x_{1},\ldots,x_{n}}T(t)^{2}\right)^{1/2}\overset{(189)}{\leq}\sqrt{2}R_{K}(t). \tag{191}$$

Furthermore, for any $t^{2}\geq 1/n$, the following holds for some absolute constant $c$:

$$cR_{K}(t)\overset{(189)}{\leq}\frac{c}{n}\left(\mathbb{E}_{w,x_{1},\ldots,x_{n}}T(t)^{2}\right)^{1/2}\overset{(190)}{\leq}Q_{n}(t). \tag{192}$$

\square

F.1.2 Empirical version

Suppose that we have $n$ i.i.d. random samples $x_{i}\sim\mu$, $i=1,\ldots,n$. Let $\widehat{\lambda}_{1}\geq\ldots\geq\widehat{\lambda}_{n}$ be the eigenvalues of $\frac{1}{n}K(X,X)$. We then introduce the empirical versions of the aforementioned quantities:

$$\widehat{R}_{K}(t)=\left[\frac{1}{n}\sum_{j}\min\{\widehat{\lambda}_{j},t^{2}\}\right]^{1/2},\quad\widehat{Q}_{n}(t)=\mathbb{E}_{w}[\widehat{Z}_{n}(w,t)] \tag{193}$$

where $\widehat{Z}_{n}(w,t):=\sup_{\substack{g\in\mathcal{B}\\ \|g\|_{n}\leq t}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}g\left(x_{i}\right)\right|$, the $w_{i}$ are i.i.d. Rademacher random variables independent of the $x_{i}$, $\|g\|^{2}_{n}=\frac{1}{n}\sum_{j}g(x_{j})^{2}$, and $\mathcal{B}=\left\{g\in\mathcal{H}\mid\|g\|_{\mathcal{H}}\leq 1\right\}$.

Lemma F.4.

For any $t>0$, we have

$$\widehat{Q}_{n}(t)\leq\sqrt{2}\widehat{R}_{K}(t). \tag{194}$$

Furthermore, there exists an absolute positive constant $c$ such that for any $t^{2}\geq\frac{1}{n}$, one has

$$\widehat{Q}_{n}(t)\geq c\widehat{R}_{K}(t). \tag{195}$$
Remark F.5.

We note that [42] states (194) and (195) without proof, and that [6] proves only the upper bound on $\widehat{Q}_{n}(t)$.

Proof.

Introduce the operator $\hat{C}_{n}$ on $\mathcal{H}$ defined by

$$\left(\hat{C}_{n}f\right)(x)=\frac{1}{n}\sum_{i=1}^{n}f\left(X_{i}\right)K\left(X_{i},x\right),$$

then we have the following lemma:

Lemma F.6.

The $n$ largest eigenvalues of $\hat{C}_{n}$ are $\widehat{\lambda}_{1}\geq\ldots\geq\widehat{\lambda}_{n}$, and the remaining eigenvalues of $\hat{C}_{n}$ are zero.

Proof.

Deferred to the end of this subsection.
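A finite-dimensional sanity check of this statement can be sketched as follows. It assumes a hypothetical kernel with an explicit $p$-dimensional feature map, $K(x,y)=\Phi(x)^{\tau}\Phi(y)$, so that $\hat{C}_{n}$ acts on coefficient vectors as the $p\times p$ matrix $\frac{1}{n}F^{\tau}F$ while $\frac{1}{n}K(X,X)=\frac{1}{n}FF^{\tau}$, where the rows of $F$ are the $\Phi(X_i)$. Matching power traces are consistent with the two matrices sharing their nonzero eigenvalues; the random features and helper names are illustrative only.

```python
import random

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def power_traces(A, kmax):
    """Return [tr(A), tr(A^2), ..., tr(A^kmax)], i.e. the power sums of the eigenvalues."""
    out, P = [], A
    for _ in range(kmax):
        out.append(trace(P))
        P = matmul(P, A)
    return out

def gram_and_covariance(features):
    """features is an n x p list of rows Phi(x_i).
    Returns (F F^T / n, F^T F / n): the normalized Gram matrix (n x n)
    and the matrix representing the empirical operator (p x p)."""
    n = len(features)
    Ft = transpose(features)
    gram = [[v / n for v in row] for row in matmul(features, Ft)]
    cov = [[v / n for v in row] for row in matmul(Ft, features)]
    return gram, cov

# Hypothetical data: n = 4 samples with p = 6 random features, so p > n.
rng = random.Random(0)
F = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]
K_over_n, C_n = gram_and_covariance(F)
```

Choosing $p>n$ mirrors the situation in the lemma: the $p\times p$ matrix has rank at most $n$, so its remaining $p-n$ eigenvalues are zero and do not contribute to any power trace.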

Note that $\hat{C}_{n}$ is an operator with rank $\leq n$. Thus $0$ is an eigenvalue of $\hat{C}_{n}$ with infinite multiplicity. For notational simplicity, let $\left(\hat{\lambda}_{i}\right)_{i=1}^{\infty}$ denote the eigenvalues of $\hat{C}_{n}$, arranged in non-increasing order. Let $\left(\hat{\phi}_{i}\right)_{i\geq 1}$ be an orthonormal basis of $\mathcal{H}$ consisting of eigenfunctions of $\hat{C}_{n}$ (such that $\hat{\phi}_{i}$ is associated with $\hat{\lambda}_{i}$). Since $\hat{\lambda}_{i}=0$ when $i>n$, the choice of $\left(\hat{\phi}_{i}\right)_{i\geq 1}$ is not unique. For any $f\in\mathcal{H}$, we have the following decomposition:

$$\begin{aligned}
f&=\sum_{i\geq 1}\left\langle f,\hat{\phi}_{i}\right\rangle_{\mathcal{H}}\hat{\phi}_{i},\\
\hat{C}_{n}f&=\sum_{i\geq 1}\hat{\lambda}_{i}\left\langle f,\hat{\phi}_{i}\right\rangle_{\mathcal{H}}\hat{\phi}_{i}.
\end{aligned} \tag{196}$$

We need the following three lemmas:

Lemma F.7.

For any $t>0$, we have

$$
\left(\mathbb{E}_w\widehat{Z}_n(w,t)\right)^2 \leq \frac{2}{n}\sum_{j}\min\{\widehat{\lambda}_j, t^2\}.
\tag{197}
$$
Proof.

Deferred to the end of this subsection.

Lemma F.8 (Theorem 43 in [58]).

There exists an absolute constant $c$ such that for any $t^2\geq\frac{1}{n}$, one has

$$
\left(\mathbb{E}_w\widehat{Z}_n(w,t)\right)^2 \geq c\,\mathbb{E}_w\left(\widehat{Z}_n(w,t)\right)^2.
\tag{198}
$$
Proof.

Deferred to the end of this subsection.

Lemma F.9.

For any $t>0$, we have

$$
\mathbb{E}_w\left(\widehat{Z}_n(w,t)\right)^2 \geq \frac{1}{n}\sum_{j}\min\{\widehat{\lambda}_j, t^2\}.
\tag{199}
$$
Proof.

Deferred to the end of this subsection.

Note that $\widehat{Q}_n(t)=\mathbb{E}_w\widehat{Z}_n(w,t)$. For any $t>0$, we have

$$
\widehat{Q}_n(t) \overset{(197)}{\leq} \sqrt{2}\,\widehat{R}_K(t).
\tag{200}
$$

Furthermore, for any $t^2\geq 1/n$, the following holds for some absolute constant $c$:

$$
c\,\widehat{R}_K(t) \overset{(199)}{\leq} c\left[\mathbb{E}_w\left(\widehat{Z}_n(w,t)\right)^2\right]^{1/2} \overset{(198)}{\leq} \widehat{Q}_n(t).
\tag{201}
$$

\square

Proof of Lemma F.6: For any $f,g\in\mathcal{H}$, we have

$$
\left\langle g,\hat{C}_n f\right\rangle_{\mathcal{H}} = \frac{1}{n}\sum_{i=1}^{n} f(X_i)\,g(X_i),
$$

and $\left\langle f,\hat{C}_n f\right\rangle_{\mathcal{H}}=\|f\|_n^2$, implying that $\hat{C}_n$ is positive semi-definite. Suppose that $f$ is an eigenfunction of $\hat{C}_n$ with eigenvalue $\lambda$. Then for all $i$,

$$
\lambda f(X_i) = \left(\hat{C}_n f\right)(X_i) = \frac{1}{n}\sum_{j=1}^{n} f(X_j)\,K(X_j,X_i).
$$

Thus, the vector $\left(f(X_1),\ldots,f(X_n)\right)$ is either zero (which implies $\hat{C}_n f=0$ and hence $\lambda=0$) or is an eigenvector of $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})$ with eigenvalue $\lambda$. Conversely, if $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})v=\lambda v$ for some vector $v$, then

$$
\hat{C}_n\left(\sum_{i=1}^{n} v_i K(X_i,\cdot)\right) = \frac{1}{n}\sum_{i,j=1}^{n} v_i K(X_i,X_j)\,K(X_j,\cdot) = \lambda\sum_{j=1}^{n} v_j K(X_j,\cdot),
$$

where the last equality uses $\frac{1}{n}\sum_{i=1}^{n}K(X_j,X_i)v_i=\lambda v_j$ for every $j$.

Thus, the eigenvalues of $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})$ are the same as the $n$ largest eigenvalues of $\hat{C}_n$, and the remaining eigenvalues of $\hat{C}_n$ are zero. \square
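The correspondence in Lemma F.6 can be checked numerically. The following is a minimal sketch, not part of the original argument: the kernel $K(x,y)=\exp(\langle x,y\rangle)$, the sample size, and the dimension are illustrative assumptions. It takes the top eigenpair $(\lambda, v)$ of $\frac{1}{n}K(\boldsymbol{X},\boldsymbol{X})$, forms $g=\sum_i v_i K(X_i,\cdot)$, and verifies that $\hat{C}_n g=\lambda g$ at fresh evaluation points.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3  # illustrative sample size and dimension (assumptions)

def on_sphere(m):
    # Draw m points uniformly on the unit sphere in R^d.
    Z = rng.standard_normal((m, d))
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def kernel(A, B):
    # Hypothetical inner-product kernel K(x, y) = exp(<x, y>).
    return np.exp(A @ B.T)

X = on_sphere(n)
K = kernel(X, X)

# Eigen-decomposition of the normalized kernel matrix (eigh sorts ascending).
lam, V = np.linalg.eigh(K / n)
lam_top, v = lam[-1], V[:, -1]

# g = sum_i v_i K(X_i, .); at the sample points g(X_j) = (K v)_j.
g_at_X = K @ v

# (C_n g)(x) = (1/n) sum_j g(X_j) K(X_j, x), evaluated at new points x.
X_new = on_sphere(5)
Cg_new = kernel(X_new, X) @ g_at_X / n
g_new = kernel(X_new, X) @ v

# If g is an eigenfunction with eigenvalue lam_top, this residual vanishes.
eigen_residual = np.max(np.abs(Cg_new - lam_top * g_new))
print(eigen_residual)
```

The residual is zero up to floating-point error, which is the "conversely" direction of the proof evaluated pointwise.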

Proof of Lemma F.7: Fix $0\leq h\leq n$. For any $f\in\mathcal{H}$ satisfying $\|f\|_{\mathcal{H}}\leq 1$ and

$$
\|f\|_n^2 = \left\langle f,\hat{C}_n f\right\rangle_{\mathcal{H}} = \sum_{i\geq 1}\hat{\lambda}_i\left\langle f,\hat{\phi}_i\right\rangle_{\mathcal{H}}^2 \leq t^2,
\tag{202}
$$

we have

$$
\begin{aligned}
\sum_{i=1}^{n} w_i f(X_i) ={}& \left\langle f,\sum_{i=1}^{n} w_i K(X_i,\cdot)\right\rangle_{\mathcal{H}} \\
={}& \left\langle \sum_{j=1}^{h}\sqrt{\hat{\lambda}_j}\left\langle f,\hat{\phi}_j\right\rangle_{\mathcal{H}}\hat{\phi}_j,\ \sum_{j=1}^{h}\frac{1}{\sqrt{\hat{\lambda}_j}}\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}\hat{\phi}_j\right\rangle_{\mathcal{H}} \\
&+ \left\langle f,\sum_{j>h}\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}\hat{\phi}_j\right\rangle_{\mathcal{H}} \\
\leq{}& \sqrt{t^2\cdot\sum_{j=1}^{h}\frac{1}{\hat{\lambda}_j}\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2} + \sqrt{1^2\cdot\sum_{j>h}\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2}.
\end{aligned}
\tag{203}
$$
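The last inequality in (203) is the Cauchy–Schwarz inequality applied to each of the two inner products. As a sketch of the intermediate norms (writing $g_w:=\sum_{i=1}^{n} w_i K(X_i,\cdot)$, a shorthand introduced here and not used elsewhere in the paper, and assuming $\hat{\lambda}_j>0$ for $j\leq h$ so that the division is well defined):

```latex
\begin{aligned}
\Big\|\sum_{j=1}^{h}\sqrt{\hat{\lambda}_j}\,\langle f,\hat{\phi}_j\rangle_{\mathcal{H}}\,\hat{\phi}_j\Big\|_{\mathcal{H}}^2
  &= \sum_{j=1}^{h}\hat{\lambda}_j\langle f,\hat{\phi}_j\rangle_{\mathcal{H}}^2
  \;\leq\; \|f\|_n^2 \;\leq\; t^2
  && \text{by (202)}, \\
\Big\|\sum_{j=1}^{h}\hat{\lambda}_j^{-1/2}\,\langle g_w,\hat{\phi}_j\rangle_{\mathcal{H}}\,\hat{\phi}_j\Big\|_{\mathcal{H}}^2
  &= \sum_{j=1}^{h}\frac{1}{\hat{\lambda}_j}\langle g_w,\hat{\phi}_j\rangle_{\mathcal{H}}^2
  && \text{by orthonormality}, \\
\|f\|_{\mathcal{H}}^2 \leq 1^2,
\qquad
\Big\|\sum_{j>h}\langle g_w,\hat{\phi}_j\rangle_{\mathcal{H}}\,\hat{\phi}_j\Big\|_{\mathcal{H}}^2
  &= \sum_{j>h}\langle g_w,\hat{\phi}_j\rangle_{\mathcal{H}}^2 .
\end{aligned}
```

The factors $t^2$ and $1^2$ under the square roots in (203) are exactly the bounds on the first norm in each pair.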

By Jensen's inequality, $\mathbb{E}\sqrt{Z}\leq\sqrt{\mathbb{E}Z}$, we have

$$
\begin{aligned}
n\,\mathbb{E}_w\widehat{Z}_n(w,t) ={}& \mathbb{E}_w\sup_{\substack{f\in\mathcal{B}\\ \|f\|_n\leq t}}\left|\sum_{i=1}^{n} w_i f(x_i)\right| \\
\leq{}& t\,\mathbb{E}_w\sqrt{\sum_{j=1}^{h}\frac{1}{\hat{\lambda}_j}\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2} + \mathbb{E}_w\sqrt{\sum_{j>h}\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2} \\
\leq{}& t\sqrt{\sum_{j=1}^{h}\frac{1}{\hat{\lambda}_j}\mathbb{E}_w\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2} + \sqrt{\sum_{j>h}\mathbb{E}_w\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2} \\
={}& t\sqrt{\sum_{j=1}^{h}\frac{1}{\hat{\lambda}_j}\mathbf{H}_j} + \sqrt{\sum_{j>h}\mathbf{H}_j},
\end{aligned}
\tag{204}
$$

where $\mathbf{H}_j=\mathbb{E}_w\left\langle \sum_{i=1}^{n} w_i K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2$.

For any $j\geq 1$, we have

$$
\begin{aligned}
\mathbf{H}_j &= \mathbb{E}_w\sum_{i,\ell=1}^{n} w_i w_\ell\left\langle K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}\left\langle K(X_\ell,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}} \\
&= \sum_{i=1}^{n}\left\langle K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}}^2 = \sum_{i=1}^{n}\hat{\phi}_j(X_i)^2 = n\|\hat{\phi}_j\|_n^2 \\
&\overset{(202)}{=} n\left\langle \hat{\phi}_j,\hat{C}_n\hat{\phi}_j\right\rangle_{\mathcal{H}} \\
&= n\hat{\lambda}_j.
\end{aligned}
\tag{205}
$$
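The second equality in (205) relies on two facts that the display leaves implicit. The sketch below spells them out, assuming (as is standard for Rademacher or Gaussian multipliers, although this section does not restate it) that the $w_i$ are independent with mean zero and unit variance. The symbol $a_i$ is a shorthand introduced here for illustration only.

```latex
\begin{aligned}
\mathbb{E}_w[w_i w_\ell] &= \delta_{i\ell}
  \quad\Longrightarrow\quad
  \mathbb{E}_w\sum_{i,\ell=1}^{n} w_i w_\ell\, a_i a_\ell = \sum_{i=1}^{n} a_i^2, \\
a_i &:= \left\langle K(X_i,\cdot),\hat{\phi}_j\right\rangle_{\mathcal{H}} = \hat{\phi}_j(X_i)
  \quad\text{(reproducing property of } K\text{)}.
\end{aligned}
```

The cross terms with $i\neq\ell$ vanish in expectation, which is why only the diagonal sum survives.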

Combining (204) and (205) we have the upper bound of $\widehat{\mathcal{Q}}_n(t)$:

$$
\begin{aligned}
\left(\mathbb{E}_w\widehat{Z}_n(w,t)\right)^2 &\leq \frac{1}{n^2}\min_{h\leq n}\left(t\sqrt{nh}+\sqrt{n\sum_{j>h}\hat{\lambda}_j}\right)^2 \\
&\leq \frac{2}{n}\min_{h\leq n}\left(t^2 h+\sum_{j>h}\hat{\lambda}_j\right) \\
&= \frac{2}{n}\sum_{j\leq n}\min\{\hat{\lambda}_j, t^2\}.
\end{aligned}
\tag{206}
$$

\square
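The final equality in (206) uses the identity $\min_{0\leq h\leq n}\big(t^2 h+\sum_{j>h}\hat{\lambda}_j\big)=\sum_{j\leq n}\min\{\hat{\lambda}_j,t^2\}$ for eigenvalues in non-increasing order; the preceding step uses $(a+b)^2\leq 2(a^2+b^2)$. The following is a small numerical sketch of the identity. The spectrum and the value of $t$ are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
# Hypothetical empirical spectrum, sorted in non-increasing order.
lam = np.sort(rng.exponential(scale=0.1, size=n))[::-1]
t = 0.25

def split_bound(h):
    # t^2 * h + sum_{j > h} lam_j: pay t^2 for the first h eigenvalues,
    # and the eigenvalue itself for the remaining ones.
    return t**2 * h + lam[h:].sum()

# Left-hand side: minimize over the cut-off index h in {0, ..., n}.
lhs = min(split_bound(h) for h in range(n + 1))

# Right-hand side: truncate each eigenvalue at t^2 and sum.
rhs = np.minimum(lam, t**2).sum()

# The minimizer is the number of eigenvalues exceeding t^2.
h_star = int(np.sum(lam > t**2))

print(lhs, rhs, h_star)
```

Because the eigenvalues are sorted, cutting at $h^\star=\#\{j:\hat{\lambda}_j>t^2\}$ assigns each index the smaller of $\hat{\lambda}_j$ and $t^2$, so no other cut-off can do better.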

Proof of Lemma F.8: This proof is essentially borrowed from the proof of Lemma 43 in [58]. To keep the article self-contained, we verify that all "absolute constants" mentioned in the proof of Lemma 43 in [58] are indeed absolute constants.

Set $R=n^{-1/2}\sup_{\substack{f\in\mathcal{B}\\ \|f\|_n\leq t}}\left|\sum_{i=1}^{n} w_i f(X_i)\right|$. Denoting $\sigma^2=nt^2$, we apply (4.1) in [58] to the random variable

$$
Z=\sup_{\substack{f\in\mathcal{B}\\ \|f\|_n\leq t}}\left|\sum_{i=1}^{n} w_i f(X_i)-\mathbb{E}_w\sum_{i=1}^{n} w_i f(X_i)\right|=\sqrt{n}\,R,
\tag{207}
$$

and with probability larger than $1-e^{-x}$ we have

$$
\frac{1}{\sqrt{n}}\sup_{\substack{f\in\mathcal{B}\\ \|f\|_n\leq t}}\left|\sum_{i=1}^{n} w_i f(X_i)\right| \leq 2\,\mathbb{E}_w\frac{1}{\sqrt{n}}\sup_{\substack{f\in\mathcal{B}\\ \|f\|_n\leq t}}\left|\sum_{i=1}^{n} w_i f(X_i)\right| + C\left(t\sqrt{x}+\frac{x}{\sqrt{n}}\right),
\tag{208}
$$

and from Theorem 3 of [54], we know that $C$ can be taken as $45.7$.

From Lemma 45 of [58], we have

$$ct\leq n^{-1/2}\mathbb{E}_{w}\sup_{\substack{f\in\mathcal{B}\\ \|f\|_{n}\leq t}}\left|\sum_{i=1}^{n}w_{i}f\left(X_{i}\right)\right|, \tag{209}$$

and from Section 9.2 in [59] we know that $c$ is an absolute constant.

From (208) and (209), with probability larger than $1-e^{-x}$ we have

$$\begin{aligned} R &\leq 2\mathbb{E}_{w}R+C\left(t\sqrt{x}+\frac{x}{\sqrt{n}}\right)\\ &\leq c_{1}x\,\mathbb{E}_{w}R, \end{aligned} \tag{210}$$

where $c_{1}$ is an absolute constant. Hence, $\mathbb{P}\{R\geq m\mathbb{E}_{w}R\}\leq e^{-c_{2}m}$, where $c_{2}$ is an absolute constant, and $m$ is an integer. Using Lemma 44 in [58], we get the desired result. $\square$
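The passage from (210) to the exponential tail can be spelled out as follows. This is only a sketch, under the assumption (not stated explicitly in the text) that the second inequality in (210) is valid for every $x\geq 1$; the identification $c_{2}=1/c_{1}$ is one admissible choice, not necessarily the constant intended in [58].

```latex
% Sketch: from the high-probability bound (210) to an exponential tail.
% Assumption: (210) holds for every x >= 1.
\mathbb{P}\left\{R > c_{1}x\,\mathbb{E}_{w}R\right\} \leq e^{-x}
\quad\text{for all } x\geq 1.
% Substitute m = c_1 x, i.e. x = m / c_1:
\mathbb{P}\left\{R > m\,\mathbb{E}_{w}R\right\} \leq e^{-m/c_{1}}
\quad\text{for all } m\geq c_{1},
% which has the form e^{-c_2 m} with c_2 = 1/c_1.
```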

Proof of Lemma F.9: Since $\{\hat{\phi}_{j}\}$ is a basis of $\mathcal{H}$, we have a natural isomorphism $\widehat{i}:\ell^{2}\to\mathcal{H}$ given by

$$b=(b_{1},b_{2},\cdots)\mapsto\sum_{j}b_{j}\hat{\phi}_{j}(\cdot). \tag{211}$$

Denote $\mathcal{F}(t)=\{f\in\mathcal{B}\mid\|f\|_{n}\leq t\}$. Since every $f\in\mathcal{H}$ can be written as $f(x)=\sum_{j}b_{j}\hat{\phi}_{j}(x)$ for some $b\in\ell^{2}$, we know that $\mathcal{F}(t)=\widehat{i}\left(\left\{b\mid\sum_{j}b_{j}^{2}\leq 1\text{ and }\sum_{j}b_{j}^{2}\hat{\lambda}_{j}\leq t^{2}\right\}\right)$.

Let $\widehat{\mathcal{E}}=\left\{b\mid\sum_{i=1}^{\infty}\hat{\mu}_{i}b_{i}^{2}\leq 1\right\}$, where $\hat{\mu}_{i}=\left(\min\left\{1,t^{2}/\hat{\lambda}_{i}\right\}\right)^{-1}$. Then $\widehat{i}\left(\widehat{\mathcal{E}}\right)\subset\mathcal{F}(t)$. Thus we have

$$\begin{aligned} \mathbb{E}_{w}\left(\widehat{Z}_{n}(w,t)\right)^{2} &\geq\mathbb{E}_{w}\sup_{f\in\widehat{\mathcal{E}}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}f\left(X_{i}\right)\right|^{2}\\ &=\mathbb{E}_{w}\sup_{\{\hat{\beta}_{i}\mid\sum\hat{\mu}_{i}\hat{\beta}_{i}^{2}\leq 1\}}\left|\frac{1}{n}\sum_{i=1}^{n}w_{i}\sum_{j=1}^{\infty}\hat{\beta}_{j}\hat{\phi}_{j}\left(X_{i}\right)\right|^{2}\\ &=\frac{1}{n^{2}}\cdot\mathbb{E}_{w}\sum_{j=1}^{\infty}\frac{1}{\hat{\mu}_{j}}\left(\sum_{i=1}^{n}w_{i}\hat{\phi}_{j}\left(X_{i}\right)\right)^{2}\\ &\overset{(205)}{=}\frac{1}{n}\cdot\sum_{j=1}^{\infty}\frac{\hat{\lambda}_{j}}{\hat{\mu}_{j}}=\frac{1}{n}\sum_{i=1}^{n}\min\left\{\hat{\lambda}_{i},t^{2}\right\}. \end{aligned} \tag{212}$$

$\square$
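The two algebraic steps used in (212) can be checked numerically. The sketch below is illustrative only: the eigenvalues `eigs`, the radius `t`, and the vector `a` (which stands in for the coefficients $\sum_{i}w_{i}\hat{\phi}_{j}(X_{i})$) are hypothetical inputs, not quantities from the paper. It checks that the supremum of $(\sum_{j}\beta_{j}a_{j})^{2}$ over the ellipsoid $\sum_{j}\hat{\mu}_{j}\beta_{j}^{2}\leq 1$ equals $\sum_{j}a_{j}^{2}/\hat{\mu}_{j}$ (Cauchy-Schwarz), and that $\hat{\lambda}_{j}/\hat{\mu}_{j}=\min\{\hat{\lambda}_{j},t^{2}\}$.

```python
import math
import random

def ellipsoid_sup_sq(a, mu):
    """Closed form of sup (sum_j beta_j a_j)^2 over sum_j mu_j beta_j^2 <= 1."""
    return sum(aj * aj / mj for aj, mj in zip(a, mu))

def maximizer(a, mu):
    """Cauchy-Schwarz maximizer: beta_j proportional to a_j / mu_j."""
    scale = math.sqrt(ellipsoid_sup_sq(a, mu))
    return [aj / (mj * scale) for aj, mj in zip(a, mu)]

def truncated_trace(eigs, t):
    """(1/n) * sum_i min(lambda_i, t^2), the last expression in (212)."""
    return sum(min(lam, t * t) for lam in eigs) / len(eigs)

rng = random.Random(0)
eigs = [1.0 / (j + 1) ** 2 for j in range(8)]  # hypothetical eigenvalues
t = 0.3
mu = [1.0 / min(1.0, t * t / lam) for lam in eigs]  # mu_j as defined in the proof
a = [rng.gauss(0.0, 1.0) for _ in eigs]  # stands in for sum_i w_i phi_j(X_i)

beta_star = maximizer(a, mu)
attained = sum(b * aj for b, aj in zip(beta_star, a)) ** 2
constraint = sum(mj * b * b for mj, b in zip(mu, beta_star))
weighted = sum(lam / mj for lam, mj in zip(eigs, mu)) / len(eigs)

print(attained, ellipsoid_sup_sq(a, mu))   # the two values should agree
print(constraint)                          # should be 1 up to rounding
print(weighted, truncated_trace(eigs, t))  # lambda_j / mu_j = min(lambda_j, t^2)
```

The maximizer sits on the boundary of the ellipsoid, which is why `constraint` evaluates to one up to rounding.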

F.2 Concentration bounds

The following quadratic concentration inequality is introduced in [80].

Lemma F.10 (The main theorem in [80]).

Suppose we have $\boldsymbol{e}_{1},\cdots,\boldsymbol{e}_{n}\sim_{i.i.d.}N(0,\sigma^{2})$. For any matrix $A=\left\{a_{ij}\right\}_{i,j=1}^{n}$, denote $Q=\sum_{i,j=1}^{n}a_{ij}\boldsymbol{e}_{i}\boldsymbol{e}_{j}$, then we have

$$\mathbb{P}[|Q-\mathbb{E}[Q]|\geq\delta]\leq\exp\left(-\mathfrak{c}_{1}\min\left\{\frac{\delta}{\|A\|_{\mathrm{op}}},\frac{\delta^{2}}{\|A\|_{\mathrm{F}}^{2}}\right\}\right)\quad\text{ for all }\delta>0, \tag{213}$$

where $\mathfrak{c}_{1}$ is a constant only depending on $\sigma$, and $\left(\|A\|_{\mathrm{op}},\|A\|_{\mathrm{F}}\right)$ are (respectively) the operator and Frobenius norms of the matrix $A$.
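As a rough illustration of Lemma F.10, the following Monte Carlo sketch draws Gaussian vectors, evaluates $Q$, and records how often $|Q-\mathbb{E}[Q]|$ exceeds multiples of $\|A\|_{\mathrm{F}}$. The matrix `A`, the sample sizes, and the seed are arbitrary choices made for this example. The sketch does not estimate the constant $\mathfrak{c}_{1}$ and does not compute the operator norm; it only uses the identity $\mathbb{E}[Q]=\sigma^{2}\operatorname{tr}(A)$ and shows the empirical tail shrinking as $\delta$ grows.

```python
import math
import random

def quadratic_form(A, e):
    """Q = sum_{i,j} a_ij * e_i * e_j."""
    n = len(e)
    return sum(A[i][j] * e[i] * e[j] for i in range(n) for j in range(n))

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in A for x in row))

rng = random.Random(0)
n, sigma, trials = 10, 1.0, 4000
A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
expected_Q = sigma ** 2 * sum(A[i][i] for i in range(n))  # E[Q] = sigma^2 tr(A)

samples = []
for _ in range(trials):
    e = [rng.gauss(0.0, sigma) for _ in range(n)]
    samples.append(quadratic_form(A, e))

mean_Q = sum(samples) / trials
fro = frobenius_norm(A)

# Empirical frequency of |Q - E[Q]| >= delta for delta = k * ||A||_F.
tail = {}
for k in (1, 2, 4):
    delta = k * fro
    tail[k] = sum(abs(q - expected_Q) >= delta for q in samples) / trials
    print(f"delta = {k} * ||A||_F: empirical tail = {tail[k]:.4f}")
```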

Lemma F.11 (Theorem 9 in [77]).

Let $w_{1},\ldots,w_{n}$ be independent random variables with $\left|w_{i}\right|\leq 1$ for all $1\leq i\leq n$. Let $G:\mathbb{R}^{n}\rightarrow\mathbb{R}$ be a 1-Lipschitz convex function. Then for any $\delta>0$ one has

$$\mathbb{P}\left(|G(w_{1},\ldots,w_{n})-\mathbb{E}G(w_{1},\ldots,w_{n})|\geq\delta\right)\leq C_{1}\exp\left(-C_{2}\delta^{2}\right) \tag{214}$$

for some absolute constants $C_{1},C_{2}>0$.
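A notable feature of (214) is that the bound does not depend on $n$. The sketch below illustrates this for one particular choice that is an assumption of the example, not of the lemma: $G(w)=\|w\|_{2}$, which is convex and 1-Lipschitz, with $w_{i}$ uniform on $[-1,1]$. The empirical standard deviation of $G$ should stay of constant order while $\mathbb{E}G$ grows like $\sqrt{n}$.

```python
import math
import random
import statistics

def euclidean_norm(w):
    """G(w) = ||w||_2, a convex 1-Lipschitz function on R^n."""
    return math.sqrt(sum(x * x for x in w))

rng = random.Random(0)
spread = {}
for n in (10, 100, 1000):
    values = [
        euclidean_norm([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _ in range(500)
    ]
    spread[n] = statistics.pstdev(values)
    print(f"n = {n}: mean G = {statistics.fmean(values):.3f}, "
          f"std G = {spread[n]:.3f}")
```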

Lemma F.12 (Theorem 14.1 in [79]).

Let $\mathcal{B}$ be the unit ball of the RKHS $\mathcal{H}$. Let $\delta_{n}$ be any positive solution of the inequality

$$Q_{n}(\delta)\leq\frac{\sqrt{2}\delta^{2}}{2e\sigma},$$

where $Q_{n}$ is defined in Lemma E.6. Then there exist absolute constants $C_{1}$, $C_{2}$, and $C_{3}$, such that for any $\varepsilon\geq\delta_{n}$, we have

$$\left|\|f\|_{n}^{2}-\|f\|_{L^{2}}^{2}\right|\leq\frac{\|f\|_{L^{2}}^{2}}{2}+C_{1}\varepsilon^{2}\quad\text{ for all }f\in\mathcal{B},$$

with probability at least $1-C_{2}e^{-C_{3}n\varepsilon^{2}}$.

Lemma F.13 (Theorem 3 in [54]).

Consider $n$ independent random variables $x_{1},\cdots,x_{n}$ sampled from $\rho_{\mathcal{X}}$ on $\mathcal{X}$. For any $t>0$, let $\mathcal{F}$ be some countable family of real-valued measurable functions in $L^{2}$, such that $\|f\|_{L^{2}}\leq t$ and $\|f\|_{\infty}\leq 1$ for every $f\in\mathcal{F}$. Let $Z$ denote $\sup_{f\in\mathcal{F}}\left|\sum_{i=1}^{n}f\left(x_{i}\right)\right|$. Let $\sigma^{2}:=nt^{2}\geq\sup_{f\in\mathcal{F}}\sum_{i=1}^{n}\operatorname{Var}\left(f\left(x_{i}\right)\right)$. Then, for any positive real number $\delta$,

$$\mathbb{P}\left(Z\geq\frac{3}{2}\mathbb{E}[Z]+2\sigma\sqrt{2\delta}+66.5\delta\right)\leq\exp\{-\delta\}. \tag{215}$$

Moreover, one also has

$$\mathbb{P}\left(Z\leq\frac{1}{2}\mathbb{E}[Z]-\sigma\sqrt{10.8\delta}-88.9\delta\right)\leq\exp\{-\delta\}. \tag{216}$$

Remark F.14.

From Corollary 3.6 of [72] and Assumption 1, we know that the RKHS $\mathcal{H}$ as well as its unit ball $\mathcal{B}$ are separable. Therefore, there exists a countable dense subset $\mathcal{F}\subset\mathcal{B}$. One can show that

$$\sup_{\substack{f\in\mathcal{F}\\ \|f\|_{L^{2}}\leq t}}\left|\sum_{i=1}^{n}f\left(x_{i}\right)\right|=\sup_{\substack{f\in\mathcal{B}\\ \|f\|_{L^{2}}\leq t}}\left|\sum_{i=1}^{n}f\left(x_{i}\right)\right|, \tag{217}$$

and hence the results in Lemma F.13 still hold when $\mathcal{F}$ is replaced by the set $\{f\in\mathcal{B}\mid\|f\|_{L^{2}}\leq t\}$.
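To get a feel for the size of the deviation terms in (215) and (216), the thresholds can be evaluated directly with $\sigma=\sqrt{n}\,t$. The helper name `deviation_thresholds` and the input values below are hypothetical, chosen only for illustration. The value `expected_Z` is supplied by hand, since the lemma itself does not say how to compute $\mathbb{E}[Z]$.

```python
import math

def deviation_thresholds(expected_Z, n, t, delta):
    """Return (lower, upper) thresholds from (216) and (215) with sigma^2 = n t^2."""
    sigma = math.sqrt(n) * t
    upper = 1.5 * expected_Z + 2.0 * sigma * math.sqrt(2.0 * delta) + 66.5 * delta
    lower = 0.5 * expected_Z - sigma * math.sqrt(10.8 * delta) - 88.9 * delta
    return lower, upper

# Hypothetical inputs: E[Z] = 50, n = 10000 samples, t = 0.1, delta = 1.
lower, upper = deviation_thresholds(expected_Z=50.0, n=10_000, t=0.1, delta=1.0)
print(f"P(Z >= {upper:.2f}) <= exp(-1) and P(Z <= {lower:.2f}) <= exp(-1)")
```

With these inputs the lower threshold is negative, so (216) carries no information there: $Z\geq 0$ always. The lower bound only becomes informative once $\mathbb{E}[Z]$ dominates both $\sigma\sqrt{\delta}$ and $\delta$.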