License: CC BY 4.0
arXiv:2310.15351v2 [cs.LG] 02 Feb 2024

Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency

Sudeep Salgia School of Electrical & Computer Engineering, Cornell University, Ithaca, NY, {ss3827,qz16}@cornell.edu Sattar Vakili MediaTek Research, UK, [email protected] Qing Zhao School of Electrical & Computer Engineering, Cornell University, Ithaca, NY, {ss3827,qz16}@cornell.edu
(Oct 2023; Revised Feb 2024)
Abstract

We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established in this work, which may be of independent interest. We further develop an algorithm based on random exploration with domain shrinking and establish its order-optimal regret guarantees under both noise-free and noisy settings. In the noise-free setting, our analysis closes the existing gap in regret performance and thereby resolves a COLT open problem. The proposed algorithm also enjoys a computational advantage over prevailing methods due to the random exploration that obviates the expensive optimization of a non-convex acquisition function for choosing the query points at each iteration.

1 Introduction

1.1 GP-based Bayesian Optimization

We consider the problem of sequential optimization of an unknown, possibly non-convex, function $f:\mathcal{X}\to\mathbb{R}$. The learner sequentially chooses a query point $x_t\in\mathcal{X}$ at each time $t$ and observes the function value (potentially subject to noise) at $x_t$. The learning objective is to approach a global maximizer $x^{*}$ of the function through a sequence of query points $\{x_t\}_{t=1}^{T}$ chosen sequentially in time. In addition to the convergence of $\{x_t\}_{t=1}^{T}$ to $x^{*}$, an online measure of the learning efficiency is the cumulative regret

$$R(T)=\sum_{t=1}^{T}\left[f(x^{*})-f(x_t)\right]. \tag{1}$$
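To make Eqn. (1) concrete, the following minimal Python sketch computes the running cumulative regret for a given query sequence. The objective `toy_objective`, its maximizer, and the query list are hypothetical choices made only for illustration. In the actual problem $f$ and $x^{*}$ are unknown to the learner, so the regret is an analysis tool rather than a quantity the algorithm can evaluate.

```python
from itertools import accumulate


def cumulative_regret(f, x_star, queries):
    """Return [R(1), ..., R(T)] for the given query sequence, following Eqn. (1)."""
    f_star = f(x_star)
    instantaneous = [f_star - f(x) for x in queries]  # f(x*) - f(x_t) for each t
    return list(accumulate(instantaneous))            # running sums over t


def toy_objective(x):
    """Hypothetical objective with a known maximizer at x = 0.5."""
    return 1.0 - (x - 0.5) ** 2


queries = [0.0, 1.0, 0.5, 0.5]  # hypothetical query sequence x_1, ..., x_4
regret = cumulative_regret(toy_objective, 0.5, queries)
print(regret)
```

Once the queries settle on the maximizer, the instantaneous regret is zero and $R(T)$ stops growing, which is the behaviour a sublinear regret bound formalizes.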

The above problem finds a wide range of applications including hyperparameter optimization Li et al. (2016), experimental design Greenhill et al. (2020), recommendation systems Vanchinathan et al. (2014) and robotics Lizotte et al. (2007). An approach that has proven to be particularly effective is Bayesian Optimization (BO) using Gaussian Process (GP) models (a.k.a. kernel-based bandit optimization). The unknown objective function $f$ is assumed to live in a Reproducing Kernel Hilbert Space (RKHS) associated with a known kernel. Within the GP-based BO framework, $f$ is viewed as a realization of a Gaussian process over $\mathcal{X}$. With each new query $x_t$, the learner sharpens the posterior distribution and uses it as a proxy for $f$ for subsequent optimization. We point out that such a Bayesian approach is equally applicable to a frequentist formulation where $f$ is deterministic, as considered in this work. In this case, the GP model of $f$ is fictitious and internal to the algorithm.

Under the assumption of noise-free query feedback, BO techniques were used for optimization as early as 1964 Kushner (1964). GP-based BO was popularized through the work of Močkus et al. (1978). Since then, a number of approaches have been developed and analyzed, often under certain conditions on the kernels and functional characteristics around $x^{*}$ (see Sec. 1.3 for a detailed discussion). Surprisingly, despite the long history, the design of an algorithm with guaranteed order-optimal regret performance has remained an open problem, as discussed in Vakili (2022).

GP-based BO under noisy queries was studied much more recently, following the pioneering work of Srinivas et al. (2010), who proposed the celebrated GP-UCB algorithm. Extensive studies since then have fully characterized the achievable learning performance, both in terms of information-theoretic lower bounds Scarlett et al. (2017) and the design of algorithms such as SupKernel-UCB Valko et al. (2013), GP-ThreDS Salgia et al. (2021), BPE Li and Scarlett (2022), and RIPS Camilleri et al. (2021) that achieve the optimal performance.

Under both the noise-free and noisy settings, a key practical concern for GP-based algorithms is their computational cost. The major computational bottleneck of prevailing GP-based algorithms is the maximization of an acquisition function for choosing the query point at each time instant. The acquisition functions are often non-convex and computationally expensive to maximize. To achieve a low regret order, such an optimization often needs to be carried out with increasing accuracy as time goes on, resulting in a high overall computational requirement.

1.2 Main Results

We explore a new design methodology for GP-based BO: an open-loop exploration of the domain using query points sampled at random from an arbitrary probability distribution supported over the domain. We show that this random exploration approach, while simplistic in nature, leads to order-optimal regret guarantees under both noise-free and noisy feedback models, thus closing the long-standing regret gap in the noise-free setting. Moreover, the non-adaptive nature of random sampling bypasses the expensive step of optimizing a non-convex acquisition function, offering a computationally efficient solution without sacrificing learning efficiency.

Random exploration, while not new to many problems (see Sec. 1.3), has not been considered or analyzed for GP-based BO. It stands in sharp contrast to the prevailing exploratory query strategy in GP-based BO: the maximum posterior variance (MPV) sampling. Under MPV, the learning algorithm at each time queries the point with the highest posterior variance conditioned on past observations, i.e., a greedy approach to maximal uncertainty reduction. Surprisingly, we show that the simple, non-adaptive scheme of random exploration achieves the same order of predictive performance as MPV sampling, which is known to be order-optimal. In particular, we show that the worst-case posterior variance corresponding to $n$ randomly drawn points is bounded with high probability by $\tilde{\mathcal{O}}(\gamma_n/n)$ and $\tilde{\mathcal{O}}(n^{1-\beta})$ under the noisy and noise-free feedback models, respectively, where $\gamma_n$ is the maximal information gain from $n$ query points and $\beta>1$ is the order of the polynomial eigendecay of the kernel (see Sec. 2 for their definitions).
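The contrast between the two exploration strategies can be sketched numerically. The Python snippet below is an illustrative toy under stated assumptions, not the paper's experimental setup: it assumes a one-dimensional domain $[0,1]$ discretized on a grid, a Matérn-3/2 kernel with a hypothetical length scale, uniform sampling for random exploration, and $\tau = 0.01$. All function names are hypothetical.

```python
import numpy as np


def matern32(a, b, lengthscale=0.2):
    """Matérn-3/2 kernel on 1-D inputs, normalized so that k(x, x) = 1 (assumed choice)."""
    r = np.abs(a[:, None] - b[None, :]) / lengthscale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def posterior_variance(x_obs, x_eval, tau):
    """Posterior variance of a zero-mean GP at x_eval given query points x_obs."""
    K = matern32(x_obs, x_obs) + tau * np.eye(len(x_obs))
    k_cross = matern32(x_obs, x_eval)              # shape (n_obs, n_eval)
    solved = np.linalg.solve(K, k_cross)           # (K + tau I)^{-1} k
    return 1.0 - np.sum(k_cross * solved, axis=0)  # k(x, x) = 1 for this kernel


def mpv_points(grid, n, tau):
    """Greedy maximum-posterior-variance selection over a finite grid."""
    chosen = [grid[len(grid) // 2]]                # arbitrary first query
    for _ in range(n - 1):
        var = posterior_variance(np.array(chosen), grid, tau)
        chosen.append(grid[int(np.argmax(var))])   # one maximization per step
    return np.array(chosen)


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)
n, tau = 30, 0.01

random_pts = rng.uniform(0.0, 1.0, size=n)         # open-loop, non-adaptive design
greedy_pts = mpv_points(grid, n, tau)              # adaptive design

worst_random = posterior_variance(random_pts, grid, tau).max()
worst_mpv = posterior_variance(greedy_pts, grid, tau).max()
print(f"worst-case variance: random={worst_random:.4f}, MPV={worst_mpv:.4f}")
```

Each MPV step requires maximizing the posterior variance (here by grid search), whereas the random design is drawn in a single call without consulting any posterior. The sketch only reports the two worst-case variances for one draw; it does not by itself verify the rates stated above.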

A simpler solution is often more demanding when it comes to establishing optimality in performance. The drastically different nature of random exploration from MPV demands different analytical techniques in characterizing its predictive performance. The tightest bound on the worst-case predictive error of MPV sampling, derived in Wenzel et al. (2021), was obtained using results on scattered data interpolation (i.e., approximating an unknown function using a given set of points) of functions in Sobolev spaces. These results bound the worst-case estimation error of the best interpolant in terms of the fill distance of the given set of points Wendland (2004); Narcowich et al. (2006); Brenner et al. (2008); Arcangéli et al. (2012); Wenzel et al. (2021). Since RKHSs of Matérn kernels are norm-equivalent to Sobolev spaces, these results also immediately translate to estimation errors for function interpolation in RKHSs. The analytical techniques used in these studies require various technical assumptions on the regularity of the function domain and its boundary. These assumptions present major challenges in combining MPV sampling with effective optimization techniques such as domain shrinking/elimination, hindering its applicability in designing algorithms with optimal regret. In contrast, in analyzing random exploration, we establish the concentration of the spectrum of the sample covariance operator around that of the true covariance operator, a result that holds universally for all compact domains. The crux of our analysis builds upon a careful treatment of the infinite-dimensional operators to separately ensure the concentration of the initial spectrum (consisting of the larger eigenvalues) and the tail spectrum, which allows us to obtain the optimal convergence rate.
The simplicity of random exploration in its implementation and the generality in its guaranteed predictive performance as established in this work make this exploration strategy an attractive alternative to MPV. We believe that the tools and techniques established here are of independent interest for extending the methodology of random exploration to other problem fields.

Built upon the above key results on random exploration, we develop and analyze a new algorithm for GP-based BO. Referred to as Random Exploration with Domain Shrinking (REDS), this algorithm integrates the exploration strategy of random sampling with the optimization technique of domain shrinking Li and Scarlett (2022); Salgia et al. (2021). Under the noise-free feedback model, we show that REDS incurs a cumulative regret of $\tilde{\mathcal{O}}(\max\{T^{(3-\beta)/2},1\})$, which closes the gap to the known lower bound established in Tuo and Wang (2020) and hence resolves the longstanding open problem. The generality of random exploration, both in terms of the design methodology and the performance guarantee, is the reason behind the optimal regret performance of REDS. In particular, the order-optimal predictive performance of random exploration, which holds universally over all compact domains, enables a seamless integration of this exploration strategy with domain shrinking. Similarly, in the noisy setting, we show that REDS offers a cumulative regret of $\tilde{\mathcal{O}}(\sqrt{T\gamma_T})$, which is order-optimal up to logarithmic factors.

The computational advantage of REDS is evident due to the simplicity of random exploration. We further demonstrate this with empirical studies where we compare REDS with BPE Li and Scarlett (2022) and GP-ThreDS Salgia et al. (2021), all offering optimal regret performance. GP-ThreDS was shown to be computationally more efficient than prevailing algorithms such as GP-UCB. We show that REDS offers a significant speed-up in running time over both algorithms without compromising the regret performance. As shown in Table 1, REDS offers a $\sim 15\times$ and $\sim 100\times$ speed-up in runtime over GP-ThreDS and BPE, respectively.

1.3 Related Work

For GP-based BO with noise-free feedback, a number of algorithms such as GP-EI Močkus (1975), EGO Jones et al. (1998), knowledge-gradient policy Frazier et al. (2008), and GP-PI Kushner (1964); Törn and Žilinskas (1989); Jones (2001) have been proposed, which have since become classical. We refer the reader to the excellent tutorial by Brochu et al. (2010) for a more detailed description of the classical approaches. Despite their good empirical performance and popularity, theoretical guarantees on the convergence of these algorithms have only been established relatively recently. Vazquez and Bect (2010) showed that EI converges almost surely for any function drawn from a GP prior of finite smoothness. Grünewälder et al. (2010) established the convergence rate of a computationally infeasible version of EI. Later, Bull (2011) established convergence rates for the computationally feasible version, showing that GP-EI achieves the optimal simple regret for Matérn kernels with smoothness $\nu<1$, which does not translate to optimal cumulative regret performance. More recently, De Freitas et al. (2012) proposed the Branch and Bound algorithm that achieves a constant cumulative regret in the Bayesian setting under additional assumptions on the differentiability of the kernel and the behaviour around the unique global maximum, which in practice are difficult to verify. In contrast, REDS requires no such additional assumptions and is analyzed in the frequentist setting. Lyu et al. (2020) showed that for kernels with a polynomial eigendecay with parameter $\beta$ (see Definition 2.2), the GP-UCB algorithm achieves a regret of $\mathcal{O}(T^{\frac{1+\beta}{2\beta}})$, which is sub-optimal, as shown in Vakili (2022).

The idea of using random sampling has been explored in related fields. The reconstruction of square integrable functions using random samples is a well-studied problem Bohn and Griebel (2017); Bastian Bohn (2017); Bohn (2018); Smale and Zhou (2004); Cohen et al. (2013); Chkifa et al. (2015); Cohen and Migliorati (2017). In particular, a series of studies considers efficient reconstruction of functions in RKHS using random samples drawn from the domain Kämmerer et al. (2021); Krieg and Ullrich (2021a, b); Moeller and Ullrich (2021). Despite certain similarities in the problem setup, an important point of distinction is that these studies focus on bounding the $L_2$ error of the reconstruction. In this work, we focus on bounding the sup-norm (or equivalently, the $L_\infty$ norm) of the estimation error, which is larger than the $L_2$ norm and more challenging to bound. Since the analysis of algorithms requires a bound on the sup-norm of the estimation error, existing results are not applicable here.

2 Problem Statement

2.1 RKHS and Mercer’s Theorem

Let $\mathcal{X}$ be a compact subset of $\mathbb{R}^{d}$ and $\varrho$ a finite Borel measure supported on $\mathcal{X}$. A measure $\varrho$ is said to be supported on $\mathcal{X}$ if $\varrho(\mathcal{Y})>0$ for all open sets $\mathcal{Y}\subset\mathcal{X}$. For $\mathcal{X}\subset\mathbb{R}^{d}$, this is equivalent to $\varrho$ being absolutely continuous w.r.t. the Lebesgue measure. Let $L_2(\varrho,\mathcal{X})$ denote the Hilbert space of (real) functions defined over $\mathcal{X}$ that are square-integrable w.r.t. $\varrho$. (To be rigorous, each $f\in L_2(\varrho,\mathcal{X})$ represents the class of functions that agree $\varrho$-almost everywhere.)

Consider a positive definite kernel $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$. A Hilbert space $\mathcal{H}_k$ of functions on $\mathcal{X}$ equipped with an inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}_k}$ is called a Reproducing Kernel Hilbert Space (RKHS) with reproducing kernel $k$ if the following conditions are satisfied: (i) $\forall\ x\in\mathcal{X}$, $k(\cdot,x)\in\mathcal{H}_k$; (ii) $\forall\ x\in\mathcal{X}$, $\forall\ f\in\mathcal{H}_k$, $f(x)=\langle f,k(\cdot,x)\rangle_{\mathcal{H}_k}$. For simplicity, we use $\psi_x$ to denote $k(\cdot,x)$. The inner product induces the RKHS norm, $\|f\|_{\mathcal{H}_k}^{2}=\langle f,f\rangle_{\mathcal{H}_k}$.
WLOG, we assume that $k(x,x)=\|\psi_x\|_{\mathcal{H}_k}^{2}\leq 1$. For brevity, we drop the subscript $\mathcal{H}_k$ from the inner product for the rest of the paper.

Mercer’s Theorem provides an alternative representation for RKHSs through the eigenvalues and eigenfunctions of a kernel integral operator defined over $L_2(\varrho,\mathcal{X})$ using the kernel $k$.

Theorem 2.1.

(Steinwart and Christmann, 2008, Theorem 4.49) Let $\mathcal{X}$ be a compact metric space, $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a continuous kernel and $\varrho$ be a finite Borel measure supported on $\mathcal{X}$. Then, there exists an orthonormal system of functions $\{\varphi_j\}_{j\in\mathbb{N}}$ in $L_2(\varrho,\mathcal{X})$ and a sequence of non-negative values $\{\lambda_j\}_{j\in\mathbb{N}}$ satisfying $\lambda_1\geq\lambda_2\geq\dots\geq 0$, such that $k(x,x^{\prime})=\sum_{j\in\mathbb{N}}\lambda_j\varphi_j(x)\varphi_j(x^{\prime})$ holds for all $x,x^{\prime}\in\mathcal{X}$ and the convergence is absolute and uniform over $x,x^{\prime}\in\mathcal{X}$. Moreover, $\{(\lambda_j,\varphi_j)\}_{j\in\mathbb{N}}$ corresponds to the eigensystem of the kernel integral operator $T_k:L_2(\varrho)\to L_2(\varrho)$ given by $T_kf=\int_{\mathcal{X}}k(\cdot,x)f(x)\,d\varrho(x)$ for all $f\in L_2(\varrho)$.

Consequently, the Mercer representation (Steinwart and Christmann, 2008, Thm. 4.51) of the RKHS of $k$ is given as

$$\mathcal{H}_k=\left\{f:=\sum_{j\in\mathbb{N}}\alpha_j\lambda_j^{\frac{1}{2}}\varphi_j:\ \|f\|_{\mathcal{H}_k}^{2}=\sum_{j\in\mathbb{N}}\alpha_j^{2}<\infty\right\}.$$

This also implies that $\{\upsilon_j\}_{j\in\mathbb{N}}$ with $\upsilon_j=\sqrt{\lambda_j}\varphi_j$ is an orthonormal basis for $\mathcal{H}_k$. The following definition characterizes a class of kernels based on their eigendecay profile corresponding to their Mercer representation.

Definition 2.2.

Let $\{\lambda_j\}_{j\in\mathbb{N}}$ denote the eigenvalues of a kernel $k$ arranged in descending order. The kernel $k$ is said to satisfy the polynomial eigendecay condition with a parameter $\beta>1$ if, for some universal constant $C>0$, we have $\lambda_j\leq Cj^{-\beta}$ for all $j\in\mathbb{N}$.

The above class of kernels encompasses a large number of kernels including the widely used Matérn family. We make the following assumption on the kernel $k$, which is commonly adopted in the literature Vakili et al. (2021b); Chatterji et al. (2019); Riutort-Mayol et al. (2023).

Assumption 2.3.

The eigenfunctions $\{\varphi_j\}_{j\in\mathbb{N}}$ corresponding to $k$ are continuous and hence bounded on $\mathcal{X}$, i.e., there exists $F>0$ such that $\sup_{x\in\mathcal{X}}|\varphi_j(x)|\leq F$ for all $j\in\mathbb{N}$.
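To make Definition 2.2 concrete, consider the Matérn family mentioned above. The following worked example relies on an eigendecay rate that is standard in the literature but is not derived in this section, so it should be read as an illustration under that assumption: for a Matérn kernel with smoothness $\nu$ on a compact domain in $\mathbb{R}^{d}$ (with a suitably regular measure $\varrho$, e.g. uniform), the eigenvalues decay as $\lambda_j=\Theta(j^{-(2\nu+d)/d})$. Substituting into the rates quoted in Sec. 1.2 gives

$$\beta=\frac{2\nu+d}{d}=1+\frac{2\nu}{d},\qquad n^{1-\beta}=n^{-2\nu/d},\qquad T^{(3-\beta)/2}=T^{1-\nu/d}.$$

For instance, $\nu=3/2$ and $d=2$ give $\beta=5/2$, so the noise-free posterior variance bound scales as $n^{-3/2}$ and the noise-free regret bound as $T^{1/4}$, up to logarithmic factors.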

2.2 Problem Formulation

We consider the problem of optimizing a fixed and unknown function $f:\mathcal{X}\to\mathbb{R}$, where $\mathcal{X}\subset\mathbb{R}^{d}$ is a compact domain and $f\in\mathcal{H}_k$ with $\|f\|_{\mathcal{H}_k}\leq B$. A sequential optimization algorithm chooses a point $x_t\in\mathcal{X}$ at each time $t$ and observes $y_t=f(x_t)+\varepsilon_t$. In the noise-free setting, $\varepsilon_t\equiv 0$ for all $t$.
For the noisy setting, we assume that $\{\varepsilon_t\}_{t=1}^{T}$ are independent, zero-mean, $R$-sub-Gaussian random variables for some fixed constant $R\geq 0$, i.e., $\mathbb{E}[\exp(\zeta\varepsilon_t)]\leq\exp(\zeta^{2}R^{2}/2)$ for all $\zeta\in\mathbb{R}$ and $t\leq T$. The performance of the sequential algorithm is measured using the notion of cumulative regret, as defined in Eqn. (1).
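As a quick sanity check of the sub-Gaussian condition (a standard computation, included here only as an illustration), suppose $\varepsilon_t\sim\mathcal{N}(0,\sigma^{2})$. Completing the square in the Gaussian integral gives

$$\mathbb{E}[\exp(\zeta\varepsilon_t)]=\exp\left(\frac{\zeta^{2}\sigma^{2}}{2}\right)\quad\text{for all }\zeta\in\mathbb{R},$$

so the condition holds with equality for $R=\sigma$. Similarly, by Hoeffding's lemma, any zero-mean noise bounded in $[-a,a]$ is $a$-sub-Gaussian.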

2.3 Preliminaries on Gaussian Processes

Under the GP model, the unknown function $f$ is treated hypothetically as a realization of $\text{GP}(0,k)$, a Gaussian Process over $\mathcal{X}$ with zero mean and $k(\cdot,\cdot)$ as the covariance kernel. The noise terms $\varepsilon$ are also viewed as zero-mean Gaussian variables with variance $\tau$. The conjugate property of GPs with Gaussian noise allows for a closed-form expression of the posterior distribution. Specifically, let $\mathcal{Z}_t=\{(x_i,y_i)\}_{i=1}^{t}$ denote a collection of points and their corresponding observations obtained according to the model described in Sec. 2.2. Then, conditioned on $\mathcal{Z}_t$, the posterior distribution of $f$ is also a GP with the following mean and covariance functions:

$$\mu_{t,\tau}(x)=k_{X_t,x}^{\top}(K_{X_t,X_t}+\tau I_t)^{-1}Y_t, \tag{2}$$
$$k_{t,\tau}(x,\bar{x})=k(x,\bar{x})-k_{X_t,x}^{\top}(K_{X_t,X_t}+\tau I_t)^{-1}k_{X_t,\bar{x}}, \tag{3}$$

where $k_{X_t,x} = [k(x_1,x), \dots, k(x_t,x)]^{\top}$, $Y_t = [y_1, \dots, y_t]^{\top}$, $K_{X_t,X_t} = [k(x_i,x_j)]_{i,j=1}^{t}$, and $I_t$ is the $t \times t$ identity matrix. The posterior variance at a point $x$ is given by $\sigma_{t,\tau}^2(x) = k_{t,\tau}(x,x)$. The expressions for the posterior mean and variance in the noise-free setting are obtained by setting $\tau = 0$ in the above relations.
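As a concrete illustration of Equations (2) and (3), the following is a minimal NumPy sketch that evaluates the posterior mean and the pointwise posterior variance. The function names `rbf_kernel` and `gp_posterior`, the squared-exponential kernel, its lengthscale, and the toy data are all assumptions made for illustration and do not come from the paper. The sketch requires $\tau > 0$ so that the Cholesky factorization is well defined.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between rows of A (m, d) and B (p, d).

    Illustrative kernel choice; any positive-definite kernel could be used.
    """
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def gp_posterior(X_t, Y_t, X_query, tau, kernel=rbf_kernel):
    """Posterior mean (Eq. 2) and variance k_{t,tau}(x, x) (Eq. 3) at X_query."""
    t = X_t.shape[0]
    K = kernel(X_t, X_t) + tau * np.eye(t)        # K_{X_t,X_t} + tau * I_t
    k_q = kernel(X_t, X_query)                    # column j holds k_{X_t, x_j}
    L = np.linalg.cholesky(K)                     # K = L L^T
    V = np.linalg.solve(L, k_q)                   # L^{-1} k_q
    mean = V.T @ np.linalg.solve(L, Y_t)          # k^T (K + tau I)^{-1} Y_t
    prior_var = np.diag(kernel(X_query, X_query)) # k(x, x)
    var = prior_var - np.sum(V**2, axis=0)        # k(x,x) - k^T (K + tau I)^{-1} k
    return mean, np.maximum(var, 0.0)             # clip tiny negative round-off

# Toy usage: 20 noisy observations of a hypothetical test function on [0, 1].
rng = np.random.default_rng(0)
tau = 0.01
X_t = rng.uniform(0.0, 1.0, size=(20, 1))
Y_t = np.sin(6.0 * X_t[:, 0]) + np.sqrt(tau) * rng.standard_normal(20)
X_query = np.linspace(0.0, 1.0, 5)[:, None]
mean, var = gp_posterior(X_t, Y_t, X_query, tau=tau)
```

The Cholesky-based solve is a common design choice for numerical stability; a direct matrix inverse would implement the same formulas.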

The posterior mean and variance computed using the GP model above are powerful tools to predict the values of the unknown function $f$ and to quantify the uncertainty in the prediction. In particular, the prediction error at a point $x \in \mathcal{X}$, $|f(x) - \mu_{t,\tau}(x)|$, can be upper bounded by $\alpha\sigma_{t,\tau}(x)$, for a certain scaling factor $\alpha > 0$ that depends on the feedback model Vakili et al. (2021a).

Lastly, we define the information gain of a set of points $X_n = \{x_1, x_2, \dots, x_n\}$ as

\begin{align}
\tilde{\gamma}_{X_n,\tau} := \frac{1}{2}\log\left(\det\left(I_n + \tau^{-1}K_{X_n,X_n}\right)\right). \tag{4}
\end{align}

Similarly, we define the maximal information gain as $\gamma_{n,\tau} := \sup_{X_n \subset \mathcal{X}^n} \tilde{\gamma}_{X_n,\tau}$. Maximal information gain is an important term that corresponds to the effective dimension of the kernel and helps characterize the regret of the algorithms. It depends only on the kernel and $\tau$.
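The information gain of a given point set can be evaluated directly from its definition. The sketch below is illustrative only: `rbf_kernel` and `information_gain` are hypothetical names, and the squared-exponential kernel and the random point set are assumptions. It computes the gain of one particular set, not the supremum $\gamma_{n,\tau}$, which would require a search over all point sets.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between rows of A (m, d) and B (p, d)."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def information_gain(X_n, tau, kernel=rbf_kernel):
    """Return 0.5 * log det(I_n + K_{X_n,X_n} / tau) for the point set X_n."""
    n = X_n.shape[0]
    K = kernel(X_n, X_n)
    # slogdet avoids overflow/underflow in the determinant itself.
    _, logdet = np.linalg.slogdet(np.eye(n) + K / tau)
    return 0.5 * logdet

# Toy usage: gain of nested random point sets drawn uniformly from [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
gains = [information_gain(X[:n], tau=0.1) for n in (10, 25, 50)]
```

Because the sets are nested, the computed gains are non-decreasing in $n$; for a single point with $k(x,x) = 1$ the gain reduces to $\frac{1}{2}\log(1 + 1/\tau)$.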

3 The Predictive Performance of Random Exploration

The following theorem characterizes the predictive variance, and consequently the predictive error, of a set of randomly sampled points from the domain.

Theorem 3.1.

Let $\mathcal{X}$ be a compact subset of $\mathbb{R}^d$, $\varrho$ be a finite Borel measure supported on $\mathcal{X}$, and $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a continuous kernel satisfying the polynomial eigendecay condition with parameter $\beta > 1$ (Defn. 2.2). Let $X_n = \{x_1, x_2, \dots, x_n\}$ denote a collection of $n$ i.i.d. points drawn from $\mathcal{X}$ according to $\varrho$. Let $\sigma_{n,0}^2$ and $\sigma_{n,\tau}^2$ denote, respectively, the posterior variance conditioned on $X_n$ in the noise-free setting and the noisy setting with a noise variance of $\tau > 0$. Then, for a given $\delta \in (0,1)$, there exists a constant $\overline{N}(\delta,k,\varrho,\tau) > 0$ such that, with probability at least $1-\delta$, for all $n > \overline{N}(\delta,k,\varrho,\tau)$,

\begin{align*}
\sup_{x \in \mathcal{X}} \sigma_{n,\tau}^2(x) &= \mathcal{O}\left(\frac{\tau\gamma_{n,\tau}}{n}\right) = \tilde{\mathcal{O}}\left((n/\tau)^{\frac{1}{\beta}-1}\right), \\
\sup_{x \in \mathcal{X}} \sigma_{n,0}^2(x) &= \tilde{\mathcal{O}}\left(n^{1-\beta}\right).
\end{align*}

The bounds obtained above on the worst-case posterior variance under the random exploration scheme are order-optimal (up to polylogarithmic factors), matching the existing lower bounds Scarlett et al. (2017); Tuo and Wang (2020). The above theorem also improves upon the best known results for noisy scattered data approximation. In particular, for the class of Matérn kernels with smoothness $\nu$ (i.e., $\beta = (2\nu+d)/d$), Theorem 3.1 implies a worst-case predictive error of $\tilde{\mathcal{O}}(n^{-\frac{\nu}{2\nu+d}})$, improving upon the bound of $\tilde{\mathcal{O}}(n^{-\frac{\nu}{2\nu+2d}})$ established by Wynne et al. (2021, Corollary 3).
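For readers who wish to verify the Matérn exponent, the arithmetic can be spelled out as follows. This is only a worked substitution of $\beta = (2\nu+d)/d$ into the variance bounds of Theorem 3.1, treating $\tau$ as a constant and recalling that the predictive error is controlled by $\alpha\sigma_{n,\tau}(x)$ rather than by the variance:

\begin{align*}
\frac{1}{\beta} - 1 &= \frac{d}{2\nu+d} - 1 = -\frac{2\nu}{2\nu+d}, \\
\sup_{x \in \mathcal{X}} \sigma_{n,\tau}^2(x) = \tilde{\mathcal{O}}\left(n^{-\frac{2\nu}{2\nu+d}}\right)
&\;\Longrightarrow\; \sup_{x \in \mathcal{X}} \sigma_{n,\tau}(x) = \tilde{\mathcal{O}}\left(n^{-\frac{\nu}{2\nu+d}}\right).
\end{align*}

The same substitution in the noise-free bound gives $1 - \beta = -2\nu/d$, i.e., $\sup_{x \in \mathcal{X}} \sigma_{n,0}(x) = \tilde{\mathcal{O}}(n^{-\nu/d})$.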

The constant $\overline{N}(\delta,k,\varrho,\tau)$ is related to the kernel $k$ and measure $\varrho$ through two fundamental functions, $N(R)$ and $T(R)$, which are given as follows for any $R \in \mathbb{N}$:

\begin{align*}
N(R) &:= \sup_{x \in \mathcal{X}} \sum_{j=1}^{R} \varphi_j^2(x), \\
T(R) &:= \sup_{x \in \mathcal{X}} \sum_{j=R+1}^{\infty} \lambda_j \varphi_j^2(x) = \sup_{x \in \mathcal{X}} \sum_{j=R+1}^{\infty} \upsilon_j^2(x).
\end{align*}

They are referred to as the spectral functions of the kernel (see Gröchenig (2020) and references therein) because of their dependence on the eigensystem corresponding to the kernel $k$ induced by the measure $\varrho$. Both $N(R)$ and $T(R)$ are fundamental quantities that appear in the analysis of reconstruction and estimation of functions in general $L_2$ spaces. The function $N(R)$ corresponds to the inverse of the infimum of the Christoffel function Dunkl and Xu (2014) in the special case of reconstruction using orthogonal polynomials. Under Assumption 2.3 and the condition of polynomial eigendecay (Defn. 2.2), $\overline{N}(\delta,k,\varrho,\tau)$ can be shown to be bounded as $\mathcal{O}(\max\{F^4, (F^2/\tau)^{\frac{1}{\beta-1}}\}\log(F/\delta))$. The dependence of $\overline{N}(\delta,k,\varrho,\tau)$ on $\delta$ is mild (logarithmic), as evident from the previous expression. Lastly, $\overline{N}(\delta,k,\varrho,\tau)$ grows as $\tau$ decreases, scaling polynomially in $1/\tau$ according to the bound above. Note that Theorem 3.1 ensures that a smaller value of $\tau$ results in a tighter bound on the posterior variance, which in turn requires a larger number of samples.
We refer the interested reader to Appendix A for a more detailed discussion of $\overline{N}(\delta,k,\varrho,\tau)$ and its dependence on $N(R)$ and $T(R)$. For brevity, we drop the arguments and use the notation $\overline{N}$ in the rest of the paper.

We provide a sketch of the proof of Theorem 3.1 below and refer the reader to Appendix A for a detailed proof.

Proof.

The main idea of the proof is to relate the worst-case posterior variance conditioned on $X_n$ to $\tilde{\gamma}_{X_n,\tau}$. This relation is established in two parts. In the first part, we establish that as the number of samples grows, the spectrum of the random operator $\hat{\mathbf{Z}}$ concentrates around that of $\mathbf{Z}$, where $\hat{\mathbf{Z}}, \mathbf{Z}: \mathcal{H}_k \to \mathcal{H}_k$ are defined as follows:

\begin{align*}
\hat{\mathbf{Z}}g := \left[\sum_{i=1}^{n} \langle g, \psi_{x_i} \rangle \psi_{x_i}\right] + \tau g; \quad \mathbf{Z} := \mathbb{E}_{X_n}[\hat{\mathbf{Z}}],
\end{align*}

where $\{x_1, x_2, \dots, x_n\}$ denotes the random ensemble of points drawn according to the measure $\varrho$. The concentration in spectral norm allows us to approximate the expression $\sigma_{n,\tau}^2(x) = \tau\langle \psi_x, \hat{\mathbf{Z}}^{-1}\psi_x \rangle$ as $\sigma_{n,\tau}^2(x) \approx \tau\langle \psi_x, \mathbf{Z}^{-1}\psi_x \rangle$, i.e., by replacing the sample covariance operator, $\hat{\mathbf{Z}}$, with the true covariance operator, $\mathbf{Z}$. Here, $A^{-1}$ denotes the inverse of an operator $A$, i.e., $A \circ A^{-1} = A^{-1} \circ A = \mathbf{Id}$, and $\mathbf{Id}$ denotes the identity operator.
Thus, this step allows us to obtain a deterministic bound on posterior variance, which is easier to understand and analyze. We establish the required relation using the following two lemmas:

Lemma 3.2.

For all $n \geq \overline{N}$, the following relation holds with probability at least $1 - \delta/2$:

\begin{align*}
\|\mathbf{Z}^{-\frac{1}{2}} \hat{\mathbf{Z}} \mathbf{Z}^{-\frac{1}{2}} - \mathbf{Id}\|_2 \leq 1/9.
\end{align*}

Lemma 3.3.

If the relation $\|\mathbf{Z}^{-\frac{1}{2}} \hat{\mathbf{Z}} \mathbf{Z}^{-\frac{1}{2}} - \mathbf{Id}\|_2 \leq b$ holds for some $b \in (0, 1/3)$, then the following holds for all $x \in \mathcal{X}$:

\begin{align*}
\langle \psi_x, \hat{\mathbf{Z}}^{-1}\psi_x \rangle \leq \frac{\sqrt{1-b}}{\sqrt{1-b} - \sqrt{2b}} \cdot \langle \psi_x, \mathbf{Z}^{-1}\psi_x \rangle.
\end{align*}

Lemma 3.2 forms the cornerstone of the proof of the theorem. The result is established by bounding the expression $|\langle g, (\mathbf{Z}^{-1/2} \hat{\mathbf{Z}} \mathbf{Z}^{-1/2} - \mathbf{Id}) g \rangle|$ for an arbitrary $g$ with $\|g\|_{\mathcal{H}_k} = 1$. We bound the above expression by decomposing it into a sum of three terms. Each of the three terms is then carefully bounded using a combination of the Matrix-Chernoff inequality (Tropp, 2012, Theorem 1.1), a result for spectral norm concentration based on the non-commutative Khintchine inequality Buchholz (2001, 2005); Moeller and Ullrich (2021), and the Bernstein inequality. Lemma 3.3 is established using a combination of the structure of covariance matrices, the Cauchy-Schwarz inequality, and the relation between the operator norm and the $2$-norm. We would like to emphasize that both of the above lemmas hold in general for all eigendecay profiles, even when Assumption 2.3 does not hold.

In the second part, we show that, with high probability, the information gain of the (random) set $X_n$ is lower bounded by $n \cdot \sup_{x \in \mathcal{X}} \langle \psi_x, \mathbf{Z}^{-1}\psi_x \rangle$, up to a multiplicative constant. This idea is formalized in the following lemma.

Lemma 3.4.

For all $n \geq \overline{N}$, the following relation holds with probability at least $1 - \delta/2$:

\begin{align*}
\tilde{\gamma}_{X_n,\tau} \geq \frac{13}{54F^2} \cdot n \cdot \sup_{x \in \mathcal{X}} \langle \psi_x, \mathbf{Z}^{-1}\psi_x \rangle.
\end{align*}

Thus, $\langle \psi_x, \mathbf{Z}^{-1}\psi_x \rangle$ serves as the bridge connecting the posterior variance to the maximal information gain.

The result for the noisy case follows immediately from the above lemmas by noting that $\tilde{\gamma}_{X_n,\tau} \leq \gamma_{n,\tau}$. For the noise-free setting, the results do not carry forward immediately as the above analysis does not hold for $\tau = 0$. To circumvent this issue, we use the fact that $\sigma_{n,\tau}^2(x)$ is an increasing function of $\tau$. Thus, we obtain a bound on $\sigma_{n,0}^2(x)$ by using the bound on $\sigma_{n,\tau^*}^2(x)$, where $\tau^*$ is a carefully chosen value that not only allows us to use the analysis from the noisy case but also ensures that $\sigma_{n,\tau^*}^2$ is a close representation of $\sigma_{n,0}^2$, so as to guarantee the tightest possible bounds. ∎
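As an illustrative check of how Lemmas 3.2 to 3.4 chain together in the noisy case, the following worked step uses only the constants stated in the sketch above. The explicit constant $108F^2/13$ is simply what those stated constants give when combined and is not claimed to be the constant tracked in Appendix A. Taking $b = 1/9$ in Lemma 3.3 gives $\sqrt{1-b} = 2\sqrt{2}/3$ and $\sqrt{2b} = \sqrt{2}/3$, so the multiplicative factor equals $2$. On the intersection of the events in Lemmas 3.2 and 3.4, which has probability at least $1-\delta$ by a union bound, for all $n \geq \overline{N}$:

\begin{align*}
\sup_{x \in \mathcal{X}} \sigma_{n,\tau}^2(x)
&= \tau \sup_{x \in \mathcal{X}} \langle \psi_x, \hat{\mathbf{Z}}^{-1}\psi_x \rangle
\leq 2\tau \sup_{x \in \mathcal{X}} \langle \psi_x, \mathbf{Z}^{-1}\psi_x \rangle \\
&\leq 2\tau \cdot \frac{54F^2}{13} \cdot \frac{\tilde{\gamma}_{X_n,\tau}}{n}
\leq \frac{108F^2}{13} \cdot \frac{\tau\gamma_{n,\tau}}{n}.
\end{align*}

This recovers the $\mathcal{O}(\tau\gamma_{n,\tau}/n)$ form of the noisy bound in Theorem 3.1.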

Remark 3.5.

We would like to emphasize that the above result holds for samples generated under every finite Borel measure $\varrho$ supported on $\mathcal{X}$. However, the quality of the estimate changes with the choice of the measure through the leading constant in the bound in Theorem 3.1.

4 The REDS algorithm

In this section, we present the proposed algorithm and analyze its regret performance.

4.1 REDS with Noise-Free Feedback

REDS integrates random exploration with domain shrinking. It proceeds in epochs, maintaining an active region $\mathcal{X}_r$ of the domain during each epoch $r \geq 1$. The sequence of active regions $\{\mathcal{X}_r\}_r$ shrinks across epochs, i.e., $\mathcal{X}_r \subseteq \mathcal{X}_{r-1} \subseteq \dots \subseteq \mathcal{X}_1 = \mathcal{X}$, while ensuring $x^* \in \mathcal{X}_r$ for all $r$ with high probability. During the $r^{\text{th}}$ epoch, REDS samples $N_r$ points uniformly at random from the set $\mathcal{X}_r$ (if $\mathcal{X}_r$ consists of multiple disjoint regions, this step is carried out for each region separately), where $N_r = N_1 \cdot 2^{r-1}$ and the initial batch size $N_1$ is an input to the algorithm.

Using the observations from these points, REDS computes the posterior mean and variance functions over $\mathcal{X}_r$, denoted by $\mu_r$ and $\sigma_r^2$ respectively, using Equations (2) and (3) with $\tau = 0$. The posterior mean and variance are then used to obtain $\mathcal{X}_{r+1}$, an improved localization of $x^*$, as follows:

\begin{align*}
\mathcal{X}_{r+1} = \left\{ x \in \mathcal{X}_r \ \bigg|\ \mathrm{UCB}_r(x) \geq \sup_{x' \in \mathcal{X}_r} \mathrm{LCB}_r(x') \right\}.
\end{align*}

Here, $\mathrm{UCB}_r(x) = \mu_r(x) + B\sigma_r(x)$ and $\mathrm{LCB}_r(x) = \mu_r(x) - B\sigma_r(x)$ correspond to upper and lower confidence bounds on $f$. Pseudocode for the algorithm is provided in Algorithm 1.

Algorithm 1 Random Exploration with Domain Shrinking
1:  Input: N1subscript𝑁1N_{1}italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, the initial batch size.
2:  Set 𝒳1𝒳subscript𝒳1𝒳\mathcal{X}_{1}\leftarrow\mathcal{X}caligraphic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ← caligraphic_X, tcurr0subscript𝑡curr0t_{\text{curr}}\leftarrow 0italic_t start_POSTSUBSCRIPT curr end_POSTSUBSCRIPT ← 0, r1𝑟1r\leftarrow 1italic_r ← 1
3:  for t=tcurr+1,tcurr+2,,tcurr+Nr𝑡subscript𝑡curr1subscript𝑡curr2subscript𝑡currsubscript𝑁𝑟t=t_{\text{curr}}+1,t_{\text{curr}}+2,\dots,t_{\text{curr}}+N_{r}italic_t = italic_t start_POSTSUBSCRIPT curr end_POSTSUBSCRIPT + 1 , italic_t start_POSTSUBSCRIPT curr end_POSTSUBSCRIPT + 2 , … , italic_t start_POSTSUBSCRIPT curr end_POSTSUBSCRIPT + italic_N start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT do
4:     Sample a point xtsubscript𝑥𝑡x_{t}italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT uniformly at random from 𝒳rsubscript𝒳𝑟\mathcal{X}_{r}caligraphic_X start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT and observe ytsubscript𝑦𝑡y_{t}italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT
5:     if  t>T𝑡𝑇t>Titalic_t > italic_T then
6:        Terminate
7:     end if
8:  end for
9:  Construct μrsubscript𝜇𝑟\mu_{r}italic_μ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT and σrsubscript𝜎𝑟\sigma_{r}italic_σ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT based on observations {(xt,yt:t{tcurr+1,tcurr+2,,Nr}}\{(x_{t},y_{t}:t\in\{t_{\text{curr}}+1,t_{\text{curr}}+2,\dots,N_{r}\}\}{ ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT : italic_t ∈ { italic_t start_POSTSUBSCRIPT curr end_POSTSUBSCRIPT + 1 , italic_t start_POSTSUBSCRIPT curr end_POSTSUBSCRIPT + 2 , … , italic_N start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT } } using Eqn (2) and (3) with τ=0𝜏0\tau=0italic_τ = 0.
10:  Set $\mathcal{X}_{r+1} = \{x \in \mathcal{X}_r \mid \mathrm{UCB}_r(x) \geq \sup_{x' \in \mathcal{X}_r} \mathrm{LCB}_r(x')\}$
11:  $t_{\text{curr}} \leftarrow t_{\text{curr}} + N_r$, $N_{r+1} \leftarrow 2N_r$
12:  $r \leftarrow r+1$
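
The doubling epoch schedule in Lines 3 to 12 of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `epoch_lengths` is hypothetical, and the final epoch is simply truncated so that exactly $T$ queries are made.

```python
def epoch_lengths(n1, horizon):
    """Number of queries made in each epoch when N_{r+1} = 2 * N_r.

    n1 is the first epoch length N_1 and horizon is the total budget T.
    Assumption: the last epoch is cut short once the budget is exhausted.
    """
    lengths = []
    t_curr, n_r = 0, n1
    while t_curr < horizon:
        used = min(n_r, horizon - t_curr)  # stop sampling once t reaches T
        lengths.append(used)
        t_curr += used                     # t_curr <- t_curr + N_r
        n_r *= 2                           # N_{r+1} <- 2 * N_r
    return lengths


if __name__ == "__main__":
    # N_1 = 50 and T = 1000 are the Branin settings reported in Section 5.
    print(epoch_lengths(50, 1000))
```

Because the epoch length doubles, the number of epochs grows only logarithmically in $T$, which is why the confidence parameter in the theorems below is split across roughly $\log_2 T$ epochs.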

4.2 REDS under noisy feedback

The REDS algorithm can be extended to operate under noisy feedback with the following two minor modifications to Algorithm 1. First, the posterior mean and variance $(\mu_{r,\tau},~\sigma_{r,\tau}^2)$ in each epoch should be computed using a noise variance $\tau > 0$ (Line 9 of Algorithm 1). Second, the upper and lower confidence bounds, i.e., UCB and LCB (Line 10 of Algorithm 1), should be updated to the following:

$$\mathrm{UCB}_{r,\tau,\delta}(x) := \mu_{r,\tau}(x) + \alpha_{\tau,\delta}\,\sigma_{r,\tau}(x) + c_{T,\tau,\delta} \tag{5}$$
$$\mathrm{LCB}_{r,\tau,\delta}(x) := \mu_{r,\tau}(x) - \alpha_{\tau,\delta}\,\sigma_{r,\tau}(x) - c_{T,\tau,\delta}, \tag{6}$$

where $\alpha_{\tau,\delta} = B + R\sqrt{(2/\tau)\log(|\mathcal{D}_T|/\delta)}$, $c_{T,\tau,\delta} = \frac{2B}{T} + R\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta}\right)}$ and $\mathcal{D}_T$ is defined in Assumption 4.1.
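
The constants in Eqns. (5) and (6) and the resulting domain-shrinking rule can be sketched on a finite candidate set as below. This is a hedged illustration only: the function names are hypothetical, the posterior means and standard deviations are passed in as plain lists rather than computed from a GP, and $B$ and $R$ are assumed to be the RKHS norm bound and noise parameter introduced earlier in the paper.

```python
import math


def alpha(B, R, tau, delta, disc_size):
    """alpha_{tau,delta} = B + R * sqrt((2 / tau) * log(|D_T| / delta))."""
    return B + R * math.sqrt((2.0 / tau) * math.log(disc_size / delta))


def c_const(B, R, tau, delta, T):
    """c_{T,tau,delta} = 2B / T + R * sqrt((2 / (T * tau)) * log(4T / delta))."""
    return 2.0 * B / T + R * math.sqrt((2.0 / (T * tau)) * math.log(4.0 * T / delta))


def shrink(candidates, mu, sigma, a, c):
    """Keep the points whose UCB (Eqn. 5) is at least the largest LCB (Eqn. 6).

    mu[i] and sigma[i] are the posterior mean and standard deviation at
    candidates[i]; computing them from a GP is outside this sketch.
    """
    ucb = [m + a * s + c for m, s in zip(mu, sigma)]
    lcb = [m - a * s - c for m, s in zip(mu, sigma)]
    best_lcb = max(lcb)
    return [x for x, u in zip(candidates, ucb) if u >= best_lcb]


if __name__ == "__main__":
    # Illustrative values only: B = 1 and R = 0.2 are assumptions.
    a = alpha(B=1.0, R=0.2, tau=0.2, delta=0.1, disc_size=2000)
    c = c_const(B=1.0, R=0.2, tau=0.2, delta=0.1, T=1000)
    print(a, c)
    print(shrink(["p0", "p1", "p2"], [0.0, 0.5, 1.0], [0.01, 0.01, 0.01], 1.0, 0.0))
```

When the posterior standard deviations are small relative to the gaps in the posterior mean, only near-optimal points survive; when they are large, every candidate is retained and the next epoch samples from the same set.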

4.3 Performance Analysis

For the analysis of the REDS algorithm, we need to make the following two additional assumptions.

Assumption 4.1.

For all $n \in \mathbb{N}$, there exists a discretization $\mathcal{D}_n$ of $\mathcal{X}$ such that for all $f \in \mathcal{H}_k$, $|f(x) - f([x]_{\mathcal{D}_n})| \leq \|f\|_{\mathcal{H}_k}/n$ and $|\mathcal{D}_n| = \mathrm{poly}(n)$ (see footnote 3), where $[x]_{\mathcal{D}_n} = \operatorname*{arg\,min}_{y \in \mathcal{D}_n} \|x - y\|_2$ is the point in $\mathcal{D}_n$ that is closest to $x$.

Footnote 3: The notation $f(x) = \mathrm{poly}(x)$ is equivalent to $f(x) = \mathcal{O}(x^k)$ for some $k \in \mathbb{N}$.

Assumption 4.2.

Let $\mathcal{L}_\eta = \{x \in \mathcal{X} \mid f(x) \geq \eta\}$ denote the level set of $f$ for $\eta \in [-B, B]$. We assume that for all $\eta \in [-B, B]$, $\mathcal{L}_\eta$ is a disjoint union of at most $M_f < \infty$ components, each of which is closed and connected. Moreover, for each such component, there exists a bi-Lipschitzian map (see footnote 4) between the component and $\mathcal{X}$ with normalized Lipschitz constant pair $L_f, L_f' < \infty$.

Footnote 4: We refer the reader to the supplementary material for additional details about the terms used in this assumption.

Assumption 4.1 is only required for the noisy case and is a standard assumption adopted in the literature. The existence of such a discretization has been justified and adopted in previous studies Srinivas et al. (2010); Chowdhury and Gopalan (2017); Vakili et al. (2021a); Salgia et al. (2022) and is a mild assumption on the kernel. Specifically, popular kernels such as the Squared Exponential and Matérn kernels are known to be Lipschitz continuous, in which case an $\varepsilon$-cover of the domain with $\varepsilon = \mathcal{O}(1/n)$ is sufficient to show the existence of such a discretization. Assumption 4.2 is an assumption on the regularity of the level sets of the function $f$. The existence of a bi-Lipschitzian map between two sets implies topological similarity between the two sets. Intuitively, this assumption ensures that the shape of the level sets is not "too arbitrary". Note that such an assumption on the level sets of $f$ is relatively mild, as the RKHS endows smoothness properties to the function $f$ which translate to a degree of topological regularity of level sets Alberti et al. (2011); Lee (2010).

The following theorem characterizes the regret performance of REDS under noise-free feedback.

Theorem 4.3.

Assume that the kernel $k$ satisfies the polynomial eigendecay condition with parameter $\beta > 1$ and the function $f$ satisfies Assumption 4.2. For a given $\delta \in (0,1)$, if the REDS algorithm is run with $N_1 \geq C_{L_f, L_f'} \overline{N}(\delta/\log_2(T))$ and noise-free feedback, then the regret incurred by REDS satisfies

$$R(T) = \tilde{\mathcal{O}}\left(\max\{T^{\frac{3-\beta}{2}}, 1\}\right)$$

with probability at least $1-\delta$. Here, $C_{L_f, L_f'}$ is a constant that depends only on $L_f$ and $L_f'$.

The following is an immediate corollary of the above theorem for the case of Matérn kernels.

Corollary 4.4.

Let $k$ be the Matérn kernel with smoothness $\nu > 0$. For a given $\delta \in (0,1)$, if the REDS algorithm is run with $N_1 \geq C_{L_f, L_f'} \overline{N}(\delta/\log_2(T))$ under noise-free feedback on a function $f \in \mathcal{H}_k$ satisfying Assumption 4.2, then the regret incurred by REDS satisfies

$$R(T) = \begin{cases} \tilde{\mathcal{O}}(T^{1-\nu/d}) & \text{if } \nu < d, \\ \mathcal{O}((\log T)^{5/2}) & \text{if } \nu = d, \\ \mathcal{O}((\log T)^{3/2}) & \text{if } \nu > d, \end{cases}$$

with probability at least $1-\delta$. Here, $C_{L_f, L_f'}$ is a constant that depends only on $L_f$ and $L_f'$.

This matches the result conjectured in Vakili (2022) up to logarithmic factors, resolving the open problem.
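
To see how the polynomial rate in Corollary 4.4 follows from Theorem 4.3, one can substitute the eigendecay rate of the Matérn kernel. The short derivation below assumes the standard rate $\beta = 1 + 2\nu/d$ for a Matérn kernel with smoothness $\nu$ on a $d$-dimensional domain; this value is not stated in this section and is used here as an assumption.

```latex
% Assumption: Matern eigendecay parameter \beta = 1 + 2\nu/d.
\[
\frac{3-\beta}{2}
  = \frac{3 - \left(1 + \frac{2\nu}{d}\right)}{2}
  = 1 - \frac{\nu}{d},
\qquad\text{so}\qquad
\max\{T^{\frac{3-\beta}{2}}, 1\} = T^{1-\nu/d} \quad \text{when } \nu < d .
\]
```

Under the same assumption the exponent is non-positive once $\nu \geq d$, so the polynomial term is bounded by $1$ and only the logarithmic factors listed in the corollary remain.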

The following theorem characterizes the regret performance of REDS in the noisy feedback setting.

Theorem 4.5.

Consider the noisy observation model described in Sec. 2.2 and assume that Assumptions 4.1 and 4.2 hold. For a given $\delta \in (0,1)$, if the REDS algorithm is run with $N_1 \geq C_{L_f, L_f'} \overline{N}(\delta/(2\log_2 T))$ and UCB and LCB functions as defined in Eqns. (5) and (6) with parameter $\delta' = \delta/(2\log_2 T)$, then the regret incurred by REDS satisfies

$$R(T) = \tilde{\mathcal{O}}\left(\sqrt{T\gamma_T}\log(T/\delta)\right)$$

with probability at least $1-\delta$.

As shown by the above theorem, REDS achieves order-optimal regret (up to logarithmic factors) even under the noisy feedback model.

The proofs of Theorems 4.3 and 4.5 follow a similar blueprint. A key aspect of both proofs is to ensure that, as Theorem 3.1 is invoked across the sets $\{\mathcal{X}_r\}_{r \in \mathbb{N}}$, the leading constant in Theorem 3.1, which has an implicit dependence on the domain through the constant $F$, remains bounded and is independent of $T$. The following lemma shows that for all functions $f$ satisfying Assumption 4.2, the leading constant only depends on the function and the initial domain.

Lemma 4.6.

Let $f \in \mathcal{H}_k$ be such that Assumption 4.2 holds. Let $\mathcal{X}'$ denote a path-connected component of any level set of $f$ and let $X' \subset \mathcal{X}'$ be a set of $n$ points drawn uniformly at random from $\mathcal{X}'$. Then for $n \geq C_{L_f, L_f'} \overline{N}(\delta)$, the following relations hold with probability $1-\delta$:

$$\sup_{x \in \mathcal{X}'} \sigma_{X',\tau}^2(x) \leq C'_{L_f, L_f'} \cdot F^2 \tau \cdot \frac{\gamma_{n,\tau}}{n}$$
$$\sup_{x \in \mathcal{X}'} \sigma_{X',0}^2(x) \leq C'_{L_f, L_f'} \cdot F^2 \cdot n^{1-\beta}$$

where $F$ and $\overline{N}(\delta)$ represent, respectively, the constants in Assumption 2.3 and Theorem 3.1 corresponding to the uniform measure on $\mathcal{X}$, and $C_{L_f, L_f'}, C'_{L_f, L_f'}$ are constants that depend only on $L_f, L_f'$.

At a high level, the above lemma ensures that under the regularity condition on the topology of level sets (Assumption 4.2), Theorem 3.1 can be applied across level sets of $f$ by paying only the penalty of a constant that depends only on $f$. The proof is based on the inclusion of RKHSs over subsets along with a change of measure argument. We refer the reader to Appendix B for detailed proofs of Lemma 4.6 and Theorems 4.3 and 4.5.

5 Empirical Studies

[Figure 1, panels: (a) Branin, (b) Hartmann-4D, (c) Hartmann-6D]
Figure 1: Cumulative regret averaged over $10$ Monte Carlo runs for all algorithms across different benchmark functions. The shaded region represents error bars of up to one standard deviation. As evident from the plots, the regret of REDS is comparable to that of BPE and GP-ThreDS.

We compare the computational efficiency of REDS against algorithms with order-optimal regret performance, namely BPE (Li and Scarlett, 2022) and GP-ThreDS (Salgia et al., 2021) through an empirical study. We compare the regret performance and the running time of the three algorithms for three commonly used benchmark functions in Bayesian Optimization, namely, Branin (Azimi et al., 2012; Picheny et al., 2013), Hartmann-4D (Picheny et al., 2013) and Hartmann-6D (Picheny et al., 2013). The analytical expressions for the three benchmark functions are given as follows:

  • Branin function, denoted by $B(x_1, x_2)$, is defined over $\mathcal{X} = [0,1]^2$.

    $$B(x_1, x_2) = -\frac{1}{51.95}\left(\left(v - \frac{5.1u^2}{4\pi^2} + \frac{5u}{\pi} - 6\right)^2 + \left(10 - \frac{10}{8\pi}\right)\cos(u) - 44.81\right),$$

    where $u = 15x_1 - 5$ and $v = 15x_2$.

  • Hartmann-4D function, denoted by $H_4(x_1, x_2, x_3, x_4)$, is defined over $\mathcal{X} = [0,1]^4$.

    $$H_4(x_1, x_2, x_3, x_4) = \sum_{i=1}^{4} w_i \exp\left(-\sum_{j=1}^{4} A_{ij}(x_j - C_{ij})^2\right).$$

  • Hartmann-6D function, denoted by $H_6(x_1, x_2, x_3, x_4, x_5, x_6)$, is defined over $\mathcal{X} = [0,1]^6$.

    $$H_6(x_1, x_2, x_3, x_4, x_5, x_6) = \sum_{i=1}^{4} w_i \exp\left(-\sum_{j=1}^{6} A_{ij}(x_j - C_{ij})^2\right).$$

In the definitions above, $w_i$ denotes the $i^{\text{th}}$ element of the vector $w = \begin{pmatrix} 1.0 & 1.2 & 3.0 & 3.2 \end{pmatrix}^{\top}$, and $A_{ij}$ and $C_{ij}$ refer to the $(i,j)^{\text{th}}$ elements of the matrices $A$ and $C$, defined below:

$$A = \begin{pmatrix} 10 & 3 & 17 & 3.5 & 1.7 & 8 \\ 0.05 & 10 & 17 & 0.1 & 8 & 14 \\ 3 & 3.5 & 1.7 & 10 & 17 & 8 \\ 17 & 8 & 0.05 & 10 & 0.1 & 14 \end{pmatrix}; \quad C = 10^{-4} \cdot \begin{pmatrix} 1312 & 1696 & 5569 & 124 & 8283 & 5886 \\ 2329 & 4135 & 8307 & 3736 & 1004 & 9991 \\ 2348 & 1451 & 3522 & 2883 & 3047 & 6650 \\ 4047 & 8828 & 8732 & 5743 & 1091 & 381 \end{pmatrix}$$
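
The benchmark definitions above translate directly into code. The sketch below is an illustration, not the authors' experimental code: the function names are hypothetical, and it assumes that the 4-D variant uses the first four columns of $A$ and $C$, which the text does not state explicitly.

```python
import math

W = [1.0, 1.2, 3.0, 3.2]
A = [
    [10, 3, 17, 3.5, 1.7, 8],
    [0.05, 10, 17, 0.1, 8, 14],
    [3, 3.5, 1.7, 10, 17, 8],
    [17, 8, 0.05, 10, 0.1, 14],
]
# C is given as 10^{-4} times an integer matrix.
C = [[v * 1e-4 for v in row] for row in [
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
]]


def branin(x1, x2):
    """Rescaled, negated Branin function on [0, 1]^2 as defined above."""
    u, v = 15.0 * x1 - 5.0, 15.0 * x2
    quad = (v - 5.1 * u ** 2 / (4 * math.pi ** 2) + 5 * u / math.pi - 6) ** 2
    return -(quad + (10 - 10 / (8 * math.pi)) * math.cos(u) - 44.81) / 51.95


def hartmann(x):
    """Hartmann function as defined above; len(x) must be 4 or 6.

    Assumption: for len(x) == 4, only the first four columns of A and C are used.
    """
    d = len(x)
    return sum(
        w * math.exp(-sum(A[i][j] * (x[j] - C[i][j]) ** 2 for j in range(d)))
        for i, w in enumerate(W)
    )


if __name__ == "__main__":
    # One of the well-known Branin optimizers, mapped into the unit square.
    print(branin((math.pi + 5) / 15, 2.275 / 15))
    # Commonly reported optimizer of the 6-D Hartmann function.
    print(hartmann([0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573]))
```

Evaluated at these commonly cited optimizers, the two calls give approximately 1.047 and 3.322, which fall inside the ranges $(0.5, 1.2)$ and $(0, 3.5)$ used for $(a, b)$ in the GP-ThreDS configuration described below.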

For BPE and REDS, we consider a discretized version of the domain consisting of $2000$, $7000$ and $20000$ points chosen uniformly at random from the domain for the Branin, Hartmann-4D and Hartmann-6D functions respectively. We use the exponentially growing epoch schedule for both BPE and REDS as described in Algorithm 1 for a fair comparison. We implement GP-ThreDS as described in Salgia et al. (2021). For each node in the tree, we consider a discretization, chosen uniformly at random, of size $100$, $200$ and $500$ for the Branin, Hartmann-4D and Hartmann-6D functions respectively. The values of $(a, b)$ (the lower and upper bound on $f(x^*)$) are set to $(0.5, 1.2)$, $(0, 3.8)$ and $(0, 3.5)$ for Branin, Hartmann-4D and Hartmann-6D respectively. We set $\tau = 0.2$ for all experiments. The value of $\alpha_\tau$ is set to $1$ across all experiments, except for BPE with Hartmann-4D and Hartmann-6D, for which we set it to $0.75$. These values are obtained using a grid search over $[0.25, 2]$ in steps of $0.25$. The parameter $N_1$ in REDS and BPE was set to $50$ for Branin and $100$ for the Hartmann-4D and Hartmann-6D functions.

| Function | BPE | GP-ThreDS | REDS |
|---|---|---|---|
| Branin | $29.84 \pm 6.13$ | $4.37 \pm 0.28$ | $\mathbf{0.32} \pm 0.08$ |
| Hartmann-4D | $38.45 \pm 3.93$ | $7.59 \pm 0.54$ | $\mathbf{0.47} \pm 0.11$ |
| Hartmann-6D | $119.71 \pm 23.75$ | $19.33 \pm 0.54$ | $\mathbf{1.19} \pm 0.08$ |

Table 1: Time taken (in seconds) by different algorithms across the different benchmark functions.

For all the experiments, we used the Squared Exponential kernel. The length scale was set to $0.2$ for Branin and $1$ for the Hartmann-4D and Hartmann-6D functions. We corrupted the observations with zero-mean Gaussian noise with a standard deviation of $0.2$. All the algorithms were run for $T = 1000$ time steps. We recorded the cumulative regret and time taken by different algorithms for $10$ Monte Carlo runs for each benchmark function.

The regret for the algorithms over different functions is plotted in Figure 1. The shaded region represents error bars of up to one standard deviation on either side. The running times, with an error bar of one standard deviation, are tabulated in Table 1. As evident from the plots in Figure 1, the regret incurred by REDS is comparable to that of the other algorithms for all benchmark functions. At the same time, REDS offers about a $15\times$ and $100\times$ speedup in runtime over GP-ThreDS and BPE, respectively (see Table 1), demonstrating the practical benefits of our proposed methodology of random sampling.
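
The quoted speedups can be checked directly against the mean runtimes in Table 1. The snippet below is a small arithmetic sketch; the dictionary layout and the function name `speedups` are illustrative choices, and only the mean values (not the standard deviations) are used.

```python
# Mean runtimes in seconds, copied from Table 1.
RUNTIMES = {
    "Branin": {"BPE": 29.84, "GP-ThreDS": 4.37, "REDS": 0.32},
    "Hartmann-4D": {"BPE": 38.45, "GP-ThreDS": 7.59, "REDS": 0.47},
    "Hartmann-6D": {"BPE": 119.71, "GP-ThreDS": 19.33, "REDS": 1.19},
}


def speedups(table, baseline="REDS"):
    """Ratio of each algorithm's mean runtime to the baseline's mean runtime."""
    return {
        fn: {alg: t / row[baseline] for alg, t in row.items() if alg != baseline}
        for fn, row in table.items()
    }


if __name__ == "__main__":
    for fn, ratios in speedups(RUNTIMES).items():
        print(fn, {alg: round(r, 1) for alg, r in ratios.items()})
```

On these means the ratios are roughly 14 to 16 for GP-ThreDS and roughly 82 to 101 for BPE, consistent with the approximate $15\times$ and $100\times$ figures.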

6 Conclusion

In this work, we studied the methodology of exploring the domain using random samples drawn from a distribution supported on a compact domain. We showed that this non-adaptive approach achieves the order-optimal worst-case predictive error for RKHS functions in both noisy and noise-free feedback settings. The proposed approach offers a simple alternative for designing Bayesian Optimization algorithms, which typically involve choosing points through a computationally expensive step of optimizing a non-convex acquisition function. Based on this methodology, we developed an algorithm that achieves order-optimal regret in both noisy and noise-free settings, resolving a COLT open problem. We demonstrated the computational advantage of the proposed approach through an empirical study, where the proposed algorithm achieved up to a $100\times$ runtime speedup over state-of-the-art algorithms.

References

  • Alberti et al. (2011) G. Alberti, S. Bianchini, and G. Crippa. Structure of level sets and sard-type properties of lipschitz maps. Annali della Scuola Normale Superiore di Pisa. Classe di Scienze. Serie V, 4, 08 2011. doi: 10.2422/2036-2145.201107_006.
  • Arcangéli et al. (2012) R. Arcangéli, M. C. López de Silanes, and J. J. Torrens. Extension of sampling inequalities to Sobolev semi-norms of fractional order and derivative data. Numerische Mathematik, 121(3):587–608, 2012. ISSN 0029599X. doi: 10.1007/s00211-011-0439-3.
  • Azimi et al. (2012) J. Azimi, A. Jalali, and X. Z. Fern. Hybrid batch bayesian optimization. In Proceedings of the 29th International Conference on Machine Learning, ICML, volume 2, pages 1215–1222, 2012. ISBN 9781450312851.
  • Bastian Bohn (2017) Bastian Bohn. Error analysis of regularized and unregularized least-squares regression on discretized function spaces. PhD thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, 2017. URL https://hdl.handle.net/20.500.11811/7094.
  • Bohn (2018) B. Bohn. On the convergence rate of sparse grid least squares regression. In Sparse Grids and Applications, pages 19–41. Springer International Publishing, 2018. ISBN 978-3-319-75426-0.
  • Bohn and Griebel (2017) B. Bohn and M. Griebel. Error estimates for multivariate regression on discretized function spaces. SIAM Journal on Numerical Analysis, 55(4):1843–1866, 2017.
  • Brenner et al. (2008) S. C. Brenner and L. R. Scott. The mathematical theory of finite element methods, volume 3. Springer, 2008.
  • Brochu et al. (2010) E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010.
  • Buchholz (2001) A. Buchholz. Operator khintchine inequality in non-commutative probability. Mathematische Annalen, 319(1):1–16, 2001.
  • Buchholz (2005) A. Buchholz. Optimal constants in khintchine type inequalities for fermions, rademachers and q-gaussian operators. Bulletin of The Polish Academy of Sciences Mathematics, 53:315–321, 2005. URL https://api.semanticscholar.org/CorpusID:55683104.
  • Bull (2011) A. D. Bull. Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12:2879–2904, 2011. ISSN 15324435.
  • Camilleri et al. (2021) R. Camilleri, J. Katz-Samuels, and K. Jamieson. High-Dimensional Experimental Design and Kernel Bandits. In Proceedings of the 38th International Conference on Machine Learning, ICML, 2021. URL https://arxiv.org/abs/2105.05806.
  • Chatterji et al. (2019) N. Chatterji, A. Pacchiano, and P. Bartlett. Online learning with kernel losses. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 971–980. PMLR, 2019.
  • Chkifa et al. (2015) A. Chkifa, A. Cohen, G. Migliorati, F. Nobile, and R. Tempone. Discrete least squares polynomial approximation with random evaluations- application to parametric and stochastic elliptic pdes. ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique, 49(3):815–837, 2015.
  • Chowdhury and Gopalan (2017) S. R. Chowdhury and A. Gopalan. On kernelized multi-armed bandits. In Proceedings of the 34th International Conference on Machine Learning, ICML, volume 2, pages 1397–1422, 2017. ISBN 9781510855144.
  • Cohen and Migliorati (2017) A. Cohen and G. Migliorati. Optimal weighted least-squares methods. The SIAM journal of computational mathematics, 3:181–203, 2017.
  • Cohen et al. (2013) A. Cohen, M. Davenport, and D. Leviatan. On the stability and accuracy of least squares approximations. Foundations of Computational Mathematics, 13:819–834, 2013.
  • De Freitas et al. (2012) N. De Freitas, A. J. Smola, and M. Zoghi. Exponential regret bounds for Gaussian process bandits with deterministic observations. In Proceedings of the 29th International Conference on Machine Learning, ICML, volume 2, pages 1743–1750, 2012. ISBN 9781450312851.
  • Dunkl and Xu (2014) C. F. Dunkl and Y. Xu. Orthogonal Polynomials of Several Variables. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2 edition, 2014. doi: 10.1017/CBO9781107786134.
  • Frazier et al. (2008) P. I. Frazier, W. B. Powell, and S. Dayanik. A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5):2410–2439, 2008.
  • Greenhill et al. (2020) S. Greenhill, S. Rana, S. Gupta, P. Vellanki, and S. Venkatesh. Bayesian optimization for adaptive experimental design: A review. IEEE access, 8:13937–13948, 2020.
  • Gröchenig (2020) K. Gröchenig. Sampling, marcinkiewicz–zygmund inequalities, approximation, and quadrature rules. Journal of Approximation Theory, 257:105455, 2020.
  • Grünewälder et al. (2010) S. Grünewälder, J.-Y. Audibert, M. Opper, and J. Shawe-Taylor. Regret bounds for gaussian process bandit problems. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 273–280, 2010.
  • Jones (2001) D. R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of global optimization, 21:345–383, 2001.
  • Jones et al. (1998) D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492, 1998.
  • Kämmerer et al. (2021) L. Kämmerer, T. Ullrich, and T. Volkmer. Worst-case recovery guarantees for least squares approximation using random samples. Constructive Approximation, 54(2):295–352, 2021.
  • Kanagawa et al. (2018) M. Kanagawa, P. Hennig, D. Sejdinovic, and B. K. Sriperumbudur. Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences, 2018.
  • Krieg and Ullrich (2021a) D. Krieg and M. Ullrich. Function values are enough for l2-approximation. Foundations of Computational Mathematics, 21:1141–1151, 2021a. doi: https://doi.org/10.1007/s10208-020-09481-w.
  • Krieg and Ullrich (2021b) D. Krieg and M. Ullrich. Function values are enough for l2-approximation: Part ii. Journal of Complexity, 66, 2021b. ISSN 0885-064X. doi: https://doi.org/10.1016/j.jco.2021.101569.
  • Kushner (1964) H. Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86:97–106, 1964.
  • Lee (2010) J. Lee. Introduction to Topological Manifolds. Springer, 2010.
  • Li et al. (2016) L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization, 2016.
  • Li and Scarlett (2022) Z. Li and J. Scarlett. Gaussian process bandit optimization with few batches. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS, 2022.
  • Lizotte et al. (2007) D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans. Automatic gait optimization with gaussian process regression. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), volume 7, pages 944–949, 2007.
  • Lyu et al. (2020) Y. Lyu, Y. Yuan, and I. W. Tsang. Efficient batch black-box optimization with deterministic regret bounds, 2020.
  • Močkus (1975) J. Močkus. On bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pages 400–404, Berlin, Heidelberg, 1975. Springer Berlin Heidelberg. ISBN 978-3-540-37497-8.
  • Moeller and Ullrich (2021) M. Moeller and T. Ullrich. L 2-norm sampling discretization and recovery of functions from rkhs with finite trace. Sampling Theory, Signal Processing, and Data Analysis, 19(2):13, 2021.
  • Močkus et al. (1978) J. Močkus, V. Tiesis, and A. Žilinskas. Towards Global Optimization, volume 2, chapter The application of Bayesian methods for seeking the extremum, pages 117–129. Elsevier, 09 1978. ISBN 0-444-85171-2.
  • Narcowich et al. (2006) F. J. Narcowich, J. D. Ward, and H. Wendland. Sobolev error estimates and a bernstein inequality for scattered data interpolation via radial basis functions. Constructive Approximation, 24:175–186, 2006.
  • Ostrowski (1959) A. M. Ostrowski. A quantitative formulation of Sylvester’s law of inertia. Proceedings of the National Academy of Sciences, 45(5):740–744, 1959. doi: 10.1073/pnas.45.5.740. URL https://www.pnas.org/doi/abs/10.1073/pnas.45.5.740.
  • Picheny et al. (2013) V. Picheny, T. Wagner, and D. Ginsbourger. A benchmark of kriging-based infill criteria for noisy optimization. Structural and Multidisciplinary Optimization, 48(3):607–626, 2013. ISSN 1615147X. doi: 10.1007/s00158-013-0919-4. URL https://link.springer.com/article/10.1007/s00158-013-0919-4.
  • Riutort-Mayol et al. (2023) G. Riutort-Mayol, P.-C. Bürkner, M. R. Andersen, A. Solin, and A. Vehtari. Practical hilbert space approximate bayesian gaussian processes for probabilistic programming. Statistics and Computing, 33(1):17, 2023.
  • Rudin (1987) W. Rudin. Real and complex analysis, 3rd ed. McGraw-Hill, Inc., USA, 1987. ISBN 0070542341.
  • Salgia et al. (2021) S. Salgia, S. Vakili, and Q. Zhao. A domain-shrinking based Bayesian optimization algorithm with order-optimal regret performance. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems, volume 34, 2021.
  • Salgia et al. (2022) S. Salgia, S. Vakili, and Q. Zhao. Collaborative Learning in Kernel-based Bandits for Distributed Users, 2022.
  • Scarlett et al. (2017) J. Scarlett, I. Bogunovic, and V. Cevher. Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization. In Conference on Learning Theory, volume 65, pages 1–20, 2017.
  • Smale and Zhou (2004) S. Smale and D.-X. Zhou. Shannon sampling and function reconstruction from point values. Bulletin of The American Mathematical Society, 41:279–306, 2004. doi: 10.1090/S0273-0979-04-01025-0.
  • Srinivas et al. (2010) N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: no regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, ICML, pages 1015–1022, 2010. ISBN 9781605589077. doi: 10.1109/TIT.2011.2182033.
  • Steinwart and Christmann (2008) I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008. doi: https://doi.org/10.1007/978-0-387-77242-4.
  • Törn and Žilinskas (1989) A. Törn and A. Žilinskas. Global Optimization. Springer Berlin, Heidelberg, 1989.
  • Tropp (2012) J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12:389–434, 2012.
  • Tuo and Wang (2020) R. Tuo and W. Wang. Kriging prediction with isotropic matérn correlations: Robustness and experimental designs. The Journal of Machine Learning Research, 21(1):7604–7641, 2020.
  • Vakili (2022) S. Vakili. Open problem: Regret bounds for noise-free kernel-based bandits. In Proceedings of 35th Conference on Learning Theory (COLT), volume 178, pages 5624–5629, 2022.
  • Vakili et al. (2021a) S. Vakili, N. Bouziani, S. Jalali, A. Bernacchia, and D.-s. Shiu. Optimal order simple regret for Gaussian process bandits. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems, 2021a.
  • Vakili et al. (2021b) S. Vakili, K. Khezeli, and V. Picheny. On information gain and regret bounds in Gaussian process bandits. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, AISTATS, 2021b.
  • Valko et al. (2013) M. Valko, N. Korda, R. Munos, I. Flaounas, and N. Cristianini. Finite-time analysis of kernelised contextual bandits. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, UAI, pages 654–663, 2013.
  • Vanchinathan et al. (2014) H. P. Vanchinathan, I. Nikolic, F. De Bona, and A. Krause. Explore-exploit in top-n recommender systems via gaussian processes. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 225–232, 2014.
  • Vazquez and Bect (2010) E. Vazquez and J. Bect. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. Journal of Statistical Planning and Inference, 140(11):3088–3095, 2010. ISSN 0378-3758. doi: https://doi.org/10.1016/j.jspi.2010.04.018.
  • Wasserman (2008) L. Wasserman. Lecture notes on statistical methods for machine learning, 2008. URL https://www.stat.cmu.edu/~larry/=sml/Concentration.pdf.
  • Wendland (2004) H. Wendland. Scattered Data Approximation. Cambridge University Press, 2004. doi: 10.1017/CBO9780511617539.
  • Wenzel et al. (2021) T. Wenzel, G. Santin, and B. Haasdonk. A novel class of stabilized greedy kernel approximation algorithms: Convergence, stability and uniform point distribution. Journal of Approximation Theory, 262, 2021. ISSN 10960430. doi: 10.1016/j.jat.2020.105508.
  • Wynne et al. (2021) G. Wynne, F.-X. Briol, and M. Girolami. Convergence guarantees for gaussian process means with misspecified likelihoods and smoothness. The Journal of Machine Learning Research, 22(1):5468–5507, 2021.

Appendix A Proof of Theorem 3.1

We begin by setting up some notation that will be used throughout the proof. Throughout the appendix, we will represent the elements of $\mathcal{H}_k$ as infinite-dimensional vectors and operators over these function spaces as infinite-dimensional matrices. We adopt this convention for ease of presentation while keeping in mind that, despite the matrix representation, the actual operation is over elements of $\mathcal{H}_k$. Recall that we defined the sample covariance operator $\hat{\mathbf{Z}}$ for a randomly chosen sample $X_n = \{x_1, x_2, \dots, x_n\}$ and its expected value $\mathbf{Z} = \mathbb{E}[\hat{\mathbf{Z}}]$ as follows for any $g \in \mathcal{H}_k$:

\begin{align*}
\hat{\mathbf{Z}} g &:= \left[\sum_{i=1}^{n} \langle g, \psi_{x_i} \rangle \psi_{x_i}\right] + \tau g \\
\mathbf{Z} &:= \mathbb{E}[\hat{\mathbf{Z}}].
\end{align*}

In the matrix-vector notation, the operators (equivalently, matrices) are given as:

\begin{align*}
\hat{\mathbf{Z}} &:= \left(\sum_{i=1}^{n} \psi_{x_i} \psi_{x_i}^{\top}\right) + \tau \mathbf{Id} \\
\mathbf{Z} &= \mathbb{E}[\hat{\mathbf{Z}}] = \mathbb{E}\left[\sum_{i=1}^{n} \psi_{x_i} \psi_{x_i}^{\top}\right] + \tau \mathbf{Id} \\
&= n \mathbb{E}[\psi_{x_1} \psi_{x_1}^{\top}] + \tau \mathbf{Id} = n \boldsymbol{\Lambda} + \tau \mathbf{Id},
\end{align*}

where $\mathbf{Id}$ is the identity matrix (operator) and $\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \lambda_2, \dots)$ is the diagonal matrix consisting of the eigenvalues of the kernel $k$ corresponding to the measure $\varrho$. If we define $\boldsymbol{\Psi}_n := [\psi_{x_1}, \psi_{x_2}, \dots, \psi_{x_n}]$, then we can also write $\hat{\mathbf{Z}} = \boldsymbol{\Psi}_n \boldsymbol{\Psi}_n^{\top} + \tau \mathbf{Id}$. Consequently, the posterior variance at any point $x \in \mathcal{X}$ is given as:

\begin{align*}
\sigma_{n,\tau}^{2}(x) = \tau \psi_x^{\top} \hat{\mathbf{Z}}^{-1} \psi_x.
\end{align*}
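For readers more familiar with the kernel-matrix form of the GP posterior variance, the following worked step relates the two expressions. It is a sketch under the assumption that the feature representation satisfies $k(x, y) = \psi_x^{\top} \psi_y$; the symbols $\mathbf{K}_n$, $\mathbf{k}_n(x)$ and $\mathbf{I}_n$ are notation introduced here for illustration and are not taken from the text above. The first line is the Woodbury identity.

```latex
\begin{align*}
\mathbf{K}_n &:= \boldsymbol{\Psi}_n^{\top} \boldsymbol{\Psi}_n \in \mathbb{R}^{n \times n},
\qquad \mathbf{k}_n(x) := \boldsymbol{\Psi}_n^{\top} \psi_x \in \mathbb{R}^{n}, \\
\left(\boldsymbol{\Psi}_n \boldsymbol{\Psi}_n^{\top} + \tau \mathbf{Id}\right)^{-1}
&= \frac{1}{\tau}\left[\mathbf{Id} - \boldsymbol{\Psi}_n \left(\mathbf{K}_n + \tau \mathbf{I}_n\right)^{-1} \boldsymbol{\Psi}_n^{\top}\right], \\
\tau \psi_x^{\top} \hat{\mathbf{Z}}^{-1} \psi_x
&= \psi_x^{\top} \psi_x - \psi_x^{\top} \boldsymbol{\Psi}_n \left(\mathbf{K}_n + \tau \mathbf{I}_n\right)^{-1} \boldsymbol{\Psi}_n^{\top} \psi_x \\
&= k(x, x) - \mathbf{k}_n(x)^{\top} \left(\mathbf{K}_n + \tau \mathbf{I}_n\right)^{-1} \mathbf{k}_n(x).
\end{align*}
```

The operator form is convenient for the analysis because concentration of $\hat{\mathbf{Z}}$ around $\mathbf{Z}$ translates directly into a bound on the posterior variance.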

For any $R \in \mathbb{N}$, we define the following two quantities that will be relevant during our analysis:

\begin{align*}
N(R) &:= \sup_{x \in \mathcal{X}} \sum_{j=1}^{R} \varphi_j^{2}(x), \tag{7} \\
T(R) &:= \sup_{x \in \mathcal{X}} \sum_{j=R+1}^{\infty} \lambda_j \varphi_j^{2}(x) = \sup_{x \in \mathcal{X}} \sum_{j=R+1}^{\infty} \upsilon_j^{2}(x). \tag{8}
\end{align*}

Recall that $\{\varphi_j\}_{j \in \mathbb{N}}$ are eigenfunctions of the kernel operator and form an orthonormal system in $L_2(\varrho, \mathcal{X})$, and $\{\upsilon_j\}_{j \in \mathbb{N}}$ are an orthonormal basis for $\mathcal{H}_k$. The term $N(R)$ is often referred to as the spectral function (see Gröchenig (2020) and references therein) and, in the case of orthogonal polynomials, it is the inverse of the infimum of the Christoffel function (Dunkl and Xu, 2014). Both $N(R)$ and $T(R)$ are fundamental quantities that appear in the analysis of reconstruction and estimation of functions.
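To make $N(R)$ and $T(R)$ concrete, the sketch below evaluates them numerically for a toy eigensystem. Everything specific here is an assumption chosen for illustration and not taken from the paper: the cosine basis on $[0, 1]$ (orthonormal under the uniform measure), the eigenvalues $\lambda_j = j^{-\beta}$, the grid used to approximate the supremum, and the truncation level `J` of the infinite sum.

```python
import math


def phi(j, x):
    """Cosine basis on [0, 1], orthonormal in L2 under the uniform measure."""
    if j == 1:
        return 1.0
    return math.sqrt(2.0) * math.cos((j - 1) * math.pi * x)


def eigenvalue(j, beta):
    """Assumed polynomial eigendecay lambda_j = j^(-beta)."""
    return float(j) ** (-beta)


def spectral_function(R, grid):
    """Grid approximation of N(R) = sup_x sum_{j <= R} phi_j(x)^2."""
    return max(sum(phi(j, x) ** 2 for j in range(1, R + 1)) for x in grid)


def tail_function(R, beta, grid, J=2000):
    """Grid approximation of T(R), with the infinite sum truncated at J terms."""
    return max(
        sum(eigenvalue(j, beta) * phi(j, x) ** 2 for j in range(R + 1, J + 1))
        for x in grid
    )


# The grid contains x = 0, where every phi_j^2 attains its maximum.
grid = [i / 200 for i in range(201)]
for R in (1, 5, 20):
    print(R, spectral_function(R, grid), tail_function(R, beta=2.0, grid=grid))
```

For this toy system, $N(R)$ grows linearly in $R$ while $T(R)$ decays with $R$, which is the trade-off between the head and the tail of the spectrum that the index sets defined next are designed to balance.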

Lastly, based on $N(R)$ and $T(R)$, for a given kernel $k$, measure $\varrho$ and $\delta \in (0, 1)$, we define the following terms for any $n \in \mathbb{N}$ and $\tau > 0$:

\begin{align*}
\mathcal{R}_{k,\varrho}^{(1)}(n, \tau, \delta) &:= \left\{R \in \mathbb{N} : N(R) \leq \frac{n}{1944 \log(6n/\delta)}\right\} \\
\mathcal{R}_{k,\varrho}^{(2)}(n, \tau, \delta) &:= \left\{R \in \mathbb{N} : \max\{42 T(R), n \lambda_{R+1}\} \log\left(\frac{12}{\delta}\right) \leq \frac{\tau}{27}\right\} \\
\mathcal{R}_{k,\varrho}(n, \tau, \delta) &:= \mathcal{R}_{k,\varrho}^{(1)}(n, \tau, \delta) \cap \mathcal{R}_{k,\varrho}^{(2)}(n, \tau, \delta) \\
\overline{N}(k, \varrho, \delta, \tau) &:= \max\left\{\min\left\{n : \mathcal{R}_{k,\varrho}(n, \tau, \delta) \neq \emptyset\right\}, \lceil 729 \cdot F^{4} \cdot \log(12/\delta) \rceil\right\}
\end{align*}

The dependence on $k$ and $\varrho$ is implicit through $\{\varphi_j\}_{j \in \mathbb{N}}$ and $\{\lambda_j\}_{j \in \mathbb{N}}$ used to define $N(R)$ and $T(R)$. For brevity of notation, going forward, we drop the explicit description of dependence on $k$ and $\varrho$.
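The two membership conditions can be checked mechanically once $N(R)$, $T(R)$ and the eigenvalues are available. The sketch below is illustrative only: the function name `feasible_R`, the search cap `R_max`, and the toy spectrum ($N(R) = R$, $T(R) = 2^{-R}$, $\lambda_j = 2^{-j}$) are hypothetical stand-ins that are not derived from any particular kernel.

```python
import math


def feasible_R(n, tau, delta, N, T, lam, R_max):
    """Return all R <= R_max lying in both index sets for the given n, tau, delta.

    N, T and lam are callables for N(R), T(R) and the eigenvalue lambda_j.
    """
    head_budget = n / (1944.0 * math.log(6.0 * n / delta))
    log_term = math.log(12.0 / delta)
    out = []
    for R in range(1, R_max + 1):
        # First set: the spectral function must fit within the sample budget.
        in_first = N(R) <= head_budget
        # Second set: the tail of the spectrum must be small relative to tau.
        in_second = max(42.0 * T(R), n * lam(R + 1)) * log_term <= tau / 27.0
        if in_first and in_second:
            out.append(R)
    return out


def N_toy(R):
    """Hypothetical spectral function growing linearly in R."""
    return float(R)


def T_toy(R):
    """Hypothetical tail function decaying geometrically in R."""
    return 2.0 ** (-R)


def lam_toy(j):
    """Hypothetical geometrically decaying eigenvalues."""
    return 2.0 ** (-j)


print(feasible_R(10**6, 1.0, 0.1, N_toy, T_toy, lam_toy, R_max=40))
print(feasible_R(10**3, 1.0, 0.1, N_toy, T_toy, lam_toy, R_max=40))
```

The first condition caps $R$ from above and the second forces $R$ to be large enough, so the intersection is non-empty only once $n$ is sufficiently large; the smallest such $n$ is the first argument of the maximum defining $\overline{N}$.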

We are now ready to prove the theorem. We first prove the statement of the theorem, assuming that the lemmas hold, followed by the proofs of the lemmas.

We begin with the result for the noisy case, where $\tau > 0$ is fixed (independent of $n$). From Lemma 3.2, we know that for $n \geq \overline{N}$, $\|\mathbf{Z}^{-1/2} \hat{\mathbf{Z}} \mathbf{Z}^{-1/2} - \mathbf{Id}\|_2 \leq 1/9$ holds with probability $1 - \delta$. Using this result along with Lemma 3.3, we can conclude that $\psi_x^{\top} \hat{\mathbf{Z}}^{-1} \psi_x \leq 2 \psi_x^{\top} \mathbf{Z}^{-1} \psi_x$ holds for all $x$. Thus, we have,

\begin{align*}
\sigma_{n,\tau}^{2}(x) &= \tau \psi_x^{\top} \hat{\mathbf{Z}}^{-1} \psi_x \\
&\leq 2 \tau \psi_x^{\top} \mathbf{Z}^{-1} \psi_x \\
&\leq \frac{108 F^{2}}{13} \cdot \tau \cdot \frac{\tilde{\gamma}_{X_n, \tau}}{n} \\
&\leq \frac{108 F^{2}}{13} \cdot \tau \cdot \frac{\gamma_{n,\tau}}{n}, \tag{9}
\end{align*}

as required. The third line in the above expression follows from Lemma 3.4. We would like to emphasize that the polynomial eigendecay condition is not necessary to obtain the above relation. It is only necessary to bound the information gain in terms of $n$. Under the polynomial eigendecay condition with parameter $\beta > 1$, the above equation can also be written as

\begin{align*}
\sigma_{n,\tau}^{2}(x) \leq C_0 \cdot \left(\frac{n}{\tau}\right)^{\frac{1}{\beta} - 1} \log(n),
\end{align*}

where we used the bound on information gain from Vakili et al. (2021b, Corollary 1) and $C_0$ is an appropriately chosen constant independent of $n$ and $\tau$.
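Two steps of the argument for the noisy case can be spelled out as follows. This is a sketch and not necessarily the route taken in the proofs of the lemmas: the operator $\mathbf{A}$ and the constant $c$ are notation introduced here, $\preceq$ denotes the Loewner order, and the form assumed for the information gain bound is an assumption made for illustration. The first chain shows one standard way in which a concentration bound of the type in Lemma 3.2 yields the factor-of-two comparison; the second shows how an information gain bound of the stated form gives the exponent $\frac{1}{\beta} - 1$.

```latex
\begin{align*}
\mathbf{A} := \mathbf{Z}^{-1/2} \hat{\mathbf{Z}} \mathbf{Z}^{-1/2}, \quad
\|\mathbf{A} - \mathbf{Id}\|_2 \leq \tfrac{1}{9}
&\implies \tfrac{8}{9} \mathbf{Id} \preceq \mathbf{A}
\implies \mathbf{A}^{-1} \preceq \tfrac{9}{8} \mathbf{Id} \\
&\implies \hat{\mathbf{Z}}^{-1} = \mathbf{Z}^{-1/2} \mathbf{A}^{-1} \mathbf{Z}^{-1/2}
\preceq \tfrac{9}{8} \mathbf{Z}^{-1} \preceq 2 \mathbf{Z}^{-1}, \\[4pt]
\gamma_{n,\tau} \leq c \left(\frac{n}{\tau}\right)^{\frac{1}{\beta}} \log(n)
&\implies \tau \cdot \frac{\gamma_{n,\tau}}{n}
\leq c \left(\frac{n}{\tau}\right)^{-1} \left(\frac{n}{\tau}\right)^{\frac{1}{\beta}} \log(n)
= c \left(\frac{n}{\tau}\right)^{\frac{1}{\beta} - 1} \log(n).
\end{align*}
```

Since $\beta > 1$, the exponent $\frac{1}{\beta} - 1$ is negative, so the bound on the posterior variance decays as $n$ grows for fixed $\tau$.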

We now consider the noise-free case. Since information gain is only defined for $\tau > 0$, we cannot directly extend the analysis used in the noisy case by substituting $\tau = 0$. To circumvent this issue, we carefully choose $\tau^* > 0$ such that $\sigma_{n,\tau^*}^{2}$ is a close representation of $\sigma_{n,0}^{2}$. We choose $\tau^*$ to be dependent on $n$ such that $\tau^*$ goes to $0$ as $n$ becomes larger. This allows $\sigma_{n,\tau^*}^{2}$ to faithfully represent the value of $\sigma_{n,0}^{2}$ over the range of $n$. Specifically, we choose $\tau^* = c' n^{1-\beta} (\log(n/\delta))^{\beta}$ for $c' \geq C (1944 F^{2})^{\beta}$, where $C$ is the constant in Assumption 2.3. The condition on the constant $c'$ ensures that $\overline{N}(k, \varrho, \delta, \tau^*)$ exists. Since all conditions of the analysis for $\tau > 0$ (noisy case) are satisfied, we can directly invoke the result for $\tau > 0$. Using the bound on $\sigma_{n,\tau}^{2}$ and the monotonicity of $\sigma_{n,\tau}^{2}$ as a function of $\tau$, we obtain,

σn,02(x)σn,τ*2(x)C1n1β(log(n/δ))β,superscriptsubscript𝜎𝑛02𝑥superscriptsubscript𝜎𝑛superscript𝜏2𝑥subscript𝐶1superscript𝑛1𝛽superscript𝑛𝛿𝛽\displaystyle\sigma_{n,0}^{2}(x)\leq\sigma_{n,\tau^{*}}^{2}(x)\leq C_{1}\cdot n% ^{1-\beta}(\log(n/\delta))^{\beta},italic_σ start_POSTSUBSCRIPT italic_n , 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_x ) ≤ italic_σ start_POSTSUBSCRIPT italic_n , italic_τ start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_x ) ≤ italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_n start_POSTSUPERSCRIPT 1 - italic_β end_POSTSUPERSCRIPT ( roman_log ( italic_n / italic_δ ) ) start_POSTSUPERSCRIPT italic_β end_POSTSUPERSCRIPT ,

where C1subscript𝐶1C_{1}italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is a constant independent of n𝑛nitalic_n.
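As a quick numerical illustration of why this choice of $\tau^{*}$ vanishes as $n$ grows, the sketch below evaluates $\tau^{*}=c^{\prime}n^{1-\beta}(\log(n/\delta))^{\beta}$ on a few sample sizes. The values $\beta=2$, $\delta=0.1$ and $c^{\prime}=1$ are hypothetical placeholders chosen only for illustration (the sketch assumes $\beta>1$, and $c^{\prime}$ merely rescales the sequence); they are not the constants required by the analysis.

```python
import math

def tau_star(n, beta, delta, c_prime=1.0):
    """Evaluate tau* = c' * n^(1 - beta) * (log(n / delta))^beta."""
    return c_prime * n ** (1.0 - beta) * math.log(n / delta) ** beta

# Hypothetical illustration values (not taken from the paper):
# beta = 2, delta = 0.1, c' = 1.
sample_sizes = [10, 100, 1000, 10000]
taus = [tau_star(n, beta=2.0, delta=0.1) for n in sample_sizes]

for n, t in zip(sample_sizes, taus):
    print(f"n = {n:>6d}   tau* = {t:.5f}")
```

With these placeholder values the sequence decreases monotonically, which is the behavior the argument relies on: the surrogate noise level $\tau^{*}$ shrinks with $n$, up to the logarithmic factor.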

In the following subsections, we prove Lemmas 3.2, 3.3 and 3.4.

A.1 Proof of Lemma 3.2

Since we are interested in bounding the $2$-norm of the operator $\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id}$, we will focus on finding an upper bound on $g^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})g$ that holds uniformly for all functions $g$ in the unit ball of the RKHS, i.e., $\{g:\|g\|_{\mathcal{H}_{k}}\leq 1\}$. The high-level idea is to separately consider the contributions of the component of $g$ that belongs to the subspace spanned by the eigenfunctions corresponding to the "large" eigenvalues, i.e., the head of the spectrum, and of the component corresponding to the "small" eigenvalues, i.e., the tail of the spectrum.

Throughout the proof, we fix an $R\in\mathcal{R}_{n,\tau}$. The existence of such an $R$ is guaranteed by the assumption $n>\overline{N}$. For the analysis, we define two projection operators, $\mathbf{P}$ and $\mathbf{Q}$. We define $\mathbf{P}$ as the projection operator onto the subspace spanned by $\{\upsilon_{j}\}_{j=1}^{R}$, i.e., for any $g=\sum_{j\in\mathbb{N}}g_{j}\upsilon_{j}\in\mathcal{H}_{k}$, $\mathbf{P}g=\sum_{j=1}^{R}g_{j}\upsilon_{j}$. Note that $\mathbf{P}$ is an orthogonal projection operator. Similarly, we define $\mathbf{Q}=\mathbf{Id}-\mathbf{P}$.

We also introduce some additional notation for ease of presentation. We define $\mathbf{L}$ to be the diagonal matrix (operator) whose $j^{\text{th}}$ entry is $\dfrac{\lambda_{j}}{n\lambda_{j}+\tau}$. Similarly, let $\omega_{i}=\boldsymbol{\Lambda}^{-1/2}\psi_{x_{i}}$ for $i=1,2,\dots,n$. Using this notation, we can rewrite the matrix $\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id}$ as

\begin{align*}
\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id}
&=\mathbf{Z}^{-1/2}\left(\sum_{i=1}^{n}\psi_{x_{i}}\psi_{x_{i}}^{\top}+\tau\mathbf{Id}\right)\mathbf{Z}^{-1/2}-\mathbf{Id} \\
&=\sum_{i=1}^{n}(\mathbf{Z}^{-1/2}\psi_{x_{i}})(\mathbf{Z}^{-1/2}\psi_{x_{i}})^{\top}+\tau\mathbf{Z}^{-1}-\mathbf{Id} \\
&=\sum_{i=1}^{n}(\mathbf{L}^{1/2}\omega_{i})(\mathbf{L}^{1/2}\omega_{i})^{\top}-n\mathbf{L}.
\end{align*}
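The last equality can be checked coordinate-wise. The following short verification assumes that $\mathbf{Z}=n\boldsymbol{\Lambda}+\tau\mathbf{Id}$ with $\boldsymbol{\Lambda}=\mathrm{diag}(\lambda_{j})$; this form of $\mathbf{Z}$ is inferred here from the identities being used, and the definition in the main text takes precedence.

```latex
\begin{align*}
\mathbf{Z}^{-1/2}\psi_{x_i}
  &= (n\boldsymbol{\Lambda}+\tau\mathbf{Id})^{-1/2}\boldsymbol{\Lambda}^{1/2}
     \,\boldsymbol{\Lambda}^{-1/2}\psi_{x_i}
   = \mathbf{L}^{1/2}\omega_i, \\
\tau\mathbf{Z}^{-1}-\mathbf{Id}
  &= \mathrm{diag}\!\left(\frac{\tau}{n\lambda_j+\tau}-1\right)
   = \mathrm{diag}\!\left(-\frac{n\lambda_j}{n\lambda_j+\tau}\right)
   = -n\mathbf{L}.
\end{align*}
```

Under this assumption, both steps use only the fact that diagonal operators commute and that the $j^{\text{th}}$ diagonal entry of $\mathbf{L}$ is $\lambda_{j}/(n\lambda_{j}+\tau)$.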

For any $g\in\mathcal{H}_{k}$, we have the following decomposition:

\begin{align*}
|g^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})g|
&=|(\mathbf{P}g+\mathbf{Q}g)^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})(\mathbf{P}g+\mathbf{Q}g)| \\
&\leq|(\mathbf{P}g)^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})(\mathbf{P}g)|+|(\mathbf{Q}g)^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})(\mathbf{Q}g)| \\
&\qquad+|(\mathbf{P}g)^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})(\mathbf{Q}g)+(\mathbf{Q}g)^{\top}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})(\mathbf{P}g)| \\
&\leq\underbrace{|g^{\top}\mathbf{P}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})\mathbf{P}g|}_{:=E_{1}}+\underbrace{|g^{\top}\mathbf{Q}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})\mathbf{Q}g|}_{:=E_{2}} \\
&\qquad+2\underbrace{|g^{\top}\mathbf{P}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})\mathbf{Q}g|}_{:=E_{3}}. \tag{10}
\end{align*}
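The factor $2$ in front of $E_{3}$ comes from symmetry. As a brief sketch, write $\mathbf{M}:=\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id}$ (a shorthand introduced only for this remark) and note that $\mathbf{M}$ is self-adjoint, since $\hat{\mathbf{Z}}$ and $\mathbf{Z}^{-1/2}$ are, and that $\mathbf{P}^{\top}=\mathbf{P}$ and $\mathbf{Q}^{\top}=\mathbf{Q}$ for orthogonal projections:

```latex
\begin{align*}
(\mathbf{Q}g)^{\top}\mathbf{M}(\mathbf{P}g)
  &= (\mathbf{P}g)^{\top}\mathbf{M}^{\top}(\mathbf{Q}g)
   = (\mathbf{P}g)^{\top}\mathbf{M}(\mathbf{Q}g), \\
\left|(\mathbf{P}g)^{\top}\mathbf{M}(\mathbf{Q}g)
      +(\mathbf{Q}g)^{\top}\mathbf{M}(\mathbf{P}g)\right|
  &= 2\left|g^{\top}\mathbf{P}\mathbf{M}\mathbf{Q}g\right|
   = 2E_{3}.
\end{align*}
```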

We separately bound the terms $E_{1}$, $E_{2}$ and $E_{3}$, beginning with $E_{1}$. We have

\begin{align*}
E_{1}
&=|g^{\top}\mathbf{P}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})\mathbf{P}g| \\
&=\left|(\mathbf{P}g)^{\top}\mathbf{P}\left(\sum_{i=1}^{n}(\mathbf{L}^{1/2}\omega_{i})(\mathbf{L}^{1/2}\omega_{i})^{\top}-n\mathbf{L}\right)\mathbf{P}(\mathbf{P}g)\right| \\
&=\left|(\mathbf{P}g)^{\top}\left(\sum_{i=1}^{n}(\mathbf{P}\mathbf{L}^{1/2}\mathbf{P}\omega_{i})(\mathbf{P}\mathbf{L}^{1/2}\mathbf{P}\omega_{i})^{\top}-n\mathbf{P}\mathbf{L}\mathbf{P}\right)(\mathbf{P}g)\right| \\
&=n\left|(\mathbf{P}g)^{\top}\mathbf{P}\mathbf{L}^{1/2}\mathbf{P}\left(\frac{1}{n}\sum_{i=1}^{n}(\mathbf{P}\omega_{i})(\mathbf{P}\omega_{i})^{\top}-\mathbf{P}\right)\mathbf{P}\mathbf{L}^{1/2}\mathbf{P}(\mathbf{P}g)\right| \\
&\leq n\left\|\left(\frac{1}{n}\sum_{i=1}^{n}(\mathbf{P}\omega_{i})(\mathbf{P}\omega_{i})^{\top}-\mathbf{P}\right)\right\|_{2}\cdot\|\mathbf{P}\mathbf{L}^{1/2}\mathbf{P}(\mathbf{P}g)\|_{\mathcal{H}_{k}}^{2} \\
&\leq n\left\|\left(\frac{1}{n}\sum_{i=1}^{n}(\mathbf{P}\omega_{i})(\mathbf{P}\omega_{i})^{\top}-\mathbf{P}\right)\right\|_{2}\cdot(g^{\top}\mathbf{P}\mathbf{L}\mathbf{P}g) \\
&\leq\left\|\left(\frac{1}{n}\sum_{i=1}^{n}(\mathbf{P}\omega_{i})(\mathbf{P}\omega_{i})^{\top}-\mathbf{P}\right)\right\|_{2}\cdot(n\|\mathbf{L}\|_{2})\cdot\|\mathbf{P}g\|_{\mathcal{H}_{k}}^{2}. \tag{11}
\end{align*}

In the above equations, we used the fact that for any diagonal matrix $D$, $\mathbf{P}D=D\mathbf{P}=\mathbf{P}D\mathbf{P}$, and that $\mathbf{P}^{2}=\mathbf{P}$. Firstly, note that $\|\mathbf{L}\|_{2}=\max_{j\in\mathbb{N}}\lambda_{j}/(n\lambda_{j}+\tau)\leq 1/n$. Consequently, $n\|\mathbf{L}\|_{2}\leq 1$. Secondly, to bound the first term on the RHS, we denote $\mathbf{P}\omega_{i}:=A_{i}$ for all $i=1,2,\dots,n$.
We have $\mathbb{E}[A_{i}A_{i}^{\top}]=\mathbf{P}\mathbb{E}[\omega_{i}\omega_{i}^{\top}]\mathbf{P}=\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\mathbb{E}[\psi_{x_{i}}\psi_{x_{i}}^{\top}]\boldsymbol{\Lambda}^{-1/2}\mathbf{P}=\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{-1/2}\mathbf{P}=\mathbf{P}$. Moreover, for every $A_{i}$, only the top $R\times R$ sub-matrix of $A_{i}A_{i}^{\top}$ has non-zero entries, implying that it is sufficient to bound the $2$-norm of that finite sub-matrix to bound the first term on the RHS. We use the Matrix-Chernoff inequality (Tropp, 2012, Theorem 1.1) to bound the $2$-norm of this finite-dimensional sub-matrix.

For all $i=1,2,\dots,n$, let $[A_{i}]_{R}\in\mathbb{R}^{R}$ denote the $R$-dimensional vector corresponding to the first $R$ coordinates of $A_{i}$. Thus, we are interested in applying the Matrix-Chernoff inequality to bound the following expression:

\begin{align*}
E_{11}:=\left\|\left(\frac{1}{n}\sum_{i=1}^{n}[A_{i}]_{R}[A_{i}]_{R}^{\top}-I_{R}\right)\right\|_{2},
\end{align*}

where $I_{R}$ denotes the $R$-dimensional identity matrix. Here, we used the fact that the relevant $R\times R$ sub-matrix of $\mathbf{P}$, or equivalently $\mathbb{E}[[A_{1}]_{R}[A_{1}]_{R}^{\top}]$, corresponds to $I_{R}$. To invoke the Matrix-Chernoff inequality, we need bounds on the maximum and minimum eigenvalues of $\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}[A_{i}]_{R}[A_{i}]_{R}^{\top}\right]$ and a bound on $\|[A_{i}]_{R}[A_{i}]_{R}^{\top}/n\|_{2}$ that holds almost surely for all $i=1,2,\dots,n$. Since $\mathbb{E}[[A_{1}]_{R}[A_{1}]_{R}^{\top}]=I_{R}$, we have $\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}[A_{i}]_{R}[A_{i}]_{R}^{\top}\right]=I_{R}$, implying that both the maximum and minimum eigenvalues are $1$. For any $i=1,2,\dots,n$, we have

\begin{align*}
\frac{\|[A_{i}]_{R}[A_{i}]_{R}^{\top}\|_{2}}{n}
&\leq\frac{1}{n}\mathrm{trace}([A_{i}]_{R}[A_{i}]_{R}^{\top})\leq\frac{1}{n}\mathrm{trace}([A_{i}]_{R}^{\top}[A_{i}]_{R})\leq\frac{1}{n}\|\mathbf{P}\omega_{i}\|_{\mathcal{H}_{k}}^{2}\leq\frac{1}{n}\sum_{j=1}^{R}\varphi_{j}^{2}(x_{i})\leq\frac{N(R)}{n}.
\end{align*}

On invoking the Matrix-Chernoff inequality with these results, we obtain that the following relation is true with probability $1-\delta/6$:

\begin{align*}
E_{11}\leq\sqrt{\frac{3N(R)\log(3R/\delta)}{n}}. \tag{12}
\end{align*}
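To make the concentration in Eqn. (12) concrete, the following toy simulation compares the empirical deviation $\|\frac{1}{n}\sum_{i}[A_{i}]_{R}[A_{i}]_{R}^{\top}-I_{R}\|_{2}$ with the right-hand side of the bound. It is only a sketch under stated assumptions: the features $\varphi_{1}(x)=\sqrt{2}\cos(2\pi x)$ and $\varphi_{2}(x)=\sqrt{2}\sin(2\pi x)$ with $x$ uniform on $[0,1)$ are stand-ins chosen because they are orthonormal under that distribution (they are not the eigenfunctions of any kernel used in the paper), $R=2$, and $N(R)$ is read as a uniform bound on $\sum_{j\leq R}\varphi_{j}^{2}(x)$, which equals $2$ here.

```python
import math
import random

def empirical_deviation(n, rng):
    """Spectral norm of (1/n) * sum_i A_i A_i^T - I_2, with A_i = (phi_1(x_i), phi_2(x_i))."""
    s11 = s12 = s22 = 0.0
    for _ in range(n):
        x = rng.random()  # x_i drawn uniformly from [0, 1)
        a1 = math.sqrt(2.0) * math.cos(2.0 * math.pi * x)  # toy feature phi_1
        a2 = math.sqrt(2.0) * math.sin(2.0 * math.pi * x)  # toy feature phi_2
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
    # Entries of the centered empirical second-moment matrix [[a, b], [b, d]].
    a, b, d = s11 / n - 1.0, s12 / n, s22 / n - 1.0
    # Closed-form eigenvalues of a symmetric 2x2 matrix: mean +/- radius.
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return max(abs(mean + radius), abs(mean - radius))

def chernoff_rhs(n, R, N_R, delta):
    """Right-hand side of Eqn. (12): sqrt(3 * N(R) * log(3R / delta) / n)."""
    return math.sqrt(3.0 * N_R * math.log(3.0 * R / delta) / n)

rng = random.Random(0)  # fixed seed for reproducibility
n, R, N_R, delta = 2000, 2, 2.0, 0.1  # hypothetical illustration values
deviation = empirical_deviation(n, rng)
bound = chernoff_rhs(n, R, N_R, delta)
print(f"deviation = {deviation:.4f}, bound = {bound:.4f}")
```

In this toy setting the bound is below $1$ for the chosen $n$, which matches the regime in which the bound is meant to be applied. The sample size and failure probability are arbitrary choices for the demonstration.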

On combining the above bound with Eqn. (11) and noting that $n\|\mathbf{L}\|_{2}\leq 1$, we can conclude that

\begin{align*}
E_{1}\leq\sqrt{\frac{3N(R)\log(3R/\delta)}{n}}\cdot\|\mathbf{P}g\|_{\mathcal{H}_{k}}^{2}. \tag{13}
\end{align*}

Note that the above bound is valid only when the RHS of Eqn. (12) is less than $1$. This condition is satisfied by the choice of $n>\overline{N}$.

We now consider the second term, $E_{2}$. We have

\begin{align*}
E_{2}
&=|g^{\top}\mathbf{Q}(\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id})\mathbf{Q}g| \\
&=\left|(\mathbf{Q}g)^{\top}\left(\sum_{i=1}^{n}(\mathbf{Q}\mathbf{L}^{1/2}\omega_{i})(\mathbf{Q}\mathbf{L}^{1/2}\omega_{i})^{\top}-n\mathbf{Q}\mathbf{L}\mathbf{Q}\right)(\mathbf{Q}g)\right| \tag{14} \\
&\leq n\underbrace{\left\|\left(\frac{1}{n}\sum_{i=1}^{n}(\mathbf{Q}\mathbf{L}^{1/2}\omega_{i})(\mathbf{Q}\mathbf{L}^{1/2}\omega_{i})^{\top}-\mathbf{Q}\mathbf{L}\mathbf{Q}\right)\right\|_{2}}_{:=E_{21}}\cdot\|\mathbf{Q}g\|_{\mathcal{H}_{k}}^{2}. \tag{15}
\end{align*}

Note that the term $E_{21}$ has a structure similar to that of $E_{11}$, except that $E_{21}$ involves infinite-dimensional vectors rather than finite-dimensional ones. Thus, to bound $E_{21}$ we use a result from Moeller and Ullrich (2021, Proposition 3.8), which is a spectral concentration inequality for infinite-dimensional vectors derived using the non-commutative Khintchine inequality Buchholz (2001, 2005); Moeller and Ullrich (2021). From Proposition 3.8 in Moeller and Ullrich (2021), we can conclude that the following relation holds with probability at least $1-\delta/6$:

$$\left\| \frac{1}{n}\sum_{i=1}^{n} (\mathbf{Q}\mathbf{L}^{1/2}\omega_i)(\mathbf{Q}\mathbf{L}^{1/2}\omega_i)^\top - \mathbf{Q}\mathbf{L}\mathbf{Q} \right\|_2 \leq \max\left\{ \frac{42}{n}\log\left(\frac{12}{\delta}\right) B_1,\; B_2 \right\}, \tag{16}$$

where $B_1 = \max_{i=1,2,\dots,n} \|\mathbf{Q}\mathbf{L}^{1/2}\omega_i\|_{\mathcal{H}_k}^2$ and $B_2 = \|\mathbf{Q}\mathbf{L}\mathbf{Q}\|_2$. We can further bound the terms $B_1$ and $B_2$ as follows.

$$
\begin{aligned}
B_1 &= \max_{i=1,2,\dots,n} \|\mathbf{Q}\mathbf{L}^{1/2}\omega_i\|_{\mathcal{H}_k}^2 = \max_{i=1,2,\dots,n} \sum_{j=R+1}^{\infty} \frac{\lambda_j}{n\lambda_j+\tau}\,\varphi_j^2(x_i) \leq \sup_{x\in\mathcal{X}} \frac{1}{\tau} \sum_{j=R+1}^{\infty} \lambda_j \varphi_j^2(x) = \frac{T(R)}{\tau}, \\
B_2 &= \|\mathbf{Q}\mathbf{L}\mathbf{Q}\|_2 = \max_{j\in\mathbb{N},\, j>R} \frac{\lambda_j}{n\lambda_j+\tau} \leq \frac{\lambda_{R+1}}{\tau}.
\end{aligned}
$$
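Both relaxations above simply drop the non-negative term $n\lambda_j$ from the denominator. As a numerical sanity check, the sketch below assumes a hypothetical spectrum $\lambda_j = j^{-2}$ and a hypothetical cosine basis $\varphi_j(x) = \sqrt{2}\cos(\pi j x)$ on $[0,1]$ (neither is taken from the paper), truncates the infinite sums at $J$ terms, and compares the exact tail quantities with their relaxed counterparts.

```python
import math


def eigenvalue(j: int) -> float:
    # Hypothetical polynomial eigen-decay, chosen only for illustration.
    return 1.0 / (j * j)


def eigenfunction_sq(j: int, x: float) -> float:
    # Hypothetical cosine basis on [0, 1]: phi_j(x) = sqrt(2) * cos(pi * j * x).
    return 2.0 * math.cos(math.pi * j * x) ** 2


def tail_sums(x: float, n: int, tau: float, R: int, J: int = 2000):
    """Return the exact and relaxed tail sums at x, both truncated at j = J.

    exact   = sum_{R < j <= J} lam_j / (n * lam_j + tau) * phi_j(x)^2
    relaxed = (1 / tau) * sum_{R < j <= J} lam_j * phi_j(x)^2
    """
    exact = 0.0
    relaxed = 0.0
    for j in range(R + 1, J + 1):
        lam = eigenvalue(j)
        phi_sq = eigenfunction_sq(j, x)
        exact += lam / (n * lam + tau) * phi_sq
        relaxed += lam * phi_sq / tau
    return exact, relaxed


def b2_terms(n: int, tau: float, R: int, J: int = 2000):
    """Return (max_{R < j <= J} lam_j / (n * lam_j + tau), lam_{R+1} / tau)."""
    largest = max(
        eigenvalue(j) / (n * eigenvalue(j) + tau) for j in range(R + 1, J + 1)
    )
    return largest, eigenvalue(R + 1) / tau


n, tau, R = 100, 0.5, 10
grid = [i / 50 for i in range(51)]  # stand-in for the query points x_i
sums = [tail_sums(x, n, tau, R) for x in grid]
b1_exact = max(s[0] for s in sums)
b1_relaxed = max(s[1] for s in sums)
b2_exact, b2_relaxed = b2_terms(n, tau, R)
print(b1_exact <= b1_relaxed, b2_exact <= b2_relaxed)
```

Because $\lambda \mapsto \lambda/(n\lambda+\tau)$ is increasing, the maximum defining $B_2$ is attained at $j = R+1$ for any non-increasing spectrum, which is why only $\lambda_{R+1}$ appears in the bound.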

On plugging these bounds into Eqn. (16), we obtain the following bound on $E_{21}$.

$$E_{21} \leq \frac{1}{\tau}\max\left\{ \frac{42}{n}\log\left(\frac{12}{\delta}\right) T(R),\; \lambda_{R+1} \right\}. \tag{17}$$

Combining Eqns. (15) and (17) yields

$$E_2 \leq \frac{1}{\tau}\max\left\{ 42\log\left(\frac{12}{\delta}\right) T(R),\; n\lambda_{R+1} \right\} \|\mathbf{Q}g\|_{\mathcal{H}_k}^2. \tag{18}$$
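For readers tracking constants, the passage from Eqns. (15) and (17) to Eqn. (18) amounts to moving the factor $n$ inside the maximum. The block below writes this out, reading the braces in Eqns. (17) and (18) as a maximum, consistent with Eqn. (16).

```latex
\begin{align*}
E_2 &\leq n \, E_{21} \, \|\mathbf{Q}g\|_{\mathcal{H}_k}^2 \\
    &\leq \frac{n}{\tau} \max\left\{ \frac{42}{n}\log\left(\frac{12}{\delta}\right) T(R),\; \lambda_{R+1} \right\} \|\mathbf{Q}g\|_{\mathcal{H}_k}^2 \\
    &= \frac{1}{\tau} \max\left\{ 42\log\left(\frac{12}{\delta}\right) T(R),\; n\lambda_{R+1} \right\} \|\mathbf{Q}g\|_{\mathcal{H}_k}^2 .
\end{align*}
```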

We now move on to the third term, $E_3$, which contains the cross terms. For brevity of notation, we define $\zeta_i := \mathbf{P}\mathbf{L}^{1/2}\omega_i$ and $\xi_i := \mathbf{Q}\mathbf{L}^{1/2}\omega_i$ for all $i = 1,2,\dots,n$. Note that $\zeta_i^\top \xi_j = 0$ for all $i,j = 1,2,\dots,n$. Since $\mathbf{P}$ and $\mathbf{Q}$ commute with $\mathbf{L}$, a diagonal matrix, it is straightforward to note that $\mathbf{P}\mathbf{L}\mathbf{Q} = 0$. Using this relation along with the definition of $\{\zeta_i\}_{i=1}^{n}$ and $\{\xi_i\}_{i=1}^{n}$, we can rewrite $E_3$ as follows:

$$
\begin{aligned}
E_3 &= \left| g^\top \mathbf{P} \left( \mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2} - \mathbf{Id} \right) \mathbf{Q} g \right| \\
&= \left| g^\top \mathbf{P} \left( \sum_{i=1}^{n} (\mathbf{L}^{1/2}\omega_i)(\mathbf{L}^{1/2}\omega_i)^\top - n\mathbf{L} \right) \mathbf{Q} g \right| \\
&= \left| \sum_{i=1}^{n} (g^\top \mathbf{P}\mathbf{L}^{1/2}\omega_i)(g^\top \mathbf{Q}\mathbf{L}^{1/2}\omega_i)^\top \right| \\
&= \left| \sum_{i=1}^{n} \underbrace{(g^\top \zeta_i)(g^\top \xi_i)}_{:= W_i} \right|.
\end{aligned}
\tag{19}
$$

We use Bernstein's inequality to bound the sum of the random variables $W_i$, for which we need the values of $\mathbb{E}[W_i]$ and $\mathbb{E}[W_i^2]$, and an upper bound on $|W_i|$ that holds almost surely. We begin with $\mathbb{E}[W_i]$. We have,

$$\mathbb{E}[W_i] = \mathbb{E}\left[ (g^\top \zeta_i)(g^\top \xi_i) \right] = g^\top \mathbb{E}\left[ \zeta_i \xi_i^\top \right] g = 0. \tag{20}$$
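The final equality in Eqn. (20) can be spelled out as follows. This is a sketch under two assumptions that are not restated in this part of the proof: that $\mathbb{E}[\omega_i\omega_i^\top] = \mathbf{Id}$ (suggested by the centering term $n\mathbf{L}$ in the expression for $E_3$), and that $\mathbf{P}$, $\mathbf{Q}$ and $\mathbf{L}^{1/2}$ are self-adjoint.

```latex
\begin{align*}
\mathbb{E}\left[\zeta_i \xi_i^\top\right]
  &= \mathbf{P}\mathbf{L}^{1/2}\, \mathbb{E}\left[\omega_i\omega_i^\top\right] \mathbf{L}^{1/2}\mathbf{Q}
   = \mathbf{P}\mathbf{L}^{1/2}\mathbf{L}^{1/2}\mathbf{Q}
   = \mathbf{P}\mathbf{L}\mathbf{Q} = 0, \\
\mathbb{E}[W_i]
  &= g^\top\, \mathbb{E}\left[\zeta_i \xi_i^\top\right] g = 0 .
\end{align*}
```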

For an upper bound on $|W_i|$, note that for any $g$ with $\|g\|_{\mathcal{H}_k} = 1$, $|W_i|$ is maximized for the choice of $g = \psi_{x_i}$. Thus,

$$
\begin{aligned}
|W_i| &= \|g\|_{\mathcal{H}_k}^2 \left( \frac{g^\top \zeta_i}{\|g\|_{\mathcal{H}_k}} \right) \left( \frac{g^\top \xi_i}{\|g\|_{\mathcal{H}_k}} \right) \\
&\leq \|g\|_{\mathcal{H}_k}^2 \, (\psi_{x_i}^\top \zeta_i)(\psi_{x_i}^\top \xi_i) \\
&\leq \|g\|_{\mathcal{H}_k}^2 \, \|\zeta_i\|_{\mathcal{H}_k}^2 \, \|\xi_i\|_{\mathcal{H}_k}^2 \\
&\leq \|g\|_{\mathcal{H}_k}^2 \cdot \left( \sum_{j=1}^{R} \frac{\lambda_j}{n\lambda_j+\tau}\,\varphi_j^2(x_i) \right) \cdot \left( \sum_{j=R+1}^{\infty} \frac{\lambda_j}{n\lambda_j+\tau}\,\varphi_j^2(x_i) \right) \\
&\leq \|g\|_{\mathcal{H}_k}^2 \cdot \left( \frac{1}{n} \sum_{j=1}^{R} \varphi_j^2(x_i) \right) \cdot \left( \frac{1}{\tau} \sum_{j=R+1}^{\infty} \lambda_j \varphi_j^2(x_i) \right) \\
&\leq \|g\|_{\mathcal{H}_k}^2 \cdot \frac{N(R)}{n} \cdot \frac{T(R)}{\tau}.
\end{aligned}
\tag{21}
$$

From the above expressions, we can also conclude that $|g^\top \zeta_i| \leq \|g\|_{\mathcal{H}_k} \cdot \dfrac{N(R)}{n}$ and $|g^\top \xi_i| \leq \|g\|_{\mathcal{H}_k} \cdot \dfrac{T(R)}{\tau}$. We use these relations to obtain a bound on $\mathbb{E}[W_i^2]$. We have,

$$
\begin{aligned}
\mathbb{E}[W_i^2] &= \mathbb{E}\left[ (g^\top \zeta_i)^2 (g^\top \xi_i)^2 \right] \\
&\leq \|g\|_{\mathcal{H}_k}^2 \cdot \min\left\{ \mathbb{E}\left[ (g^\top \zeta_i)^2 \right] \left( \frac{T(R)}{\tau} \right)^2,\; \mathbb{E}\left[ (g^\top \xi_i)^2 \right] \left( \frac{N(R)}{n} \right)^2 \right\} \\
&\leq \|g\|_{\mathcal{H}_k}^2 \cdot \min\left\{ (g^\top \mathbf{P}\mathbf{L}\mathbf{P} g) \cdot \left( \frac{T(R)}{\tau} \right)^2,\; (g^\top \mathbf{Q}\mathbf{L}\mathbf{Q} g) \cdot \left( \frac{N(R)}{n} \right)^2 \right\} \\
&\leq \|g\|_{\mathcal{H}_k}^2 \cdot \min\left\{ \frac{\|\mathbf{P}g\|_{\mathcal{H}_k}^2}{n} \cdot \left( \frac{T(R)}{\tau} \right)^2,\; \frac{\lambda_{R+1} \|\mathbf{Q}g\|_{\mathcal{H}_k}^2}{\tau} \cdot \left( \frac{N(R)}{n} \right)^2 \right\}.
\end{aligned}
\tag{22}
$$

In the last step, we used the bounds on $\|\mathbf{L}\|_2$ and $\|\mathbf{Q}\mathbf{L}\mathbf{Q}\|_2$ derived in the earlier part of the proof. Lastly, since $\mathbb{E}[W_i] = 0$, we have $\text{Var}(W_i) = \mathbb{E}[W_i^2]$. On applying Bernstein's inequality (Wasserman, 2008, Lemma 7.37) using the relations from Eqns. (20), (21) and (22), we can conclude that the following relation holds with probability at least $1-\delta/6$:

$$
\begin{aligned}
E_3 &= \left| \sum_{i=1}^{n} (g^\top \zeta_i)(g^\top \xi_i) \right| \\
&\leq \|g\|_{\mathcal{H}_k} \cdot \sqrt{ 2n\log\left(\frac{6}{\delta}\right) \min\left\{ \frac{\|\mathbf{P}g\|_{\mathcal{H}_k}^2}{n} \cdot \left( \frac{T(R)}{\tau} \right)^2,\; \frac{\lambda_{R+1} \|\mathbf{Q}g\|_{\mathcal{H}_k}^2}{\tau} \cdot \left( \frac{N(R)}{n} \right)^2 \right\} } \\
&\qquad + \|g\|_{\mathcal{H}_k}^2 \cdot \frac{2N(R)}{3n} \cdot \frac{T(R)}{\tau} \cdot \log\left(\frac{6}{\delta}\right).
\end{aligned}
\tag{23}
$$
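The bound in Eqn. (23) has the usual two-part Bernstein shape: a square-root term driven by the variance bound and a linear term driven by the almost-sure bound on $|W_i|$. The sketch below illustrates that shape on a toy example. It assumes the textbook two-sided form with $\log(2/\delta)$ and uses i.i.d. uniform variables on $[-1,1]$ as a stand-in for the $W_i$; the function names and the toy distribution are illustrative and are not taken from the paper, whose constants are those stated in Eqn. (23).

```python
import math
import random


def bernstein_radius(n: int, var_bound: float, abs_bound: float, delta: float) -> float:
    """Radius t such that P(|sum_i W_i| >= t) <= delta.

    Assumes independent, zero-mean W_i with Var(W_i) <= var_bound and
    |W_i| <= abs_bound almost surely (textbook two-sided Bernstein form).
    """
    log_term = math.log(2.0 / delta)
    variance_part = math.sqrt(2.0 * n * var_bound * log_term)
    range_part = (2.0 * abs_bound / 3.0) * log_term
    return variance_part + range_part


def empirical_failure_rate(n: int, delta: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which |sum of n Uniform[-1, 1] draws| reaches the radius."""
    rng = random.Random(seed)
    # Uniform[-1, 1] has mean 0, variance 1/3 and is bounded by 1 in absolute value.
    radius = bernstein_radius(n, var_bound=1.0 / 3.0, abs_bound=1.0, delta=delta)
    failures = 0
    for _ in range(trials):
        total = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(total) >= radius:
            failures += 1
    return failures / trials


rate = empirical_failure_rate(n=200, delta=0.1)
print(rate <= 0.1)
```

In the proof, the role of `var_bound` is played by the right-hand side of Eqn. (22) and the role of `abs_bound` by the right-hand side of Eqn. (21).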

On plugging the results from Eqns. (13), (18) and (23) into Eqn. (10), we obtain

$$
\begin{aligned}
\left\| \mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2} - \mathbf{Id} \right\|_2 &= \sup_{g:\|g\|_{\mathcal{H}_k}\leq 1} \left| g^\top \left( \mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2} - \mathbf{Id} \right) g \right| \\
&\leq \sup_{g:\|g\|_{\mathcal{H}_k}\leq 1} \bigg[ \sqrt{\frac{3N(R)\log(6R/\delta)}{n}}\, \|\mathbf{P}g\|_{\mathcal{H}_k}^2 + \frac{1}{\tau}\max\left\{ 42\log\left(\frac{12}{\delta}\right) T(R),\; n\lambda_{R+1} \right\} \|\mathbf{Q}g\|_{\mathcal{H}_k}^2 \\
&\qquad + 2\|g\|_{\mathcal{H}_k} \sqrt{ 2n\log\left(\frac{6}{\delta}\right) \min\left\{ \frac{\|\mathbf{P}g\|_{\mathcal{H}_k}^2}{n} \cdot \left( \frac{T(R)}{\tau} \right)^2,\; \frac{\lambda_{R+1}\|\mathbf{Q}g\|_{\mathcal{H}_k}^2}{\tau} \cdot \left( \frac{N(R)}{n} \right)^2 \right\} } \\
&\qquad + \|g\|_{\mathcal{H}_k}^2 \cdot \frac{4N(R)}{3n} \cdot \frac{T(R)}{\tau} \cdot \log\left(\frac{6}{\delta}\right) \bigg] \\
&\leq \bigg[ \sqrt{\frac{3N(R)\log(6R/\delta)}{n}} + \frac{1}{\tau}\max\left\{ 42\log\left(\frac{12}{\delta}\right) T(R),\; n\lambda_{R+1} \right\} \\
&\qquad + 2\sqrt{ 2n\log\left(\frac{6}{\delta}\right) \min\left\{ \frac{1}{n} \cdot \left( \frac{T(R)}{\tau} \right)^2,\; \frac{\lambda_{R+1}}{\tau} \cdot \left( \frac{N(R)}{n} \right)^2 \right\} } + \frac{4N(R)T(R)}{3n\tau} \log\left(\frac{6}{\delta}\right) \bigg]
\end{aligned}
$$

On plugging in any value of $R\in\mathcal{R}(n,\tau,\delta)$ and using the definition of $\mathcal{R}_{n,\tau,\delta}$ along with the relation $n\geq\overline{N}$, we can conclude that $\|\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}\mathbf{Z}^{-1/2}-\mathbf{Id}\|_{2}\leq 1/9$ with probability at least $1-\delta/2$. The overall probability on the bound is obtained using a union bound for the relations on $E_{1}$, $E_{2}$ and $E_{3}$.

A.2 Proof of Lemma 3.3

We begin the proof by showing that we can relate $\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x}$ to $\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x}$ through the operator norm of $\mathbf{M}:=\hat{\mathbf{Z}}^{-1/2}(\mathbf{Z}-\hat{\mathbf{Z}})\mathbf{Z}^{-1/2}$. Specifically, we show that if the operator norm of $\mathbf{M}$ is small, then $\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x}$ and $\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x}$ are within a constant factor of each other. Lastly, we use the condition on $\|\mathbf{Z}^{-\frac{1}{2}}\hat{\mathbf{Z}}\mathbf{Z}^{-\frac{1}{2}}-\mathbf{Id}\|_{2}$ to bound $\|\mathbf{M}\|_{\text{op}}$, the operator norm of $\mathbf{M}$, and thereby obtain the required result.

We begin by considering the following expression.

\begin{align*}
\left|\psi_{x}^{\top}(\hat{\mathbf{Z}}^{-1}-\mathbf{Z}^{-1})\psi_{x}\right| &=\left|\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}(\mathbf{Z}-\hat{\mathbf{Z}})\mathbf{Z}^{-1}\psi_{x}\right|\\
&=\left|\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1/2}\cdot\hat{\mathbf{Z}}^{-1/2}(\mathbf{Z}-\hat{\mathbf{Z}})\mathbf{Z}^{-1/2}\cdot\mathbf{Z}^{-1/2}\psi_{x}\right|\\
&\leq\|\hat{\mathbf{Z}}^{-1/2}\psi_{x}\|_{\mathcal{H}_{k}}\,\|\mathbf{Z}^{-1/2}\psi_{x}\|_{\mathcal{H}_{k}}\,\|\hat{\mathbf{Z}}^{-1/2}(\mathbf{Z}-\hat{\mathbf{Z}})\mathbf{Z}^{-1/2}\|_{\text{op}}\\
&\leq\sqrt{(\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x})}\cdot\sqrt{(\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x})}\cdot\|\mathbf{M}\|_{\text{op}}. \tag{24}
\end{align*}

Consider the scenario where the relation $\|\mathbf{M}\|_{\text{op}}\leq c$ is satisfied for some $c\in(0,1)$. We claim that under this scenario, we have $\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x}\leq(1-c)^{-1}\cdot\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x}$. To show this claim, we consider Eqn. (24). If $\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x}\geq\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x}$, the claim follows immediately. For the other case, we have,

\begin{align*}
\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x}-\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x} &\leq\sqrt{(\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x})}\cdot\sqrt{(\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x})}\cdot c\\
&\leq\sqrt{(\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x})}\cdot\sqrt{(\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x})}\cdot c\\
&= c\cdot(\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x})\\
\implies\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x} &\leq\left(\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x}\right)\cdot\frac{1}{1-c},
\end{align*}

as claimed. Thus, it suffices to show that $\|\mathbf{M}\|_{\text{op}}$ is small.
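The claim above can be sanity-checked numerically in finite dimensions. The following is a minimal sketch, not part of the original analysis: it assumes that $\mathbf{Z}$ and $\hat{\mathbf{Z}}$ are small symmetric positive definite matrices (a hypothetical stand-in for the operators on $\mathcal{H}_{k}$), that $\hat{\mathbf{Z}}$ is a small symmetric perturbation of $\mathbf{Z}$, and that $\psi_{x}$ is a random vector. The helper `inv_sqrt` and all sizes are illustrative choices.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
d = 6                                   # illustrative dimension (assumption)

# Z: symmetric positive definite with eigenvalues >= 1.
A = rng.standard_normal((d, d))
Z = A @ A.T / d + np.eye(d)

# Z_hat: symmetric perturbation of Z with spectral norm 0.1.
E = rng.standard_normal((d, d))
E = (E + E.T) / 2
E *= 0.1 / np.linalg.norm(E, 2)
Z_hat = Z + E

# M := Z_hat^{-1/2} (Z - Z_hat) Z^{-1/2} and c := ||M||_op.
M = inv_sqrt(Z_hat) @ (Z - Z_hat) @ inv_sqrt(Z)
c = np.linalg.norm(M, 2)

def check(psi):
    """Check Eqn. (24) and the (1 - c)^{-1} claim for one vector psi."""
    q_hat = psi @ np.linalg.solve(Z_hat, psi)   # psi^T Z_hat^{-1} psi
    q = psi @ np.linalg.solve(Z, psi)           # psi^T Z^{-1} psi
    eq24 = abs(q_hat - q) <= np.sqrt(q_hat * q) * c + 1e-12
    claim = q_hat <= q / (1 - c) + 1e-12
    return bool(eq24 and claim)

results = [check(rng.standard_normal(d)) for _ in range(1000)]
```

Here `c` plays the role of the constant in the claim, and every entry of `results` should be `True` whenever `c < 1`; the small additive tolerances only absorb floating-point error.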

To that effect, note that we can write the operator $\mathbf{M}$ as $\mathbf{M}=\hat{\mathbf{Z}}^{-1/2}\mathbf{Z}^{1/2}-\hat{\mathbf{Z}}^{1/2}\mathbf{Z}^{-1/2}=\mathbf{C}^{-1}-\mathbf{C}^{\top}$, where $\mathbf{C}:=\mathbf{Z}^{-1/2}\hat{\mathbf{Z}}^{1/2}$. Consequently, the definition of the operator norm yields

\begin{align*}
\|\mathbf{M}\|_{\text{op}}^{2}=\|\mathbf{M}^{\top}\mathbf{M}\|_{2} &=\|((\mathbf{C}^{\top})^{-1}-\mathbf{C})(\mathbf{C}^{-1}-\mathbf{C}^{\top})\|_{2}\\
&=\|(\mathbf{C}\mathbf{C}^{\top})^{-1}-\text{Id}+\mathbf{C}\mathbf{C}^{\top}-\text{Id}\|_{2}\\
&\leq\|(\mathbf{C}\mathbf{C}^{\top})^{-1}-\text{Id}\|_{2}+\|\mathbf{C}\mathbf{C}^{\top}-\text{Id}\|_{2}. \tag{25}
\end{align*}

From the definition of $\mathbf{C}$, we have $\|\mathbf{C}\mathbf{C}^{\top}-\text{Id}\|_{2}=\|\mathbf{Z}^{-\frac{1}{2}}\hat{\mathbf{Z}}\mathbf{Z}^{-\frac{1}{2}}-\mathbf{Id}\|_{2}\leq b$, from the given statement in the Lemma. Note that if $\|\mathbf{C}\mathbf{C}^{\top}-\text{Id}\|_{2}\leq b$ for some $b\in(0,1/3)$, then all eigenvalues of $\mathbf{C}\mathbf{C}^{\top}$ lie in the interval $[1-b,1+b]$. This implies that all the eigenvalues of $(\mathbf{C}\mathbf{C}^{\top})^{-1}$ lie in the interval $[(1+b)^{-1},(1-b)^{-1}]$. Hence, $\|(\mathbf{C}\mathbf{C}^{\top})^{-1}-\text{Id}\|_{2}\leq b/(1-b)$. On combining this with Eqn. (25) and noting that $b+b/(1-b)\leq 2b/(1-b)$, we can conclude that if $\|\mathbf{C}\mathbf{C}^{\top}-\text{Id}\|_{2}\leq b$, then $\|\mathbf{M}\|_{\text{op}}\leq\sqrt{2b/(1-b)}<1$, where the last inequality uses $b<1/3$. On combining this with the previous claim that relates $\psi_{x}^{\top}\hat{\mathbf{Z}}^{-1}\psi_{x}$ to $\psi_{x}^{\top}\mathbf{Z}^{-1}\psi_{x}$ through $\|\mathbf{M}\|_{\text{op}}$, we arrive at the result.
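As an illustration of the last step, the identity $\mathbf{M}=\mathbf{C}^{-1}-\mathbf{C}^{\top}$ and the bound $\|\mathbf{M}\|_{\text{op}}\leq\sqrt{2b/(1-b)}$ can be checked on a small example. This is a hedged finite-dimensional sketch: the matrices `Z` and `Z_hat`, the helper `sym_power` and the perturbation size are assumptions made only for the demonstration, and $b$ is taken to be exactly $\|\mathbf{C}\mathbf{C}^{\top}-\text{Id}\|_{2}$.

```python
import numpy as np

def sym_power(S, p):
    """Real power p of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

rng = np.random.default_rng(1)
d = 5                                   # illustrative dimension (assumption)

# Z has eigenvalues >= 1; Z_hat is a symmetric perturbation of norm 0.1.
A = rng.standard_normal((d, d))
Z = A @ A.T / d + np.eye(d)
E = rng.standard_normal((d, d))
E = (E + E.T) / 2
E *= 0.1 / np.linalg.norm(E, 2)
Z_hat = Z + E

# C := Z^{-1/2} Z_hat^{1/2} and M := Z_hat^{-1/2} (Z - Z_hat) Z^{-1/2}.
C = sym_power(Z, -0.5) @ sym_power(Z_hat, 0.5)
M = sym_power(Z_hat, -0.5) @ (Z - Z_hat) @ sym_power(Z, -0.5)

# b := ||C C^T - Id||_2, which equals ||Z^{-1/2} Z_hat Z^{-1/2} - Id||_2.
b = np.linalg.norm(C @ C.T - np.eye(d), 2)

# Gap in the identity M = C^{-1} - C^T (should be at rounding level).
identity_gap = np.linalg.norm(M - (np.linalg.inv(C) - C.T), 2)

m_norm = np.linalg.norm(M, 2)           # ||M||_op
bound = np.sqrt(2 * b / (1 - b))        # sqrt(2b / (1 - b))
```

With this construction `b` is at most `0.1`, so it falls in the range $(0,1/3)$ required by the argument, and `m_norm` should not exceed `bound`.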

A.3 Proof of Lemma 3.4

Similar to the analysis in Appendix A.1, we fix an $R\in\mathcal{R}(n,\tau,\delta)$ and define projection matrices $\mathbf{P}$ and $\mathbf{Q}$ using the value of $R$ as defined in Appendix A.1. We define the projection of the kernel operator $k(\cdot,\cdot)$ on the subspaces spanned by $\mathbf{P}$ and $\mathbf{Q}$ as follows:

\begin{align*}
k^{(\mathbf{P})}(x,y)=\sum_{j=1}^{R}\lambda_{j}\varphi_{j}(x)\varphi_{j}(y);\quad k^{(\mathbf{Q})}(x,y)=k(x,y)-k^{(\mathbf{P})}(x,y).
\end{align*}

Recall that $\tilde{\gamma}_{X_{n},\tau}$ denotes the information gain corresponding to the randomly drawn set of points $X_{n}=\{x_{1},x_{2},\dots,x_{n}\}$. Similar to $K_{X_{n},X_{n}}$, we also define $K^{(\mathbf{P})}_{X_{n},X_{n}}$ and $K^{(\mathbf{Q})}_{X_{n},X_{n}}$ as $K^{(\mathbf{P})}_{X_{n},X_{n}}=[k^{(\mathbf{P})}(x_{i},x_{j})]_{i,j=1}^{n}$ and $K^{(\mathbf{Q})}_{X_{n},X_{n}}=[k^{(\mathbf{Q})}(x_{i},x_{j})]_{i,j=1}^{n}$. It is straightforward to note that $K_{X_{n},X_{n}}=K^{(\mathbf{P})}_{X_{n},X_{n}}+K^{(\mathbf{Q})}_{X_{n},X_{n}}$.

We first derive some auxiliary results on the spectrum of $K^{(\mathbf{P})}_{X_{n},X_{n}}$ and $K^{(\mathbf{Q})}_{X_{n},X_{n}}$ which will be useful in the analysis later. Recall that we defined $\boldsymbol{\Psi}_{n}:=[\psi_{x_{1}},\psi_{x_{2}},\dots,\psi_{x_{n}}]$. We can also rewrite $K_{X_{n},X_{n}}$, $K^{(\mathbf{P})}_{X_{n},X_{n}}$ and $K^{(\mathbf{Q})}_{X_{n},X_{n}}$ in terms of $\boldsymbol{\Psi}_{n}$ as: $K_{X_{n},X_{n}}=\boldsymbol{\Psi}_{n}^{\top}\boldsymbol{\Psi}_{n}$, $K^{(\mathbf{P})}_{X_{n},X_{n}}=\boldsymbol{\Psi}_{n}^{\top}\mathbf{P}\boldsymbol{\Psi}_{n}$ and $K^{(\mathbf{Q})}_{X_{n},X_{n}}=\boldsymbol{\Psi}_{n}^{\top}\mathbf{Q}\boldsymbol{\Psi}_{n}$. Using this relation, note that the nonzero singular values of $K^{(\mathbf{P})}_{X_{n},X_{n}}=(\mathbf{P}\boldsymbol{\Psi}_{n})^{\top}(\mathbf{P}\boldsymbol{\Psi}_{n})$ and $K^{(\mathbf{Q})}_{X_{n},X_{n}}=(\mathbf{Q}\boldsymbol{\Psi}_{n})^{\top}(\mathbf{Q}\boldsymbol{\Psi}_{n})$ are the same as those of $(\mathbf{P}\boldsymbol{\Psi}_{n})(\mathbf{P}\boldsymbol{\Psi}_{n})^{\top}=\mathbf{P}\boldsymbol{\Psi}_{n}\boldsymbol{\Psi}_{n}^{\top}\mathbf{P}$ and $(\mathbf{Q}\boldsymbol{\Psi}_{n})(\mathbf{Q}\boldsymbol{\Psi}_{n})^{\top}=\mathbf{Q}\boldsymbol{\Psi}_{n}\boldsymbol{\Psi}_{n}^{\top}\mathbf{Q}$, respectively.
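The decomposition $K_{X_{n},X_{n}}=K^{(\mathbf{P})}_{X_{n},X_{n}}+K^{(\mathbf{Q})}_{X_{n},X_{n}}$ and the matching of spectra described above can be illustrated on a toy kernel. The sketch below rests on several assumptions that are not from the paper: a Mercer expansion truncated at `J` terms, cosine eigenfunctions $\varphi_{j}(x)=\sqrt{2}\cos(\pi jx)$ on $[0,1]$, eigenvalues $\lambda_{j}=j^{-2}$, features $\psi_{x}$ with entries $\sqrt{\lambda_{j}}\varphi_{j}(x)$, and points drawn uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(2)
J, R, n = 30, 4, 12                 # truncation level, split index, sample size
j = np.arange(1, J + 1)
lam = 1.0 / j**2                    # assumed eigenvalues lambda_j
x = rng.uniform(0.0, 1.0, size=n)   # randomly drawn query points

# Phi[j-1, i] = phi_j(x_i); column i of Psi is the feature vector psi_{x_i}.
Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(j, x))
Psi = np.sqrt(lam)[:, None] * Phi

# P projects onto the first R coordinates, Q onto the remaining ones.
P = np.diag((j <= R).astype(float))
Q = np.eye(J) - P

K = Psi.T @ Psi                     # K_{X_n, X_n}
K_P = Psi.T @ P @ Psi               # K^{(P)}_{X_n, X_n}
K_Q = Psi.T @ Q @ Psi               # K^{(Q)}_{X_n, X_n}

# Direct evaluation of k^{(P)}(x_a, x_b) = sum_{j <= R} lam_j phi_j(x_a) phi_j(x_b).
K_P_direct = (Phi[:R].T * lam[:R]) @ Phi[:R]

# Top-R eigenvalues of the n x n Gram matrix and of the J x J matrix P Psi Psi^T P.
eig_gram = np.sort(np.linalg.eigvalsh(K_P))[::-1][:R]
eig_op = np.sort(np.linalg.eigvalsh(P @ Psi @ Psi.T @ P))[::-1][:R]
```

In this toy setting `K_P` has rank at most `R`, so comparing the `R` largest eigenvalues of the two matrices is enough to see that their nonzero spectra coincide.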

For the spectrum of $K^{(\mathbf{P})}_{X_{n},X_{n}}$, note that

\begin{align*}
K^{(\mathbf{P})}_{X_n,X_n} &= (\mathbf{P}\boldsymbol{\Psi}_n)^{\top}(\mathbf{P}\boldsymbol{\Psi}_n) = ((n\boldsymbol{\Lambda})^{-1/2}\mathbf{P}\boldsymbol{\Psi}_n)^{\top}(n\boldsymbol{\Lambda})((n\boldsymbol{\Lambda})^{-1/2}\mathbf{P}\boldsymbol{\Psi}_n) \\
&= (\mathbf{P}(n\boldsymbol{\Lambda})^{-1/2}\boldsymbol{\Psi}_n)^{\top}\,\mathbf{P}(n\boldsymbol{\Lambda})\mathbf{P}\,(\mathbf{P}(n\boldsymbol{\Lambda})^{-1/2}\boldsymbol{\Psi}_n).
\end{align*}

If $\tilde{\lambda}_1 \geq \tilde{\lambda}_2 \geq \dots \geq \tilde{\lambda}_R$ denote the eigenvalues of $K^{(\mathbf{P})}_{X_n,X_n}$, then using Ostrowski's Theorem (Ostrowski, 1959), we can conclude that $\tilde{\lambda}_j = \theta_j n\lambda_j$ for all $j = 1, 2, \dots, R$, where $\{n\lambda_j\}_{j=1}^{R}$ are the eigenvalues of $n\mathbf{P}\boldsymbol{\Lambda}\mathbf{P}$ and each $\theta_j$ lies between the smallest and largest eigenvalues of the matrix $n^{-1}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)^{\top}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)$. Note that the singular values (in this case, also eigenvalues) of $n^{-1}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)^{\top}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)$ are the same as those of $n^{-1}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)^{\top} = n^{-1}\sum_{i=1}^{n}(\mathbf{P}\omega_i)(\mathbf{P}\omega_i)^{\top}$, where $\omega_i = \boldsymbol{\Lambda}^{-1/2}\psi_{x_i}$, as defined in Appendix A.1. Using Eqn. (12) together with $R \in \mathcal{R}(n,\tau,\delta)$ and $n \geq \overline{N}$, we can conclude that the following relation holds with probability at least $1 - \delta/6$:

\[
\left\| \frac{1}{n}\sum_{i=1}^{n}(\mathbf{P}\omega_i)(\mathbf{P}\omega_i)^{\top} - \mathbf{P} \right\|_2 \leq \frac{1}{27}.
\]

Thus, we can conclude that the eigenvalues of $n^{-1}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)^{\top}(\mathbf{P}\boldsymbol{\Lambda}^{-1/2}\boldsymbol{\Psi}_n)$ lie in the range $[26/27, 28/27]$ and consequently, $\tilde{\lambda}_j \geq 26n\lambda_j/27$.
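The multiplicative eigenvalue relation given by Ostrowski's Theorem can be illustrated on a small example. The sketch below uses the standard square, nonsingular form of the theorem, whereas the proof applies it to a rectangular factor; the diagonal matrix `A`, the perturbation scale `0.05`, and the dimension `m` are hypothetical choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

# Hypothetical diagonal matrix standing in for n * P Lambda P (decaying spectrum).
lam = np.array([8.0, 4.0, 2.0, 1.0, 0.5])
A = np.diag(lam)

# Hypothetical nonsingular S, a small perturbation of the identity.
S = np.eye(m) + 0.05 * rng.standard_normal((m, m))

# Eigenvalues of the congruence S^T A S in descending order, matching lam.
eig_congruent = np.linalg.eigvalsh(S.T @ A @ S)[::-1]
eig_sts = np.linalg.eigvalsh(S.T @ S)  # ascending order

# Ostrowski: eig_congruent[j] = theta[j] * lam[j] with
# min eig(S^T S) <= theta[j] <= max eig(S^T S).
theta = eig_congruent / lam
within = bool(np.all((theta >= eig_sts[0] - 1e-12) & (theta <= eig_sts[-1] + 1e-12)))
print(theta, (eig_sts[0], eig_sts[-1]), within)
```

In the proof, the role of `S.T @ S` is played by the empirical second-moment matrix, whose spectrum is confined to $[26/27, 28/27]$ by the concentration bound above.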

As mentioned earlier, the singular values of $K^{(\mathbf{Q})}_{X_n,X_n}$ are the same as those of $\mathbf{Q}\boldsymbol{\Psi}_n\boldsymbol{\Psi}_n^{\top}\mathbf{Q}$. For the analysis, it suffices to have an upper bound on $\|K^{(\mathbf{Q})}_{X_n,X_n}\|_2$, or equivalently, on $\|\mathbf{Q}\boldsymbol{\Psi}_n\boldsymbol{\Psi}_n^{\top}\mathbf{Q}\|_2$. Using the result from Moeller and Ullrich (2021, Proposition 3.8), we know that the following relation holds with probability at least $1 - \delta/6$:

\[
\|\mathbf{Q}\boldsymbol{\Psi}_n\boldsymbol{\Psi}_n^{\top}\mathbf{Q}\|_2 \leq 2\max\left\{42\log\left(\frac{12}{\delta}\right)T(R),\; n\lambda_{R+1}\right\}.
\]

Since $R \in \mathcal{R}_{n,\tau}$, we can conclude that $\|K^{(\mathbf{Q})}_{X_n,X_n}\|_2 = \|\mathbf{Q}\boldsymbol{\Psi}_n\boldsymbol{\Psi}_n^{\top}\mathbf{Q}\|_2 \leq 2\tau/27$. We are now ready to prove the lemma.

Using the relation $K_{X_n,X_n} = K^{(\mathbf{P})}_{X_n,X_n} + K^{(\mathbf{Q})}_{X_n,X_n}$, we can decompose the information gain of $X_n$ as follows:

\begin{align*}
\tilde{\gamma}_{X_n,\tau} &= \frac{1}{2}\log\left(\det(I_n + \tau^{-1}K_{X_n,X_n})\right) \\
&= \frac{1}{2}\log\left(\det(I_n + \tau^{-1}K^{(\mathbf{P})}_{X_n,X_n} + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})\right) \\
&= \frac{1}{2}\log\left(\det\left((I_n + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})(I_n + \tau^{-1}(I_n + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})^{-1}K^{(\mathbf{P})}_{X_n,X_n})\right)\right) \\
&= \frac{1}{2}\underbrace{\log(\det(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n}))}_{:=G_1} + \frac{1}{2}\underbrace{\log(\det(I + \tau^{-1}(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})^{-1}K^{(\mathbf{P})}_{X_n,X_n}))}_{:=G_2}.
\end{align*}

This decomposition is similar to that derived in Vakili et al. (2021b, App. A, Eqn. 8) with the roles of $K^{(\mathbf{P})}_{X_n,X_n}$ and $K^{(\mathbf{Q})}_{X_n,X_n}$ interchanged.
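The log-determinant decomposition into $G_1$ and $G_2$ is an exact algebraic identity and can be verified numerically. The following sketch assumes small random positive semidefinite matrices `K_P` and `K_Q` as hypothetical stand-ins for $K^{(\mathbf{P})}_{X_n,X_n}$ and $K^{(\mathbf{Q})}_{X_n,X_n}$; the values of `n` and `tau` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau = 6, 0.5
I = np.eye(n)

def random_psd(rank):
    # Hypothetical PSD matrix of the given rank, standing in for a kernel matrix.
    B = rng.standard_normal((rank, n))
    return B.T @ B

K_P = random_psd(3)          # low-rank "head" part
K_Q = 0.01 * random_psd(n)   # small "tail" part

def logdet(M):
    # slogdet returns (sign, log|det|); all determinants here are positive.
    return np.linalg.slogdet(M)[1]

gamma = 0.5 * logdet(I + (K_P + K_Q) / tau)
G1 = logdet(I + K_Q / tau)
# solve(I + K_Q / tau, K_P) computes (I + K_Q / tau)^{-1} K_P without forming the inverse.
G2 = logdet(I + np.linalg.solve(I + K_Q / tau, K_P) / tau)

print(gamma, 0.5 * (G1 + G2))
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience only; it does not change the identity being checked.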

We begin with $G_1$. Since $\|K^{(\mathbf{Q})}_{X_n,X_n}\|_2 \leq 2\tau/27$, all eigenvalues of $\tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n}$ are less than $1$. Using the relation $\log(1+x) \geq x/2$, which holds for all $x \in [0,1]$, we can lower bound $G_1$ as follows:

\[
G_1 = \log(\det(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})) \geq \frac{1}{2\tau}\mathrm{trace}(K^{(\mathbf{Q})}_{X_n,X_n}).
\]
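The passage from the scalar inequality $\log(1+x) \geq x/2$ to the trace bound follows by applying it to each eigenvalue. A minimal numerical sketch is given below, where `M` is a hypothetical positive semidefinite matrix rescaled so that its largest eigenvalue equals $2/27$, mimicking $\tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n}$; the size `n` is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# Hypothetical PSD matrix rescaled so that its largest eigenvalue is 2/27,
# mimicking tau^{-1} K^{(Q)} under the spectral norm bound.
B = rng.standard_normal((n, n))
M = B.T @ B
M *= (2.0 / 27.0) / np.linalg.eigvalsh(M)[-1]

eigs = np.linalg.eigvalsh(M)
log_det = float(np.sum(np.log1p(eigs)))   # log det(I + M) = sum_i log(1 + mu_i)
half_trace = 0.5 * float(np.trace(M))     # (1/2) trace(M) = (1/2) sum_i mu_i

print(log_det, half_trace)
```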

Note that $k^{(\mathbf{Q})}(X_i,X_i)$ are i.i.d. random variables with $\mathbb{E}[k^{(\mathbf{Q})}(X_i,X_i)] = \sum_{r=R+1}^{\infty}\lambda_r$ and $|k^{(\mathbf{Q})}(X_i,X_i)| \leq T(R)$. We can thus use Hoeffding's inequality to obtain the following bound on $\mathrm{trace}(K^{(\mathbf{Q})}_{X_n,X_n})$, which holds with probability at least $1 - \delta/6$:

\begin{align*}
G_1 &\geq \frac{1}{2\tau}\mathrm{trace}(K^{(\mathbf{Q})}_{X_n,X_n}) \\
&\geq \frac{1}{2\tau}\left[n\sum_{r=R+1}^{\infty}\lambda_r - T(R)\sqrt{n\log(12/\delta)}\right] \\
&\geq \frac{nT(R)}{2\tau F^2}\left(1 - F^2\sqrt{\frac{\log(12/\delta)}{n}}\right) \\
&\geq \frac{13nT(R)}{27\tau F^2}.
\end{align*}

In the third line, we used the fact that $T(R) \leq F^2\sum_{r=R+1}^{\infty}\lambda_r$ since $\|\varphi_j\|_{\infty} \leq F$ for all $j \in \mathbb{N}$ (Assumption 2.3). The fourth line uses the condition that $n \geq \overline{N}$.
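The Hoeffding step can be illustrated by simulation. For i.i.d. summands in $[0, T]$, Hoeffding's inequality gives a one-sided deviation of $T\sqrt{n\log(6/\delta)/2}$ at confidence level $1 - \delta/6$, which is no larger than the term $T\sqrt{n\log(12/\delta)}$ used above. The sketch below assumes uniform summands as a hypothetical stand-in for $k^{(\mathbf{Q})}(X_i,X_i)$; the values of `n`, `T`, `delta`, and `trials` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical values: n samples, bound T on each summand, confidence delta.
n, T, delta = 200, 1.0, 0.1
trials = 5000

# Hypothetical bounded i.i.d. summands in [0, T] with mean T / 2,
# standing in for k^{(Q)}(X_i, X_i).
samples = rng.uniform(0.0, T, size=(trials, n))
sums = samples.sum(axis=1)

# Lower bound used in the proof:
# sum >= n * mean - T * sqrt(n * log(12 / delta)) with probability >= 1 - delta / 6.
threshold = n * (T / 2.0) - T * np.sqrt(n * np.log(12.0 / delta))
failure_rate = float(np.mean(sums < threshold))

print(failure_rate, delta / 6.0)
```

The empirical failure rate is expected to fall well below $\delta/6$, since Hoeffding's inequality is distribution-free and therefore conservative for any particular bounded distribution.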

To bound $G_2$, first note that using the condition on the spectrum of $\tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n}$, we can conclude that all the eigenvalues of $(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})$ lie in the range $[1,2]$. Moreover, note that the spectrum of $(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})^{-1}K^{(\mathbf{P})}_{X_n,X_n}$ is the same as that of $(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})^{-1/2}K^{(\mathbf{P})}_{X_n,X_n}(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})^{-1/2}$. On using Ostrowski's Theorem (Ostrowski, 1959) along with the range of eigenvalues of $(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})$, we can conclude that

\[
G_2 = \log(\det(I + \tau^{-1}(I + \tau^{-1}K^{(\mathbf{Q})}_{X_n,X_n})^{-1}K^{(\mathbf{P})}_{X_n,X_n})) \geq \log(\det(I + (2\tau)^{-1}K^{(\mathbf{P})}_{X_n,X_n})).
\]
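This lower bound on $G_2$ can also be checked numerically. In the sketch below, `K_P` and `K_Q` are hypothetical random positive semidefinite matrices, with `K_Q` rescaled so that $\|K_Q\|_2 = 2\tau/27$ as in the proof; the values of `n` and `tau` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, tau = 6, 0.5
I = np.eye(n)

# Hypothetical rank-3 PSD "head" matrix standing in for K^{(P)}.
B_P = rng.standard_normal((3, n))
K_P = B_P.T @ B_P

# Hypothetical PSD "tail" matrix rescaled so that ||K_Q||_2 = 2 * tau / 27.
B_Q = rng.standard_normal((n, n))
K_Q = B_Q.T @ B_Q
K_Q *= (2.0 * tau / 27.0) / np.linalg.eigvalsh(K_Q)[-1]

def logdet(M):
    # slogdet returns (sign, log|det|); all determinants here are positive.
    return np.linalg.slogdet(M)[1]

G2 = logdet(I + np.linalg.solve(I + K_Q / tau, K_P) / tau)
lower = logdet(I + K_P / (2.0 * tau))

# Eigenvalues of I + K_Q / tau, which the argument requires to lie in [1, 2].
spec = np.linalg.eigvalsh(I + K_Q / tau)
print(G2, lower, (spec[0], spec[-1]))
```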

Using the relation for the eigenvalues of $K^{(\mathbf{P})}_{X_n,X_n}$ derived earlier, we can further bound $G_2$ as follows:

\begin{align*}
G_2 &\geq \log(\det(I + (2\tau)^{-1}K^{(\mathbf{P})}_{X_n,X_n})) \\
&\geq \sum_{j=1}^{R}\log(1 + (2\tau)^{-1}\tilde{\lambda}_j) \\
&\geq \sum_{j=1}^{R}\log\left(1 + \frac{13n\lambda_j}{27\tau}\right) \\
&\geq \sum_{j=1}^{R}\frac{13n\lambda_j}{13n\lambda_j + 27\tau} \\
&\geq \frac{13n}{27F^2}\sup_{x\in\mathcal{X}}\sum_{j=1}^{R}\frac{\lambda_j}{n\lambda_j + \tau}\varphi_j^2(x).
\end{align*}

In the fourth line, we used the relation $\log(1+x) \geq \frac{x}{x+1}$, which holds for all $x \geq 0$.
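For completeness, the elementary inequality used above can be verified in one line. The auxiliary function $g$ below is introduced only for this check and does not appear elsewhere in the proof; the last line shows the substitution $x = 13n\lambda_j/(27\tau)$ that produces the summand in the bound.

```latex
\begin{align*}
g(x) &:= \log(1+x) - \frac{x}{1+x}, \qquad g(0) = 0, \\
g'(x) &= \frac{1}{1+x} - \frac{1}{(1+x)^2} = \frac{x}{(1+x)^2} \geq 0 \quad \text{for all } x \geq 0, \\
\text{hence}\quad \log(1+x) &\geq \frac{x}{1+x}, \qquad \text{and with } x = \frac{13n\lambda_j}{27\tau}: \quad \frac{x}{1+x} = \frac{13n\lambda_j}{13n\lambda_j + 27\tau}.
\end{align*}
```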

On combining the bounds for $G_1$ and $G_2$, we obtain

\begin{align*}
\tilde{\gamma}_{X_n,\tau} &= \frac{1}{2}(G_1 + G_2) \\
&\geq \frac{13nT(R)}{54\tau F^2} + \frac{13n}{54F^2} \sup_{x \in \mathcal{X}} \sum_{j=1}^{R} \frac{\lambda_j}{n\lambda_j + \tau} \varphi_j^2(x) \\
&\geq \frac{13n}{54F^2} \left( \sup_{x \in \mathcal{X}} \sum_{j=1}^{R} \frac{\lambda_j}{n\lambda_j + \tau} \varphi_j^2(x) + \frac{T(R)}{\tau} \right) \\
&\geq \frac{13n}{54F^2} \sup_{x \in \mathcal{X}} \left( \sum_{j=1}^{R} \frac{\lambda_j}{n\lambda_j + \tau} \varphi_j^2(x) + \sum_{j=R+1}^{\infty} \frac{\lambda_j}{n\lambda_j + \tau} \varphi_j^2(x) \right) \\
&\geq \frac{13n}{54F^2} \sup_{x \in \mathcal{X}} \psi_x^{\top} \mathbf{Z}^{-1} \psi_x,
\end{align*}

as required. Since each of the bounds on $G_1$ and on the eigenvalues of $K^{(\mathbf{P})}_{X_n,X_n}$ and $K^{(\mathbf{Q})}_{X_n,X_n}$ holds with probability at least $1-\delta/6$, a union bound shows that the overall bound holds with probability at least $1-\delta/2$.
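The probability bookkeeping in this last step is a plain union bound. As a sketch, write $E_1, E_2, E_3$ for the events that the bound on $G_1$, the eigenvalue bound for $K^{(\mathbf{P})}_{X_n,X_n}$ and the eigenvalue bound for $K^{(\mathbf{Q})}_{X_n,X_n}$ fail, respectively (these labels are introduced here only for illustration):

```latex
\begin{align*}
\Pr\left(E_1 \cup E_2 \cup E_3\right) \leq \sum_{i=1}^{3} \Pr(E_i) \leq 3 \cdot \frac{\delta}{6} = \frac{\delta}{2},
\qquad\text{so}\qquad
\Pr\left(E_1^{c} \cap E_2^{c} \cap E_3^{c}\right) \geq 1 - \frac{\delta}{2}.
\end{align*}
```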

Appendix B Proof of Theorems 4.3 and 4.5

The proofs of both theorems follow the lines of the analysis of the Batched Pure Exploration (BPE) algorithm Li and Scarlett (2022). We begin with a brief discussion of Assumption 4 and then move on to the proofs.

Definition B.1.

Let $\Gamma : \mathcal{X} \to \mathcal{X}'$ be a map between two sets $\mathcal{X}, \mathcal{X}' \subset \mathbb{R}^d$. We call $\Gamma$ a bi-Lipschitz map if the inverse map, $\Gamma^{-1}$, exists and the following relations hold for some $L, L' > 0$:

\begin{align*}
\|\Gamma(x) - \Gamma(y)\|_2 &\leq L \|x - y\|_2 \quad \forall x, y \in \mathcal{X}, \\
\|\Gamma^{-1}(x) - \Gamma^{-1}(y)\|_2 &\leq L' \|x - y\|_2 \quad \forall x, y \in \mathcal{X}'.
\end{align*}

We refer to $(L, L')$ as the Lipschitz constant pair of $\Gamma$. We also define the normalized Lipschitz constant pair of $\Gamma$ to be the pair $(\tilde{L}, \tilde{L}') = \left( L \left( \frac{\mathrm{vol}(\mathcal{X})}{\mathrm{vol}(\mathcal{X}')} \right)^{1/d}, L' \left( \frac{\mathrm{vol}(\mathcal{X}')}{\mathrm{vol}(\mathcal{X})} \right)^{1/d} \right)$.

The normalized Lipschitz constant pair quantifies solely the change due to structure and discounts the change in size between $\mathcal{X}$ and $\mathcal{X}'$. The following is a restatement of Assumption 4.
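As a quick sanity check of the normalization, consider the pure scaling map $\Gamma(x) = cx$ from $[0,1]^d$ onto $[0,c]^d$. This example and the helper name `normalized_lipschitz_pair` are illustrative assumptions, not taken from the paper. Its Lipschitz constant pair is $(c, 1/c)$, and the volume correction should return the pair $(1, 1)$, since scaling changes only the size of the set and not its shape.

```python
def normalized_lipschitz_pair(lip, lip_inv, vol_x, vol_x_prime, d):
    """Normalized pair (L~, L~') from the definition above.

    lip, lip_inv      : Lipschitz constants L and L' of the map and its inverse
    vol_x, vol_x_prime: volumes of the source set X and the target set X'
    d                 : ambient dimension
    """
    ratio = (vol_x / vol_x_prime) ** (1.0 / d)
    # L~ = L * (vol(X)/vol(X'))^(1/d),  L~' = L' * (vol(X')/vol(X))^(1/d)
    return lip * ratio, lip_inv / ratio


# Hypothetical example: Gamma(x) = c * x maps [0, 1]^d onto [0, c]^d.
c, d = 0.25, 3
pair = normalized_lipschitz_pair(lip=c, lip_inv=1.0 / c,
                                 vol_x=1.0, vol_x_prime=c ** d, d=d)
print(pair)  # both entries are 1 up to floating-point rounding
```

A map that distorts shape rather than size would yield a normalized pair bounded away from $(1, 1)$, which is the quantity that the assumption on level sets controls.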

Assumption B.2.

Let $\mathcal{L}_\eta = \{x \in \mathcal{X} \mid f(x) \geq \eta\}$ denote the level set of $f$ for $\eta \in [-B, B]$. Then,

  • For all $\eta \in [-B, B]$, $\mathcal{L}_\eta$ is a disjoint union of at most $M_f < \infty$ closed, path-connected components.

  • For a given $\eta \in [-B, B]$, let $\mathcal{L}_\eta^i$ denote the $i^{\text{th}}$ such connected component of $\mathcal{L}_\eta$. We assume that there exists a bi-Lipschitz map $\Gamma_{\eta,i} : \mathcal{X} \to \mathcal{L}_\eta^i$ with normalized Lipschitz constant pair $\tilde{L}_{\eta,i}, \tilde{L}'_{\eta,i} > 0$ for all $\eta, i$. Let $L_f = \sup_{\eta,i} \tilde{L}_{\eta,i}$ and $L_f' = \sup_{\eta,i} \tilde{L}'_{\eta,i}$. We assume that $L_f, L_f' < \infty$.

Assumption 4 is an assumption on the regularity of the level sets of the function $f$. The term $M_f$ can be thought of as the number of local maxima of $f$, and hence finiteness of $M_f$ is a mild assumption on $f$ that is satisfied by functions encountered in practice. Moreover, knowledge of $M_f$ is required only for the analysis and not for the algorithm to run. The second condition on $f$ ensures that these connected components are topologically regular enough and rules out certain pathological cases. In particular, the existence of a bi-Lipschitz map between two sets implies topological similarity between the two sets. Intuitively, this assumption ensures that the shape of the level sets is not “too arbitrary”. Note that such an assumption on the level sets of $f$ is relatively mild, as the RKHS endows the function $f$ with smoothness properties, which translate to a degree of topological regularity of the level sets Alberti et al. (2011); Lee (2010).

B.1 Proof of Theorem 4.3

At a high level, the regret bound is obtained by first bounding the regret during each epoch $r$ separately and then summing over all epochs. During any epoch $r$, since REDS chooses points uniformly at random from the current domain $\mathcal{X}_r$, we simply bound the regret incurred at each point queried during this epoch by the worst case, i.e., $\varsigma_r := f(x^*) - \inf_{x \in \mathcal{X}_r} f(x)$. This leads to an upper bound of $\varsigma_r N_r M_f$ on the regret incurred during epoch $r$, as there are at most $M_f$ connected components in each level set. Since poorly performing regions of the domain are eliminated as the algorithm proceeds, $\inf_{x \in \mathcal{X}_r} f(x)$ gets closer to $f(x^*)$, which reduces the worst-case gap $\varsigma_r$ in later epochs.
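The epoch-wise accounting described above can be sketched numerically. The snippet below is only an illustration: it assumes the doubling schedule $N_r = N_1 2^{r-1}$ used later in this proof, truncates the last epoch at the horizon, and plugs in hypothetical gaps $\varsigma_r$ that halve every epoch. The function names and the numbers are not from the paper.

```python
def epoch_lengths(n1, horizon):
    """Lengths of the epochs that begin within `horizon` queries.

    Assumes the doubling schedule N_r = n1 * 2**(r - 1); the final epoch is
    truncated so that the lengths sum to `horizon`.
    """
    lengths, used, nominal = [], 0, n1
    while used < horizon:
        length = min(nominal, horizon - used)
        lengths.append(length)
        used += length
        nominal *= 2
    return lengths


def regret_upper_bound(lengths, gaps, m_f):
    """Sum over epochs of gap_r * N_r * M_f, the per-epoch worst-case bound."""
    return sum(gap * n * m_f for gap, n in zip(gaps, lengths))


lengths = epoch_lengths(n1=4, horizon=100)
num_epochs = len(lengths)
# Hypothetical worst-case gaps that halve every epoch (illustration only).
gaps = [2.0 * 0.5 ** r for r in range(num_epochs)]
bound = regret_upper_bound(lengths, gaps, m_f=1)
print(lengths, num_epochs, bound)
```

In this toy instance every full epoch contributes the same amount to the bound, because the gap halves while the epoch length doubles. The actual decay of $\varsigma_r$ is what the two lemmas below quantify.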

The following two lemmas ensure the correctness of the algorithm and help bound the regret incurred during each epoch.

Lemma B.3.

$x^* \in \mathcal{X}_r$ for all $r \geq 1$.

Lemma B.4.

For all epochs r𝑟ritalic_r, we have,

\begin{align*}
\varsigma_r \leq \begin{cases} 2B & \text{if } r = 1, \\ 4B \sup_{x \in \mathcal{X}_{r-1}} \sigma_{r-1}(x) & \text{if } r \geq 2. \end{cases}
\end{align*}

We defer the proof of these lemmas to Appendix B.3. Equipped with these lemmas, we move on to the proof of Theorem 4.3. The regret incurred by REDS can be bounded as

\begin{align*}
R(T) &= \sum_{t=1}^{T} f(x^*) - f(x_t) \leq \sum_{r=1}^{S} \varsigma_r N_r M_f \\
&\leq 2BN_1 + 4BM_f \sum_{r=2}^{S} \left[ N_r \cdot \sup_{x \in \mathcal{X}_{r-1}} \sigma_{r-1}(x) \right].
\end{align*}

In the above expression, $S$ denotes the total number of epochs that begin during a run of the REDS algorithm before reaching a total of $T$ queries. Since the epoch lengths double every epoch, we have $S \leq 1 + \log_2(T/N_1)$. We can further bound $R(T)$ by using Lemma 4.6 (which in turn is based on Theorem 3.1) to bound the worst-case posterior standard deviation in the above equation. Since $\mathcal{X}_{r-1}$ is compact (it is closed by definition and bounded because $\mathcal{X}_{r-1} \subseteq \mathcal{X}$) and $N_{r-1} \geq N_1 \geq C_{L_f,L_f'} \overline{N}$, we can invoke Lemma 4.6 to conclude

\begin{align*}
R(T) &\leq 2BN_1 + 4BC_2 C'_{L_f,L_f'} M_f \sum_{r=2}^{S} N_r \cdot N_{r-1}^{(1-\beta)/2} (\log(n/\delta'))^{\beta/2}, \tag{26}
\end{align*}

where $\delta' = \delta/\log_2 T$, $C_2 = \sqrt{C_1}$, and $C_{L_f,L_f'}, C'_{L_f,L_f'}$ are the constants from Lemma 4.6, which depend only on $L_f, L_f'$. For simplicity, we define $C_f := C'_{L_f,L_f'} M_f$, a constant that depends only on the function $f$. On plugging in the values of $N_r$, Eqn. (26) simplifies to

\begin{align*}
R(T) &\leq 2BN_1 + 4BC_2 C_f \sum_{r=2}^{S} N_r \cdot N_{r-1}^{(1-\beta)/2} (\log(n/\delta'))^{\beta/2} \\
&\leq 2BN_1 + 4BC_2 C_f N_1^{(3-\beta)/2} \sum_{r=2}^{S} 2^{r-1} \cdot 2^{(r-2)(1-\beta)/2} \left( \log\left( \frac{N_1}{\delta'} \cdot 2^{r-2} \right) \right)^{\beta/2} \\
&\leq 2BN_1 + 8BC_2 C_f N_1^{(3-\beta)/2} \sum_{r=0}^{S-2} 2^{r(3-\beta)/2} \left( \log\left( \frac{N_1}{\delta'} \cdot 2^{r} \right) \right)^{\beta/2}. \tag{27}
\end{align*}

We consider three separate cases based on the value of $\beta$:

  • $\beta < 3$: Under this case, Eqn. (27) can be simplified as follows:

    \begin{align*}
    R(T) &\leq 2BN_1 + 8BC_2 C_f N_1^{(3-\beta)/2} \sum_{r=0}^{S-2} 2^{r(3-\beta)/2} \left( \log\left( \frac{N_1}{\delta'} \cdot 2^{r} \right) \right)^{\beta/2} \\
    &\leq 2BN_1 + 8BC_2 C_f N_1^{(3-\beta)/2} \left( \log\left( \frac{T}{\delta'} \right) \right)^{3/2} \sum_{r=0}^{S-2} 2^{r(3-\beta)/2} \\
    &\leq 2BN_1 + 8BC_2 C_f N_1^{(3-\beta)/2} \left( \log\left( \frac{T}{\delta'} \right) \right)^{3/2} \frac{2^{(S-1)(3-\beta)/2} - 1}{2^{(3-\beta)/2} - 1} \\
    &\leq 2BN_1 + \frac{8BC_2 C_f}{2^{(3-\beta)/2} - 1} T^{(3-\beta)/2} \left( \log\left( \frac{T}{\delta'} \right) \right)^{3/2}.
    \end{align*}
  • $\beta = 3$: For this value of $\beta$, Eqn. (27) can be simplified as follows:

    \begin{align*}
    R(T) &\leq 2BN_1 + 8BC_2 C_f N_1^{(3-\beta)/2} \sum_{r=0}^{S-2} 2^{r(3-\beta)/2} \left( \log\left( \frac{N_1}{\delta'} \cdot 2^{r} \right) \right)^{\beta/2} \\
    &\leq 2BN_1 + 8BC_2 C_f \cdot \left( \log\left( \frac{T}{\delta'} \right) \right)^{3/2} \cdot \sum_{r=0}^{S-2} 1 \\
    &\leq 2BN_1 + 8BC_2 C_f \cdot \left( \log\left( \frac{T}{\delta'} \right) \right)^{3/2} \cdot \log\left( \frac{T}{N_1} \right).
    \end{align*}
  • $\beta > 3$: For this range, we have,

    $$
    \begin{aligned}
    R(T) &\leq 2BN_1 + 8BC_2C_f N_1^{(3-\beta)/2} \sum_{r=0}^{S-2} 2^{r(3-\beta)/2} \left(\log\left(\frac{N_1}{\delta'} \cdot 2^{r}\right)\right)^{\beta/2} \\
    &\leq 2BN_1 + 8BC_2C_f \cdot \left(\log\left(\frac{T}{\delta'}\right)\right)^{3/2} \cdot \sum_{r=0}^{S-2} 2^{r(3-\beta)/4} \left[\frac{\log(N_1 \cdot 2^{r}) + \log(1/\delta')}{N_1 \cdot 2^{r/2}}\right]^{(\beta-3)/2} \\
    &\leq 2BN_1 + 8BC_2C_f \cdot \left(\log\left(\frac{T}{\delta'}\right)\right)^{3/2} \cdot \left[\frac{\log(N_1/\delta')}{N_1}\right]^{(\beta-3)/2} \cdot \sum_{r=0}^{S-2} 2^{r(3-\beta)/4} \\
    &\leq 2BN_1 + 8BC_2C_f \cdot \left(\log\left(\frac{T}{\delta'}\right)\right)^{3/2} \cdot \left[\frac{\log(N_1/\delta')}{N_1}\right]^{(\beta-3)/2} \cdot \sum_{r=0}^{\infty} 2^{r(3-\beta)/4} \\
    &\leq 2BN_1 + \frac{8BC_2C_f}{1 - 2^{(3-\beta)/4}} \cdot \left(\log\left(\frac{T}{\delta'}\right)\right)^{3/2}.
    \end{aligned}
    $$

    In the third step, we used the fact that $\dfrac{\log(N_1 \cdot 2^{r}/\delta')}{N_1 \cdot 2^{r/2}}$ is a decreasing function of $r$ for all $r \geq 0$, and in the fifth step we used the fact that $N_1 \geq \log(N_1/\delta')$ since $N_1 \geq \overline{N}(\delta')$.
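
The two facts used in the case $\beta > 3$ can be checked numerically. The following is a minimal sketch, not part of the original proof: it compares a long partial sum of the geometric series against the closed form $1/(1 - 2^{(3-\beta)/4})$ and tests the monotonicity claim on a finite range of $r$. The values of `beta`, `n1` and `delta` are hypothetical and chosen only for illustration.

```python
import math

def geometric_closed_form(beta: float) -> float:
    """Closed form of sum_{r >= 0} 2^{r(3 - beta)/4}; only valid for beta > 3."""
    ratio = 2.0 ** ((3.0 - beta) / 4.0)
    return 1.0 / (1.0 - ratio)

def geometric_partial_sum(beta: float, num_terms: int) -> float:
    """Partial sum of the same series over r = 0, ..., num_terms - 1."""
    return sum(2.0 ** (r * (3.0 - beta) / 4.0) for r in range(num_terms))

def log_ratio(r: int, n1: float, delta: float) -> float:
    """The quantity log(N_1 * 2^r / delta') / (N_1 * 2^{r/2}) from the third step."""
    return math.log(n1 * 2.0 ** r / delta) / (n1 * 2.0 ** (r / 2.0))

# Hypothetical values, chosen only for illustration.
beta, n1, delta = 4.0, 100.0, 0.01

closed = geometric_closed_form(beta)
partial = geometric_partial_sum(beta, 200)

values = [log_ratio(r, n1, delta) for r in range(20)]
is_decreasing = all(a > b for a, b in zip(values, values[1:]))
```

For these values the partial sum agrees with the closed form up to floating-point error, and the sequence is decreasing on the tested range. This is a sanity check for one parameter choice, not a substitute for the argument in the proof.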

On combining all the cases, we arrive at the result. The statement in Corollary 4.4 follows immediately from the above proof by plugging in $\beta = 1 + 2\nu/d$.

B.2 Proof of Theorem 4.5

The proof of Theorem 4.5 is almost identical to that of Theorem 4.3. The following lemma is a counterpart to Lemma B.4 for the noisy case.

Lemma B.5.

For all epochs $r$, the following relation holds with probability at least $1 - \delta/2$:

$$
\varsigma_r \leq \begin{cases} 2B & \text{if } r = 1, \\ 4\alpha_{\tau}(\delta'/2)\left[\sup_{x \in \mathcal{X}_{r-1}} \sigma_{r-1,\tau}(x)\right] + \dfrac{2B}{T} + R\sqrt{\dfrac{2}{T\tau}\log\left(\dfrac{4T}{\delta'}\right)} & \text{if } r \geq 2. \end{cases}
$$

The proof of this lemma is identical to that of Lemma B.4 with the definitions of $\mathrm{UCB}$ and $\mathrm{LCB}$ changed according to the noisy setup (see Vakili et al. (2021a) for an exact derivation). On using Lemma 4.6 (for the noisy case) along with Lemma B.5, we can rewrite Eqn. (26) as

$$
\begin{aligned}
R(T) &\leq 2BN_1 + M_f \sum_{r=2}^{S} N_r \cdot \left[4\sqrt{C_{\tau}}\, C'_{L_f, L_f'}\, \alpha_{\tau}(\delta'/2) \sqrt{\frac{\gamma_{N_{r-1},\tau}}{N_{r-1}}} + \frac{2B}{T} + R\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta'}\right)}\right] \\
&\leq 2BN_1 + M_f \sum_{r=2}^{S} N_r \cdot \left[4\sqrt{C_{\tau}}\, C'_{L_f, L_f'}\, \alpha_{\tau}(\delta'/2) \sqrt{\frac{\gamma_{T,\tau}}{N_{r-1}}} + \frac{2B}{T} + R\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta'}\right)}\right],
\end{aligned}
\tag{28}
$$

where the second line follows from the monotonicity of $\gamma_{n,\tau}$, i.e., $\gamma_{n_1,\tau} \leq \gamma_{n_2,\tau}$ for all $n_1 \leq n_2$, and $C_{\tau}$ is the leading constant in Eqn. (9). On plugging in the values of $N_r$ in Eqn. (28), we obtain,

$$
\begin{aligned}
R(T) &\leq 2BN_1 + \sum_{r=2}^{S} N_r \cdot \left[4\sqrt{C_{\tau}}\, C'_{L_f, L_f'} M_f\, \alpha_{\tau}(\delta'/2) \sqrt{\frac{\gamma_{T,\tau}}{N_{r-1}}} + \frac{2BM_f}{T} + RM_f\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta'}\right)}\right] \\
&\leq 2BN_1 + \sum_{r=2}^{S} \left[4\sqrt{N_1 C_{\tau}}\, C_f\, \alpha_{\tau}(\delta'/2) \sqrt{\gamma_{T,\tau}} \cdot 2^{r-1} \cdot 2^{-(r-2)/2} + M_f \cdot \frac{2BN_1}{T} \cdot 2^{r-1} + M_f \cdot RN_1\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta'}\right)} \cdot 2^{r-1}\right] \\
&\leq 2BN_1 + 8\sqrt{N_1 C_{\tau}}\, C_f\, \alpha_{\tau}(\delta'/2) \sqrt{\gamma_{T,\tau}} \left(\sum_{r=0}^{S-2} 2^{r/2}\right) + M_f \cdot \left(\frac{4B}{T} + 2R\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta'}\right)}\right) N_1 \left(\sum_{r=0}^{S-2} 2^{r}\right) \\
&\leq 2BN_1 + \frac{8}{\sqrt{2}-1}\sqrt{N_1 C_{\tau}}\, C_f\, \alpha_{\tau}(\delta'/2) \sqrt{\gamma_{T,\tau}} \cdot \sqrt{\frac{T}{N_1}} + M_f \cdot \left(\frac{4B}{T} + 2R\sqrt{\frac{2}{T\tau}\log\left(\frac{4T}{\delta'}\right)}\right) \cdot N_1 \cdot \frac{T}{N_1} \\
&\leq 2BN_1 + \frac{8}{\sqrt{2}-1}\sqrt{C_{\tau}}\, C_f\, \alpha_{\tau}(\delta'/2) \sqrt{T\gamma_{T,\tau}} + 4BM_f + 2RM_f\sqrt{\frac{2T}{\tau}\log\left(\frac{4T}{\delta'}\right)},
\end{aligned}
$$

where $C_f = C'_{L_f, L_f'} M_f$ as before. Hence, $R(T)$ satisfies $\tilde{\mathcal{O}}(\sqrt{T\gamma_{T,\tau}})$, as required.
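
The bounds on the two geometric sums in the derivation above can be sanity-checked numerically. The sketch below assumes doubling epoch lengths $N_r = N_1 \cdot 2^{r-1}$, as used when plugging in $N_r$, and additionally assumes that $S$ is the largest integer with $2^{S-1} \leq T/N_1$. That second condition is an assumption made here for illustration and is not stated in this section. The function name and the numeric values are hypothetical.

```python
import math

def doubling_epoch_sums(n1: int, horizon: int):
    """Return (S, sum_{r=0}^{S-2} 2^{r/2}, sum_{r=0}^{S-2} 2^r) for doubling epochs."""
    # Assumption: S is the largest integer with 2^(S-1) <= horizon / n1.
    num_epochs = int(math.floor(math.log2(horizon / n1))) + 1
    half_sum = sum(2.0 ** (r / 2.0) for r in range(num_epochs - 1))
    full_sum = sum(2 ** r for r in range(num_epochs - 1))
    return num_epochs, half_sum, full_sum

# Hypothetical values, chosen only for illustration.
n1, horizon = 100, 100_000

num_epochs, half_sum, full_sum = doubling_epoch_sums(n1, horizon)

# Bounds used in the fourth step of the derivation.
half_bound = math.sqrt(horizon / n1) / (math.sqrt(2.0) - 1.0)
full_bound = horizon / n1
```

Under these assumptions, `half_sum` stays below `half_bound` and `full_sum` stays below `full_bound`, which is what turns the sums over epochs into the $\sqrt{T/N_1}$ and $T/N_1$ factors.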

B.3 Proof of Auxiliary Lemmas

B.3.1 Proof of Lemma B.3

The main ingredient in the proof is the relation $|f(x) - \mu_{r-1}(x)| \leq B\sigma_{r-1}(x)$, which holds for all $x \in \mathcal{X}_{r-1}$ and across all epochs $r$. This is a well-known relation in the literature (Vakili et al., 2021a; Lyu et al., 2020) that bounds the predictive performance of the posterior mean in terms of the posterior variance.

We use induction to prove the lemma. Since $\mathcal{X}_1 = \mathcal{X}$ and $x^{*} \in \mathcal{X}$ holds by definition, $x^{*} \in \mathcal{X}_1$. Assume that $x^{*} \in \mathcal{X}_{r-1}$. Using the relation $|f(x) - \mu_{r-1}(x)| \leq B\sigma_{r-1}(x)$, we can conclude,

$$
\sup_{x' \in \mathcal{X}_{r-1}} \mathrm{LCB}_{r-1}(x') = \sup_{x' \in \mathcal{X}_{r-1}} \left(\mu_{r-1}(x') - B\sigma_{r-1}(x')\right) \leq \sup_{x' \in \mathcal{X}_{r-1}} f(x') = f(x^{*}) \leq \mathrm{UCB}_{r-1}(x^{*}),
$$

where we used the inductive hypothesis to establish $\sup_{x' \in \mathcal{X}_{r-1}} f(x') = f(x^{*})$. This implies that $x^{*} \in \mathcal{X}_r$, as required.
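
The inductive step can be illustrated on a finite grid. This is a minimal sketch under stated assumptions: the shrinking rule keeps every point whose UCB is at least the largest LCB over the current domain, which is the rule the chain of inequalities above implies, and the posterior mean is synthesized so that $|f(x) - \mu(x)| \leq B\sigma(x)$ holds by construction. The grid, the function values and the name `shrink_domain` are hypothetical, and no Gaussian process is actually fitted.

```python
import random

def shrink_domain(candidates, mu, sigma, b):
    """Keep points whose UCB is at least the largest LCB over the candidates."""
    best_lcb = max(mu[x] - b * sigma[x] for x in candidates)
    return [x for x in candidates if mu[x] + b * sigma[x] >= best_lcb]

rng = random.Random(0)
b = 1.0
grid = list(range(50))

# Hypothetical function values and posterior standard deviations.
f = [rng.uniform(-1.0, 1.0) for _ in grid]
sigma = [rng.uniform(0.05, 0.5) for _ in grid]

# Synthetic posterior mean satisfying |f(x) - mu(x)| <= b * sigma(x) by construction.
mu = [f[x] + rng.uniform(-1.0, 1.0) * b * sigma[x] for x in grid]

x_star = max(grid, key=lambda x: f[x])
next_domain = shrink_domain(grid, mu, sigma, b)
```

Whenever the confidence relation holds, `x_star` remains in `next_domain`, mirroring the conclusion $x^{*} \in \mathcal{X}_r$.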

B.3.2 Proof of Lemma B.4

We separately show the bounds for $r = 1$ and $r \geq 2$. For the first epoch, we have,

$$
\varsigma_1 = f(x^{*}) - \inf_{x \in \mathcal{X}_1} f(x) = f(x^{*}) - \inf_{x \in \mathcal{X}} f(x) \leq 2\sup_{x \in \mathcal{X}} f(x) \leq 2B.
$$

We used the fact that $\sup_{x \in \mathcal{X}} f(x) = \sup_{x \in \mathcal{X}} f^{\top}\psi_x \leq \sup_{x \in \mathcal{X}} \|f\|_{\mathcal{H}_k} \|\psi_x\|_{\mathcal{H}_k} \leq B$. Consider any epoch $r \geq 2$. For the analysis, we define

$$
\mathcal{X}_r' := \left\{x \in \mathcal{X}_{r-1} : f(x) + 2B\sigma_{r-1}(x) \geq \sup_{x' \in \mathcal{X}_{r-1}} f(x') - 2B\sigma_{r-1}(x')\right\}.
$$

The region 𝒳rsuperscriptsubscript𝒳𝑟\mathcal{X}_{r}^{\prime}caligraphic_X start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT satisfies 𝒳r𝒳rsubscript𝒳𝑟superscriptsubscript𝒳𝑟\mathcal{X}_{r}\subseteq\mathcal{X}_{r}^{\prime}caligraphic_X start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ⊆ caligraphic_X start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. To establish this, we once again employ the relation |f(x)μr1(x)|Bσr1(x)𝑓𝑥subscript𝜇𝑟1𝑥𝐵subscript𝜎𝑟1𝑥|f(x)-\mu_{r-1}(x)|\leq B\sigma_{r-1}(x)| italic_f ( italic_x ) - italic_μ start_POSTSUBSCRIPT italic_r - 1 end_POSTSUBSCRIPT ( italic_x ) | ≤ italic_B italic_σ start_POSTSUBSCRIPT italic_r - 1 end_POSTSUBSCRIPT ( italic_x ). Using the relation, we can conclude that

$$
\begin{aligned}
\mathrm{UCB}_{r-1}(x) &= \mu_{r-1}(x)+B\sigma_{r-1}(x)\leq(f(x)+B\sigma_{r-1}(x))+B\sigma_{r-1}(x)=f(x)+2B\sigma_{r-1}(x), \\
\mathrm{LCB}_{r-1}(x) &= \mu_{r-1}(x)-B\sigma_{r-1}(x)\geq(f(x)-B\sigma_{r-1}(x))-B\sigma_{r-1}(x)=f(x)-2B\sigma_{r-1}(x).
\end{aligned}
$$

The inclusion $\mathcal{X}_{r}\subseteq\mathcal{X}_{r}^{\prime}$ follows immediately from the definitions of $\mathcal{X}_{r}$ and $\mathcal{X}_{r}^{\prime}$ and the above expressions.
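To spell out this step, the following short derivation may help. It assumes that the shrinking rule from the algorithm description (not reproduced in this section) reads $\mathcal{X}_{r}=\{x\in\mathcal{X}_{r-1}:\mathrm{UCB}_{r-1}(x)\geq\sup_{x^{\prime}\in\mathcal{X}_{r-1}}\mathrm{LCB}_{r-1}(x^{\prime})\}$; if the actual rule differs, the chain should be adapted accordingly. Under that assumption, for any $x\in\mathcal{X}_{r}$,

$$
f(x)+2B\sigma_{r-1}(x)\;\geq\;\mathrm{UCB}_{r-1}(x)\;\geq\;\sup_{x^{\prime}\in\mathcal{X}_{r-1}}\mathrm{LCB}_{r-1}(x^{\prime})\;\geq\;\sup_{x^{\prime}\in\mathcal{X}_{r-1}}\big[f(x^{\prime})-2B\sigma_{r-1}(x^{\prime})\big],
$$

so that $x$ satisfies the defining condition of $\mathcal{X}_{r}^{\prime}$.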

Consider the following relation, which holds for any $x\in\mathcal{X}_{r}^{\prime}$:

$$
\begin{aligned}
f(x)+2B\sigma_{r-1}(x) &\geq \sup_{x^{\prime}\in\mathcal{X}_{r-1}}f(x^{\prime})-2B\sigma_{r-1}(x^{\prime}) \\
\implies f(x) &\geq \sup_{x^{\prime}\in\mathcal{X}_{r-1}}[f(x^{\prime})-2B\sigma_{r-1}(x^{\prime})]-\sup_{x^{\prime\prime}\in\mathcal{X}_{r-1}}[2B\sigma_{r-1}(x^{\prime\prime})] \\
&\geq \sup_{x^{\prime}\in\mathcal{X}_{r-1}}f(x^{\prime})-\sup_{x^{\prime\prime}\in\mathcal{X}_{r-1}}[4B\sigma_{r-1}(x^{\prime\prime})] \\
&\geq f(x^{*})-\sup_{x^{\prime\prime}\in\mathcal{X}_{r-1}}[4B\sigma_{r-1}(x^{\prime\prime})].
\end{aligned}
\tag{29}
$$

In the last line, we used Lemma B.3 to conclude $\sup_{x^{\prime}\in\mathcal{X}_{r-1}}f(x^{\prime})=f(x^{*})$. Since $\mathcal{X}_{r}\subseteq\mathcal{X}_{r}^{\prime}$, we can use Eqn. (29) to obtain an upper bound on $\varsigma_{r}$ as follows:

$$
\begin{aligned}
\varsigma_{r} &= f(x^{*})-\inf_{x\in\mathcal{X}_{r}}f(x) \\
&\leq f(x^{*})-\inf_{x\in\mathcal{X}_{r}^{\prime}}f(x) \\
&\leq f(x^{*})-\left[f(x^{*})-\sup_{x^{\prime}\in\mathcal{X}_{r-1}}4B\sigma_{r-1}(x^{\prime})\right] \\
&\leq 4B\sup_{x^{\prime}\in\mathcal{X}_{r-1}}\sigma_{r-1}(x^{\prime}).
\end{aligned}
$$
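The bound $\varsigma_{r}\leq 4B\sup_{x^{\prime}\in\mathcal{X}_{r-1}}\sigma_{r-1}(x^{\prime})$ can be sanity-checked numerically. The sketch below is illustrative only and is not the paper's implementation. It assumes a squared-exponential kernel, a finite grid standing in for $\mathcal{X}_{r-1}$, a hypothetical test function built from three kernel sections (so that its RKHS norm $B$ is available in closed form), and the shrinking rule "keep $x$ if $\mathrm{UCB}_{r-1}(x)\geq\sup\mathrm{LCB}_{r-1}$".

```python
import numpy as np

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

# Hypothetical test function f = sum_i coef_i * k(., centers_i); its RKHS
# norm is known exactly: ||f||^2 = coef^T K(centers, centers) coef.
centers = np.array([0.1, 0.45, 0.8])
coef = np.array([1.0, -0.7, 1.3])
B = float(np.sqrt(coef @ rbf(centers, centers) @ coef))

grid = np.linspace(0.0, 1.0, 201)       # finite stand-in for X_{r-1}
f = rbf(grid, centers) @ coef

# Noise-free GP posterior from a few uniformly random query points.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=6)
y = rbf(X, centers) @ coef
K = rbf(X, X) + 1e-10 * np.eye(X.size)  # tiny jitter for numerical stability
k_grid = rbf(grid, X)
mu = k_grid @ np.linalg.solve(K, y)
var = 1.0 - np.sum(k_grid * np.linalg.solve(K, k_grid.T).T, axis=1)
sigma = np.sqrt(np.clip(var, 0.0, None))

# Assumed shrinking rule: keep x whose UCB is at least the best LCB.
ucb, lcb = mu + B * sigma, mu - B * sigma
keep = ucb >= lcb.max()

gap = f.max() - f[keep].min()           # varsigma_r evaluated on the grid
bound = 4.0 * B * sigma.max()           # 4 B sup sigma_{r-1}
print(f"gap = {gap:.4f}, bound = {bound:.4f}")
```

On a grid the inequality `gap <= bound` holds up to floating-point error because the confidence relation $|f-\mu_{r-1}|\leq B\sigma_{r-1}$ is deterministic in the noise-free setting; the random seed only changes how loose the bound is.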

B.3.3 Proof of Lemma 4.6

We begin with the noiseless case. For brevity, we drop the subscript $0$ from the posterior variance corresponding to the noiseless case. Consider a kernel $k$ and let $\mathcal{H}$ and $\mathcal{H}^{\prime}$ denote the RKHSs induced by $k$ on $\mathcal{X}$ and $\mathcal{X}^{\prime}$, respectively. Since $\mathcal{X}^{\prime}\subset\mathcal{X}$, it is straightforward to note that $\mathcal{H}^{\prime}\subseteq\mathcal{H}$. Using the result from Wendland (2004, Theorem 10.46), we know that for every $f\in\mathcal{H}^{\prime}$ there exists a natural extension $\mathscr{E}f\in\mathcal{H}$ such that $\|\mathscr{E}f\|_{\mathcal{H}}=\|f\|_{\mathcal{H}^{\prime}}$. Consequently, we can conclude $\{f:\|f\|_{\mathcal{H}^{\prime}}\leq 1\}\subseteq\{f:\|f\|_{\mathcal{H}}\leq 1\}$.
Lastly, note that $\mathcal{H}^{\prime}$ is the same as the RKHS of the kernel $k^{\prime}(x,y)=k(\Gamma(x),\Gamma(y))$ over the domain $\mathcal{X}$. Here $\Gamma$ denotes the bi-Lipschitz map $\Gamma:\mathcal{X}\to\mathcal{X}^{\prime}$ given by Assumption 4.

Let $X\subset\mathcal{X}$ be any set of distinct points, and let $\sigma_{X}^{\prime}(x)$ and $\sigma_{X}(x)$ denote the posterior standard deviations at any point $x$ computed using the kernels $k^{\prime}$ and $k$, respectively. Using the dual formulation of posterior variance, we have the following relation:

$$
\sigma_{X}^{\prime}(x)=\sup_{\substack{f\in\mathcal{H}^{\prime}\\ \|f\|_{\mathcal{H}^{\prime}}\leq 1\\ f(X)=\{0\}}}f(x)\leq\sup_{\substack{f\in\mathcal{H}\\ \|f\|_{\mathcal{H}}\leq 1\\ f(X)=\{0\}}}f(x)=\sigma_{X}(x).
$$

In the above relation, we used the fact that $\mathcal{H}^{\prime}\subseteq\mathcal{H}$ and that the unit ball in $\mathcal{H}^{\prime}$ is contained in the unit ball in $\mathcal{H}$. This implies that the prediction made using the kernel $k^{\prime}$ has an error no larger than that of the prediction made using the kernel $k$. If we set $X=\Gamma^{-1}(X^{\prime})$, then the above is equivalent to saying that the prediction error using kernel $k$ corresponding to the set of points $X^{\prime}\subset\mathcal{X}^{\prime}$ is no larger than the prediction error using kernel $k$ corresponding to the set of points $X\subset\mathcal{X}$. (Footnote 5: For any operator $\Gamma$ and $X=\{x_{1},x_{2},\dots,x_{n}\}$, we use the shorthand $\Gamma(X)$ for the set $\{\Gamma(x_{1}),\Gamma(x_{2}),\dots,\Gamma(x_{n})\}$.)
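The dual formulation of the posterior standard deviation used in this argument can be verified directly for a small example. The sketch below assumes a squared-exponential kernel and hypothetical design points; it constructs the function $f=(k(\cdot,x)-\sum_{j}\alpha_{j}k(\cdot,x_{j}))/\sigma_{X}(x)$ with $\alpha=K^{-1}k_{X}(x)$, which has unit RKHS norm, vanishes on $X$, and attains $f(x)=\sigma_{X}(x)$. By the Cauchy-Schwarz inequality no feasible function can exceed this value, so it attains the supremum.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

X = np.array([0.1, 0.35, 0.6, 0.85])    # hypothetical design points
x = np.array([0.47])                     # hypothetical test point

K = rbf(X, X)
k_x = rbf(X, x)[:, 0]
alpha = np.linalg.solve(K, k_x)
# Noise-free posterior standard deviation; k(x, x) = 1 for this kernel.
sigma = float(np.sqrt(1.0 - k_x @ alpha))

# Candidate maximizer f = (k(., x) - sum_j alpha_j k(., x_j)) / sigma,
# stored through its coefficients on the points Z = [x, x_1, ..., x_n].
Z = np.concatenate([x, X])
c = np.concatenate([[1.0], -alpha]) / sigma

rkhs_norm = float(np.sqrt(c @ rbf(Z, Z) @ c))   # should equal 1
f_on_X = rbf(X, Z) @ c                          # should vanish on X
f_at_x = float((rbf(x, Z) @ c)[0])              # should equal sigma
print(rkhs_norm, np.abs(f_on_X).max(), f_at_x - sigma)
```

Shrinking the feasible set from the unit ball of $\mathcal{H}$ to that of $\mathcal{H}^{\prime}$ can only lower the supremum, which is the inequality used in the display above.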

Since the points $X^{\prime}$ are distributed uniformly in $\mathcal{X}^{\prime}$, the points $X=\Gamma^{-1}(X^{\prime})$ are distributed according to the density $\vartheta(x)=\frac{\det(\nabla\Gamma(x))}{\mathrm{vol}(\mathcal{X}^{\prime})}$ for all $x\in\mathcal{X}$, where $\det(A)$ denotes the determinant of a matrix $A$ and $\nabla\Gamma$ denotes the Jacobian of $\Gamma$. Note that $\nabla\Gamma$ (and hence the density $\vartheta$) is well-defined almost everywhere (a.e.) as a consequence of Rademacher's theorem (Rudin, 1987, Chp. 7) and Lipschitz continuity of $\Gamma$.

Let $\varrho_{\mathrm{unif}}$ denote the uniform distribution on $\mathcal{X}$ (i.e., the Lebesgue measure). We construct a (random) subset of $X$, denoted by $Y$, as follows. Each point $x_{i}$ for $i\in\{1,2,\dots,n\}$ is added to $Y$ independently of the others with probability $c_{\vartheta}\frac{\varrho_{\mathrm{unif}}(x_{i})}{\vartheta(x_{i})}$, where $c_{\vartheta}=\inf_{x}\frac{\vartheta(x)}{\varrho_{\mathrm{unif}}(x)}$ (the infimum is taken over the points where $\vartheta$ is well defined). It is straightforward to note that the samples in $Y$ are distributed according to $\varrho_{\mathrm{unif}}$.
Using the Bernstein inequality for a sum of Bernoulli random variables, we can conclude that $|Y|$, the number of points in $Y$, satisfies the relation $|Y|\geq\frac{c_{\vartheta}n}{2C_{\vartheta}}$ with probability at least $1-\delta$ as long as $\frac{3c_{\vartheta}n}{16C_{\vartheta}}\geq\log(2/\delta)$. Here $C_{\vartheta}=\sup_{x}\frac{\vartheta(x)}{\varrho_{\mathrm{unif}}(x)}$. Since $Y\subseteq X$, the prediction based on the values of $X$ is no worse than the prediction based on the values of $Y$. Thus,

$$
\sup_{x^{\prime}\in\mathcal{X}^{\prime}}\sigma_{X^{\prime}}^{2}(x^{\prime})\leq\sup_{x\in\mathcal{X}}\sigma_{X}^{2}(x)\leq\sup_{x\in\mathcal{X}}\sigma_{Y}^{2}(x).
$$
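The thinning construction of $Y$ and the monotonicity $\sigma_{X}^{2}\leq\sigma_{Y}^{2}$ for $Y\subseteq X$ can be illustrated with a one-dimensional sketch. Everything concrete here is an assumption made for illustration: the map $\Gamma(x)=(x+x^{2})/4$ from $\mathcal{X}=[0,1]$ onto $\mathcal{X}^{\prime}=[0,1/2]$, the squared-exponential kernel, and a small regularizer `tau` in the variance (used for numerical stability, so the check is run in a noisy-style setting). For this $\Gamma$, $\vartheta(x)=(1+2x)/2$, $c_{\vartheta}=1/2$ and $C_{\vartheta}=3/2$.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def post_var(train, test, tau=1e-2):
    """Regularized posterior variance 1 - k^T (K + tau I)^{-1} k on test."""
    K = rbf(train, train) + tau * np.eye(train.size)
    k = rbf(test, train)
    return 1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)

def gamma_inv(u):
    """Inverse of the hypothetical map Gamma(x) = (x + x^2) / 4."""
    return (-1.0 + np.sqrt(1.0 + 16.0 * u)) / 2.0

def theta(x):
    """Density of Gamma^{-1}(U), U uniform on [0, 1/2]: Gamma'(x) / (1/2)."""
    return (1.0 + 2.0 * x) / 2.0

c_theta, C_theta = 0.5, 1.5              # inf and sup of theta on [0, 1]

rng = np.random.default_rng(0)
n = 400
X_prime = rng.uniform(0.0, 0.5, size=n)  # uniform samples on X'
X = gamma_inv(X_prime)                   # samples with density theta on X

# Keep x_i with probability c_theta * rho_unif(x_i) / theta(x_i),
# where rho_unif = 1 on [0, 1]; the kept points are uniform on X.
accept = rng.uniform(size=n) < c_theta / theta(X)
Y = X[accept]

grid = np.linspace(0.0, 1.0, 101)
var_X, var_Y = post_var(X, grid), post_var(Y, grid)
print(Y.size, c_theta * n / (2 * C_theta), var_X.max(), var_Y.max())
```

The expected size of `Y` is $c_{\vartheta}n$, comfortably above the threshold $c_{\vartheta}n/(2C_{\vartheta})$ used in the proof, and discarding points can only increase the posterior variance at every location.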

An identical result holds for the noisy case by an identical series of arguments applied to the kernel $k_{\tau}(x,x^{\prime})=k(x,x^{\prime})+\tau\delta_{x=x^{\prime}}$ (Kanagawa et al., 2018), where $\delta_{x=x^{\prime}}$ denotes the Dirac delta function. We can invoke the result from Theorem 3.1 for uniform samples on $\mathcal{X}$ to bound $\sigma_{Y}^{2}(x)$ under both the noisy and noiseless settings to obtain the following relations:

$$
\begin{aligned}
\sup_{x^{\prime}\in\mathcal{X}^{\prime}}\sigma_{X^{\prime},\tau}^{2}(x^{\prime}) &\leq \sup_{x\in\mathcal{X}}\sigma_{Y,\tau}^{2}(x)\leq\frac{C_{\vartheta}}{c_{\vartheta}}\cdot\frac{216}{13}\cdot F^{2}\tau\cdot\frac{\gamma_{n,\tau}}{n}, \\
\sup_{x^{\prime}\in\mathcal{X}^{\prime}}\sigma_{X^{\prime},0}^{2}(x^{\prime}) &\leq \sup_{x\in\mathcal{X}}\sigma_{Y,0}^{2}(x)\leq\frac{C_{\vartheta}}{c_{\vartheta}}\cdot\frac{216}{13}\cdot F^{2}\cdot n^{1-\beta}.
\end{aligned}
$$

To complete the proof, we only need a bound on the ratio $C_{\vartheta}/c_{\vartheta}$ that is independent of $n$. Using the Lipschitzness of $\Gamma$ and $\Gamma^{-1}$, we can conclude that

$$
(L_{f}^{\prime})^{-d}\leq|\det(\nabla\Gamma)|\leq L_{f}^{d}.
$$

Using the definition of $c_{\vartheta}$, we have

$$
c_{\vartheta}=\inf_{x}\frac{\vartheta(x)}{\varrho_{\mathrm{unif}}(x)}=\inf_{x}\frac{\det(\nabla\Gamma(x))\,\mathrm{vol}(\mathcal{X})}{\mathrm{vol}(\mathcal{X}^{\prime})}\geq\frac{\mathrm{vol}(\mathcal{X})}{(L_{f}^{\prime})^{d}\,\mathrm{vol}(\mathcal{X}^{\prime})}=(\tilde{L}_{f}^{\prime})^{-d}.
$$

Similarly,

$$
C_{\vartheta}=\sup_{x}\frac{\vartheta(x)}{\varrho_{\mathrm{unif}}(x)}=\sup_{x}\frac{\det(\nabla\Gamma(x))\,\mathrm{vol}(\mathcal{X})}{\mathrm{vol}(\mathcal{X}^{\prime})}\leq\frac{L_{f}^{d}\,\mathrm{vol}(\mathcal{X})}{\mathrm{vol}(\mathcal{X}^{\prime})}=\tilde{L}_{f}^{d}.
$$

Hence, combining the upper bound on $C_{\vartheta}$ with the lower bound on $c_{\vartheta}$, we obtain $C_{\vartheta}/c_{\vartheta}\leq\tilde{L}_{f}^{d}\cdot(\tilde{L}_{f}^{\prime})^{d}=(\tilde{L}_{f}\tilde{L}_{f}^{\prime})^{d}$, which depends only on $(\tilde{L}_{f},\tilde{L}_{f}^{\prime})$ and is independent of $n$, as required.