License: CC BY 4.0
arXiv:2402.13182v1 [cs.LG] 20 Feb 2024

Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

Nikola Pavlovic School of Electrical & Computer Engineering, Cornell University, Ithaca, NY, {np358, qz16}@cornell.edu Sudeep Salgia Carnegie Mellon University, Pittsburgh, PA, [email protected] Qing Zhao School of Electrical & Computer Engineering, Cornell University, Ithaca, NY, {np358, qz16}@cornell.edu
(Feb 2024)
Abstract

We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing regret that is accumulating over time $T$ and aggregating over agents. We develop the first algorithm that achieves the optimal regret order (as defined by centralized learning) with a communication cost that is sublinear in both $N$ and $T$. The key features of the proposed algorithm are the uniform exploration at the local agents and shared randomness with the central server. Working together with the sparse approximation of the GP model, these two key components make it possible to preserve the learning rate of the centralized setting at a diminishing rate of communication.

1 Introduction

1.1 Distributed Kernel Bandits

We study the problem of zeroth-order online stochastic optimization in a distributed setting, where $N$ agents aim to collaboratively maximize a reward function with communications facilitated by a central server. The reward function $f:\mathcal{X}\rightarrow\mathbb{R}$ is unknown; it is only known that it lives in a Reproducing Kernel Hilbert Space (RKHS) associated with a known kernel $k$. Each agent sequentially chooses points in the function domain $\mathcal{X}$ to query and subsequently receives noisy feedback on the function values (i.e., random rewards) at the query points. The goal is for each distributed agent to converge quickly to $x^{*}\in\operatorname*{arg\,max}_{x\in\mathcal{X}}f(x)$, a global maximizer of $f$. We quantify this goal as minimizing the cumulative regret summed over a learning horizon of length $T$ and over all $N$ agents:

$$R=\sum_{n=1}^{N}\sum_{t=1}^{T}\left(f(x^{*})-f(x^{(n)}_{t})\right), \tag{1}$$

where $x^{(n)}_{t}$ denotes the point queried by agent $n$ at time $t$.
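As an illustration of definition (1), the following minimal sketch evaluates the cumulative regret for a toy problem. The reward function `f`, its maximizer, and the query lists are hypothetical values chosen only for this example; in the actual problem $f$ is unknown and the regret cannot be computed by the agents.

```python
def cumulative_regret(f, x_star, queries):
    """Sum of f(x*) - f(x_t^(n)) over all agents n and all times t, as in (1).

    queries[n][t] is the point queried by agent n at time t.
    """
    f_star = f(x_star)
    return sum(f_star - f(x) for agent_queries in queries for x in agent_queries)


# Hypothetical 1-D reward function with global maximizer x* = 0.5.
def f(x):
    return 1.0 - (x - 0.5) ** 2


# Two agents (N = 2), three queries each (T = 3); values are illustrative.
queries = [
    [0.0, 0.5, 0.5],
    [1.0, 0.25, 0.5],
]

regret = cumulative_regret(f, 0.5, queries)
print(regret)
```

Queries at the maximizer contribute zero to the sum, so a policy whose agents all converge quickly to $x^{*}$ accumulates little regret.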

The above zeroth-order stochastic optimization problem can be viewed as a continuum-armed kernelized-bandit problem (Srinivas et al., 2010). The expressive power of the RKHS model represents a broad family of objective functions. In particular, it is known that the RKHS of typical kernels, such as the Matérn family of kernels, can approximate almost all continuous functions on compact subsets of $\mathbb{R}^{d}$ (Srinivas et al., 2010). The problem has been studied extensively under a centralized setting with a single decision maker (i.e., $N=1$), for which several algorithms have been proposed, including UCB-based algorithms (Srinivas et al., 2010; Chowdhury and Gopalan, 2017; Abbasi-Yadkori et al., 2011), batched pure exploration (Li and Scarlett, 2022), tree-based domain shrinking (Salgia et al., 2021) and RIPS (Camilleri et al., 2021). Optimal learning efficiency in terms of regret order in $T$ has been obtained in both the stochastic (Li and Scarlett, 2022; Salgia et al., 2021) and the contextual setting (Valko et al., 2013).

In addition to learning efficiency, distributed kernel bandits face a new challenge of communication efficiency. Without constraints on the communication overhead, all agents can share their local observations and coordinate their individual query actions at no cost, and the distributed problem trivially reduces to a centralized one. At the other end of the spectrum is a complete decoupling of the agents, resulting in $N$ independent single-user problems without the benefit of data sharing for accelerated learning. The tension between learning efficiency (which demands data sharing and action coordination) and communication efficiency is evident. A central question in this trade-off is how to achieve the optimal learning rate enjoyed by the centralized setting using a minimum amount of message exchange among agents.

In contrast to the extensive literature on centralized kernel bandits, distributed kernel bandits are much less explored despite their broad applications (e.g., federated learning for hyperparameter tuning (Dai et al., 2020) and collaborative training of neural nets using the recent theory of Neural Tangent Kernel (Jacot et al., 2018)). There exist only a handful of studies under drastically different settings and constraints (see Sec. 1.3). For the setting considered in this work, no distributed learning algorithms exist that achieve the optimal regret order with a sublinear (in both $T$ and $N$) message exchange among agents.

1.2 Main Results

In this paper, we develop the first algorithm for distributed kernel bandits that achieves the optimal order of regret enjoyed by centralized learning with a sublinear message exchange in both $T$ and $N$.

To tackle the essential tradeoff between learning rate and communication efficiency, a distributed learning algorithm needs a communication strategy that governs what to communicate and how to integrate the shared information into local query actions. To minimize the total regret that is accumulating over time and aggregating over the agents, the communication strategy needs to work in tandem with the query actions to ensure a continual flow of information available at all agents for decision-making.

A natural answer to what to communicate in a distributed learning problem is certain sufficient local statistics of the underlying unknown parameters. For example, for multi-armed (i.e., discrete arms) and linear bandits, this corresponds to the local estimates of the arm mean values and the mean reward vector respectively. However, for kernel bandits, the corresponding quantity would be an estimate of the function, which is potentially infinite-dimensional and hence an impractical choice for communication. Existing studies resolve this issue by exchanging local query actions and observations across all agents and throughout the learning horizon (Li et al., 2022; Dubey and Pentland, 2020), resulting in a communication cost growing linearly in both $N$ and $T$.

Even with a communication cost growing linearly in both $N$ and $T$, preserving the full learning power of a centralized decision maker with $NT$ query points is not immediate. The prevailing approaches to centralized kernel bandits that achieve order-optimal regret build on the maximum posterior variance (MPV) sampling strategy (Li and Scarlett, 2022), which queries, at each time, the point with the highest posterior variance conditioned on all past observations. Ensuring such a maximal uncertainty reduction at each query point is believed to be crucial in utilizing the full statistical power of all query points. Unfortunately, such a fully adaptive query strategy is incompatible with the parallel learning among distributed agents. To emulate the MPV sampling at each of the $NT$ query points would require the agents to take turns in their queries and share the local observations immediately with all other agents, an infeasible strategy for most distributed learning problems. Implementing MPV-based sampling in parallel across agents, however, loses the full adaptivity. This is arguably the main obstacle in realizing the optimal learning rate of a centralized kernel bandit in a distributed setting.

To tackle the above challenges, our proposed algorithm represents major departures from the prevailing approaches. Referred to as DUETS (Distributed Uniform Exploration of Trimmed Sets), this algorithm has two key features: uniform exploration at the local agents and shared randomness with the central server.

In DUETS, each agent employs uniform (at random) sampling as the query strategy. Uniform sampling is fully compatible with parallel learning. In particular, note that the union of the local sets of $t$ query points obtained at the $N$ agents through uniform sampling is identical (in distribution) to the set of $Nt$ query points obtained at a centralized decision maker using the same uniform sampling strategy. This superposition property of uniform sampling allows us to leverage the recent results on random exploration in centralized kernel bandits (Salgia et al., 2023a), and is crucial in achieving the optimal learning rate defined by the centralized setting. In addition to preserving the learning rate of the centralized setting, uniform sampling enjoys advantages in both computation and communication. Compared with the MPV strategy, which requires an expensive maximization of a non-convex acquisition function to find each query point, uniform sampling is extremely simple to implement. This computational efficiency can be particularly attractive to distributed local devices. In terms of communication efficiency, uniform sampling makes it possible to bypass the exchange of query points altogether and reduce the exchange of reward observations through the shared randomness strategy detailed below.

In DUETS, each agent has access to an independent coin, i.e., a source of randomness, which is unknown to the other agents but is known to the server. The shared randomness enables the server to reproduce the points queried by the agents, thereby resulting in effective transmission of the local set of queried points at each agent to the server at no communication cost. To reduce the communication overhead associated with the reward observations, we employ sparse approximation of GP models (Wild et al., 2021). The availability of all the queried points at the server provides the perfect platform for leveraging the power of sparse approximation to reduce the communication to a diminishing fraction of the total number of observations. Specifically, the server, with access to all the query points, selects a small subset of points that can approximate, to sufficient accuracy, the posterior statistics corresponding to all the points queried by the agents. This allows a diminishing rate of communication to share local reward observations. It is this integration of uniform sampling, shared randomness, and sparse approximation in DUETS that makes it possible to achieve the optimal learning rate of the centralized setting at a communication cost that is sublinear in both $N$ and $T$.
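The shared-randomness idea can be sketched with seeded pseudo-random generators. This is a simplified illustration rather than the DUETS procedure itself: it assumes a one-dimensional domain $[0,1]$, uses Python's `random.Random` as the "coin", and the seed values and helper name `agent_queries` are hypothetical.

```python
import random


def agent_queries(seed, num_points, low=0.0, high=1.0):
    """Draw uniform query points on [low, high] from a seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(num_points)]


# Each agent holds its own seed (its "coin"); the server holds a copy of
# every agent's seed. The seed values here are hypothetical.
agent_seeds = {1: 101, 2: 202, 3: 303}
server_seeds = dict(agent_seeds)

# Agents sample locally; the server replays the same generators.
local_points = {n: agent_queries(s, 5) for n, s in agent_seeds.items()}
server_points = {n: agent_queries(s, 5) for n, s in server_seeds.items()}

# The server has reconstructed every query point without receiving any.
print(local_points == server_points)
```

Because the generators are deterministic given the seed, only the reward observations remain to be communicated, which is where the sparse approximation comes in.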

We analyze the performance of DUETS and establish that it incurs a cumulative regret of $\widetilde{\mathcal{O}}(\sqrt{NT\gamma_{NT}}\log(T/\delta))$ with probability $1-\delta$, where the notation $\widetilde{\mathcal{O}}(\cdot)$ hides poly-logarithmic factors and $\gamma_{NT}$ denotes the maximal information gain of the kernel and represents the effective dimension of the kernel. Note that this matches the lower bound (up to logarithmic factors) for any centralized algorithm with a total of $NT$ queries as established in Scarlett et al. (2017), thereby establishing the order-optimality of the proposed algorithm. To the best knowledge of the authors, this is the first algorithm to achieve the optimal order of regret for the problem of distributed kernel bandits. We also establish a bound of $\widetilde{\mathcal{O}}(\gamma_{NT})$ on the communication cost incurred by DUETS, where communication cost is measured by the number of real numbers transmitted during the algorithm (see Section 2 for more details). This significantly improves over the state-of-the-art of $\mathcal{O}(N\gamma_{NT}^{3})$ achieved by the ApproxDisKernelUCB algorithm proposed by Li et al. (2022) and is always guaranteed to be sublinear in the total number of queries, $NT$.

1.3 Related Work

The existing literature on distributed kernel bandits is relatively slim. The most relevant to our work is that by Li et al. (2022), where the authors consider the problem of distributed contextual kernel bandits and propose a UCB-based policy with sparse approximation of GP models and intermittent communication. Their proposed policy was shown to incur a cumulative regret of $\widetilde{\mathcal{O}}(\sqrt{NT}\gamma_{NT})$ and a communication cost of $\mathcal{O}(N\gamma_{NT}^{3})$. The DUETS algorithm proposed in this work offers an improvement over the algorithm in Li et al. (2022) in terms of both regret and communication cost. While the contextual setting with varying arm action sets considered in their work is more general than the setting with a fixed arm set considered in this work, their proposed algorithm does not offer a non-trivial reduction in regret or communication cost in the fixed arm setting. Moreover, neither the regret nor the communication cost incurred by the algorithm in Li et al. (2022) is guaranteed to be sublinear in the total number of queries, $NT$, for all kernels. Consequently, their algorithm does not guarantee convergence to $x^{*}$ or a non-trivial communication cost for all kernels. On the other hand, both the regret and the communication cost of DUETS are guaranteed to be sublinear, implying both convergence and communication efficiency.

Among other studies, Du et al. (2023) consider the problem of distributed pure exploration in kernel bandits over a finite action set, where they focus on designing learning strategies with low simple regret. In this work, we consider the more challenging continuum-armed setup with a focus on minimizing cumulative regret as opposed to simple regret. Another line of work explores the impact of heterogeneity among clients and designs algorithms to minimize this impact. Salgia et al. (2023b) consider personalized kernel bandits in which agents have heterogeneous models and aim to optimize the weighted sum of their own reward function and the average reward function over all the agents. Dubey and Pentland (2020) consider heterogeneous distributed kernel bandits over a graph in which they use additional kernel-based modeling to measure task similarity across different agents.

In contrast to the distributed kernel bandit, the problems of distributed multi-armed bandits and linear bandits have been extensively studied. For distributed multi-armed bandits (MAB), a variety of algorithms have been proposed for distributed learning under different network topologies (Landgren et al., 2017; Shahrampour et al., 2017; Sankararaman et al., 2019; Chawla et al., 2020; Zhu et al., 2021). Shi et al. (2021) and Shi and Shen (2021) have analyzed the impact of heterogeneity among agents in the distributed MAB problem. Similarly, the problem of distributed linear bandits is also well understood in a variety of settings with different network topologies (Korda et al., 2016), heterogeneity among agents (Mitra et al., 2021; Ghosh et al., 2021; Hanna et al., 2022) and communication constraints (Mitra et al., 2022; Wang et al., 2019; Huang et al., 2021; Amani et al., 2022; Salgia and Zhao, 2023).

2 Problem Formulation

We consider a distributed learning framework consisting of $N$ agents indexed by $\{1,2,\dots,N\}$. Under this framework, we study the problem of collaboratively maximizing an unknown function $f:\mathcal{X}\to\mathbb{R}$, where $\mathcal{X}\subset\mathbb{R}^{d}$ is a compact, convex set. The function $f$ belongs to the Reproducing Kernel Hilbert Space (RKHS), $\mathcal{H}_{k}$, associated with a known positive definite kernel $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$. The RKHS, $\mathcal{H}_{k}$, is a Hilbert space that is endowed with an inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}_{k}}$ that obeys the reproducing property, i.e., $\langle g,k(x,\cdot)\rangle_{\mathcal{H}_{k}}=g(x)$ for all $g\in\mathcal{H}_{k}$, and induces the norm $\|g\|_{\mathcal{H}_{k}}=\sqrt{\langle g,g\rangle_{\mathcal{H}_{k}}}$.

The agents can access the unknown function by querying the function at different points in the domain $\mathcal{X}$. Upon querying a point $x\in\mathcal{X}$, the agent receives a reward $y=f(x)+\epsilon$, where $\epsilon$ is a noise term. We make the following assumptions on the unknown function $f$ and noise.

Assumption 2.1.

The RKHS norm of the function $f$ is bounded by a known constant $B$, i.e., $\|f\|_{\mathcal{H}_{k}}\leq B$.

Assumption 2.2.

The noise term $\epsilon$ is assumed to be independent across all agents and all queries and is a zero-mean, $R$-sub-Gaussian random variable, i.e., it satisfies the relation $\mathbb{E}[\exp(\lambda\epsilon)]\leq\exp\left(\frac{\lambda^{2}R^{2}}{2}\right)$ for all $\lambda\in\mathbb{R}$.

Assumption 2.3.

For each $r\in\mathbb{N}$, there exists a discretization $\mathcal{U}_{r}$ of $\mathcal{X}$ with $|\mathcal{U}_{r}|=\mathrm{poly}(r)$ such that, for any $f\in\mathcal{H}_{k}$, we have $|f(x)-f([x]_{\mathcal{U}_{r}})|\leq\frac{\|f\|_{\mathcal{H}_{k}}}{r}$, where $[x]_{\mathcal{U}_{r}}=\operatorname*{arg\,min}_{x^{\prime}\in\mathcal{U}_{r}}\|x-x^{\prime}\|_{2}$. Here, the notation $g(x)=\mathrm{poly}(x)$ is equivalent to $g(x)=\mathcal{O}(x^{k})$ for some $k\in\mathbb{N}$.

Assumption 2.4.

Let $\mathcal{L}_{\eta}=\{x\in\mathcal{X}\mid f(x)\geq\eta\}$ denote the level set of $f$ for $\eta\in[-B,B]$. We assume that for all $\eta\in[-B,B]$, $\mathcal{L}_{\eta}$ is a disjoint union of at most $M_{f}<\infty$ components, each of which is closed and connected. Moreover, for each such component, there exists a bi-Lipschitzian map between the component and $\mathcal{X}$ with normalized Lipschitz constant pair $L_{f},L_{f}^{\prime}<\infty$.

Assumptions 2.1-2.3 are standard, mild assumptions that are commonly adopted in the literature (Srinivas et al., 2010; Chowdhury and Gopalan, 2017; Li and Scarlett, 2022; Vakili et al., 2022, 2021a). The existence of the discretization $\mathcal{U}_{r}$ in Assumption 2.3 has been justified and adopted in previous studies (Srinivas et al., 2010; Vakili et al., 2021a). In particular, popular kernels such as the Squared Exponential and Matérn kernels are known to be Lipschitz continuous, in which case an $\varepsilon$-cover of the domain with $\varepsilon=\mathcal{O}(1/r)$ is sufficient to show the existence of such a discretization. At a high level, Assumption 2.4 ensures that the structure of the level sets of $f$ satisfies a mild regularity condition. This is a mild assumption on $f$ that we require to adopt a result from Salgia et al. (2023a) for our analysis.

The agents collaborate with each other by communicating through a central server. At each time instant, each agent can send a message to the server through the uplink channel. Based on the messages from different agents received by the server, it can then broadcast a message back to all the agents through the downlink channel.

Our objective is to design a distributed learning policy $\pi$ that specifies, for each agent $n$, the point $x_{t}^{(n)}$ to be queried at each time instant $t$, based on the information available at that agent up to time instant $t$. The performance of a collaborative learning policy $\pi$ is measured in terms of both learning and communication efficiency over a learning horizon of $T$ steps. The learning efficiency is measured using the notion of cumulative regret, as defined in (1).

The communication efficiency is measured using the sum of the uplink and downlink communication costs. In particular, let $C_{\mathrm{up}}^{(n)}(T)$ denote the number of real numbers sent by agent $n$ to the server over the time horizon. The uplink cost of $\pi$, $C_{\mathrm{up}}^{\pi}(T)$, is then given as the average communication cost over all agents:

$$C_{\mathrm{up}}^{\pi}(T)=\frac{1}{N}\sum_{n=1}^{N}C_{\mathrm{up}}^{(n)}(T). \tag{2}$$

Similarly, the downlink cost of $\pi$, $C_{\mathrm{down}}^{\pi}(T)$, is given as the number of real numbers broadcast by the server over the entire time horizon, averaged over all agents. The overall communication cost of $\pi$, $C^{\pi}(T)$, is given as $C^{\pi}(T)=C_{\mathrm{up}}^{\pi}(T)+C_{\mathrm{down}}^{\pi}(T)$.
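The cost accounting above can be sketched as follows. The function name and the example counts are hypothetical, and the sketch assumes the downlink cost is tallied as the number of real numbers each agent receives, which is the same for every agent when the server only broadcasts.

```python
def communication_cost(uplink_counts, downlink_counts):
    """Return (C_up, C_down, C) following (2) and the definitions above.

    uplink_counts[n]   : real numbers sent by agent n to the server.
    downlink_counts[n] : real numbers received by agent n from the server
                         (identical across agents under pure broadcast).
    """
    n_agents = len(uplink_counts)
    c_up = sum(uplink_counts) / n_agents      # average uplink cost, eq. (2)
    c_down = sum(downlink_counts) / n_agents  # average downlink cost
    return c_up, c_down, c_up + c_down


# Three agents with illustrative (made-up) message counts.
c_up, c_down, c_total = communication_cost([4, 2, 6], [3, 3, 3])
print(c_up, c_down, c_total)
```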

The objective is to design a distributed learning policy that achieves the order-optimal cumulative regret and incurs a low communication cost. We aim to provide high probability bounds on both the cumulative regret and communication cost that hold with probability $1-\delta$ for any given $\delta\in(0,1)$.

We next review the basics of Gaussian Process models and their sparse approximation, both of which are central to our proposed policy.

2.1 GP Models

In this section, we present a brief overview of Gaussian Process models and their application to establishing confidence intervals for RKHS elements.

A Gaussian Process (GP) is a random process $G$ indexed by $\mathcal{X}$ and is associated with a mean function $\mu:\mathcal{X}\to\mathbb{R}$ and a positive definite kernel $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$. The random process $G$ is defined such that for all finite subsets of $\mathcal{X}$, $\{x_{1},x_{2},\dots,x_{m}\}\subset\mathcal{X}$, $m\in\mathbb{N}$, the random vector $[G(x_{1}),G(x_{2}),\dots,G(x_{m})]^{\top}$ follows a multivariate Gaussian distribution with mean vector $[\mu(x_{1}),\dots,\mu(x_{m})]^{\top}$ and covariance matrix $\Sigma=[k(x_{i},x_{j})]_{i,j=1}^{m}$. Throughout the work, we consider GPs with $\mu\equiv 0$.
When used as a prior for a data generating process under Gaussian noise, the conjugate property provides closed form expressions for the posterior mean and covariance of the GP model. Specifically, given a set of observations $\{\mathbf{X}_{m},\mathbf{Y}_{m}\}=\{(x_{i},y_{i})\}_{i=1}^{m}$ from the underlying process, the posterior mean and variance of the GP model are given as follows:

\begin{align}
\mu_{m}(x) &= k_{\mathbf{X}_{m}}(x)^{\top}(\lambda\mathbf{I}_{m}+\mathbf{K}_{\mathbf{X}_{m},\mathbf{X}_{m}})^{-1}\mathbf{Y}_{m}, \tag{3}\\
\sigma^{2}_{m}(x) &= k(x,x)-k_{\mathbf{X}_{m}}(x)^{\top}(\lambda\mathbf{I}_{m}+\mathbf{K}_{\mathbf{X}_{m},\mathbf{X}_{m}})^{-1}k_{\mathbf{X}_{m}}(x). \tag{4}
\end{align}

In the above expressions, $k_{\mathbf{X}_m}(x)=[k(x_1,x),k(x_2,x),\dots,k(x_m,x)]^{\top}$, $\mathbf{K}_{\mathbf{X}_m,\mathbf{X}_m}=[k(x_i,x_j)]_{i,j=1}^{m}$, $\mathbf{I}_m$ is the $m\times m$ identity matrix, and $\lambda$ corresponds to the variance of the Gaussian noise.
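As an illustration of Eqns. (3) and (4), the following is a minimal sketch in plain Python. The squared-exponential kernel `rbf`, its length scale, the one-dimensional domain, and the toy data are assumptions made for this example only; the helper `spd_solve` is a small Cholesky-based solver introduced here so the sketch has no external dependencies.

```python
import math

def rbf(x, y, lengthscale=0.5):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def spd_solve(A, b):
    """Solve A u = b for a symmetric positive definite A via Cholesky."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    w = [0.0] * n
    for i in range(n):                  # forward substitution: L w = b
        w[i] = (b[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    u = [0.0] * n
    for i in range(n - 1, -1, -1):      # back substitution: L^T u = w
        u[i] = (w[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))) / L[i][i]
    return u

def gp_posterior(x, X, Y, lam, kernel=rbf):
    """Return (mu_m(x), sigma_m^2(x)) following Eqns. (3) and (4)."""
    m = len(X)
    # A = lam * I_m + K_{X_m, X_m}
    A = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    kx = [kernel(xi, x) for xi in X]    # the vector k_{X_m}(x)
    mean = sum(a * b for a, b in zip(kx, spd_solve(A, Y)))
    var = kernel(x, x) - sum(a * b for a, b in zip(kx, spd_solve(A, kx)))
    return mean, var

# Toy data (hypothetical values).
X = [0.1, 0.4, 0.9]
Y = [0.2, 0.8, -0.3]
mean, var = gp_posterior(0.4, X, Y, lam=0.01)
```

At a queried point the variance shrinks to roughly the noise level, while far from all queried points it returns to the prior value $k(x,x)$.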

Following a standard approach in the literature (Srinivas et al., 2010), we model the data corresponding to observations from the unknown $f$, which belongs to the RKHS of a positive definite kernel $k$, using a GP with the same covariance kernel $k$. In particular, we assume a fictitious GP prior over the fixed, unknown function $f$ along with a fictitious Gaussian distribution for the noise. The benefit of this approach is that the posterior mean and variance of this GP model serve as tools to both predict the values of the function $f$ and quantify the uncertainty of the prediction at unseen points in the domain, as shown by the following lemma.

Lemma 2.5.

(Vakili et al., 2021a, Thm. 1) Suppose that Assumptions 2.1 and 2.2 hold. Given a set of observations $\{\mathbf{X}_m,\mathbf{Y}_m\}$ as described above, such that the query points $\mathbf{X}_m$ are chosen independently of the noise sequence, for a fixed $x\in\mathcal{X}$ the following relation holds with probability at least $1-\delta$:

\[
|f(x)-\mu_m(x)|\leq\beta(\delta)\cdot\sigma_m(x),
\]

where $\beta(\delta)=B+R\sqrt{\frac{2}{\lambda}\log\left(\frac{2}{\delta}\right)}$.

We would like to emphasize that these assumptions are modeling techniques used as a part of the algorithm and not a part of the problem setup. In particular, the function $f$ is a fixed, deterministic function in $\mathcal{H}_k$ and the noise is $R$-sub-Gaussian.
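To get a feel for the confidence width in Lemma 2.5, the multiplier $\beta(\delta)$ can be evaluated directly. The sketch below is only a numerical illustration; the values chosen for $B$, $R$, $\lambda$ and $\delta$ are hypothetical.

```python
import math

def beta(delta, B, R, lam):
    """Confidence multiplier beta(delta) = B + R * sqrt((2 / lam) * log(2 / delta))."""
    return B + R * math.sqrt((2.0 / lam) * math.log(2.0 / delta))

# Hypothetical constants: RKHS norm bound B, sub-Gaussian parameter R, noise term lam.
width_loose = beta(0.1, B=1.0, R=0.5, lam=1.0)
width_tight = beta(0.001, B=1.0, R=0.5, lam=1.0)
```

The multiplier grows only as the square root of $\log(1/\delta)$, so demanding a much smaller failure probability widens the confidence interval only mildly.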

Lastly, given a set of points $\mathbf{X}_m=\{x_1,x_2,\dots,x_m\}\subset\mathcal{X}$, the information gain of the set $\mathbf{X}_m$ is defined as $\gamma_{\mathbf{X}_m}:=\frac{1}{2}\log\left(\det\left(\mathbf{I}_m+\lambda^{-1}\mathbf{K}_{\mathbf{X}_m,\mathbf{X}_m}\right)\right)$. Using this, we can define the maximal information gain of a kernel as $\gamma_m:=\sup_{\mathbf{X}_m}\gamma_{\mathbf{X}_m}$. Maximal information gain is closely related to the effective dimension of a kernel (Calandriello et al., 2019) and helps characterize the regret performance of kernel bandit algorithms (Srinivas et al., 2010; Chowdhury and Gopalan, 2017).
$\gamma_m$ depends only on the kernel and $\lambda$ and has been shown to be an increasing sublinear function of $m$ (Srinivas et al., 2010; Vakili et al., 2021b).
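The information gain $\gamma_{\mathbf{X}_m}$ of a given set can be computed from a log-determinant; the supremum defining $\gamma_m$ is not computed here. The following is a minimal sketch, assuming a one-dimensional squared-exponential kernel `rbf` chosen purely for illustration and a small Cholesky-based helper `logdet_spd` introduced for this example.

```python
import math

def rbf(x, y, lengthscale=0.5):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def logdet_spd(A):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def information_gain(X, lam, kernel=rbf):
    """gamma_{X_m} = 0.5 * log det(I_m + K_{X_m, X_m} / lam)."""
    m = len(X)
    A = [[(1.0 if i == j else 0.0) + kernel(X[i], X[j]) / lam for j in range(m)]
         for i in range(m)]
    return 0.5 * logdet_spd(A)

gain_clustered = information_gain([0.0, 0.0], lam=1.0)   # two identical queries
gain_spread = information_gain([0.0, 100.0], lam=1.0)    # two far-apart queries
```

Repeated queries at one location yield less information than queries spread across the domain, which is the intuition behind $\gamma_m$ growing sublinearly in $m$.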

2.2 Sparse approximation of GP models

The sparsification of GP models refers to the idea of approximating the posterior mean and variance of a GP model, corresponding to a set of observations $\{\mathbf{X}_m,\mathbf{Y}_m\}$, using only a subset of the query points $\mathbf{X}_m$. In particular, let $\mathcal{S}$ be a subset of $\mathbf{X}_m$ consisting of $r<m$ points. The approximate posterior mean and variance based on the points in $\mathcal{S}$, referred to as the inducing set, are given as follows (Wild et al., 2021):

\begin{align}
\tilde{\mu}_m(x) &= z_{\mathcal{S}}(x)^{\top}\left(\lambda\mathbf{I}_{|\mathcal{S}|}+\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}^{\top}\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}\right)^{-1}\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}^{\top}\mathbf{Y}_m, \tag{5}\\
\lambda\tilde{\sigma}^2_m(x) &= k(x,x)-z_{\mathcal{S}}(x)^{\top}\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}^{\top}\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}\left(\lambda\mathbf{I}_{|\mathcal{S}|}+\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}^{\top}\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}\right)^{-1}z_{\mathcal{S}}(x), \tag{6}
\end{align}

where $z_{\mathcal{S}}(x)=\mathbf{K}_{\mathcal{S},\mathcal{S}}^{-\frac{1}{2}}k_{\mathcal{S}}(x)$ and $\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}=[z_{\mathcal{S}}(x_1),z_{\mathcal{S}}(x_2),\dots,z_{\mathcal{S}}(x_m)]^{\top}$.

Note that it is sufficient to know the matrix $\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}^{\top}\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}\in\mathbb{R}^{r\times r}$, the vector $\mathbf{Z}_{\mathbf{X}_m,\mathcal{S}}^{\top}\mathbf{Y}_m\in\mathbb{R}^{r}$ and the set $\mathcal{S}$ in order for $\tilde{\mu}$ and $\tilde{\sigma}$ to be calculated.
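The following sketch evaluates the approximate posterior mean of Eqn. (5) from the $r\times r$ matrix and the $r$-dimensional vector alone. It rests on several assumptions: a one-dimensional squared-exponential kernel `rbf`, toy data, and, notably, a Cholesky factor $L$ of $\mathbf{K}_{\mathcal{S},\mathcal{S}}$ used in place of the symmetric inverse square root, i.e. features $L^{-1}k_{\mathcal{S}}(x)$. These differ from $z_{\mathcal{S}}(x)$ only by an orthogonal rotation, which leaves the value of Eqn. (5) unchanged.

```python
import math

def rbf(x, y, lengthscale=0.5):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L u = b for lower-triangular L."""
    u = []
    for i in range(len(b)):
        u.append((b[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i])
    return u

def spd_solve(A, b):
    """Solve A u = b for symmetric positive definite A via Cholesky."""
    L = cholesky(A)
    w = forward_sub(L, b)
    n = len(b)
    u = [0.0] * n
    for i in range(n - 1, -1, -1):      # back substitution: L^T u = w
        u[i] = (w[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))) / L[i][i]
    return u

def sparse_mean(x, X, Y, S, lam, kernel=rbf):
    """Approximate posterior mean of Eqn. (5) using only r x r statistics."""
    r = len(S)
    L = cholesky([[kernel(a, b) for b in S] for a in S])

    def feat(u):
        # Cholesky whitening L^{-1} k_S(u): a rotated stand-in for z_S(u).
        return forward_sub(L, [kernel(s, u) for s in S])

    Z = [feat(xi) for xi in X]                               # m x r matrix
    G = [[sum(row[a] * row[b] for row in Z) + (lam if a == b else 0.0)
          for b in range(r)] for a in range(r)]              # lam*I + Z^T Z
    v = [sum(row[a] * y for row, y in zip(Z, Y)) for a in range(r)]  # Z^T Y
    return sum(za * wa for za, wa in zip(feat(x), spd_solve(G, v)))

def exact_mean(x, X, Y, lam, kernel=rbf):
    """Exact posterior mean of Eqn. (3), for comparison."""
    m = len(X)
    A = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    return sum(kernel(xi, x) * a for xi, a in zip(X, spd_solve(A, Y)))

# Toy data (hypothetical values).
X = [0.1, 0.4, 0.9]
Y = [0.2, 0.8, -0.3]
full = sparse_mean(0.6, X, Y, S=X, lam=0.1)           # inducing set = all points
exact = exact_mean(0.6, X, Y, lam=0.1)
coarse = sparse_mean(0.6, X, Y, S=[0.1, 0.9], lam=0.1)  # r = 2 inducing points
```

When the inducing set contains every query point, the approximation coincides with the exact posterior mean; with a strict subset it is only an approximation whose quality depends on how well $\mathcal{S}$ covers the queried points.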

3 The DUETS Algorithm

In this section, we present the proposed algorithm DUETS.

We first describe the randomization at each agent and the shared randomness with the server. Each agent $n$ has a private coin $\mathscr{C}_n$ for generating random bits that are independent of those generated by other agents. Each agent's coin is hidden from the other agents but known to the central server. As a result, the server can reproduce the random bits generated at all agents.
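One common way to realize such shared randomness in practice is a seeded pseudorandom generator whose seed is known to both the agent and the server. The sketch below makes that assumption explicit: the coin $\mathscr{C}_n$ is modeled as a per-agent seed, the seeds are hypothetical values, and the draws are uniform on the interval $[0,1]$ chosen for illustration.

```python
import random

def draw_points(seed, num_points, low=0.0, high=1.0):
    """Draw points uniformly from [low, high] using a generator seeded by the coin."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(num_points)]

# Hypothetical per-agent seeds, each shared only between that agent and the server.
agent_seeds = {1: 101, 2: 202, 3: 303}

# Each agent draws its own points locally.
agent_points = {n: draw_points(seed, 5) for n, seed in agent_seeds.items()}

# The server replays the same generators and recovers every agent's points
# without receiving any message from the agents.
server_points = {n: draw_points(seed, 5) for n, seed in agent_seeds.items()}
```

Because the generator is deterministic given its seed, the server's replay matches the agents' draws exactly, while different seeds keep the agents' draws distinct from one another.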

DUETS employs an epoch-based elimination structure where the domain $\mathcal{X}$ is successively trimmed across epochs to maintain an active region that contains a global maximizer $x^{*}$ with high probability for future exploration. Specifically, in each epoch $j$, the server and the agents maintain a common active subset of the domain $\mathcal{X}_j\subseteq\mathcal{X}$, with $\mathcal{X}_1$ initialized to $\mathcal{X}$. The operations in each epoch are as follows.

During the $j^{\text{th}}$ epoch, each agent $n$, using its private coin $\mathscr{C}_n$, generates $\mathcal{D}_j^{(n)}$, a set of $T_j$ points that are uniformly distributed in the set $\mathcal{X}_j$ (if the active region consists of multiple disjoint regions, this step is carried out for each region separately; for simplicity of description, we assume the active region consists of a single connected component). $T_j$ is set to $\lfloor\sqrt{TT_{j-1}}\rfloor$, with $T_1$ being an input to the algorithm. Each agent $n$ queries all the points in $\mathcal{D}_j^{(n)}$ and obtains $\mathbf{Y}_j^{(n)}\in\mathbb{R}^{T_j}$, the corresponding vector of reward observations.
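The epoch lengths implied by the recursion $T_j=\lfloor\sqrt{TT_{j-1}}\rfloor$ can be tabulated directly. The sketch below is illustrative; truncating the final epoch so that the lengths sum to $T$ is an assumption of this example that mirrors termination at the end of the horizon, and $T_1\geq 1$ is assumed.

```python
import math

def epoch_lengths(T, T1):
    """Epoch lengths T_1, T_2, ... with T_j = floor(sqrt(T * T_{j-1})), cut off at T."""
    lengths = []
    used = 0
    Tj = T1
    while used < T:
        current = min(Tj, T - used)   # final epoch is truncated at the horizon
        lengths.append(current)
        used += current
        Tj = math.isqrt(T * Tj)       # exact integer floor of the square root
    return lengths

schedule = epoch_lengths(T=10_000, T1=10)
num_epochs = len(schedule)
```

The lengths grow very quickly, so only a handful of epochs, and hence communication rounds, are needed even for a long horizon.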

Since the server has access to the coins of all the agents, it can faithfully reproduce the set $\mathcal{D}_j=\bigcup_{n=1}^{N}\mathcal{D}_j^{(n)}$ without any communication between the server and the agents. In order to efficiently communicate the observed reward values from the agents to the server, we leverage sparse approximation of GP models along with the knowledge of the set $\mathcal{D}_j$ at the server. The server constructs a global inducing set $\mathcal{S}_j$ by including each point in $\mathcal{D}_j$ with probability $p_j:=p_0\sigma_{j,\max}^{2}$, independently of other points, where $\sigma_{j,\max}^{2}=\sup_{x\in\mathcal{X}_j}\sigma_j^{2}(x)$ and $\sigma_j^{2}(\cdot)$ is the posterior variance corresponding to the points collected in $\mathcal{D}_j$. Since the posterior variance in Eqn. (4) depends only on the query points and not on the observed rewards, the server can compute it without any communication. Here, $p_0=72\log\left(\frac{4NT}{\delta'}\right)$ is an appropriately chosen parameter which ensures that the approximate posterior statistics constructed using $\mathcal{S}_j$ are a faithful representation of the true posterior statistics corresponding to the set $\mathcal{D}_j$ with probability $1-\delta$. The server broadcasts the inducing set $\mathcal{S}_j$ to all the agents.
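The construction of the inducing set amounts to independent Bernoulli sampling. The sketch below is a minimal illustration under stated assumptions: the product $p_0\sigma_{j,\max}^{2}$ is clipped at 1 so that it is a valid probability (the clipping is an assumption of this example), the server's randomness is a seeded generator, and the numerical values of $N$, $T$, $\delta'$ and $\sigma_{j,\max}^{2}$ are hypothetical.

```python
import math
import random

def inclusion_probability(sigma_max_sq, N, T, delta_prime):
    """p_j = p_0 * sigma_{j,max}^2 with p_0 = 72 * log(4 N T / delta')."""
    p0 = 72.0 * math.log(4.0 * N * T / delta_prime)
    return min(1.0, p0 * sigma_max_sq)   # clipping at 1 is an assumption here

def sample_inducing_set(points, p, seed):
    """Include each point independently with probability p."""
    rng = random.Random(seed)
    return [x for x in points if rng.random() < p]

# Hypothetical epoch data: 100 query points and a small maximum posterior variance.
points = [i / 100 for i in range(100)]
p_j = inclusion_probability(sigma_max_sq=1e-4, N=10, T=1000, delta_prime=0.01)
S_j = sample_inducing_set(points, p_j, seed=0)
```

As the maximum posterior variance shrinks across epochs, so does the inclusion probability, which keeps the inducing set, and hence the messages exchanged, small relative to the number of queries.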

Upon receiving the inducing set, each agent $n$ computes the projection $v_j^{(n)}\in\mathbb{R}^{|\mathcal{S}_j|}$ of its reward vector onto the inducing set as follows:

\begin{equation}
v_j^{(n)}:=\mathbf{Z}_{\mathcal{D}_j^{(n)},\mathcal{S}_j}^{\top}\mathbf{Y}_j^{(n)}. \tag{7}
\end{equation}

Each agent then sends back the lower-dimensional projected observations $v_j^{(n)}$ to the server, which subsequently aggregates them to obtain the vector $\overline{v}_j$ given as

\begin{equation}
\overline{v}_j:=\left(\lambda\mathbf{I}_{|\mathcal{S}_j|}+\mathbf{Z}_{\mathcal{D}_j,\mathcal{S}_j}^{\top}\mathbf{Z}_{\mathcal{D}_j,\mathcal{S}_j}\right)^{-1}\left(\sum_{n=1}^{N}v_j^{(n)}\right). \tag{8}
\end{equation}

Note that the summation $\sum_{n=1}^{N}v_j^{(n)}$ equals $\mathbf{Z}_{\mathcal{D}_j,\mathcal{S}_j}^{\top}\mathbf{Y}_j$, i.e., the projection of the rewards of all agents onto the inducing set. The server then broadcasts the vector $\overline{v}_j$ and $\sigma_{j,\max}$ to all the agents. The benefit of sending $\overline{v}_j$ as opposed to the sum of rewards is that it allows each agent to compute the posterior mean locally using its knowledge of the inducing set $\mathcal{S}_j$ (see Eqn. (5)).
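The additivity of the projections across agents can be checked on a small example. In the sketch below the feature rows are hypothetical stand-ins for the vectors $z_{\mathcal{S}_j}(x)$ of each agent's query points, with $|\mathcal{S}_j|=2$; the rewards are made-up values.

```python
def project(Z_rows, rewards):
    """Return Z^T Y, where Z_rows[i] is the feature vector of the i-th query."""
    r = len(Z_rows[0])
    return [sum(row[a] * y for row, y in zip(Z_rows, rewards)) for a in range(r)]

# Hypothetical feature rows and rewards for two agents.
Z_agent = {1: [[1.0, 0.0], [0.5, 0.5]], 2: [[0.0, 2.0]]}
Y_agent = {1: [2.0, 4.0], 2: [-1.0]}

# Each agent sends only its projected vector v_j^(n), of length |S_j|.
v = {n: project(Z_agent[n], Y_agent[n]) for n in Z_agent}
v_sum = [sum(v[n][a] for n in v) for a in range(2)]

# Projection of the pooled rewards, as if all data were held at the server.
Z_all = Z_agent[1] + Z_agent[2]
Y_all = Y_agent[1] + Y_agent[2]
v_pooled = project(Z_all, Y_all)
```

Each agent transmits $|\mathcal{S}_j|$ numbers rather than its $T_j$ raw rewards, yet the server recovers exactly the quantity it would have computed from the pooled data.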

As the last step of the epoch, all the agents and the server trim the current set $\mathcal{X}_j$ to $\mathcal{X}_{j+1}$ using the following update rule:

\begin{equation}
\mathcal{X}_{j+1}=\left\{x\in\mathcal{X}_j:\tilde{\mu}_j(x)\geq\sup_{x'\in\mathcal{X}_j}\tilde{\mu}_j(x')-2\beta(\delta')\sigma_{j,\max}\right\}, \tag{9}
\end{equation}

where $\delta'=\frac{\delta}{2|\mathcal{U}_T|\cdot(\log(\log N\log T)+4)}$ and $\tilde{\mu}_j(x)=z_{\mathcal{S}_j}(x)^{\top}\overline{v}_j$ is the approximate posterior mean computed based on the inducing set $\mathcal{S}_j$ (see Eqn. (5)). Since the posterior mean provides an estimate of the function values, the update condition is designed to eliminate all points at which the estimated function value is smaller than the current best estimate of the maximum value, up to an estimation error. Note that trimming is a deterministic procedure, which ensures that all the agents and the server share a common value of $\mathcal{X}_{j+1}$.
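The update rule in Eqn. (9) can be sketched on a finite set of candidate points. The example below assumes the active region is represented by a grid on $[0,1]$, and uses a hypothetical callable `mu_tilde` in place of the approximate posterior mean; the values of $\beta(\delta')$ and $\sigma_{j,\max}$ are made up for illustration.

```python
def trim(active, mu_tilde, beta, sigma_max):
    """Keep points whose estimated value is within 2*beta*sigma_max of the best estimate."""
    best = max(mu_tilde(x) for x in active)
    threshold = best - 2.0 * beta * sigma_max
    return [x for x in active if mu_tilde(x) >= threshold]

def mu_tilde(x):
    """Hypothetical approximate posterior mean, peaked at x = 0.5."""
    return -((x - 0.5) ** 2)

grid = [i / 10 for i in range(11)]             # candidate points in [0, 1]
next_active = trim(grid, mu_tilde, beta=1.0, sigma_max=0.025)
```

Because the rule is deterministic given $\overline{v}_j$, $\sigma_{j,\max}$ and $\mathcal{S}_j$, every agent running it obtains the same trimmed set, and the current empirical maximizer is always retained.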

Detailed pseudocode for the agent side and the server side of DUETS is provided in Algorithms 1 and 2, respectively.

Algorithm 1 DUETS: Agent $n\in\{1,2,\dots,N\}$
1:  Input: Size of the first epoch $T_1$, error probability $\delta$
2:  $t\leftarrow 0$, $j\leftarrow 1$, $\mathcal{X}_1\leftarrow\mathcal{X}$
3:  while $t<T$ do
4:     $\mathcal{D}_j^{(n)}\leftarrow\emptyset$
5:     for $i\in\{1,2,\dots,T_j\}$ do
6:        Query a point $x_t^{(n)}$ uniformly at random from $\mathcal{X}_j$ using the coin $\mathscr{C}_n$ and observe $y_t^{(n)}$
7:        $\mathcal{D}_j^{(n)}\leftarrow\mathcal{D}_j^{(n)}\cup\{x_t^{(n)}\}$
8:        $t\leftarrow t+1$
9:        if $t\geq T$ then
10:           Terminate
11:        end if
12:     end for
13:     Receive the global inducing set $\mathcal{S}_j$
14:     Set $v_j^{(n)}\leftarrow\mathbf{Z}_{\mathcal{D}_j^{(n)},\mathcal{S}_j}^{\top}\mathbf{Y}_j^{(n)}$, where $\mathbf{Y}_j^{(n)}=[y_{t-T_j}^{(n)},y_{t-T_j+1}^{(n)},\dots,y_{t-1}^{(n)}]^{\top}$, and send $v_j^{(n)}$ to the server
15:     Receive $\overline{v}_j$ and $\sigma_{j,\max}$ from the server
16:     Use $\overline{v}_j$ to compute $\tilde{\mu}_j(\cdot)=z_{\mathcal{S}_j}(\cdot)^{\top}\overline{v}_j$
17:     Update $\mathcal{X}_j$ to $\mathcal{X}_{j+1}$ using Eqn. (9)
18:     $T_{j+1}\leftarrow\lfloor\sqrt{TT_j}\rfloor$
19:     $j\leftarrow j+1$
20:  end while
Algorithm 2 DUETS: Server
1:  input: Size of the first epoch $T_1$, error probability $\delta$
2:  $t \leftarrow 0$, $j \leftarrow 1$, $\mathcal{X}_1 \leftarrow \mathcal{X}$
3:  while $t < T$ do
4:     Use the coins $\mathscr{C}_1, \mathscr{C}_2, \dots, \mathscr{C}_N$ to reproduce the sets $\mathcal{D}_j^{(1)}, \mathcal{D}_j^{(2)}, \dots, \mathcal{D}_j^{(N)}$
5:     $\mathcal{D}_j \leftarrow \bigcup_{n=1}^{N} \mathcal{D}_j^{(n)}$
6:     Set $\sigma_{j,\max} \leftarrow \sup_{x \in \mathcal{X}_j} \sigma_j(x)$
7:     Construct the set $\mathcal{S}_j$ by including each point from $\mathcal{D}_j$ with probability $p_j$, independent of other points
8:     Broadcast $\mathcal{S}_j$ to all the agents
9:     Receive $v_j^{(n)}$ from all agents $n \in \{1, 2, \dots, N\}$
10:     Set $\overline{v}_j \leftarrow \left(\lambda \mathbf{I}_{|\mathcal{S}_j|} + \mathbf{Z}_{\mathcal{D}_j^{(n)}, \mathcal{S}_j}^{\top} \mathbf{Z}_{\mathcal{D}_j^{(n)}, \mathcal{S}_j}\right)^{-1} \left(\sum_{n=1}^{N} v_j^{(n)}\right)$
11:     Broadcast $\overline{v}_j$ and $\sigma_{j,\max}$ to all the agents
12:     Update $\mathcal{X}_j$ to $\mathcal{X}_{j+1}$ using Eqn. (9)
13:     $t \leftarrow t + T_j$
14:     $T_{j+1} \leftarrow \lfloor \sqrt{T T_j} \rfloor$
15:     $j \leftarrow j + 1$
16:  end while
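
The agent and server steps of one epoch can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The coins $\mathscr{C}_n$ are modeled as seeds known to both the agent and the server, the domain is taken to be the hypercube $[0,1]^d$, the kernel is Squared Exponential, and the feature map $z_{\mathcal{S}}(\cdot)$ is assumed to be the Nyström-style map $K_{\mathcal{S}\mathcal{S}}^{-1/2} k_{\mathcal{S}}(\cdot)$. The Gram term in line 10 is formed over the union $\mathcal{D}_j$, which is this sketch's reading of that line. All function names, the seeds, and the values of $p$ and $\lambda$ are hypothetical. The computation of $\sigma_{j,\max}$ and the domain update of Eqn. (9) are omitted.

```python
import numpy as np

def se_kernel(A, B, lengthscale=0.2):
    """Squared Exponential kernel matrix between the rows of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def features(X, S, jitter=1e-6):
    """Assumed feature map: row i holds z_S(x_i)^T = k_S(x_i)^T K_SS^{-1/2}."""
    w, U = np.linalg.eigh(se_kernel(S, S) + jitter * np.eye(len(S)))
    inv_sqrt = (U / np.sqrt(np.maximum(w, jitter))) @ U.T
    return se_kernel(X, S) @ inv_sqrt

def sample_queries(agent_id, epoch, num_points, dim):
    """Uniform queries on [0, 1]^dim; the seed plays the role of the shared coin."""
    rng = np.random.default_rng([agent_id, epoch])
    return rng.uniform(0.0, 1.0, size=(num_points, dim))

def run_epoch(f, num_agents=4, epoch=1, T_j=25, dim=2, p=0.1, lam=1.0, noise_std=0.2):
    # Server: reproduce every agent's query set from the shared seeds (no upload needed).
    D = np.vstack([sample_queries(n, epoch, T_j, dim) for n in range(num_agents)])
    # Server: include each point of D in the inducing set independently with probability p.
    keep = np.random.default_rng(12345).random(len(D)) < p
    keep[0] = True  # demo-only safeguard (not part of the algorithm) so S is never empty
    S = D[keep]
    # Agents: query locally, then upload only the |S|-dimensional vector Z^T Y.
    noise = np.random.default_rng(999)
    uploads, rewards = [], []
    for n in range(num_agents):
        D_n = sample_queries(n, epoch, T_j, dim)  # same points the server reproduced
        Y_n = f(D_n) + noise_std * noise.standard_normal(T_j)
        uploads.append(features(D_n, S).T @ Y_n)
        rewards.append(Y_n)
    # Server: aggregate the uploads and solve the regularized linear system.
    Z = features(D, S)
    A = lam * np.eye(len(S)) + Z.T @ Z
    v_bar = np.linalg.solve(A, np.sum(uploads, axis=0))
    # Reference value a centralized learner would compute from all raw rewards.
    v_central = np.linalg.solve(A, Z.T @ np.concatenate(rewards))
    return S, v_bar, v_central

def predict(X, S, v_bar):
    """Approximate posterior mean z_S(x)^T v_bar evaluated at the rows of X."""
    return features(X, S) @ v_bar

S, v_bar, v_central = run_epoch(lambda X: np.sin(3.0 * X[:, 0]) + X[:, 1])
print(len(S), np.allclose(v_bar, v_central, atol=1e-6))
```

The point of the comparison at the end is that summing the per-agent vectors $v_j^{(n)}$ gives the same right-hand side as stacking all raw rewards, so in this sketch the server recovers the centralized estimate while each agent uploads only $|\mathcal{S}_j|$ numbers.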

4 Performance Analysis

The following theorem characterizes the regret performance and communication cost of DUETS.

Theorem 4.1.

Consider the distributed kernel bandit problem described in Section 2. For a given $\delta \in (0,1)$, let the policy parameters of DUETS be such that $T_1 \geq \overline{M}/N$ and $p_0 = 72 \log\frac{4N}{\delta}$. Then with probability at least $1-\delta$, the regret and communication cost incurred by DUETS satisfy the following relations:

$$\begin{aligned}
R_{\mathrm{DUETS}} &= \tilde{\mathcal{O}}\left(\sqrt{NT\gamma_{NT}}\,\log(T/\delta)\right) \\
C_{\mathrm{DUETS}} &= \tilde{\mathcal{O}}(\gamma_{NT}).
\end{aligned}$$

Here, $\overline{M}$ is a constant that depends only upon the kernel $k$ and the domain $\mathcal{X}$, and it is independent of $N$ and $T$. (Footnote 4: The constant $\overline{M}$ is the same as the one in Lemma 4.3, which has been adopted from Salgia et al. (2023a). We refer the reader to Salgia et al. (2023a) for an exact expression of the constant and additional related discussion.)

As shown in the above theorem, DUETS achieves order-optimal regret, as it matches the lower bound established in Scarlett et al. (2017) up to logarithmic factors. DUETS is the first algorithm to close this gap to the lower bound in the distributed setup and achieve order-optimal regret performance. Moreover, DUETS incurs a communication cost that is sublinear in both $T$ and $N$ for all kernels. Furthermore, it can be much smaller than $NT$, depending upon the smoothness of the kernel. For example, using the bounds on information gain (Vakili et al., 2021b), we can show that the communication cost incurred by DUETS is $\mathcal{O}(\log^{d}(NT))$.

Proof.

We provide a sketch of the proof of Theorem 4.1 here. The regret bound is obtained by first bounding the regret incurred by DUETS in each epoch $j$ and then summing the regret across different epochs. In any epoch $j$, the agents take purely exploratory actions by uniformly sampling the region $\mathcal{X}_j$. Thus, to bound the regret incurred at any step during an epoch, we use the crude bound $\Delta_j := \sup_{x \in \mathcal{X}_j} (f(x^*) - f(x))$. Consequently, the regret during the $j^{\text{th}}$ epoch, denoted by $R^{(j)}$, is upper bounded by $N \cdot \Delta_j T_j$. Note that the update criterion (Eqn. (9)) is designed to obtain a refined localization of $x^*$ by eliminating the points with low function values, consequently leading to smaller values of $\Delta_j$ as the algorithm proceeds. The epoch lengths are carefully chosen to balance the increase in epoch length with the decrease in $\Delta_j$ to obtain the tightest bound. These ideas are captured in the following lemmas, from which the regret bound follows.

Lemma 4.2.

Let $\Delta_j := \sup_{x \in \mathcal{X}_j} (f(x^*) - f(x))$. Then, the following bound holds for all epochs $j \geq 1$ with probability $1 - \frac{\delta}{2}$:

$$\Delta_j \leq 8\beta(\delta') \cdot \left( \sup_{x \in \mathcal{X}_{j-1}} \sigma_j(x) \right) + \frac{4B}{T},$$

where $\delta' = \frac{\delta}{2(\log(\log N + \log T) + 4)\,|\mathcal{U}_T|}$ and $\mathcal{U}_T$ denotes the discretization defined in Assumption 2.3.

Lemma 4.3.

Let $\sigma_j^2(\cdot)$ denote the posterior variance corresponding to the set $\mathcal{D}_j$ obtained by sampling $NT_j$ points uniformly at random from the domain $\mathcal{X}_j$. Then, for $T_1 \geq \overline{M}(\delta)/N$ and for any $f$ satisfying Assumption 2.4, the following bound holds with probability $1-\delta$ for all epochs $j \geq 1$:

$$\sup_{x \in \mathcal{X}_j} \sigma_j^2(x) \leq C_{f,\mathcal{X}} \cdot \frac{\gamma_{NT_j}}{NT_j}.$$

Here $C_{f,\mathcal{X}}$ denotes a constant that depends only on $f$ and the domain $\mathcal{X}$ and is independent of both $N$ and $T$.

Lemma 4.4.

The total number of epochs in DUETS over a time horizon of $T$ is less than $\log(\log(\max\{N, T\})) + 4$.
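
The epoch schedule $T_{j+1} \leftarrow \lfloor \sqrt{T T_j} \rfloor$ from the algorithm grows very quickly, which is why so few epochs are needed. The sketch below simply replays the server loop (`while t < T`) and compares the resulting epoch count with the bound of Lemma 4.4 for the values $T = 50$, $T_1 = 2$, $N = 10$ used in Section 5. The function name is hypothetical, the logarithm is assumed to be natural, and the last epoch is left untruncated, as in the pseudocode.

```python
import math

def epoch_schedule(T, T1):
    """Epoch lengths T_1, T_2, ... with T_{j+1} = floor(sqrt(T * T_j)), run until t >= T."""
    if T1 < 1:
        raise ValueError("the first epoch must contain at least one step")
    lengths, t, T_j = [], 0, T1
    while t < T:
        lengths.append(T_j)
        t += T_j
        T_j = math.isqrt(T * T_j)  # exact integer floor of the square root
    return lengths

lengths = epoch_schedule(T=50, T1=2)
print(lengths)  # [2, 10, 22, 33]

# Bound of Lemma 4.4 with N = 10 and T = 50 (natural logarithm assumed).
bound = math.log(math.log(max(10, 50))) + 4
print(len(lengths) < bound)  # True
```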

Lemma 4.3 is a result adopted from the recent work by Salgia et al. (2023a) that establishes bounds on the worst-case posterior variance corresponding to a set of randomly sampled points.
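
To see how the lemmas combine, the following is a rough sketch of the per-epoch calculation for $j > 1$, not the full argument of Appendix A. It ignores the constant $M_f$ that appears there, applies Lemma 4.3 to epoch $j-1$ (following the form in which Lemma 4.2 is invoked in Appendix A.1), and uses $T_j \leq \sqrt{T T_{j-1}} \leq T$ together with the fact that $\gamma_t$ is non-decreasing in $t$.

```latex
\begin{align*}
R^{(j)} &\leq N T_j \Delta_j
  \leq N T_j \left( 8\beta(\delta') \sup_{x \in \mathcal{X}_{j-1}} \sigma_{j-1}(x) + \frac{4B}{T} \right) \\
 &\leq 8\beta(\delta') \, N \sqrt{T T_{j-1}} \, \sqrt{C_{f,\mathcal{X}} \frac{\gamma_{N T_{j-1}}}{N T_{j-1}}} + 4BN \\
 &\leq 8\beta(\delta') \sqrt{C_{f,\mathcal{X}}} \, \sqrt{N T \gamma_{NT}} + 4BN .
\end{align*}
```

The factor $T_{j-1}$ cancels, so every epoch contributes a term of the same order $\sqrt{NT\gamma_{NT}}$ (up to $\beta(\delta')$) plus an additive $4BN$. Summing over the number of epochs given by Lemma 4.4 adds only a doubly logarithmic factor.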

For the bound on communication cost, note that in each epoch $j$, the server broadcasts the inducing set $\mathcal{S}_j$, which consists of $|\mathcal{S}_j|$ vectors in $\mathbb{R}^d$, the vector $\overline{v}_j \in \mathbb{R}^{|\mathcal{S}_j|}$ and the scalar $\sigma_{j,\max}$, resulting in a downlink cost of $\mathcal{O}(|\mathcal{S}_j|)$ in epoch $j$. Similarly, since each agent just uploads $v_j^{(n)} \in \mathbb{R}^{|\mathcal{S}_j|}$, the uplink cost of DUETS in epoch $j$ is also $\mathcal{O}(|\mathcal{S}_j|)$. Consequently, the communication cost of DUETS in epoch $j$ is bounded by $\mathcal{O}(|\mathcal{S}_j|)$. The following lemma gives a high-probability bound on $|\mathcal{S}_j|$.

Lemma 4.5.

Let $\mathcal{S}_j$ denote the inducing set constructed in the $j^{\text{th}}$ epoch, as outlined in Section 3. Then, with probability at least $1-\delta$,

$$|\mathcal{S}_j| \leq C_{f,\mathcal{X}} \cdot \left(3 + \log\left(\frac{\log(\log N \log T)}{\delta}\right)\right) \cdot \gamma_{NT}$$

holds for all epochs $j$. In the above expression, $C_{f,\mathcal{X}}$ is the same as the constant in Lemma 4.3.

The bound on the communication cost follows directly from Lemmas 4.5 and 4.4. Please refer to Appendix A for a detailed proof.
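
The per-epoch message sizes described above can be made concrete with a short count. This is only an illustrative tally of scalars under the description in this section. The function name is hypothetical, and how these counts are aggregated into $C_{\mathrm{DUETS}}$ follows the paper's definition of communication cost, which is not reproduced here.

```python
def epoch_message_sizes(num_inducing, dim):
    """Number of scalars exchanged in one epoch, per the accounting in the text."""
    # Downlink: |S_j| points in R^d, the vector v_bar_j in R^{|S_j|}, and sigma_{j,max}.
    downlink = num_inducing * dim + num_inducing + 1
    # Uplink: each agent sends only v_j^{(n)} in R^{|S_j|}.
    uplink_per_agent = num_inducing
    return downlink, uplink_per_agent

print(epoch_message_sizes(num_inducing=12, dim=2))  # (37, 12)
```

With the dimension $d$ treated as a constant, both quantities scale linearly in $|\mathcal{S}_j|$, in line with the $\mathcal{O}(|\mathcal{S}_j|)$ statements above.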

[Figure 1 panels. Cumulative regret: (a) $h_1(x)$, (b) $h_2(x)$, (c) Branin, (d) Hartmann-4D. Communication cost: (e) $h_1(x)$, (f) $h_2(x)$, (g) Branin, (h) Hartmann-4D.]
Figure 1: Cumulative regret (Figs. 1a-1d) and communication cost (Figs. 1e-1h) for all algorithms across different benchmark functions, averaged over 5 Monte Carlo runs. The shaded region represents error bars corresponding to one standard deviation. As seen from the above plots, DUETS obtains superior performance, both in terms of regret and communication cost, over the other algorithms across all functions.

5 Empirical Studies

We perform several empirical studies to corroborate our theoretical findings. We compare the regret performance and communication cost of our proposed algorithm, DUETS, against three baseline algorithms: DisKernelUCB, ApproxDisKernelUCB and N-KernelUCB. The first two are distributed kernel bandit algorithms proposed in Li et al. (2022). N-KernelUCB is a baseline algorithm considered in Li et al. (2022) where each agent locally runs the GP-UCB algorithm (Chowdhury and Gopalan, 2017) with no communication among the agents.

We compare the performance of all four algorithms across four benchmark functions. The first two are synthetic functions $h_1, h_2 : \mathcal{B} \to \mathbb{R}$ considered in Li et al. (2022), where $\mathcal{B}$ denotes the unit ball centered at the origin in $\mathbb{R}^{10}$. The functions are given by:

$$\begin{aligned}
h_1(x) &:= \cos(3x^{\top}\theta^{\star}) \\
h_2(x) &:= (x^{\top}\theta^{\star})^3 - 3(x^{\top}\theta^{\star})^2 + 3(x^{\top}\theta^{\star}) + 3.
\end{aligned}$$

For both functions, $\theta^{\star}$ is randomly chosen from the surface of the unit ball $\mathcal{B}$. The other two functions are Branin (Azimi et al., 2012; Picheny et al., 2013) and Hartmann-4D (Picheny et al., 2013), which are commonly used benchmark functions for Bayesian Optimization. The Branin function is defined over $\mathcal{X} = [0,1]^2$ while the Hartmann-4D function is defined over $\mathcal{X} = [0,1]^4$.
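
The two synthetic benchmarks can be written down directly from their definitions. The sketch below is illustrative only: the function names and the seed are hypothetical, and drawing $\theta^{\star}$ by normalizing a standard Gaussian vector is one standard way to sample uniformly from the unit sphere, assumed here because the text does not specify the sampling procedure.

```python
import numpy as np

def sample_theta_star(dim=10, seed=0):
    """Draw theta* uniformly from the unit sphere by normalizing a Gaussian vector."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(dim)
    return g / np.linalg.norm(g)

def h1(X, theta):
    """h1(x) = cos(3 x^T theta*), for a single point or the rows of X."""
    return np.cos(3.0 * (X @ theta))

def h2(X, theta):
    """h2(x) = u^3 - 3u^2 + 3u + 3 with u = x^T theta*."""
    u = X @ theta
    return u**3 - 3.0 * u**2 + 3.0 * u + 3.0

theta = sample_theta_star()
origin = np.zeros(10)
print(h1(origin, theta), h2(origin, theta))  # 1.0 3.0
```

Since $h_2$ can be rewritten as $(x^{\top}\theta^{\star} - 1)^3 + 4$ and $x^{\top}\theta^{\star} \in [-1, 1]$ on the unit ball, its maximum over $\mathcal{B}$ is $4$, attained at $x = \theta^{\star}$. The maximum of $h_1$ is $1$, attained wherever $x^{\top}\theta^{\star} = 0$.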

We consider the distributed kernel bandit problem described in Section 2 with $N = 10$ agents. For all the experiments, we use the Squared Exponential kernel. The length scale was set to $0.2$ for Branin and to $1$ for all other functions. The observations were corrupted with zero-mean Gaussian noise with a standard deviation of $0.2$. The parameter $D$ for ApproxDisKernelUCB and DisKernelUCB was set to $20$ and $10$, respectively. For DUETS, we set $T_1 = 2$ and $p_0 = 10$. The parameter $\beta$ was selected using a grid search over $\{0.2, 0.5, 1, 2, 5\}$ for all the algorithms. All the algorithms were run for $T = 50$ time steps. We averaged the cumulative regret and the communication cost incurred by different algorithms over $5$ Monte Carlo runs.

The cumulative regret incurred by the different algorithms across the benchmark functions is shown in the top row of Figure 1. The bottom row consists of the corresponding plots for the communication cost incurred by the different algorithms. The shaded regions denote error bars of up to one standard deviation on either side of the mean value. As evident from the plots, DUETS achieves a significantly lower regret as compared to all other algorithms consistently across benchmark functions. DUETS also incurs a smaller communication overhead as compared to other algorithms, corroborating our theoretical results.

References

  • Abbasi-Yadkori et al. (2011) Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011. ISBN 9781618395993.
  • Amani et al. (2022) S. Amani, T. Lattimore, A. György, and L. F. Yang. Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost, 2022. URL http://arxiv.longhoe.net/abs/2205.13170.
  • Azimi et al. (2012) J. Azimi, A. Jalali, and X. Z. Fern. Hybrid batch bayesian optimization. In Proceedings of the 29th International Conference on Machine Learning, ICML, volume 2, pages 1215–1222, 2012. ISBN 9781450312851.
  • Calandriello et al. (2019) D. Calandriello, L. Carratino, A. Lazaric, M. Valko, and L. Rosasco. Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret. Proceedings of Machine Learning Research, 99:1–25, 2019.
  • Camilleri et al. (2021) R. Camilleri, J. Katz-Samuels, and K. Jamieson. High-Dimensional Experimental Design and Kernel Bandits. In Proceedings of the 38th International Conference on Machine Learning, 2021.
  • Chawla et al. (2020) R. Chawla, A. Sankararaman, A. Ganesh, and S. Shakkottai. The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits, 2020.
  • Chowdhury and Gopalan (2017) S. R. Chowdhury and A. Gopalan. On kernelized multi-armed bandits. In Proceedings of the 34th International Conference on Machine Learning, ICML, volume 2, pages 1397–1422, 2017.
  • Dai et al. (2020) Z. Dai, B. K. H. Low, and P. Jaillet. Federated Bayesian optimization via Thompson sampling. In Proceedings of the 34th Annual Conference on Neural Information Processing Systems, volume 2020-Decem, 2020.
  • Du et al. (2023) Y. Du, W. Chen, Y. Kuroki, and L. Huang. Collaborative Pure Exploration in Kernel Bandit. In Proceedings of the 11th International Conference on Learning Representations, ICLR, 2023.
  • Dubey and Pentland (2020) A. Dubey and A. Pentland. Kernel methods for cooperative multi-agent contextual bandits. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, pages 2720–2730, 2020. ISBN 9781713821120.
  • Ghosh et al. (2021) A. Ghosh, A. Sankararaman, and K. Ramchandran. Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits, 2021. URL http://arxiv.longhoe.net/abs/2106.08902.
  • Hanna et al. (2022) O. Hanna, L. Yang, and C. Fragouli. Learning from distributed users in contextual linear bandits without sharing the context. In Proceedings of the 36th Annual Conference on Neural Information Processing Systems, volume 35, pages 11049–11062, 2022.
  • Huang et al. (2021) R. Huang, W. Wu, J. Yang, and C. Shen. Federated Linear Contextual Bandits. In Advances in Neural Information Processing Systems, volume 32, pages 27057–27068, 2021. ISBN 9781713845393.
  • Jacot et al. (2018) A. Jacot, F. Gabriel, and C. Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems, pages 8571–8580, 2018.
  • Korda et al. (2016) N. Korda, B. Szorenyi, and S. Li. Distributed clustering of linear bandits in peer to peer networks. In 33rd International Conference on Machine Learning, ICML 2016, volume 3, pages 1966–1980, 2016. ISBN 9781510829008.
  • Landgren et al. (2017) P. Landgren, V. Srivastava, and N. E. Leonard. On distributed cooperative decision-making in multiarmed bandits. In Proceedings of the European Control Conference, ECC, pages 243–248, 2017. ISBN 9781509025916.
  • Li et al. (2022) C. Li, H. Wang, M. Wang, and H. Wang. Communication Efficient Distributed Learning for Kernelized Contextual Bandits. In Proceedings of the 36th Annual Conference on Neural Information Processing Systems, 2022.
  • Li and Scarlett (2022) Z. Li and J. Scarlett. Gaussian process bandit optimization with few batches. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS, 2022.
  • Mitra et al. (2021) A. Mitra, H. Hassani, and G. Pappas. Exploiting Heterogeneity in Robust Federated Best-Arm Identification, 2021. URL http://arxiv.longhoe.net/abs/2109.05700.
  • Mitra et al. (2022) A. Mitra, H. Hassani, and G. J. Pappas. Linear Stochastic Bandits over a Bit-Constrained Channel, 2022.
  • Picheny et al. (2013) V. Picheny, T. Wagner, and D. Ginsbourger. A benchmark of kriging-based infill criteria for noisy optimization. Structural and Multidisciplinary Optimization, 48(3):607–626, 2013. ISSN 1615147X.
  • Salgia and Zhao (2023) S. Salgia and Q. Zhao. Distributed linear bandits under communication constraints. In Proceedings of the 40th International Conference on Machine Learning, ICML, pages 29845–29875. PMLR, 2023.
  • Salgia et al. (2021) S. Salgia, S. Vakili, and Q. Zhao. A domain-shrinking based Bayesian optimization algorithm with order-optimal regret performance. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems, volume 34, 2021.
  • Salgia et al. (2023a) S. Salgia, S. Vakili, and Q. Zhao. Random exploration in bayesian optimization: Order-optimal regret and computational efficiency, 2023a.
  • Salgia et al. (2023b) S. Salgia, S. Vakili, and Q. Zhao. Collaborative learning in kernel-based bandits for distributed users. IEEE Transactions on Signal Processing, 71:3956–3967, 2023b.
  • Sankararaman et al. (2019) A. Sankararaman, A. Ganesh, and S. Shakkottai. Social learning in multi agent multi armed bandits. Proc. ACM Meas. Anal. Comput. Syst., 3(3), dec 2019.
  • Scarlett et al. (2017) J. Scarlett, I. Bogunovic, and V. Cevher. Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization. In Conference on Learning Theory, volume 65, pages 1–20, 2017.
  • Shahrampour et al. (2017) S. Shahrampour, A. Rakhlin, and A. Jadbabaie. Multi-armed bandits in multi-agent networks. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2786–2790, 2017.
  • Shi and Shen (2021) C. Shi and C. Shen. Federated Multi-Armed Bandits. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, pages 9603–9611, 2021.
  • Shi et al. (2021) C. Shi, C. Shen, and J. Yang. Federated Multi-armed Bandits with Personalization, 2021. URL http://arxiv.longhoe.net/abs/2102.13101.
  • Srinivas et al. (2010) N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: no regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, ICML, pages 1015–1022, 2010.
  • Vakili et al. (2021a) S. Vakili, N. Bouziani, S. Jalali, A. Bernacchia, and D.-s. Shiu. Optimal order simple regret for Gaussian process bandits. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems, 2021a.
  • Vakili et al. (2021b) S. Vakili, K. Khezeli, and V. Picheny. On information gain and regret bounds in Gaussian process bandits. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, AISTATS, 2021b.
  • Vakili et al. (2022) S. Vakili, J. Scarlett, D.-s. Shiu, and A. Bernacchia. Improved convergence rates for sparse approximation methods in kernel-based learning. In Proceedings of the 39th International Conference on Machine Learning, ICML, pages 21960–21983. PMLR, 2022.
  • Valko et al. (2013) M. Valko, N. Korda, R. Munos, I. Flaounas, and N. Cristianini. Finite-time analysis of kernelised contextual bandits. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, UAI, pages 654–663, 2013.
  • Wang et al. (2019) Y. Wang, J. Hu, X. Chen, and L. Wang. Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication. In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.
  • Wild et al. (2021) V. Wild, M. Kanagawa, and D. Sejdinovic. Connections and equivalences between the nyström method and sparse variational gaussian processes, 2021.
  • Zhu et al. (2021) Z. Zhu, J. Zhu, J. Liu, and Y. Liu. Federated Bandit: A Gossiping Approach. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(1):1–29, 2021.

Appendix A

A.1 Proof of Theorem 4.1

Proof.

In this section, we provide a detailed proof of Theorem 4.1. For the regret bound, we first bound the regret incurred by DUETS in each epoch $j$ and then sum it across different epochs to obtain a bound on the overall cumulative regret. We first prove the theorem assuming the results from Lemmas 4.2, 4.3 and 4.4 and then separately prove the lemmas.

Consider any epoch $j \geq 1$ and let $R^{(j)}$ denote the regret incurred by DUETS in this epoch. Since the agents take purely exploratory actions by uniformly sampling points from the current set, we have the following crude bound: $R^{(j)} \leq \Delta_j \cdot NT_j \cdot M_f$, where $\Delta_j := \sup_{x \in \mathcal{X}_j} (f(x^*) - f(x))$. The term $NT_j \cdot M_f$ corresponds to the number of points sampled during the epoch, as we sample each connected component of $\mathcal{X}_j$, of which there are at most $M_f$, $NT_j$ times. For $j = 1$, we use the trivial bound,

\begin{align*}
\Delta_1 = \sup_{x \in \mathcal{X}} (f(x^*) - f(x)) \leq 2 \sup_{x \in \mathcal{X}} f(x) \leq 2B,
\end{align*}

which gives us $R^{(1)} \leq 2B \cdot NT_1 \cdot M_f$. On invoking Lemma 4.2 for $j > 1$ we obtain,

\begin{align*}
R^{(j)} &\leq \Delta_j \cdot NT_j \cdot M_f \\
&\leq NT_j \cdot M_f \cdot \left(8\beta(\delta') \cdot \left(\sup_{x \in \mathcal{X}_{j-1}} \sigma_{j-1}(x)\right) + \frac{4B}{T}\right),
\end{align*}

where $\delta' = \dfrac{\delta}{2(\log\log NT + 4)|\mathcal{U}_T|}$. Using Lemma 4.3, we can further bound this expression as

\begin{align*}
R^{(j)} &\leq \Delta_j \cdot NT_j \cdot M_f \\
&\leq NT_j \cdot M_f \cdot \left(8\beta(\delta') \cdot C_{f,\mathcal{X}} \cdot \sqrt{\frac{\gamma_{NT_{j-1}}}{NT_{j-1}}} + \frac{4B}{T}\right) \\
&\leq M_f \cdot \left(8 C_{f,\mathcal{X}}^{1/2} \cdot \beta(\delta') \cdot \sqrt{NT\gamma_{NT_{j-1}}} + \frac{4BNT_j}{T}\right) \\
&\leq M_f \cdot \left(8 C_{f,\mathcal{X}}^{1/2} \cdot \beta(\delta') \cdot \sqrt{NT\gamma_{NT}} + \frac{4BNT_j}{T}\right).
\end{align*}

In the third line, we used the inequality $\frac{T_j}{\sqrt{T_{j-1}}} \leq \sqrt{T}$, which follows from the definition of $T_j$. In the last line, we used the fact that $\gamma_m$ is an increasing function of $m$. Thus, if $J$ denotes an upper bound on the number of epochs, we can write:

\begin{align*}
\sum_{j=1}^{J} R^{(j)} &\leq 2BM_f \cdot NT_1 + \sum_{j=2}^{J} M_f \cdot \left(8 C_{f,\mathcal{X}}^{1/2} \cdot \beta(\delta') \cdot \sqrt{NT\gamma_{NT}} + \frac{4BNT_j}{T}\right) \\
&\leq 2BM_f \cdot NT_1 + J \cdot \left(8 C_{f,\mathcal{X}}' \cdot \beta(\delta') \cdot \sqrt{NT\gamma_{NT}}\right) + \frac{4BNM_f}{T} \sum_{j=1}^{J} T_j \\
&\leq 2BM_f \cdot NT_1 + J \cdot \left(8 C_{f,\mathcal{X}}' \cdot \beta(\delta') \cdot \sqrt{NT\gamma_{NT}}\right) + 4BNM_f. \tag{10}
\end{align*}
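The inequality $T_j/\sqrt{T_{j-1}} \leq \sqrt{T}$ used in the per-epoch bound can be checked numerically. The sketch below assumes the real-valued epoch rule $T_j = \sqrt{T \cdot T_{j-1}}$, which is consistent with the relation $T_2 \geq T^{1/2} T_1^{1/2}$ used in the proof of Lemma 4.4 but is not a restatement of the exact schedule in DUETS. The helper name `epoch_lengths` and the values of `T` and `T1` are hypothetical.

```python
import math


def epoch_lengths(T, T1, max_epochs=64):
    """Real-valued epoch lengths under the ASSUMED rule T_j = sqrt(T * T_{j-1}).

    Epochs are generated until one of them reaches T / 4.
    """
    lengths = [float(T1)]
    while lengths[-1] < T / 4 and len(lengths) < max_epochs:
        lengths.append(math.sqrt(T * lengths[-1]))
    return lengths


T, T1 = 10_000, 10  # hypothetical horizon and first-epoch length
lengths = epoch_lengths(T, T1)

for prev, cur in zip(lengths, lengths[1:]):
    # Under the assumed rule the ratio equals sqrt(T) up to rounding error.
    assert cur / math.sqrt(prev) <= math.sqrt(T) * (1 + 1e-9)
```

Under this rule the inequality holds with equality, so any schedule whose epochs grow no faster than this one satisfies it as well.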

We next optimize the length of the first epoch $T_1$ in order to achieve order optimal regret. DUETS achieves order optimal regret for $N \leq \max(T, \gamma_{NT})$.

If $N < T$ we can choose $T_1 = \sqrt{\frac{T}{N}} + \overline{M}(\delta')$ where $\delta' = \frac{\delta}{2(\log\log NT + 4)}$. The left-hand side of equation (10) can now be bounded as $\widetilde{\mathcal{O}}\left(\sqrt{NT\gamma_{NT}}\,\beta(\delta')\right) \equiv \widetilde{\mathcal{O}}\left(\sqrt{NT\gamma_{NT}}\left(\log\frac{T}{\delta}\right)\right)$.
If $N \leq \gamma_{NT}$ we can fix $T_1 = \sqrt{T} + \overline{M}(\delta')$. We have $NT_1 \leq \widetilde{\mathcal{O}}\left(\sqrt{NT\gamma_{NT}}\right)$ and the left-hand side is once again $\widetilde{\mathcal{O}}\left(\sqrt{NT\gamma_{NT}}\left(\log\frac{T}{\delta}\right)\right)$. ∎
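The effect of the two choices of $T_1$ on the first term $2BM_f \cdot NT_1$ of equation (10) can be spelled out as follows. This is a worked check rather than part of the original argument; it assumes $\gamma_{NT} \geq 1$ and treats the additive $N\overline{M}(\delta')$ contribution as lower order, neither of which is established in this section.

\begin{align*}
N < T: \quad & N\sqrt{\frac{T}{N}} = \sqrt{NT} \leq \sqrt{NT\gamma_{NT}}, \\
N \leq \gamma_{NT}: \quad & N\sqrt{T} = \sqrt{N} \cdot \sqrt{NT} \leq \sqrt{\gamma_{NT}} \cdot \sqrt{NT} = \sqrt{NT\gamma_{NT}}.
\end{align*}

In both cases the leading part of $NT_1$ is at most $\sqrt{NT\gamma_{NT}}$, matching the order of the second term in equation (10).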

Note that by Lemma 4.4, $J$ is upper bounded by $\log(\log N \log T) + 4$ and is thus $\widetilde{\mathcal{O}}(1)$.

Before moving on to the proofs of Lemmas 4.2 and 4.4, we state two auxiliary lemmas that will be useful for our analysis.

Definition A.1.

Let $\mathcal{D} = \{x_1, x_2, \dots, x_m\} \subset \mathcal{X}$ be a collection of $m$ points and $\mathcal{S}$ be any subset of $\mathcal{D}$. Let $\sigma_{\mathcal{D}}^2(\cdot)$ denote the posterior variance corresponding to the points in $\mathcal{D}$ and $\tilde{\sigma}_{\mathcal{S}}^2(\cdot)$ denote the approximate posterior variance computed based on the points in $\mathcal{S}$. We call $\mathcal{S}$ an $\varepsilon$-accurate inducing set if the following relations hold for all $x \in \mathcal{X}$:

\begin{align*}
\frac{1-\varepsilon}{1+\varepsilon} \cdot \tilde{\sigma}_{\mathcal{S}}^2(x) \leq \sigma_{\mathcal{D}}^2(x) \leq \frac{1+\varepsilon}{1-\varepsilon} \cdot \tilde{\sigma}_{\mathcal{S}}^2(x).
\end{align*}
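As an illustration of Definition A.1, the two-sided relation can be tested on a finite set of evaluation points. The sketch below is only a spot check under that simplification: the definition requires the relation for all $x \in \mathcal{X}$, whereas the hypothetical helper `is_eps_accurate` only inspects the variances it is given, and the numerical values are made up for the example.

```python
def is_eps_accurate(var_full, var_approx, eps):
    """Check the relation of Definition A.1 at finitely many points.

    var_full[i]   : posterior variance sigma_D^2(x_i) from all points in D
    var_approx[i] : approximate posterior variance from the inducing set S
    """
    lo = (1 - eps) / (1 + eps)
    hi = (1 + eps) / (1 - eps)
    return all(lo * a <= f <= hi * a for f, a in zip(var_full, var_approx))


# Made-up variances at three evaluation points.
var_full = [0.10, 0.20, 0.05]
var_approx = [0.12, 0.15, 0.14]

loose = is_eps_accurate(var_full, var_approx, eps=0.5)  # factors 1/3 and 3
tight = is_eps_accurate(var_full, var_approx, eps=0.1)  # factors 9/11 and 11/9
```

For $\varepsilon = 0.5$ the two factors are $1/3$ and $3$, which is the source of the constant $3$ that appears later when the approximate variance is compared with the exact one.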
Lemma A.2 (Adapted from Calandriello et al. (2019)).

Let $\mathcal{D} = \{x_1, x_2, \dots, x_m\} \subset \mathcal{X}$ be a collection of $m$ points and $\mathcal{S}$ be a random subset of $\mathcal{D}$ constructed by including each point with probability $p$, independent of other points. Then $\mathcal{S}$ is an $\varepsilon$-accurate inducing set with probability $1 - 4m\exp\left(-\frac{3p\varepsilon^2}{8\sigma_{\max}^2}\right)$, where $\sigma_{\max}^2 = \sup_{x \in \mathcal{X}} \sigma_{\mathcal{D}}^2(x)$.
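The construction in Lemma A.2 can be sketched in a few lines. The code below includes each point independently with probability $p$ and evaluates the failure term $4m\exp(-3p\varepsilon^2/(8\sigma_{\max}^2))$ from the lemma. Using a common seed so that two parties obtain the same subset is one possible way to realize shared randomness and is an assumption of this sketch, as are the function names and the numerical values.

```python
import math
import random


def sample_inducing_set(points, p, seed):
    """Include each point independently with probability p.

    The seed stands in for shared randomness (an assumption of this sketch):
    two parties using the same seed and the same points get the same subset.
    """
    rng = random.Random(seed)
    return [x for x in points if rng.random() < p]


def failure_bound(m, p, eps, sigma_max_sq):
    """Failure term 4 m exp(-3 p eps^2 / (8 sigma_max^2)) from Lemma A.2."""
    return 4 * m * math.exp(-3 * p * eps**2 / (8 * sigma_max_sq))


points = list(range(1000))  # hypothetical indices of the m = 1000 points
at_agent = sample_inducing_set(points, p=0.3, seed=7)
at_server = sample_inducing_set(points, p=0.3, seed=7)
assert at_agent == at_server  # same seed, same subset

# The bound is informative only when p / sigma_max^2 is large enough.
bound = failure_bound(m=1000, p=0.5, eps=0.5, sigma_max_sq=0.001)
```

The bound decreases as $p$ grows or as $\sigma_{\max}^2$ shrinks, so a smaller posterior variance allows a sparser inducing set at the same confidence level.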

Lemma A.3.

Let DUETS be run with a choice of $p_0 = 72\log(4NT/\delta')$. Then, for all epochs $j \geq 1$, the global inducing set $\mathcal{S}_j$ is $0.5$-accurate with probability $1 - \delta$.

Proof.

The statement is an immediate consequence of Lemma A.2 with the given choice of parameter $p_0$. ∎

We are now ready to prove Lemmas 4.2 and 4.4.

A.2 Proof of Lemma 4.2

Proof.

Consider any epoch $j \geq 2$ and let $x \in \mathcal{X}_j$. Let $\Delta(x) := f(x^*) - f(x)$. We will obtain a bound on $\Delta(x)$ for any general $x$ in order to establish the bound on $\Delta_j$. Using the discretization from Assumption 2.3 for $\mathcal{X}_j$, we obtain,

\begin{align*}
\Delta(x) &= f(x^*) - f(x) \\
&\leq f(x^*) - f([x^*]_{\mathcal{U}_T}) + f([x^*]_{\mathcal{U}_T}) - (f(x) - f([x]_{\mathcal{U}_T})) - f([x]_{\mathcal{U}_T}) \\
&\leq f([x^*]_{\mathcal{U}_T}) - f([x]_{\mathcal{U}_T}) + \frac{2B}{T}.
\end{align*}

Using the result from Salgia et al. (2023b), we obtain the following high probability bound that holds with probability $1 - \delta$:

\begin{align*}
\Delta(x) &\leq f([x^*]_{\mathcal{U}_T}) - f([x]_{\mathcal{U}_T}) + \frac{2B}{T} \\
&\leq \tilde{\mu}_j([x^*]_{\mathcal{U}_T}) + \beta(\delta')\tilde{\sigma}_j([x^*]_{\mathcal{U}_T}) - \tilde{\mu}_j([x]_{\mathcal{U}_T}) + \beta(\delta')\tilde{\sigma}_j([x]_{\mathcal{U}_T}) + \frac{2B}{T} \\
&\leq \tilde{\mu}_j(x^*) - \tilde{\mu}_j(x) + \beta(\delta')\tilde{\sigma}_j([x^*]_{\mathcal{U}_T}) + \beta(\delta')\tilde{\sigma}_j([x]_{\mathcal{U}_T}) + \frac{4B}{T},
\end{align*}

where we again used Assumption 2.3 in the last step. We claim that $x^* \in \mathcal{X}_{j-1}$ for all $j \geq 2$. Assuming this claim is true, we can bound the above expression as

\begin{align*}
\Delta(x) &\leq \sup_{x' \in \mathcal{X}_{j-1}} \tilde{\mu}_j(x') - \tilde{\mu}_j(x) + \beta(\delta')\tilde{\sigma}_j([x^*]_{\mathcal{U}_T}) + \beta(\delta')\tilde{\sigma}_j([x]_{\mathcal{U}_T}) + \frac{4B}{T} \\
&\leq 2\beta(\delta')\sigma_{j,\max} + \beta(\delta')\tilde{\sigma}_j([x^*]_{\mathcal{U}_T}) + \beta(\delta')\tilde{\sigma}_j([x]_{\mathcal{U}_T}) + \frac{4B}{T},
\end{align*}

where we used the update condition (Eqn. (9)) in the second step. Since $\mathcal{S}_j$ is $0.5$-accurate (Lemma A.3), we have $\tilde{\sigma}_j^2(x) \leq 3\sigma_j^2(x) \leq 3\sigma_{j,\max}^2$. On plugging this back into the above equation, we obtain,

\begin{align*}
\Delta(x) \leq 8\beta(\delta')\sigma_{j,\max} + \frac{4B}{T}.
\end{align*}
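The constant $8$ can be traced with a short calculation. This is a worked check of the arithmetic using only the $0.5$-accuracy relation, for which $\frac{1+\varepsilon}{1-\varepsilon} = \frac{1.5}{0.5} = 3$:

\begin{align*}
\tilde{\sigma}_j(x) &\leq \sqrt{3}\,\sigma_{j,\max}, \\
2\beta(\delta')\sigma_{j,\max} + \beta(\delta')\tilde{\sigma}_j([x^*]_{\mathcal{U}_T}) + \beta(\delta')\tilde{\sigma}_j([x]_{\mathcal{U}_T}) &\leq (2 + 2\sqrt{3})\,\beta(\delta')\sigma_{j,\max} \approx 5.46\,\beta(\delta')\sigma_{j,\max} \leq 8\beta(\delta')\sigma_{j,\max}.
\end{align*}

The stated constant is therefore a convenient upper bound on $2 + 2\sqrt{3}$ rather than a tight value.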

The statement of the lemma follows by $\Delta_j = \sup_{x \in \mathcal{X}_j} \Delta(x)$.

We prove our claim $x^* \in \mathcal{X}_j$ for all $j \geq 1$ using induction. Clearly, $x^* \in \mathcal{X}_1 = \mathcal{X}$, by definition. Assume $x^* \in \mathcal{X}_{j-1}$ for some $j \geq 2$. Fix an arbitrary $x \in \mathcal{X}_{j-1}$; from the confidence bound lemma we have:

\begin{align*}
\mu_{j-1}(x) - \mu_{j-1}(x^*) \leq (f(x) - f(x^*)) + \beta(\delta')(\sigma_j(x) + \sigma_j(x^*)) \leq 2\sigma_{j-1,\max}(x),
\end{align*}

where the second inequality follows as $f(x) \leq f(x^*)$. As the inequality holds $\forall x \in \mathcal{X}_{j-1}$ we must have:

\begin{align*}
\sup_{x \in \mathcal{X}_{j-1}} \mu_{j-1}(x) - \mu_{j-1}(x^*) \leq 2\sigma_{j-1,\max}(x)
\end{align*}

and thus indeed $x^* \in \mathcal{X}_j$.
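The induction argument can be illustrated on a finite candidate set. The sketch below assumes that the update keeps every point whose estimated mean is within $2\beta\sigma_{\max}$ of the best estimated mean, which is how the update condition is used in this proof; the exact form of Eqn. (9) is not restated here. The function name `eliminate` and all numbers are hypothetical.

```python
def eliminate(candidates, mu, beta, sigma_max):
    """Keep points whose estimated mean is within 2 * beta * sigma_max of the best.

    ASSUMED form of the update, inferred from how it is used in the proof.
    """
    best = max(mu[x] for x in candidates)
    return [x for x in candidates if best - mu[x] <= 2 * beta * sigma_max]


# Toy example: the true maximizer is x* = 1, but the estimate ranks x = 2 first.
f = {0: 0.10, 1: 0.90, 2: 0.85, 3: 0.30}   # unknown reward function
mu = {0: 0.15, 1: 0.82, 2: 0.93, 3: 0.25}  # estimates with |f - mu| <= beta * sigma_max
beta, sigma_max = 1.0, 0.1

survivors = eliminate([0, 1, 2, 3], mu, beta, sigma_max)
x_star = max(f, key=f.get)
assert x_star in survivors  # the true maximizer is not eliminated
```

As long as every estimate stays within $\beta\sigma_{\max}$ of the true value, the gap between the best estimate and the estimate at $x^*$ cannot exceed $2\beta\sigma_{\max}$, so $x^*$ survives each round.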

A.3 Proof of Lemma 4.4

We define $E(s) := \min\{j : T_j \geq T/4 \mid T_1 = s\}$. Note that $T_j$ is an increasing function of $j$. Since $T_{E(s)} \geq T/4$, we can conclude that $E(s) + 4$ is an upper bound on the number of epochs. Thus, we focus on bounding $E(s)$. We first show that $E(s)$ is a non-increasing function of $s$.

To that effect, we claim that for $j \geq 2$ the epoch lengths satisfy the relation $T_j \geq T^{1-2^{-j+1}} \cdot T_1^{2^{-j+1}}$. This relation follows immediately using induction. For the base case, note that $T_2 \geq T^{1/2} \cdot T_1^{1/2}$, by definition. Assume that the relation holds for $j-1$. Thus,

$$T_{j}\geq T^{1/2}\cdot T_{j-1}^{1/2}\geq T^{1/2}\cdot T^{1/2-2^{-(j-1)+1-1}}\cdot T_{1}^{2^{-(j-1)+1-1}}=T^{1-2^{-j+1}}\cdot T_{1}^{2^{-j+1}}.\tag{11}$$

Since the $T_{j}$'s are lower bounded by an increasing function of $T_{1}$, $E(s)$ is a non-increasing function of $s$. Since $T_{1}\geq\frac{T}{N}$, $E\left(\frac{T}{N}\right)$ is an upper bound on $E(T_{1})$ for all choices of $T_{1}$.

Let $j^{*}=\max\{\log(\log(T)),\log(\log(N))\}$. Using the above relation for $T_{j}$ from Eqn. (11) and the lower bound on $T_{1}$, we have

$$T_{j^{*}}\geq T\cdot N^{-2^{1-j^{*}}}=T\cdot\left(2^{-\frac{\log N}{2^{j^{*}}}}\right)^{2}\geq T\cdot 2^{-2}.$$

We can hence conclude that $T_{j^{*}}\geq T/4$, which implies that $E(T_{1})\leq j^{*}$ for all permissible choices of $T_{1}$. Consequently, the number of epochs is bounded by $\log(\log(\max\{N,T\}))+4$.
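The epoch-count argument above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the worst case in which the recursion holds with equality, $T_{j}=\sqrt{T\cdot T_{j-1}}$, starting from $T_{1}=T/N$, takes all logarithms to base 2, and rounds $j^{*}$ up to an integer. The function names are hypothetical.

```python
import math

def epoch_lengths(T, N, max_epochs=64):
    """Epoch lengths under the ASSUMED worst-case recursion T_j = sqrt(T * T_{j-1})."""
    lengths = [T / N]  # T_1 = T / N, the smallest first epoch allowed in the proof
    # Keep adding epochs until one of them reaches length T / 4.
    while lengths[-1] < T / 4 and len(lengths) < max_epochs:
        lengths.append(math.sqrt(T * lengths[-1]))
    return lengths

def closed_form(T, N, j):
    """Right-hand side of Eqn. (11) with T_1 = T / N, i.e. T * N^(-2^(1-j))."""
    return T * N ** (-(2.0 ** (1 - j)))

def first_epoch_reaching_quarter(T, N):
    """E(T/N): index of the first epoch with T_j >= T / 4."""
    return len(epoch_lengths(T, N))

def j_star(T, N):
    """j* = log log max{N, T} (base-2 logs assumed), rounded up to an integer."""
    return math.ceil(math.log2(math.log2(max(N, T))))

if __name__ == "__main__":
    T, N = 10**6, 10
    for j, length in enumerate(epoch_lengths(T, N), start=1):
        print(j, length, closed_form(T, N, j))
    print("E(T/N) =", first_epoch_reaching_quarter(T, N), " j* =", j_star(T, N))
```

Under these assumptions the recursion matches the closed form $T\cdot N^{-2^{1-j}}$ at every epoch, and the first epoch of length at least $T/4$ never comes later than the rounded-up $j^{*}$.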

A.4 Proof of Lemma 4.5

For all epochs $j\geq 1$, recall that the inducing set is constructed by including each point from $\mathcal{D}_{j}$ with probability $p_{j}$, independent of other points. Thus, $|\mathcal{S}_{j}|$ is a binomial random variable with parameters $|\mathcal{D}_{j}|=NT_{j}$ and $p_{j}$. Using the Chernoff bound for binomial random variables, we can conclude that

$$\Pr\left(|\mathcal{S}_{j}|>(1+\varepsilon)NT_{j}p_{j}\right)\leq\exp\left(-\frac{\varepsilon^{2}NT_{j}p_{j}}{2+\varepsilon}\right).$$

Invoking the bound with $\varepsilon=2+\log(1/\delta^{\prime})$, where $\delta^{\prime}=\delta/(\log\log(NT)+4)$, shows that the following relation holds with probability at least $1-\delta^{\prime}$:

$$\begin{aligned}
|\mathcal{S}_{j}| &\leq(3+\log(1/\delta^{\prime}))\cdot NT_{j}p_{j}\\
&\leq(3+\log(1/\delta^{\prime}))\cdot NT_{j}\cdot p_{0}\sigma_{j,\max}^{2}\\
&\leq(3+\log(1/\delta^{\prime}))\cdot NT_{j}p_{0}\cdot C_{f,\mathcal{X}}\cdot\frac{\gamma_{NT_{j}}}{NT_{j}}\\
&\leq(3+\log(1/\delta^{\prime}))\,p_{0}\,C_{f,\mathcal{X}}\,\gamma_{NT},
\end{aligned}$$

where we used Lemma 4.3 in the third step and the monotonicity of $\gamma_{m}$ in the last step. On taking a union bound over all epochs and using the bound on the number of epochs from Lemma 4.4, we conclude that for all epochs $j$, $|\mathcal{S}_{j}|=\tilde{\mathcal{O}}(\gamma_{NT})$ with probability at least $1-\delta$.
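The Chernoff step in this proof can be sanity-checked numerically. The sketch below is illustrative only and uses hypothetical function names: it computes the exact upper tail of a binomial random variable and compares it with the bound $\exp(-\varepsilon^{2}\mu/(2+\varepsilon))$, where $\mu=NT_{j}p_{j}$ is the mean. It also checks that the choice $\varepsilon=2+\log(1/\delta^{\prime})$ pushes the bound below $\delta^{\prime}$, under the assumptions (not stated in the proof) that the logarithm is natural and that $\mu\geq 1$.

```python
import math

def binomial_upper_tail(n, p, threshold):
    """Exact Pr(X > threshold) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(math.floor(threshold) + 1, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), via lgamma to avoid overflow
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

def chernoff_bound(mean, eps):
    """Multiplicative Chernoff bound exp(-eps^2 * mean / (2 + eps))."""
    return math.exp(-eps ** 2 * mean / (2 + eps))

def eps_for_confidence(delta_prime):
    """The choice eps = 2 + log(1/delta') from the proof (natural log ASSUMED)."""
    return 2 + math.log(1 / delta_prime)

if __name__ == "__main__":
    n, p = 500, 0.02  # illustrative values standing in for N*T_j and p_j
    mean = n * p
    for eps in (0.5, 1.0, 2.0):
        tail = binomial_upper_tail(n, p, (1 + eps) * mean)
        print(eps, tail, chernoff_bound(mean, eps))
```

With $\varepsilon=2+L$ and $L=\log(1/\delta^{\prime})$, the exponent satisfies $\varepsilon^{2}/(2+\varepsilon)=(2+L)^{2}/(4+L)\geq L$, which is why the bound drops below $\delta^{\prime}$ whenever $\mu\geq 1$.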