Online Stackelberg Optimization via Nonlinear Control

William Brown Columbia University & Morgan Stanley MLR. Email: [email protected].    Christos Papadimitriou Columbia University. Email: [email protected]    Tim Roughgarden Columbia University & a16z crypto. Email: [email protected]
(June 27, 2024)
Abstract

In repeated interaction problems with adaptive agents, our objective often requires anticipating and optimizing over the space of possible agent responses. We show that many problems of this form can be cast as instances of online (nonlinear) control which satisfy local controllability, with convex losses over a bounded state space which encodes agent behavior, and we introduce a unified algorithmic framework for tractable regret minimization in such cases. When the instance dynamics are known but otherwise arbitrary, we obtain oracle-efficient $O(\sqrt{T})$ regret by reduction to online convex optimization, which can be made computationally efficient if dynamics are locally action-linear. In the presence of adversarial disturbances to the state, we give tight bounds in terms of either the cumulative or per-round disturbance magnitude (for strongly or weakly locally controllable dynamics, respectively). Additionally, we give sublinear regret results for the cases of unknown locally action-linear dynamics as well as for the bandit feedback setting. Finally, we demonstrate applications of our framework to well-studied problems including performative prediction, recommendations for adaptive agents, adaptive pricing of real-valued goods, and repeated gameplay against no-regret learners, directly yielding extensions beyond prior results in each case.

1 Introduction

Machine learning problems involving strategic or adaptive agents are commonly framed as Stackelberg games, wherein the leader aims to commit to an optimal strategy in anticipation of the follower’s best response. This approach has been effectively applied to challenges ranging from performative feature manipulation (Hardt et al., 2015; Dong et al., 2018; Perdomo et al., 2020; Jagadeesan et al., 2022b) and optimal pricing (Roth et al., 2015; Daskalakis and Syrgkanis, 2015; Nedelec et al., 2020) to resource allocation in security games (Blum et al., 2014; Balcan et al., 2015; Alcantara-Jiménez and Clempner, 2020) and learning in tabular games (Letchford et al., 2009; Peng et al., 2019; Lauffer et al., 2022; Collina et al., 2023), often with a regret minimization objective. Additionally, several of these settings have been independently extended to account for agents that may update their strategies gradually over time rather than optimally responding in each round (Zrnic et al., 2021a; Brown et al., 2022; Braverman et al., 2017; Deng et al., 2019; Brown et al., 2023). Despite their conceptual similarities, these problems have largely been approached as distinct areas of study, each with their own growing body of techniques. Our aim in this work is to offer a unifying perspective and algorithmic approach for problems of this form, through the lens of online control.

For the broad family of online “Stackelberg-style” optimization problems, the language of control is quite natural to adopt: we are navigating a dynamical system where states corresponding to agent strategies evolve as a function of our own actions, and where objectives which consider best-response stability can be expressed in terms of the stationary behavior of this system. Our results consider a general class of online control instances for representing such problems, which we introduce in Section 2, and in Section 3 we give a sequence of no-regret algorithms for these instances satisfying a range of robustness properties. In Section 4, we show that several online optimization problems involving adaptive agents, including variants of online performative prediction (as in Kumar et al. (2022)), online recommendations (as in Agarwal and Brown (2023)), adaptive pricing (as in Roth et al. (2015)), and learning in time-varying games (as in Anagnostides et al. (2023)) can be embedded in our framework and solved by our algorithms.

While there has been a great deal of recent progress in online linear control, yielding algorithms which can optimize over stabilizing linear policies even with general convex costs, adversarial disturbances, and unknown dynamics (Agarwal et al., 2019a; Simchowitz et al., 2020; Cassel et al., 2022; Minasyan et al., 2022), the required assumptions and regret benchmarks for these algorithms do not always type-check with the settings we are interested in. For the examples we consider, we will often wish to allow for nonlinear dynamics (e.g. encoding an agent’s utility function) and explicitly bounded spaces (e.g. via projection into the simplex), and we will seek to compete with regret benchmarks which correspond to stable responses by the agent. Unfortunately, as we show in Proposition 2, the latter goal is incompatible with linear policies even under linear dynamics and in the absence of any disturbances: the performance of every linear policy can be $\Omega(T)$ worse than the best policy in the class of affine “state-targeting” policies.

In contrast, the orthogonal set of assumptions we identify enables tractable regret minimization even for nonlinear control problems and comports with the requirements of Stackelberg optimization across a wide range of settings, including the ability to compete with state-targeting policies. For convex and compact action and state spaces $\mathcal{X}$ and $\mathcal{Y}$, our first key assumption is that the dynamics $D(x,y):\mathcal{X}\times\mathcal{Y}\rightarrow\mathcal{Y}$ satisfy a notion of local controllability. While local controllability is well-studied for continuous-time and asymptotic control (Aoki, 1974; Kuhn and Wohltmann, 1989; Barbero-Liñán and Jakubczyk, 2013; Boscain et al., 2021), we are unaware of any prior applications to finite-time online optimization, and we adapt existing definitions to be appropriate for this setting. We say that $D(x,y)$ is strongly locally controllable if every state in a fixed-radius ball around $y$ is reachable in a single round by an appropriate choice of $x$, and that $D(x,y)$ is weakly locally controllable if the reachable radius around $y$ is allowed to vanish near the boundary of $\mathcal{Y}$. We also assume that our loss $f_t$ in each round is determined by (or well-approximated by) an adversarially-chosen convex function depending only on the state $y_t$.

When these conditions hold, we show in Theorem 1 that this is sufficient to obtain $O(\sqrt{T})$ regret with respect to the loss of the best fixed state, provided that dynamics are known and we have offline access to an oracle for non-convex optimization; the oracle call can be removed if dynamics are locally action-linear, i.e. given by (or locally well-approximated by) a function linear in $x$ at each fixed $y$. If adversarial disturbances to the dynamics are present, our approach can be extended for both weakly (Theorem 2) and strongly (Theorem 3) locally controllable dynamics with additional regret scaling linearly in total disturbance magnitude, provided that each round’s disturbance cannot be too large in the case of weak local controllability; we give lower bounds showing that each dependence on disturbance magnitude is tight. The aforementioned results all extend to the case where the dynamics (absent disturbances) are given by a known but time-dependent function $D_t(x,y)$. If dynamics are unknown but time-invariant, and locally action-linear with appropriate regularity parameters, we obtain sublinear regret provided that a “near-stabilizing” action is known at $t=1$. We additionally extend our approach to the bandit feedback setting, where we obtain $O(T^{3/4})$ regret. In Section 4 we show that each of the following, with appropriate assumptions, can be cast as a locally controllable instance with state-only convex surrogate losses:

  • Performative prediction: Minimize prediction loss $\mathbb{E}_{z\sim p_t} f_t(x_t,z)$ for a classifier $x_t$, where the distribution $p_t$ in each round is updated according to the prior classifier and distribution.

  • Adaptive recommendations: Maximize the reward $f_t(i_t)$ when showing menus $K_t\subseteq[n]$ of size $k\ll n$ to an agent, whose choice $i_t\sim p(K_t,v_t)$ in each round depends on preferences which are influenced by choices in prior rounds (encoded in the “memory vector” $v_t$).

  • Adaptive pricing: Maximize profit $\langle p_t,x_t\rangle - c_t(x_t)$ for selling bundles of goods $x_t$ to an agent at prices $p_t$ and with costs $c_t$, where the agent’s purchased bundle $x_t$ is a function of their utility function, consumption rate, and existing reserves.

  • Repeated gameplay: Maximize the reward $x_t^{\top}A_t y_t$ obtained from playing a sequence of time-varying games $(A_t,B_t)$ against a no-regret learning agent.

In each case, application of our algorithms from Section 3 yields results which extend beyond the applicability regimes of prior work, such as by enabling relaxation of previous assumptions or a novel extension to adversarial or dynamic problem variants.

1.1 Related Work

Online control.

Much of the recent progress in online control (Agarwal et al., 2019a, b; Cassel et al., 2022; Minasyan et al., 2022) considers linear systems with general convex losses, benchmarking against a class of (“strongly stable”) fast-mixing linear policies introduced for linear-quadratic control (Cohen et al., 2018) by leveraging the framework of “OCO with memory” (Anava et al., 2014). Results have also been shown for nonlinear policy classes via neural networks (Chen et al., 2022), and for nonlinear dynamics with oracles in episodic settings (Kakade et al., 2020), via approximation with random Fourier features (Lale et al., 2021; Luo et al., 2022), via adaptive regret for time-varying linear systems (Gradu et al., 2022; Minasyan et al., 2022), and via dynamic regret over actions in terms of disturbance “attenuation” (Muthirayan and Khargonekar, 2022). For a further overview of online control and its historical context, see Hazan and Singh (2022). In contrast to the bulk of prior work in which states and actions are bounded implicitly via policy stability notions, we consider state and action spaces which are bounded explicitly, as enabled by nonlinearity in dynamics (e.g. via projection, or range decay of dynamics near the boundary). These works also view disturbances as intrinsic to the system, and account for their influence directly in regret benchmarks (the “optimal policy” will face the same sequence of disturbances in hindsight, regardless of state). Within the context of Stackelberg optimization where a fixed protocol largely determines an agent’s strategy updates, we view the role of disturbances as more akin to adversarial corruptions as considered in reinforcement learning (Lykouris et al., 2021; Zhang et al., 2021); while we incur linear dependence, our regret benchmarks are agnostic to alternate counterfactual disturbance sequences.

Strategizing against learners.

Initially formulated within the context of repeated auctions (Braverman et al., 2017), a recent line of work has considered the problem of optimizing long-run rewards in a repeated game against a no-regret learner across a range of tabular and Bayesian settings (Deng et al., 2019; Mansour et al., 2022; Brown et al., 2023; Zhang et al., 2023). While bounds on attainable reward have been known in terms of the Price of Anarchy (Blum et al., 2008; Hartline et al., 2015b), this sequence of results has highlighted important connections with Stackelberg equilibria: the Stackelberg value of the game is attainable on average against any no-regret learner, and it is the maximum attainable value against many common no-regret algorithms (such as no-swap learners, as shown by Deng et al. (2019)). This theme has emerged in other simultaneous learning settings as well; notably, Zrnic et al. (2021b) show that long-run outcomes in strategic classification are shaped by relative learning rates between parties, which can designate either as the Stackelberg leader.

Nested convex optimization.

The technique of identifying convex structure nested inside a more general problem has been applied broadly across a range of online optimization settings (Neu and Olkhovskaya, 2021; Shen et al., 2023; Flokas et al., 2019). For repeated interaction problems involving an agent with unknown utility, such as optimal pricing, Roth et al. (2015) identify utility conditions under which the non-convex objective over prices becomes convex in the space of agent actions, and where explorability properties resembling local controllability hold, which enables convex optimization by locally learning agent preferences; this “revealed preferences” approach has also been applied to strategic classification (Dong et al., 2018). In recent work concerning recommendations for agents with history-dependent preferences (Agarwal and Brown, 2022, 2023), properties related to local controllability are leveraged to enable tractable optimization as well. We consider each of these settings as applications in Section 4.

2 Model and Preliminaries

Let $\mathcal{X}$ and $\mathcal{Y}$ be convex and compact subsets of Euclidean space, respectively denoting the action and state spaces, where we assume $\dim(\mathcal{X})\geq\dim(\mathcal{Y})$. Further, we assume that $\mathcal{Y}$ contains a ball of radius $r$ around the origin $\mathbf{0}$, and is contained in a ball of radius $R$ around the origin.

An instance of our control problem consists of choosing a sequence of actions $\{x_t\in\mathcal{X}\}$ over $T$ rounds, which will yield a sequence of states $\{y_t\in\mathcal{Y}\}$, and we will incur losses determined by adversarially chosen functions $\{f_t\}$. Let the initial state be $y_0=\mathbf{0}$. In the basic version of our problem, upon choosing each $x_t$ for rounds $t\in[T]$, we observe the state update to

$$y_t = D(x_t, y_{t-1}),$$

where $D:\mathcal{X}\times\mathcal{Y}\rightarrow\mathcal{Y}$ is an arbitrary continuous function which we refer to as the dynamics of our problem. We sometimes allow disturbances to the dynamics, where $y_t=D(x_t,y_{t-1})+w_{t+1}$ for $\{w_t\}$ chosen adversarially. In some cases we allow time-varying dynamics $D:\mathcal{X}\times\mathcal{Y}\times[T]\rightarrow\mathcal{Y}$, where the dynamics in each round are denoted by $D_t(x_t,y_{t-1})$.
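As a concrete illustration of the update $y_t = D(x_t, y_{t-1})$, the following minimal Python sketch rolls out a state trajectory from $y_0=\mathbf{0}$ in the disturbance-free case. The particular dynamics $D(x,y)=\Pi_{\mathcal{Y}}(y+x)$, the choice of $\mathcal{Y}$ as an origin-centered Euclidean ball, and all function names are assumptions made purely for illustration; they are not part of the model above, which allows arbitrary continuous $D$.

```python
import math


def project_ball(v, radius):
    """Euclidean projection of v onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius:
        return list(v)
    return [c * radius / norm for c in v]


def dynamics(x, y, radius=1.0):
    """Hypothetical dynamics D(x, y) = Pi_Y(y + x), with Y a ball of the given radius."""
    return project_ball([yi + xi for yi, xi in zip(y, x)], radius)


def rollout(actions, dim, radius=1.0):
    """Return the states y_1, ..., y_T generated from y_0 = 0 by y_t = D(x_t, y_{t-1})."""
    y = [0.0] * dim
    states = []
    for x in actions:
        y = dynamics(x, y, radius)
        states.append(y)
    return states
```

Under these assumed dynamics, repeatedly playing the same action moves the state in a straight line until the projection holds it at the boundary of $\mathcal{Y}$.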

Here and in Section 3, we assume that our loss in each round is given by $f_t(y_t)$, where each $f_t$ is an $L$-Lipschitz convex function revealed after playing $x_t$; we relax these assumptions for some of our applications in Section 4, e.g. to allow dependence on $x_t$ as well. We will generally measure performance with respect to the best fixed state, and the regret for an algorithm $\mathcal{A}$ yielding $\{y_t\}$ is

$$\operatorname{Reg}_T(\mathcal{A}) = \sum_{t=1}^{T} f_t(y_t) - \min_{y\in\mathcal{Y}} \sum_{t=1}^{T} f_t(y).$$
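To make the benchmark concrete, the sketch below evaluates this regret for a one-dimensional toy instance. It assumes $\mathcal{Y}=[-R,R]$ and losses of the hypothetical form $f_t(y)=|y-c_t|$, which are convex and 1-Lipschitz; neither choice comes from the text. Because the cumulative loss is then convex and piecewise linear, its minimum over $\mathcal{Y}$ is attained at an endpoint or at some (clipped) center $c_t$, so the best fixed state can be found by enumeration.

```python
def regret_abs_losses(states, centers, radius=1.0):
    """Regret against the best fixed state in Y = [-radius, radius] for f_t(y) = |y - c_t|.

    states:  the visited states y_1, ..., y_T (floats in Y)
    centers: the loss parameters c_1, ..., c_T (assumed, illustrative)
    """
    incurred = sum(abs(y - c) for y, c in zip(states, centers))
    # Candidate minimizers: the endpoints of Y and every breakpoint clipped into Y.
    candidates = [-radius, radius]
    candidates += [max(-radius, min(radius, c)) for c in centers]
    best_fixed = min(sum(abs(y - c) for c in centers) for y in candidates)
    return incurred - best_fixed
```

For example, remaining at the origin while every loss is centered at $0.5$ incurs loss $0.5$ per round, whereas the fixed state $0.5$ incurs none.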

In Proposition 2, we relate this benchmark to the class of “state-targeting” policies, which can sometimes be expressed by affine functions, and we compare their performance to linear policies. Throughout, we use $\lVert\cdot\rVert$ to denote the Euclidean norm, and we let $\mathcal{B}_{\epsilon}(y)=\{\hat{y}:\lVert y-\hat{y}\rVert\leq\epsilon\}$ denote the norm ball of radius $\epsilon$ around $y$. We let $\Pi_{\mathcal{Y}}(\cdot)$ denote Euclidean projection into the set $\mathcal{Y}$; $\mathbf{u}_n$ denotes the uniform distribution over $n$ items, and $\Delta(n)$ denotes the probability simplex.

2.1 Locally Controllable Dynamics

A number of properties under the name “local controllability” have been considered for various continuous-time and asymptotic control settings (Aoki, 1974; Kuhn and Wohltmann, 1989; Barbero-Liñán and Jakubczyk, 2013; Boscain et al., 2021), generally relating to the notion that all states in a neighborhood around a given state are reachable. We give two formulations of local controllability for our setting, which we take as properties of the dynamics $D$ holding over all inputs.

Definition 1 (Weak Local Controllability).

For $\rho\in(0,1]$, an instance $(\mathcal{X},\mathcal{Y},D)$ satisfies (weak) $\rho$-local controllability if for any $y\in\mathcal{Y}$ and $y^*\in\mathcal{B}_{\rho\cdot\pi(y)}(y)$, there is some $x$ such that $D(x,y)=y^*$, where $\pi(y)=\min_{\hat{y}\in\operatorname{bd}(\mathcal{Y})}\lVert\hat{y}-y\rVert$ is the distance from $y$ to the boundary of $\mathcal{Y}$.

Definition 2 (Strong Local Controllability).

For $\rho>0$, an instance $(\mathcal{X},\mathcal{Y},D)$ satisfies strong $\rho$-local controllability if for any $y\in\mathcal{Y}$ and $y^*\in\mathcal{B}_{\rho}(y)\cap\mathcal{Y}$, there is some $x$ such that $D(x,y)=y^*$.

We often refer to weak local controllability simply as local controllability. This property ensures that there is always some action $x_t$ which results in the next state $y_t$ staying fixed at $y_{t-1}$, as well as some action which moves the state to any point in a surrounding ball; in the weak case, the size of the reachable ball is allowed to decay as $y_t$ approaches the boundary of $\mathcal{Y}$. The parameter $\rho$ controls the speed at which we can navigate the state space: when $\rho=1$ in the weak case (or $\rho\geq R$ in the strong case), we can always immediately reach some point on the boundary of $\mathcal{Y}$, yet for $\rho$ close to zero we may only be able to move in a small neighborhood. Our results use local controllability to minimize regret over $\mathcal{Y}$ by reduction to online convex optimization. As we prove in Appendix A, up to a quantifier alternation which vanishes as $\rho$ approaches $0$, a property of this form is essentially necessary: competing with the best state $y$ is impossible if we cannot remain in its neighborhood.
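The difference between the two definitions can be seen by computing which targets each one guarantees to be reachable. The sketch below assumes, for illustration only, that $\mathcal{Y}$ is the origin-centered ball of radius $R$, so that $\pi(y)=R-\lVert y\rVert$; the function names are hypothetical. It checks only membership in the guaranteed reachable sets from Definitions 1 and 2, and says nothing about which action $x$ attains the target.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def dist(u, v):
    return norm([a - b for a, b in zip(u, v)])


def boundary_distance(y, R):
    """pi(y) for Y equal to the origin-centered ball of radius R (an assumed shape)."""
    return R - norm(y)


def guaranteed_weak(y, target, rho, R):
    """True if target lies in the ball of radius rho * pi(y) around y (Definition 1)."""
    return dist(y, target) <= rho * boundary_distance(y, R)


def guaranteed_strong(y, target, rho, R):
    """True if target lies in the ball of radius rho around y intersected with Y (Definition 2)."""
    return dist(y, target) <= rho and norm(target) <= R
```

With $R=1$, $\rho=0.5$ and $y=(0.5,0)$, the weak guarantee covers only a radius of $0.25$ around $y$, while the strong guarantee covers every point of $\mathcal{Y}$ within distance $0.5$.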

Proposition 1.

Suppose there is some $y\in\mathcal{Y}$ and values $\alpha,\beta>0$ such that for all $\hat{y}\in\mathcal{B}_{\alpha}(y)$ and $x\in\mathcal{X}$, $D(x,\hat{y})\notin\mathcal{B}_{\beta}(\hat{y})$. Then, there are losses such that $\operatorname{Reg}_T(\mathcal{A})=\Omega(T)$ for any algorithm $\mathcal{A}$.

2.2 States vs. Policies

While regret benchmarks in online control are typically expressed in terms of a reference class of policies, we note that there is a class of “state-targeting” policies which track the reward of fixed states (asymptotically, and up to the influence of disturbances), and which can be implemented if $D$ is known; we maintain the formulation in terms of fixed states for clarity with respect to our motivations for Stackelberg optimization. Existing no-regret algorithms for online control typically compete with linear policies, and choose actions each round by implementing policies which are linear in multiple past states (as in e.g. Agarwal et al. (2019a)). Here, we show that all such policies can be arbitrarily suboptimal when compared to state-targeting policies, even for dynamics which are linear up to projection and with fixed convex losses over states, as they may yield actions and states which remain fixed at $\mathbf{0}$ in every round even if the optimal state is always immediately accessible under the dynamics. We prove Proposition 2 in Appendix A.

Proposition 2.

For an instance $(\mathcal{X},\mathcal{Y},D)$, let the class of state-targeting policies for $\hat{\mathcal{Y}}\subseteq\mathcal{Y}$ be given by $\mathcal{P}_{\hat{\mathcal{Y}}}=\{P_{\hat{y}}:\hat{y}\in\hat{\mathcal{Y}}\}$ where $P_{\hat{y}}(y)=\operatorname*{argmin}_{\{x\in\mathcal{X}:D(x,y)\in\hat{\mathcal{Y}}\}}\lVert D(x,y)-\hat{y}\rVert^{2}$. Define the regret of a policy class $\mathcal{P}$ as

$$\operatorname{Reg}_T(\mathcal{P}) = \min_{P\in\mathcal{P}}\left(\sum_{t=1}^{T} f_t(y_t)\right) - \min_{y\in\mathcal{Y}}\left(\sum_{t=1}^{T} f_t(y)\right),$$

where $y_t$ is updated by playing $P$ at each round. For any $\rho$-locally controllable instance, there is a set $\hat{\mathcal{Y}}\subseteq\mathcal{Y}$ for which $\operatorname{Reg}_T(\mathcal{P}_{\hat{\mathcal{Y}}})=O(\sqrt{T\rho^{-1}})$. Further, for any class $\mathcal{P}_{\mathcal{K}}$ where each $K\in\mathcal{P}_{\mathcal{K}}$ is a matrix yielding actions $x_t=-Ky_{t-1}$, there is an instance where $\operatorname{Reg}_T(\mathcal{P}_{\mathcal{K}})\geq\Omega(T)$ for $\rho=1$.

If dynamics are linear up to projection with $D(x,y)=\Pi_{\mathcal{Y}}(By+Ax)$ for full-rank $A$, and $\dim(\mathcal{X})=\dim(\mathcal{Y})$, note that the affine map $P_{\hat{y}}(y)=A^{-1}(\hat{y}-By)$ implements any $P_{\hat{y}}$ for sufficiently large $\mathcal{X}$.
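The contrast between affine state-targeting policies and linear policies can be seen in a scalar toy instance. The sketch below assumes $\mathcal{Y}=[-R,R]$, scalar coefficients $a\neq 0$ and $b$ standing in for $A$ and $B$, and an action space large enough that the targeting action is always feasible; these choices and the function names are illustrative assumptions, not the construction used in the proof in Appendix A.

```python
def clip(v, lo, hi):
    return max(lo, min(hi, v))


def dynamics(x, y, a=2.0, b=0.5, R=1.0):
    """Scalar dynamics D(x, y) = Pi_Y(b*y + a*x) with Y = [-R, R] (assumed instance)."""
    return clip(b * y + a * x, -R, R)


def state_targeting_policy(target, a=2.0, b=0.5):
    """Affine policy P_target(y) = a^{-1} * (target - b*y)."""
    return lambda y: (target - b * y) / a


def linear_policy(k):
    """Linear policy x = -k * y, which always plays x = 0 at y = 0."""
    return lambda y: -k * y


def run(policy, T, a=2.0, b=0.5, R=1.0):
    """States y_1, ..., y_T obtained from y_0 = 0 by playing x_t = policy(y_{t-1})."""
    y = 0.0
    states = []
    for _ in range(T):
        y = dynamics(policy(y), y, a, b, R)
        states.append(y)
    return states
```

In this instance the targeting policy reaches its target in one round and holds it there, whereas every linear policy started at $y_0=0$ never leaves the origin, regardless of $k$.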

3 No-Regret Algorithms for Locally Controllable Dynamics

Here we give a sequence of no-regret algorithms satisfying a range of robustness properties. Our primary algorithm NestedOCO, presented in Section 3.1, operates over known time-varying dynamics without disturbances and requires an offline non-convex optimization oracle, and we identify conditions in Section 3.2 which remove the oracle requirement. In Section 3.3 we give two algorithms, NestedOCO-BD and NestedOCO-UD, which allow adversarial disturbances to weakly and strongly locally controllable dynamics, respectively. In Section 3.4 we extend NestedOCO to accommodate unknown dynamics under appropriate regularity conditions (provided an initial “approximately stabilizing” action is known at $t=1$), and in Section 3.5 we give an algorithm which obtains $O(T^{3/4})$ regret under bandit feedback.

3.1 Nonlinear Control via Online Convex Optimization

When dynamics satisfy local controllability and $y_{t-1}$ is not too close to $\operatorname{bd}(\mathcal{Y})$, all points $y_t$ in a ball around $y_{t-1}$ are feasible with an appropriate $x_t$; this enables execution of an online convex optimization (OCO) algorithm over $\mathcal{Y}$ by playing the action $x_t$ which yields a state update to the target $y_t$ chosen at each iteration, computed via offline non-convex optimization. Here we assume that $D$ is known and can be queried for any inputs, and that disturbances to the state are not present. We allow the dynamics to change over time, potentially as a function of previous actions $x_s$ and losses $f_s$ for $s<t$, provided that $D_t$ can be determined in each round. We use Follow the Regularized Leader (FTRL) as our OCO subroutine (Shalev-Shwartz and Singer, 2006; Abernethy et al., 2008), yet we note that it may be replaced by any OCO algorithm whose per-round step size is guaranteed to be sufficiently small (such as OGD with a constant learning rate); statements of the FTRL algorithm and its key properties are provided in Appendix B.
We instantiate FTRL over a contracted space $\widetilde{\mathcal{Y}} \subseteq \mathcal{Y}$, calibrated to ensure that the minimum loss over $\widetilde{\mathcal{Y}}$ is close to that for $\mathcal{Y}$, yet where each step of FTRL lies within the feasible region ensured by (weak) local controllability.

Algorithm 1 Nested Online Convex Optimization (NestedOCO).
  Let $\psi : \mathcal{Y} \rightarrow \mathbb{R}$ be $\gamma$-strongly convex with $\operatorname{argmin}_{y} \psi(y) = \mathbf{0}$ and $\max_{y, y'} \lvert \psi(y) - \psi(y') \rvert \leq G$
  Let $\eta = (G\gamma)^{1/2} \left( \left(1 + \frac{R}{r\rho}\right) T L^2 \right)^{-1/2}$
  Let $\widetilde{\mathcal{Y}} = \{ y : \frac{1}{1-\delta} y \in \mathcal{Y} \}$ for $\delta = \eta \frac{L}{r\rho\gamma}$
  Initialize FTRL to run for $T$ rounds over $\widetilde{\mathcal{Y}}$ with regularizer $\psi$ and parameter $\eta$
  for $t = 1$ to $T$ do
     Let $y^*$ be the point chosen by FTRL
     Use $\texttt{Oracle}(y_{t-1}, y^*)$ to compute $x_t = \operatorname{argmin}_{x} \lVert D_t(x, y_{t-1}) - y^* \rVert^2$
     Play action $x_t$
     Observe $y_t$ and loss $f_t(y_t)$, update FTRL
  end for
Theorem 1.

For a $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ without disturbances and with $D_t$ known at each $t$, the regret of NestedOCO for convex $L$-Lipschitz losses $f_t : \mathcal{Y} \rightarrow \mathbb{R}$ is at most

$$\operatorname{Reg}_T(\textup{NestedOCO}) \leq 2L \sqrt{\left(1 + R (r\rho)^{-1}\right) T G \gamma^{-1}}$$

with respect to any state $y^* \in \mathcal{Y}$, with $T$ queries made to a non-convex optimization oracle.

The proof for Theorem 1 is given in Appendix C.
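The loop in Algorithm 1 can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions rather than the implementation analyzed in Appendix C: the state space is taken to be $[-R, R]$ with $r = R$, the regularizer is $\psi(y) = y^2/2$ (so $\gamma = 1$ and $G = R^2/2$), FTRL is run on linearized losses, the function `toy_dynamics` is a hypothetical nonlinear update whose reachable interval around $y$ has radius $\rho$ times the distance from $y$ to the boundary, and the non-convex oracle is replaced by bisection, which suffices here only because the toy update is monotone in $x$.

```python
import math

def ftrl_point(grad_sum, eta, radius):
    """FTRL with psi(y) = y^2 / 2 on [-radius, radius]: project -eta * grad_sum."""
    return max(-radius, min(radius, -eta * grad_sum))

def toy_dynamics(x, y, rho, R):
    """Hypothetical nonlinear update; actions x in [-1, 1] reach within rho * (R - |y|) of y."""
    return y + rho * (R - abs(y)) * x ** 3

def oracle(y_prev, y_target, rho, R, iters=80):
    """Bisection stand-in for argmin_x |D(x, y_prev) - y_target| over x in [-1, 1]."""
    lo, hi = -1.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if toy_dynamics(mid, y_prev, rho, R) < y_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nested_oco(subgrad, T, L=1.0, R=1.0, r=1.0, rho=0.5):
    gamma, G = 1.0, 0.5 * R ** 2          # constants for psi(y) = y^2 / 2 on [-R, R]
    eta = math.sqrt(G * gamma) / math.sqrt((1 + R / (r * rho)) * T * L ** 2)
    delta = eta * L / (r * rho * gamma)
    radius = (1 - delta) * R              # contracted set is [-radius, radius]
    y, grad_sum, states, max_gap = 0.0, 0.0, [], 0.0
    for _ in range(T):
        target = ftrl_point(grad_sum, eta, radius)
        x = oracle(y, target, rho, R)
        y = toy_dynamics(x, y, rho, R)    # undisturbed state update
        max_gap = max(max_gap, abs(y - target))
        states.append(y)
        grad_sum += subgrad(y)            # feed a loss subgradient back to FTRL
    return states, max_gap

def theorem1_bound(T, L=1.0, R=1.0, r=1.0, rho=0.5, G=0.5, gamma=1.0):
    """Right-hand side of the regret bound in Theorem 1."""
    return 2 * L * math.sqrt((1 + R / (r * rho)) * T * G / gamma)

# Usage: fixed loss f_t(y) = |y - 0.5|, whose best fixed state 0.5 has zero loss.
T = 2000
states, max_gap = nested_oco(lambda y: (y > 0.5) - (y < 0.5), T)
regret = sum(abs(y - 0.5) for y in states)
```

In this toy run every FTRL target is reached essentially exactly, since the contraction by $\delta$ keeps each step inside the reachable interval, and the realized regret sits below the bound of Theorem 1.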

3.2 Efficient Updates for Action-Linear Dynamics

While NestedOCO requires no assumptions on the dynamics beyond local controllability, there are large classes of dynamics for which the oracle call can be removed. We say that dynamics are action-linear if $y_x = D(x, y)$ is linear in $x$, for $y_x \in \operatorname{int}(\mathcal{Y})$ (and arbitrary for $y_x \in \operatorname{bd}(\mathcal{Y})$).

Proposition 3.

For a $\rho$-locally controllable and action-linear instance $(\mathcal{X}, \mathcal{Y}, D)$, the per-round optimization problem for $\texttt{Oracle}(y_{t-1}, y^*)$ in NestedOCO is convex.

  • Proof

    For $y = y_{t-1} \in \widetilde{\mathcal{Y}} \subseteq \operatorname{int}(\mathcal{Y})$, we have $D(x, y) = A_y \cdot x + b_y$ for some matrix $A_y$ and vector $b_y$, and so we can solve $x_t = \operatorname{argmin}_{x \in \mathcal{X}} \lVert A_y \cdot x + b_y - y^* \rVert^2$ efficiently. ∎
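The convex problem in the proof above can be solved by any standard method; the sketch below uses projected gradient descent as one option. It is a minimal illustration under assumptions not made in the text: the action set $\mathcal{X}$ is taken to be a box $[\text{lo}, \text{hi}]^m$ containing the origin, and the matrix, offset, and target are hypothetical values.

```python
def action_linear_oracle(A, b, y_star, lo=-1.0, hi=1.0, iters=500):
    """Projected gradient descent for min_x ||A x + b - y_star||^2 over the box [lo, hi]^m."""
    n, m = len(A), len(A[0])
    # Step size 1 / (2 ||A||_F^2) is no larger than 1 / (smoothness constant of the objective).
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    x = [min(hi, max(lo, 0.0))] * m
    for _ in range(iters):
        resid = [sum(A[i][j] * x[j] for j in range(m)) + b[i] - y_star[i] for i in range(n)]
        grad = [2.0 * sum(A[i][j] * resid[i] for i in range(n)) for j in range(m)]
        # Gradient step followed by projection onto the box.
        x = [min(hi, max(lo, x[j] - step * grad[j])) for j in range(m)]
    return x

# Hypothetical local model (A_y, b_y) and FTRL target y_star.
A_y = [[1.0, 0.2], [0.1, 0.8]]
b_y = [0.1, -0.2]
y_star = [0.3, 0.1]

x_t = action_linear_oracle(A_y, b_y, y_star)
reached = [sum(A_y[i][j] * x_t[j] for j in range(2)) + b_y[i] for i in range(2)]
```

Here the unconstrained solution lies inside the box, so the reached state matches the target up to numerical error; when the box constraint is active, the same routine returns the closest reachable point under the local model.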

The class of action-linear dynamics is quite general, owing to the flexibility permitted by nonlinear parameterizations of $(A_y, b_y)$ in terms of $y$; in Appendix D, we show that local controllability holds for multiple explicit families of instances when appropriate eigenvalue conditions are satisfied. We can further relax this condition to accommodate dynamics where action-linearity holds only locally in the neighborhood of stabilizing actions (i.e. actions $x^*$ where $D(x^*, y) = y$).

Definition 3 (Locally Action-Linear Dynamics).

An instance $(D, \mathcal{X}, \mathcal{Y})$ is locally action-linear if, for any $y \in \operatorname{int}(\mathcal{Y})$, $x^*$ such that $D(x^*, y) = y$, and $x$ such that $D(x, y) \in \operatorname{int}(\mathcal{Y})$, the dynamics are given by $D(x, y) = A_y x + b_y + q_y(x)$, where $A_y$ is a matrix and $b_y$ is a vector, both with norms bounded by some absolute constant, and where $q_y : \mathcal{X} \rightarrow \mathbb{R}^{\dim(\mathcal{Y})}$ is any function satisfying $\lVert q_y(x) \rVert \leq C \lVert A_y (x - x^*) \rVert^{1+c}$ for some constants $C, c > 0$.

By this condition, for any $x$ in a sufficiently small neighborhood around $x^*$, the deviation of dynamics (and thus the resulting $y_{t+1}$) from action-linearity vanishes. Note that the target $y_t$ chosen by our algorithm will always be near $y_{t-1}$; as such, these deviations from action-linearity can be modeled as disturbances with magnitude strictly less than our per-round step size $\lVert y_{t+1} - y_t \rVert$ (up to universal constant factors). The existence of an efficient implementation follows as a straightforward corollary of Theorem 2 in Section 3.3, which extends NestedOCO to accommodate bounded adversarial disturbances, as we can then select actions by disregarding the influence of $q_y$ and only considering the local approximation $D(x, y) \approx A_y x + b_y$ at each state $y$ (assuming that each decomposition between $q_y$ and the action-linear component is known).
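The claim that these deviations are dominated by the step size can be checked with a short calculation. The sketch below writes $s = \lVert A_y(x - x^*) \rVert$ for the size of the action-linear step away from $y$ (a symbol introduced here only for illustration), and assumes a per-round step of order $T^{-1/2}$ with a hypothetical constant $\kappa$, in line with the choice of $\eta$ in Algorithm 1.

```latex
% The bound in Definition 3 gives q_y(x^*) = 0, and D(x^*, y) = y, so A_y x^* + b_y = y. Hence
D(x, y) - y = A_y (x - x^*) + q_y(x), \qquad \lVert q_y(x) \rVert \le C s^{1+c}.
% Relative to the intended step, the deviation vanishes:
\frac{\lVert q_y(x) \rVert}{s} \le C s^{c} \longrightarrow 0 \quad \text{as } s \to 0.
% With s \le \kappa T^{-1/2} in every round, the total deviation over T rounds is at most
\sum_{t=1}^{T} C \left( \kappa T^{-1/2} \right)^{1+c} = C \kappa^{1+c} \, T^{(1-c)/2} = o(\sqrt{T}).
```

Under these assumptions, treating $q_y$ as a disturbance contributes a cumulative magnitude that grows more slowly than $\sqrt{T}$.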

3.3 Adversarial Disturbances

Our algorithm NestedOCO can be extended to accommodate adversarial disturbances, where the state is updated as $y_t = D(x_t, y_{t-1}) + w_t$, with $\{w_t\}$ chosen adversarially. In the weak local controllability case, we show a sharp threshold effect in terms of whether or not $\lVert w_t \rVert$ is allowed to exceed a $\frac{\rho}{1+\rho}$ fraction of the undisturbed state's distance from the boundary: if disturbances are bounded below this threshold, regret minimization remains feasible with a tight $\Theta(E)$ dependence on the total disturbance magnitude, yet if disturbances may exceed this, no sublinear regret rate is attainable even for a constant total disturbance magnitude. When $\rho$ is small, an adversary can push us to the boundary faster than we can “undo” past disturbances, causing our feasible range to decay.

Theorem 2 (Bounded Disturbances for Weak Local Controllability).

For any $\rho \in (0, 1]$, suppose that a sequence of adversarial disturbances $w_t$ for a $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ satisfies $\sum_{t=1}^{T} \lVert w_t \rVert \leq E$ and $\lVert w_t \rVert \leq \frac{\rho - \alpha\rho}{1+\rho} \cdot \pi\left(D(x_t, y_{t-1})\right)$, for some $\alpha \in \mathbb{R}$. If $\alpha > 0$, there is an algorithm NestedOCO-BD with regret for convex Lipschitz losses $f_t$ bounded by

$$\operatorname{Reg}_T(\textup{NestedOCO-BD}) \leq O\left(\sqrt{T \cdot (\alpha\rho)^{-1}} + E\right),$$

and there is an instance where any algorithm $\mathcal{A}$ obtains $\operatorname{Reg}_T(\mathcal{A}) = \Omega(E)$. If $\alpha < 0$, there is an instance such that any algorithm $\mathcal{A}$ obtains $\operatorname{Reg}_T(\mathcal{A}) \geq \Omega(T)$ even when $E = O(1)$.
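The per-round condition in Theorem 2 can be read as a cap on each disturbance that shrinks as the undisturbed state approaches the boundary. The sketch below evaluates that cap under two assumptions made here for illustration: that $\pi(y)$ denotes the distance from $y$ to $\operatorname{bd}(\mathcal{Y})$, as the surrounding discussion suggests, and that $\mathcal{Y}$ is a Euclidean ball of radius $R$ centered at the origin. The function names are hypothetical.

```python
import math

def dist_to_boundary(y, R=1.0):
    """pi(y) for a Euclidean ball of radius R centered at the origin (assumed geometry)."""
    return R - math.sqrt(sum(v * v for v in y))

def max_disturbance(y_undisturbed, rho, alpha, R=1.0):
    """Largest ||w_t|| permitted by the per-round condition of Theorem 2."""
    return (rho - alpha * rho) / (1.0 + rho) * dist_to_boundary(y_undisturbed, R)

def is_admissible(w, y_undisturbed, rho, alpha, R=1.0):
    """Check whether a disturbance w satisfies the per-round condition."""
    return math.sqrt(sum(v * v for v in w)) <= max_disturbance(y_undisturbed, rho, alpha, R)

# Hypothetical values: undisturbed state at distance 0.4 from the boundary.
cap = max_disturbance([0.6, 0.0], rho=0.5, alpha=0.2)
small_ok = is_admissible([0.10, 0.0], [0.6, 0.0], rho=0.5, alpha=0.2)
large_ok = is_admissible([0.11, 0.0], [0.6, 0.0], rho=0.5, alpha=0.2)
```

With $\alpha > 0$ the cap sits strictly below the $\frac{\rho}{1+\rho}$ threshold, and $\alpha$ measures the margin that the regret bound pays for through the $(\alpha\rho)^{-1}$ term.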

The maximum disturbance bound can be removed when dynamics are strongly locally controllable, as the ensured feasible range of the dynamics does not vanish at the boundary of the state space. For such instances, we can minimize regret (with tight $O(E \cdot \rho^{-1})$ dependence) even if disturbances are only implicitly bounded by the state space diameter (which is at least $\rho$, without loss of generality).

Theorem 3 (Unbounded Disturbances for Strong Local Controllability).

For any $\rho > 0$ and strongly $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ with disturbances $w_t$ satisfying $\sum_{t=1}^{T} \lVert w_t \rVert \leq E$, there is an algorithm NestedOCO-UD with regret for convex Lipschitz losses $f_t$ bounded by

$$\operatorname{Reg}_T(\textup{NestedOCO-UD}) \leq O\left(\sqrt{T} + E \cdot \rho^{-1}\right),$$

and there is an instance where any algorithm $\mathcal{A}$ obtains $\operatorname{Reg}_T(\mathcal{A}) \geq \Omega\left(E \cdot \rho^{-1}\right)$.

In each case, our lower bounds in terms of $E$ hold for the same constants obtained by our algorithms, and our algorithms obtain the stated regret guarantees even when $E$ is not known in advance. We present the algorithms and analysis for each theorem in Appendix E; both operate by tracking deviations from an idealized trajectory without disturbances, and calibrating parameters to preserve sufficient reachability margin for applying corrections towards this trajectory in each round. The lower bounds both proceed by considering an instance with a fixed target state $y^*$ and losses which track the distance from $y^*$, along with an adversary whose goal is to maximize this distance by selecting disturbances which push the current state away from $y^*$.
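The flavor of the $E \cdot \rho^{-1}$ dependence can be seen in a toy simulation. The sketch below is our own illustration and not the construction from Appendix E: it assumes a one-dimensional state space $[-1, 1]$ with target $y^* = 0$ and loss $\lvert y \rvert$, a controller that can move the state by at most $\rho$ toward the target in each round, and an adversary that spends its budget $E$ holding the state at the boundary.

```python
def simulate_pushback(E, rho, T):
    """Toy 1-D game on [-1, 1] with target y* = 0 and per-round loss |y|.

    The controller moves the state up to rho toward 0 each round; the adversary
    then spends its remaining budget pushing the state back toward the boundary.
    """
    y, budget, total_loss = 0.0, E, 0.0
    for _ in range(T):
        undisturbed = y - max(-rho, min(rho, y))   # controller's best move toward 0
        w = min(budget, 1.0 - undisturbed)         # push toward +1 while budget lasts
        budget -= w
        y = undisturbed + w
        total_loss += abs(y)
    return total_loss

# Hypothetical parameters, chosen as dyadic values so the arithmetic is exact.
total = simulate_pushback(E=5.0, rho=0.125, T=200)
```

After an initial push of size 1, the adversary pays only $\rho$ per round to keep the loss at 1, so a budget of $E$ buys roughly $E \cdot \rho^{-1}$ rounds of constant loss. With the values above the cumulative loss comes to 36.5, against $E \cdot \rho^{-1} = 40$.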

3.4 Unknown Dynamics

Up until this point, we have assumed that the dynamics $D$ can be queried arbitrarily in each round. While this has required minimal assumptions on $D$ beyond local controllability, accommodation of unknown dynamics is often desired in online control (Cassel et al., 2022; Minasyan et al., 2022) and for several of our applications (Roth et al., 2015; Agarwal and Brown, 2023). Here we give conditions under which regret minimization can be implemented without advance knowledge of $D$ by an algorithm ProbingOCO, which maintains continuously-updating local linear approximations of $D$ near $y_t$ across rounds. Crucially, we assume that $D$ is time-invariant and locally action-linear with sufficiently small Lipschitz parameters, and that for the initial state $y_0$ some near-stabilizing action $x_1$ is known, i.e. $\lVert D(x_1, y_0) - y_0 \rVert \leq \epsilon$, for some $\epsilon = o(\sqrt{T})$.

Theorem 4.

For any $\rho$-locally controllable and time-invariant instance $(D, \mathcal{X}, \mathcal{Y})$ which satisfies local action-linearity and appropriate Lipschitz conditions, there is an algorithm ProbingOCO with $\operatorname{Reg}_T(\textup{ProbingOCO}) \leq O(\sqrt{T})$ for convex Lipschitz losses $f_t$ and unknown dynamics $D$, provided that at $t=1$ we are given some $x_1$ such that $\lVert D(x_1, y_0) - y_0 \rVert = o(\sqrt{T})$.

We state ProbingOCO and prove Theorem 4 in Appendix F, along with additional details on the regularity and near-stability assumptions. The crux of our analysis, beyond that from our previous results, hinges on being able to maintain and update local linear approximations of $D$ throughout our optimization which are sufficiently accurate to allow us to treat the effects of both learned representation errors and action non-linearity from $q_y(x)$ as bounded disturbances. We implement each update from our nested regret minimization algorithm as a series of $O(\dim(\mathcal{X}))$ steps involving small near-orthogonal perturbations to our targets $y_t$, which we then use to update our local estimate for $D$.
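The idea of recovering a local linear model from a small number of probes can be illustrated with finite differences. The sketch below is a simplified stand-in and not ProbingOCO itself: it perturbs the action along coordinate directions rather than perturbing targets, and it assumes the dynamics can be probed at a fixed state through a hypothetical callable `step_fn`, ignoring the fact that each real probe also moves the state. All names and numerical values are assumptions for illustration.

```python
def estimate_local_model(step_fn, x0, eps=1e-3):
    """Estimate (A, b) with step_fn(x) ~ A x + b near x0, using dim(X) + 1 probes."""
    m = len(x0)
    base = step_fn(x0)
    n = len(base)
    A = [[0.0] * m for _ in range(n)]
    for j in range(m):
        probe = list(x0)
        probe[j] += eps                      # small perturbation along coordinate j
        out = step_fn(probe)
        for i in range(n):
            A[i][j] = (out[i] - base[i]) / eps
    # Offset chosen so that the model reproduces the observed response at x0.
    b = [base[i] - sum(A[i][j] * x0[j] for j in range(m)) for i in range(n)]
    return A, b

# Hypothetical ground truth: an affine map plus a small higher-order term around x0.
A_true = [[1.0, 0.5], [-0.3, 2.0]]
b_true = [0.1, 0.2]
x0 = [0.2, -0.1]

def toy_step(x):
    """Toy dynamics at a fixed state: affine part plus a quadratic deviation."""
    q = 0.1 * sum((x[j] - x0[j]) ** 2 for j in range(2))
    return [sum(A_true[i][j] * x[j] for j in range(2)) + b_true[i] + q for i in range(2)]

A_est, b_est = estimate_local_model(toy_step, x0)
```

The estimation error scales with the probe size times the strength of the higher-order term, which is the sense in which small perturbations let the non-linearity be absorbed as a bounded disturbance.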

3.5 Bandit Feedback

We can extend our approach from NestedOCO to accommodate bandit feedback for convex losses by replacing FTRL with the FKM algorithm (Flaxman et al., 2004) and appropriately recalibrating parameters. FKM obtains $O(T^{3/4})$ regret, which is the best currently-known bound for bandit convex optimization without additional assumptions (e.g. strong convexity), and we obtain an analogous bound here for nested optimization. We note that this extension to bandit feedback can again be applied for any algorithm with a small per-round step-size bound, though this property does not hold for algorithms which sample from larger sets to reduce variance of gradient estimators (e.g. those from Abernethy et al. (2008); Hazan and Levy (2014)).

Theorem 5.

For any $\rho$-locally controllable instance $(D, \mathcal{X}, \mathcal{Y})$, there is an oracle-efficient algorithm NestedBCO with expected regret bounded by

$$\operatorname{Reg}_T(\textup{NestedBCO}) = O\left(nRLT^{3/4}(r\rho)^{-1}\right)$$

for $L$-Lipschitz convex losses $f_t$ under bandit feedback.

We present the NestedBCO algorithm and prove Theorem 5 in Appendix G.
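The core ingredient that FKM substitutes for gradient feedback is a one-point gradient estimate built from a single loss evaluation at a randomly perturbed point. The sketch below shows that estimator in isolation, as a minimal illustration rather than the NestedBCO algorithm of Appendix G; the function names, the linear test loss, and the parameter values are hypothetical.

```python
import math
import random

def unit_sphere_sample(n, rng):
    """Draw a uniformly random direction on the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def one_point_gradient(loss, y, delta, rng):
    """Single-evaluation estimate (n / delta) * loss(y + delta * u) * u."""
    n = len(y)
    u = unit_sphere_sample(n, rng)
    value = loss([y[i] + delta * u[i] for i in range(n)])  # the only feedback observed
    return [(n / delta) * value * u[i] for i in range(n)]

# Usage: average many estimates for a hypothetical linear loss with gradient c.
rng = random.Random(0)
c = [1.0, -2.0]
linear_loss = lambda y: sum(c[i] * y[i] for i in range(2))
samples = 20000
avg = [0.0, 0.0]
for _ in range(samples):
    g = one_point_gradient(linear_loss, [0.0, 0.0], 0.1, rng)
    avg = [avg[i] + g[i] / samples for i in range(2)]
```

In expectation the estimate equals the gradient of a smoothed version of the loss, which for a linear loss is the gradient itself. Because each query is played at distance $\delta$ from the current point, the per-round movement stays small, which is the property the nested approach relies on.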

4 Applications for Online Stackelberg Optimization

We give several applications of our framework to online Stackelberg problems involving strategic or adaptive agents, each cast as an instance of online control with nonlinear dynamics where local controllability holds, and where our objectives are well-approximated by convex surrogate losses only over the state. Each application extends prior work by either allowing for more relaxed assumptions, unifying distinct problem instances, or giving a novel formulation to account for dynamic and adversarial behavior; analysis and comparison to related work is contained in Appendices H-K.

4.1 Online Performative Prediction

Performative Prediction was introduced by Perdomo et al. (2020) to capture settings in which the data distribution may shift as a function of the classifier itself. We consider the online formulation of Performative Prediction introduced in Kumar et al. (2022) as an instance of online convex optimization with unbounded memory, which we extend to accommodate a stateful variant of the problem (as in Brown et al. (2022)) in which the update to the distribution is a function of both the classifier and the current distribution itself. Let $\mathcal{X} \subseteq \mathbb{R}^n$ denote our space of classifiers, and let $p_0$ be the initial distribution over $\mathbb{R}^n$. When a classifier $x_t$ is deployed, the distribution is updated to

$$p_t = (1-\theta) p_{t-1} + \theta \mathcal{D}(x_t, y_{t-1})$$

where $\mathcal{D}(x_t, y_{t-1}) = A(x_t, y_{t-1}) + \xi$, for a random variable $\xi \in \mathbb{R}^n$ with mean $\mu$ and covariance $\Sigma$, and with $y_t = A(x_t, y_{t-1})$, where $A$ satisfies $\rho$-local controllability for some $\rho > 0$ and appropriate smoothness notions. We also assume there is some linear $s : \mathcal{X} \rightarrow \mathcal{Y}$ such that $A(x, y) = s(x)$ if $y = s(x)$.
We then receive loss $\tilde{f}_t(x_t, p_t) = \mathbb{E}_{z \sim p_t}[f_t(x_t, z)]$, where each $f_t$ is convex and Lipschitz.
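Since $p_t$ is a mixture, its mean follows the same recursion as the distribution itself, which gives a simple way to see how the population lags behind the state. The sketch below tracks only the mean of $p_t$, using linearity of expectation and the fact that $\mathcal{D}(x_t, y_{t-1})$ has mean $y_t + \mu$. The function name and all numerical values are hypothetical, and the state is held fixed purely for illustration.

```python
def mean_after_update(prev_mean, y_t, mu, theta):
    """Mean of p_t = (1 - theta) * p_{t-1} + theta * D_t, where D_t has mean y_t + mu."""
    return [(1 - theta) * prev_mean[i] + theta * (y_t[i] + mu[i]) for i in range(len(y_t))]

# Hypothetical run: hold the state fixed and watch the mean of p_t approach y + mu.
theta = 0.1
mu = [0.05, -0.05]
y_fixed = [0.4, 0.2]
mean = [0.0, 0.0]                    # mean of the initial distribution p_0
gaps = []
for _ in range(200):
    mean = mean_after_update(mean, y_fixed, mu, theta)
    gaps.append(max(abs(mean[i] - (y_fixed[i] + mu[i])) for i in range(2)))
```

With the state held fixed, the gap between the mean of $p_t$ and $y + \mu$ contracts by a factor of $1 - \theta$ per round, so smaller $\theta$ means the distribution takes longer to reflect the current state.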

This generalizes the model of Kumar et al. (2022), in which $A(x, y) = A \in \mathbb{R}^{n \times n}$ is taken to be a fixed matrix; there, $\rho$-local controllability is satisfied for some $\rho > 0$ provided that $A$ is nonsingular. Their aim is to compete with the best fixed classifier by running regret minimization over $\mathcal{X}$. Here we run NestedOCO over $\mathcal{Y}$, taken over the range of $s$, which allows us to compete against the best fixed classifier as well by the properties of $s$; while the classifiers $x_t$ we play will generally not result in stabilizing points of $A$, their excess loss compared to each $s^{-1}(y_t)$ is bounded.
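To make the model concrete, the following Python sketch simulates one round under illustrative assumptions that are not taken from the paper: a hypothetical update rule $A(x, y) = (1 - \theta) y + \theta s(x)$ with $s(x) = Sx$ for an assumed matrix `S`, Gaussian noise for $\xi$, and a squared loss standing in for $f_t$ (which is Lipschitz only on a bounded domain). It checks that $y = s(x)$ is a fixed point of $A(x, \cdot)$ and estimates $\tilde{f}_t$ by Monte Carlo.

```python
import numpy as np


def make_dynamics(S, theta):
    """Build a hypothetical update A(x, y) = (1 - theta) * y + theta * s(x).

    S and theta are illustrative assumptions; s(x) = S @ x is linear, and
    A(x, y) = s(x) whenever y = s(x), matching the condition in the text.
    """
    def s(x):
        return S @ x

    def A(x, y):
        return (1.0 - theta) * y + theta * s(x)

    return A, s


def squared_loss(x, z):
    """Stand-in convex loss f_t(x, z) = ||x - z||^2 (an assumption)."""
    return float(np.sum((x - z) ** 2))


def expected_loss_mc(f, x, y_t, mu, Sigma, rng, n_samples=20000):
    """Monte Carlo estimate of E_{z ~ p_t}[f(x, z)] with z = y_t + xi.

    Gaussian xi ~ N(mu, Sigma) is assumed here; the text only fixes the
    mean and covariance of xi.
    """
    xi = rng.multivariate_normal(mu, Sigma, size=n_samples)
    z = y_t + xi
    return float(np.mean([f(x, zi) for zi in z]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, s = make_dynamics(np.diag([0.5, 1.0, 2.0]), theta=0.3)
    x = np.array([0.2, -0.1, 0.4])
    y_t = A(x, np.zeros(3))  # state after one step from y_0 = 0
    print(expected_loss_mc(squared_loss, x, y_t, np.zeros(3), 0.1 * np.eye(3), rng))
```

For the squared loss the expectation has the closed form $\lVert x - y_t - \mu \rVert^2 + \operatorname{tr}(\Sigma)$, which gives a simple sanity check on the estimate.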

Theorem 6 (Regret Minimization for Performative Prediction).

For any $\theta > 0$, the dynamics for Online Performative Prediction are $\rho$-locally controllable, and NestedOCO obtains regret $O(\sqrt{T(\rho^{-1} + \theta^{-1})})$ with respect to the best fixed classifier.

4.2 Adaptive Recommendations

Online interactions with economic agents of various types are ubiquitous, and the resulting control problems tend to be manifestly nonlinear; here we treat two diverse examples from this space. The Adaptive Recommendations problem, as introduced by Agarwal and Brown (2022), concerns repeatedly recommending menus to an agent whose choice distribution is a function of their past selections, while the controller’s reward in each round depends on adversarial losses over the choice. In each round $t \in [T]$, we show the agent a (possibly randomized) menu $K_t$ containing $k$ (out of $n$) items, and the agent’s instantaneous choice distribution conditioned on seeing $K_t$ is

$$p_t(i; K_t, v_{t-1}) = \begin{cases} \frac{s_i(v_{t-1})}{\sum_{j \in K_t} s_j(v_{t-1})} & i \in K_t \\ 0 & i \notin K_t \end{cases}$$

where each $s_i : \Delta(n) \rightarrow [\lambda, 1]$ is the agent’s preference scoring function for item $i$, for some $\lambda > 0$, taking as input the agent’s memory vector $v \in \Delta(n)$. The memory vector updates each round as

$$v_t = (1 - \theta_t) v_{t-1} + \theta_t p_t,$$

where $\theta_t \in [\theta, 1]$ for $\theta > 0$ is a possibly time-dependent update speed, and we receive loss $f_t(p_t)$, where each $f_t$ is convex and $L$-Lipschitz. Note that the set of feasible choice distributions when considering all menu distributions $x_t \in \Delta(\binom{n}{k})$ depends on the memory vector $v_t$. The regret benchmark considered by Agarwal and Brown (2022) is the intersection of all such sets, denoted the “everywhere instantaneously-realizable distribution” set $\operatorname{EIRD} = \cap_{v \in \Delta} \operatorname{IRD}(v)$, where $\operatorname{IRD}(v)$ is the “instantaneously realizable distribution” set for $v$, given as the convex hull of the choice distributions $p(K_t)$ resulting from each menu $K_t \in [\binom{n}{k}]$ when $v$ is the memory vector.
It is shown that the set is non-empty when $\lambda$ is not too small, and algorithms which minimize regret with respect to any distribution in $\operatorname{EIRD}$ are given in Agarwal and Brown (2022) and Agarwal and Brown (2023) under varying assumptions regarding the scoring functions and update speed.
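The choice and memory dynamics above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names are hypothetical, and `example_scores` uses an assumed scoring family $s_i(v) = (1 - \lambda) v_i + \lambda$, chosen only because its range lies in $[\lambda, 1]$.

```python
from itertools import combinations

import numpy as np


def choice_distribution(menu, scores):
    """p_t(i; K_t, v_{t-1}): scores normalized over the menu, zero elsewhere."""
    p = np.zeros(len(scores))
    idx = list(menu)
    p[idx] = scores[idx] / scores[idx].sum()
    return p


def mixed_choice_distribution(menu_probs, scores):
    """Choice distribution induced by a distribution x_t over menus.

    menu_probs maps each menu (a tuple of item indices) to its probability.
    """
    p = np.zeros(len(scores))
    for menu, prob in menu_probs.items():
        p += prob * choice_distribution(menu, scores)
    return p


def memory_update(v_prev, p, theta_t):
    """v_t = (1 - theta_t) * v_{t-1} + theta_t * p_t."""
    return (1.0 - theta_t) * v_prev + theta_t * p


def example_scores(v, lam):
    """Hypothetical scoring functions s_i(v) = (1 - lam) * v_i + lam."""
    return (1.0 - lam) * v + lam


if __name__ == "__main__":
    n, k, lam = 4, 2, 0.5
    v = np.full(n, 1.0 / n)  # v_0 = uniform
    menus = list(combinations(range(n), k))
    x_t = {menu: 1.0 / len(menus) for menu in menus}  # uniform over menus
    p = mixed_choice_distribution(x_t, example_scores(v, lam))
    print(memory_update(v, p, theta_t=0.5))
```

Because both the per-menu choice distributions and the memory update are convex combinations, $v_t$ remains in $\Delta(n)$ whenever $v_{t-1}$ does.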

While the prior work considers a bandit version of the problem with unknown dynamics, here we consider a full-feedback deterministic variant of the problem for simplicity, which further allows us to circumvent barriers posed by uncertainty (Agarwal and Brown, 2022, 2023) and relax structural assumptions (e.g. on $\theta_t$ or $s_i$). We can cast this as an instance of our framework by taking $\mathcal{X} = \Delta(\binom{n}{k})$ and $\mathcal{Y} = \operatorname{EIRD}$, where $D$ expresses updates to the memory vector. We assume $v_0 = \mathbf{u}_n$, and we reparameterize to run our algorithm over $\Delta(n)$. We optimize surrogate losses $f^*_t(v_t)$, and bound excess regret from $f_t(p_t)$.

Theorem 7 (Regret Minimization over $\operatorname{EIRD}$).

For $\lambda > \frac{k-1}{n-1}$, the dynamics for Adaptive Recommendations over $\operatorname{EIRD}$ are $\theta$-locally controllable, and NestedOCO obtains regret $O(\sqrt{T\theta^{-1}})$.

In Agarwal and Brown (2023), a property for scoring functions is considered which enables regret minimization over a potentially much larger set of distributions than $\operatorname{EIRD}$. A scoring function $s_i : \Delta(n) \rightarrow [\frac{\lambda}{\sigma}, 1]$ is said to be $(\sigma, \lambda)$-scale-bounded for $\sigma > 1$ if, for all $v \in \Delta(n)$, we have that

$$\sigma^{-1}((1 - \lambda) v_i + \lambda) \leq s_i(v) \leq \sigma((1 - \lambda) v_i + \lambda).$$

The set considered is the $\phi$-smoothed simplex $\Delta^{\phi}(n) = \{(1 - \phi) v + \phi \mathbf{u}_n : v \in \Delta(n)\}$, for $\phi = \Theta(k \lambda \sigma^2)$, where it is shown that $\operatorname{IRD}(v)$ contains a ball around $v$ for $v \in \Delta^{\phi}(n)$. We take $\mathcal{Y} = \Delta^{\phi}(n)$, which satisfies local controllability, and optimize over $f_t^*(v_t)$ with NestedOCO.
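As a small illustration of these two definitions, the Python sketch below maps a point into the $\phi$-smoothed simplex and tests the scale-bounded inequality on sampled memory vectors. All helper names are hypothetical, the scoring family `scaled_score` is an assumption chosen for the example, and a sampled check can only refute scale-boundedness, not prove it.

```python
import numpy as np


def smooth(v, phi):
    """Map v in the simplex to (1 - phi) * v + phi * u_n."""
    n = len(v)
    return (1.0 - phi) * v + phi * np.full(n, 1.0 / n)


def in_smoothed_simplex(w, phi, tol=1e-9):
    """For 0 <= phi < 1: w is in the phi-smoothed simplex iff it sums to 1
    and every coordinate is at least phi / n."""
    n = len(w)
    return abs(w.sum() - 1.0) <= tol and w.min() >= phi / n - tol


def is_scale_bounded_at(score_fn, i, v, sigma, lam):
    """Check the (sigma, lam)-scale-bounded inequality for item i at one v."""
    base = (1.0 - lam) * v[i] + lam
    return base / sigma <= score_fn(v) <= sigma * base


def passes_sampled_check(score_fns, sigma, lam, rng, n_samples=1000):
    """Test the inequality for every item on Dirichlet-sampled memory vectors."""
    n = len(score_fns)
    for v in rng.dirichlet(np.ones(n), size=n_samples):
        for i, fn in enumerate(score_fns):
            if not is_scale_bounded_at(fn, i, v, sigma, lam):
                return False
    return True


def scaled_score(i, c, lam):
    """Hypothetical scoring function s_i(v) = c * ((1 - lam) * v_i + lam)."""
    return lambda v: c * ((1.0 - lam) * v[i] + lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fns = [scaled_score(i, 0.8, 0.2) for i in range(3)]
    print(passes_sampled_check(fns, sigma=1.5, lam=0.2, rng=rng))
    print(in_smoothed_simplex(smooth(np.array([1.0, 0.0, 0.0]), 0.3), 0.3))
```

The membership test uses the fact that $w = (1 - \phi) v + \phi \mathbf{u}_n$ for some $v \in \Delta(n)$ exactly when $(w - \phi \mathbf{u}_n)/(1 - \phi)$ is itself a point of the simplex.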

Theorem 8 (Regret Minimization over $\Delta^{\phi}(n)$).

For $(\sigma, \lambda)$-scale-bounded scoring functions $s_i$, for any $\lambda > 0$ and $\sigma > 1$, the dynamics for Adaptive Recommendations over $\Delta^{\phi}(n)$ are $\Omega(\theta \lambda \phi)$-locally controllable, and NestedOCO obtains regret $O(\sqrt{T(\theta \lambda \phi)^{-1}})$.

4.3 Adaptive Pricing

Here we consider an Adaptive Pricing problem for real-valued goods, formulated as a dynamic extension of the setting of Roth et al. (2015) where purchase history and consumption affect demand. In each round we set a per-unit price vector $p_t \in \mathbb{R}_+^n$, and an agent buys some bundle of goods $x_t \in \mathbb{R}_+^n$, yielding us a reward $\langle p_t, x_t \rangle - c_t(x_t)$, where our production cost function $c_t$ at each round is convex and $L_c$-Lipschitz, and may be chosen adversarially.

Departing from Roth et al. (2015), we consider an agent who maintains goods reserves $y_{t-1} \in \mathbb{R}_{\geq 0}^n$ and consumes an adversarially chosen fraction $\theta_t \in [\theta, 1]$ of every good’s reserve at each round (for some $\theta > 0$). The agent then chooses a bundle $x_t$ to maximize their utility $g(p_t, x_t, y_t) = v(y_t) - \langle p_t, x_t \rangle$, where $y_t = (1 - \theta_t) y_{t-1} + x_t$ is their updated reserve bundle.
We make several regularity assumptions on the agent’s valuation function $v : \mathbb{R}_+^n \rightarrow \mathbb{R}_+$, all of which are satisfied by several classically studied utility families (which we discuss in Appendix 4.3). Notably, we assume that $v$ is strictly concave and increasing, and homogeneous; the range is bounded under rationality.

Our aim will be to set prices which allow us to compete with the best stable reserve policy, i.e. against any pricing policy where the agent maintains the same reserve bundle $y_t = y^*$ at each round for some $y^*$ regardless of $\theta_t$. We take an appropriate convex set of such bundles as our state space, for which we show that local controllability holds. Observe that to induce a purchase of $x_t = \theta_t y_{t-1}$, it suffices to set prices $p_t = \nabla v(y_{t-1})$, as we then have that $\nabla_{x_t}(v((1 - \theta_t) y_{t-1} + x_t) - \langle p_t, x_t \rangle) = \mathbf{0}$ at $x_t = \theta_t y_{t-1}$.
By homogeneity of $v$, we also have that $\langle \nabla v(y_t), \theta_t y_t \rangle = \theta_t k \cdot v(y_t)$ for some $k$, and we show that optimization via the concave surrogate rewards

$$f^*_t(y_t) = \theta_t k \cdot v(y_t) - c_t(\theta_t y_t)$$

will closely track our true rewards $f_t(p_t, x_t) = \langle p_t, x_t \rangle - c_t(x_t)$. While neither our true nor surrogate rewards will be Lipschitz, we extend NestedOCO to obtain sublinear regret over Hölder continuous losses by appropriately calibrating our step size (which may be of independent interest).
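The two identities used above can be checked numerically. The sketch below assumes, purely for illustration, a Cobb-Douglas valuation $v(y) = \prod_i y_i^{a_i}$ with $a_i > 0$ and $k = \sum_i a_i < 1$ (homogeneous of degree $k$, increasing, and strictly concave on the positive orthant); the paper does not commit to this family here, and the function names and cost function are hypothetical.

```python
import numpy as np


def valuation(y, a):
    """Assumed Cobb-Douglas valuation v(y) = prod_i y_i ** a_i."""
    return float(np.prod(y ** a))


def grad_valuation(y, a):
    """Gradient of the Cobb-Douglas valuation: dv/dy_i = a_i * v(y) / y_i."""
    return a * valuation(y, a) / y


def agent_foc(x, y_prev, theta_t, p, a):
    """Gradient in x of the agent's utility v((1 - theta_t) * y_prev + x) - <p, x>."""
    return grad_valuation((1.0 - theta_t) * y_prev + x, a) - p


def surrogate_reward(y, theta_t, a, cost):
    """f*_t(y) = theta_t * k * v(y) - c_t(theta_t * y), with k = sum(a)."""
    k = float(a.sum())
    return theta_t * k * valuation(y, a) - cost(theta_t * y)


if __name__ == "__main__":
    a = np.array([0.2, 0.3, 0.1])       # exponents, k = 0.6
    y = np.array([1.0, 2.0, 0.5])       # stable reserve bundle y*
    theta_t = 0.4
    p = grad_valuation(y, a)            # prices p_t = grad v(y_{t-1})
    x = theta_t * y                     # purchase that restores the reserve
    cost = lambda b: 0.1 * np.linalg.norm(b)  # a convex, Lipschitz cost
    print(agent_foc(x, y, theta_t, p, a))     # approximately the zero vector
    print(p @ x - cost(x), surrogate_reward(y, theta_t, a, cost))
```

When the reserve is held at $y^*$, the true reward $\langle p_t, x_t \rangle - c_t(x_t)$ and the surrogate coincide in this example, which is the sense in which the surrogate tracks the true reward along stable reserve policies.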

Theorem 9 (Regret Minimization over Stable Reserve Policies).

For any $\theta > 0$, the dynamics for Adaptive Pricing are $\theta$-locally controllable, and NestedOCO obtains regret $o(T\theta^{-1})$ with respect to the best stable reserve policy.

4.4 Steering Learners in Online Games

A recent line of work (Deng et al., 2019; Mansour et al., 2022; Brown et al., 2023) explores maximizing rewards in a repeated game against a no-regret learner, and Anagnostides et al. (2023) study no-regret dynamics in time-varying games. We consider these questions in unison, and aim to optimize reward against a no-regret learner for game matrices chosen adversarially and online.

Consider adversarial sequences of two-player $m \times n$ bimatrix games $(A_t, B_t)$, where $m > n$; we assume that the convex hull of the rows of each $B_t$ contains the unit ball. As Player A, we choose strategies $x_t \in \Delta(m)$ each round to maximize our reward against Player B, who chooses their strategies $y_t \in \Delta(n)$ according to a no-regret algorithm (in particular, online projected gradient descent). The game $(A_t, B_t)$ is only revealed after both players have chosen strategies for round $t$. Our aim here is to illustrate the feasibility of steering the opponent’s trajectory, and so we consider games where Player A’s reward is predominantly a function only of Player B’s actions.
We assume that $\lVert x A_t - x A^*_t \rVert \leq \delta_t$ for any $x \in \Delta(m)$, where each $A^*_t$ is a matrix with identical rows, and that per-round changes to $B_t$ are bounded, with $\lVert x B_t - x B_{t-1} \rVert \leq \epsilon_t$ for any $x \in \Delta(m)$. We measure the regret of an algorithm $\mathcal{A}$ with respect to any profile $(x, y) \in \Delta(m) \times \Delta(n)$, where

$$\operatorname{Reg}_T(\mathcal{A}) = \max_{(x, y) \in \Delta(m) \times \Delta(n)} \sum_{t=1}^{T} x A_t y - x_t A_t y_t.$$
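This benchmark is easy to evaluate in hindsight: reading the sum as covering both terms, the realized reward does not depend on $(x, y)$, and $x (\sum_t A_t) y$ is bilinear, so its maximum over $\Delta(m) \times \Delta(n)$ is attained at a pair of pure strategies, i.e. at the largest entry of $\sum_t A_t$. The following Python sketch computes the regret this way; the function name and the toy game are illustrative assumptions.

```python
import numpy as np


def regret(A_seq, x_seq, y_seq):
    """Regret against the best fixed profile (x, y) in hindsight.

    The maximum of the bilinear form x @ A_sum @ y over a product of
    simplices is attained at vertices, so it equals the largest entry
    of A_sum.
    """
    A_sum = np.sum(A_seq, axis=0)
    best_fixed = A_sum.max()
    realized = sum(x @ A @ y for A, x, y in zip(A_seq, x_seq, y_seq))
    return float(best_fixed - realized)


if __name__ == "__main__":
    A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # a 3 x 2 game
    x = np.array([1.0, 0.0, 0.0])
    y_bad = np.array([0.0, 1.0])
    print(regret([A, A], [x, x], [y_bad, y_bad]))
```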

When Player B plays OGD with step size $\theta = \Theta(T^{-1/2})$, their strategy updates each round as

$$y_{t+1} = \Pi_{\Delta(n)}\left(y_t + \theta (x_t B_t)\right),$$

with $y_1 = \mathbf{u}_n$, and yields regret $O(\sqrt{T})$ for Player B with respect to any $y \in \Delta(n)$ for the loss sequence $\{x_t B_t : t \in [T]\}$. To cast this in our framework, we consider $\Delta(n) = \mathcal{Y}$ as our state space, where we select actions $x_{t-1}$ to induce desired updates to $y_t$ and optimize over the surrogate losses $\{\mathbf{u}_m A^*_t y_t : t \in [T]\}$. While we do not see $B_t$ prior to choosing each $x_t$, we instead select actions according to the dynamics resulting from $B_{t-1}$, treat the resulting update errors as adversarial disturbances, and run NestedOCO-UD, as the dynamics are strongly locally controllable.
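Player B's update can be sketched as follows. The projection $\Pi_{\Delta(n)}$ is implemented here with the standard sort-based Euclidean projection onto the simplex, which is one common choice and an assumption on our part, since the text does not specify how the projection is computed; the function names are hypothetical.

```python
import numpy as np


def project_simplex(w):
    """Euclidean projection of w onto the probability simplex (sort-based)."""
    n = len(w)
    u = np.sort(w)[::-1]                      # coordinates in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, n + 1)
    cond = u - (css - 1.0) / ks > 0           # holds for the first rho indices
    rho = ks[cond][-1]
    tau = (css[rho - 1] - 1.0) / rho          # shift making the result sum to 1
    return np.maximum(w - tau, 0.0)


def ogd_step(y, x, B, theta):
    """One update y_{t+1} = Proj(y_t + theta * (x_t B_t)) for Player B."""
    return project_simplex(y + theta * (x @ B))


if __name__ == "__main__":
    B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # a 3 x 2 payoff matrix
    y = np.full(2, 0.5)                                   # y_1 = uniform
    x = np.array([1.0, 0.0, 0.0])                         # Player A's strategy
    print(ogd_step(y, x, B, theta=0.1))
```

From Player A's side, choosing $x_t$ selects the direction $x_t B_t$ in which $y_t$ moves, which is what makes steering possible when the rows of $B_t$ span enough directions.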

Theorem 10 (Regret Minimization in Online Games).

For $\theta = \Theta(T^{-1/2})$, repeated play against OGD in online $m \times n$ games can be cast as a $\theta$-strongly locally controllable instance of online control with nonlinear dynamics, for which NestedOCO-UD obtains regret $O(\sqrt{T} + \sum_t (\delta_t + \epsilon_t))$.

References

  • Abernethy et al. (2008) Jacob D. Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Annual Conference Computational Learning Theory, 2008. URL https://api.semanticscholar.org/CorpusID:8547150.
  • Agarwal and Brown (2022) Arpit Agarwal and William Brown. Diversified recommendations for agents with adaptive preferences. In Advances in Neural Information Processing Systems, volume 35, 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/a75db7d2ee1e4bee8fb819979b0a6cad-Paper-Conference.pdf.
  • Agarwal and Brown (2023) Arpit Agarwal and William Brown. Online recommendations for agents with discounted adaptive preferences, 2023.
  • Agarwal et al. (2019a) Naman Agarwal, Brian Bullins, Elad Hazan, Sham M. Kakade, and Karan Singh. Online control with adversarial disturbances, 2019a.
  • Agarwal et al. (2019b) Naman Agarwal, Elad Hazan, and Karan Singh. Logarithmic regret for online control. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019b. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/78719f11fa2df9917de3110133506521-Paper.pdf.
  • Agrawal et al. (2023) Shipra Agrawal, Yiding Feng, and Wei Tang. Dynamic pricing and learning with bayesian persuasion, 2023.
  • Ahmadi et al. (2023) Saba Ahmadi, Avrim Blum, and Kunhe Yang. Fundamental bounds on online strategic classification, 2023.
  • Alcantara-Jiménez and Clempner (2020) Guillermo Alcantara-Jiménez and Julio B. Clempner. Repeated stackelberg security games: Learning with incomplete state information. Reliability Engineering & System Safety, 195:106695, 2020. ISSN 0951-8320. doi: https://doi.org/10.1016/j.ress.2019.106695. URL https://www.sciencedirect.com/science/article/pii/S0951832019304478.
  • Anagnostides et al. (2022) Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, and Tuomas Sandholm. Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 736–749, 2022.
  • Anagnostides et al. (2023) Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, and Tuomas Sandholm. On the convergence of no-regret learning dynamics in time-varying games, 2023.
  • Anava et al. (2014) Oren Anava, Elad Hazan, and Shie Mannor. Online convex optimization against adversaries with memory and application to statistical arbitrage, 2014.
  • Aoki (1974) Masanao Aoki. Local Controllability of a Decentralized Economic System. The Review of Economic Studies, 41(1):51–63, 01 1974. ISSN 0034-6527. doi: 10.2307/2296398. URL https://doi.org/10.2307/2296398.
  • Balcan et al. (2015) Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, page 61–78, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450334105. doi: 10.1145/2764468.2764478. URL https://doi.org/10.1145/2764468.2764478.
  • Barbero-Liñán and Jakubczyk (2013) M. Barbero-Liñán and B. Jakubczyk. Second order conditions for optimality and local controllability of discrete-time systems, 2013.
  • Blum et al. (2008) Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 373–382, 2008.
  • Blum et al. (2014) Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper_files/paper/2014/file/cc1aa436277138f61cda703991069eaf-Paper.pdf.
  • Boscain et al. (2021) Ugo Boscain, Daniele Cannarsa, Valentina Franceschi, and Mario Sigalotti. Local controllability does imply global controllability, 2021.
  • Braverman et al. (2017) Mark Braverman, Jieming Mao, Jon Schneider, and S. Matthew Weinberg. Selling to a no-regret buyer. CoRR, abs/1711.09176, 2017. URL http://arxiv.org/abs/1711.09176.
  • Brown et al. (2022) Gavin Brown, Shlomi Hod, and Iden Kalemaj. Performative prediction in a stateful world, 2022.
  • Brown et al. (2023) William Brown, Jon Schneider, and Kiran Vodrahalli. Is learning in games good for the learners?, 2023.
  • Cassel et al. (2022) Asaf Cassel, Alon Cohen, and Tomer Koren. Efficient online linear control with stochastic convex costs and unknown dynamics, 2022.
  • Chen et al. (2022) Xinyi Chen, Edgar Minasyan, Jason D. Lee, and Elad Hazan. Provable regret bounds for deep online learning and control, 2022.
  • Cohen et al. (2018) Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar. Online linear quadratic control. CoRR, abs/1806.07104, 2018. URL http://arxiv.org/abs/1806.07104.
  • Collina et al. (2023) Natalie Collina, Eshwar Ram Arunachaleswaran, and Michael Kearns. Efficient stackelberg strategies for finitely repeated games. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’23, page 643–651, Richland, SC, 2023. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9781450394321.
  • Daskalakis and Syrgkanis (2015) Constantinos Daskalakis and Vasilis Syrgkanis. Learning in auctions: Regret is hard, envy is easy. CoRR, abs/1511.01411, 2015. URL http://arxiv.org/abs/1511.01411.
  • Dean and Morgenstern (2022) Sarah Dean and Jamie Morgenstern. Preference dynamics under personalized recommendations, 2022.
  • Deng et al. (2019) Yuan Deng, Jon Schneider, and Balasubramanian Sivan. Strategizing against no-regret learners, 2019.
  • Dong et al. (2018) Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, EC ’18, page 55–70, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450358293. doi: 10.1145/3219166.3219193. URL https://doi.org/10.1145/3219166.3219193.
  • Feng et al. (2019) Zhe Feng, Okke Schrijvers, and Eric Sodomka. Online learning for measuring incentive compatibility in ad auctions. CoRR, abs/1901.06808, 2019. URL http://arxiv.org/abs/1901.06808.
  • Flaxman et al. (2004) Abraham Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. CoRR, cs.LG/0408007, 2004. URL http://arxiv.org/abs/cs.LG/0408007.
  • Flaxman et al. (2016) Seth Flaxman, Sharad Goel, and Justin M. Rao. Filter Bubbles, Echo Chambers, and Online News Consumption. Public Opinion Quarterly, 80(S1):298–320, 03 2016. ISSN 0033-362X. doi: 10.1093/poq/nfw006. URL https://doi.org/10.1093/poq/nfw006.
  • Flokas et al. (2019) Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, and Georgios Piliouras. Poincaré recurrence, cycles and spurious equilibria in gradient-descent-ascent for non-convex non-concave zero-sum games, 2019.
  • Gaitonde et al. (2021) Jason Gaitonde, Jon M. Kleinberg, and Éva Tardos. Polarization in geometric opinion dynamics. In Péter Biró, Shuchi Chawla, and Federico Echenique, editors, EC ’21: The 22nd ACM Conference on Economics and Computation, Budapest, Hungary, July 18-23, 2021, pages 499–519. ACM, 2021.
  • Golrezaei et al. (2020) Negin Golrezaei, Adel Javanmard, and Vahab S. Mirrokni. Dynamic incentive-aware learning: Robust pricing in contextual auctions. CoRR, abs/2002.11137, 2020. URL https://arxiv.org/abs/2002.11137.
  • Gradu et al. (2022) Paula Gradu, Elad Hazan, and Edgar Minasyan. Adaptive regret for control of time-varying dynamics, 2022.
  • Hardt et al. (2015) Moritz Hardt, Nimrod Megiddo, Christos H. Papadimitriou, and Mary Wootters. Strategic classification. CoRR, abs/1506.06980, 2015. URL http://arxiv.org/abs/1506.06980.
  • Hartline et al. (2015a) Jason Hartline, Vasilis Syrgkanis, and Eva Tardos. No-regret learning in bayesian games. Advances in Neural Information Processing Systems, 28, 2015a.
  • Hartline et al. (2015b) Jason D. Hartline, Vasilis Syrgkanis, and Éva Tardos. No-regret learning in repeated bayesian games. CoRR, abs/1507.00418, 2015b. URL http://arxiv.org/abs/1507.00418.
  • Hazan (2021) Elad Hazan. Introduction to online convex optimization, 2021.
  • Hazan and Levy (2014) Elad Hazan and Kfir Levy. Bandit convex optimization: Towards tight bounds. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper_files/paper/2014/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
  • Hazan and Singh (2022) Elad Hazan and Karan Singh. Introduction to online nonstochastic control, 2022.
  • Hazla et al. (2019) Jan Hazla, Yan Jin, Elchanan Mossel, and Govind Ramnarayan. A geometric model of opinion polarization. CoRR, abs/1910.05274, 2019.
  • Jagadeesan et al. (2022a) Meena Jagadeesan, Nikhil Garg, and Jacob Steinhardt. Supply-side equilibria in recommender systems, 2022a.
  • Jagadeesan et al. (2022b) Meena Jagadeesan, Tijana Zrnic, and Celestine Mendler-Dünner. Regret minimization with performative feedback. CoRR, abs/2202.00628, 2022b. URL https://arxiv.org/abs/2202.00628.
  • Jia et al. (2014) Liyan Jia, Lang Tong, and Qing Zhao. An online learning approach to dynamic pricing for demand response, 2014.
  • Kakade et al. (2020) Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, and Wen Sun. Information theoretic regret bounds for online nonlinear control, 2020.
  • Kanoria and Nazerzadeh (2020) Yash Kanoria and Hamid Nazerzadeh. Dynamic reserve prices for repeated auctions: Learning from bids. CoRR, abs/2002.07331, 2020. URL https://arxiv.org/abs/2002.07331.
  • Kuhn and Wohltmann (1989) H. Kuhn and H.-W. Wohltmann. Controllability of economic systems under alternative expectations hypotheses—the discrete case. Computers & Mathematics with Applications, 18(6):617–628, 1989. ISSN 0898-1221. doi: https://doi.org/10.1016/0898-1221(89)90112-0. URL https://www.sciencedirect.com/science/article/pii/0898122189901120.
  • Kumar et al. (2022) Raunak Kumar, Sarah Dean, and Robert D. Kleinberg. Online convex optimization with unbounded memory, 2022.
  • Lale et al. (2021) Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, and Anima Anandkumar. Model learning predictive control in nonlinear dynamical systems. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 757–762, 2021. doi: 10.1109/CDC45484.2021.9683670.
  • Lauffer et al. (2022) Niklas Lauffer, Mahsa Ghasemi, Abolfazl Hashemi, Yagiz Savas, and Ufuk Topcu. No-regret learning in dynamic stackelberg games, 2022.
  • Letchford et al. (2009) Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In Algorithmic Game Theory, 2009. URL https://api.semanticscholar.org/CorpusID:1795572.
  • Luo et al. (2022) Wenhao Luo, Wen Sun, and Ashish Kapoor. Sample-efficient safe learning for online nonlinear control with control barrier functions, 2022.
  • Lykouris et al. (2021) Thodoris Lykouris, Max Simchowitz, Alex Slivkins, and Wen Sun. Corruption-robust exploration in episodic reinforcement learning. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 3242–3245. PMLR, 15–19 Aug 2021. URL https://proceedings.mlr.press/v134/lykouris21a.html.
  • Mansour et al. (2022) Yishay Mansour, Mehryar Mohri, Jon Schneider, and Balasubramanian Sivan. Strategizing against learners in bayesian games, 2022.
  • Mehta et al. (2007) Aranyak Mehta, Amin Saberi, Umesh Vazirani, and Vijay Vazirani. Adwords and generalized online matching. J. ACM, 54(5):22–es, oct 2007. ISSN 0004-5411. doi: 10.1145/1284320.1284321. URL https://doi.org/10.1145/1284320.1284321.
  • Mendler-Dünner et al. (2020) Celestine Mendler-Dünner, Juan Perdomo, Tijana Zrnic, and Moritz Hardt. Stochastic optimization for performative prediction. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 4929–4939. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/33e75ff09dd601bbe69f351039152189-Paper.pdf.
  • Miller et al. (2021) John Miller, Juan C. Perdomo, and Tijana Zrnic. Outside the echo chamber: Optimizing the performative risk. CoRR, abs/2102.08570, 2021. URL https://arxiv.org/abs/2102.08570.
  • Minasyan et al. (2022) Edgar Minasyan, Paula Gradu, Max Simchowitz, and Elad Hazan. Online control of unknown time-varying dynamical systems, 2022.
  • Morgenstern and Roughgarden (2016) Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. CoRR, abs/1604.03171, 2016. URL http://arxiv.org/abs/1604.03171.
  • Mussi et al. (2022) Marco Mussi, Gianmarco Genalti, Alessandro Nuara, Francesco Trovò, Marcello Restelli, and Nicola Gatti. Dynamic pricing with volume discounts in online settings, 2022.
  • Muthirayan and Khargonekar (2022) Deepan Muthirayan and Pramod P. Khargonekar. Online learning robust control of nonlinear dynamical systems, 2022.
  • Nedelec et al. (2020) Thomas Nedelec, Clement Calauzenes, Vianney Perchet, and Noureddine El Karoui. Robust stackelberg buyers in repeated auctions. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1342–1351. PMLR, 26–28 Aug 2020. URL https://proceedings.mlr.press/v108/nedelec20a.html.
  • Neu and Olkhovskaya (2021) Gergely Neu and Julia Olkhovskaya. Online learning in mdps with linear function approximation and bandit feedback. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 10407–10417. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/5631e6ee59a4175cd06c305840562ff3-Paper.pdf.
  • Peng et al. (2019) Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. In AAAI Conference on Artificial Intelligence, 2019. URL https://api.semanticscholar.org/CorpusID:92982174.
  • Perdomo et al. (2020) Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt. Performative prediction. CoRR, abs/2002.06673, 2020. URL https://arxiv.org/abs/2002.06673.
  • Piliouras and Yu (2022) Georgios Piliouras and Fang-Yi Yu. Multi-agent performative prediction: From global stability and optimality to chaos, 2022.
  • Roth et al. (2015) Aaron Roth, Jonathan R. Ullman, and Zhiwei Steven Wu. Watch and learn: Optimizing from revealed preferences feedback. CoRR, abs/1504.01033, 2015. URL http://arxiv.org/abs/1504.01033.
  • Roughgarden (2015) Tim Roughgarden. Intrinsic robustness of the price of anarchy. J. ACM, 62(5), nov 2015. ISSN 0004-5411. doi: 10.1145/2806883. URL https://doi.org/10.1145/2806883.
  • Shalev-Shwartz and Singer (2006) Shai Shalev-Shwartz and Yoram Singer. Online learning meets optimization in the dual. In Proceedings of the 19th Annual Conference on Learning Theory, COLT’06, page 423–437, Berlin, Heidelberg, 2006. Springer-Verlag. ISBN 3540352945. doi: 10.1007/11776420_32. URL https://doi.org/10.1007/11776420_32.
  • Shen et al. (2023) Lingqing Shen, Nam Ho-Nguyen, and Fatma Kılınç-Karzan. An online convex optimization-based framework for convex bilevel optimization. Mathematical Programming, 198(2):1519–1582, 04 2023. ISSN 1436-4646. doi: 10.1007/s10107-022-01894-5. URL https://doi.org/10.1007/s10107-022-01894-5.
  • Simchowitz et al. (2020) Max Simchowitz, Karan Singh, and Elad Hazan. Improper learning for non-stochastic control. CoRR, abs/2001.09254, 2020. URL https://arxiv.org/abs/2001.09254.
  • Yue et al. (2012) Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012. ISSN 0022-0000. doi: https://doi.org/10.1016/j.jcss.2011.12.028. URL https://www.sciencedirect.com/science/article/pii/S0022000012000281. JCSS Special Issue: Cloud Computing 2011.
  • Zhang et al. (2023) Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, and Tuomas Sandholm. Steering no-regret learners to optimal equilibria, 2023.
  • Zhang et al. (2021) Xuezhou Zhang, Yiding Chen, Jerry Zhu, and Wen Sun. Corruption-robust offline reinforcement learning. CoRR, abs/2106.06630, 2021. URL https://arxiv.org/abs/2106.06630.
  • Zinkevich (2003) Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. 2, 04 2003.
  • Zrnic et al. (2021a) Tijana Zrnic, Eric Mazumdar, S. Shankar Sastry, and Michael I. Jordan. Who leads and who follows in strategic classification? CoRR, abs/2106.12529, 2021a. URL https://arxiv.org/abs/2106.12529.
  • Zrnic et al. (2021b) Tijana Zrnic, Eric Mazumdar, Shankar Sastry, and Michael Jordan. Who leads and who follows in strategic classification? In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 15257–15269. Curran Associates, Inc., 2021b. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/812214fb8e7066bfa6e32c626c2c688b-Paper.pdf.

Appendix A Omitted Proofs for Section 2

  • Proof of Proposition 1. Without loss of generality, assume $\alpha \leq \beta/2$ and that $T$ is even. Let $f_t = \lVert y_t - y \rVert$ for each $t$. Consider any round $t$ where $y_{t-1} \in \mathcal{B}_{\alpha}(y)$; then, for all actions $x_t$, we have that $y_t \notin \mathcal{B}_{\alpha}(y)$, as $\mathcal{B}_{\alpha}(y) \subseteq \mathcal{B}_{\beta}(y_{t-1})$; as such, we incur loss $f_t(y_t) \geq \alpha$ in round $t$. Now suppose $y_{t-1} \notin \mathcal{B}_{\alpha}(y)$; then we must have incurred loss $f_{t-1}(y_{t-1}) \geq \alpha$ in round $t-1$. As losses are non-negative, our total loss is at least $\alpha T/2$, as loss $\alpha$ is incurred at least every other round; given that the best fixed state $y^* = y$ incurs total loss $0$, we have that $\operatorname{Reg}_{\mathcal{A}}(T) = \Omega(T)$ for any algorithm $\mathcal{A}$. ∎
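The "every other round" count in the proof of Proposition 1 can be made explicit by pairing consecutive rounds. The following is a sketch of that step, using only quantities defined in the proof (the index $k$ is introduced here for illustration):

```latex
\begin{align*}
f_{2k-1}(y_{2k-1}) + f_{2k}(y_{2k}) &\geq \alpha
  \quad \text{for each } k = 1, \dots, T/2, \\
\sum_{t=1}^{T} f_t(y_t)
  &= \sum_{k=1}^{T/2} \left( f_{2k-1}(y_{2k-1}) + f_{2k}(y_{2k}) \right)
  \geq \frac{\alpha T}{2}.
\end{align*}
```

The first line holds because either $y_{2k-1} \in \mathcal{B}_{\alpha}(y)$, which forces $f_{2k}(y_{2k}) \geq \alpha$, or $y_{2k-1} \notin \mathcal{B}_{\alpha}(y)$, which gives $f_{2k-1}(y_{2k-1}) \geq \alpha$ directly.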

  • Proof of Proposition 2. We begin by observing that for instances $(\mathcal{X}, \mathcal{Y}, D)$, the class of state-targeting policies contains a policy which obtains the reward of the best fixed state up to $O(\sqrt{T\rho^{-1}})$, for sufficiently large $T$. Consider the set $\hat{\mathcal{Y}} = \{y^* \in \mathcal{Y} : \pi(y^*) \geq (T\rho)^{-1/2}\}$. Note that the reward of any $y \in \mathcal{Y}$ is matched by some $y^* \in \hat{\mathcal{Y}}$ up to $O(\sqrt{T\rho^{-1}})$ for any fixed inner radius $r$, outer radius $R$, and Lipschitz constant $L$. For any such $y^*$, note that under the policy $P_{y^*}$ when starting at $y_0 = 0$, the distance between $y_t$ and $y^*$ in each round $t$ is updated to at most:

    \[ \lVert y_t - y^* \rVert \leq \max\left(0, \rho \cdot \pi(y_{t-1})\right). \]

    It is straightforward to see that $\hat{\mathcal{Y}}$ is convex, and so our state $y_t$ will never leave $\hat{\mathcal{Y}}$ on its path to $y^*$; as such, we reach $y^*$ within $O(\sqrt{T\rho^{-1}})$ rounds, after which point our reward exactly tracks that of $y^*$. For some $y^* \in \hat{\mathcal{Y}}$, this yields a regret for $P_{y^*}$ of at most $O(\sqrt{T\rho^{-1}})$ to the best fixed state in $\mathcal{Y}$.

    Next, consider an instance where $\mathcal{X}$ and $\mathcal{Y}$ are both the unit ball in $\mathbb{R}^n$. With $y_0 = 0$, let the dynamics be given by

    \[ y_t = \Pi_{\mathcal{Y}}\left(y_{t-1} + x_t\right). \]

    Observe that this satisfies $\rho$-local controllability for any $\rho \leq 1$, as a ball of radius $\pi(y_{t-1})$ is always feasible around $y_{t-1}$. Let each loss $f_t = \lVert y - p \rVert^2$, for some $p \neq 0$. Immediately we can see that any matrix policy $K \in \mathcal{P}_{\mathcal{K}}$ has regret $\Omega(T)$, as the action $x_t = 0$ will be played in each round. ∎
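The lower-bound instance in the proof of Proposition 2 is easy to simulate. The sketch below makes two assumptions not stated in the proof itself: that a matrix policy plays $x_t = K y_{t-1}$ (linear state feedback), and that $p$ lies in the unit ball, so the fixed state $y = p$ has zero loss. The names `rollout_linear_policy` and `project_unit_ball`, and the specific values of `p`, `K`, and `T`, are hypothetical.

```python
import numpy as np


def project_unit_ball(v):
    """Euclidean projection onto the unit ball."""
    norm = np.linalg.norm(v)
    return v if norm <= 1.0 else v / norm


def rollout_linear_policy(K, p, T):
    """Total loss of x_t = K @ y_{t-1} under y_t = Proj(y_{t-1} + x_t), y_0 = 0."""
    y = np.zeros_like(p)
    total_loss = 0.0
    for _ in range(T):
        x = project_unit_ball(K @ y)    # keep the action inside X (assumed handling)
        y = project_unit_ball(y + x)    # dynamics from the proof
        total_loss += float(np.sum((y - p) ** 2))
    return total_loss


rng = np.random.default_rng(0)
T = 1000
p = np.array([0.5, 0.0, 0.0])           # hypothetical target with 0 < ||p|| <= 1
K = rng.normal(size=(3, 3))             # an arbitrary matrix policy

policy_loss = rollout_linear_policy(K, p, T)
fixed_state_loss = T * float(np.sum((p - p) ** 2))   # loss of the fixed state y = p
regret = policy_loss - fixed_state_loss
print(regret / T)                       # per-round regret stays at ||p||^2
```

Because $y_0 = 0$ is a fixed point of any linear feedback rule, the state never moves, and the per-round regret equals $\lVert p \rVert^2$ regardless of $K$.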

Appendix B Follow the Regularized Leader

Here we state the FTRL algorithm and several of its key properties; see e.g. Hazan (2021) for proofs of Propositions 4 and 5.

Algorithm 2 Follow the Regularized Leader (FTRL)
  Choose a time horizon $T$, step size $\eta$, and $\gamma$-strongly convex regularizer $\psi : \mathcal{Y} \rightarrow \mathbb{R}$
  Let $y_1 = \operatorname{argmin}_{y \in \mathcal{Y}} \psi(y)$
  for $t = 1$ to $T$ do
     Play $y_t$ and observe loss $f_t(y_t)$
     Set $\nabla_t = \nabla f_t(y_t)$
     Set $y_{t+1} = \operatorname{argmin}_{y \in \mathcal{Y}} \left( \eta \cdot \sum_{s=1}^{t} \nabla_s^{\top} y + \psi(y) \right)$
  end for
Proposition 4.

For a $\gamma$-strongly convex regularizer $\psi : \mathcal{Y} \rightarrow \mathbb{R}$ where $\lvert \psi(y) - \psi(y') \rvert \leq G$ for all $y, y' \in \mathcal{Y}$, and for convex $L$-Lipschitz losses $f_1, \ldots, f_T$, the regret of FTRL is bounded by

\[ \operatorname{Reg}_T(\text{FTRL}) \leq \eta \frac{T L^2}{\gamma} + \frac{G}{\eta}. \]
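As a worked step that the statement of Proposition 4 leaves implicit, balancing the two terms of the bound over $\eta$ recovers the familiar $O(\sqrt{T})$ rate. This is a sketch of the unconstrained choice only; the proof of Theorem 1 additionally requires $\eta \frac{L}{\gamma} \leq r\delta\rho$, which may force a different step size.

```latex
\eta = \frac{1}{L}\sqrt{\frac{G\gamma}{T}}
\quad\Longrightarrow\quad
\eta \frac{T L^2}{\gamma} + \frac{G}{\eta}
= L\sqrt{\frac{G T}{\gamma}} + L\sqrt{\frac{G T}{\gamma}}
= 2L\sqrt{\frac{G T}{\gamma}}.
```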
Proposition 5.

Any pair of points $y_t$ and $y_{t+1}$ chosen by FTRL satisfies $\lVert y_{t+1} - y_t \rVert \leq \eta \frac{L}{\gamma}$.
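To make Algorithm 2 and Propositions 4 and 5 concrete, here is a minimal sketch for one special case chosen for illustration: $\mathcal{Y}$ is a Euclidean ball, $\psi(y) = \frac{1}{2}\lVert y \rVert^2$ (so $\gamma = 1$ and $G$ is half the squared radius), and the losses are linear with unit-norm gradients (so $L = 1$). Under these assumptions the argmin step has a closed form, namely the projection of $-\eta \sum_s \nabla_s$ onto the ball. The function names and the random loss sequence are hypothetical.

```python
import numpy as np


def project_ball(v, radius):
    """Euclidean projection onto the ball of the given radius around the origin."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)


def ftrl_linear(grads, eta, radius):
    """FTRL with psi(y) = 0.5 * ||y||^2 (gamma = 1) on a Euclidean ball.

    For linear losses f_t(y) = <g_t, y>, the update
    argmin_y eta * <sum_s g_s, y> + psi(y) is the projection of -eta * sum_s g_s.
    """
    y = np.zeros(grads.shape[1])        # y_1 = argmin of psi over the ball
    grad_sum = np.zeros(grads.shape[1])
    plays = []
    for g in grads:
        plays.append(y)                 # play y_t, then observe gradient g_t
        grad_sum = grad_sum + g
        y = project_ball(-eta * grad_sum, radius)
    return np.array(plays)


rng = np.random.default_rng(0)
T, dim, radius = 200, 3, 1.0
grads = rng.normal(size=(T, dim))
grads /= np.linalg.norm(grads, axis=1, keepdims=True)   # unit norm, so L = 1

L, gamma, G = 1.0, 1.0, 0.5 * radius ** 2
eta = np.sqrt(G * gamma / (T * L ** 2))
plays = ftrl_linear(grads, eta, radius)

# Regret against the best fixed point of the ball (closed form for linear losses).
alg_loss = float(np.sum(plays * grads))
best_loss = -radius * float(np.linalg.norm(grads.sum(axis=0)))
regret = alg_loss - best_loss
bound = eta * T * L ** 2 / gamma + G / eta               # Proposition 4
max_step = float(np.max(np.linalg.norm(np.diff(plays, axis=0), axis=1)))
print(regret <= bound, max_step <= eta * L / gamma + 1e-12)
```

The last line compares the realized regret with the bound of Proposition 4 and the largest consecutive step with the stability bound of Proposition 5.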

Appendix C Analysis for NestedOCO

  • Proof of Theorem 1. First we show that any point chosen by FTRL will be feasible under local controllability, by induction. It is straightforward to see that $\tilde{\mathcal{Y}}$ is convex and $\tilde{\mathcal{Y}} \subseteq \mathcal{Y}$; further, any $y \in \tilde{\mathcal{Y}}$ is bounded away from $\operatorname{bd}(\mathcal{Y})$. By the definition of $\tilde{\mathcal{Y}}$, we have that $y = (1-\delta)y'$ for some $y' \in \mathcal{Y}$. Recall that $\mathcal{B}_{r}(\mathbf{0}) \subseteq \mathcal{Y}$, and note that $\mathcal{B}_{\delta r}(y) = \{y + \delta\hat{y} : \hat{y} \in \mathcal{B}_{r}(\mathbf{0})\}$. Let $y''$ be any point in $\mathcal{B}_{r}(\mathbf{0})$. By convexity of $\mathcal{Y}$, we then have that any point $(1-\delta)y' + \delta y''$ lies in $\mathcal{Y}$, and so for any $y \in \tilde{\mathcal{Y}}$ we have that $\mathcal{B}_{r\delta}(y) \subseteq \mathcal{Y}$. Each $y_{t-1}$ lies in $\tilde{\mathcal{Y}}$, and so we have that $\pi(y_{t-1}) \geq r\delta$; as such, any point $y_t$ in $\mathcal{B}_{r\delta\rho}(y_{t-1}) \subseteq \mathcal{B}_{\rho \cdot \pi(y_{t-1})}(y_{t-1})$ is feasible. Given that $\eta \frac{L}{\gamma} \leq r\delta\rho$, by Proposition 5 we have that $y_t \in \mathcal{B}_{r\delta\rho}(y_{t-1})$ in each round for the chosen point. Each action will be selected by solving for

    \[ \operatorname*{argmin}_{x_t \in \mathcal{X}} \left\lVert D(x_t, y_{t-1}) - y^* \right\rVert^2 \]

    via a call to $\texttt{Oracle}(y_{t-1}, y^*)$. By local controllability, for any target $y^* \in \mathcal{B}_{\rho \cdot \pi(y_{t-1})}(y_{t-1})$ each call is guaranteed to have a solution which achieves an objective of $0$, i.e. one where $D(x_t, y_{t-1}) = y^*$, yielding an exact state update to $y_t = y^*$ as we assume Oracle can solve arbitrary non-convex minimization problems. To bound the regret, first note that for any $y^* \in \mathcal{Y}$, we have
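For intuition about the oracle call, the sketch below instantiates it on the toy dynamics $D(x, y) = \Pi_{\mathcal{Y}}(y + x)$ with unit-ball $\mathcal{X}$ and $\mathcal{Y}$, borrowed from the proof of Proposition 2. This is an illustrative assumption, not the general oracle: for these dynamics the minimizer has a closed form whenever the target is within distance 1 of the current state, whereas the general case may require non-convex optimization. The constants `r`, `rho`, `delta` and the two example states are hypothetical.

```python
import numpy as np


def project_unit_ball(v):
    """Euclidean projection onto the unit ball."""
    norm = np.linalg.norm(v)
    return v if norm <= 1.0 else v / norm


def dynamics(x, y_prev):
    """Toy dynamics D(x, y) = Proj_Y(y + x) with X = Y = unit ball."""
    return project_unit_ball(y_prev + x)


def oracle(y_prev, y_target):
    """Action steering y_prev toward y_target.

    Exact (objective 0) when both states lie in Y and
    ||y_target - y_prev|| <= 1; otherwise only a heuristic.
    """
    return project_unit_ball(y_target - y_prev)


r, rho, delta = 1.0, 1.0, 0.1           # illustrative constants
y_prev = np.array([0.9, 0.0])           # lies in (1 - delta) * Y
y_target = np.array([0.85, 0.05])       # also lies in (1 - delta) * Y

step = float(np.linalg.norm(y_target - y_prev))
x = oracle(y_prev, y_target)
y_next = dynamics(x, y_prev)
objective = float(np.sum((y_next - y_target) ** 2))
print(step <= r * delta * rho, objective)
```

Here the target sits within $r\delta\rho$ of the current state, as the stability bound of Proposition 5 guarantees in the proof, so the oracle reaches it exactly up to floating-point error.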

    \[ \sum_{t=1}^{T} f_t(y_t) \leq \eta \frac{T L^2}{\gamma} + \frac{G}{\eta} + \sum_{t=1}^{T} f_t((1-\delta)y^*) \]

    by Proposition 4, as $(1-\delta)y^* \in \tilde{\mathcal{Y}}$ for any $y^* \in \mathcal{Y}$. Then, observe that for any $y^* \in \mathcal{Y}$, we have that

    \begin{align*}
    \sum_{t=1}^{T} f_t((1-\delta)y^*) &\leq \sum_{t=1}^{T} \left( f_t(y^*) + L \left\lVert \delta y^* \right\rVert \right) \\
    &\leq \sum_{t=1}^{T} \left( f_t(y^*) + \delta L R \right).
    \end{align*}

    Combining the previous claims, we have that

    t=1Tft(yt)ft(y)superscriptsubscript𝑡1𝑇subscript𝑓𝑡subscript𝑦𝑡subscript𝑓𝑡superscript𝑦absent\displaystyle\sum_{t=1}^{T}f_{t}(y_{t})-f_{t}(y^{*})\leq∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) - italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_y start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ≤ δTLR+ηTL2γ+Gη𝛿𝑇𝐿𝑅𝜂𝑇superscript𝐿2𝛾𝐺𝜂\displaystyle\;\delta TLR+\eta\frac{TL^{2}}{\gamma}+\frac{G}{\eta}italic_δ italic_T italic_L italic_R + italic_η divide start_ARG italic_T italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_γ end_ARG + divide start_ARG italic_G end_ARG start_ARG italic_η end_ARG
    =\displaystyle== η(1+Rrρ)TL2γ+Gη𝜂1𝑅𝑟𝜌𝑇superscript𝐿2𝛾𝐺𝜂\displaystyle\;\eta\left(1+\frac{R}{r\rho}\right)\frac{TL^{2}}{\gamma}+\frac{G% }{\eta}italic_η ( 1 + divide start_ARG italic_R end_ARG start_ARG italic_r italic_ρ end_ARG ) divide start_ARG italic_T italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_γ end_ARG + divide start_ARG italic_G end_ARG start_ARG italic_η end_ARG
    =\displaystyle==  2(1+Rrρ)TGL2γ21𝑅𝑟𝜌𝑇𝐺superscript𝐿2𝛾\displaystyle\;2\sqrt{\frac{(1+\frac{R}{r\rho})TGL^{2}}{\gamma}}2 square-root start_ARG divide start_ARG ( 1 + divide start_ARG italic_R end_ARG start_ARG italic_r italic_ρ end_ARG ) italic_T italic_G italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_γ end_ARG end_ARG

    upon setting δ=ηLrργ𝛿𝜂𝐿𝑟𝜌𝛾\delta=\eta\frac{L}{r\rho\gamma}italic_δ = italic_η divide start_ARG italic_L end_ARG start_ARG italic_r italic_ρ italic_γ end_ARG and η=Gγ(1+Rrρ)TL2𝜂𝐺𝛾1𝑅𝑟𝜌𝑇superscript𝐿2\eta=\sqrt{\frac{G\gamma}{(1+\frac{R}{r\rho})TL^{2}}}italic_η = square-root start_ARG divide start_ARG italic_G italic_γ end_ARG start_ARG ( 1 + divide start_ARG italic_R end_ARG start_ARG italic_r italic_ρ end_ARG ) italic_T italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG, which yields the theorem. ∎
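As a quick numerical sanity check of the tuning step above, the following Python sketch evaluates the bound $\delta TLR + \eta TL^2/\gamma + G/\eta$ at the stated choices of $\delta$ and $\eta$ and compares it against the closed form. The parameter values are arbitrary illustrative assumptions and are not taken from the paper.

```python
import math

def regret_bound(eta, T, L, G, gamma, R, r, rho):
    """delta*T*L*R + eta*T*L^2/gamma + G/eta, with delta = eta*L/(r*rho*gamma)."""
    delta = eta * L / (r * rho * gamma)
    return delta * T * L * R + eta * T * L**2 / gamma + G / eta

def tuned_eta(T, L, G, gamma, R, r, rho):
    """Step size eta = sqrt(G*gamma / ((1 + R/(r*rho)) * T * L^2))."""
    return math.sqrt(G * gamma / ((1 + R / (r * rho)) * T * L**2))

def closed_form(T, L, G, gamma, R, r, rho):
    """Closed form 2*sqrt((1 + R/(r*rho)) * T * G * L^2 / gamma)."""
    return 2 * math.sqrt((1 + R / (r * rho)) * T * G * L**2 / gamma)

# Illustrative constants only (assumptions for this sketch).
params = dict(T=10_000, L=2.0, G=1.5, gamma=0.5, R=3.0, r=1.0, rho=0.25)

eta_star = tuned_eta(**params)
bound_at_eta_star = regret_bound(eta_star, **params)
bound_closed_form = closed_form(**params)

# Halving or doubling eta should only make the bound worse.
bound_perturbed = min(regret_bound(0.5 * eta_star, **params),
                      regret_bound(2.0 * eta_star, **params))

print(bound_at_eta_star, bound_closed_form, bound_perturbed)
```

The two-term form $\eta A + G/\eta$ is minimized at $\eta = \sqrt{G/A}$, which is why perturbing the step size in either direction increases the value.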

Appendix D Examples and Analysis for Action-Linear Dynamics

As a simple yet general example of dynamics which are both action-linear and locally controllable, consider update rules in which a step is taken by applying a nonsingular matrix transformation to the action, where the matrix can be parameterized by the state, with projection back into $\mathcal{Y}$ if necessary.

Example 1.

Let both $\mathcal{X}$ and $\mathcal{Y}$ be given by the unit ball $\mathcal{B}_1(\mathbf{0})$ in $\mathbb{R}^n$. For any fixed $y$, let the updates from $D(x, y)$ be given by

\begin{align*}
D(x, y) =&\; \Pi_{\mathcal{Y}}\left(y + A_y \cdot x\right),
\end{align*}

where each $A_y$ is a square matrix with minimum absolute eigenvalue $\left\lvert \lambda_n(A_y) \right\rvert \geq \pi(y)\cdot\rho$ for some $\rho > 0$. Then, the instance $(\mathcal{X}, \mathcal{Y}, D)$ is action-linear and satisfies $\rho$-local controllability.

  • Proof

    for Example 1. It is straightforward to see that $D(x, y)$ is action-linear. To show $\rho$-local controllability, let $y^*$ be any point in $\mathcal{B}_{\rho\cdot\pi(y)}(y)$. It suffices to show that there is some $x^* \in \mathcal{X}$ such that $A_y \cdot x^* = y^* - y$. As $A_y$ is non-singular, we can solve for $x^* = A_y^{-1}(y^* - y)$, where $\left\lVert y^* - y \right\rVert \leq \rho\cdot\pi(y)$ and $\left\lvert \lambda_1(A_y^{-1}) \right\rvert \leq \frac{1}{\rho\cdot\pi(y)}$, and so we have that $x^* \in \mathcal{B}_1(\mathbf{0}) = \mathcal{X}$. ∎
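The construction in the proof of Example 1 can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it takes the Euclidean unit ball with $\pi(y) = 1 - \lVert y \rVert$, and it uses a symmetric $A_y$ so that eigenvalue magnitudes coincide with singular values. The helper names (`project_unit_ball`, `steering_action`) are hypothetical.

```python
import numpy as np

def project_unit_ball(v):
    """Euclidean projection onto the unit ball."""
    norm = np.linalg.norm(v)
    return v if norm <= 1.0 else v / norm

def dynamics(x, y, A):
    """D(x, y) = Pi_Y(y + A_y x) for Y the unit ball."""
    return project_unit_ball(y + A @ x)

def steering_action(y, y_star, A):
    """Solve A_y x = y* - y, i.e. x* = A_y^{-1}(y* - y)."""
    return np.linalg.solve(A, y_star - y)

rng = np.random.default_rng(0)
n, rho = 3, 0.5
y = np.array([0.3, -0.2, 0.1])
pi_y = 1.0 - np.linalg.norm(y)  # assumed form of pi(y) on the unit ball

# Symmetric A_y whose smallest eigenvalue magnitude is exactly rho * pi(y).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([rho * pi_y, 1.0, -2.0]) @ Q.T

# A target on the boundary of the local ball B_{rho * pi(y)}(y).
direction = rng.standard_normal(n)
direction /= np.linalg.norm(direction)
y_star = y + rho * pi_y * direction

x_star = steering_action(y, y_star, A)
reached = dynamics(x_star, y, A)
print(np.linalg.norm(x_star), np.linalg.norm(reached - y_star))
```

For non-symmetric matrices the same check would need the smallest singular value of $A_y$ in place of the smallest eigenvalue magnitude.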

We can also extend this to include state-parameterized generalizations of any linear system governed by nonsingular matrices over a bounded-radius state space (for a sufficiently large action space).

Example 2.

Let $\mathcal{Y}$ be given by the radius-$R$ ball $\mathcal{B}_R(\mathbf{0})$ in $\mathbb{R}^n$, and let $\mathcal{X} = \mathcal{B}_{cR}(\mathbf{0})$. For any fixed $y$, let the updates from $D(x, y)$ be given by

\begin{align*}
D(x, y) =&\; \Pi_{\mathcal{Y}}\left(K_y \cdot y + A_y \cdot x\right),
\end{align*}

where both $K_y$ and $A_y$ are square matrices. For any $y$, let $M_y = K_y - I$, and suppose we take $c$ large enough such that $c\cdot\left\lvert \lambda_n(A_y) \right\rvert \geq \left\lvert \lambda_1(M_y) \right\rvert + \pi(y)\cdot\rho$ for some $\rho > 0$. Then, the instance $(\mathcal{X}, \mathcal{Y}, D)$ is action-linear and satisfies $\rho$-local controllability.

  • Proof

    for Example 2. Here, again it is evident that $D(x, y)$ is action-linear, and so it suffices to show that there is some $x^* \in \mathcal{X}$ such that

    \begin{align*}
    K_y \cdot y + A_y \cdot x^* =&\; y + M_y \cdot y + A_y \cdot x^* \\
    =&\; y^*
    \end{align*}

    for any $y^*$ in $\mathcal{B}_{\rho\cdot\pi(y)}(y)$. As in the proof for Example 1, we have that $\left\lVert M_y \cdot y \right\rVert \leq R\cdot\left\lvert \lambda_1(M_y) \right\rvert$, and for large enough $c$ there is some $x^*$ such that $A_y \cdot x^* = \hat{y}$ for any $\hat{y}$ where $\left\lVert \hat{y} \right\rVert \leq R\cdot\left\lvert \lambda_1(M_y) \right\rvert + \pi(y)\cdot\rho$. Thus, any point $y^* \in \mathcal{B}_{R\cdot\left\lvert \lambda_1(M_y) \right\rvert + \pi(y)\cdot\rho}(y + M_y \cdot y)$ is reachable via some $x^*$, and this ball contains the ball $\mathcal{B}_{\pi(y)\cdot\rho}(y)$. ∎
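The role of the action-space scaling $c$ in Example 2 can be seen in a small numerical sketch. The following is illustrative only and makes several assumptions not stated in the text: the Euclidean norm, $\pi(y) = R - \lVert y \rVert$ for the radius-$R$ ball, symmetric $A_y$ and $M_y$, and $R \geq 1$. The helper `random_symmetric` is hypothetical.

```python
import numpy as np

def project_ball(v, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def dynamics(x, y, K, A, R):
    """D(x, y) = Pi_Y(K_y y + A_y x) for Y the radius-R ball."""
    return project_ball(K @ y + A @ x, R)

def random_symmetric(eigs, rng):
    """Symmetric matrix with the prescribed eigenvalues (hypothetical helper)."""
    Q, _ = np.linalg.qr(rng.standard_normal((len(eigs), len(eigs))))
    return Q @ np.diag(eigs) @ Q.T

rng = np.random.default_rng(1)
n, R, rho = 3, 2.0, 0.5
y = np.array([0.8, -0.5, 0.4])
pi_y = R - np.linalg.norm(y)  # assumed form of pi(y) on the radius-R ball

A = random_symmetric(np.array([0.4, 1.0, -1.5]), rng)
M = random_symmetric(np.array([0.3, -0.1, 0.2]), rng)
K = np.eye(n) + M

# Smallest scaling c satisfying c * |lambda_n(A_y)| >= |lambda_1(M_y)| + pi(y) * rho.
lam_min_A = np.min(np.abs(np.linalg.eigvalsh(A)))
lam_max_M = np.max(np.abs(np.linalg.eigvalsh(M)))
c = (lam_max_M + pi_y * rho) / lam_min_A

# A target on the boundary of B_{rho * pi(y)}(y), and the action that reaches it.
direction = rng.standard_normal(n)
direction /= np.linalg.norm(direction)
y_star = y + rho * pi_y * direction
x_star = np.linalg.solve(A, y_star - K @ y)
reached = dynamics(x_star, y, K, A, R)
print(c, np.linalg.norm(x_star), np.linalg.norm(reached - y_star))
```

Here the action must first cancel the drift $M_y \cdot y$ before steering toward $y^*$, which is why the action ball needs radius $cR$ rather than radius 1.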

Appendix E Algorithms for Adversarial Disturbances

E.1 NestedOCO-BD and Proofs for Theorem 2

We show that it is possible to simulate NestedOCO over the undisturbed states $\hat{y}_t$ under the assumption that the dynamics are $\alpha\rho$-locally controllable for some $\alpha \in (0, 1)$, while retaining sufficient range in the feasible region around $y_t$ to correct for the disturbance $w_{t-1}$ from the previous round. Here, the oracle call for computing $x_t$ in each round is updated to consider the true state $y_{t-1}$.

Algorithm 3 NestedOCO with Adversarial Disturbances (NestedOCO-BD).
  Initialize NestedOCO for $T$ rounds over $(\mathcal{X}, \mathcal{Y}, D)$ for $\alpha\rho$-locally controllable dynamics
  for $t = 1$ to $T$ do
     Let $\hat{y}_t$ be the target state chosen by NestedOCO
     Use $\texttt{Oracle}(y_{t-1}, \hat{y}_t)$ to compute $x_t = \operatorname*{argmin}_{x \in \mathcal{X}} \left\lVert D(x, y_{t-1}) - \hat{y}_t \right\rVert^2$
     Play action $x_t$.
     Observe disturbed state $y_t = \hat{y}_t + w_t$ and loss $f_t(y_t)$.
     Update NestedOCO with state $\hat{y}_t$ and loss $f_t(\hat{y}_t)$.
  end for
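The loop structure of Algorithm 3 can be sketched on a toy instance. Everything below is an illustrative assumption rather than the paper's implementation: $\mathcal{Y}$ and $\mathcal{X}$ are Euclidean unit balls with $\pi(y) = 1 - \lVert y \rVert$, the dynamics are $D(x, y) = y + \rho\,\pi(y)\,x$ (so the oracle has a closed form), the losses are a fixed quadratic, and `inner_learner_step` is a simple clipped gradient step standing in for NestedOCO, whose actual update is defined elsewhere in the paper.

```python
import numpy as np

def pi(y):
    """Assumed pi(y) = 1 - ||y|| on the Euclidean unit ball."""
    return 1.0 - np.linalg.norm(y)

def dynamics(x, y, rho):
    """Toy dynamics whose range is exactly the ball B_{rho * pi(y)}(y)."""
    return y + rho * pi(y) * x

def oracle(y_prev, target, rho):
    """Closed-form argmin over the unit ball of ||D(x, y_prev) - target||^2."""
    x = (target - y_prev) / (rho * pi(y_prev))
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def inner_learner_step(y_hat_prev, grad, eta, alpha, rho):
    """Hypothetical stand-in for NestedOCO: a gradient step clipped so that
    the target moves at most alpha * rho * pi(y_hat_prev)."""
    step = -eta * grad
    max_step = alpha * rho * pi(y_hat_prev)
    norm = np.linalg.norm(step)
    if norm > max_step:
        step = step * (max_step / norm)
    return y_hat_prev + step

def run(T=200, rho=0.5, alpha=0.5, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    center = np.array([0.6, 0.0])       # minimizer of the toy loss ||y - center||^2
    y_prev = np.zeros(2)                # true (disturbed) state
    y_hat_prev = np.zeros(2)            # undisturbed target state
    grad = np.zeros(2)
    max_miss, total_loss = 0.0, 0.0
    for _ in range(T):
        y_hat = inner_learner_step(y_hat_prev, grad, eta, alpha, rho)
        x = oracle(y_prev, y_hat, rho)  # steer from the true state y_{t-1}
        reached = dynamics(x, y_prev, rho)
        max_miss = max(max_miss, np.linalg.norm(reached - y_hat))
        # Disturbance at the maximum allowed magnitude, in a random direction.
        w_dir = rng.standard_normal(2)
        w_dir /= np.linalg.norm(w_dir)
        w = w_dir * (rho - alpha * rho) / (1.0 + rho) * pi(reached)
        y = reached + w
        total_loss += float(np.sum((y - center) ** 2))
        grad = 2.0 * (y_hat - center)   # the learner is updated with the loss at y_hat
        y_prev, y_hat_prev = y, y_hat
    return max_miss, total_loss

max_miss, total_loss = run()
print(max_miss, total_loss)
```

In this sketch `max_miss` records how far the oracle's reached state falls from the target $\hat{y}_t$; it should stay at floating-point noise as long as the disturbance respects the stated magnitude bound.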

Theorem 2 follows directly from Theorems 11, 12, and 13. Intuitively, when the per-round disturbance magnitude is at most $\frac{\rho - \alpha\rho}{1+\rho}\cdot\pi\left(D(x_t, y_{t-1})\right)$, one can calibrate NestedOCO for the case of $\alpha\rho$-locally controllable dynamics and maintain sufficient “slack” to correct for the previous round’s disturbance in every round. When disturbances exceed $\frac{\rho}{1+\rho}\cdot\pi\left(D(x_t, y_{t-1})\right)$, an adversary can continually push the state towards the boundary of $\mathcal{Y}$, which may require vanishing disturbance magnitude as rounds progress due to the limited range promised by local controllability near the boundary.

Theorem 11.

For a $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ with convex losses $f_t : \mathcal{Y} \rightarrow \mathbb{R}$ and adversarial disturbances $w_t$ where $\left\lVert w_t \right\rVert \leq \frac{\rho - \alpha\rho}{1+\rho}\cdot\pi\left(D(x_t, y_{t-1})\right)$ and $\sum_{t=1}^{T} \left\lVert w_t \right\rVert \leq E$, the regret of NestedOCO-BD with respect to the reward of any state is bounded by

\begin{align*}
\textup{Reg}_T(\text{NestedOCO-BD}) \leq&\; O\left(\sqrt{T\cdot(\alpha\rho)^{-1}} + E\right),
\end{align*}

with $T$ queries made to an oracle for non-convex optimization.

  • Proof

    We show by induction that each call to $\texttt{Oracle}(y_{t-1}, \hat{y}_t)$ yields a feasible action $x_t$ satisfying $\hat{y}_t = D(x_t, y_{t-1})$. This is immediate for $t = 1$. Suppose this holds up to some round $t-1$, where we have that $y_{t-1} = \hat{y}_{t-1} + w_{t-1}$. Given that NestedOCO selects actions under $\alpha\rho$-local controllability, we can bound

    \begin{align*}
    \left\lVert \hat{y}_t - \hat{y}_{t-1} \right\rVert \leq&\; \alpha\rho\cdot\pi(\hat{y}_{t-1}).
    \end{align*}

    Further, the magnitude of the disturbance $w_{t-1}$ is bounded by

    \begin{align*}
    \left\lVert w_{t-1} \right\rVert \leq&\; \frac{\rho - \alpha\rho}{1+\rho}\cdot\pi(\hat{y}_{t-1}),
    \end{align*}

    yielding that

    \begin{align*}
    \left\lVert \hat{y}_t - y_{t-1} \right\rVert \leq&\; \left\lVert \hat{y}_t - \hat{y}_{t-1} - w_{t-1} \right\rVert \\
    \leq&\; \left(\alpha\rho + \frac{\rho - \alpha\rho}{1+\rho}\right)\cdot\pi(\hat{y}_{t-1}). && (y_{t-1} = w_{t-1} + \hat{y}_{t-1})
    \end{align*}

    As such, we have that

    \begin{align*}
    \rho\cdot\pi(y_{t-1}) \geq&\; \rho\left(1 - \frac{\rho - \alpha\rho}{1+\rho}\right)\cdot\pi(\hat{y}_{t-1}) \\
    =&\; \rho\left(\alpha + \frac{1-\alpha}{1+\rho}\right)\cdot\pi(\hat{y}_{t-1}),
    \end{align*}

    and so by $\rho$-local controllability some feasible action $x_t$ exists, as $\hat{y}_t$ lies in $\mathcal{B}_{\rho\cdot\pi(y_{t-1})}(y_{t-1})$. The regret bound for NestedOCO holds over the states $\hat{y}_t$, and so we can bound the total regret of NestedOCO-BD with respect to any $y^* \in \mathcal{Y}$ as:

    \begin{align*}
    \sum_{t=1}^{T} f_t(y_t) - f_t(y^*) \leq&\; \sum_{t=1}^{T} f_t(\hat{y}_t) - f_t(y^*) + L\left\lVert y_t - \hat{y}_t \right\rVert \\
    \leq&\; \textup{Reg}_T(\text{NestedOCO}) + L\sum_{t=1}^{T} \left\lVert w_t \right\rVert \\
    \leq&\; 2\sqrt{\frac{\left(1 + \frac{R}{r\alpha\rho}\right)TGL^2}{\gamma}} + LE. && \text{(Thm. 1)}
    \end{align*}
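The feasibility step of the induction above, and the reason for the $\frac{1}{1+\rho}$ factor in the disturbance bound, can be checked numerically. The sketch below is an illustration under assumptions: the Euclidean unit ball with $\pi(y) = 1 - \lVert y \rVert$, and arbitrary values of $\rho$ and $\alpha$. It samples random configurations at the maximum allowed step and disturbance magnitudes, then evaluates the worst-case alignment (disturbance pushed radially outward, target moved radially inward).

```python
import numpy as np

def pi(y):
    """Assumed pi(y) = 1 - ||y|| on the Euclidean unit ball."""
    return 1.0 - np.linalg.norm(y)

def slack(y_hat_prev, w_prev, y_hat, rho):
    """rho * pi(y_{t-1}) - ||y_hat_t - y_{t-1}||; nonnegative means y_hat_t is reachable."""
    y_prev = y_hat_prev + w_prev
    return rho * pi(y_prev) - np.linalg.norm(y_hat - y_prev)

def unit(rng, n):
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

rho, alpha, n = 0.7, 0.4, 3                 # illustrative values only
beta = (rho - alpha * rho) / (1.0 + rho)    # per-round disturbance coefficient
rng = np.random.default_rng(0)

# Random directions, maximum allowed magnitudes.
min_random_slack = np.inf
for _ in range(10_000):
    y_hat_prev = unit(rng, n) * rng.uniform(0.0, 0.99)
    w_prev = unit(rng, n) * beta * pi(y_hat_prev)
    y_hat = y_hat_prev + unit(rng, n) * alpha * rho * pi(y_hat_prev)
    min_random_slack = min(min_random_slack, slack(y_hat_prev, w_prev, y_hat, rho))

# Worst-case alignment: disturbance outward, target step inward along the same ray.
e = np.array([1.0, 0.0, 0.0])
y_hat_prev = 0.5 * e
w_prev = beta * pi(y_hat_prev) * e
y_hat = y_hat_prev - alpha * rho * pi(y_hat_prev) * e
worst_case_slack = slack(y_hat_prev, w_prev, y_hat, rho)

print(min_random_slack, worst_case_slack)
```

In the worst-case alignment the slack is zero up to rounding, matching the equality between $\alpha\rho + \frac{\rho-\alpha\rho}{1+\rho}$ and $\rho\left(1 - \frac{\rho-\alpha\rho}{1+\rho}\right)$ used in the proof.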

We show that the dependence on $E$ is tight up to the constant. Note that we can obtain regret $O(\sqrt{T\cdot(\alpha\rho)^{-1}}) + LE$ in the following instance via NestedOCO-BD.

Theorem 12 (Regret Lower Bound for Bounded Disturbances).

Suppose for any $\alpha > 0$ and $\rho \in (0, 1]$ an adversary can choose $w_t$ with $\left\lVert w_t \right\rVert \leq \frac{\rho - \alpha\rho}{1+\rho}\cdot\pi\left(D(x_t, y_{t-1})\right)$, where $\sum_{t=1}^{T} \left\lVert w_t \right\rVert = E$ for any $E$. There is a $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ with $L$-Lipschitz convex losses $f_t$ such that any algorithm $\mathcal{A}$ obtains regret $\textup{Reg}_T(\mathcal{A}) \geq \max\left(LE, \frac{\rho - \alpha\rho}{1+\rho}TL\right)$.

  • Proof

    Consider any norm delimited-∥∥\left\lVert\cdot\right\rVert∥ ⋅ ∥ over nsuperscript𝑛\operatorname{\mathbb{R}}^{n}blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. Let 𝒴𝒴\operatorname{\mathcal{Y}}caligraphic_Y be the unit ball B1(𝟎)subscript𝐵10B_{1}(\mathbf{0})italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( bold_0 ), and let each ft(yt)=Lytsubscript𝑓𝑡subscript𝑦𝑡𝐿delimited-∥∥subscript𝑦𝑡f_{t}(y_{t})=L\left\lVert y_{t}\right\rVertitalic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = italic_L ∥ italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∥. Consider any action space 𝒳𝒳\operatorname{\mathcal{X}}caligraphic_X and dynamics D𝐷Ditalic_D where ρ𝜌\rhoitalic_ρ-local controllability exactly characterizes the range of D𝐷Ditalic_D, i.e. for any y𝑦yitalic_y and ysuperscript𝑦y^{\prime}italic_y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, there is some x𝑥xitalic_x such that D(x,y)=y𝐷𝑥𝑦superscript𝑦D(x,y)=y^{\prime}italic_D ( italic_x , italic_y ) = italic_y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT if and only if yρπ(y)(x,y)superscript𝑦subscript𝜌𝜋𝑦𝑥𝑦y^{\prime}\in\operatorname{\mathcal{B}}_{\rho\cdot\pi(y)}(x,y)italic_y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_B start_POSTSUBSCRIPT italic_ρ ⋅ italic_π ( italic_y ) end_POSTSUBSCRIPT ( italic_x , italic_y ).

First, note that $\pi(y) = 1 - \lVert y \rVert$ for any $y \in \mathcal{Y}$. In each round $t$, suppose an algorithm plays an action $x_t$ at state $y_{t-1}$ which yields a target undisturbed update $\hat{y}_t = D(x_t, y_{t-1})$. The adversary can then choose any $w_t$ satisfying $\lVert w_t \rVert \leq \frac{\rho - \alpha\rho}{1+\rho} \cdot (1 - \lVert \hat{y}_t \rVert)$; suppose each $w_t$ is given by

\begin{align*}
w_t = \hat{y}_t \cdot \frac{\frac{\rho - \alpha\rho}{1+\rho} \cdot (1 - \lVert \hat{y}_t \rVert)}{\lVert \hat{y}_t \rVert}
\end{align*}

if $\hat{y}_t$ is non-zero, and an arbitrary vector $w_t$ with $\lVert w_t \rVert = \frac{\rho - \alpha\rho}{1+\rho}$ if $\hat{y}_t = \mathbf{0}$. This satisfies the disturbance norm bound, and further yields $y_t = \hat{y}_t + w_t$, where for non-zero $\hat{y}_t$ we have

\begin{align*}
y_t = \hat{y}_t \cdot \left(1 + \frac{\frac{\rho - \alpha\rho}{1+\rho} \cdot (1 - \lVert \hat{y}_t \rVert)}{\lVert \hat{y}_t \rVert}\right)
\end{align*}

and thus for any $\hat{y}_t$,

\begin{align*}
\lVert y_t \rVert &\geq \lVert \hat{y}_t \rVert + \frac{\rho - \alpha\rho}{1+\rho} \cdot (1 - \lVert \hat{y}_t \rVert) \\
&\geq \frac{\rho - \alpha\rho}{1+\rho},
\end{align*}

yielding a loss $f_t(y_t) \geq L \cdot \frac{\rho - \alpha\rho}{1+\rho}$ at a disturbance cost of $\lVert w_t \rVert = \frac{\rho - \alpha\rho}{1+\rho} (1 - \lVert \hat{y}_t \rVert)$. Assuming the adversary continues this strategy in each round until any disturbance budget $E = \sum_{t=1}^{T} \lVert w_t \rVert$ is exhausted, this yields a regret for any algorithm of at least

\begin{align*}
\texttt{Reg}_T(\mathcal{A}) \geq \min\left(LE, \frac{\rho - \alpha\rho}{1+\rho} TL\right),
\end{align*}

as $y^* = \mathbf{0}$ obtains total loss 0. ∎
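As a numerical sanity check of the construction above, the following sketch simulates the adversary's radial push inside the unit ball. The choice of the Euclidean norm, the helper names (`norm`, `adversary_disturbance`, `disturbed_loss`), and the sample values of `rho`, `alpha`, and `L` are illustrative assumptions rather than part of the theorem statement.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats (assumed norm)."""
    return math.sqrt(sum(c * c for c in v))

def adversary_disturbance(y_hat, rho, alpha):
    """Radial disturbance from the lower-bound construction.

    Pushes y_hat away from the origin with magnitude
    (rho - alpha*rho)/(1 + rho) * (1 - ||y_hat||).
    """
    scale = (rho - alpha * rho) / (1.0 + rho)
    budget = scale * (1.0 - norm(y_hat))
    n = norm(y_hat)
    if n == 0.0:
        # Any direction works at the origin; use the first coordinate axis.
        w = [0.0] * len(y_hat)
        w[0] = budget
        return w
    return [c * budget / n for c in y_hat]

def disturbed_loss(y_hat, rho, alpha, L):
    """Loss L*||y_t|| after the adversary moves y_hat to y_t = y_hat + w_t."""
    w = adversary_disturbance(y_hat, rho, alpha)
    y = [a + b for a, b in zip(y_hat, w)]
    return L * norm(y)

if __name__ == "__main__":
    rho, alpha, L = 0.5, 0.25, 2.0
    floor = L * (rho - alpha * rho) / (1.0 + rho)
    for y_hat in ([0.0, 0.0], [0.3, -0.4], [0.0, 0.9]):
        print(y_hat, disturbed_loss(y_hat, rho, alpha, L), ">=", floor)
```

Whatever target $\hat{y}_t$ the algorithm reaches, the per-round loss in this sketch stays at or above $L \cdot \frac{\rho - \alpha\rho}{1+\rho}$, matching the bound in the proof.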

The disturbance upper bound is indeed necessary for $\rho$-locally controllable dynamics. We show a sharp threshold effect at $\frac{\rho}{1+\rho} \cdot \pi(D(x_t, y_{t-1}))$, wherein an adversary who is allowed to exceed this limit by any amount can force an algorithm to incur linear regret even with only a constant budget. Note that for any $\rho \in (0,1]$ and $\alpha < 0$, there is some $\beta \in [0,1)$ such that $\frac{\rho - \alpha\rho}{1+\rho} \geq \frac{\rho}{1+\beta\rho}$.
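One explicit choice of $\beta$ can be read off by rearranging the inequality: $\frac{\rho - \alpha\rho}{1+\rho} \geq \frac{\rho}{1+\beta\rho}$ holds exactly when $\beta \geq \frac{\rho + \alpha}{(1-\alpha)\rho}$, and for $\alpha < 0$ this threshold is strictly below 1. The sketch below encodes that rearrangement; the function name `beta_for` is hypothetical and the rearrangement is our own working, not taken from the paper.

```python
def beta_for(rho, alpha):
    """Return a beta in [0, 1) with (rho - alpha*rho)/(1+rho) >= rho/(1+beta*rho).

    Rearranging gives beta >= (rho + alpha) / ((1 - alpha) * rho); for
    alpha < 0 this threshold is strictly below 1, so clipping at 0 suffices.
    """
    if not (0.0 < rho <= 1.0) or alpha >= 0.0:
        raise ValueError("expects rho in (0, 1] and alpha < 0")
    return max(0.0, (rho + alpha) / ((1.0 - alpha) * rho))

if __name__ == "__main__":
    for rho, alpha in [(1.0, -0.1), (0.5, -0.01), (0.2, -3.0)]:
        b = beta_for(rho, alpha)
        lhs = (rho - alpha * rho) / (1.0 + rho)
        rhs = rho / (1.0 + b * rho)
        print(rho, alpha, b, lhs >= rhs - 1e-12)
```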

Theorem 13.

Suppose an adversary can choose any state disturbances $w_t$ with $\lVert w_t \rVert \leq \frac{\rho}{1+\beta\rho} \cdot \pi\left(D(x_t, y_{t-1})\right)$, for any $\rho \in (0,1]$ and any $\beta \in [0,1)$. Then, there is a $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ with convex losses $f_t$ such that any algorithm $\mathcal{A}$ obtains regret $\texttt{Reg}_T(\mathcal{A}) = \Theta(T)$ even if $\sum_{t=1}^{T} \lVert w_t \rVert = O(1)$.

  • Proof

    Consider any instance $(\mathcal{X}, \mathcal{Y}, D)$ where $\rho$-local controllability exactly characterizes the range of $D$, i.e. for any $y$ and $y'$, there is some $x$ such that $D(x, y) = y'$ if and only if $y' \in \mathcal{B}_{\rho \cdot \pi(y)}(x, y)$.

    Let $d_t = \pi(y_t)$ for each round. Beginning at any round $t$, suppose the adversary observes an action $x_t$ which yields an update $\hat{y}_t = D(x_t, y_{t-1})$. Let $z_t = \operatorname*{argmin}_{y \in \operatorname{bd}(\mathcal{Y})} \lVert y - \hat{y}_t \rVert$, and suppose the adversary chooses the disturbance:

    \begin{align*}
    w_t = \operatorname*{argmin}_{w : \lVert w \rVert \leq \frac{\rho}{1+\beta\rho} \cdot \pi(\hat{y}_t)} \lVert \hat{y}_t + w - z_t \rVert.
    \end{align*}

    This forces $y_t$ closer to the boundary at each round, regardless of the choice of $x_t$:

    \begin{align*}
    d_t &= \left(1 - \frac{\rho}{1+\beta\rho}\right) \cdot \pi(\hat{y}_t) \\
    &\leq \left(1 + \rho - \frac{\rho}{1+\beta\rho} - \frac{\rho^2}{1+\beta\rho}\right) d_{t-1} && (\pi(\hat{y}_t) \leq (1+\rho) d_{t-1}) \\
    &\leq \frac{1 + \beta\rho + \beta\rho^2 - \rho^2}{1+\beta\rho} d_{t-1} \\
    &\leq \left(1 - \frac{(1-\beta)\rho^2}{1+\beta\rho}\right) d_{t-1},
    \end{align*}

    where $\pi(\hat{y}_t) \leq (1+\rho) d_{t-1}$ holds by our assumption on $D(x, y)$. Assuming the adversary applies a disturbance $w_t$ selected as above in each round $t \leq T$, we have that

    \begin{align*}
    d_t \leq \left(1 - \frac{(1-\beta)\rho^2}{1+\beta\rho}\right)^{t} \cdot d_0,
    \end{align*}

    where the magnitude of each disturbance is bounded by

    \begin{align*}
    \lVert w_t \rVert &\leq \frac{\rho + \rho^2}{1+\beta\rho} d_{t-1} \\
    &\leq \frac{\rho + \rho^2}{1+\beta\rho} \left(1 - \frac{(1-\beta)\rho^2}{1+\beta\rho}\right)^{t-1} \cdot d_0,
    \end{align*}

    where we take the initial state distance to the boundary $d_0 = \pi(y_0)$ to be a constant bounded away from zero. This yields that the sum of disturbance magnitudes $E = \sum_{t=1}^{T} \lVert w_t \rVert$ is at most:

    \begin{align*}
    \sum_{t=1}^{T} \lVert w_t \rVert &\leq d_0 \frac{\rho + \rho^2}{1+\beta\rho} \cdot \sum_{t=1}^{T} \left(1 - \frac{(1-\beta)\rho^2}{1+\beta\rho}\right)^{t-1} \\
    &\leq d_0 \cdot \frac{\rho + \rho^2}{(1-\beta)\rho^2} \\
    &= O(1).
    \end{align*}

    Now suppose that the loss at each round is given by $f_t(y_t) = \lVert y_t - y_0 \rVert$. Then, our regret with respect to $y_0$ is at least:

    \begin{align*}
    \sum_{t=1}^{T} f_t(y_t) - f_t(y_0) &\geq \sum_{t=1}^{T} d_0 - d_t \\
    &\geq d_0 \left(T - \sum_{t=1}^{T} \left(1 - \frac{(1-\beta)\rho^2}{1+\beta\rho}\right)^{t}\right) \\
    &\geq d_0 \left(T - \frac{1 - \frac{(1-\beta)\rho^2}{1+\beta\rho}}{\frac{(1-\beta)\rho^2}{1+\beta\rho}}\right) \\
    &\geq d_0 \left(T - \frac{1+\beta\rho}{(1-\beta)\rho^2}\right) \\
    &= \Theta(T).
    \end{align*}
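The geometric contraction in this proof can be checked numerically with a scalar recursion. The sketch below assumes the case most favorable to the algorithm, in which it always retreats as far from the boundary as the dynamics allow, $\pi(\hat{y}_t) = (1+\rho) d_{t-1}$, while the adversary spends its full per-round allowance. The function name `simulate_boundary_push` and the sample parameters are hypothetical.

```python
def simulate_boundary_push(rho, beta, d0, T):
    """Scalar recursion for the distance to the boundary under the adversary.

    Assumes pi(y_hat_t) = (1 + rho) * d_{t-1} (the algorithm retreats as far
    as possible) and a disturbance of size rho/(1 + beta*rho) * pi(y_hat_t)
    aimed at the boundary. Returns (d_T, total disturbance spent).
    """
    d = d0
    total = 0.0
    for _ in range(T):
        pi_hat = (1.0 + rho) * d               # best retreat available
        w = rho / (1.0 + beta * rho) * pi_hat  # adversary's allowance
        total += w
        d = pi_hat - w                         # new distance to the boundary
    return d, total

if __name__ == "__main__":
    rho, beta, d0 = 0.5, 0.5, 0.8
    d_T, spent = simulate_boundary_push(rho, beta, d0, T=10_000)
    budget_bound = d0 * (rho + rho ** 2) / ((1.0 - beta) * rho ** 2)
    print(d_T, spent, budget_bound)
```

In this sketch the distance $d_T$ decays geometrically while the total disturbance stays below $d_0 \cdot \frac{\rho + \rho^2}{(1-\beta)\rho^2}$ for every horizon, which is the constant-budget behavior used in the argument.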

Together, the previous three theorems yield Theorem 2.

E.2 NestedOCO-UD and Proofs for Theorem 3

We can remove the bound on the maximum disturbance for strongly locally controllable instances, as the feasible update sets do not vanish at the boundary of $\mathcal{Y}$. Recall that an instance $(\mathcal{X}, \mathcal{Y}, D)$ satisfies strong $\rho$-local controllability for $\rho > 0$ if, for any $y \in \mathcal{Y}$ and $y^* \in \mathcal{B}_{\rho}(y) \cap \mathcal{Y}$, there is some $x$ such that $D(x, y) = y^*$. We assume without loss of generality that $\rho \leq 2R$, where $R$ is the radius of $\mathcal{Y}$.

Intuitively, our algorithm tracks the target state which would be chosen by FTRL in the absence of all disturbances (by recording the counterfactual loss rather than the one truly experienced), and always seeks to minimize distance to that state.

Algorithm 4 NestedOCO with Unbounded Disturbances (NestedOCO-UD).
  Initialize FTRL for $T$ rounds over $\mathcal{Y}$ with step size $\eta = \sqrt{\frac{G\gamma}{TL^2}}$.
  for $t = 1$ to $T$ do
     Let $\hat{y}_t$ be the target state chosen by FTRL.
     Use $\texttt{Oracle}(y_{t-1}, \hat{y}_t)$ to compute $x_t = \operatorname*{argmin}_{x \in \mathcal{X}} \lVert D(x, y_{t-1}) - \hat{y}_t \rVert^2$.
     Play action $x_t$.
     Observe disturbed state $y_t = D(x_t, y_{t-1}) + w_t$ and loss $f_t(y_t)$.
     Update FTRL with state $\hat{y}_t$ and loss $f_t(\hat{y}_t)$.
  end for
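The loop above can be illustrated with a one-dimensional toy instance. The sketch below is not the paper's implementation: it assumes $\mathcal{Y} = [-R, R]$, hypothetical dynamics $D(x, y) = \mathrm{clip}(y + x)$ with $x \in [-\rho, \rho]$ (which are strongly $\rho$-locally controllable), a closed-form stand-in for the oracle, and FTRL with a quadratic regularizer on linearized losses. It also clips the disturbed state so that it remains in $\mathcal{Y}$. All function and parameter names are illustrative.

```python
import math

def clip(v, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))

def nested_oco_ud(grads, disturbances, rho, eta, radius=1.0, y0=0.0):
    """1-D sketch of NestedOCO-UD under toy dynamics D(x, y) = clip(y + x).

    grads[t](y) returns a subgradient of f_t at y; disturbances[t] is w_t.
    Returns the realized states y_t and the FTRL targets y_hat_t.
    """
    grad_sum = 0.0  # FTRL state: sum of counterfactual subgradients
    y_prev = y0
    states, targets = [], []
    for g, w in zip(grads, disturbances):
        # Target chosen by FTRL (quadratic regularizer, linearized losses).
        y_hat = clip(-eta * grad_sum, -radius, radius)
        # Oracle stand-in: closest reachable point to y_hat (closed form here).
        x = clip(y_hat - y_prev, -rho, rho)
        # Play x, then observe the disturbed state (clipped to stay in Y).
        y = clip(clip(y_prev + x, -radius, radius) + w, -radius, radius)
        # Update FTRL with the counterfactual loss at y_hat, not at y.
        grad_sum += g(y_hat)
        states.append(y)
        targets.append(y_hat)
        y_prev = y
    return states, targets

if __name__ == "__main__":
    T = 200
    # Subgradient of the assumed loss f_t(y) = |y - 0.5|.
    grads = [lambda y: 1.0 if y > 0.5 else -1.0 for _ in range(T)]
    ws = [0.0] * T
    ws[50] = -1.2  # a single large disturbance
    states, targets = nested_oco_ud(grads, ws, rho=0.2, eta=1.0 / math.sqrt(T))
    print(sum(abs(y - h) for y, h in zip(states, targets)))
```

Because FTRL is fed $f_t(\hat{y}_t)$ rather than $f_t(y_t)$, the sequence of targets in this sketch does not depend on the disturbances; only the tracking error does, and it returns to zero a few rounds after the disturbance.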

Theorem 14.

For a strongly $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ with convex losses $f_t : \mathcal{Y} \rightarrow \mathbb{R}$ and adversarial disturbances $w_t$ where $\sum_{t=1}^{T} \lVert w_t \rVert \leq E$, the regret of NestedOCO-UD is bounded by

\begin{align*}
\texttt{Reg}_T(\texttt{NestedOCO-UD}) \leq O\left(\sqrt{T} + E \cdot \rho^{-1}\right)
\end{align*}

with respect to the reward of any state, with $T$ queries made to an oracle for non-convex optimization.

  • Proof

    We begin by bounding the total state error $\sum_{t=1}^{T} \lVert y_t - \hat{y}_t \rVert$ across rounds. First, note that for any fixed $\rho > 0$, and any desired $\alpha \in (0,1)$, we have that $\eta \frac{L}{\gamma} \leq \rho\alpha$ for sufficiently large $T$, as $\eta \frac{L}{\gamma} = \sqrt{\frac{G}{T\gamma}}$; we assume this holds for any given choice of $\alpha$, and so we have that $\lVert \hat{y}_{t+1} - \hat{y}_t \rVert \leq \rho\alpha$ by Proposition 5. For a total disturbance budget $E$, we separately consider disturbances $w_t$ depending on whether or not the accumulated disturbance error up to $w_t$ is driven to 0 in the next round. Define $W_+$ and $W_-$ as:

    \begin{align*}
    W_+ = \{w_t : D(x_{t+1}, y_t) \neq \hat{y}_{t+1}\}
    \end{align*}

    and

    \begin{align*}
    W_- = \{w_t : D(x_{t+1}, y_t) = \hat{y}_{t+1}\}
    \end{align*}

    with $E_+ = \sum_{w_t \in W_+} \lVert w_t \rVert$ and $E_- = \sum_{w_t \in W_-} \lVert w_t \rVert$. First, observe that at each round $t$ corresponding to $w_t \in W_-$, given that $\lVert \hat{y}_{t+1} - y_t \rVert \leq \rho$ we have that $\lVert w_t \rVert = \lVert y_t - \hat{y}_t \rVert \leq (1+\alpha)\rho$, as $\lVert \hat{y}_{t+1} - \hat{y}_t \rVert \leq \alpha\rho$. As such, we have that

    \begin{align*}
    \sum_{t : w_t \in W_-} f_t(y_t) - f_t(\hat{y}_t) &\leq \sum_{t : w_t \in W_-} L \lVert y_t - \hat{y}_t \rVert \\
    &\leq (1+\alpha) L E_-.
    \end{align*}

    Next, consider any $w_t \in W_+$. As our instance is strongly $\rho$-locally controllable, we must have that $\lVert \hat{y}_{t+1} - y_t \rVert > \rho$, as otherwise there would be some feasible action $x_{t+1}$ which would be selected and would yield $w_t \in W_-$. Since $\lVert \hat{y}_{t+1} - \hat{y}_t \rVert \leq \alpha\rho$, it then must be the case that $\lVert w_t \rVert = \lVert y_t - \hat{y}_t \rVert > (1-\alpha)\rho$, and so we can bound the number of disturbances in $W_+$ as:

    \begin{align*}
    \lvert W_+ \rvert \leq \frac{E_+}{(1-\alpha)\rho}.
    \end{align*}

    Assuming a maximal distance $\lVert \hat{y}_t - y_t \rVert = 2R$ for each round $t$ corresponding to some $w_t \in W_+$, this yields

    \begin{align*}
    \sum_{t : w_t \in W_+} f_t(y_t) - f_t(\hat{y}_t) \leq&\; \sum_{t : w_t \in W_+} L \lVert y_t - \hat{y}_t \rVert \\
    \leq&\; \frac{2LRE_+}{(1-\alpha)\rho}.
    \end{align*}

    We can assume $\alpha$ is small enough to yield $\frac{2R}{\rho} \geq (1+\alpha) \cdot (1-\alpha)$, and so we have

    \begin{align*}
    \sum_{t=1}^{T} f_t(y_t) - f_t(\hat{y}_t) \leq&\; \frac{2LRE}{(1-\alpha)\rho}.
    \end{align*}

    The regret bound for FTRL holds over the states $\hat{y}_t$, and so we can bound the total regret of NestedOCO-BD with respect to any $y^* \in \mathcal{Y}$ as:

    \begin{align*}
    \sum_{t=1}^{T} f_t(y_t) - f_t(y^*) \leq&\; \sum_{t=1}^{T} f_t(\hat{y}_t) - f_t(y^*) + \sum_{t=1}^{T} f_t(y_t) - f_t(\hat{y}_t) \\
    \leq&\; \eta \frac{TL^2}{\gamma} + \frac{G}{\eta} + \frac{2LRE}{(1-\alpha)\rho} && \text{(Prop. 4)} \\
    \leq&\; 2\sqrt{\frac{TGL^2}{\gamma}} + \frac{2LRE}{(1-\alpha)\rho}.
    \end{align*}
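The final inequality is the usual balancing of the two $\eta$-dependent terms. As a worked check (standard calculus, not spelled out in the source), minimizing over $\eta > 0$ gives:

```latex
\begin{align*}
\frac{d}{d\eta}\left( \eta \frac{TL^2}{\gamma} + \frac{G}{\eta} \right) = 0
  \;&\Longleftrightarrow\; \eta = \sqrt{\frac{G\gamma}{TL^2}}, \\
\text{at which point}\quad \eta \frac{TL^2}{\gamma} = \frac{G}{\eta}
  \;&=\; \sqrt{\frac{TGL^2}{\gamma}},
\end{align*}
```

so the two terms sum to $2\sqrt{TGL^2/\gamma}$, matching the last line of the display.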

Theorem 15 (Regret Lower Bound for Unbounded Disturbances).

Suppose an adversary can choose any state disturbances $w_t$ with $\sum_{t=1}^{T} \lVert w_t \rVert = E$. For any $\rho \in (0,1]$, there is a strongly $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ with convex losses $f_t$ such that any algorithm $\mathcal{A}$ obtains regret $\mathrm{Reg}_T(\mathcal{A}) = \min\left(\frac{2LRE}{\rho}, 2TLR\right)$.

  • Proof

    Let $\mathcal{Y} = [-R, R]$ for any $R > 0$ and let $f_t(y_t) = -Ly_t + LR$ for each $y$. Suppose strong $\rho$-local controllability exactly characterizes the range of $D$, i.e. for any $y, y' \in \mathcal{Y}$ there is some $x$ such that $D(x,y) = y'$ if and only if $\lvert y - y' \rvert \leq \rho$. Consider an adversary who chooses disturbances $w_t$ in each round such that $y_t = -R$ until their disturbance budget $E$ is exhausted. This requires a disturbance of magnitude at most $R + \rho$ for $w_1$, as we assume $y_0 = 0$, and at most $\rho$ in subsequent rounds, and thus the adversary can force any algorithm to remain at $y_t = -R$ for $(E-R)\rho^{-1}$ rounds.

    As such, any algorithm must incur loss of at least $2LR(E-R)\rho^{-1}$ across these rounds, and further must incur average loss $LR$ over the subsequent $2R\rho^{-1}$ rounds (if $T$ is not yet reached), for an additional loss of $2LR^2\rho^{-1}$, as they can only decrease per-round loss by $L\rho$ given the restriction on the range of $D$. As the optimal state $y^* = R$ obtains loss 0, the total regret is at least:

    \begin{align*}
    \sum_{t=1}^{T} f_t(y_t) - f_t(y^*) \geq&\; \min\left( \frac{2LRE}{\rho}, 2TLR \right).
    \end{align*}
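To make the counting argument concrete, the following minimal sketch simulates the one-dimensional instance from the proof with exact rational arithmetic. The function name `simulate_lower_bound`, the greedy learner (always moving $\rho$ toward $y^* = R$), and the rule that the adversary only pushes when it can afford to return the state all the way to $-R$ are illustrative assumptions, not part of the source.

```python
from fractions import Fraction


def simulate_lower_bound(L, R, rho, E, T):
    """Simulate Y = [-R, R] with loss f(y) = -L*y + L*R and step limit rho.

    Returns (regret, pinned_rounds). Since y* = R has loss 0, the regret
    against y* equals the learner's total loss.
    """
    y = Fraction(0)            # y_0 = 0, as in the proof
    budget = Fraction(E)       # remaining disturbance budget
    regret = Fraction(0)
    pinned_rounds = 0
    for _ in range(T):
        y_hat = min(R, y + rho)      # learner's intended next state
        push = y_hat + R             # disturbance needed to land at -R
        if push <= budget:           # adversary can still afford the push
            budget -= push
            y = -R
            pinned_rounds += 1
        else:                        # budget exhausted: learner recovers
            y = y_hat
        regret += -L * y + L * R
    return regret, pinned_rounds


# Assumed example parameters: L = 1, R = 1, rho = 1/10, E = 5, T = 1000.
regret, pinned = simulate_lower_bound(1, 1, Fraction(1, 10), 5, 1000)
closed_form = min(Fraction(2 * 1 * 1 * 5) / Fraction(1, 10), 2 * 1000 * 1 * 1)
print(pinned, regret, closed_form)
```

With these assumed parameters the adversary pins the state at $-R$ for $(E-R)\rho^{-1} = 40$ rounds, and the greedy learner's regret is 99 against the closed-form value $2LRE\rho^{-1} = 100$. The gap of $LR$ comes from the discrete recovery steps, where the average loss of $LR$ used in the proof is approximate.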

Together, the previous two theorems yield Theorem 3. Note that for both algorithms it remains computationally efficient to optimize over action-linear dynamics, as the constraint that $D(x, y_{t-1}) \in \mathcal{Y}$ can be encoded as a convex constraint over $\mathcal{X}$.

Appendix F Unknown Dynamics: Analysis for ProbingOCO

Algorithm 5 Probing Online Convex Optimization (ProbingOCO).
  Let $n = \dim(\mathcal{X})$, let $y_0 = \mathbf{0}$, and let $x_1 \in \mathcal{X}$ be such that $\lVert D(x_1, y_0) - y_0 \rVert \leq \epsilon = o(\sqrt{T})$
  Initialize NestedOCO-BD to run over $\mathcal{Y}$ for $T/(2n+1)$ rounds
  Run Estimate for $2n+1$ rounds:
  Play $x_1$
  for $i = 1$ to $n$ do
     Play $x_1 + \epsilon \cdot e_i$
     Play $x_1 - \epsilon \cdot e_i$
  end for
  Solve for estimates $(\hat{A}_y, \hat{b}_y)$ which are consistent with the previous $2n+1$ observed state updates, up to error $O(\epsilon)$
  for $t = 2n+1$ to $T$ do
     Let $t^* = t$
     Using $(\hat{A}_y, \hat{b}_y)$, target $y = y_{t^*}$
     Let $y^*$ be the next point chosen by NestedOCO-BD
     for $i = 1$ to $n$ do
        Using $(\hat{A}_y, \hat{b}_y)$, target $y = y_{t^*} + \frac{2i-1}{2n}(y^* - y_{t^*}) + \epsilon \cdot e_i$
        Using $(\hat{A}_y, \hat{b}_y)$, target $y = y_{t^*} + \frac{2i}{2n}(y^* - y_{t^*}) - \epsilon \cdot e_i$
     end for
     Update estimates $(\hat{A}_y, \hat{b}_y)$, solving for values which are consistent with the previous $2n+1$ observed state updates, up to error $O(\epsilon)$
  end for
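The "solve for estimates" step of the Estimate phase can be illustrated with central differences over the $2n+1$ probes. This is a minimal sketch under simplifying assumptions: the state update is treated as exactly linear, $y_t - y_{t-1} = A x + b$ (so $q_y \equiv 0$ and $A_y, b_y$ do not vary over the probes), and the names `estimate_linear_dynamics` and `step` are hypothetical. The source only requires estimates consistent with the observations up to error $O(\epsilon)$ and does not prescribe this particular solver.

```python
def estimate_linear_dynamics(step, x1, eps):
    """Estimate (A_hat, b_hat) from 2n+1 probes of the map x -> A x + b.

    `step(x)` is a hypothetical callable returning the observed state
    update y_t - y_{t-1} (a list of floats) when action x is played.
    """
    n = len(x1)
    base = step(x1)                      # probe 1: play x_1
    m = len(base)
    A_hat = [[0.0] * n for _ in range(m)]
    for i in range(n):
        plus = list(x1)
        plus[i] += eps                   # play x_1 + eps * e_i
        minus = list(x1)
        minus[i] -= eps                  # play x_1 - eps * e_i
        d_plus, d_minus = step(plus), step(minus)
        for k in range(m):
            # central difference recovers column i of A
            A_hat[k][i] = (d_plus[k] - d_minus[k]) / (2 * eps)
    # b is whatever remains of the base update after removing A_hat x_1
    b_hat = [base[k] - sum(A_hat[k][j] * x1[j] for j in range(n))
             for k in range(m)]
    return A_hat, b_hat


# Usage on an assumed 2-D instance with known ground truth.
A_true = [[2.0, 1.0], [0.0, 3.0]]
b_true = [0.5, -1.0]


def example_step(x):
    return [sum(A_true[k][j] * x[j] for j in range(2)) + b_true[k]
            for k in range(2)]


A_hat, b_hat = estimate_linear_dynamics(example_step, [0.0, 0.0], 0.25)
print(A_hat, b_hat)
```

With a nonzero $q_y$ bounded by $\epsilon$, the same differences would recover each column only approximately, with an error that depends on how the perturbation compares to the probe step.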
  • Proof

    of Theorem 4. Assume the following hold for $D(x,y)$ at each $y$:

    • $D(x,y) = A_y \cdot x + b_y + y + q_y(x)$, for a function $q_y : \mathcal{X} \rightarrow \mathbb{R}^n$;

    • $A_y$ has a largest absolute eigenvalue bounded by an absolute constant, smallest absolute eigenvalue bounded away from 0, and is $L_\alpha$-Lipschitz in the matrix $\ell_2$ norm;

    • $b_y$ has a norm bounded by an absolute constant, and is $L_\beta$-Lipschitz;

    • $\lVert q_y(x) \rVert \leq \epsilon$ for any $x$ such that $\lVert A_y \cdot x + b_y - y \rVert = O(\sqrt{T})$.

    In the neighborhood of any $y^*$, observe that playing $x = A_y^{-1}(y^* - y - b_y)$ yields an update to $y^* + w_\epsilon$, where the error term $w_\epsilon$ has magnitude bounded linearly in the neighborhood size and polynomially in the relevant constants. We assume sufficiently small values of $\epsilon$, $L_\alpha$, and $L_\beta$ (whose relative bounds may trade off with each other, and in general will be inverse-polynomial in problem parameters other than $T$) to bound the error of this process in accordance with the requirements of Theorem 2, as well as to ensure that estimation error for $(\hat{A}_y, \hat{b}_y)$ is uniformly bounded for all $t \leq T$. Given $\epsilon = o(\sqrt{T})$, this yields estimation error terms $w_t \leq C\sqrt{T}$ in each round, for small enough $C$ to obtain the desired regret bound. ∎
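As a worked check of the targeting step, substituting the chosen action into the assumed form of $D$ shows where the error term comes from. The second identity, for an action $\hat{x}$ computed from estimates $(\hat{A}_y, \hat{b}_y)$ with $\hat{A}_y$ invertible, is our own illustrative expansion and is not stated in the source:

```latex
\begin{align*}
x = A_y^{-1}(y^* - y - b_y)
  \;&\Longrightarrow\;
  D(x, y) = A_y x + b_y + y + q_y(x) = y^* + q_y(x), \\
\hat{x} = \hat{A}_y^{-1}(y^* - y - \hat{b}_y)
  \;&\Longrightarrow\;
  D(\hat{x}, y) = y^* + (A_y - \hat{A}_y)\hat{x} + (b_y - \hat{b}_y) + q_y(\hat{x}).
\end{align*}
```

With exact parameters the error reduces to $q_y(x)$, of norm at most $\epsilon$. With estimated parameters it also picks up the two estimation-error terms, which is why the argument needs the estimation error to be uniformly bounded.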

Appendix G Bandit Feedback: Analysis for NestedBCO

We first state the FKM algorithm and its bounds for regret and per-round step size.

Algorithm 6 FKM (Flaxman et al., 2004)
  Input: decision set $\mathcal{K}$ containing $\mathbf{0}$, set $v_1 = \mathbf{0}$, parameters $\eta, \tilde{\delta}$.
  Let $v_1 \in \operatorname{int}(\mathcal{K})$ such that $\nabla \mathcal{R}(v_1) = 0$,
  for $t = 1$ to $T$ do
     Draw $u_t \in \mathbb{S}$ uniformly, set $y_t = v_t + \tilde{\delta} u_t$
     Play $y_t$, observe loss $f_t(y_t)$, set $g_t = \frac{n}{\tilde{\delta}} f_t(y_t) u_t$
     Update $v_{t+1} = \Pi_{\mathcal{K}_{\tilde{\delta}}}\left[ v_t - \eta g_t \right]$, where $\mathcal{K}_{\tilde{\delta}} = \{ (1-\tilde{\delta}) v : v \in \mathcal{K} \}$
  end for
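The loop above can be sketched in a few lines for one concrete domain. This is a minimal illustration assuming $\mathcal{K}$ is the unit Euclidean ball, so that projecting onto $\mathcal{K}_{\tilde{\delta}}$ is a rescaling. The function name `fkm_unit_ball`, the `seed` argument, and the linear test loss are hypothetical choices, not part of the source.

```python
import math
import random


def fkm_unit_ball(loss, n, T, eta, delta, seed=0):
    """Run FKM over the unit ball in R^n and return the played points y_t.

    `loss(y)` returns the scalar bandit feedback f_t(y_t); here a single
    fixed loss is used for every round (an assumption for simplicity).
    """
    rng = random.Random(seed)
    v = [0.0] * n                        # v_1 = 0
    played = []
    for _ in range(T):
        # Uniform direction on the sphere via a normalized Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g_norm = math.sqrt(sum(c * c for c in g))
        u = [c / g_norm for c in g]
        y = [v[i] + delta * u[i] for i in range(n)]
        played.append(y)
        value = loss(y)                  # only the loss value is observed
        grad_est = [(n / delta) * value * u[i] for i in range(n)]
        w = [v[i] - eta * grad_est[i] for i in range(n)]
        # Project onto (1 - delta) * K, a ball of radius 1 - delta.
        w_norm = math.sqrt(sum(c * c for c in w))
        radius = 1.0 - delta
        if w_norm > radius:
            w = [c * radius / w_norm for c in w]
        v = w
    return played


# Usage with an assumed 1-Lipschitz linear loss f(y) = <theta, y>.
theta = [0.6, 0.8]
points = fkm_unit_ball(lambda y: theta[0] * y[0] + theta[1] * y[1],
                       n=2, T=200, eta=0.01, delta=0.1, seed=1)
print(len(points))
```

Because each $v_t$ lies in the ball of radius $1 - \tilde{\delta}$ and $\lVert u_t \rVert = 1$, every played point stays inside the unit ball. For this loss, whose magnitude on the unit ball is at most $L = 1$, consecutive points differ by at most $2\tilde{\delta} + \eta n L / \tilde{\delta}$.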
Proposition 6 (Flaxman et al. (2004)).

For $L$-Lipschitz convex losses and a domain $\mathcal{K}$ with diameter $2R$ which contains a ball of radius $r$ around the origin, FKM obtains expected regret

\begin{align*}
\mathrm{Reg}_T(\mathrm{FKM}) \leq&\; \eta \frac{n^2}{\tilde{\delta}^2} T + \frac{4R^2}{\eta r^2} + \frac{8\tilde{\delta}RLT}{r},
\end{align*}

with each point $y_t$ contained in $\mathcal{K}$. Further, each pair of consecutive points $y_t$, $y_{t+1}$ chosen by FKM satisfies $\lVert y_{t+1} - y_t \rVert \leq 2\tilde{\delta} + \frac{\eta n L}{\tilde{\delta}}$.

The NestedBCO algorithm is essentially equivalent to NestedOCO, replacing FTRL with FKM and recalibrating parameters.

Algorithm 7 Nested Bandit Convex Optimization (NestedBCO).
  Let $\tilde{\delta} = \frac{1}{T^{1/4}} = r\delta\rho/4$, let $\eta = \frac{R}{2nrLT^{3/4}}$
  Let $\widetilde{\mathcal{Y}} = \{ y : \frac{1}{1-\delta} y \in \mathcal{Y} \}$
  Initialize FKM to run for $T$ rounds over $\widetilde{\mathcal{Y}}$ with parameters $\eta, \tilde{\delta}$
  for $t = 1$ to $T$ do
     Let $y^*$ be the point chosen by FKM
     Use $\texttt{Oracle}(y_{t-1}, y^*)$ to compute $x_t = \operatorname{argmin}_x \lVert D_t(x, y_{t-1}) - y^* \rVert^2$
     Play action $x_t$
     Observe $y_t$ and loss $f_t(y_t)$, update FKM
  end for
  • Proof

    of Theorem 5. Following the proof of Theorem 1, to apply the bound of FKM to our setting (along with excess regret at most $\delta LR$ per round from contracting $\mathcal{Y}$ to $\widetilde{\mathcal{Y}}$), the key step is to show that each point selected by FKM is feasible under weakly locally controllable dynamics over $\widetilde{\mathcal{Y}}$, i.e. $\lVert y_{t+1} - y_t \rVert \leq r\delta\rho$. Let $\tilde{\delta} = \frac{1}{T^{1/4}} = r\delta\rho/4$, and let $\eta = \frac{R}{2nrLT^{3/4}}$. Assume for simplicity that $r \leq 1$ and $T^{1/4} \geq \frac{R}{r}$. When instantiating FKM over $\widetilde{\mathcal{Y}}$ with parameters $\eta$ and $\tilde{\delta}$, by Proposition 6 we then have

    \begin{align*}
    \lVert y_{t+1} - y_t \rVert \leq&\; 2\tilde{\delta} + \frac{\eta n L}{\tilde{\delta}} \\
    \leq&\; r\delta\rho/2 + \left( \frac{R}{2nrLT^{3/4}} \right) \frac{nL}{\tilde{\delta}} \\
    \leq&\; r\delta\rho/2 + \tilde{\delta}/2 \\
    \leq&\; r\delta\rho,
    \end{align*}

    and so each selected point is feasible. This allows us to bound our regret by

    \begin{align*}
    \operatorname{Reg}_T(\textsc{NestedBCO}) &= \operatorname{Reg}_T(\textsc{FKM}) + \delta L R T \\
    &= \eta \frac{n^2}{\tilde{\delta}^2} T + \frac{4R^2}{\eta r^2} + \frac{8 \tilde{\delta} L R T}{r} + \delta L R T \\
    &= \eta \frac{16 n^2}{r^2 \delta^2 \rho^2} T + \frac{4R^2}{\eta r^2} + 2 \delta \rho L R T + \delta L R T && \left(\tilde{\delta} = r \delta \rho / 4\right) \\
    &\leq 16 \eta n^2 T^{3/2} + \frac{4R^2}{\eta r^2} + \frac{12 L R T^{3/4}}{r \rho} && \left(\delta = \tfrac{4}{r \rho T^{1/4}},\ r \leq 1\right) \\
    &\leq \frac{16 n L R T^{3/4}}{r} + \frac{12 L R T^{3/4}}{r \rho} && \left(\eta = \tfrac{R}{2 n r L T^{3/4}}\right) \\
    &= O\left(n R L T^{3/4} (r \rho)^{-1}\right).
    \end{align*}
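The parameter substitutions in this chain can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the sample values are hypothetical, and the comparison against the final line is only illustrated for one arbitrary choice of constants with $L \geq 1$ and $\rho \leq 1$. It evaluates the third line of the chain under the stated choices of $\delta$ and $\eta$ and confirms that every term scales as $T^{3/4}$.

```python
def nested_bco_bound_terms(T, n, R, L, r, rho):
    """Evaluate the three groups of terms in the third line of the chain,
    using delta = 4 / (r * rho * T^{1/4}) and eta = R / (2 n r L T^{3/4})."""
    delta = 4.0 / (r * rho * T ** 0.25)
    eta = R / (2.0 * n * r * L * T ** 0.75)
    exploration = eta * 16.0 * n ** 2 / (r ** 2 * delta ** 2 * rho ** 2) * T
    stability = 4.0 * R ** 2 / (eta * r ** 2)
    smoothing = 2.0 * delta * rho * L * R * T + delta * L * R * T
    return exploration, stability, smoothing


def nested_bco_bound(T, n, R, L, r, rho):
    """Sum of the terms above."""
    return sum(nested_bco_bound_terms(T, n, R, L, r, rho))


def final_line(T, n, R, L, r, rho):
    """The fifth line of the chain: 16 n L R T^{3/4} / r + 12 L R T^{3/4} / (r rho)."""
    return 16.0 * n * L * R * T ** 0.75 / r + 12.0 * L * R * T ** 0.75 / (r * rho)


if __name__ == "__main__":
    args = dict(n=2, R=1.0, L=1.0, r=0.5, rho=0.5)  # arbitrary sample values
    small = nested_bco_bound(1e4, **args)
    large = nested_bco_bound(16e4, **args)
    # Multiplying T by 16 multiplies a T^{3/4} quantity by 16^{3/4} = 8.
    print(large / small)
    print(small <= final_line(1e4, **args))
```

Because each term is a constant multiple of $T^{3/4}$ once $\delta$ and $\eta$ are substituted, the ratio printed first depends only on the factor by which $T$ is scaled.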

Appendix H Background and Proofs for Section 4.1: Performative Prediction

H.1 Background

Introduced by Perdomo et al. (2020), the Performative Prediction problem captures settings in which the data distribution for which a classifier is deployed may shift as a function of the classifier itself, notably including strategic classification Hardt et al. (2015) as well as problems related to reinforcement learning and causal inference. While a number of extensions of strategic classification to online settings have been considered Dong et al. (2018); Zrnic et al. (2021b); Ahmadi et al. (2023), the bulk of the literature on performative prediction considers settings with a fixed loss function and distribution “update map” Perdomo et al. (2020); Miller et al. (2021); Jagadeesan et al. (2022b); Mendler-Dünner et al. (2020); Piliouras and Yu (2022); Brown et al. (2022), where the update map may sometimes depend on the current distribution (as in the Stateful Performative Prediction setting of Brown et al. (2022)). For the location-scale family of update maps introduced by Miller et al. (2021) (and additionally explored by Jagadeesan et al. (2022b) from a regret minimization perspective), which yields a convex “performative risk” objective function, a formulation of Online Performative Prediction is given by Kumar et al. (2022) as an application of online convex optimization with unbounded memory, in which the classification loss function may change over time and the distribution updates may occur gradually.

Here, we generalize the problem formulation of Kumar et al. (2022) to also accommodate notions of statefulness similar to that in Brown et al. (2022). In particular, the instances we consider will resemble location-scale maps when restricting attention only to the performatively stable classifiers for each distribution, yet the update effect of a non-stable classifier may be distribution-dependent and nonlinear, provided that the update map satisfies local controllability (viewing classifiers as actions and distributions as states) and mild regularity properties (e.g. invertibility and Lipschitz conditions).

H.2 Model

In the setting of Online Performative Prediction we consider, as formulated by Kumar et al. (2022), in each round $t \in [T]$ we deploy some classifier $x_t$, and observe samples from some distribution $p_t$, which may change dynamically as a function of the history of interactions. Here, we take $\mathcal{X} \subseteq \mathbb{R}^n$ as our space of classifiers, e.g. representing weight vectors for regression, which we assume is bounded and convex. The initial data distribution is given by some distribution $p_0$ over $\mathbb{R}^n$. In each round, upon deploying a classifier $x_t$, the distribution is updated according to

\[
p_t = (1-\theta)\, p_{t-1} + \theta\, \mathcal{D}(x_t, y_{t-1}),
\]

for $\theta \in (0, 1]$, where $\mathcal{D}(x_t, y_{t-1})$ is the distribution update map taking as input our classifier $x_t$ and some representation of the state $y \in \mathcal{Y}$, where we assume $\mathcal{Y} \subseteq \mathbb{R}^n$ is convex, contains $\mathcal{B}_r(\mathbf{0})$, is bounded with radius $R$, and that $y_0 = 0$. We make the following assumptions on $\mathcal{D}$.

Assumption 1.

We assume the distribution update map $\mathcal{D}(x, y)$ operates as follows:

  • $\mathcal{D}(x, y) = A(x, y) + \xi$, with $A : \mathcal{X} \times \mathcal{Y} \rightarrow \mathcal{Y}$,

  • $\xi$ is a random variable in $\mathbb{R}^n$ with mean $\mu$ and covariance $\Sigma$,

  • $A(x, y)$ satisfies $\rho$-local controllability and has an inverse action mapping $X(y, y^*)$ where

    \[
    A(X(y, y^*), y) = y^*,
    \]

    defined over feasible pairs, which is $L_y$-Lipschitz in $y$ (when feasibility of $y^*$ holds), and

  • There is a linear invertible function $s : \mathcal{X} \rightarrow \mathcal{Y}$ such that $A(x, y) = s(x)$ if $y = s(x)$, where $s^{-1} : \mathcal{Y} \rightarrow \mathcal{X}$ is $S$-Lipschitz.

Further, $A(x, y)$ is known and $\xi$ can be sampled freely.

The inverse action mapping assumption simply enforces that classifiers need not change drastically to have the same update effect under small changes to the state. The final assumption imposes a linear structure over performatively stable classifiers (i.e. classifiers for which the resulting distribution will remain fixed under $\mathcal{D}$, as formulated by Perdomo et al. (2020)), but we note that the distribution may update in an arbitrarily nonlinear fashion (subject to the other conditions) when $x_t$ is not a performatively stable classifier for the distribution induced by the previous state $y_{t-1}$. The ability to accommodate a state component is reminiscent of prior work involving notions of statefulness in performative prediction such as Brown et al. (2022). Our setting generalizes that of Kumar et al. (2022), in which the map $A$ is taken to be a fixed matrix. For any nonsingular matrix $A$ there is immediately a linear map $s(x) = A^{-1}x$, and local controllability can be defined in terms of the largest and smallest absolute eigenvalues of $A$ (as a special case of our Example 1 with a fixed matrix). We view the nonsingularity assumption (and invertibility in the more general case) as fairly mild, as it amounts to assuming that the distribution map can depend on all parameters of the classifier without any necessary (linear) dependency structure imposed, and that no two classifiers are equivalent only to the population but not the optimizer (as otherwise one could simply reduce the dimensionality of $\mathcal{X}$).
However, even in the case where $A$ is singular, we note that this issue is resolvable by augmenting the state representation $y_t$ to incorporate the choice of free classifier parameters which affect loss but not distribution updates (e.g. by adding a vector $w_t$ to $y_t$ which is orthogonal to the range of $A$ and linear in $x_t$). We assume invertibility here for simplicity, and we take $\mathcal{Y}$ to simply be given by the range of $s$ over $\mathcal{X}$. At each round $t$, some scoring function $f_t(x, z)$ is chosen adversarially, and our loss is then given by

\[
\tilde{f}_t(x_t, p_t) = \mathbb{E}_{z \sim p_t}\left[f_t(x_t, z)\right].
\]

We assume each $f_t$ is convex and $L_z$-Lipschitz in both $x$ and $z$, and that $p_0 = y_0 + \xi$. We measure our regret with respect to the best performatively stable classifier, i.e. the loss of any classifier as if it were held constant indefinitely as the distribution updates. We define our regret as follows:

\[
\operatorname{Reg}_T(\mathcal{A}) = \max_{x^*} \sum_{t=1}^{T} \tilde{f}_t(x_t, p_t) - \tilde{f}_t\left(x^*, \mathcal{D}(x^*, s(x^*))\right)
\]

Here, the role of $s(x^*)$ captures the convergence of the distribution to a stable point, resulting from taking the limit of the distribution update rule as $t$ grows large.
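To illustrate this convergence, the following is a minimal sketch under simplifying assumptions that are not from the paper: everything is scalar, only the mean of $p_t$ is tracked, the stable map is taken to be the hypothetical $s(x) = 2x$, and the state is assumed to already sit at $s(x^*)$ so that every update draws from $\mathcal{D}(x^*, s(x^*))$, whose mean is $s(x^*) + \mu$. Under the update rule, the gap between the mean of $p_t$ and this target then shrinks by a factor $(1-\theta)$ per round.

```python
def s(x):
    """Hypothetical linear stable-point map (an assumption for illustration)."""
    return 2.0 * x


def stable_mean_trajectory(theta, m0, target, rounds):
    """Iterate m_t = (1 - theta) * m_{t-1} + theta * target, the mean-level
    version of p_t = (1 - theta) p_{t-1} + theta D(x*, s(x*)), and return
    the list [m_0, m_1, ..., m_rounds]."""
    means = [m0]
    for _ in range(rounds):
        means.append((1.0 - theta) * means[-1] + theta * target)
    return means


if __name__ == "__main__":
    mu = 0.1                 # mean of the noise xi (sample value)
    target = s(1.5) + mu     # mean of D(x*, s(x*)) for x* = 1.5
    traj = stable_mean_trajectory(theta=0.25, m0=0.0, target=target, rounds=20)
    for t in (0, 5, 10, 20):
        # The gap to the target equals (1 - theta)^t times the initial gap.
        print(t, traj[t] - target, (0.75 ** t) * (0.0 - target))
```

The two printed gap columns agree, which is the geometric convergence that the benchmark $\mathcal{D}(x^*, s(x^*))$ is meant to summarize.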

As in many of the applications we consider, here our loss is determined both by our action (the classifier) and the state (in terms of the distribution). Our approach for casting Online Performative Prediction as an instance of online nonlinear control in our framework will be to define appropriate surrogate convex losses which depend only on the state, over which we run NestedOCO. Here, these will correspond to losses only over the updated distribution component $\mathcal{D}(x_t, y_{t-1})$, which we show closely track our true incurred loss.

H.3 Analysis

For each round $t$, define the surrogate loss $f^*_t(y)$ as:

\[
f^*_t(y) = \mathbb{E}_{z \sim y + \xi}\left[f_t(s^{-1}(y), z)\right].
\]
Lemma 1.

Each $f_t^*(y)$ is convex and $(1+S)L_z$-Lipschitz in $y$.

  • Proof

    Consider any individual sample $v \sim \xi$. We can then view $g(y) = (s^{-1}(y), y + v)$ as a vector-valued function which is $(1+S)$-Lipschitz. The function $f_t(g(y))$ is an $L_z$-Lipschitz and convex function of this linear function of $y$, and thus $f_t(s^{-1}(y), y + v)$ is convex and $(1+S)L_z$-Lipschitz in $y$. The function $f^*_t(y)$ is an average of such functions, taken over the expectation of $\xi$, and thus is convex and $(1+S)L_z$-Lipschitz in $y$ as well. ∎

Observe that $f^*_t(y) = \tilde{f}_t\left(s^{-1}(y), \mathcal{D}(s^{-1}(y), y)\right)$. We will run NestedOCO for these losses over the $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, A)$, where we can track the current state $y_t = A(x_t, y_{t-1})$ at each step as a function of our past actions given knowledge of $A$, and can compute gradients of $f^*_t(y_t)$ to arbitrary desired precision by sampling from $\xi$. This will yield the regret bound from Theorem 1 with respect to the surrogate losses, and the key challenge will be to analyze our error between the true and surrogate losses.
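The sampling-based gradient computation can be sketched as follows. This is only an illustration under assumptions that are not from the paper: the problem is scalar, the stable map is the hypothetical $s(x) = a x$, the scoring function is $f_t(x, z) = \tfrac{1}{2}(x - z)^2$ (convex, and Lipschitz only on a bounded domain), and $\xi$ is Gaussian. In this toy case the surrogate gradient has the closed form $(1/a - 1)(y/a - y - \mu)$, which the Monte Carlo average can be compared against.

```python
import random


def surrogate_grad_mc(y, a, mu, sigma, num_samples, seed=0):
    """Monte Carlo estimate of d/dy E_v[ 0.5 * (y / a - (y + v))^2 ],
    averaging per-sample gradients over draws v ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    coeff = 1.0 / a - 1.0  # derivative of (y / a - y - v) with respect to y
    total = 0.0
    for _ in range(num_samples):
        v = rng.gauss(mu, sigma)
        total += coeff * (y / a - (y + v))
    return total / num_samples


def surrogate_grad_exact(y, a, mu):
    """Closed-form gradient for the same toy loss, using E[v] = mu."""
    return (1.0 / a - 1.0) * (y / a - y - mu)


if __name__ == "__main__":
    estimate = surrogate_grad_mc(y=1.0, a=2.0, mu=0.3, sigma=1.0, num_samples=20000)
    exact = surrogate_grad_exact(y=1.0, a=2.0, mu=0.3)
    print(estimate, exact)
```

Increasing `num_samples` tightens the estimate at the usual $1/\sqrt{N}$ rate, which is one way to read "arbitrary desired precision" in this toy setting.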

Lemma 2.

For any round $t$ we have that

\[
\tilde{f}_t(x_t, p_t) - f_t^*(y_t) \leq (1-\theta)^h M + \frac{\eta L_z (1+S)}{\gamma} \cdot \left(L_y + \frac{1-\theta}{\theta}\right)
\]
  • Proof

    For any $h < t$, the loss of $x_t$ over the distribution $y_{t-h} + \xi = \mathcal{D}(x_{t-h}, y_{t-h-1})$ can be expressed as

    \[
    \hat{f}_t(x_t, y_{t-h}) = \mathbb{E}_{z \sim \xi + y_{t-h}}\left[f_t(x_t, z)\right],
    \]

    which is convex and $L_z$-Lipschitz in both parameters when taking the expectation over $\xi$. For round $t$ in isolation, using the inverse action mapping bound and the bound on $\lVert y_t - y_{t-1} \rVert$ from Proposition 5 we have that

    \begin{align*}
    \hat{f}_t(x_t, y_t) - f^*_t(y_t) &= \hat{f}_t(x_t, y_t) - \hat{f}_t(s^{-1}(y_t), y_t) \\
    &= \hat{f}_t(X(y_{t-1}, y_t), y_t) - \hat{f}_t(X(y_t, y_t), y_t) \\
    &\leq \frac{\eta L_y L_z}{\gamma},
    \end{align*}

    and further for previous states that

    \[
    \hat{f}_t(x_t, y_{t-h}) - f^*_t(y_t) \leq (L_y + h) \frac{\eta L_z (1+S)}{\gamma}.
    \]

    We can decompose the distribution $p_t$ into updates from past rounds as

    \[
    p_t = (1-\theta)^t p_0 + \sum_{h=0}^{t-1} \theta (1-\theta)^h \mathcal{D}(x_{t-h}, y_{t-h-1})
    \]

    which then yields a loss discrepancy of at most

    \begin{align*}
    \tilde{f}_t(x_t, p_t) - f_t^*(y_t) &\leq (1-\theta)^t f_t(x_t, p_0) + \frac{\eta L_z (1+S)}{\gamma} \left(\sum_{h=0}^{t-1} \theta (1-\theta)^h (L_y + h)\right) \\
    &\leq \frac{\eta L_z (1+S)}{\gamma} \cdot \left(L_y + \frac{1-\theta}{\theta} + (1-\theta)^t\right)
    \end{align*}

    between the true and surrogate loss for round $t$. ∎
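The decomposition of $p_t$ and the geometric sums used in this proof can be sanity-checked numerically. The sketch below is illustrative only, with hypothetical function names: it unrolls the recursion $p_s = (1-\theta)p_{s-1} + \theta \mathcal{D}_s$ at the level of mixture weights, compares the result with the closed-form weights $(1-\theta)^t$ and $\theta(1-\theta)^h$, and evaluates the lag-weighted sum $\sum_{h} \theta(1-\theta)^h h$, which is bounded by $(1-\theta)/\theta$.

```python
def mixture_weights(theta, t):
    """Unroll p_s = (1 - theta) * p_{s-1} + theta * D_s for s = 1..t.

    Returns (w0, w), where w0 is the weight remaining on p_0 and w[s - 1]
    is the weight on the update D_s from round s."""
    w0 = 1.0
    w = []
    for _ in range(t):
        w0 *= 1.0 - theta
        w = [(1.0 - theta) * wi for wi in w] + [theta]
    return w0, w


def weighted_lag(theta, t):
    """Compute sum_{h=0}^{t-1} theta * (1 - theta)^h * h."""
    return sum(theta * (1.0 - theta) ** h * h for h in range(t))


if __name__ == "__main__":
    theta, t = 0.3, 12
    w0, w = mixture_weights(theta, t)
    print(w0, (1.0 - theta) ** t)                 # weight on p_0
    print(w[t - 1 - 2], theta * (1.0 - theta) ** 2)  # weight on round t - 2 (h = 2)
    print(w0 + sum(w))                            # weights form a distribution
    print(weighted_lag(theta, t), (1.0 - theta) / theta)
```

The last line shows the finite sum staying below its infinite-horizon value $(1-\theta)/\theta$, which is the term appearing in the final bound.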

We can now bound the cumulative regret of NestedOCO for the problem.

Theorem 16.

For any $\theta > 0$, when Assumption 1 holds for the distribution update rule, Online Performative Prediction can be cast as a $\rho$-locally controllable instance of online control with nonlinear dynamics, for which NestedOCO obtains regret

\[
\operatorname{Reg}_T(\textsc{NestedOCO}) \leq 2 \sqrt{\frac{\left(1 + L_y + \frac{R}{r\rho} + \frac{2-\theta}{\theta}\right) T G L_z^2 (1+S)^2}{\gamma}}
\]

with respect to the best performatively stable classifier.

  • Proof

    Combining the previous results with Theorem 1, we have that for any $x^* \in \mathcal{X}$ our regret is at most

    \begin{align*}
    \sum_{t=1}^{T} \tilde{f}_t(x_t, p_t) - \tilde{f}_t\left(x^*, \mathcal{D}(x^*, s(x^*))\right) &\leq \sum_{t=1}^{T} f^*_t(y_t) - \tilde{f}_t\left(x^*, \mathcal{D}(x^*, s(x^*))\right) + \sum_{t=1}^{T} \tilde{f}_t(x_t, p_t) - f^*_t(y_t) \\
    &\leq \eta \left(1 + L_y + \frac{2-\theta}{\theta} + \frac{R}{r\rho}\right) \frac{T L_z (1+S)}{\gamma} + \frac{G}{\eta} \\
    &= 2 \sqrt{\frac{\left(1 + L_y + \frac{R}{r\rho} + \frac{2-\theta}{\theta}\right) T G L_z^2 (1+S)^2}{\gamma}}
    \end{align*}

    upon setting $\eta = \sqrt{\frac{G \gamma}{\left(1 + L_y + \frac{R}{r\rho} + \frac{2-\theta}{\theta}\right) T L_z^2 (1+S)^2}}$. ∎

Theorem 6 follows directly from Theorem 16. For Online Performative Prediction, in the full generality of the setting considered, the per-round optimization problem may not be convex, in which case we make use of the non-convex optimization oracle access for NestedOCO. However, in each of the following applications we show that the action selection step can indeed be implemented efficiently without imposing additional restrictions on the dynamics.

Appendix I Background and Proofs for Section 4.2: Adaptive Recommendations

I.1 Background

Motivated by problems involving preference dynamics and feedback loops in recommendation systems (see e.g. Flaxman et al. (2016)), a number of recent works Hazla et al. (2019); Gaitonde et al. (2021); Dean and Morgenstern (2022); Jagadeesan et al. (2022a); Agarwal and Brown (2022, 2023) have explored models of repeated recommendation where recommendations are given to an agent whose preferences or opinions evolve over time. Several of these models Hazla et al. (2019); Dean and Morgenstern (2022); Jagadeesan et al. (2022a) consider population-level effects for settings where a single recommendation is given each round and consumers (or producers) update their behavior according to linear dynamics. Nonlinear preference dynamics with menus of recommendations for a single agent are considered in Agarwal and Brown (2022, 2023), where the recommender aims to minimize regret for adversarial losses over the agent’s choices. The Adaptive Recommendations formulation of Agarwal and Brown (2022) somewhat resembles the “Dueling Bandits” setting of Yue et al. (2012), where $k>1$ actions are chosen in each round, except that preferences can now evolve dynamically as a function of the history rather than remaining fixed. Whereas Agarwal and Brown (2022, 2023) study a bandit formulation of the problem with unknown preference dynamics, here we consider a full-feedback model with known dynamics, allowing for relaxed structural assumptions (on the agent’s “memory horizon” and “preference scoring functions”) at the cost of stronger informational assumptions, while maintaining the overall dynamics of the problem.

I.2 Model

Here, we are tasked with repeatedly recommending menus of content to an agent. Out of a universe of $n$ elements (e.g. video channels, clothing items), we show a subset of size $k$ (denoted $K_t$) to the agent in each round, for $T$ total rounds. The agent chooses one item $i\in K_t$ from the menu, according to a distribution in terms of their preferences, which are a function of their selection history. Conditioned on being shown a menu $K_t$, the agent’s choice distribution has positive mass only on the $k$ items $i\in K_t$. The agent’s representation of their selection history is given by their memory vector $v_t\in\Delta(n)$, and choices are determined by their preference scoring functions $s_i:\Delta(n)\rightarrow[\lambda,1]$ for each $i$, which map the agent’s memory vector to relative preference scores for each item.
The menu we show to the agent may be chosen from some distribution $x_t\in\Delta(\binom{n}{k})$, and for each $K_t\in[\binom{n}{k}]$ the agent’s menu-conditional distribution $p_t(\cdot;K_t,v_{t-1})\in\Delta(n)$ is proportional to the scores $s_i(v_{t-1})$ for items in $K_t$, given as

\begin{align*}
p_t(i;K_t,v_{t-1}) ={}& \frac{s_i(v_{t-1})}{\sum_{j\in K_t}s_j(v_{t-1})}
\end{align*}

for each $i\in K_t$, with $p_t(j;K_t,v_{t-1})=0$ for $j\notin K_t$. The joint item choice distribution, considering both random selection of a menu $K_t$ according to $x_t$, and the agent’s choice from $K_t$, is given by

\begin{align*}
p_t(\cdot;x_t,v_{t-1}) ={}& \sum_{K_t\in\binom{n}{k}}x_t(K_t)\cdot p_t(\cdot;K_t,v_{t-1})
\end{align*}

which we may denote simply by the vector $p_t\in\Delta(n)$, or as a function $p_t(x_t)$. In contrast to prior work, here we consider a deterministic variant of the problem as an illustration of the flexibility of our framework for online nonlinear control. In particular, we assume that the agent’s memory vector $v_t$ updates according to its expectation over $p_t$ as

\begin{align*}
v_t = (1-\theta_t)v_{t-1}+\theta_t p_t,
\end{align*}

where $\theta_t\in[\theta,1]$ is the per-round update speed, and we assume that the agent’s scoring functions $s_i$ are known. We receive convex and $L$-Lipschitz losses $f_t(p_t)$ in each round in terms of the agent’s choices, over which we aim to minimize regret with respect to some distribution set $\mathcal{Y}\subseteq\Delta(n)$.
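As a rough illustration of the model above, the following sketch simulates a single round: it forms the menu-conditional distributions, mixes them according to $x_t$, and applies the memory update. The scoring function `example_scores`, the uniform menu distribution, and all numeric values are hypothetical choices made only for this example; they are not taken from the paper.

```python
from itertools import combinations


def menu_conditional(menu, scores):
    """p_t(. ; K_t, v_{t-1}): scores renormalized over the menu, zero elsewhere."""
    total = sum(scores[j] for j in menu)
    return [scores[i] / total if i in menu else 0.0 for i in range(len(scores))]


def joint_choice(x, scores):
    """p_t(. ; x_t, v_{t-1}) = sum over menus K of x_t(K) * p_t(. ; K, v_{t-1})."""
    n = len(scores)
    p = [0.0] * n
    for menu, weight in x.items():
        cond = menu_conditional(menu, scores)
        for i in range(n):
            p[i] += weight * cond[i]
    return p


def memory_update(v_prev, p, theta_t):
    """v_t = (1 - theta_t) * v_{t-1} + theta_t * p_t."""
    return [(1 - theta_t) * a + theta_t * b for a, b in zip(v_prev, p)]


def example_scores(v, lam=0.5):
    """Hypothetical scoring functions s_i(v) = lam + (1 - lam) * v_i, valued in [lam, 1]."""
    return [lam + (1 - lam) * vi for vi in v]


n, k = 4, 2
v_prev = [0.4, 0.3, 0.2, 0.1]
menus = list(combinations(range(n), k))
x_t = {menu: 1 / len(menus) for menu in menus}  # uniform over all size-k menus
scores = example_scores(v_prev)                 # scores evaluated at v_{t-1}
p_t = joint_choice(x_t, scores)
v_t = memory_update(v_prev, p_t, theta_t=0.5)
```

Both `p_t` and `v_t` remain points of $\Delta(n)$, since each is a convex combination of distributions.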

The prior work (Agarwal and Brown, 2022, 2023) has considered two particular subsets of $\Delta(n)$ as regret benchmarks. We show that both can be cast as locally controllable instances of online control, and further, we make use of local controllability to give a general characterization of convex sets $\mathcal{Y}\subseteq\Delta(n)$ over which sublinear regret is attainable. We recall some key definitions and results from (Agarwal and Brown, 2022, 2023).

Definition 4 (Instantaneously Realizable Distributions).

The set of instantaneously realizable distributions at a memory vector $v\in\Delta(n)$ is given by

\begin{align*}
\operatorname{IRD}(v) ={}& \operatorname{convhull}\left\{p(\cdot;K,v):K\in\left[\binom{n}{k}\right]\right\}.
\end{align*}

Each such set $\operatorname{IRD}(v_{t-1})$ corresponds to the feasible distributions $p_t$, given the agent’s scoring functions and memory $v_{t-1}$. It is shown by Agarwal and Brown (2023) that each IRD set can be directly characterized in terms of the ratios between target frequencies and scores.

Proposition 7 (Menu Times for IRD; Agarwal and Brown (2023)).

Given a memory vector $v\in\Delta(n)$ and target distribution $p\in\Delta(n)$, let the menu time $\mu_i$ for item $i$ be given by

\begin{align*}
\mu_i ={}& \frac{k\cdot\frac{p(i)}{s_i(v)}}{\sum_{j=1}^{n}\frac{p(j)}{s_j(v)}},
\end{align*}

where $\sum_{i=1}^{n}\mu_i=k$. Then, $p\in\operatorname{IRD}(v)$ if and only if $\mu_i\leq 1$ for each $i\in[n]$.
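The membership test in the proposition is straightforward to evaluate numerically. The sketch below computes the menu times and checks $\mu_i\leq 1$; the function names, the tolerance `tol`, and the example inputs (constant scores $s_i\equiv 1$) are assumptions made for illustration.

```python
def menu_times(p, scores, k):
    """mu_i = k * (p(i)/s_i(v)) / sum_j (p(j)/s_j(v)); `scores` holds s_i(v)."""
    ratios = [pi / si for pi, si in zip(p, scores)]
    total = sum(ratios)
    return [k * r / total for r in ratios]


def in_ird(p, scores, k, tol=1e-12):
    """p lies in IRD(v) iff every menu time is at most 1 (up to a float tolerance)."""
    return all(mu <= 1 + tol for mu in menu_times(p, scores, k))


# Hypothetical example with n = 4 items, menus of size k = 2, and equal scores.
scores = [1.0, 1.0, 1.0, 1.0]
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

mu_uniform = menu_times(uniform, scores, 2)  # every item needs half of the menu time
uniform_ok = in_ird(uniform, scores, 2)
skewed_ok = in_ird(skewed, scores, 2)        # item 0 would need menu time 1.4 > 1
```

With equal scores and $k=2$, an item shown in every menu is chosen with probability $1/2$, so no target placing mass $0.7$ on one item can be realized, which is what the check reports.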

We recall the prior benchmark sets considered, and the corresponding assumptions which yield feasibility of regret minimization. We state informal analogues of the prior results as translated to our setting, which we then show formally below.

Definition 5 (Everywhere Instantaneously Realizable Distributions).

The set of everywhere instantaneously realizable distributions is given by

\begin{align*}
\operatorname{EIRD} ={}& \bigcap_{v\in\Delta(n)}\operatorname{IRD}(v).
\end{align*}
Proposition 8 (Corollary of Agarwal and Brown (2022)).

If $\lambda\geq\frac{k}{n}+\frac{k}{n(n-1)}$, then EIRD is non-empty, and there is an $o(T)$ regret algorithm with respect to any distribution $p\in\operatorname{EIRD}$.

Distributions $p_t\in\operatorname{EIRD}$ are always feasible regardless of $v_{t-1}$ by an appropriate choice of $x_t$, but EIRD may be quite small in relation to $\Delta(n)$. Under stronger assumptions for each $s_i$, a potentially much larger set becomes feasible as a regret benchmark.

Definition 6 ($\phi$-Smoothed Simplex).

The $\phi$-smoothed simplex $\Delta^{\phi}(n)$ for $\phi\in[0,1]$ is given by

\begin{align*}
\Delta^{\phi}(n) ={}& \{(1-\phi)v+\phi\mathbf{u}_n:v\in\Delta(n)\}.
\end{align*}
Definition 7 (Scale-Bounded Functions).

A scoring function $s_i:\Delta(n)\rightarrow[\frac{\lambda}{\sigma},1]$ is said to be $(\sigma,\lambda)$-scale-bounded for $\sigma>1$ and $\lambda>0$ if, for all $v\in\Delta(n)$, we have that

\begin{align*}
\sigma^{-1}((1-\lambda)v_i+\lambda)\leq s_i(v)\leq\sigma((1-\lambda)v_i+\lambda).
\end{align*}

For such functions, each score $s_i(v)$ cannot be too far from item $i$’s weight in memory, and it is shown that $\operatorname{IRD}(v)$ contains a ball around $v$ for each $v\in\Delta^{\phi}(n)$, for an appropriate choice of $\phi$.
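The two definitions above can be checked pointwise in a few lines. This sketch assumes that $\mathbf{u}_n$ denotes the uniform distribution over $[n]$; the helper names and the example scoring functions are hypothetical.

```python
def smooth(v, phi):
    """Map v in Delta(n) to (1 - phi) * v + phi * u_n, with u_n assumed uniform."""
    n = len(v)
    return [(1 - phi) * vi + phi / n for vi in v]


def is_scale_bounded_at(score_i, v, i, sigma, lam):
    """Check the (sigma, lam)-scale-bounded inequality for item i at one point v."""
    base = (1 - lam) * v[i] + lam
    return base / sigma <= score_i(v) <= sigma * base


v = [1.0, 0.0, 0.0]
v_smooth = smooth(v, phi=0.3)  # every coordinate is now at least phi / n

# A score equal to the reference value (1 - lam) * v_i + lam passes for any sigma > 1.
passes = is_scale_bounded_at(lambda w: 0.9 * w[1] + 0.1, v, 1, sigma=1.5, lam=0.1)
# A constant score of 1 is too far from a small memory weight to pass.
fails = is_scale_bounded_at(lambda w: 1.0, v, 1, sigma=2.0, lam=0.1)
```

A pointwise check like this can only refute scale-boundedness on sampled points; establishing it for all $v\in\Delta(n)$ still requires an argument about the scoring function itself.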

Proposition 9 (Corollary of Agarwal and Brown (2023)).

If each $s_i$ is $(\sigma,\lambda)$-scale-bounded, then there is an $o(T)$ regret algorithm with respect to any distribution $p\in\Delta^{\phi}(n)$, for $\phi=\Theta(k\lambda\sigma^{2})$.

We extend these results to general convex benchmark sets $\mathcal{Y}\subseteq\Delta(n)$, where we can characterize the feasibility of regret minimization via local controllability using the menu times $\mu_i$. When $\rho$-local controllability holds over a set $\mathcal{Y}$, we can minimize regret via NestedOCO using surrogate losses $f_t^{*}(v_t)$, which closely track our true losses $f_t(p_t)$.

I.3 Analysis

We make use of the menu time quantities $\mu_i$ for a memory vector $v$ and target distribution $p$ to translate our notion of local controllability to the Adaptive Recommendations setting. Let $\mathcal{Y}$ be any convex subset of $\Delta(n)$ and let $\mathcal{X}=\Delta(\binom{n}{k})$, with the dynamics $D_t(x_t,v_{t-1})$ given by

\begin{align*}
D_t(x_t,v_{t-1}) ={}& (1-\theta_t)v_{t-1}+\theta_t p_t(x_t).
\end{align*}

Note that $D_t(x_t,v_{t-1})$ is action-linear in $x_t$, and thus we can solve for $x_t$ efficiently (in terms of $\dim(\mathcal{X})=O(n^{k})$); further, there is a construction given in Agarwal and Brown (2023) for removing exponential dependence on $k$ when computing menu distributions. We consider $\mathcal{Y}$ as an $(n-1)$-dimensional subset of $\mathbb{R}^{n}$, where we define the ball $\mathcal{B}_{\rho}(v)$ of radius $\rho$ around a point $v\in\mathcal{Y}$ as:

\begin{align*}
\mathcal{B}_{\rho}(v) ={}& \{p\in\Delta(n):\left\lVert p-v\right\rVert\leq\rho\}.
\end{align*}
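To make the remark about solving for $x_t$ concrete, here is one possible way to build a menu distribution that induces a given target $p\in\operatorname{IRD}(v)$. It is not the construction of Agarwal and Brown (2023); it is a sketch based on a standard systematic-sampling argument, under the choice model of Section I.2. The idea, which is our own working rather than a statement from the paper: first find a distribution $z$ over size-$k$ menus whose inclusion probabilities are the menu times $\mu_i$, then set $x(K)\propto z(K)\sum_{j\in K}s_j(v)$. All function names and tolerances are hypothetical.

```python
import math


def menu_distribution(p, scores, k, feas_tol=1e-12, min_width=1e-9):
    """Return {menu: probability} whose induced choice distribution is p.

    `scores` holds s_i(v). Raises ValueError if some menu time exceeds 1.
    """
    n = len(p)
    ratios = [p[i] / scores[i] for i in range(n)]
    total = sum(ratios)
    mu = [k * r / total for r in ratios]  # menu times, summing to k
    if max(mu) > 1 + feas_tol:
        raise ValueError("target is not instantaneously realizable")

    # Item i owns the interval [cum[i], cum[i+1]) inside [0, k).
    cum = [0.0]
    for m in mu:
        cum.append(cum[-1] + m)
    cum[-1] = float(k)  # guard against rounding drift in the last endpoint

    # For u in [0, 1), the menu holds the items whose interval contains one of
    # u, u+1, ..., u+k-1. The menu is constant between consecutive breakpoints.
    points = sorted({0.0, 1.0} | {c - math.floor(c) for c in cum})
    z = {}
    for lo, hi in zip(points, points[1:]):
        if hi - lo < min_width:
            continue  # skip slivers created by floating-point noise
        u = (lo + hi) / 2
        menu = tuple(i for i in range(n)
                     if any(cum[i] <= u + m < cum[i + 1] for m in range(k)))
        z[menu] = z.get(menu, 0.0) + (hi - lo)

    # Reweight by each menu's total score, then normalize.
    weights = {menu: w * sum(scores[j] for j in menu) for menu, w in z.items()}
    norm = sum(weights.values())
    return {menu: w / norm for menu, w in weights.items()}


def induced_choice(x, scores):
    """Joint choice distribution sum_K x(K) * s_i / sum_{j in K} s_j."""
    p = [0.0] * len(scores)
    for menu, weight in x.items():
        menu_score = sum(scores[j] for j in menu)
        for i in menu:
            p[i] += weight * scores[i] / menu_score
    return p


scores = [1.0, 0.5, 0.8, 0.6]      # hypothetical values of s_i(v)
target = [0.25, 0.25, 0.25, 0.25]
x = menu_distribution(target, scores, k=2)
p_hat = induced_choice(x, scores)  # should match `target` up to rounding
```

The resulting support has at most $n+1$ menus rather than $\binom{n}{k}$, which is in the same spirit as the remark about avoiding exponential dependence on $k$, although the cited construction may differ.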
Theorem 17.

An instance of Adaptive Recommendations $(\mathcal{X},\mathcal{Y},D)$ satisfies $\rho\theta$-local controllability if, for any $v\in\mathcal{Y}$ and $p\in\mathcal{B}_{\rho\cdot\pi(v)}(v)$, we have that

\begin{align*}
\frac{(k-1)p(i)}{s_i(v)} \leq{}& \sum_{j\neq i}^{n}\frac{p(j)}{s_j(v)}
\end{align*}

for every $i\in[n]$.

This follows immediately from Proposition 8 and the definition of local controllability, which can analogously extend to strong local controllability. We can use this formulation to unify the feasibility analysis for each of the previously considered sets.
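For reference, the inequality in the theorem statement is the menu-time requirement $\mu_i\leq 1$ in rearranged form. The short derivation below uses only the definition of $\mu_i$ given earlier (the denominator is positive, since scores are positive and $p$ is a distribution):

```latex
\begin{align*}
\mu_i \leq 1
  &\iff k\cdot\frac{p(i)}{s_i(v)} \leq \sum_{j=1}^{n}\frac{p(j)}{s_j(v)} \\
  &\iff (k-1)\frac{p(i)}{s_i(v)} \leq \sum_{j\neq i}\frac{p(j)}{s_j(v)}.
\end{align*}
```

The second step subtracts $\frac{p(i)}{s_i(v)}$ from both sides, so requiring the theorem's inequality for every $i\in[n]$ is the same as requiring $p\in\operatorname{IRD}(v)$.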

Lemma 3.

For $\lambda\geq\frac{k-1}{n-1}+\epsilon$ and $\epsilon\geq 0$, the EIRD set contains a ball of radius $\rho=\Theta(\frac{\epsilon}{nk+\epsilon})$ around $\mathbf{u}_n$, and any instance $(\mathcal{X},\operatorname{EIRD},D)$ satisfies $\theta$-local controllability.

  • Proof

    For any $v\in\Delta(n)$, $i\in[n]$, and $p\in\mathcal{B}_{\rho}(\mathbf{u}_n)$ we have $p(i)\leq\frac{1}{n}+\frac{\rho\sqrt{2}}{2}$ and $s_i(v)\geq\frac{k-1}{n-1}+\epsilon$, yielding that

    \begin{align*}
    \frac{(k-1)p(i)}{s_i(v)} \leq{}& \frac{1+\frac{\rho n\sqrt{2}}{2}}{\frac{n}{n-1}+\frac{\epsilon n}{k-1}},
    \end{align*}

    and over all items $j\neq i$ (with $s_j(v)\leq 1$) we have

    \begin{align*}
    \sum_{j\neq i}^{n}\frac{p(j)}{s_j(v)} \geq{}& 1-\frac{1}{n}-\frac{\rho\sqrt{2}}{2}.
    \end{align*}

    Observe that the bounds for each term are equalized at $\frac{n-1}{n}$ when $\rho=\epsilon=0$, and so $\mathbf{u}_n\in\operatorname{EIRD}$ whenever $\lambda\geq\frac{k-1}{n-1}$. We can specify $\epsilon(\rho)$ in terms of $\rho$ so that the first bound remains at most the second, and thus $p\in\operatorname{EIRD}$. Taking $\epsilon(\rho)$ in terms of $\rho$ as

    \begin{align*}
    \epsilon(\rho) ={}& \frac{\rho n(k-1)}{\frac{2(n-1)}{\sqrt{2}n}-\rho} \\
    ={}& \frac{\frac{\rho n(k-1)\sqrt{2}}{2}}{1-\frac{1}{n}-\frac{\rho\sqrt{2}}{2}} \\
    \geq{}& (k-1)\left(\frac{\frac{1}{n}+\frac{\rho\sqrt{2}}{2}}{1-\frac{1}{n}-\frac{\rho\sqrt{2}}{2}}-\frac{1}{n-1}\right),
    \end{align*}

    where the last expression equals the second divided by $n-1\geq 1$, gives us that

    \begin{align*}
    \frac{1}{n-1}+\frac{\epsilon(\rho)}{k-1} \geq{}& \frac{\frac{1}{n}+\frac{\rho\sqrt{2}}{2}}{1-\frac{1}{n}-\frac{\rho\sqrt{2}}{2}}
    \end{align*}

    for $\rho\geq 0$, and so we maintain that $p\in\operatorname{EIRD}$. Inverting, we have

    \begin{align*}
    \rho(\epsilon) ={}& \frac{\epsilon\frac{2(n-1)}{\sqrt{2}n}}{n(k-1)+\epsilon}
    \end{align*}

    as the radius of a ball around 𝐮nsubscript𝐮𝑛\mathbf{u}_{n}bold_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT contained in EIRDEIRD\operatorname{\textup{{EIRD}}}EIRD. To see that EIRDEIRD\operatorname{\textup{{EIRD}}}EIRD is θ𝜃\thetaitalic_θ-locally controllable, consider any vt1subscript𝑣𝑡1v_{t-1}italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT and vsuperscript𝑣v^{*}italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT in EIRDEIRD\operatorname{\textup{{EIRD}}}EIRD where vπ(vt1)(vt1)superscript𝑣subscript𝜋subscript𝑣𝑡1subscript𝑣𝑡1v^{*}\in\operatorname{\mathcal{B}}_{\pi(v_{t-1})}(v_{t-1})italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ caligraphic_B start_POSTSUBSCRIPT italic_π ( italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ), and let vt=(1θt)vt1+θtvsubscript𝑣𝑡1subscript𝜃𝑡subscript𝑣𝑡1subscript𝜃𝑡superscript𝑣v_{t}=(1-\theta_{t})v_{t-1}+\theta_{t}v^{*}italic_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ( 1 - italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT + italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. By playing an action distribution xtsubscript𝑥𝑡x_{t}italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT which induces pt(xt)=vsubscript𝑝𝑡subscript𝑥𝑡superscript𝑣p_{t}(x_{t})=v^{*}italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the memory vector is then updated to vtsubscript𝑣𝑡v_{t}italic_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. 
This is feasible for any vtθπ(vt1)(vt1)subscript𝑣𝑡subscript𝜃𝜋subscript𝑣𝑡1subscript𝑣𝑡1v_{t}\in\operatorname{\mathcal{B}}_{\theta\cdot\pi(v_{t-1})}(v_{t-1})italic_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_B start_POSTSUBSCRIPT italic_θ ⋅ italic_π ( italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ), as each corresponds to some vπ(vt1)(vt1)superscript𝑣subscript𝜋subscript𝑣𝑡1subscript𝑣𝑡1v^{*}\in\operatorname{\mathcal{B}}_{\pi(v_{t-1})}(v_{t-1})italic_v start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ caligraphic_B start_POSTSUBSCRIPT italic_π ( italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ). ∎
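The feasibility argument above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: the function names, the three-item memory vector, and the value used for $\pi(v_{t-1})$ are all hypothetical. It inverts the update $v_t = (1-\theta_t) v_{t-1} + \theta_t v^*$ to find the distribution $v^*$ that must be induced, and the assertions one would check are that $v^*$ stays within distance $\pi(v_{t-1})$ of $v_{t-1}$ whenever $v_t$ is within $\theta \cdot \pi(v_{t-1})$.

```python
import math

def target_for_update(v_prev, v_next, theta_t):
    """Return v* such that (1 - theta_t) * v_prev + theta_t * v* equals v_next."""
    return [vp + (vn - vp) / theta_t for vp, vn in zip(v_prev, v_next)]

def memory_update(v_prev, v_star, theta_t):
    """Convex-combination update of the memory vector."""
    return [(1 - theta_t) * vp + theta_t * vs for vp, vs in zip(v_prev, v_star)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical values: theta is the lower bound on update speed,
# theta_t the realized speed, and radius stands in for pi(v_prev).
theta, theta_t = 0.2, 0.5
radius = 0.1
v_prev = [0.40, 0.35, 0.25]
# Desired next state: a zero-sum move of length below theta * radius.
v_next = [0.41, 0.34, 0.25]

v_star = target_for_update(v_prev, v_next, theta_t)
reached = memory_update(v_prev, v_star, theta_t)
```

Since $\theta_t \geq \theta$, the required displacement $\lVert v^* - v_{t-1} \rVert = \lVert v_t - v_{t-1} \rVert / \theta_t$ is at most $\pi(v_{t-1})$, which is the scaling used in the proof.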

We remark that for the $\operatorname{EIRD}$ set, if losses are given over $p_t$ rather than $v_t$, one can define dynamics which directly consider the state to simply be the induced distribution $p_t$ in each round, which satisfies strong local controllability with any $p_t \in \operatorname{EIRD}$ feasible at each round; in general, we consider dynamics to view the memory vector as the state, as the feasible updates $p_t$ are a function of $v_t$. Such is the case for the $\phi$-smoothed simplex, for which we can state an analogous local controllability result.

Lemma 4.

If each $s_i$ is $(\sigma, \lambda)$-scale-bounded, then any instance $(\mathcal{X}, \Delta^{\phi}(n), D)$ over the $\phi$-smoothed simplex for $\phi = \Theta(k\lambda\sigma^2)$ satisfies $\Omega(\theta\lambda\phi)$-local controllability.

  • Proof

    The following lemma from Agarwal and Brown (2023) shows that a ball of distributions around any memory vector $v \in \Delta^{\phi}(n)$ is feasible under $\operatorname{IRD}(v)$.

    Lemma 5 ($\operatorname{IRD}$ for Scale-Bounded Preferences; Agarwal and Brown (2023)).

    Let each $s_i$ be $(\sigma, \lambda)$-scale-bounded with $\sigma \leq \sqrt{4(n-1)/k}$, and let $v \in \Delta^{\phi}(n)$ be a vector in the $\phi$-smoothed simplex, for $\phi \geq \Theta(k\lambda\sigma^2)$. Then, $p \in \operatorname{IRD}(v)$ for any vector $p \in \mathcal{B}_{\lambda\phi}(v) \cap \Delta^{\phi}(n)$.

    Let $d = \min(\lambda\phi, \pi(v_{t-1})) \leq \lambda\phi\,\pi(v_{t-1})$ for any $v_{t-1}$ in $\Delta^{\phi}(n)$. Any $v^* \in \mathcal{B}_d(v_{t-1})$ then is contained in $\operatorname{IRD}(v_{t-1})$, and so playing $x_t$ such that $p_t(x_t) = v^*$ yields an update to $v_t = (1-\theta_t) v_{t-1} + \theta_t v^*$, which is feasible for any $v_t \in \mathcal{B}_{d\theta}(v_{t-1})$, and so $\Omega(\theta\lambda\phi)$-local controllability holds. ∎

For any such set $\mathcal{Y}$ which yields locally controllable dynamics for the instance $(\mathcal{X}, \mathcal{Y}, D)$, we can minimize regret over $\mathcal{Y}$ via $\operatorname{NestedOCO}$, where we optimize with respect to the surrogate losses $f_t^*(v_t)$. Note that for our regret benchmark of the best per-round instantaneous distribution in $\mathcal{Y}$, any fixed vector $v^*$ which is instantaneously targeted across all rounds yields an item distribution $p_t = v^*$ in each round, and so $f^*_t(v^*) = f_t(p^*)$. We assume that $y_0$ is bounded inside $\mathcal{Y}$ (which typically will hold for $y_0 = \mathbf{u}_n$).

Theorem 18.

For any $\rho$-locally controllable instance $(\mathcal{X}, \mathcal{Y}, D)$ of Adaptive Recommendations with update speed $\theta > 0$, running $\operatorname{NestedOCO}$ over the surrogate losses $f_t^*(v_t)$ yields regret

$$\operatorname{Reg}_T(\operatorname{NestedOCO}) \leq 2\sqrt{\frac{\left(2 + \frac{R}{r\rho} + \frac{1}{\theta}\right) T G L^2}{\gamma}}$$

with respect to the true losses $f_t(p_t)$ over $\mathcal{Y}$.

  • Proof

    Beyond applying the regret bound for $\operatorname{NestedOCO}$ from Theorem 1, the key step here is to bound surrogate loss errors as:

    $$\begin{aligned}
    \sum_{t=1}^{T} f_t(p_t) - f_t(v^*) \leq&\; \sum_{t=1}^{T} f_t^*(v_t) - f_t(v^*) + \sum_{t=1}^{T} f_t(v_t) - f_t(p_t) \\
    \leq&\; \eta\left(1 + \frac{R}{r\rho}\right)\frac{TL^2}{\gamma} + \frac{G}{\eta} + \sum_{t=1}^{T} f_t(v_t) - f_t\left(\frac{v_t - (1-\theta_t) v_{t-1}}{\theta_t}\right) \\
    \leq&\; \eta\left(1 + \frac{R}{r\rho}\right)\frac{TL^2}{\gamma} + \frac{G}{\eta} + \sum_{t=1}^{T} f_t(v_t) - f_t\left(v_{t-1} + \frac{v_t - v_{t-1}}{\theta_t}\right) \\
    \leq&\; \eta\left(1 + \frac{R}{r\rho}\right)\frac{TL^2}{\gamma} + \frac{G}{\eta} + L\left(1 + \frac{1}{\theta}\right)\sum_{t=1}^{T} \lVert v_t - v_{t-1} \rVert \\
    \leq&\; \eta\left(2 + \frac{R}{r\rho} + \frac{1}{\theta}\right)\frac{TL^2}{\gamma} + \frac{G}{\eta} \\
    =&\; 2\sqrt{\frac{\left(2 + \frac{R}{r\rho} + \frac{1}{\theta}\right) T G L^2}{\gamma}}
    \end{aligned}$$

    upon setting $\eta = \sqrt{\frac{G\gamma}{\left(2 + \frac{R}{r\rho} + \frac{1}{\theta}\right) T L^2}}$, which yields the theorem. ∎
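The final equality in the proof is the usual balancing of the two terms $\eta A + G/\eta$, with $A = \left(2 + \frac{R}{r\rho} + \frac{1}{\theta}\right)\frac{TL^2}{\gamma}$. As a sanity check, the sketch below evaluates both sides for one set of parameter values; the function names and all numeric values are illustrative assumptions and are not taken from the paper.

```python
import math

def coefficient(T, L, gamma, R, r, rho, theta):
    """The factor A multiplying eta in the penultimate line of the proof."""
    return (2 + R / (r * rho) + 1 / theta) * T * L ** 2 / gamma

def regret_bound(eta, T, G, L, gamma, R, r, rho, theta):
    """Value of eta * A + G / eta for a given step size eta."""
    return eta * coefficient(T, L, gamma, R, r, rho, theta) + G / eta

def tuned_eta(T, G, L, gamma, R, r, rho, theta):
    """Step size from the proof: sqrt(G / A)."""
    return math.sqrt(G / coefficient(T, L, gamma, R, r, rho, theta))

def closed_form(T, G, L, gamma, R, r, rho, theta):
    """Closed-form bound 2 * sqrt(A * G) stated in the theorem."""
    return 2 * math.sqrt(coefficient(T, L, gamma, R, r, rho, theta) * G)

# Hypothetical parameter values, chosen only for illustration.
params = dict(T=10_000, G=1.0, L=2.0, gamma=1.0, R=1.0, r=0.5, rho=0.1, theta=0.25)
eta = tuned_eta(**params)
bound = regret_bound(eta, **params)
```

At the tuned step size the two terms are equal, and any other choice of $\eta$ gives a larger value of $\eta A + G/\eta$.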

Theorems 7 and 8 follow from Theorem 18, as well as from Lemmas 3 and 4, respectively.

Appendix J Background and Proofs for Section 4.3: Adaptive Pricing

J.1 Background

While there is a large literature on designing online mechanisms for pricing discrete goods via auctions (Mehta et al., 2007; Kanoria and Nazerzadeh, 2020; Golrezaei et al., 2020; Morgenstern and Roughgarden, 2016; Feng et al., 2019; Braverman et al., 2017), there is comparatively little work related to online pricing problems for real-valued goods. Most work for such problems to date requires strong assumptions on valuation functions, often either assuming linearity (Jia et al., 2014) or additivity (Agrawal et al., 2023), or requiring approximability via discretization (Mussi et al., 2022). Here, we introduce a novel formulation for an Adaptive Pricing problem which builds on the myopic-demand fixed-cost setting of Roth et al. (2015), which we extend to accommodate adversarial consumption rates for the agent (which affect demand, as a function of the agent’s reserves) as well as adversarial production costs. As in Roth et al. (2015), our setting can accommodate general convex (increasing) production cost functions and concave (increasing) valuations for the agent, provided that valuations additionally are homogeneous; to our knowledge, this encompasses a much wider class of valuations and costs than considered by any prior work on no-regret dynamic pricing for real-valued goods.

J.2 Model

In each round $t$, an agent (the consumer) begins with goods reserves $y_{t-1} \in \mathbb{R}_{\geq 0}^n$ (with $y_0 = \mathbf{0}$), then consumes an adversarially chosen fraction $\theta_t \in [\theta, 1]$ of each good simultaneously (e.g. corresponding to their rate of manufacturing downstream items, using the goods as components), updating their reserves to $(1-\theta_t) y_{t-1}$. We (the producer) show the consumer some vector $p_t \in \mathbb{R}_+^n$ of per-unit prices for each good, and the consumer purchases some bundle of goods $x_t$. The consumer's valuation function for reserves of goods is given by $v : \mathbb{R}_+^n \rightarrow \mathbb{R}_+$, and their selection of $x_t = x^*(p_t, \theta_t, y_{t-1})$ is given by

$$x^*(p_t, \theta_t, y_{t-1}) = \operatorname*{argmax}_{x \in \mathbb{R}_+^n} v(x + (1-\theta_t) y_{t-1}) - \langle p_t, x \rangle.$$

We later discuss behavior of $x^*$ when the $\operatorname*{argmax}$ is undefined; it will suffice for us to only consider price vectors for which it is defined. This updates the consumer's reserves to $y_t = x_t + (1-\theta_t) y_{t-1}$. Upon seeing the consumer's purchased bundle $x_t$, we receive their payment $\langle p_t, x_t \rangle$ minus our production cost $c_t(x_t) : \mathbb{R}_+^n \rightarrow \mathbb{R}_+$, where $c_t$ is adversarially chosen. Our utility is then given by

$$f_t(p_t, x_t) = \langle p_t, x_t \rangle - c_t(x_t).$$
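To make the bookkeeping of a single round concrete, the following sketch applies the reserve update and the producer's utility to one set of numbers. It is only an illustration: the bundle is supplied directly rather than computed as the consumer's $\operatorname*{argmax}$, and the function names, the linear cost, and all values are hypothetical.

```python
def consume(y_prev, theta_t):
    """Reserves left after the consumer uses a theta_t fraction of each good."""
    return [(1 - theta_t) * y for y in y_prev]

def linear_cost(bundle, unit_cost=0.5):
    """Hypothetical production cost, linear in the total quantity produced."""
    return unit_cost * sum(bundle)

def play_round(y_prev, theta_t, prices, bundle, cost):
    """Return (new reserves y_t, producer utility f_t) for one round."""
    leftover = consume(y_prev, theta_t)
    y_next = [x + l for x, l in zip(bundle, leftover)]
    revenue = sum(p * x for p, x in zip(prices, bundle))
    return y_next, revenue - cost(bundle)

# Two goods; the purchased bundle is given here, not derived from v.
y1, u1 = play_round(
    y_prev=[2.0, 4.0], theta_t=0.5,
    prices=[1.0, 2.0], bundle=[1.0, 0.5],
    cost=linear_cost,
)
```

Here half of each reserve is consumed, the purchase is added to what remains, and the utility is the payment of 2.0 minus the cost of 0.75.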

We make the following assumptions on production costs $c_t$ and the consumer's valuation $v$.

Assumption 2 (Production Costs).

We assume that for each $c_t$, the following hold over $\mathbb{R}_+^n$:

  • $c_t$ is non-negative, convex, and $L_c$-Lipschitz,

  • $\lim_{\epsilon \rightarrow 0} c_t(\epsilon \cdot \mathbf{1}) \leq C_0$ for some $C_0 \geq 0$, and

  • $c_t(x) \geq \phi \lVert x \rVert + C_0$ for some $\phi > 0$.

Further, each $c_t$ is revealed prior to setting prices $p_{t+1}$.

Assumption 3 (Consumer Valuations).

We assume that the following hold over some set $\mathcal{Y} \subseteq \mathbb{R}^n_+$:

  • $v$ is non-negative, continuous, and differentiable,

  • $v$ is strictly concave and increasing,

  • $v$ is $(\lambda, \beta)$-Hölder continuous for some $\lambda \geq 1$ and $\beta \in (0, 1]$, i.e.

    $$\lvert v(y) - v(y') \rvert \leq \lambda \lVert y - y' \rVert^{\beta},$$

    and

  • $v$ is homogeneous of degree $k$ for some $k \in (0, 1)$, i.e. $v(by) = b^k v(y)$ for any $b > 0$.

Further, $v$ is known to the producer.

Given the concavity assumption, we note that it is without loss of generality to assume that $k \in (0, 1)$ for the homogeneity parameter. There are several well-studied valuation families which satisfy these properties for an appropriate set $\mathcal{Y}$; see Roth et al. (2015) for proofs of each example.

Example 3 (Constant Elasticity of Substitution (CES)).

Valuations of the form

$$v(y) = \left(\sum_{i=1}^{n} \alpha_i y_i^{\kappa}\right)^{\beta},$$

with each $\alpha_i, \kappa, \beta > 0$ and $\kappa, \beta\kappa < 1$, are Hölder continuous, differentiable, strictly concave, non-decreasing, and homogeneous over a convex set in $\mathbb{R}^n_+$.
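For CES valuations the degree of homogeneity can be read off directly: scaling $y$ by $b$ pulls a factor $b^{\kappa}$ out of the sum, which the outer exponent turns into $b^{\kappa\beta}$. The short sketch below checks this numerically; the function name and the parameter values are hypothetical choices satisfying $\kappa, \beta\kappa < 1$.

```python
def ces(y, alpha, kappa, beta):
    """CES valuation (sum_i alpha_i * y_i^kappa)^beta."""
    return sum(a * yi ** kappa for a, yi in zip(alpha, y)) ** beta

# Hypothetical parameters with kappa < 1 and beta * kappa < 1.
alpha, kappa, beta = [0.3, 0.7], 0.5, 0.8
k = kappa * beta  # degree of homogeneity implied by the functional form

y = [2.0, 5.0]
scales = [0.25, 0.5, 2.0, 10.0]
# Ratio v(b * y) / v(y) for each scale b; homogeneity predicts b ** k.
ratios = [
    ces([b * yi for yi in y], alpha, kappa, beta) / ces(y, alpha, kappa, beta)
    for b in scales
]
```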

Example 4 (Cobb-Douglas).

Valuations of the form

$$v(y) = \prod_{i=1}^{n} y_i^{\alpha_i},$$

with $\alpha_i > 0$ and $\sum_{i=1}^{n} \alpha_i < 1$, are Hölder continuous, differentiable, strictly concave, non-decreasing, and homogeneous over a convex set in $\mathbb{R}^n_+$.

We initially assume that Assumption 3 holds over all of $\mathbb{R}_+^n$, but will restrict our attention to the set $\mathcal{Y} \subseteq \mathbb{R}^n_+$ of bundles where $v(y) \geq \phi \lVert y \rVert$ for each $y \in \mathcal{Y}$, and we note that our results can be extended to arbitrary downward-closed convex sets (where $by \in \mathcal{Y}$ for any $y \in \mathcal{Y}$ and $b \in (0, 1]$). In Section J.3 we show that Assumptions 2 and 3 yield several important properties which enable optimization via our framework. We show a unique mapping between price vectors and bundle purchases (for any fixed reserves and consumption rate), that restricting attention to $\mathcal{Y}$ is justified under rationality constraints, and that $\mathcal{Y}$ is convex.

Further, there is some price vector which yields a reserve update to any $y_t \in \mathcal{Y}$ in a neighborhood around $y_{t-1}$, yielding local controllability. Crucially, we show that there are concave surrogate rewards $f^*_t(y_t)$ which will closely track our true rewards $f_t(p_t, x_t)$, leveraging the following property of homogeneous functions.

Proposition 10 (Euler’s Theorem for Homogeneous Functions).

A continuous and differentiable function $v : \mathcal{Y} \rightarrow \mathbb{R}_+$ is homogeneous of degree $k$ if and only if

$$\langle \nabla v(y), y \rangle = k \cdot v(y).$$
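As a quick numerical illustration of Proposition 10, the sketch below evaluates both sides of the identity for a Cobb-Douglas valuation, whose degree of homogeneity is $k = \sum_i \alpha_i$. The gradient is approximated by central finite differences so that the check does not simply restate the algebra; the function names, exponents, step size, and test point are all hypothetical.

```python
import math

def cobb_douglas(y, alpha):
    """Cobb-Douglas valuation prod_i y_i^alpha_i."""
    return math.prod(yi ** a for yi, a in zip(y, alpha))

def numerical_gradient(f, y, h=1e-6):
    """Central finite-difference approximation of the gradient of f at y."""
    grad = []
    for i in range(len(y)):
        up, down = list(y), list(y)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Hypothetical exponents with sum below 1, and an arbitrary positive point.
alpha = [0.2, 0.3, 0.4]
k = sum(alpha)
y = [1.5, 2.0, 0.7]

grad = numerical_gradient(lambda z: cobb_douglas(z, alpha), y)
lhs = sum(g * yi for g, yi in zip(grad, y))  # <grad v(y), y>
rhs = k * cobb_douglas(y, alpha)             # k * v(y)
```

The two sides agree up to the finite-difference error, as the identity predicts.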

We run $\operatorname{NestedOCO}$ directly over these concave surrogate rewards (by inverting the sign of each), where each $p_t$ can be computed efficiently in terms of $y_{t-1}$ and $\theta_t$, and we show that the surrogate reward distance from our true rewards is bounded. While our rewards will not be Lipschitz over $\mathcal{Y}$ in general, we show that appropriately calibrating our step size yields sublinear regret with dependence on the Hölder continuity parameters. We measure our regret with respect to the set of stable reserve policies, i.e. pricing policies where $y_t$ remains constant.

Definition 8 (Regret for Stable Reserve Policies).

Let $\mathcal{P}_{\mathcal{Y}} = \{P_y : y \in \mathcal{Y}\}$ be the set of stable reserve policies, where for any $y_{t-1}$ and $\theta_t$ satisfying $(1-\theta_t) y_{t-1} \leq y^*$, playing prices computed by a policy $p_t = P_y^*(y_{t-1}, \theta)$ yields

$$(1-\theta_t) y_{t-1} + x^*(p_t, \theta_t, y_{t-1}) = y^*.$$

It is straightforward to see that any $P_y^* \in \mathcal{P}_{\mathcal{Y}}$ maintains the invariant that $y_t = y^*$, provided that some such $p_t$ is always feasible.

J.3 Analysis

We show a series of results establishing the key conditions allowing us to formulate this problem as a locally controllable instance of online nonlinear control. We first show that any positive bundle is the unique optimal purchase for some positive price vector.

Lemma 6.

For any reserves $y_{t-1} \in \mathbb{R}_{\geq 0}^n$, consumption rate $\theta_t \in [\theta, 1]$, and vector $y_t \in \mathbb{R}_+^n$ where $y_t > (1-\theta_t) y_{t-1}$ elementwise, the bundle $x_t = y_t - (1-\theta_t) y_{t-1}$ is the unique solution to

\[
x_t = x^*(p_t, \theta_t, y_{t-1})
\]

for prices $p_t = \nabla v(y_t)$.

  • Proof

    Recall that the consumer's bundle choice is given by

    \[
    x^*(p_t, \theta_t, y_{t-1}) = \operatorname*{argmax}_{x \in \mathbb{R}_+^n} \; v(x + (1-\theta_t) y_{t-1}) - \langle p_t, x \rangle.
    \]

    Note that $v((1-\theta_t) y_{t-1} + x) - \langle p_t, x \rangle$ is strictly concave in $x$ for any $x \in \mathbb{R}_+^n$, as the gradients

    \[
    \nabla_x v((1-\theta_t) y_{t-1} + x) = \nabla_{y_t} v(y_t)
    \]

    are preserved at each point $y_t = (1-\theta_t) y_{t-1} + x$, and subtracting the linear function $\langle x, p_t \rangle$ does not affect strict concavity. We also have that $p_t \in \mathbb{R}_+^n$ for prices $p_t = \nabla v(y_t)$, as $v$ is strictly concave and non-decreasing. This yields that $v((1-\theta_t) y_{t-1} + x) - \langle p_t, x \rangle$ has a unique global maximum at $x_t = y_t - (1-\theta_t) y_{t-1}$, as $\nabla_x \left( v((1-\theta_t) y_{t-1} + x) - \langle p_t, x \rangle \right) = \mathbf{0}$ at $x = x_t$. ∎
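As a numerical sanity check of Lemma 6, the following Python sketch assumes a hypothetical separable valuation $v(y) = \sum_i w_i y_i^k$ with $k = 1/2$, which is strictly concave, non-decreasing, and homogeneous of degree $k$ on the positive orthant. The weights, reserves, and consumption rate are illustrative values and are not taken from the paper. The sketch checks that the consumer's objective has zero gradient at $x_t = y_t - (1-\theta_t) y_{t-1}$ under prices $p_t = \nabla v(y_t)$, and that no sampled nearby bundle achieves a higher objective value.

```python
import random

K = 0.5                 # assumed homogeneity degree, in (0, 1)
W = [1.0, 2.0, 0.5]     # hypothetical positive weights

def v(y):
    """Hypothetical valuation v(y) = sum_i W[i] * y[i]**K."""
    return sum(w * yi ** K for w, yi in zip(W, y))

def grad_v(y):
    """Gradient of v; requires strictly positive coordinates."""
    return [K * w * yi ** (K - 1) for w, yi in zip(W, y)]

def utility(x, p, theta, y_prev):
    """Consumer objective v(x + (1 - theta) * y_prev) - <p, x>."""
    stock = [xi + (1 - theta) * yp for xi, yp in zip(x, y_prev)]
    return v(stock) - sum(pi * xi for pi, xi in zip(p, x))

def utility_grad(x, p, theta, y_prev):
    """Gradient of the consumer objective with respect to x."""
    stock = [xi + (1 - theta) * yp for xi, yp in zip(x, y_prev)]
    return [g - pi for g, pi in zip(grad_v(stock), p)]

y_prev = [1.0, 0.5, 2.0]
theta_t = 0.4
y_target = [1.2, 0.6, 1.5]   # exceeds (1 - theta_t) * y_prev elementwise
p_t = grad_v(y_target)       # prices from Lemma 6
x_t = [yt - (1 - theta_t) * yp for yt, yp in zip(y_target, y_prev)]

grad_at_x_t = utility_grad(x_t, p_t, theta_t, y_prev)
best = utility(x_t, p_t, theta_t, y_prev)

# Compare against randomly perturbed bundles (kept non-negative).
rng = random.Random(0)
no_improvement = True
for _ in range(1000):
    x_alt = [max(0.0, xi + rng.uniform(-0.1, 0.1)) for xi in x_t]
    if utility(x_alt, p_t, theta_t, y_prev) > best + 1e-12:
        no_improvement = False
```

Random perturbation only gives evidence of optimality, not a proof; the uniqueness argument itself rests on strict concavity as in the proof above.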

As such, the $\operatorname{argmax}$ for $x^*(p_t, \theta_t, y_{t-1})$ is unique whenever $p_t = \nabla v(y)$ for some $y \in \mathbb{R}_+^n$. We let $p^*(x_t; y_{t-1}, \theta_t) = \nabla v((1-\theta_t) y_{t-1} + x_t)$ denote this price vector which induces a purchase of $x_t$. For any other price vector $p$, the maximizing bundle $x_t$ either approaches a point on the boundary of $\mathbb{R}_+^n$, or grows unboundedly. We restrict our attention to bundles contained in $\mathbb{R}_+^n$, and show that the issue of unboundedness is resolved by rationality considerations for the producer. We characterize the per-round rewards of stable reserve policies as concave functions of $y \in \mathbb{R}_+^n$, and show that the optimal such policy corresponds to some state $y^* \in \mathcal{Y}$, where $\mathcal{Y}$ is convex and bounded.

Lemma 7.

The round-$t$ reward of a stable reserve policy $P_y$ corresponding to any $y \in \mathbb{R}_+^n$ is given by a strictly concave function

\[
f_t(P_y) = \theta_t k \cdot v(y) - c_t(\theta_t y).
\]
  • Proof

    We first note that we can maintain $y_t = y$ in every round by Lemma 6, as $y_0 = \mathbf{0}$ and $(1-\theta_t) y < y$. As such, a bundle $x_t = \theta_t y$ is purchased in each round at prices $\nabla v(y)$, and our reward is given by

    \begin{align*}
    f_t(P_y) &= f_t(p^*(\theta_t y; y, \theta_t), \theta_t y) \\
    &= \langle \nabla v(y), \theta_t y \rangle - c_t(\theta_t y) \\
    &= \theta_t k \cdot v(y) - c_t(\theta_t y),
    \end{align*}

    where the final step follows from Proposition 10 for homogeneous functions. The function $\theta_t k \cdot v(y)$ is strictly concave, which is preserved upon subtracting the convex function $c_t(\theta_t y)$. ∎
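The final step of this proof can be illustrated numerically. The sketch below assumes a hypothetical separable valuation $v(y) = \sum_i w_i y_i^k$ and a hypothetical linear (hence convex) cost; neither is specified by the paper, and the constants are illustrative. It compares the revenue-minus-cost expression $\langle \nabla v(y), \theta_t y \rangle - c_t(\theta_t y)$ against the closed form $\theta_t k \cdot v(y) - c_t(\theta_t y)$.

```python
K = 0.5                 # assumed homogeneity degree, in (0, 1)
W = [1.0, 2.0, 0.5]     # hypothetical positive weights

def v(y):
    """Hypothetical valuation, homogeneous of degree K."""
    return sum(w * yi ** K for w, yi in zip(W, y))

def grad_v(y):
    """Gradient of v; requires strictly positive coordinates."""
    return [K * w * yi ** (K - 1) for w, yi in zip(W, y)]

def cost(x):
    """Hypothetical convex production cost (linear for simplicity)."""
    return sum(0.3 * xi for xi in x)

def stable_reward_direct(y, theta):
    """Revenue <grad v(y), theta * y> minus the cost of the bundle theta * y."""
    bundle = [theta * yi for yi in y]
    revenue = sum(pi * xi for pi, xi in zip(grad_v(y), bundle))
    return revenue - cost(bundle)

def stable_reward_closed_form(y, theta):
    """Closed form theta * K * v(y) - c(theta * y) from Lemma 7."""
    return theta * K * v(y) - cost([theta * yi for yi in y])

y = [1.2, 0.6, 1.5]
theta_t = 0.4
direct = stable_reward_direct(y, theta_t)
closed = stable_reward_closed_form(y, theta_t)
```

The two values agree up to floating-point error because $\langle \nabla v(y), y \rangle = k \cdot v(y)$ for this degree-$k$ homogeneous $v$.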

Lemma 8.

The set $\mathcal{Y} = \{y \in \mathbb{R}_+^n : v(y) \geq \phi \lVert y \rVert\}$ is convex.

  • Proof

    Consider any two points $y, y' \in \mathcal{Y}$, and let $y'' = a y + (1-a) y'$ for any $a \in [0,1]$. Recall that $y^* \in \mathbb{R}_+^n$ belongs to $\mathcal{Y}$ if and only if $v(y^*) \geq \phi \lVert y^* \rVert$. By concavity of $v$, we have that

    \begin{align*}
    v(y'') &= v(a y + (1-a) y') \\
    &\geq a v(y) + (1-a) v(y') \\
    &\geq \phi \lVert a y \rVert + \phi \lVert (1-a) y' \rVert \\
    &\geq \phi \lVert a y + (1-a) y' \rVert \\
    &= \phi \lVert y'' \rVert
    \end{align*}

    and so $y'' \in \mathcal{Y}$, yielding convexity of $\mathcal{Y}$. ∎

Lemma 9.

For any $z \in \mathbb{R}_+^n$ where $z \notin \mathcal{Y}$, there is some $y \in \mathcal{Y}$ such that $f_t(P_y) \geq f_t(P_z)$ for any $\theta_t$ and $c_t$.

  • Proof

    Consider some $z \notin \mathcal{Y}$ such that $v(z) = \psi \lVert z \rVert$, for $\psi < \phi$, and let $y = \left(\frac{\psi}{\phi}\right)^{1/k} z$. By homogeneity of $v$, we have that $v(y) = \frac{\phi}{\psi} v(z) = \phi \lVert z \rVert$, and so $y \in \mathcal{Y}$ as $\lVert z \rVert > \lVert y \rVert$. For any round with costs $c_t$ and consumption rate $\theta_t$ we then have that:

    \begin{align*}
    f_t(P_y) - f_t(P_z) &= \theta_t k \left( v(y) - v(z) \right) - c_t(\theta_t y) + c_t(\theta_t z) \\
    &= \theta_t k \left( \frac{\psi}{\phi} - 1 \right) \psi \lVert z \rVert - c_t(\theta_t y) + c_t(\theta_t z) && \text{(homogeneity of $v$)} \\
    &\geq \theta_t k \left( \frac{\psi}{\phi} - 1 \right) \psi \lVert z \rVert + \theta_t \phi \lVert z - y \rVert && \text{(lower bound and convexity of $c_t$)} \\
    &\geq \theta_t k \left( \frac{\psi}{\phi} - 1 \right) \psi \lVert z \rVert + \theta_t \left( 1 - \left( \frac{\psi}{\phi} \right)^{1/k} \right) \phi \lVert z \rVert \\
    &\geq \theta_t \left( 1 - \frac{\psi}{\phi} \right) \phi \lVert z \rVert - \theta_t \left( 1 - \frac{\psi}{\phi} \right) \psi \lVert z \rVert && \text{($k, \tfrac{\psi}{\phi} < 1$)} \\
    &> 0. && \text{($\phi > \psi$)}
    \end{align*}

Thus the optimal $P_y$ for any cost and consumption sequence corresponds to some $y \in \mathcal{Y}$. We can also bound the radius of $\mathcal{Y}$.

Lemma 10.

Let $V = \max_{y \in \mathbb{R}_+^n : \lVert y \rVert = 1} v(y)$. Then, for every $y \in \mathcal{Y}$ we have that

\[
\lVert y \rVert \leq \left( \frac{V}{\phi} \right)^{\frac{1}{1-k}}.
\]
  • Proof

    Let $y^* = \operatorname*{argmax}_{y : \lVert y \rVert = 1} v(y)$, where we have $v(y^*) = V$. Consider the vector $b y^*$ for any $b > 0$. By homogeneity of $v$, we have that

    \begin{align*}
    v(b y^*) &= b^k v(y^*) \\
    &= b^k V.
    \end{align*}

    For any $b > \left( \frac{V}{\phi} \right)^{\frac{1}{1-k}}$ we have that

    \begin{align*}
    v(b y^*) &= \frac{b}{b^{1-k}} \cdot V \\
    &< b \phi,
    \end{align*}

    where $\lVert b y^* \rVert = b$ and thus $b y^* \notin \mathcal{Y}$. This holds for all vectors with norm $b$, as any such vector $z$ will have value $v(z)$ at most $b^k V$ by homogeneity, which yields the result. ∎
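The radius bound can be checked empirically under stated assumptions. The sketch below again uses a hypothetical separable valuation $v(y) = \sum_i w_i y_i^k$ with illustrative weights and an illustrative threshold $\phi = 1$. For this particular $v$, Hölder's inequality gives $V = \left(\sum_i w_i^{2/(2-k)}\right)^{(2-k)/2}$ in closed form (a property of the assumed valuation, not a claim from the paper). The code samples points from the non-negative orthant, keeps those satisfying $v(y) \geq \phi \lVert y \rVert$, and compares their largest norm against $R = (V/\phi)^{1/(1-k)}$.

```python
import math
import random

K = 0.5                 # assumed homogeneity degree, in (0, 1)
W = [1.0, 2.0, 0.5]     # hypothetical positive weights
PHI = 1.0               # hypothetical threshold phi

def v(y):
    """Hypothetical valuation, homogeneous of degree K."""
    return sum(w * yi ** K for w, yi in zip(W, y))

def norm(y):
    """Euclidean norm."""
    return math.sqrt(sum(yi * yi for yi in y))

# Maximum of v over the unit sphere in the non-negative orthant,
# in closed form for this separable v (via Hölder's inequality).
Q = 2.0 / (2.0 - K)
V = sum(w ** Q for w in W) ** (1.0 / Q)
R = (V / PHI) ** (1.0 / (1.0 - K))

# Sample from a box strictly larger than the claimed radius.
rng = random.Random(0)
samples = [[rng.uniform(0.0, 2.0 * R) for _ in W] for _ in range(5000)]
members = [y for y in samples if v(y) >= PHI * norm(y)]
max_member_norm = max(norm(y) for y in members)
```

Every sampled member of $\mathcal{Y}$ should have norm at most $R$, even though the sampling box extends to twice that radius in each coordinate.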

The previous result also implies that $b y \in \mathcal{Y}$ for any $b < 1$ and $y \in \mathcal{Y}$. We assume that $V > \phi$, which is without loss of generality as we may otherwise take $\phi$ to be smaller artificially; we assume $\phi$ is small enough to ensure that $\mathcal{Y}$ contains a ball $\mathcal{B}_1(y_1)$ of radius 1 around some $y_1 \in \mathcal{Y}$, and we let $R = \left( \frac{V}{\phi} \right)^{\frac{1}{1-k}}$. We consider the dynamics to be given by

\[
D_t(p_t, y_{t-1}) = (1-\theta_t)\, y_{t-1} + x^*(p_t, \theta_t, y_{t-1}).
\]

We let $\mathcal{Z} = \mathbb{R}_+^n$ denote our action space of price vectors; while dynamics here are not action-linear, we can still compute our desired action $p_t = \nabla v(y_t)$ efficiently, as we assume we have knowledge of $v$. While the dynamics depend on $\theta_t$, our choice of action $p_t$ depends only on the target update $y_t$ to the consumer's reserves, by Lemma 6. Further, upon observing $x_t$, we can solve for $\theta_t$ as

\[
\theta_t = 1 - \frac{y_t - x_t}{y_{t-1}}
\]

for purposes of representing our surrogate losses, which are given by

\[
f_t^*(y_t) = \theta_t k \cdot v(y_t) - c_t(\theta_t y_t).
\]
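One round of these dynamics can be sketched in Python as follows. The valuation $v(y) = \sum_i w_i y_i^k$, the linear cost, and all constants are hypothetical choices for illustration; for this separable $v$ the consumer's best response has a closed form, which is an assumption of the sketch rather than part of the paper. The function names `best_response`, `dynamics`, `recover_theta`, and `surrogate_reward` are likewise illustrative. The producer posts $p_t = \nabla v(y_t)$ for a target $y_t$, observes the purchased bundle $x_t$, and then recovers $\theta_t$ from one coordinate of $y_{t-1}$ that is strictly positive.

```python
K = 0.5                 # assumed homogeneity degree, in (0, 1)
W = [1.0, 2.0, 0.5]     # hypothetical positive weights

def v(y):
    """Hypothetical valuation, homogeneous of degree K."""
    return sum(w * yi ** K for w, yi in zip(W, y))

def grad_v(y):
    """Gradient of v; requires strictly positive coordinates."""
    return [K * w * yi ** (K - 1) for w, yi in zip(W, y)]

def cost(x):
    """Hypothetical convex production cost (linear for simplicity)."""
    return sum(0.3 * xi for xi in x)

def best_response(p, theta, y_prev):
    """Argmax of v(x + (1 - theta) * y_prev) - <p, x> over x >= 0 (closed form for this v)."""
    bundle = []
    for w, pi, yp in zip(W, p, y_prev):
        unconstrained = (pi / (K * w)) ** (1.0 / (K - 1.0)) - (1.0 - theta) * yp
        bundle.append(max(0.0, unconstrained))
    return bundle

def dynamics(p, theta, y_prev):
    """State update D_t(p_t, y_{t-1}); also returns the purchased bundle."""
    x = best_response(p, theta, y_prev)
    y = [(1.0 - theta) * yp + xi for yp, xi in zip(y_prev, x)]
    return y, x

def recover_theta(y_t, x_t, y_prev):
    """Solve theta_t = 1 - (y_t - x_t) / y_prev using the largest coordinate of y_prev."""
    i = max(range(len(y_prev)), key=lambda j: y_prev[j])
    return 1.0 - (y_t[i] - x_t[i]) / y_prev[i]

def surrogate_reward(y, theta):
    """Surrogate reward theta * K * v(y) - c(theta * y)."""
    return theta * K * v(y) - cost([theta * yi for yi in y])

y_prev = [1.0, 0.5, 2.0]
theta_true = 0.4              # unknown to the producer when prices are posted
y_target = [1.2, 0.6, 1.5]    # exceeds (1 - theta_true) * y_prev elementwise
p_t = grad_v(y_target)
y_t, x_t = dynamics(p_t, theta_true, y_prev)
theta_hat = recover_theta(y_t, x_t, y_prev)
reward_hat = surrogate_reward(y_t, theta_hat)
```

Note that the division in the expression for $\theta_t$ is read coordinatewise here, so any coordinate with $y_{t-1} > 0$ suffices; the initial state $y_0 = \mathbf{0}$ would need separate handling.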

We now show that the dynamics satisfy local controllability.

Lemma 11 (Local Controllability).

The instance $(\mathcal{Z}, \mathcal{Y}, D_t)$ satisfies $\theta$-local controllability for each round $t$.

  • Proof

    We show that $\theta$-local controllability holds over all of $\mathbb{R}_+^n$, which implies $\theta$-local controllability over $\mathcal{Y}$, as each distance $\pi(y_{t-1})$ can only decrease when restricting to $\mathcal{Y} \subseteq \mathbb{R}_+^n$ while the feasible update region remains the same. By Lemma 6, any update where $y_t \geq (1-\theta_t) y_{t-1}$ elementwise is feasible. Each $\pi(y_{t-1})$ over $\mathbb{R}_+^n$ is simply the minimum element of $y_{t-1}$, which we denote here by $m$. Each element of $y_{t-1}$ is decreased by at least $\theta m$ through consumption, as $\theta_t \geq \theta$, and so any $y_t$ in the $\ell_\infty$ ball of radius $\theta m = \theta \pi(y_{t-1})$, and thus the $\ell_2$ ball of radius $\theta \pi(y_{t-1})$, is feasible. ∎
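The feasibility argument in this proof can be checked by sampling. The sketch below assumes a hypothetical lower bound $\theta = 0.2$ on consumption rates and draws illustrative reserves; it takes $\pi(y_{t-1})$ over $\mathbb{R}_+^n$ to be the minimum element of $y_{t-1}$, as in the proof. For each sample it draws a target $y_t$ from the $\ell_\infty$ ball of radius $\theta \pi(y_{t-1})$ around $y_{t-1}$ and tests the elementwise condition $y_t \geq (1-\theta_t) y_{t-1}$.

```python
import random

THETA_MIN = 0.2   # hypothetical lower bound theta on consumption rates

def is_feasible(y_next, y_prev, theta_t, tol=1e-12):
    """Elementwise condition y_next >= (1 - theta_t) * y_prev, up to a small tolerance."""
    return all(yn >= (1.0 - theta_t) * yp - tol for yn, yp in zip(y_next, y_prev))

rng = random.Random(0)
all_feasible = True
for _ in range(2000):
    y_prev = [rng.uniform(0.1, 5.0) for _ in range(3)]
    theta_t = rng.uniform(THETA_MIN, 1.0)
    radius = THETA_MIN * min(y_prev)   # theta * pi(y_{t-1})
    y_next = [yp + rng.uniform(-radius, radius) for yp in y_prev]
    if not is_feasible(y_next, y_prev, theta_t):
        all_feasible = False
```

Since the $\ell_2$ ball of a given radius is contained in the $\ell_\infty$ ball of the same radius, sampling from the $\ell_\infty$ ball covers the stronger of the two claims.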

We are now ready to analyse the regret of NestedOCO for the problem. The remaining key issues to resolve will be the errors between our true and surrogate rewards $f_t$ and $f_t^*$, as well as the lack of Lipschitz continuity for our rewards. We will make use of more general formulations of the guarantees of FTRL (see e.g. Hazan (2021)).

Proposition 11.

For a $\gamma$-strongly convex regularizer $\psi : \mathcal{Y} \rightarrow \mathbb{R}$ where $\lvert \psi(y) - \psi(y') \rvert \leq G$ for all $y, y' \in \mathcal{Y}$, and for convex losses $f_1, \ldots, f_T$, the regret of FTRL is bounded by

\[
\mathrm{Reg}_T(\mathrm{FTRL}) \leq \sum_{t=1}^{T} \left( g_t(y_t) - g_t(y_{t+1}) \right) + \frac{G}{\eta},
\]

where $g_t(y) = \langle \nabla_t f_t(y_t), y \rangle$ and $g_t(y_t) - g_t(y_{t+1}) \geq \frac{\gamma}{\eta} \lVert y_{t+1} - y_t \rVert^2$.

We show that this implies a regret bound for $(\lambda, \beta)$-Hölder continuous convex losses, recovering the $\lambda$-Lipschitz bounds when $\beta = 1$.

Theorem 19.

For $(\lambda, \beta)$-Hölder continuous convex losses, FTRL obtains regret bounded by

\[
\mathrm{Reg}_T(\mathrm{FTRL}) \leq T\lambda \left( \frac{\eta\lambda}{\gamma} \right)^{\beta/(2-\beta)} + \frac{G}{\eta}
\]

and chooses points which satisfy $\lVert y_{t+1} - y_t \rVert \leq \left( \frac{\eta\lambda}{\gamma} \right)^{1/(2-\beta)}$ in each round.

  • Proof

    For $(\lambda, \beta)$-Hölder continuous convex losses $f_t$, we have that

    \begin{align*}
    g_t(y_t) - g_t(y_{t+1}) &= \langle \nabla_t f_t(y_t), y_t - y_{t+1} \rangle \\
    &= \langle \nabla_t f_t(y_t), (2y_t - y_{t+1}) - y_t \rangle \\
    &\leq f_t(2y_t - y_{t+1}) - f_t(y_t)
    \end{align*}

    by convexity of $f_t$, where $\lVert (2y_t - y_{t+1}) - y_t \rVert = \lVert y_t - y_{t+1} \rVert$, and so

    \[
    g_t(y_t) - g_t(y_{t+1}) \leq \lambda \lVert y_t - y_{t+1} \rVert^{\beta}
    \]

    by Hölder continuity. Combining with the lower bound on $g_t(y_t) - g_t(y_{t+1})$ from Proposition 11 gives us that

    \[
    \frac{\gamma}{\eta} \lVert y_{t+1} - y_t \rVert^2 \leq g_t(y_t) - g_t(y_{t+1}) \leq \lambda \lVert y_t - y_{t+1} \rVert^{\beta}
    \]

    and thus

    \[
    g_t(y_t) - g_t(y_{t+1}) \leq \lambda \left( \frac{\eta\lambda}{\gamma} \right)^{\beta/(2-\beta)},
    \]

    yielding a regret bound of

    \[
    \mathrm{Reg}_T(\mathrm{FTRL}) \leq T\lambda \left( \frac{\eta\lambda}{\gamma} \right)^{\beta/(2-\beta)} + \frac{G}{\eta}
    \]

    with per-round distance at most $\lVert y_{t+1} - y_t \rVert \leq \left( \frac{\eta\lambda}{\gamma} \right)^{1/(2-\beta)}$. ∎
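The step from the two-sided inequality to the per-round bound in the proof of Theorem 19 can be spelled out as follows. This is a sketch under two assumptions not stated explicitly in the proof: $\beta \in (0, 1]$, so that $2 - \beta > 0$, and $y_{t+1} \neq y_t$ (the bound is trivial otherwise).

\begin{align*}
\frac{\gamma}{\eta} \lVert y_{t+1} - y_t \rVert^{2} \leq \lambda \lVert y_{t+1} - y_t \rVert^{\beta}
&\implies \lVert y_{t+1} - y_t \rVert^{2-\beta} \leq \frac{\eta\lambda}{\gamma} \\
&\implies \lVert y_{t+1} - y_t \rVert \leq \left( \frac{\eta\lambda}{\gamma} \right)^{1/(2-\beta)} \\
&\implies \lambda \lVert y_{t+1} - y_t \rVert^{\beta} \leq \lambda \left( \frac{\eta\lambda}{\gamma} \right)^{\beta/(2-\beta)}.
\end{align*}

For $\beta = 1$ the last line reads $\lambda^2 \eta / \gamma$ per round, which is the familiar form of the Lipschitz case.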

We note that the concave surrogate rewards $f_t^*(y_t)$ are a sum of a $(k\lambda, \beta)$-Hölder continuous function and a $(L_c, 1)$-Hölder continuous (i.e. Lipschitz) function; we assume that each function is $(L, \beta)$-Hölder continuous with $L = k\lambda + L_c$, which is sufficient for large enough $T$ as we will have $\lVert y_t - y_{t-1} \rVert \leq 1$ and thus $\lVert y_t - y_{t-1} \rVert \leq \lVert y_t - y_{t-1} \rVert^{\beta}$. We use a similar analysis to bound the error between true and surrogate rewards, yielding our regret bound for NestedOCO.

Theorem 20.

The regret of NestedOCO with respect to the stable reserve policies $\mathcal{P}_{\mathcal{Y}}$ is bounded by

\[
\mathrm{Reg}_T(\mathrm{NestedOCO}) \leq 2L \left( \frac{G}{\gamma} \right)^{\beta/2} \left( T \left( 3 + \left( \frac{R}{\theta} \right)^{\beta} \right) \right)^{(2-\beta)/2}.
\]
  • Proof

    We reparameterize to treat the bundle $y_1$ where $\mathcal{B}_1(y_1) \subseteq \mathcal{Y}$ as the origin, and assume the choice of regularizer has $y_1$ as its minimum. By Theorem 1, for any step size and $\delta > 0$ such that $\lVert y_t - y_{t-1} \rVert \leq \delta\theta$, running NestedOCO for the $\theta$-locally controllable instance $(\mathcal{Z}, \mathcal{Y}, D)$ over the surrogate rewards $f_t^*$, with inradius 1 and radius $R$, obtains

    \begin{align*}
    \sum_{t=1}^{T} f_t^*(y^*) - \sum_{t=1}^{T} f_t^*(y_t) &\leq TL(\delta R)^{\beta} + TL \left( \frac{\eta L}{\gamma} \right)^{\beta/(2-\beta)} + \frac{G}{\eta} \\
    &\leq TL \left( 1 + \left( \frac{R}{\theta} \right)^{\beta} \right) \left( \frac{\eta L}{\gamma} \right)^{\beta/(2-\beta)} + \frac{G}{\eta} \\
    &\leq 2L \left( \frac{G}{\gamma} \right)^{\beta/2} \left( T \left( 1 + \left( \frac{R}{\theta} \right)^{\beta} \right) \right)^{(2-\beta)/2} \\
    &\overset{\Delta}{=} \mathrm{Reg}_T(f^*)
    \end{align*}

    for any $y^* \in \mathcal{Y}$, upon setting $\delta = \frac{1}{\theta} \left( \frac{\eta\lambda}{\gamma} \right)^{1/(2-\beta)}$ and $\eta = \left( \frac{G}{KT} \right)^{(2-\beta)/2}$, where

    \[
    K^* = L \left( 1 + \left( \frac{R}{\theta} \right)^{\beta} \right) \left( \frac{L}{\gamma} \right)^{\beta/(2-\beta)}.
    \]

    Note that the surrogate rewards exactly track the true rewards when a stable reserve policy $P_{y^*}$ is played, and so our regret with respect to the best stable reserve policy $P_{y^*}$ is at most

    \begin{align*}
    \sum_{t=1}^{T} f_t(P_{y^*}) - \sum_{t=1}^{T} f_t(y_t) &\leq \mathrm{Reg}_T(f^*) + \sum_{t=1}^{T} f^*_t(y_t) - f_t(p_t, x_t) \\
    &\leq \mathrm{Reg}_T(f^*) + \sum_{t=1}^{T} \langle \nabla v(y_t), \theta y_t - x_t \rangle - c_t(\theta y_t) + c_t(x_t) \\
    &\leq \mathrm{Reg}_T(f^*) + \sum_{t=1}^{T} (1 - \theta_t) \left( \langle \nabla v(y_t), y_{t-1} - y_t \rangle + L \lVert y_t - y_{t-1} \rVert \right) && \text{($x_t = (1 - \theta_t) y_{t-1}$)} \\
    &\leq \mathrm{Reg}_T(f^*) + \sum_{t=1}^{T} \left( \langle \nabla v(y_t), y_t - (2y_t - y_{t-1}) \rangle + L \lVert y_t - y_{t-1} \rVert \right) \\
    &\leq \mathrm{Reg}_T(f^*) + \sum_{t=1}^{T} v(y_t) - v(2y_t - y_{t-1}) + L \lVert y_t - y_{t-1} \rVert && \text{(concavity of $v$)} \\
    &\leq \mathrm{Reg}_T(f^*) + \sum_{t=1}^{T} 2L \lVert y_t - y_{t-1} \rVert^{\beta} && \text{(Hölder, $\lVert y_t - y_{t-1} \rVert \leq 1$)} \\
    &\leq \mathrm{Reg}_T(f^*) + 2TL \left( \frac{\eta L}{\gamma} \right)^{\beta/(2-\beta)} \\
    &\leq 2L \left( \frac{G}{\gamma} \right)^{\beta/2} \left( T \left( 3 + \left( \frac{R}{\theta} \right)^{\beta} \right) \right)^{(2-\beta)/2}
    \end{align*}

    upon updating $K^*$ to $K$ as

    \[
    K = L \left( 3 + \left( \frac{R}{\theta} \right)^{\beta} \right) \left( \frac{L}{\gamma} \right)^{\beta/(2-\beta)},
    \]

    which yields the theorem. ∎
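The final tuning step can be sanity-checked numerically. The sketch below is illustrative only: the function names and all parameter values are hypothetical. It evaluates the untuned expression $TL(3 + (R/\theta)^{\beta})(\eta L/\gamma)^{\beta/(2-\beta)} + G/\eta$ at $\eta = (G/(KT))^{(2-\beta)/2}$ and compares it with the closed form stated in Theorem 20.

```python
def tuned_step_size(G, K, T, beta):
    """eta = (G / (K T))^((2 - beta) / 2), the choice used in the proof."""
    return (G / (K * T)) ** ((2 - beta) / 2)

def drift_term(eta, T, L, gamma, R, theta, beta):
    """T L (3 + (R/theta)^beta) (eta L / gamma)^(beta / (2 - beta))."""
    c = 3 + (R / theta) ** beta
    return T * L * c * (eta * L / gamma) ** (beta / (2 - beta))

def closed_form_bound(T, L, G, gamma, R, theta, beta):
    """2 L (G/gamma)^(beta/2) (T (3 + (R/theta)^beta))^((2 - beta)/2)."""
    c = 3 + (R / theta) ** beta
    return 2 * L * (G / gamma) ** (beta / 2) * (T * c) ** ((2 - beta) / 2)

# Hypothetical parameter values, chosen only for illustration.
T, L, G, gamma, R, theta, beta = 10_000, 2.0, 5.0, 1.0, 4.0, 0.25, 0.5

K = L * (3 + (R / theta) ** beta) * (L / gamma) ** (beta / (2 - beta))
eta = tuned_step_size(G, K, T, beta)

first = drift_term(eta, T, L, gamma, R, theta, beta)   # movement-dependent term
second = G / eta                                        # regularizer term
untuned = first + second
closed = closed_form_bound(T, L, G, gamma, R, theta, beta)
```

With this step size the two terms coincide, so their sum matches the closed form up to floating-point error. The choice balances the terms; it is not claimed here to be the exact minimizer over $\eta$.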

Theorem 9 follows directly from Theorem 20.

Appendix K Background and Proofs for Section 4.4: Steering Learners

K.1 Background

While much of the literature related to no-regret learning in general-sum games considers either rates of convergence to (coarse) correlated equilibria (Blum et al., 2008; Anagnostides et al., 2022) or welfare guarantees for such equilibria (Roughgarden, 2015; Hartline et al., 2015a), a recent line of work (Braverman et al., 2017; Deng et al., 2019; Mansour et al., 2022) has considered the question of optimizing one’s reward when playing against a no-regret learner. A target benchmark which has emerged for this problem is the value of the Stackelberg equilibrium of a game (the optimal mixed strategy to “commit to”, assuming an opponent best responds), which was shown by Deng et al. (2019) to be attainable against any no-regret algorithm and optimal in many cases (e.g. for no-swap learners), both up to $o(T)$ terms, and which may further yield higher reward for the optimizer than (coarse) correlated equilibria.

We show a class of instances for which the problem of optimizing reward against a learner playing according to gradient descent can be formulated as a locally controllable instance of online nonlinear control with adversarial perturbations and surrogate state-based losses. The simplest non-trivial instances we consider are those where the optimizer’s reward is a function only of the learner’s actions (i.e. all rows of their reward matrix are identical), and the optimization problem amounts to steering the learner to a desired strategy via one’s choice of actions. Additionally, we allow the game matrices to change over time, which to our knowledge has not been substantially considered in prior work. We require that the learner’s matrices do not change too quickly (which we model as adversarial disturbances to dynamics), and the optimizer’s matrices can change arbitrarily provided that they remain close to some row-identical matrix (which we model as imprecision in our surrogate loss function).

K.2 Model

Here we are tasked with playing a sequence of bimatrix games against a no-regret learning opponent, where the game matrices may change adversarially in each round. We assume the following properties hold for the adversarial sequence of games.

Assumption 4.

For a sequence $\{(A_t, B_t) : t \in [T]\}$ of $m \times n$ bimatrix games, with $m > n$:

  • each entry of $A_t$ and $B_t$ lies in $\left[ -\frac{L}{2\sqrt{n}}, \frac{L}{2\sqrt{n}} \right]$,

  • the convex hull of the rows of each $B_t$ contains the unit ball in $\mathbb{R}^n$,

  • $\lVert xA_t - xA^*_t \rVert \leq \delta_t$ for any $x \in \Delta(m)$, where each row of $A^*_t$ is identical, and

  • $\lVert xB_t - xB_{t-1} \rVert \leq \epsilon_t$ for any $x \in \Delta(m)$.

Each game $(A_t, B_t)$ is revealed after Players A and B commit to their respective strategies $x_t \in \Delta(m)$ and $y_t \in \Delta(n)$. Observe that due to the second property, for any $z \in \mathcal{B}_1(\mathbf{0})$, there is some $x \in \Delta(m)$ such that $xB = z$. By the third property, we have that $xA^*_t = x'A^*_t$ for any $x, x' \in \Delta(m)$.
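As a small illustration of the row-identical condition, the sketch below (with a hypothetical $3 \times 2$ matrix and a hypothetical helper `vec_mat`) shows that $xA^*_t$ is the same vector for every mixed strategy $x$, so the optimizer's reward under $A^*_t$ depends only on the learner's strategy $y$.

```python
def vec_mat(x, M):
    """Row vector x times matrix M, where M is given as a list of rows."""
    return [sum(x[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical 3 x 2 row-identical matrix: every row equals `row`.
row = [0.4, -0.1]
A_star = [row[:] for _ in range(3)]

x1 = [1.0, 0.0, 0.0]      # a pure strategy for the optimizer
x2 = [0.2, 0.5, 0.3]      # a mixed strategy for the optimizer
y = [0.7, 0.3]            # a strategy for the learner

out1 = vec_mat(x1, A_star)
out2 = vec_mat(x2, A_star)
reward1 = dot(out1, y)    # optimizer reward x1 A* y
reward2 = dot(out2, y)    # optimizer reward x2 A* y
```

Both rewards agree up to rounding, since each product $xA^*_t$ reduces to the shared row whenever the entries of $x$ sum to one.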

We recall the Online Gradient Descent algorithm with convex losses $\ell_t$ from Zinkevich (2003).

Algorithm 8 Online Gradient Descent (OGD)
  Input: Convex set $\mathcal{Y} \subseteq \mathbb{R}^n$, initial point $y_1 \in \mathcal{Y}$, and step sizes $\theta_1, \ldots, \theta_T$.
  for $t = 1$ to $T$ do
     Play $y_t$ and observe loss $\ell_t(y_t)$.
     Set $\nabla_t = \nabla \ell_t(y_t)$.
     Set $y_{t+1} = \Pi_{\mathcal{Y}}\left(y_t - \theta_t \nabla_t\right) = \operatorname{argmin}_{y \in \mathcal{Y}} \lVert y_t - \theta_t \nabla_t - y \rVert$.
  end for
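The update in Algorithm 8 can be sketched in a few lines of Python for the case $\mathcal{Y} = \Delta(n)$, which is the case used for Player B below. This is a minimal sketch rather than the authors' implementation: the names `project_simplex`, `ogd`, and `grad_fn` are hypothetical, and the sort-based simplex projection is a standard method that the text does not specify.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    n = v.size
    u = np.sort(v)[::-1]                        # entries in decreasing order
    css = np.cumsum(u) - 1.0                    # running sums minus the target total 1
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]   # last index that stays positive
    tau = css[rho] / (rho + 1.0)                # uniform shift so the result sums to 1
    return np.maximum(v - tau, 0.0)

def ogd(grad_fn, y1, step_sizes, project=project_simplex):
    """Play y_t, take a gradient step of size theta_t, then project back."""
    y = np.asarray(y1, dtype=float)
    plays = []
    for t, theta in enumerate(step_sizes):
        plays.append(y.copy())                  # y_t is committed before the loss is seen
        g = grad_fn(t, y)                       # gradient of the round-t loss at y_t
        y = project(y - theta * g)
    return np.array(plays)

# Usage (assumed toy instance): a fixed linear loss <c, y> on the 3-simplex.
c = np.array([1.0, 0.0, 0.5])
plays = ogd(lambda t, y: c, np.ones(3) / 3, [0.1] * 200)
```

With a fixed linear loss, the iterates move toward the vertex with the smallest cost and remain feasible at every round.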
Proposition 12 (Zinkevich (2003)).

For differentiable convex losses $\ell_t : \mathcal{Y} \rightarrow \mathbb{R}$ and step sizes satisfying $\theta_{t+1} \le \theta_t$ for each $t \le T$, the regret of OGD with respect to any $y^* \in \mathcal{Y}$ is bounded by

\[
\sum_{t=1}^{T} \ell_t(y_t) - \ell_t(y^*) \le \frac{2R_B^2}{\theta_T} + \sum_{t=1}^{T} \frac{\theta_t}{2} \lVert \nabla_t \rVert^2,
\]

where $R_B$ is the radius of $\mathcal{Y}$. If $\lVert \nabla_t \rVert \le G_B$ and $\theta_t = \frac{2R_B}{G_B\sqrt{T}}$ for all $t \le T$, we have that

\[
\sum_{t=1}^{T} \ell_t(y_t) - \ell_t(y^*) \le 2R_B G_B \sqrt{T}.
\]
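As a quick check of the second bound, one can substitute the constant step size $\theta_t = \theta = \frac{2R_B}{G_B\sqrt{T}}$ and the gradient bound $\lVert \nabla_t \rVert \le G_B$ into the first bound. The two terms balance exactly:

\[
\frac{2R_B^2}{\theta} = 2R_B^2 \cdot \frac{G_B\sqrt{T}}{2R_B} = R_B G_B \sqrt{T},
\qquad
\sum_{t=1}^{T} \frac{\theta}{2} \lVert \nabla_t \rVert^2 \le T \cdot \frac{R_B}{G_B\sqrt{T}} \cdot G_B^2 = R_B G_B \sqrt{T},
\]

and their sum is $2R_B G_B \sqrt{T}$.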

We assume that Player B plays according to OGD in our setup, with $y_1 = \mathbf{u}_n$ and $\theta = \frac{2R_B}{G_B\sqrt{T}}$. At each round $t$, we (Player A) choose some mixed strategy $x_t \in \Delta(m)$, and Player B plays some mixed strategy $y_t \in \Delta(n)$. Utilities for each player are given by the game $(A_t, B_t)$ as

\[
\begin{aligned}
u_t^A(x_t, y_t) &= x_t A_t y_t; \\
u_t^B(x_t, y_t) &= x_t B_t y_t.
\end{aligned}
\]

Note that the loss gradient $-\nabla u_t^B(x_t, y_t)$ each round for Player B (for negative utilities) is given by

\[
\nabla_t = -x_t B_t,
\]

and so their mixed strategy is updated at each round according to

\[
y_t = \Pi_{\Delta(n)}\left(y_{t-1} + \theta (x_{t-1} B_{t-1})\right).
\]

Our utility is given by $x_t A_t y_t = \mathbf{u}_n A_t^* y_t + x_t (A_t - A_t^*) y_t$, as $x_t$ does not affect rewards from $A_t^*$. We benchmark the regret of an algorithm $\mathcal{A}$ against the optimal profile $(x, y) \in \Delta(m) \times \Delta(n)$:

\[
\operatorname{Reg}_T(\mathcal{A}) = \max_{(x,y) \in \Delta(m) \times \Delta(n)} \sum_{t=1}^{T} x A_t y - x_t A_t y_t.
\]
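To make the benchmark concrete, the following sketch evaluates this regret for a recorded play sequence. It relies on the observation that $\sum_t x A_t y$ is bilinear in $(x, y)$, so its maximum over a product of simplices is attained at a pure profile, namely the largest entry of $\sum_t A_t$. The function name `stackelberg_regret` and the array layout are assumptions made for illustration.

```python
import numpy as np

def stackelberg_regret(A_seq, xs, ys):
    """Regret against the best fixed profile (x, y) in hindsight.

    A_seq: payoff matrices, shape (T, m, n)
    xs:    Player A's strategies, shape (T, m)
    ys:    Player B's strategies, shape (T, n)
    """
    A_seq, xs, ys = (np.asarray(a, dtype=float) for a in (A_seq, xs, ys))
    # Bilinear objective: the maximizer is a pure profile, so the benchmark
    # is the largest entry of the summed payoff matrix.
    benchmark = A_seq.sum(axis=0).max()
    # Realized utility: sum over t of x_t A_t y_t.
    realized = np.einsum("ti,tij,tj->", xs, A_seq, ys)
    return benchmark - realized

# Usage (assumed toy instance with T = 2 and m = n = 2).
A_seq = [[[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 2.0]]]
on_target = stackelberg_regret(A_seq, [[1, 0], [1, 0]], [[1, 0], [1, 0]])
off_target = stackelberg_regret(A_seq, [[1, 0], [1, 0]], [[0, 1], [0, 1]])
```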

Note that the per-round average utility for the maximizing $(x, y)$ is at least as high as that obtained by the Stackelberg equilibrium of the average game $\left(\sum_t \frac{A_t}{T}, \sum_t \frac{B_t}{T}\right)$, as for this objective one can choose both players' strategies without restriction. We remark that finding the Stackelberg equilibrium for any fixed game $(A_t^*, B_t)$ in our setting, where $A_t^*$ has identical rows, is straightforward: it suffices to optimize over $[n]$, as any fixed action $j \in [n]$ is a best response to some $x \in \Delta(m)$ by our assumption on the rows of $B_t$, and as our rewards are only a function of Player B's strategy $y$. However, we are not aware of any prior work which enables competing with the average-game Stackelberg value against a learning opponent when games arrive online.

K.3 Analysis

We first show that the problem can be formulated via known, strongly $\theta$-locally controllable dynamics with adversarial disturbances. As $B_t$ changes slowly between rounds, we can run NestedOCO-UD with disturbances representing the error resulting from assuming that $B_t$ does not change from $B_{t-1}$.

Lemma 12.

Given the knowledge available prior to selecting $x_t$, updates for $y_{t+1}$ can be expressed via known action-linear dynamics $(\mathcal{X}, \mathcal{Y}, D_t)$ which satisfy strong $\theta$-local controllability, and with adversarial disturbances $w_t$ satisfying $\sum_{t=1}^{T} \lVert w_t \rVert \le \theta \sum_{t=1}^{T} \epsilon_t$.

  • Proof

    First, note that we can compute Player B's current strategy $y_t$, as it is a function only of games and strategies up to round $t-1$, all of which are observable. Given the update rule for OGD, we can write the dynamics update $D_t(x_t, y_t)$ as

    \[
    \begin{aligned}
    D_t(x_t, y_t) &= \Pi_{\Delta(n)}\left(y_t + \theta (x_t B_t)\right) \\
    &= \Pi_{\Delta(n)}\left(y_t + \theta (x_t B_{t-1}) + \theta (x_t (B_t - B_{t-1}))\right) \\
    &= \Pi_{\Delta(n)}\left(y_t + \theta (x_t B_{t-1})\right) + w_t
    \end{aligned}
    \]

    where $w_t$ represents the error from assuming $B_t = B_{t-1}$. By standard properties of Euclidean projection and the change bound on $B_t$, we have that $\lVert w_t \rVert \le \lVert \theta (x_t (B_t - B_{t-1})) \rVert \le \theta \epsilon_t$. Further, the update is action-linear (up to projection, prior to $w_t$).

    To see that $D_t$ satisfies strong $\theta$-local controllability, we recall that the convex hull of the rows of $B_{t-1}$ contains the unit ball, and so for any $y^*$ in $\mathcal{B}_\theta(y_t) \cap \Delta(n)$ there is some $x_t \in \Delta(m)$ such that $\theta (x_t B_{t-1}) = y^* - y_t$. ∎
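The projection step in the proof above can be spelled out as follows. The only fact used is that Euclidean projection onto a closed convex set is nonexpansive, that is, $\lVert \Pi(a) - \Pi(b) \rVert \le \lVert a - b \rVert$. Writing the disturbance as the difference of the two projected points gives

\[
\begin{aligned}
\lVert w_t \rVert &= \left\lVert \Pi_{\Delta(n)}\left(y_t + \theta (x_t B_t)\right) - \Pi_{\Delta(n)}\left(y_t + \theta (x_t B_{t-1})\right) \right\rVert \\
&\le \theta \lVert x_t B_t - x_t B_{t-1} \rVert \\
&\le \theta \epsilon_t,
\end{aligned}
\]

and summing over $t = 1, \ldots, T$ yields $\sum_{t=1}^{T} \lVert w_t \rVert \le \theta \sum_{t=1}^{T} \epsilon_t$, as stated in the lemma.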

At each round $t$, our loss is given by $f_t(x_t, y_t) = -x_t A_t y_t$. There are two barriers to running our algorithm. First, the update for $y_t$ is determined by $x_{t-1}$ and not $x_t$, yet we do not see $A_{t-1}$ prior to selecting $x_{t-1}$, which would be required to take the appropriate step following $f_{t-1}$. Second, the loss depends on $x_t$ in addition to $y_t$.
To address both issues, we instead run NestedOCO-UD with surrogate losses $\tilde{f}_t(\tilde{y}_t) = -\mathbf{u}_n A_{t-1} y_t$, with action rounds relabeled to account for the fact that $x_{t-1}$ influences the step for $y_t$ (which does not change the behavior of the algorithm). We set $A_0 = \mathbf{0}_{m,n}$.

Theorem 21.

Repeated play against an opponent using OGD with step size $\theta = \Theta(T^{-1/2})$ in a sequence of games $(A_t, B_t)$ satisfying Assumption 4 can be cast as an instance of online control with strongly $\theta$-locally controllable dynamics, for which the regret of NestedOCO-UD is at most

\[
\operatorname{Reg}_T(\textsf{NestedOCO-UD}) \le O\left(\sqrt{T} + \sum_{t=1}^{T} (\delta_t + \epsilon_t)\right),
\]

with efficient per-round computation.

  • Proof

    We first analyze regret with respect to the surrogate losses $\tilde{f}_t(y_t)$. To run NestedOCO-UD for $\alpha > 0$, it suffices to calibrate the step size for the internal FTRL instance such that $\eta \frac{L}{\gamma} \le \theta \alpha$. Given that rewards are bounded in $\left[-\frac{L}{2\sqrt{n}}, \frac{L}{2\sqrt{n}}\right]$, we have that each $x_t B_t y_t$ is $\frac{L}{\sqrt{n}}$-Lipschitz for the $\ell_1$ norm, and thus $L$-Lipschitz for the $\ell_2$ norm, so we can take $G_B = L$. Further, the $\ell_2$ radius of $\Delta(n)$ is $R_B = \sqrt{2}/2$, and so Player B's step size $\theta = \frac{2R_B}{G_B\sqrt{T}}$ from Proposition 12 becomes

    \[
    \theta = \sqrt{\frac{2}{L^2 T}}.
    \]

    Then, for a strongly $\theta$-locally controllable instance with total perturbation bound $\sum_{t=1}^{T} \lVert w_t \rVert \le E$, we obtain the regret bound

    \[
    \operatorname{Reg}_T(\textsf{NestedOCO-UD}) \le \eta \frac{TL^2}{\gamma} + \frac{G}{\eta} + \frac{2LRE}{(1-\alpha)\theta} \qquad \text{(Thm. 14)}
    \]

    for any

    \[
    \eta \le \min\left(\sqrt{\frac{G\gamma}{L^2 T}},\; \alpha\sqrt{\frac{2}{T}}\right).
    \]

    By Lemma 12, we can efficiently run NestedOCO-UD over the surrogate losses $\tilde{f}_t$ and bound regret with respect to any $y^* \in \mathcal{Y}$ as:

    \[
    \sum_{t=1}^{T} \tilde{f}_t(y_t) - \tilde{f}_t(y^*) \le \eta \frac{TL^2}{\gamma} + \frac{G}{\eta} + \frac{\sqrt{2} L \cdot \sum_{t=1}^{T} \epsilon_t}{1-\alpha}.
    \]
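    The last term can be checked by direct substitution into the disturbance term of Theorem 14. This sketch assumes that $R$ there denotes the $\ell_2$ radius of $\mathcal{Y} = \Delta(n)$, so $R = \sqrt{2}/2$, and takes $E = \theta \sum_{t=1}^{T} \epsilon_t$ from Lemma 12:

    \[
    \frac{2LRE}{(1-\alpha)\theta} = \frac{2L \cdot \frac{\sqrt{2}}{2} \cdot \theta \sum_{t=1}^{T} \epsilon_t}{(1-\alpha)\theta} = \frac{\sqrt{2} L \cdot \sum_{t=1}^{T} \epsilon_t}{1-\alpha}.
    \]

    The factor of $\theta$ cancels, so the disturbance contribution to regret is linear in the cumulative drift $\sum_t \epsilon_t$ of Player B's payoff matrices, independent of Player B's step size.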

    Further, we can bound the error from the surrogate losses as

    \[
    \begin{aligned}
    \sum_{t=1}^{T} f_t(x_t, y_t) - \tilde{f}_t(y_t) &= \sum_{t=1}^{T} f_t(x_t, y_t) - f_{t-1}(\mathbf{u}_n, y_t) \\
    &\le \frac{L}{2\sqrt{n}} + \sum_{t=1}^{T-1} f_t(x_t, y_t) - f_t(\mathbf{u}_n, y_{t+1}) && \text{($f_0(\mathbf{u}_n, y_1) = 0$, $f_T(x_T, y_T) \le \tfrac{L}{2\sqrt{n}}$)} \\
    &\le \frac{L}{2\sqrt{n}} + \eta \frac{TL^2}{\gamma} + \sum_{t=1}^{T-1} x_t (A_t - A_t^*) y_t && \text{(Prop. 5)} \\
    &\le \frac{L}{2\sqrt{n}} + \eta \frac{TL^2}{\gamma} + \sum_{t=1}^{T} \delta_t, && \text{(Assumption 4, Cauchy-Schwarz)}
    \end{aligned}
    \]

    and likewise, for any $(x^*, y^*) \in \Delta(m) \times \Delta(n)$ we can bound

    \sum_{t=1}^{T} \tilde{f}_t(y^*) - f_t(x^*, y^*) \leq\; -f_T(x^*, y^*) - \sum_{t=1}^{T-1} x^* (A_t - A_t^*) y^*
    \leq\; \frac{L}{2\sqrt{n}} + \sum_{t=1}^{T} \delta_t.

    Combining the previous results, we have that for any $(x^*, y^*) \in \Delta(m) \times \Delta(n)$, the regret of NestedOCO-UD with respect to the true losses is bounded by

    \sum_{t=1}^{T} f_t(x_t, y_t) - f_t(x^*, y^*) \leq\; \sum_{t=1}^{T} \tilde{f}_t(\tilde{y}_t) - \tilde{f}_t(y^*) + \sum_{t=1}^{T} f_t(x_t, y_t) - \tilde{f}_t(y_t) + \sum_{t=1}^{T} \tilde{f}_t(y^*) - f_t(x^*, y^*)
    \leq\; \eta \frac{2 T L^2}{\gamma} + \frac{G}{\eta} + \frac{L}{\sqrt{n}} + 2 \sum_{t=1}^{T} \delta_t + \frac{\sqrt{2} L \cdot \sum_{t=1}^{T} \epsilon_t}{1 - \alpha}
    \leq\; 3 \cdot \max\left( \sqrt{\frac{T G L^2}{\gamma}}, \sqrt{\frac{T}{2 \alpha^2}} \right) + \frac{L}{\sqrt{n}} + 2 \sum_{t=1}^{T} \delta_t + \frac{\sqrt{2} L \cdot \sum_{t=1}^{T} \epsilon_t}{1 - \alpha}

    for any $\alpha \in (0, 1)$, which yields the theorem. ∎
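
As a numerical sanity check on the last inequality of the proof, the sketch below evaluates the step-size-dependent terms $\eta \frac{2TL^2}{\gamma} + \frac{G}{\eta}$ at the value of $\eta$ that equalizes them. The step size actually prescribed by the theorem is not restated in this part of the proof, so the unconstrained choice `balanced_eta` and the numeric parameter values are assumptions made only for illustration. The second argument of the max, $\sqrt{T / (2\alpha^2)}$, is not covered by this sketch.

```python
import math


def eta_terms(eta, T, L, G, gamma):
    """Step-size-dependent part of the bound: eta * 2*T*L^2/gamma + G/eta."""
    return eta * 2 * T * L**2 / gamma + G / eta


def balanced_eta(T, L, G, gamma):
    """Hypothetical unconstrained step size that equalizes the two terms.

    This is an illustrative assumption, not necessarily the step size
    used in the theorem statement.
    """
    return math.sqrt(G * gamma / (2 * T * L**2))


# Arbitrary illustrative parameter values (assumptions, not from the paper).
T, L, G, gamma = 10_000, 1.5, 4.0, 0.5

eta = balanced_eta(T, L, G, gamma)
value = eta_terms(eta, T, L, G, gamma)

# First argument of the max in the final bound, scaled by 3.
cap = 3 * math.sqrt(T * G * L**2 / gamma)

# At the balanced step size the terms sum to 2*sqrt(2) * sqrt(T*G*L^2/gamma).
closed_form = 2 * math.sqrt(2) * math.sqrt(T * G * L**2 / gamma)

print(f"eta-dependent terms: {value:.4f}")
print(f"3 * sqrt(T G L^2 / gamma): {cap:.4f}")
print("within the stated constant:", value <= cap)
```

Under this assumed step size the two terms sum to $2\sqrt{2} \cdot \sqrt{TGL^2/\gamma}$, which is consistent with the leading constant 3 in the final bound since $2\sqrt{2} < 3$.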

Theorem 10 follows directly from Theorem 21.