Sequential Manipulation against Rank Aggregation: Theory and Algorithm

Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu,
Xiaochun Cao, Yingfei Sun, and Qingming Huang
K. Ma and Y. Sun are with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China. E-mail: [email protected], [email protected]. Q. Xu is with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China. E-mail: [email protected], [email protected]. J. Zeng is with the School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China. E-mail: [email protected]. W. Liu is with the Tencent Data Platform, Shenzhen 518054, China. E-mail: [email protected]. X. Cao is with the School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China. E-mail: [email protected]. Q. Huang is with the School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China, also with the Key Laboratory of Big Data Mining and Knowledge Management (BDKM), University of Chinese Academy of Sciences, Beijing 100049, China, also with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China. E-mail: [email protected]. Corresponding author.
Abstract

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.

Index Terms:
Online Manipulation, Adversarial Learning, Pairwise Comparison, Ranking Aggregation.

1 Introduction

Rank aggregation has wide-ranging applications in social choice theory [2], psychology [46], economics [45], statistics [29], bioinformatics [4], and other fields. In pursuit of large benefits, potential attackers have strong motivations to manipulate the rank aggregation algorithms used in high-stakes scenarios, e.g., elections [6], sports competitions [31], and recommendations [43]. A profit-seeking adversary will try his/her best to designate the ranking list and fulfill his/her demands. In addition to statistical [17] and computational [50] properties, the integrity of ranking results has become a new direction in the study of rank aggregation algorithms.

(a) offline attack against rank aggregation
(b) online attack against rank aggregation
Figure 1: Overview of the offline and online adversarial settings. (a) In the offline confrontation scenario, the adversary observes the whole comparison graph on Oct. 17 and obtains an attack strategy that requires flipping comparisons which occurred on Sep. 25 and Oct. 2. However, no one can return to the past and change what has happened. Moreover, bypassing the defense mechanisms of rank aggregation to modify the completed comparison graph is a genuinely challenging task. (b) Different from the offline attack methods, we consider sequential manipulation strategies that have no knowledge of the future pairwise comparisons. The proposed online attack method inserts malicious pairwise comparisons into the data stream before the construction of the comparison graph.

The pioneering security-related study of rank aggregation is [37], which develops a strong threat model for perturbing the aggregated results. The adversary has complete knowledge of the initial truthful data and the corresponding feedback of the victims. He/she can corrupt the original data by inserting, deleting, or flipping any pairwise comparisons, subject to a limit on the number of modifications. [37] also considers an adversary with incomplete knowledge, who lacks the preference scores generated by the victims. The attack strategies are obtained by maximizing the objective functions of the victims through global modification of the weights of the comparison graph. Their results show that rank aggregation algorithms are vulnerable to these attackers. Concurrently with [37], [32] and [1] restrict the scope and degree of weight modification to specific families of comparison graphs and then provide recovery guarantees for the ground-truth ranking with the proposed procedures. It is noteworthy that these weaker threat models cannot be translated into any defense mechanism against unregulated attackers. Furthermore, [38] poses the manipulation problem against rank aggregation algorithms. Purposeful attackers are not satisfied with simply perturbing the ranking list; they want to designate it. From the perspective of dynamical systems, the attack behavior with a target ranking list is a fixed point of the composition of the adversary and the victim. The manipulation strategies amount to the conditions that the weights of the comparison graph should satisfy when the victims obtain the target ranking list. From the above analysis, we conclude that the existing methods study the security of rank aggregation in an “offline” adversarial scenario [51, 52, 53]. In general, the attackers in the existing methods try to modify pairwise comparisons that have already been collected.
These offline attacks must occur after the construction of the comparison graph and before the victims aggregate the results. The rank aggregation algorithms would have to wait for the adversary to complete his/her malicious actions and unconditionally accept the modifications to the data before beginning their own jobs. Such an attack opportunity affords the adversary considerable privileges. There is an implicit assumption that the adversary is capable of changing the existing data in the possession of the victims. However, the data held by rank aggregation algorithms is often immutable in practice. In sports competitions, the final ranking is only produced when all the races have been completed. Theoretically, the existing methods could perturb or manipulate the ranking lists of all teams or players. But no one can travel to the past and change the outcome of a match that has already finished. Likewise, once a vote has been cast at the polling station, the ballot will not be changed by any third party. In such confrontation scenarios, the existing methods assume that they have completely bypassed the constraints of time and space. Therefore, these offline methods fail to profile the capability boundary of the potential attackers and to illuminate the underlying risks of rank aggregation algorithms.

To address the above challenges, we need a new online paradigm for manipulating rank aggregation algorithms. In terms of attack opportunities, the attacker needs to seek more chances to achieve his/her goal and to bypass the time and space constraints. The whole process of obtaining a ranking list can be divided into two parts: online data collection and offline data aggregation. Compared to offline aggregation, online collection is much more vulnerable. As a distributed and asynchronous process, data collection cannot be carried out in a controlled environment and is therefore independent of rank aggregation. The defense mechanisms of aggregation often fail to protect online data collection. In addition, data collection always takes a long time, so the attacker has ample chances to execute his/her actions. Consequently, disrupting the data collection process by falsifying pairwise comparisons online is more sophisticated than changing the collected data offline. Having determined the attack opportunity, it is necessary to identify the attacker’s capabilities during data collection. During the collection process, a data source generates many pairwise comparisons waiting to be sampled. Once a comparison is sampled, it is used to construct the comparison graph and cannot be modified. If the attacker performs malicious actions during data collection, all he/she can do is mimic the behavior of normal data. The adversary can construct an adversarial data source that generates specific pairwise comparisons and inserts them into the data stream. Since the cost of authenticating data sources is much greater than the cost of fabricating a pairwise comparison with malicious intent, an attacker can effectively bypass the victim’s defenses.
To the best of our knowledge, manipulating aggregation results by fabricating the data source and continuously injecting malicious pairwise comparisons into the data stream is a new formulation of attacks against rank aggregation algorithms, and it remains under-explored.

The core of this paper is to make the above analysis rigorous by establishing a principled framework for sequentially manipulating rank aggregation algorithms. The main methodological and theoretical contributions are summarized as follows.

  • Under a distributionally robust game-theoretic framework, we construct the confrontation model between the online manipulator and the ranker who is bound to the original data sources. We then prove the existence of a distributionally robust Nash equilibrium in such a game, which guarantees the possibility of sequential manipulation. This adversarial game describes the goal, knowledge and capability of the attacker, with particular emphasis on the uncertainty that all players must deal with.

  • We characterize the data collection process as a sampling algorithm and focus on two of the most basic and well-known sampling algorithms: Bernoulli sampling and reservoir sampling. Our theoretical analysis shows that the sampled results could be representative with respect to the mixture of the original and adversarial data sources. Such results suggest that the adversary’s actions could resist the effects of randomness in the original data source and in the data collection.

  • Different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. The underlying Bayes risk consists of the expected Kendall’s tau distance and the expected relative generation cost. We then derive the asymptotic optimality of the proposed policies with complete knowledge.

  • To increase the success rate of the sequential manipulation with incomplete knowledge, we empower the generation rule against uncertainty. A distributionally robust estimator replaces the maximum likelihood estimation in a saddle point optimization problem. The corresponding conservative generation rule is then obtained by the mirror descent algorithm.
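The data collection step named in the second contribution can be pictured with the two textbook samplers it mentions. The following is a minimal sketch under stated assumptions, not the paper's construction: the record layout `(source, winner, loser)`, the function names `bernoulli_sample` and `reservoir_sample`, the 20% adversarial share, and the sampling parameters are all hypothetical choices made for illustration.

```python
import random

def bernoulli_sample(stream, p, rng):
    """Bernoulli sampling: keep each element independently with probability p."""
    return [x for x in stream if rng.random() < p]

def reservoir_sample(stream, k, rng):
    """Reservoir sampling (Algorithm R): uniform sample of k elements
    from a stream whose length is not known in advance."""
    reservoir = []
    for t, x in enumerate(stream):
        if t < k:
            reservoir.append(x)
        else:
            # Element t enters the reservoir with probability k / (t + 1).
            j = rng.randint(0, t)  # inclusive on both ends
            if j < k:
                reservoir[j] = x
    return reservoir

def share(sample):
    """Fraction of records that come from the adversarial source."""
    return sum(1 for s in sample if s[0] == "adversarial") / len(sample)

# Hypothetical mixed stream: each record is (source, winner, loser).
rng = random.Random(0)
stream = []
for _ in range(10_000):
    if rng.random() < 0.2:  # assumed share of fabricated comparisons
        stream.append(("adversarial", 2, 0))
    else:
        stream.append(("original", 0, 2))

kept_b = bernoulli_sample(stream, p=0.1, rng=rng)
kept_r = reservoir_sample(stream, k=1000, rng=rng)
stream_share = share(stream)
print(stream_share, share(kept_b), share(kept_r))
```

In this toy run, both samplers should return a sample whose adversarial share is close to that of the full stream, which is the informal sense in which a sample is representative of the mixture.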

The rest of the paper is organized as follows. In Section 2, we introduce the basic concepts of rank aggregation and two representative algorithms, HodgeRank [29] and RankCentrality [41]. Section 3 establishes the general framework for sequentially manipulating rank aggregation algorithms. We present the details of the manipulation strategies and the theoretical results in Section 4. Section 5 reports the results on simulated and real-world data, followed by concluding remarks in Section 6. Technical proofs are provided in the supplementary material.

2 Preliminary

We begin with a formal description of the parametric model for binary comparisons, also known as the Bradley-Terry-Luce (BTL) model [12]. Then we revisit the comparison graph and the Laplacian matrix, which are essential for the ranking algorithms tailored to the BTL model. Two popular approaches that rank the items based on estimates of the latent preference scores, HodgeRank [29] and RankCentrality [41], are chosen as the victims to motivate our targeted attack strategies. In the remainder of this paper, we use positive integers to index alternatives and voters. Let $\boldsymbol{V}$ be the set $[n]=\{1,\ \dots,\ n\}$, which denotes the set of alternatives to be ranked, and let $\boldsymbol{U}=\{\boldsymbol{u}_{1},\ \dots,\ \boldsymbol{u}_{m}\}$ denote a set of voters. We adopt the following notation from combinatorics:

$$\begin{bmatrix}\boldsymbol{V}\\ l\end{bmatrix}:=\text{set of all}\ l\text{-element subsets of}\ \boldsymbol{V}.$$

In particular,

$$\begin{aligned}
\begin{bmatrix}\boldsymbol{V}\\ 2\end{bmatrix} &:= \text{set of all unordered pairs of elements of}\ \boldsymbol{V}\\
&:= \left\{[i,\ j]\ \Big|\ \forall\ i,\ j\in\boldsymbol{V},\ i\neq j\right\}.
\end{aligned}$$

Moreover, for any $i,j\in\boldsymbol{V},\ i\neq j$, we write $i\succ j$ to mean that alternative $i$ is preferred over alternative $j$. Such a comparison can be converted into an ordered pair $(i,\ j)$. The set of ordered pairs will be denoted as

$$\boldsymbol{V}\times\boldsymbol{V}:=\left\{(i,\ j)\ \Big|\ i\succ j,\ \forall\ i,\ j\in\boldsymbol{V},\ i\neq j\right\}.$$

Ordered and unordered pairs will be delimited by parentheses $(i,j)$ and braces $\{i,j\}$, respectively. If we wish to emphasize the preference judgment of a particular voter $\boldsymbol{u}\in\boldsymbol{U}$, we will write $i\succ_{\boldsymbol{u}}j$.

2.1 Parametric Model and Pairwise Comparisons

Given a collection $\boldsymbol{V}$ of $n$ alternatives, the parametric model of pairwise comparisons assumes that each $i\in\boldsymbol{V}$ has a certain numeric quality score $\theta^{*}_{i}$. Suppose that $\boldsymbol{\theta}^{*}\in\mathbb{R}^{n}$,

$$\boldsymbol{\theta}^{*}=\left[\theta^{*}_{1},\ \dots,\ \theta^{*}_{n}\right]^{\top}, \tag{1}$$

comprises the underlying preference scores assigned to each of the $n$ items. Without loss of generality, $\boldsymbol{\theta}^{*}$ can be taken to be positive:

$$\theta^{*}_{i}>0,\ \forall\ i\in[n].$$

Specifically, a comparison of any pair $\{i,j\}\in\begin{bmatrix}\boldsymbol{V}\\ 2\end{bmatrix}$ is generated by comparing the corresponding scores $\theta^{*}_{i}$ and $\theta^{*}_{j}$ (in the presence of noise) under the BTL model. Let $y^{*}_{ij}$ denote the outcome of the comparison of the pair $i$ and $j$ based on $\boldsymbol{\theta}^{*}$, such that $y^{*}_{ij}=1$ if $i$ is preferred over $j$ and $y^{*}_{ij}=-1$ otherwise. Then, according to the BTL model,

$$y^{*}_{ij}=\left\{\begin{array}{rl}1,&\text{with probability}\ \theta^{*}_{i}/(\theta^{*}_{i}+\theta^{*}_{j}),\\[5.0pt] -1,&\text{otherwise}.\end{array}\right. \tag{2}$$

Since the BTL model is invariant under the scaling of the scores, the latent preference score is not unique. Indeed, under the BTL model, a score vector $\boldsymbol{\theta}^{*}\in\mathbb{R}^{n}_{+}$ is identified with the equivalence class

$$\boldsymbol{\Theta}^{*}=\left\{\boldsymbol{\theta}\ \Big|\ \text{there exists}\ \alpha>0\ \text{such that}\ \boldsymbol{\theta}=\alpha\boldsymbol{\theta}^{*}\right\}.$$

The outcome of a comparison depends only on the equivalence class $\boldsymbol{\Theta}^{*}$.
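To make Eq. (2) and the scale invariance concrete, the sketch below draws comparisons from the BTL model and checks that rescaling the scores leaves the win probability unchanged. The score values, the function names `btl_win_probability` and `sample_comparison`, and the 0-based indexing are illustrative assumptions, not part of the paper.

```python
import random

def btl_win_probability(theta, i, j):
    """P(y_ij = 1) = theta_i / (theta_i + theta_j), as in Eq. (2)."""
    return theta[i] / (theta[i] + theta[j])

def sample_comparison(theta, i, j, rng):
    """Draw y_ij in {+1, -1} from the BTL model."""
    return 1 if rng.random() < btl_win_probability(theta, i, j) else -1

# Hypothetical positive scores; items are indexed from 0 here,
# whereas the text indexes alternatives from 1.
theta = [4.0, 2.0, 1.0]
# Another member of the same equivalence class (alpha = 10).
scaled = [10.0 * s for s in theta]

p01 = btl_win_probability(theta, 0, 1)
p01_scaled = btl_win_probability(scaled, 0, 1)

rng = random.Random(0)
draws = [sample_comparison(theta, 0, 1, rng) for _ in range(20_000)]
empirical = sum(1 for y in draws if y == 1) / len(draws)
print(p01, p01_scaled, empirical)
```

Both probabilities equal 2/3 up to floating-point rounding, and the empirical win frequency should be close to that value.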

2.2 Comparison Graph and Combinatorial Laplacian

A graph structure, called the comparison graph, arises naturally from pairwise comparisons as follows. Let $\boldsymbol{\mathcal{G}}=(\boldsymbol{V},\boldsymbol{E})$ stand for a comparison graph, where the vertex set $\boldsymbol{V}=[n]$ represents the $n$ candidates. In our problem setting, we focus on the complete graph: the directed edge set is $\boldsymbol{E}=\boldsymbol{V}\times\boldsymbol{V}$ and $N:=|\boldsymbol{E}|=n(n-1)$. One can further associate weights $\boldsymbol{w}^{*}$ on $\boldsymbol{E}$ according to how the voters $\boldsymbol{U}$ would have rated, i.e., assigned cardinal scores or given an ordinal ordering to, the complete set of the alternatives $\boldsymbol{V}$. No matter how incomplete the rated portion is, one may always convert such judgments into pairwise rankings that have no missing values as follows. For each voter $\boldsymbol{u}\in\boldsymbol{U}$, the pairwise ranking matrix is a skew-symmetric matrix $\boldsymbol{Y}^{\boldsymbol{u}}=\{y^{\boldsymbol{u}}_{ij}\}\in\{-1,0,1\}^{n\times n}$ with

$$y^{\boldsymbol{u}}_{ij}=-y^{\boldsymbol{u}}_{ji},\ \forall\ (i,\ j)\in\boldsymbol{E},\ \forall\ \boldsymbol{u}\in\boldsymbol{U}, \tag{3}$$

where

$$y^{\boldsymbol{u}}_{ij}=\left\{\begin{array}{rl}1,&\text{if}\ i\succ_{\boldsymbol{u}}j,\\[5.0pt] -1,&\text{if}\ j\succ_{\boldsymbol{u}}i,\\[5.0pt] 0,&\text{otherwise}.\end{array}\right. \tag{4}$$

Furthermore, we associate a weight with each directed edge as $\boldsymbol{w}^{*}=[w^{*}_{12},w^{*}_{13},\dots,w^{*}_{n,n-1}]^{\top}\in\mathbb{Z}^{N}_{+}$, with

$$w^{*}_{ij}:=\underset{\boldsymbol{u}\in\boldsymbol{U}}{\sum}\ \mathbb{I}[y^{\boldsymbol{u}}_{ij}>0]+\mathbb{I}[y^{\boldsymbol{u}}_{ji}<0], \tag{5}$$

where $\mathbb{I}[\cdot]$ is the Iverson bracket. Consequently, we can represent any pairwise ranking data as a comparison graph $\boldsymbol{\mathcal{G}}$ with edge weights $\boldsymbol{w}^{*}$.
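The conversion from individual judgments to edge weights in Eq. (3)-(5) can be sketched as follows. This is an illustration only: the helper names `voter_matrix` and `edge_weights`, the three toy voters, and the 0-based item indices are hypothetical, and the code assumes that the sum over voters in Eq. (5) covers both indicator terms.

```python
def voter_matrix(n, preferences):
    """Skew-symmetric matrix Y^u built from (winner, loser) pairs, Eq. (3)-(4).

    Pairs that the voter did not compare keep the value 0.
    """
    Y = [[0] * n for _ in range(n)]
    for i, j in preferences:
        Y[i][j] = 1    # i is preferred over j
        Y[j][i] = -1   # enforces y_ij = -y_ji
    return Y

def edge_weights(voter_matrices):
    """Weight matrix with entries w_ij following Eq. (5).

    Assumption: the sum over voters applies to both indicator terms.
    Under skew-symmetry the two indicators fire together, so with this
    reading each stated preference i > j adds 2 to w_ij.
    """
    n = len(voter_matrices[0])
    W = [[0] * n for _ in range(n)]
    for Y in voter_matrices:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += int(Y[i][j] > 0) + int(Y[j][i] < 0)
    return W

# Three hypothetical voters over n = 3 items (0-based indices).
n = 3
voters = [
    voter_matrix(n, [(0, 1), (1, 2)]),
    voter_matrix(n, [(0, 1), (2, 1)]),
    voter_matrix(n, [(1, 0)]),
]
W = edge_weights(voters)
for row in W:
    print(row)
```

For these toy voters, the two votes for item 0 over item 1 give `W[0][1] == 4`, while the single opposing vote gives `W[1][0] == 2`.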

Given a graph $\boldsymbol{\mathcal{G}}$ and weights $\boldsymbol{w}^{*}$, it is common to consider the weight matrix $\boldsymbol{W}^{*}$ with $w^{*}_{ij}$ as matrix elements, as well as the diagonal degree matrix $\boldsymbol{D}^{*}=\textbf{diag}(d^{*}_{1},\dots,d^{*}_{n})$ given by $d^{*}_{i}=\sum_{j\in\boldsymbol{V}}w^{*}_{ij}$, which represents the volume taken by each node in the graph $\boldsymbol{\mathcal{G}}$. The combinatorial Laplacian $\mathcal{L}_{0}$ is defined as

$$\mathcal{L}_{0}=\boldsymbol{D}^{*}-\boldsymbol{W}^{*}. \tag{6}$$

In both the solving process and the theoretical analysis, the combinatorial Laplacian $\mathcal{L}_{0}$ plays a vital role in the popular approaches based on the parametric model.
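As a small worked instance of Eq. (6), the sketch below computes the degrees and the combinatorial Laplacian from a weight matrix. The function name `combinatorial_laplacian` and the 3-by-3 weight matrix are hypothetical examples chosen for illustration.

```python
def combinatorial_laplacian(W):
    """Return (degrees, L0) with d_i = sum_j w_ij and L0 = D - W, Eq. (6)."""
    n = len(W)
    degrees = [sum(W[i][j] for j in range(n)) for i in range(n)]
    L0 = [
        [(degrees[i] if i == j else 0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]
    return degrees, L0

# Hypothetical weight matrix of a comparison graph on three items.
W = [
    [0, 4, 0],
    [2, 0, 2],
    [0, 2, 0],
]
degrees, L0 = combinatorial_laplacian(W)
print(degrees)
for row in L0:
    print(row)
```

Because each diagonal entry is the corresponding row sum of the weights, every row of the resulting matrix sums to zero.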

Remark 1. In this paper, we select HodgeRank [29] and RankCentrality [41] to verify that online manipulation is a potentially significant threat to rank aggregation methods, for the following reasons. First, these two representative methods have received much recent attention and have been well studied by [29, 17, 15]; their theoretical properties guarantee promising recovery performance, so a successful manipulation will stand in stark contrast to the original aggregated results. Second, variants of HodgeRank and RankCentrality are hot topics in the literature [36, 14, 33, 11], so the online attack method proposed in this paper has a large number of potential victims. Third, destructive results against these two estimators for the famous Bradley-Terry-Luce (BTL) model will prompt researchers to focus on the security of rank aggregation in high-stakes applications.

Remark 2. When a purposeful adversary exists, the collected pairwise comparisons are a mixture of the data that supports the original ranking list and the data fabricated by the adversary. To manipulate the aggregated results, the attacker predicts the ranker’s behavior with incomplete information and fabricates suitable pairwise comparisons. Therefore, we need a mathematical tool to formulate the behaviors of the ranker and the adversary, an interaction that has been extensively modeled as a two-player, non-cooperative game in adversarial learning [20]. Specifically, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. The ranker’s set of actions corresponds to selecting pairwise comparisons and minimizing the difference between the aggregation result and the original ranking list. Meanwhile, the adversary’s set of actions corresponds to generating pairwise comparisons and minimizing the difference between the aggregation result and the desired ranking list. For both players, the upcoming data is the uncertain knowledge.

Remark 3. Although the offline [38] and online attackers have the same goal, their different behavioral patterns give them different knowledge and capabilities. Specifically, let $T_{0}$ be the stopping time of data collection. The offline attacker has full or partial knowledge of the comparison graph weights $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(T_{0})$. The offline manipulator then has the ability to modify $\boldsymbol{w}(T_{0})$ in its entirety, increasing or decreasing the values of $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(T_{0})$ at arbitrary positions (see Eq. (53)-(56), (73) and (81) of [38] for the detailed utilization of the offline knowledge). Meanwhile, the online manipulator of this paper obtains his/her knowledge sequentially and knows nothing about the forthcoming pairwise comparisons. More importantly, the online attacker executes his/her strategy based on the knowledge $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(t)$ at each time step $t$ instead of waiting for the moment $T_{0}$. Thus, the greatest limitation on the ability of the online attacker is that he/she can only insert fabricated pairwise comparisons. The online attack paradigm could bypass the existing defense mechanisms of rank aggregation algorithms and break the barrier of time. We provide an example in Fig. 1. It is noteworthy that applying the offline method at each time step cannot achieve a result similar to that of the online method, since the offline method does not guarantee that the collected data remain unchanged.

Remark 4. To accomplish an effective online attack without modifying the collected data, the adversary generates the most destructive data to inject based on the current partial information and stops when the ambiguity of the ranking list falls below a certain level. This paper develops a general framework against the parametric models of rank aggregation, especially the BTL model. The proposed adversarial generation process, corresponding to the third core contribution, can designate the leading candidate of the ranking lists aggregated by HodgeRank and RankCentrality. In addition, the offline attack methods [37, 38] cannot yield usable attack results in the online manipulation setting of this paper.

3 General Framework

In this section, we systematically introduce the general framework for sequential manipulation against pairwise ranking algorithms. To mathematically characterize the successive interaction between the manipulator and the victims, we perform threat modeling to profile the attacker’s goal, knowledge and capability in Sec. 3.1 and dissect the online adversarial behavior. Then we develop the game-theoretic formulation between the online adversary and the offline rank aggregation procedure in Sec. 3.2, with particular emphasis on the uncertainty that the online manipulator must deal with. Such a game, with its fundamental uncertainty about future data and the opponent’s strategies, differs significantly from the settings of [37, 38]. The existence of the distributionally robust Nash equilibrium is also established.

3.1 Threat Model of Online Adversary

Here we present the threat model of the manipulator to specify his/her goal, knowledge and capability under the online behavioral pattern. The threat model helps to establish the online interactions between the purposeful attacker and rank aggregation with pairwise comparisons.

The Goal of Online Adversary. The goal of a manipulator is to induce the threatened rank aggregation approaches to produce a designated ranking. On the one hand, the adversary cannot interact directly with the threatened rank aggregation procedure due to the inevitable defense mechanisms. On the other hand, the collection of pairwise comparisons is an online process that is independent of the subsequent rank aggregation method. It often takes place in open environments and lacks adequate supervision. If the attacker can interfere with the data collection procedure, he/she has a high chance of bypassing the defense mechanisms and accomplishing the manipulation. The data collection procedure is treated as a random sampling process. The data stream consists of all possible pairwise comparisons. A random sampling algorithm receives the stream and chooses the data that construct the comparison graph. To achieve manipulation, the adversary proactively disguises the crafted malicious data as part of the data stream. These malicious data can then be adopted by the sampling algorithm and used to construct the comparison graph. After sampling, the ranker produces the aggregated result based on the comparison graph. These sequential actions of the adversary induce the ranker to produce a designated ranking result. If the ranking list meets the demand of the adversary, we say that the adversary has executed a successful manipulation.

We denote by $\boldsymbol{\mathcal{A}}$ and $\boldsymbol{\mathcal{R}}$ the adversary and the ranker, respectively. Let $\boldsymbol{C}=\{c_{1},c_{2},\dots\}$ be a sequence of recurring pairwise comparisons involving at most $n$ candidates. The sequence perturbed by $\boldsymbol{\mathcal{A}}$ is $\boldsymbol{C}^{\prime}$. The sequence of pairwise comparisons is transferred into the comparison graph as in (5). Suppose that $\boldsymbol{\mathcal{G}}(\boldsymbol{C})$ is the comparison graph constructed from $\boldsymbol{C}$. The relative ranking scores $\boldsymbol{\theta}$ and $\boldsymbol{\theta}^{\prime}$ are produced by $\boldsymbol{\mathcal{R}}$ with $\boldsymbol{\mathcal{G}}(\boldsymbol{C})$ and $\boldsymbol{\mathcal{G}}(\boldsymbol{C}^{\prime})$, respectively. The non-adversarial rank aggregation can be portrayed as

$$\boldsymbol{\mathcal{R}}(\boldsymbol{\mathcal{G}}(\boldsymbol{C}))=\boldsymbol{\theta}. \tag{7}$$

Then the rank aggregation result under online manipulation strategies would be

$$\boldsymbol{\mathcal{R}}(\boldsymbol{\mathcal{G}}(\boldsymbol{C}^{\prime}))=\boldsymbol{\theta}^{\prime}. \tag{8}$$

Although $\boldsymbol{\mathcal{A}}$ is able to achieve multiple objectives with the help of $\boldsymbol{\theta}^{\prime}$, designating the winner is the most desired achievement of $\boldsymbol{\mathcal{A}}$. Therefore, we consider the following scenario: after the action of $\boldsymbol{\mathcal{R}}$, it holds that

$$\boldsymbol{\theta}^{\prime}\in\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}:=\left\{\boldsymbol{\theta}\in\mathbb{R}^{n}\ \Big|\ \max_{i\in[n],\ i\neq i_{0}}\theta_{i}\leq\theta_{i_{0}}\right\}, \tag{9}$$

where $i_{0}$ is the winner candidate desired by $\boldsymbol{\mathcal{A}}$. Then we say that $\boldsymbol{\mathcal{A}}$ has a successful online manipulating strategy against $\boldsymbol{\mathcal{R}}$ by substituting $\boldsymbol{\mathcal{G}}(\boldsymbol{C})$ with $\boldsymbol{\mathcal{G}}(\boldsymbol{C}^{\prime})$ through sequential behavior. It is noteworthy that the goal in this paper implicitly requires sequential/online attack behavior, while [38] relies on an offline manipulation strategy. The differences between online and offline strategies are shown in the following parts.
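The membership condition in (9) can be checked mechanically. The following is a minimal sketch, assuming the ranking scores are available as a plain list and candidates are indexed from 0 (the paper indexes them by $[n]$); the function name and the example scores are hypothetical and only illustrate the definition.

```python
from typing import Sequence


def is_successful_manipulation(theta_prime: Sequence[float], i0: int) -> bool:
    """Check theta' in Theta_A as in Eq. (9): no other candidate scores above i0.

    Candidates are 0-indexed here (an assumption of this sketch).
    Ties count as success because Eq. (9) uses a non-strict inequality.
    """
    target = theta_prime[i0]
    return all(score <= target for i, score in enumerate(theta_prime) if i != i0)


# Hypothetical scores, for illustration only.
scores_clean = [0.9, 0.4, 0.1]      # theta: candidate 0 leads on clean data
scores_perturbed = [0.5, 0.7, 0.1]  # theta': candidate 1 leads after manipulation
```

With these illustrative scores, the check fails for target candidate 1 on the clean scores and passes on the perturbed ones.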

The Knowledge of Online Adversary. Let

$$\boldsymbol{C}(T)=\{c_{1},c_{2},\dots,c_{T}\} \tag{10}$$

be a sub-sequence of $\boldsymbol{C}$ with its first $T$ pairwise comparisons. Without loss of generality, the number of pairwise comparisons in $\boldsymbol{C}$ increases by $1$ at each step as

$$\boldsymbol{C}(T)=[\boldsymbol{C}(T-1),c_{T}],\ T\in\mathbb{N}. \tag{11}$$

As a consequence, the knowledge of $\boldsymbol{\mathcal{A}}$ at step $T$, denoted $\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)=[\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T-1),\varphi(c_{T})]$, contains two parts:

  • a subset of $\boldsymbol{C}(T-1)$ as

    $$\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T-1)\subseteq\boldsymbol{C}(T-1), \tag{12}$$

  • and the state of $c_{T}$:

    $$\varphi(c_{T})=\begin{cases}c_{T},&\text{if}\ \boldsymbol{\mathcal{A}}\ \text{obtains}\ c_{T},\\ \varnothing,&\text{otherwise,}\end{cases} \tag{13}$$

    where $\varnothing$ indicates that no pairwise comparisons will enter the sequence.

Based on the completeness of $\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)$, we consider the following two adversarial scenarios:

  i) If it holds that

    $$\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)=\boldsymbol{C}(T),\ \forall\ T\in\mathbb{N}, \tag{14}$$

    that is, $\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T-1)=\boldsymbol{C}(T-1)$ and $\varphi(c_{T})=c_{T}$, $\forall\ T\in\mathbb{N}$, we say that $\boldsymbol{\mathcal{A}}$ has complete knowledge.

  ii) If there exists a time step $T$ such that

    $$\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)\subset\boldsymbol{C}(T), \tag{15}$$

    we say that $\boldsymbol{\mathcal{A}}$ has incomplete knowledge. Limited by time and cost, the incomplete state persists throughout the whole adversarial operation.
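The knowledge update in (12)-(15) can be sketched as a simple loop over the stream. The snippet below is an illustration only: it assumes that the adversary obtains each comparison independently with a fixed probability `observe_prob`, which is a modeling choice of this sketch and not something the threat model prescribes. Setting `observe_prob = 1.0` corresponds to complete knowledge (14); any smaller value may yield incomplete knowledge (15).

```python
import random
from typing import List, Optional, Sequence, Tuple

Comparison = Tuple[int, int]  # (i, j): candidate i is preferred to candidate j


def phi(c_t: Comparison, obtained: bool) -> Optional[Comparison]:
    """State of c_T as in Eq. (13): the comparison itself, or None for the empty state."""
    return c_t if obtained else None


def adversary_knowledge(stream: Sequence[Comparison],
                        observe_prob: float,
                        seed: int = 0) -> List[Comparison]:
    """Build C_A(T) step by step as [C_A(T-1), phi(c_T)].

    The Bernoulli observation model is a hypothetical assumption of this sketch.
    """
    rng = random.Random(seed)
    knowledge: List[Comparison] = []
    for c_t in stream:
        seen = phi(c_t, rng.random() < observe_prob)
        if seen is not None:          # the empty state adds nothing to the sequence
            knowledge.append(seen)
    return knowledge
```

Because the adversary only ever appends observed comparisons, the returned sequence is always a sub-sequence of the stream, in line with (12).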

Special attention needs to be paid to the fact that the online manipulator in this paper lacks prior information about the subsequent data $\boldsymbol{C}/\boldsymbol{C}^{(T)}$ at step $T$. The offline manipulator of [37, 38], on the other hand, does not need this prior information but requires that the length of $\boldsymbol{C}$ no longer grows, i.e., there exists a step $T_{0}$ such that $|\boldsymbol{C}|=T_{0}$. Consequently, the offline adversary in [37, 38] is a special case of the online manipulator, namely the online manipulator at the step when all pairwise comparisons have been collected. Such a distinction affects the abilities of the offline and online attackers.

The Capability of Online Adversary. The above goal and knowledge give the online attacker capabilities that differ completely from those of the offline attacker. The online manipulator $\boldsymbol{\mathcal{A}}$ is able to insert arbitrary pairwise comparisons into the data stream. The perturbed data then replace the original data in producing the comparison graph for rank aggregation. More specifically, the pairwise comparisons fabricated with the knowledge $\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)$ are

$$\boldsymbol{c}^{\prime}(\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T))=[c^{\prime}(1),\dots,c^{\prime}(a_{T})], \tag{16}$$

where $a_{T}$ is the maximum number of possible insertions at step $T$. The sequence (16) reflects $\boldsymbol{\mathcal{A}}$’s capability. Note that $\boldsymbol{\mathcal{A}}$ is unable to change the pairwise comparisons in $\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)\subseteq\boldsymbol{C}(T)$. However, in addition to insertion, the offline attackers of [37, 38] could delete or flip a pairwise comparison $c_{t}\in\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)\subseteq\boldsymbol{C}(T),\ \forall\ t\leq T$, even though $c_{t}$ was generated in the past or is protected by the defense mechanisms. Therefore, the online attacker is more restricted than its offline counterpart. The sequence of pairwise comparisons observed by $\boldsymbol{\mathcal{R}}$ at step $T$ is

$$\boldsymbol{C}^{\prime}(T)=(\boldsymbol{C}^{\prime}(T-1),\ \boldsymbol{C}(T)/\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T),\ \boldsymbol{c}^{\prime}(\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T))). \tag{17}$$

The sequence (17) is a mixture of the collected data $\boldsymbol{C}^{\prime}(T-1)$, the data inaccessible to the attacker $\boldsymbol{C}(T)/\boldsymbol{C}_{\boldsymbol{\mathcal{A}}}(T)$, and the fabricated pairwise comparisons (16).
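The insertion-only capability can be illustrated with a short simulation. The sketch below is one possible reading of (16) and (17), under the simplifying assumption of complete knowledge: at every step the genuine comparison is appended unchanged, and the adversary may add at most `budget_per_step` fabricated comparisons (playing the role of $a_{T}$). The policy `favor_target` is a naive hypothetical placeholder and is not the manipulation policy proposed in this paper.

```python
from typing import Callable, List, Sequence, Tuple

Comparison = Tuple[int, int]  # (i, j): candidate i is preferred to candidate j
Policy = Callable[[List[Comparison]], List[Comparison]]


def run_online_insertion(stream: Sequence[Comparison],
                         policy: Policy,
                         budget_per_step: int) -> List[Comparison]:
    """Return the sequence the ranker observes, C'(T), under insertion-only attacks."""
    observed: List[Comparison] = []   # what the ranker receives
    knowledge: List[Comparison] = []  # C_A(T); complete knowledge is assumed here
    for c_t in stream:
        observed.append(c_t)          # genuine data is never deleted or flipped
        knowledge.append(c_t)
        fabricated = policy(knowledge)[:budget_per_step]  # at most a_T insertions
        observed.extend(fabricated)
    return observed


def favor_target(i0: int, n: int) -> Policy:
    """Hypothetical placeholder policy: the target i0 beats every other candidate."""
    def policy(knowledge: List[Comparison]) -> List[Comparison]:
        return [(i0, j) for j in range(n) if j != i0]
    return policy
```

For example, `run_online_insertion([(0, 1), (1, 2)], favor_target(2, 3), 1)` interleaves one fabricated comparison `(2, 0)` after each genuine one, while a budget of zero leaves the stream untouched.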

3.2 Distributionally Robust Game between the Ranker and the Online Adversary

With the above threat modeling, we can further understand the adversarial scenario from a game-theoretic perspective. When $\boldsymbol{\mathcal{A}}$ exists, the pairwise comparisons for $\boldsymbol{\mathcal{R}}$ come from two sources: the original data stream $\boldsymbol{C}$ and the fraudulent data $\boldsymbol{C}^{\prime}/\boldsymbol{C}$. Due to the extreme difficulty of identifying the possible sources of pairwise comparisons, $\boldsymbol{\mathcal{R}}$ is only able to aggregate $\boldsymbol{C}^{\prime}$ and obtain a ranking list $\boldsymbol{\theta}^{\prime}$, which is different from the result $\boldsymbol{\theta}$ obtained from $\boldsymbol{C}$. However, the existence of the normal data stream $\boldsymbol{C}$ alleviates the impact of $\boldsymbol{\mathcal{A}}$ on $\boldsymbol{\mathcal{R}}$ and tends to keep $\boldsymbol{\theta}^{\prime}$ away from $\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}$. With the help of defense and protection mechanisms, we believe that $\boldsymbol{\mathcal{R}}$ will select pairwise comparisons that preserve the original result $\boldsymbol{\theta}$. At the same time, $\boldsymbol{\mathcal{A}}$ needs to induce $\boldsymbol{\mathcal{R}}$ despite the interference of $\boldsymbol{C}$ and make $\boldsymbol{C}^{\prime}$ sufficient to support $\boldsymbol{\theta}^{\prime}\in\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}$. As a consequence, this adversarial scenario is a game between $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$, who choose pairwise comparisons to produce their desired ranking results.

To establish the adversarial game between the online adversary and the ranker, we first transfer the sequence of pairwise comparisons at step $T$, $\boldsymbol{C}(T)=\{c_{1},\dots,c_{t},\dots,c_{T}\}$, into a comparison graph $\boldsymbol{\mathcal{G}}(\boldsymbol{C}(T))=\{\boldsymbol{V},\boldsymbol{E},\boldsymbol{w}(T)\}$. The vertex set $\boldsymbol{V}$ is the set of all candidates, $\boldsymbol{V}=[n]$, and the edge set $\boldsymbol{E}$ contains all directed edges between any pair of candidates as

$$\boldsymbol{E}=\{i\rightarrow j\ |\ i,\ j\in[n],\ i\neq j\}. \tag{18}$$

Here $\boldsymbol{w}(T)=\{w_{1,2}(T),w_{1,3}(T),\dots,w_{n,n-1}(T)\}$ are the weights of $\boldsymbol{E}$ in $\boldsymbol{C}(T)$ as

$$w_{i,j}(T)=\sum_{t=1}^{T}\ \mathbb{I}[c_{t}=(i,j)], \tag{19}$$

where $\mathbb{I}[\cdot]$ is the Iverson bracket. The weight $\boldsymbol{w}(T)$ represents how often each pairwise comparison occurs in $\boldsymbol{C}(T)$. Furthermore, $\boldsymbol{w}(T)$ can be treated as a random variable defined on the probability space $(\boldsymbol{\mathfrak{C}},\boldsymbol{\mathcal{E}},\mathbb{P})$:

$$\boldsymbol{w}(T)\sim\mathbb{P}, \tag{20}$$

where $\boldsymbol{\mathfrak{C}}$ is the sample space of all possible pairwise comparisons involving $n$ candidates,

$$\boldsymbol{\mathfrak{C}}=\{(i,j)\ |\ i,j\in[n],\ i\neq j\}, \tag{21}$$

$\boldsymbol{\mathcal{E}}$ is the event space of all sequences with length $T$, and $\mathbb{P}$ is a probability function. Consequently, the data sequence $\boldsymbol{C}(T)$ is associated with a data distribution $\mathbb{P}$ that describes the occurrence frequency of different pairwise comparisons.
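The edge weights in (19) are plain occurrence counts, which the following sketch computes for a finite sequence. It assumes 0-indexed candidates and represents each comparison as a tuple `(i, j)`. The helper `empirical_distribution` is an addition of this sketch: it normalizes the counts into relative frequencies as one simple stand-in for the occurrence frequencies described by $\mathbb{P}$, and is not an estimator taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Comparison = Tuple[int, int]  # (i, j): candidate i is preferred to candidate j


def comparison_weights(comparisons: Iterable[Comparison], n: int) -> Dict[Comparison, int]:
    """w_{i,j}(T) of Eq. (19): how many times (i, j) occurs, for every directed edge."""
    counts = Counter(comparisons)
    return {(i, j): counts.get((i, j), 0)
            for i in range(n) for j in range(n) if i != j}


def empirical_distribution(weights: Dict[Comparison, int]) -> Dict[Comparison, float]:
    """Relative frequency of each edge (illustrative normalization, an assumption)."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no comparisons observed")
    return {edge: w / total for edge, w in weights.items()}
```

The dictionary returned by `comparison_weights` always has $n(n-1)$ entries, one per directed edge in (18), including edges that were never observed.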

A notable characteristic of the adversarial game in this paper is that the decision-making processes of $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$ involve uncertainty: the true distribution of the mixed sequence (the observed weight) is unknown to both $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$ during the whole procedure. All players bear the risk due to this uncertainty, and the resulting Nash equilibrium may differ from the equilibrium under the true distribution. The uncertainty drives $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$ to adopt conservative strategies. To be more specific, we consider the following Nash equilibrium problem: at any time step, each player needs to make decisions prior to the realization of uncertainty by minimizing his/her expected dis-utility under the most pessimistic situation:

$$\min_{\boldsymbol{a}_{r}\in\mathbb{Z}^{N}_{+}}\ \sup_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}_{r},\boldsymbol{a}_{-r},\boldsymbol{\pi}_{r},\boldsymbol{w})\Big], \tag{22}$$

where $r$ represents the $r^{\text{th}}$ player in the game. Such a distributionally robust game (DRG) models the interaction between the ranker and the adversary. Each player in this game holds a continuous dis-utility function

$$f_{r}:\mathbb{Z}^{N}_{+}\times\mathbb{Z}^{N}_{+}\times\mathbb{N}^{n}\times\mathbb{R}^{N}\rightarrow\mathbb{R}, \tag{23}$$

where $N$ is the cardinality of $\boldsymbol{\mathcal{C}}$, $N:=|\boldsymbol{\mathcal{C}}|=n(n-1)$. The action of player $r$, $\boldsymbol{a}_{r}=\{a^{r}_{1,2},\dots,a^{r}_{n,n-1}\}\in\mathbb{Z}^{N}_{+}$, indicates the number of pairwise comparisons selected by $r$, and $\boldsymbol{a}_{-r}$ represents the actions of $r$’s opponents. $\boldsymbol{\pi}_{r}$ is the desired ranking list of player $r$. Here the supremum over $\mathbb{P}$ means that player $r$ decides his/her optimal strategy based on the worst expected value of $f_{r}$ over the set of distributions $\boldsymbol{\mathcal{P}}_{r}$, which is constructed from the partially observed information $\boldsymbol{w}_{r}$. Problem (22) is known as the distributionally robust game [35] in the literature, and its solution is called a distributionally robust Nash equilibrium. If each ambiguity set $\boldsymbol{\mathcal{P}}_{r}$ contains only a single distribution, (22) collapses to a stochastic game problem [13]. Note that the ranker $\boldsymbol{\mathcal{R}}$ is often unaware of the existence of $\boldsymbol{\mathcal{A}}$, and the corresponding action turns out to be

$$\min_{\boldsymbol{a}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}_{0}}\Big[f_{r}(\boldsymbol{a}_{r},\boldsymbol{w},\boldsymbol{\pi}_{r})\Big], \tag{24}$$

which means that the ranker focuses on the original ranking list and chooses the data with some sampling method. Meanwhile, the player $\boldsymbol{\mathcal{A}}$ still needs to consider the most pessimistic situation as in (22).
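To make the min-sup structure of (22) tangible, the sketch below evaluates it on a deliberately tiny instance. Everything in it is an illustrative assumption: the action is a single integer rather than a vector in $\mathbb{Z}^{N}_{+}$, the opponent's action and the desired ranking are folded into the scenarios, the ambiguity set is a finite list of probability vectors over two scenarios (so the supremum becomes a maximum), and `toy_disutility` is a made-up hinge-type cost, not the dis-utility used in the paper.

```python
from typing import Callable, Sequence


def worst_case_expectation(action: int,
                           scenarios: Sequence[int],
                           ambiguity_set: Sequence[Sequence[float]],
                           disutility: Callable[[int, int], float]) -> float:
    """Inner problem of Eq. (22): largest expected dis-utility over a finite ambiguity set."""
    return max(
        sum(p * disutility(action, w) for p, w in zip(dist, scenarios))
        for dist in ambiguity_set
    )


def robust_action(actions: Sequence[int],
                  scenarios: Sequence[int],
                  ambiguity_set: Sequence[Sequence[float]],
                  disutility: Callable[[int, int], float]) -> int:
    """Outer problem of Eq. (22): the action with the smallest worst-case expectation."""
    return min(actions,
               key=lambda a: worst_case_expectation(a, scenarios, ambiguity_set, disutility))


# Hypothetical toy instance (not from the paper).
def toy_disutility(a: int, w: int) -> float:
    # w: genuine votes the target must overcome; a: fabricated votes inserted.
    return 10.0 * max(0, w - a) + a   # heavy penalty for falling short, unit cost per insertion


ACTIONS = [0, 1, 2, 3]
SCENARIOS = [1, 3]
AMBIGUITY_SET = [[0.9, 0.1], [0.5, 0.5]]  # two candidate distributions over SCENARIOS
```

In this toy instance the robust choice is the largest insertion count, because it is the only action that covers the harder scenario under every distribution in the set. Passing an ambiguity set with a single distribution reduces the same functions to the non-robust expectation of the form (24).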

Definition 1 (Distributionally Robust Nash Equilibrium).

A tuple $(\boldsymbol{a}^{*}_{1},\dots,\boldsymbol{a}^{*}_{R})$ is a distributionally robust Nash equilibrium (DRNE) if

$$\boldsymbol{a}^{*}_{r}\in\underset{\boldsymbol{a}_{r}}{\arg\min}\ \sup_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}_{r},\boldsymbol{a}^{*}_{-r},\boldsymbol{\pi}_{r},\boldsymbol{w})\Big], \tag{25}$$

where

$$\boldsymbol{a}^{*}_{-r}=\sum_{s\neq r}\boldsymbol{a}^{*}_{s}. \tag{26}$$

This definition shows that the DRNE is a solution of the corresponding DRG (22). Here we consider the case of $R=2$, namely the game between $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$. In what follows, we establish the existence of a DRNE in the following theorem. The detailed proof can be found in the supplementary materials.

Theorem 1.

There exists a DRNE (25) if the following conditions hold for any $r=1,\dots,R$.

  (a) Given $(\boldsymbol{a}_{-r},\boldsymbol{\pi}_{r},\boldsymbol{w})$, $f_{r}(\cdot,\boldsymbol{a}_{-r},\boldsymbol{\pi}_{r},\boldsymbol{w})$ is convex over $\mathbb{Z}^{N}_{+}$.

  (b) $\mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\big[f_{r}(\boldsymbol{a}_{r},\boldsymbol{a}_{-r},\boldsymbol{\pi}_{r},\boldsymbol{w})\big]$ has finite values for any $(\boldsymbol{a}_{r},\boldsymbol{a}_{-r})$, $\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}$ and $\boldsymbol{\pi}_{r}$ is a permutation of $[n]$.

  (c) $\boldsymbol{\mathcal{P}}_{r}$ is weakly compact.

This result tells us that, under the above conditions, there exists at least one stable state for both $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$. The next key step toward executing the manipulation is to determine whether an equilibrium state is favorable to $\boldsymbol{\mathcal{A}}$. When an equilibrium state is favorable to $\boldsymbol{\mathcal{A}}$, the perturbed data $\boldsymbol{C}'(T)$ could lead $\boldsymbol{\mathcal{R}}$ to generate $\boldsymbol{\theta}'\in\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}$.

3.3 Successful Opportunity to Sequential Manipulation

Without prior knowledge of the original data distribution, analyzing the equilibrium of the proposed distributionally robust game is challenging. Here we instead dissect the outcome of the adversarial game directly. At the end of the distributionally robust game between $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$, we have a sequence of pairwise comparisons (17) from which the comparison graph is constructed. To simulate the competitive results of (25), we treat (17) as a stochastic process, namely the output of a sampling method $\boldsymbol{\mathcal{S}}$. The random nature of the sampling process replaces the uncertainty of the distributionally robust game. In addition, utilizing $\boldsymbol{\mathcal{S}}$ to analyze $\boldsymbol{C}'(T)$ makes our subsequent discussion more pertinent to actual confrontation scenarios. We show that two classic sampling methods become accomplices of the online manipulator and help to generate stable sequences that favor $\boldsymbol{\mathcal{A}}$'s goal.

In the non-adversarial scenario, with an $(\epsilon,\delta)$-representative sampling method, a small number of random samples $\boldsymbol{C}$ always suffices to represent the data source $\boldsymbol{\mathcal{C}}$ truthfully [48]. Even with an aimless attacker [8], who still adopts the original data source $\boldsymbol{\mathcal{C}}$ as the source of perturbation, the Bernoulli and reservoir sampling methods could lead $\boldsymbol{\mathcal{R}}$ to generate the same ranking result. To be specific, the perturbed sequence $\boldsymbol{C}'_{1}$ and the original sequence $\boldsymbol{C}$ would produce different comparison graph weights $\boldsymbol{w}_{\boldsymbol{C}'_{1}}$ and $\boldsymbol{w}_{\boldsymbol{C}}$. However, they still obey the same distribution:

$$\boldsymbol{w}_{\boldsymbol{C}},\ \boldsymbol{w}_{\boldsymbol{C}'_{1}}\sim\mathbb{P}_{0}, \tag{27}$$

where $\mathbb{P}_{0}$ is the distribution of the original comparison graph weights. If the probability distribution of the comparison graph weights $\boldsymbol{w}$ is $\mathbb{P}_{0}$, it holds that

$$\boldsymbol{\mathcal{R}}(\boldsymbol{w}_{\boldsymbol{C}})=\boldsymbol{\mathcal{R}}(\boldsymbol{w}_{\boldsymbol{C}'})=\boldsymbol{\theta}_{0}. \tag{28}$$

Consequently, $\boldsymbol{\mathcal{R}}$ generates the original ranking list $\boldsymbol{\pi}_{\boldsymbol{\theta}_{0}}$ even if the aimless attacker $\boldsymbol{\mathcal{A}}$ exists. This so-called “adversarial robustness” [8] is one side of the sampling methods.

Unfortunately, the attacker will be sophisticated in a real confrontation scenario. He/she could construct new data sources instead of simply using $\boldsymbol{\mathcal{C}}$. For example, given any $\boldsymbol{\theta}'\in\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}$ as in (9), $\boldsymbol{\mathcal{A}}$ could generate pairwise comparisons through the BTL model: the larger $\theta_{i}'/\theta_{j}'$ is, the higher the probability of generating the pairwise comparison $i\succ j$. Such actions construct the adversarial data source $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$, whose underlying distribution $\mathbb{P}_{\boldsymbol{\mathcal{A}}}$ is distinct from $\mathbb{P}_{0}$, since $\boldsymbol{\mathcal{R}}$ will produce the manipulated score:

$$\boldsymbol{\mathcal{R}}(\boldsymbol{w})=\boldsymbol{\theta}',\quad \boldsymbol{w}\sim\mathbb{P}_{\boldsymbol{\mathcal{A}}}. \tag{29}$$
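The BTL-based construction of the adversarial source described above can be sketched as follows. This is a minimal illustration, assuming the standard BTL form $\mathbb{P}(i\succ j)=\theta_{i}'/(\theta_{i}'+\theta_{j}')$; the function names, the uniform choice of the compared pair, and the score vector `theta_target` are hypothetical and are not taken from the paper.

```python
import random

def btl_win_probability(theta, i, j):
    """Probability that item i beats item j under BTL with positive scores."""
    return theta[i] / (theta[i] + theta[j])

def sample_btl_comparisons(theta, num_samples, seed=0):
    """Draw comparisons (winner, loser); the pair is picked uniformly (an assumption)."""
    rng = random.Random(seed)
    n = len(theta)
    comparisons = []
    for _ in range(num_samples):
        i, j = rng.sample(range(n), 2)  # two distinct items
        if rng.random() < btl_win_probability(theta, i, j):
            comparisons.append((i, j))
        else:
            comparisons.append((j, i))
    return comparisons

# Hypothetical adversarial scores: the attacker wants item 1 ranked first.
theta_target = [1.0, 4.0, 2.0]
adversarial_source = sample_btl_comparisons(theta_target, 2000)
```

Because the win probability depends only on the score ratio, rescaling `theta_target` by a positive constant leaves the generated source unchanged in distribution.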

The distributionally robust game (22) between $\boldsymbol{\mathcal{C}}$ and $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$ creates the mixed data source $\boldsymbol{\mathcal{C}}'$. Once the underlying distribution of $\boldsymbol{\mathcal{C}}'$ is consistent with $\mathbb{P}_{\boldsymbol{\mathcal{A}}}$, it holds that

$$\boldsymbol{w}_{\boldsymbol{C}'_{2}}\sim\mathbb{P}_{\boldsymbol{\mathcal{A}}}, \tag{30}$$

where $\boldsymbol{C}'_{2}$ is the perturbed sequence from $\boldsymbol{\mathcal{C}}'$. The sampled data $\boldsymbol{C}'_{2}$ will then lead $\boldsymbol{\mathcal{R}}$ to obtain $\boldsymbol{\pi}_{\boldsymbol{\theta}'}$ as in (29). We call such an attacker the “purposeful adversary”. The $(\epsilon,\delta)$-representative sampling algorithms then fall into a trap: from the perspective of the purposeful adversary, the original “representativeness” turns into “vulnerability”. This “vulnerability” is the other side of sampling methods such as Bernoulli and reservoir sampling.
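For readers less familiar with the two samplers, the sketch below shows textbook Bernoulli and reservoir sampling and how both tend to preserve the share of adversarial elements in a mixed stream. It is only an illustration: the tagged stream `mixed_stream`, the 70% adversarial share, and the reading of the sampling parameter as an inclusion probability (Bernoulli) or a reservoir size (reservoir) are assumptions made here.

```python
import random

def bernoulli_sample(stream, rho, seed=0):
    """Keep each element independently with probability rho."""
    rng = random.Random(seed)
    return [c for c in stream if rng.random() < rho]

def reservoir_sample(stream, k, seed=0):
    """Keep a uniformly random subset of k elements in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for t, c in enumerate(stream, start=1):
        if t <= k:
            reservoir.append(c)       # fill the reservoir first
        else:
            j = rng.randrange(t)      # uniform index in {0, ..., t-1}
            if j < k:                 # happens with probability k / t
                reservoir[j] = c
    return reservoir

def adversarial_share(sample):
    """Fraction of elements tagged as adversarial."""
    return sum(1 for tag, _ in sample if tag == "adv") / len(sample)

# Hypothetical mixed stream: 70% of the elements come from the adversarial source.
mixed_stream = [("adv" if t % 10 < 7 else "orig", t) for t in range(10000)]
print(adversarial_share(bernoulli_sample(mixed_stream, rho=0.2)))
print(adversarial_share(reservoir_sample(mixed_stream, k=2000)))
```

Both printed shares should land close to the share in the full stream, which is exactly the property a purposeful adversary exploits.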

We introduce some definitions that help to establish the vulnerability results for the Bernoulli and reservoir sampling methods. The data stream from $\boldsymbol{\mathcal{C}}'$ is a mixture of two data streams from $\boldsymbol{\mathcal{C}}$ and $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$. The mixed data source $\boldsymbol{\mathcal{C}}'$ satisfies

$$\boldsymbol{\mathcal{C}}'\subseteq\boldsymbol{\mathcal{C}}\cup\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}},\quad \boldsymbol{\mathcal{C}}'\cap\boldsymbol{\mathcal{C}}\neq\varnothing,\quad \boldsymbol{\mathcal{C}}'\cap\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}\neq\varnothing. \tag{31}$$

We consider the following two types of mixtures, which correspond to different behaviors of the players in the distributionally robust game. The dynamic stream comes from a distributionally robust game whose players execute (22), and the static stream corresponds to players who execute (24).

Definition 2 (Static stream).

Let $\boldsymbol{C}=\{c_{t}\}_{t=1}^{\infty}$ be a sequence from $\boldsymbol{\mathcal{C}}'$, which is a mixture of two data streams from $\boldsymbol{\mathcal{C}}$ and $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$. If, for any $c_{t}\in\boldsymbol{\mathcal{C}}'\cap\boldsymbol{\mathcal{C}}$, the generation of $c_{t}$ is independent of $\{c_{1},\dots,c_{t-1}\}$, we call $\boldsymbol{C}$ a static stream.

Definition 3 (Dynamic stream).

Let $\boldsymbol{C}=\{c_{t}\}_{t=1}^{\infty}$ be a sequence from $\boldsymbol{\mathcal{C}}'$, which is a mixture of two data streams from $\boldsymbol{\mathcal{C}}$ and $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$. If, for any $c_{t}\in\boldsymbol{\mathcal{C}}'\cap\boldsymbol{\mathcal{C}}$ with $c_{t}\notin\boldsymbol{\mathcal{C}}'\cap\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$, the generation of $c_{t}$ depends on $\{c_{1},\dots,c_{t-1}\}$, we call $\boldsymbol{C}$ a dynamic stream.
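The difference between the two definitions is only whether the next element looks at the history. The toy sketch below illustrates that distinction and nothing more: it ignores the mixture with the adversarial source, and the candidate set `PAIRS` and both generation rules are hypothetical choices made for illustration.

```python
import random
from collections import Counter

PAIRS = [(0, 1), (0, 2), (1, 2)]  # hypothetical candidate comparisons

def static_element(history, rng):
    """Static rule: the next element ignores the history entirely."""
    return rng.choice(PAIRS)

def dynamic_element(history, rng):
    """Dynamic rule: the next element depends on the history (least-seen pair)."""
    counts = Counter(history)
    return min(PAIRS, key=lambda p: counts[p])

def generate(rule, length, seed=0):
    """Build a finite prefix of a stream by applying `rule` step by step."""
    rng = random.Random(seed)
    history = []
    for _ in range(length):
        history.append(rule(history, rng))
    return history

static_prefix = generate(static_element, 6)
dynamic_prefix = generate(dynamic_element, 6)
```

Under the dynamic rule every pair is revisited in a balanced way because each choice reacts to what was already emitted, whereas the static rule draws each element afresh.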

The concept of $\epsilon$-approximation measures the similarity between two sequences from the same data source.

  Input: the number of turns $T$, the sampling parameter $\varrho$, the true ranking list $\boldsymbol{\pi}_{0}$, the target ranking list $\boldsymbol{\pi}_{\boldsymbol{\mathcal{A}}}$, the dis-utility functions of the original and adversarial data sources $f$, $f_{\boldsymbol{\mathcal{A}}}$, the stopping time $S_{0}$, and the unweighted complete comparison graph $\boldsymbol{\mathcal{G}}=\{\boldsymbol{E},\boldsymbol{V}\}$.
  Initialization: let the stream, the sampled sequence and the comparison graph weights be empty: $\boldsymbol{C}=\varnothing,\ \boldsymbol{C}'=\varnothing,\ \boldsymbol{w}(0)=\boldsymbol{0}$.
  for $t=1$ to $T$ do
      Action of the ranker $\boldsymbol{\mathcal{R}}$: $\boldsymbol{a}\in\underset{\boldsymbol{a}}{\operatorname{arg\,min}}\ \underset{\mathbb{P}\in\boldsymbol{\mathcal{P}}(\boldsymbol{w}(t-1))}{\max}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\big[f(\boldsymbol{a},\boldsymbol{w},\boldsymbol{\pi}_{0})\big]$.
      Update the data stream $\boldsymbol{C}$: $\boldsymbol{C}\leftarrow\boldsymbol{a}$.
      Action of the sampling method $\boldsymbol{\mathcal{S}}$: $\boldsymbol{C}'\leftarrow\boldsymbol{\mathcal{S}}(\boldsymbol{C},\varrho)$.
      Update the weights: $\boldsymbol{w}(t)\leftarrow\boldsymbol{C}'$.
      Let $s=1$ and update the knowledge: $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(s)=\text{mask}(\boldsymbol{w}(t))$.
      while $s<S_{0}$ and $\boldsymbol{w}(s)\neq\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S_{0})$ do
          Action of the online manipulator $\boldsymbol{\mathcal{A}}$: $\boldsymbol{a}_{\boldsymbol{\mathcal{A}}}(s)\in\underset{\boldsymbol{a}}{\operatorname{arg\,min}}\ \underset{\mathbb{P}\in\boldsymbol{\mathcal{P}}(\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(s))}{\max}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\big[f_{\boldsymbol{\mathcal{A}}}(\boldsymbol{a},\boldsymbol{w},\boldsymbol{\pi}_{\boldsymbol{\mathcal{A}}})\big]$.
          Update the data stream $\boldsymbol{C}$: $\boldsymbol{C}\leftarrow\boldsymbol{a}_{\boldsymbol{\mathcal{A}}}(s)$.
          Action of the sampling method $\boldsymbol{\mathcal{S}}$: $\boldsymbol{C}'\leftarrow\boldsymbol{\mathcal{S}}(\boldsymbol{C},\varrho)$.
          Let $s\leftarrow s+1$ and update the knowledge: $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(s)\leftarrow\boldsymbol{C}'$.
      end while
      Update the weights: $\boldsymbol{w}(t)\leftarrow\boldsymbol{C}'$.
  end for
  Output: $\boldsymbol{\mathcal{G}}(T)=\{\boldsymbol{E},\boldsymbol{V},\boldsymbol{w}(T)\}$.
Algorithm 1 Online Interaction between $\boldsymbol{\mathcal{A}}$ and $\boldsymbol{\mathcal{R}}$

Definition 4 ($\epsilon$-approximation).

A sequence $\boldsymbol{C}_{1}$ is an $\epsilon$-approximation of a sequence $\boldsymbol{C}_{0}$ with respect to the data source $\boldsymbol{\mathcal{C}}$ if there exists an $\epsilon\in(0,1)$ such that

$$|d_{\boldsymbol{\mathcal{C}}}(\boldsymbol{C}_{0})-d_{\boldsymbol{\mathcal{C}}}(\boldsymbol{C}_{1})|\leq\epsilon, \tag{32}$$

where $\boldsymbol{\mathcal{C}}$ is a data source and $d_{\boldsymbol{\mathcal{C}}}(\boldsymbol{C})$ is the density of $\boldsymbol{\mathcal{C}}$ in the sequence $\boldsymbol{C}$, that is, the fraction of pairwise comparisons in $\boldsymbol{C}$ that also belong to $\boldsymbol{\mathcal{C}}$:

$$d_{\boldsymbol{\mathcal{C}}}(\boldsymbol{C})=\mathbb{P}\big(c\in\boldsymbol{\mathcal{C}}\ \big|\ c\in\boldsymbol{C}\big). \tag{33}$$

This definition gives us a similarity metric between two sequences in terms of the density function. It is noteworthy that the lengths of $\boldsymbol{C}_{0}$ and $\boldsymbol{C}_{1}$ could be different: $\boldsymbol{C}_{0}$ could be infinite, while $\boldsymbol{C}_{1}$ has only a finite number of elements. To characterize the data source and analyze the vulnerability of sampling methods, we adopt $(\epsilon,\delta)$-representativeness to quantify the quality of a sampling method with respect to a data source.
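The density and the $\epsilon$-approximation test can be made concrete with a small empirical sketch. It assumes finite sequences and reads (33) as an empirical fraction; the function names and the toy data are hypothetical.

```python
def density(source, sequence):
    """Empirical version of (33): share of `sequence` elements lying in `source`."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    return sum(1 for c in sequence if c in source) / len(sequence)

def is_eps_approximation(source, seq0, seq1, eps):
    """Check (32): the two densities differ by at most eps."""
    return abs(density(source, seq0) - density(source, seq1)) <= eps

# Toy data: the source contains only the comparison (0, 1).
source = {(0, 1)}
whole_stream = [(0, 1)] * 60 + [(1, 0)] * 40   # density 60 / 100
prefix_sample = whole_stream[:10]              # density 10 / 10

print(density(source, whole_stream), density(source, prefix_sample))
print(is_eps_approximation(source, whole_stream, prefix_sample, eps=0.1))
```

Taking a prefix is a deliberately poor sampling rule here: it over-represents the source, so the check fails for a small `eps`, which is the kind of failure a representative sampler is meant to rule out.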

Definition 5 ($(\epsilon,\delta)$-representativeness).

A sampling method is called $(\epsilon,\delta)$-representative if the sampled sequence $\boldsymbol{C}_{1}$ is an $\epsilon$-approximation of the whole stream $\boldsymbol{C}_{0}$ with respect to $\boldsymbol{\mathcal{C}}$ with probability at least $1-\delta$.

The following theoretical results show that, as long as the sampling parameters satisfy certain conditions, $\boldsymbol{\mathcal{S}}$ must be $(\epsilon,\delta)$-representative with respect to $\boldsymbol{\mathcal{C}}'$.

Theorem 2.

Let $\boldsymbol{\mathcal{C}}'$ be a mixture of $\boldsymbol{\mathcal{C}}$ and $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$ satisfying (31).

  i) For any static stream $\boldsymbol{C}=\{c_{t}\}_{t=1}^{\infty}$ from $\boldsymbol{\mathcal{C}}'$,

    • if the parameter of the Bernoulli sampling method satisfies

      $$\varrho\geq c\cdot\frac{\log n(n-1)+\ln(1/\delta)}{\epsilon^{2}T}, \tag{34}$$

      where $T$ is the number of sampling steps, then the output of Bernoulli sampling is $(\epsilon,\delta)$-representative with respect to $\boldsymbol{\mathcal{C}}'$;

    • if the parameter of the reservoir sampling method satisfies

      $$\varrho\geq c\cdot\frac{\log n(n-1)+\ln(1/\delta)}{\epsilon^{2}}, \tag{35}$$

      then the output of reservoir sampling is $(\epsilon,\delta)$-representative with respect to $\boldsymbol{\mathcal{C}}'$.

  ii) For any dynamic stream $\boldsymbol{C}=\{c_{t}\}_{t=1}^{\infty}$ from $\boldsymbol{\mathcal{C}}'$,

    • if the parameter of the Bernoulli sampling method satisfies

      $$\varrho\geq 10\cdot\frac{\ln|\boldsymbol{\mathcal{C}}'|+\ln(4/\delta)}{\epsilon^{2}T}, \tag{36}$$

      where $T$ is the number of sampling steps, then the output of Bernoulli sampling is $(\epsilon,\delta)$-representative with respect to $\boldsymbol{\mathcal{C}}'$;

    • if the parameter of the reservoir sampling method satisfies

      $$\varrho\geq 10\cdot\frac{\ln|\boldsymbol{\mathcal{C}}'|+\ln(2/\delta)}{\epsilon^{2}}, \tag{37}$$

      then the output of reservoir sampling is $(\epsilon,\delta)$-representative with respect to $\boldsymbol{\mathcal{C}}'$.
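To get a feeling for the magnitudes in the dynamic-stream bounds (36) and (37), the right-hand sides can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, the cap at 1 for the Bernoulli rate is our addition (the parameter is read as a probability), and the values used for $|\boldsymbol{\mathcal{C}}'|$, $\epsilon$, $\delta$ and $T$ are made up. The static bounds (34) and (35) are not evaluated because the constant $c$ is not specified here.

```python
import math

def bernoulli_rate_dynamic(source_size, eps, delta, T):
    """Right-hand side of (36), capped at 1 because it is used as a probability."""
    value = 10.0 * (math.log(source_size) + math.log(4.0 / delta)) / (eps ** 2 * T)
    return min(1.0, value)

def reservoir_size_dynamic(source_size, eps, delta):
    """Smallest integer that satisfies (37)."""
    value = 10.0 * (math.log(source_size) + math.log(2.0 / delta)) / eps ** 2
    return math.ceil(value)

# Hypothetical setting: |C'| = 90 (e.g. all ordered pairs of 10 items).
size = 90
print(reservoir_size_dynamic(size, eps=0.1, delta=0.05))
print(bernoulli_rate_dynamic(size, eps=0.1, delta=0.05, T=100_000))
```

Both thresholds grow only logarithmically in $|\boldsymbol{\mathcal{C}}'|$ and $1/\delta$ but quadratically in $1/\epsilon$, so tightening $\epsilon$ is what drives the required sample size.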

Theorem 2 indicates that the mixed data source $\boldsymbol{\mathcal{C}}'$ could be a DRNE that is favorable to $\boldsymbol{\mathcal{A}}$. When the underlying distribution of $\boldsymbol{\mathcal{C}}'$ is consistent with that of $\boldsymbol{\mathcal{C}}_{\boldsymbol{\mathcal{A}}}$, the Bernoulli and reservoir sampling methods select, with high probability, data that are consistent with $\boldsymbol{\pi}(\boldsymbol{\theta}')$. The detailed proof can be found in the supplementary materials. Now we formally define the online adversarial interaction between $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{A}}$ discussed in this paper.

  • The behavior of $\boldsymbol{\mathcal{R}}$ relies on the original data source $\boldsymbol{\mathcal{C}}$, whose distribution $\mathbb{P}_{0}$ would lead $\boldsymbol{\mathcal{R}}$ to generate $\boldsymbol{\theta}$. The pairwise comparisons from $\boldsymbol{\mathcal{C}}$ would be contrary to the attacker’s goal, as $\boldsymbol{\theta}\notin\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}$. The way that $\boldsymbol{\mathcal{C}}$ generates data can be active or passive, corresponding to dynamic and static streams, respectively. In fact, $\boldsymbol{\mathcal{C}}$ often takes a passive approach like (24), and its dis-utility $f$ is dependent on the sampler $\boldsymbol{\mathcal{S}}$. When $\boldsymbol{\mathcal{C}}$ is active, we consider that defense and protection mechanisms exist. This means that $\boldsymbol{\mathcal{C}}$ will help the ranker $\boldsymbol{\mathcal{R}}$ to take actions like (22) and maintain $\boldsymbol{\pi}_{\boldsymbol{\theta}_{0}}$ as much as possible.

  • In the proposed adversarial game, the online manipulator $\boldsymbol{\mathcal{A}}$ will try his/her best to create the data source $\boldsymbol{\mathcal{C}}'$ whose distribution $\mathbb{P}_{\boldsymbol{\mathcal{A}}}$ would induce $\boldsymbol{\mathcal{R}}$ to obtain $\boldsymbol{\theta}'$. The sequential strategy that $\boldsymbol{\mathcal{A}}$ employs along the way is dynamic. Based on the current partial information $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(t)$, $\boldsymbol{\mathcal{A}}$ would like to choose the most helpful comparisons by $f_{\boldsymbol{\mathcal{A}}}$, namely those that reduce the divergence between the potential aggregated result and the target ranking. $\boldsymbol{\mathcal{A}}$ needs to be aware of the existence of the original ranking data, which always obstructs achieving his/her goal. Therefore, $\boldsymbol{\mathcal{A}}$ needs to take the most pessimistic action as in (22).

All data sources are sampled with the same sampler $\boldsymbol{\mathcal{S}}$. The proposed online adversarial interaction is summarized in Algorithm 1. The ranker $\boldsymbol{\mathcal{R}}$ will obtain the aggregated result from the output $\boldsymbol{\mathcal{G}}(T)=\{\boldsymbol{E},\boldsymbol{V},\boldsymbol{w}(T)\}$. The remaining question in executing the manipulation is how to construct the data source $\boldsymbol{\mathcal{C}}^{\prime}$ that has $\mathbb{P}_{\boldsymbol{\mathcal{A}}}$ as its underlying distribution. We provide the details of the dynamic attack strategy, namely the adversarial pairwise comparison generation process, in the next section.

4 Adversarial Generation Process

In Section 4.1, we propose two adversarial policies for the adversary $\boldsymbol{\mathcal{A}}$ with complete knowledge and present the asymptotic optimality of these policies. Then, we provide an efficient optimization algorithm for incomplete information in Section 4.2.

4.1 Sequential Generation with Complete Knowledge

The strategies of $\boldsymbol{\mathcal{A}}$ try to maximize the consistency between the full order and his/her goal $\boldsymbol{\theta}^{\prime}\in\boldsymbol{\Theta}_{\boldsymbol{\mathcal{A}}}$ in the online adversarial game. $\boldsymbol{\mathcal{A}}$ should choose the most destructive comparisons to inject based on the current partial information and stop when the ambiguity of the ranking list falls below a certain level. The actions of the adversary consist of two components: an adaptive generation rule and a stopping time. For the adaptive rule, we adopt probabilistic rules, which contain the deterministic rules as special cases. Let $\lambda_{i,j}$ denote the probability of generating $(i,j)$ and let $\boldsymbol{\lambda}=[\lambda_{1,2},\dots,\lambda_{n,n-1}]\in\boldsymbol{\Delta}$ be the categorical distribution, where

$$\boldsymbol{\Delta}=\left\{\ \boldsymbol{\lambda}\ \Bigg|\ \sum_{(i,j)}\lambda_{i,j}=1,\ \lambda_{i,j}\geq 0\right\} \tag{38}$$

is a probability simplex over $n(n-1)$ pairs. In each turn, the distributionally robust Nash equilibrium (25) (line 7 in Algorithm 1) decides $c_{\boldsymbol{\mathcal{A}}}$ according to $\boldsymbol{\lambda}$, which depends on the goal $\boldsymbol{\theta}^{\prime}$ and the knowledge $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}$. (Footnote 2: Without loss of generality, we constrain the adversary to insert only one pairwise comparison at any step $s$ in Algorithm 1. This means that the action $\boldsymbol{a}_{\boldsymbol{\mathcal{A}}}$ will be a one-hot vector which corresponds to $c_{\boldsymbol{\mathcal{A}}}$.) The generative rules in the online adversarial game constitute the following set:

$$\boldsymbol{\Lambda}=\Big\{\boldsymbol{\lambda}^{(s)}\ \Big|\ \boldsymbol{\lambda}^{(s)}\in\boldsymbol{\Delta},\ s=1,\ 2,\dots\Big\}. \tag{39}$$
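The probabilistic rule in (38) and (39) can be illustrated with a minimal sketch, assuming items are indexed from 0 and that one step of the rule is simply a draw from the categorical distribution $\boldsymbol{\lambda}$. The helper names `ordered_pairs`, `is_on_simplex`, and `sample_comparison` are hypothetical and do not come from the paper.

```python
import random
from itertools import permutations

def ordered_pairs(n):
    """All n(n-1) ordered pairs (i, j) with i != j, items indexed from 0."""
    return list(permutations(range(n), 2))

def is_on_simplex(lam, tol=1e-9):
    """Check the two constraints of (38): non-negativity and unit sum."""
    return all(v >= 0.0 for v in lam) and abs(sum(lam) - 1.0) <= tol

def sample_comparison(lam, pairs, rng):
    """Draw one ordered pair according to the categorical distribution lam."""
    if not is_on_simplex(lam):
        raise ValueError("lam must lie on the probability simplex")
    return rng.choices(pairs, weights=lam, k=1)[0]

n = 4
pairs = ordered_pairs(n)
uniform = [1.0 / len(pairs)] * len(pairs)
# A deterministic rule is the special case of a one-hot lam.
one_hot = [1.0 if p == (2, 0) else 0.0 for p in pairs]
rng = random.Random(0)
print(sample_comparison(uniform, pairs, rng))
print(sample_comparison(one_hot, pairs, rng))  # always the pair (2, 0)
```

The one-hot case shows how a deterministic rule sits inside the probabilistic family, which is the reason the text can restrict attention to probabilistic rules.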

There is no doubt that the longer the stopping time [26], the higher the possibility of achieving the manipulation. However, the adversary cannot insert comparisons without limit. A large number of injected comparisons $\{c_{\boldsymbol{\mathcal{A}}}\}$ will alert the ranker $\boldsymbol{\mathcal{R}}$, and the adversary thus loses the opportunity to attack. Consequently, we measure the quality of sequential manipulation via the generation cost and the ranking consistency. The risk associated with the stopping time $S$ is defined as

$$\mathfrak{R}(S)=\chi\cdot S. \tag{40}$$

Here the constant $\chi>0$ indicates the relative cost of inserting one $c_{\boldsymbol{\mathcal{A}}}$ into $\boldsymbol{C}$ (line 10 in Algorithm 1). The choice of $\chi$ is associated with the difficulty of attacking the specific ranking system.

On the other hand, we adopt the Kendall-$\tau$ distance to measure the risk of inconsistency: given a full ranking list $\boldsymbol{\pi}(\boldsymbol{\Lambda},S)$ from the victim $\boldsymbol{\mathcal{R}}$ with $(\boldsymbol{\Lambda},S)$, we convert $\boldsymbol{\pi}(\boldsymbol{\Lambda},S)$ to the set of binary decisions $\boldsymbol{R}(\boldsymbol{\Lambda},S)$ over pairs

$$\boldsymbol{R}(\boldsymbol{\Lambda},S)=\Big\{r_{i,j}\in\{0,1\}\ \Big|\ i,j\in[n],\ i\neq j\Big\} \tag{41}$$

where

$$r_{i,j}=\left\{\begin{array}{cl}1,&i\succ_{\boldsymbol{\pi}(\boldsymbol{\Lambda},S)}j,\\ 0,&\text{otherwise},\end{array}\right. \tag{42}$$

and $i\succ_{\boldsymbol{\pi}}j$ means that $i$ is located before $j$ in $\boldsymbol{\pi}$. The risk of inconsistency between $\boldsymbol{\pi}(\boldsymbol{\Lambda},S)$ and the target ranking induced by $\boldsymbol{\theta}^{\prime}=[\theta^{\prime}_{1},\dots,\theta^{\prime}_{n}]$ is defined by

$$\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))=\sum_{(i,j)}\ \mathbbm{I}[\theta^{\prime}_{i}<\theta^{\prime}_{j}]\,r_{i,j}+\mathbbm{I}[\theta^{\prime}_{i}>\theta^{\prime}_{j}]\,(1-r_{i,j}). \tag{43}$$
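As an illustration of (41) to (43), the following sketch converts a ranking into binary decisions and counts the disagreements with a target score vector. It assumes that items are indexed from 0, that a ranking lists the best item first, and that the sum in (43) runs over all ordered pairs $i\neq j$ as in (41); the function names are hypothetical.

```python
from itertools import permutations

def binary_decisions(ranking):
    """Convert a full ranking (best item first) into r[(i, j)] as in (41)-(42)."""
    position = {item: k for k, item in enumerate(ranking)}
    return {(i, j): 1 if position[i] < position[j] else 0
            for i, j in permutations(ranking, 2)}

def inconsistency_risk(ranking, theta_target):
    """Evaluate (43): count ordered pairs whose decision contradicts theta_target."""
    risk = 0
    for (i, j), r_ij in binary_decisions(ranking).items():
        if theta_target[i] < theta_target[j]:
            risk += r_ij        # i is placed before j, but the target prefers j
        elif theta_target[i] > theta_target[j]:
            risk += 1 - r_ij    # j is placed before i, but the target prefers i
    return risk

theta_target = [0.4, 0.3, 0.2, 0.1]   # hypothetical target order: 0, 1, 2, 3
print(inconsistency_risk([0, 1, 2, 3], theta_target))  # 0
print(inconsistency_risk([1, 0, 2, 3], theta_target))  # 2
```

Under the ordered-pair assumption, each discordant pair contributes once as $(i,j)$ and once as $(j,i)$, so the value is twice the usual Kendall-$\tau$ count of discordant unordered pairs.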

In this paper, we consider ranking algorithms tailored to the BTL model, such as HodgeRank and RankCentrality. These two victims make their ranking decision by placing the candidates in the full ranking list according to appropriate estimates of the latent preference scores. Consequently, our proposed generation policy depends on the maximum likelihood estimation (MLE) of the BTL model. Given the ranker $\boldsymbol{\mathcal{R}}$ under attack, we analyze the combination of (40) and (43) under the Bayesian decision framework, in which the manipulated preference score of the victim is assumed to be random and follows a prior distribution $\rho_{\boldsymbol{\theta}^{\prime}}(\boldsymbol{\theta})$ specified by the adversary. The Bayesian risk associated with the victim $\boldsymbol{\mathcal{R}}$ is defined as

$$\mathfrak{R}(\boldsymbol{\Lambda},S)=\mathbb{E}\big[\mathfrak{R}(S)+\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))\big], \tag{44}$$

where the expectation $\mathbb{E}[\cdot]$ is taken with respect to the adaptive generation rule $\boldsymbol{\Lambda}$ and the stopping time $S$. The adversary $\boldsymbol{\mathcal{A}}$ hopes to execute the optimal policy $(\boldsymbol{\Lambda}^{*},S^{*})$, which leads to the minimal risk $\mathfrak{R}^{*}$:

$$\mathfrak{R}^{*}=\inf_{\boldsymbol{\Lambda},S}\ \mathfrak{R}(\boldsymbol{\Lambda},S). \tag{45}$$

For any given cost $\chi$, the value of $\mathfrak{R}^{*}$ represents the effect of manipulation: a small $\mathfrak{R}^{*}$ indicates that $\boldsymbol{\mathcal{A}}$ would be close to his/her purpose, and vice versa. However, obtaining the analytical form of $(\boldsymbol{\Lambda}^{*},S^{*})$ is typically infeasible. We therefore turn to asymptotic optimality [18], another well-known criterion for evaluating sequential decisions. A policy $(\boldsymbol{\Lambda},S)$ for $\boldsymbol{\mathcal{R}}$ is said to be asymptotically optimal if

$$\underset{\chi\rightarrow 0}{\inf}\ \frac{\mathfrak{R}(\boldsymbol{\Lambda},S)}{\mathfrak{R}^{*}}=1. \tag{46}$$

By the above definition, the asymptotically optimal policy applies when the relative cost $\chi$ converges to $0$. Although $\chi$ cannot be ignored, the relative cost is negligible compared to the huge profit from a successful manipulation. The adversary $\boldsymbol{\mathcal{A}}$ can do whatever it takes to manipulate the ranking results. Therefore, the asymptotically optimal policy is still important for $\boldsymbol{\mathcal{A}}$. Now we turn to the inner loop of Algorithm 1 (lines 8 to 13). Suppose the log-likelihood function of the BTL model with a comparison graph $\boldsymbol{\mathcal{G}}=\{\boldsymbol{V},\boldsymbol{E},\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S)\}$ is

$$L\big(\boldsymbol{\theta},\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S)\big)=\sum_{(i,j)}w_{i,j}(S)\cdot\log g_{i,j}(\boldsymbol{\theta}), \tag{47}$$

where $g_{i,j}(\boldsymbol{\theta})$ is the probability mass function of $i\succ j$ under $\boldsymbol{\theta}$, and $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S)$ represents the complete knowledge. The corresponding MLE with the adversarial goal $\rho_{\boldsymbol{\theta}^{\prime}}$ is

$$\boldsymbol{\hat{\theta}}_{S}=\underset{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}{\arg\min}\ -L\big(\boldsymbol{\theta},\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S)\big), \tag{48}$$

where $\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})$ is the support of the prior probability density function $\rho_{\boldsymbol{\theta}^{\prime}}(\boldsymbol{\theta})$.
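To make (47) and (48) concrete, here is a minimal sketch under two stated assumptions: the BTL probability takes the common form $g_{i,j}(\boldsymbol{\theta})=\theta_i/(\theta_i+\theta_j)$ with positive scores, and the support constraint $\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})$ is dropped, so the estimate is the unconstrained MLE. The solver is the classical minorization-maximization (MM) update for Bradley-Terry models, swapped in for illustration and not taken from the paper; the counts in `w` are hypothetical.

```python
import math

def btl_prob(theta, i, j):
    """Assumed BTL form of g_{i,j}: probability that i beats j."""
    return theta[i] / (theta[i] + theta[j])

def log_likelihood(theta, w):
    """Weighted log-likelihood (47); w[(i, j)] counts comparisons 'i beats j'."""
    return sum(c * math.log(btl_prob(theta, i, j)) for (i, j), c in w.items())

def btl_mle(w, n, iters=500):
    """Unconstrained BTL MLE via MM updates (the support constraint in (48) is ignored)."""
    theta = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            wins = sum(c for (a, _b), c in w.items() if a == i)
            denom = sum(c / (theta[a] + theta[b])
                        for (a, b), c in w.items() if i in (a, b))
            updated.append(wins / denom)
        total = sum(updated)
        theta = [v / total for v in updated]  # scores are identifiable only up to scale
    return theta

# Hypothetical counts for three items; every item wins at least once.
w = {(0, 1): 6, (1, 0): 2, (1, 2): 5, (2, 1): 3, (0, 2): 7, (2, 0): 1}
theta_hat = btl_mle(w, n=3)
print([round(t, 3) for t in theta_hat])
print(log_likelihood(theta_hat, w) >= log_likelihood([1 / 3] * 3, w))
```

The sketch needs every item to win at least one comparison and the comparison graph to be connected; otherwise some score collapses to zero and the update is no longer well defined.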

Stopping Time. Based on the generalized likelihood ratio statistic [22], we leverage two types of stopping time to decide the number of inserted pairwise comparisons with complete knowledge:

$$\begin{aligned} S_{1}&=\inf\left\{\ S>0\ \ \Bigg|\ \ \sum_{(i,j)}e^{-|\Delta_{i,j}L_{S}|}\leq e^{-z_{\alpha}(\chi)}\right\},\\ S_{2}&=\inf\left\{\ S>0\ \ \Bigg|\ \ \min_{(i,j)}\ |\Delta_{i,j}L_{S}|\ \geq\ z_{\alpha}(\chi)\ \right\},\end{aligned} \tag{49}$$

where $z_{\alpha}(\cdot)$ is a monotone function with $\alpha\in(0,1)$:

$$z_{\alpha}(\chi)=|\log(\chi)|\cdot\big(1+|\log(\chi)|^{-\alpha}\big). \tag{50}$$

Here $\Delta_{i,j}L_{S}$ measures the difference between $\theta_{i}\geq\theta_{j}$ and $\theta_{i}\leq\theta_{j}$ in (47):

$$\Delta_{i,j}L_{S}=\min_{\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}}\Big[-L\big(\boldsymbol{\theta},\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S)\big)\Big]-\min_{\boldsymbol{\theta}\in\boldsymbol{\Theta}_{j,i}}\Big[-L\big(\boldsymbol{\theta},\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}(S)\big)\Big], \tag{51}$$

where

$$\boldsymbol{\Theta}_{i,j}=\big\{\ \boldsymbol{\theta}\in\mathbb{R}^{n}_{+}\ |\ \theta_{i}\geq\theta_{j}\ \big\}\cap\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}}). \tag{52}$$

Generally speaking, the proposed criteria (49) stop the generation process once the likelihood (47) can decide whether $\theta_{i}\geq\theta_{j}$ or vice versa.
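The two criteria in (49) and the threshold (50) can be sketched as follows. The sketch assumes that the values $|\Delta_{i,j}L_{S}|$ have already been computed (the two constrained minimizations in (51) are not implemented here) and that $\chi\in(0,1)$, so that $\log\chi\neq 0$. The function names and the numbers in the example are hypothetical.

```python
import math

def z_alpha(chi, alpha):
    """Threshold (50); chi is the relative cost and alpha lies in (0, 1)."""
    a = abs(math.log(chi))
    return a * (1.0 + a ** (-alpha))

def stop_by_sum(abs_deltas, chi, alpha):
    """Condition inside S_1 in (49): sum of exp(-|Delta|) is at most exp(-z)."""
    return sum(math.exp(-d) for d in abs_deltas) <= math.exp(-z_alpha(chi, alpha))

def stop_by_min(abs_deltas, chi, alpha):
    """Condition inside S_2 in (49): the smallest |Delta| is at least z."""
    return min(abs_deltas) >= z_alpha(chi, alpha)

chi, alpha = 0.01, 0.5
abs_deltas = [7.0, 7.5, 9.0]          # hypothetical |Delta_{i,j} L_S| values
print(round(z_alpha(chi, alpha), 3))  # 6.751
print(stop_by_min(abs_deltas, chi, alpha), stop_by_sum(abs_deltas, chi, alpha))
```

In this example the minimum-based condition holds while the sum-based condition does not. This is expected: if the sum of the terms $e^{-|\Delta_{i,j}L_{S}|}$ is at most $e^{-z_{\alpha}(\chi)}$, then every single term is as well, so the condition of $S_{1}$ implies that of $S_{2}$ and $S_{1}\geq S_{2}$.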

Generation Rule. Next we discuss the probabilistic generation rule for sequential manipulation. Inspired by the existing sequential design for rank aggregation [16], selecting the desired $\boldsymbol{\lambda}^{(S)}$ amounts to maximizing the consistency between $\boldsymbol{\hat{\theta}}_{S}$ and the goal $\boldsymbol{\theta}^{\prime}$. Such consistency can be measured by the minimum of the mutual information between $g_{i,j}(\boldsymbol{\hat{\theta}}_{S})$ and any other $g_{i,j}(\boldsymbol{\tilde{\theta}})$ when $(i,j)$ is generated according to $\boldsymbol{\lambda}^{(S)}$:

$$\begin{aligned} \underset{\boldsymbol{\tilde{\theta}}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}{\min}\ \ &\sum_{(i,j)}\lambda^{(S)}_{i,j}\cdot g_{i,j}(\boldsymbol{\hat{\theta}}_{S})\cdot\log\frac{g_{i,j}(\boldsymbol{\hat{\theta}}_{S})}{g_{i,j}(\boldsymbol{\tilde{\theta}})},\\ \text{subject to}\ \ &\boldsymbol{\pi}(\boldsymbol{\hat{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}}).\end{aligned} \tag{53}$$

It is noteworthy that (53) also minimizes the drift of the log-likelihood ratio statistics between two distributions of pairwise comparisons specified by $\boldsymbol{\hat{\theta}}_{S}$ and $\boldsymbol{\tilde{\theta}}$ under the BTL model and the probabilistic generation $\boldsymbol{\lambda}^{(S)}$. The smaller the minimum value of (53) corresponding to the given $\boldsymbol{\lambda}^{(S)}$, the higher the consistency between $\boldsymbol{\hat{\theta}}_{S}$ and the goal $\boldsymbol{\theta}^{\prime}$. Then we select a generation rule $\boldsymbol{\lambda}^{(S)}$ to maximize the consistency measured by (53):

$$\begin{aligned} \underset{\boldsymbol{\lambda}\in\boldsymbol{\Delta}}{\max}\ \underset{\boldsymbol{\tilde{\theta}}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}{\min}\ \ &\sum_{(i,j)}\lambda_{i,j}\cdot g_{i,j}(\boldsymbol{\hat{\theta}}_{S})\cdot\log\frac{g_{i,j}(\boldsymbol{\hat{\theta}}_{S})}{g_{i,j}(\boldsymbol{\tilde{\theta}})},\\ \text{subject to}\ \ &\boldsymbol{\pi}(\boldsymbol{\hat{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}}).\end{aligned} \tag{54}$$

We discuss the detailed optimization approach for solving (54) in the following part. With the balance between exploration and exploitation in the generation procedure controlled by (49) and (54), we provide the asymptotic optimality guarantee of the proposed policy, given by (49) and (54), in the supplementary materials.
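The objective of (53) and the inner minimum of (54) can be evaluated numerically in a simplified setting. The sketch below assumes the BTL form $g_{i,j}(\boldsymbol{\theta})=\theta_i/(\theta_i+\theta_j)$ and replaces the minimization over $\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})$ with a minimum over a small, hand-picked list of alternatives; it only compares two candidate rules and does not solve the outer maximization. All names and numbers are hypothetical.

```python
import math
from itertools import permutations

def btl_prob(theta, i, j):
    """Assumed BTL form of g_{i,j}: probability that i beats j."""
    return theta[i] / (theta[i] + theta[j])

def ranking(theta):
    """Items sorted by decreasing score."""
    return tuple(sorted(range(len(theta)), key=lambda i: -theta[i]))

def inner_value(lam, theta_hat, theta_alt):
    """Objective of (53) for one alternative; lam maps ordered pairs to probabilities."""
    total = 0.0
    for (i, j), l_ij in lam.items():
        g_hat = btl_prob(theta_hat, i, j)
        total += l_ij * g_hat * math.log(g_hat / btl_prob(theta_alt, i, j))
    return total

def worst_case_value(lam, theta_hat, alternatives):
    """Inner minimum of (54) over a finite list of alternatives with a different ranking."""
    feasible = [t for t in alternatives if ranking(t) != ranking(theta_hat)]
    return min(inner_value(lam, theta_hat, t) for t in feasible)

theta_hat = [0.5, 0.3, 0.2]
alternatives = [[0.3, 0.5, 0.2], [0.5, 0.3, 0.4], [0.2, 0.3, 0.5]]
pairs = list(permutations(range(3), 2))
lam_uniform = {p: 1.0 / len(pairs) for p in pairs}
# A rule that only ever compares items 0 and 1.
lam_focus = {p: (0.5 if set(p) == {0, 1} else 0.0) for p in pairs}
print(worst_case_value(lam_uniform, theta_hat, alternatives))
print(worst_case_value(lam_focus, theta_hat, alternatives))
```

The alternative `[0.5, 0.3, 0.4]` swaps items 1 and 2 but leaves the comparison between items 0 and 1 unchanged, so a rule that only compares items 0 and 1 cannot distinguish it from `theta_hat` and its worst-case value drops to zero. The uniform rule keeps a strictly positive worst-case value on this list, which is the kind of behavior the outer maximization in (54) rewards.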

4.2 Robust Optimization with Incomplete Knowledge

With the adversarial policy (49) and (54) under complete knowledge, the adversary $\boldsymbol{\mathcal{A}}$ can insert pairwise comparisons to manipulate the rank aggregation results in a sequential way. However, the complete knowledge assumption may not be realistic in actual confrontation scenarios. To dissect the vulnerability as much as possible, we develop a distributionally robust formulation against the uncertainty of knowledge.

Notice that the log-likelihood function $L$ in (47) is a scale-free function w.r.t. the weights of a comparison graph. The MLE (48) is invariant when we map $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}$ onto a probability simplex and replace the discrete variable $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}$ with a continuous variable $\boldsymbol{p}=[p_{1,2},\dots,p_{n,n-1}]\in\mathbb{R}^{n(n-1)}_{+}$ (footnote 3: we omit the index of the stopping time $S$ when the context is clear):

$$\boldsymbol{p}=\frac{1}{M}\cdot\boldsymbol{w}_{\boldsymbol{\mathcal{A}}},\quad \boldsymbol{p}^{\top}\boldsymbol{1}=1, \tag{55}$$

where $\boldsymbol{1}$ is the all-ones vector of dimension $n(n-1)$, matching $\boldsymbol{p}$, and $M$ is the total number of pairwise comparisons observed by $\boldsymbol{\mathcal{A}}$:

$$M=\sum_{(i,j)}w_{i,j}. \tag{56}$$
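The normalization in (55) and (56) can be checked with a short sketch. It assumes the BTL form $\theta_i/(\theta_i+\theta_j)$ for $g_{i,j}$ and uses hypothetical counts. Since (47) is linear in the weights, $L(\boldsymbol{\theta},\boldsymbol{w})=M\cdot L(\boldsymbol{\theta},\boldsymbol{p})$, and the ordering of any two candidate score vectors, and hence the maximizer, is unchanged.

```python
import math

def normalize_weights(w):
    """Map counts w[(i, j)] to p = w / M as in (55), with M from (56)."""
    M = sum(w.values())
    if M <= 0:
        raise ValueError("at least one comparison is required")
    return {pair: c / M for pair, c in w.items()}, M

def log_likelihood(theta, weights):
    """Weighted log-likelihood (47) under the assumed BTL form theta_i / (theta_i + theta_j)."""
    return sum(c * math.log(theta[i] / (theta[i] + theta[j]))
               for (i, j), c in weights.items())

# Hypothetical comparison counts for three items.
w = {(0, 1): 6, (1, 0): 2, (1, 2): 5, (2, 1): 3, (0, 2): 7, (2, 0): 1}
p, M = normalize_weights(w)
theta_a, theta_b = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
print(M, round(sum(p.values()), 12))
# Both comparisons agree because L(theta, w) = M * L(theta, p).
print(log_likelihood(theta_a, w) > log_likelihood(theta_b, w),
      log_likelihood(theta_a, p) > log_likelihood(theta_b, p))
```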

In fact, $\boldsymbol{p}$ is drawn from a distribution $\mathbb{P}$:

$$\mathbb{P}=\frac{1}{n(n-1)}\ \sum_{(i,j)}\ \delta(p_{i,j}), \tag{57}$$

where $\delta(p_{i,j})$ is the Dirac measure concentrated at $p_{i,j}$. Moreover, we can characterize the difference between $\boldsymbol{w}_{\boldsymbol{\mathcal{S}}}(t)$ and $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}$ (line 7 in Algorithm 1) by the distance between distributions. Such a treatment introduces an uncertainty set of $\mathbb{P}$, which contains the probability distributions around $\mathbb{P}$:

$$\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})=\left\{\ \mathbb{Q}\ \Big|\ \mathcal{W}_{1}(\mathbb{P},\ \mathbb{Q})\leq\gamma\ \right\}, \tag{58}$$

where $\mathcal{W}_{1}(\cdot,\cdot)$ is the $1$-Wasserstein distance [24], used as the discrepancy measure. The definition of the $1$-Wasserstein distance and its related properties can be found in the supplementary materials. With the help of $\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})$, we execute a conservative strategy to estimate the parameter of the BTL model with incomplete knowledge. Instead of $\boldsymbol{p}$, we choose another random variable $\boldsymbol{q}$ as the weight in (47). The distribution of $\boldsymbol{q}$ belongs to $\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})$, and $\boldsymbol{q}$ attains the worst expected value of $L$. Such a conservative strategy can alleviate the uncertainty generated by incomplete knowledge in the sequential decisions of the manipulation policy. Then the relative ranking score with incomplete knowledge is estimated by solving the following distributionally robust optimization (DRO) problem:

$$\max_{\boldsymbol{\theta}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})}\ \sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\left[L(\boldsymbol{\theta},\boldsymbol{q})\right], \tag{59}$$

where $L(\boldsymbol{\theta},\boldsymbol{q})$ replaces the incomplete knowledge $\boldsymbol{w}_{\boldsymbol{\mathcal{A}}}$ with the random variable $\boldsymbol{q}\sim\mathbb{Q}$ in (47). The supremum w.r.t. $\mathbb{Q}$ means that the estimation of the latent preference score is based on the worst expected value of $L$ over the set of distributions $\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})$.
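To make the uncertainty set (58) concrete, the following minimal sketch checks whether a candidate empirical distribution lies within radius $\gamma$ of a reference one. It assumes both distributions are one-dimensional, place equal mass on the same number of atoms, and use the absolute difference as the ground metric; under those assumptions the $1$-Wasserstein distance reduces to the mean absolute difference of the sorted atoms. The function names are illustrative and not part of the paper.

```python
def wasserstein1_equal_weight(p_atoms, q_atoms):
    """1-Wasserstein distance between two equal-weight empirical measures on the real line.

    Assumption: both measures have the same number of atoms, each with mass 1/n.
    In that case the optimal coupling matches sorted atoms to sorted atoms.
    """
    if len(p_atoms) == 0 or len(p_atoms) != len(q_atoms):
        raise ValueError("this sketch needs two non-empty lists of equal length")
    p_sorted = sorted(float(x) for x in p_atoms)
    q_sorted = sorted(float(x) for x in q_atoms)
    return sum(abs(a - b) for a, b in zip(p_sorted, q_sorted)) / len(p_sorted)


def in_uncertainty_set(p_atoms, q_atoms, gamma):
    """Check the membership condition W_1(P, Q) <= gamma from (58)."""
    return wasserstein1_equal_weight(p_atoms, q_atoms) <= gamma
```

A larger radius admits distributions that deviate more from the observed weights, which corresponds to a more conservative estimate.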

Next, we specify the formulation of $\textbf{Supp}(\rho_{\boldsymbol{\theta}'})$. Without loss of generality, we assume that the estimated and the desired scores belong to a probability simplex. Given the desired relative ranking score $\boldsymbol{\theta}'$, we hope that the estimation from (59) lies in a neighborhood of $\boldsymbol{\theta}'$; namely, the distance $d:\mathbb{R}^{n}\times\mathbb{R}^{n}\rightarrow\mathbb{R}_{+}$ between the estimation from (59) and $\boldsymbol{\theta}_{\mathcal{A}}$ should be sufficiently small:

$$\textbf{Supp}(\rho_{\boldsymbol{\theta}'})=\left\{\ \boldsymbol{\theta}\in\mathbb{R}^{n}_{+}\ \Big|\ \|\boldsymbol{\theta}-\boldsymbol{\theta}'\|_{2}^{2}\leq\beta,\ \ \boldsymbol{\theta}^{\top}\boldsymbol{1}=1\ \right\}. \tag{60}$$
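The three conditions in (60) (non-negativity, unit sum, and a squared Euclidean ball of radius $\beta$ around $\boldsymbol{\theta}'$) can be checked directly. The sketch below is only an illustration; the function name and the numerical tolerance `tol` are assumptions introduced here.

```python
def in_support(theta, theta_target, beta, tol=1e-9):
    """Check whether theta satisfies the constraints of the set in (60).

    theta, theta_target: sequences of scores of equal length.
    beta: radius bound on the squared Euclidean distance.
    tol: hypothetical numerical tolerance, not part of the definition.
    """
    if len(theta) != len(theta_target):
        raise ValueError("theta and theta_target must have the same length")
    nonnegative = all(t >= -tol for t in theta)
    sums_to_one = abs(sum(theta) - 1.0) <= tol
    squared_dist = sum((a - b) ** 2 for a, b in zip(theta, theta_target))
    return nonnegative and sums_to_one and squared_dist <= beta + tol
```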
Theorem 3.

Suppose that $\boldsymbol{p}$ is drawn from the empirical distribution $\mathbb{P}$ (57) and $\boldsymbol{q}$ is drawn from $\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})$ (58). If the distance between $p_{i,j}$ and $q_{i,j}$ is chosen as

$$d(p_{ij},q_{ij})=\big|\ p_{ij}-q_{ij}\ \big|. \tag{61}$$

Then, the DRO problem (59) has an equivalent form:

$$\max_{\boldsymbol{\theta}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})}\ h(\boldsymbol{\theta}), \tag{62}$$

where

$$h(\boldsymbol{\theta})=\sqrt{\gamma}\sum_{(i,j)}\log g_{i,j}(\boldsymbol{\theta})+\sum_{(i,j)}p_{i,j}\log g_{i,j}(\boldsymbol{\theta}). \tag{63}$$

Moreover, if the comparison model is the BTL model, we have

$$\begin{aligned} h(\boldsymbol{\theta}) ={}& \sqrt{\gamma}\cdot\sum_{(i,j)}\log\big(1+\exp(\theta_{j}-\theta_{i})\big)\\ &+\sum_{(i,j)}p_{i,j}\,\log\big(1+\exp(\theta_{j}-\theta_{i})\big). \end{aligned} \tag{64}$$
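As an illustration of how the surrogate objective in (63) could be evaluated, the sketch below sums $(\sqrt{\gamma}+p_{i,j})\log g_{i,j}(\boldsymbol{\theta})$ over the observed pairs. The link `btl_prob` is the standard BTL form $1/(1+\exp(\theta_{j}-\theta_{i}))$ and is an assumption here; the definition and sign convention of $g_{i,j}$ given earlier in the paper take precedence. Representing $\boldsymbol{p}$ as a dictionary keyed by pairs is also a choice made only for this example.

```python
import math


def btl_prob(theta, i, j):
    """Assumed standard BTL link: probability that candidate i beats candidate j."""
    return 1.0 / (1.0 + math.exp(theta[j] - theta[i]))


def robust_objective(theta, p, gamma, g=btl_prob):
    """Evaluate h(theta) in the form of (63).

    theta: list of scores.
    p: dict mapping a pair (i, j) to its observed weight p_ij.
    gamma: radius of the uncertainty set; gamma = 0 gives the plain weighted log-likelihood.
    g: comparison model, called as g(theta, i, j) (hypothetical signature).
    """
    root_gamma = math.sqrt(gamma)
    total = 0.0
    for (i, j), p_ij in p.items():
        total += (root_gamma + p_ij) * math.log(g(theta, i, j))
    return total
```

In this form the radius $\gamma$ acts as a uniform additive weight $\sqrt{\gamma}$ on every pair, so pairs with little or no observed weight still influence the estimate.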

From the above theoretical results, we construct the adversarial policy under incomplete knowledge. The stopping time (49) becomes

$$\begin{aligned} S'_{1} &= \inf\left\{\ S>0\ \ \Bigg|\ \ \sum_{(i,j)}e^{-|\Delta'_{i,j}L_{S}|}\leq e^{-z_{\alpha}(\chi)}\right\}\\ S'_{2} &= \inf\left\{\ S>0\ \ \Bigg|\ \ \min_{(i,j)}\ |\Delta'_{i,j}L_{S}|\ \geq\ z_{\alpha}(\chi)\ \right\}, \end{aligned} \tag{65}$$

where

$$\begin{aligned} \Delta'_{i,j}L_{S} ={}& \max_{\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}}\ \sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\left[L(\boldsymbol{\theta},\boldsymbol{q})\right] - \max_{\boldsymbol{\theta}\in\boldsymbol{\Theta}_{j,i}}\ \sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\left[L(\boldsymbol{\theta},\boldsymbol{q})\right]\\ ={}& \max_{\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}}h(\boldsymbol{\theta})-\max_{\boldsymbol{\theta}\in\boldsymbol{\Theta}_{j,i}}h(\boldsymbol{\theta}). \end{aligned} \tag{66}$$
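The two stopping conditions in (65) can be sketched as simple predicates on the statistics $\Delta'_{i,j}L_{S}$. In the example below these statistics are taken as given numbers, and the threshold $z_{\alpha}(\chi)$ is passed in as a plain value `z`; computing either of them requires the optimization in (66) and the earlier definitions, which this sketch does not attempt. All function names are hypothetical.

```python
import math


def stop_rule_sum(delta_stats, z):
    """Condition of S'_1: sum over pairs of exp(-|Delta|) <= exp(-z)."""
    return sum(math.exp(-abs(d)) for d in delta_stats) <= math.exp(-z)


def stop_rule_min(delta_stats, z):
    """Condition of S'_2: min over pairs of |Delta| >= z."""
    return min(abs(d) for d in delta_stats) >= z


def first_stopping_time(stat_sequence, z, rule):
    """Return the smallest S (counted from 1) at which the rule holds, or None.

    stat_sequence[S - 1] is assumed to hold the statistics after S generated comparisons.
    """
    for s, stats in enumerate(stat_sequence, start=1):
        if rule(stats, z):
            return s
    return None
```

Because every term $e^{-|\Delta'_{i,j}L_{S}|}$ is bounded by the sum, the first condition implies the second, so under this sketch $S'_{1}$ never stops earlier than $S'_{2}$.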

Now we discuss the generation rule with the robust estimation by (62). Let $\boldsymbol{\bar{\theta}}_{S}$ be a solution of (62) with stopping time (65). We obtain the generation rule with incomplete knowledge by replacing $\boldsymbol{\hat{\theta}}_{S}$ with $\boldsymbol{\bar{\theta}}_{S}$ in (54):

$$\begin{aligned} \max_{\boldsymbol{\lambda}\in\boldsymbol{\Delta}}\ \min_{\boldsymbol{\tilde{\theta}}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})}\quad & \sum_{(i,j)}\lambda_{i,j}\cdot g_{i,j}(\boldsymbol{\bar{\theta}}_{S})\cdot\log\frac{g_{i,j}(\boldsymbol{\bar{\theta}}_{S})}{g_{i,j}(\boldsymbol{\tilde{\theta}})},\\ \text{subject to}\quad & \boldsymbol{\pi}(\boldsymbol{\bar{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}}). \end{aligned} \tag{67}$$

Consider the inner problem

$$\begin{aligned} \min_{\boldsymbol{\tilde{\theta}}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})}\quad & \sum_{(i,j)}\lambda_{i,j}\cdot g_{i,j}(\boldsymbol{\bar{\theta}}_{S})\cdot\log\frac{g_{i,j}(\boldsymbol{\bar{\theta}}_{S})}{g_{i,j}(\boldsymbol{\tilde{\theta}})},\\ \text{subject to}\quad & \boldsymbol{\pi}(\boldsymbol{\bar{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}}), \end{aligned} \tag{68}$$

we know that the objective function is smooth and convex w.r.t. $\boldsymbol{\tilde{\theta}}$ for the BTL model $g_{i,j}(\cdot)$. Moreover, the feasible set

$$\left\{\boldsymbol{\tilde{\theta}}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})\ \Big|\ \boldsymbol{\pi}(\boldsymbol{\bar{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}})\right\} \tag{69}$$

can be rewritten as a union of convex sets, each of which contains at most $2n$ linear equalities and inequalities such as

$$\left\{\boldsymbol{\tilde{\theta}}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})\ \Big|\ \tilde{\theta}_{i}<\bar{\theta}_{\pi_{\bar{\theta}}(i'-1)},\ \tilde{\theta}_{i}=\bar{\theta}_{\pi_{\bar{\theta}}(i')}\right\}, \tag{70}$$

where $\boldsymbol{\tilde{\theta}}=[\tilde{\theta}_{1},\dots,\tilde{\theta}_{n}]$, and $\bar{\theta}_{\pi_{\bar{\theta}}(i')}$ indicates the preference score of the candidate $\pi_{\bar{\theta}}(i)$ whose position in $\boldsymbol{\pi}_{\boldsymbol{\bar{\theta}}}$ is $i'$. Therefore, the inner problem is a convex problem that can be solved efficiently by standard numerical solvers. Then we analyze the outer problem:

$$\min_{\boldsymbol{\lambda}\in\boldsymbol{\Delta}}\ F(\boldsymbol{\lambda}),\qquad F(\boldsymbol{\lambda})=\max_{\substack{\boldsymbol{\tilde{\theta}}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'})\\ \boldsymbol{\pi}(\boldsymbol{\bar{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}})}}\phi(\boldsymbol{\lambda},\boldsymbol{\tilde{\theta}}) \tag{71}$$

where

$$\phi(\boldsymbol{\lambda},\boldsymbol{\tilde{\theta}})=-\sum_{(i,j)}\lambda_{i,j}\cdot g_{i,j}(\boldsymbol{\bar{\theta}}_{S})\cdot\log\frac{g_{i,j}(\boldsymbol{\bar{\theta}}_{S})}{g_{i,j}(\boldsymbol{\tilde{\theta}})}. \tag{72}$$

It is noteworthy that $\phi(\boldsymbol{\lambda},\boldsymbol{\tilde{\theta}})$ is a continuous and bounded function. Furthermore, $\phi(\boldsymbol{\lambda},\boldsymbol{\tilde{\theta}})$ is convex w.r.t. $\boldsymbol{\lambda}$ for any $\boldsymbol{\tilde{\theta}}$, and $\textbf{Supp}(\rho_{\boldsymbol{\theta}'})$ is a convex set. By Danskin's theorem [9], $F(\boldsymbol{\lambda})$ is a convex function w.r.t. $\boldsymbol{\lambda}$, and the min-max optimization problem (67) can be solved efficiently using the mirror descent algorithm [7]. The corresponding solution process is summarized as Algorithm 2. We elaborate on the steps of Algorithm 2 in the supplementary materials.
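A rough sketch of the outer minimization of $F(\boldsymbol{\lambda})$ by entropic mirror descent (exponentiated gradient on the simplex) is given below. It makes two simplifying assumptions that are not in the paper: the inner maximization over $\boldsymbol{\tilde{\theta}}$ is replaced by a search over a finite, user-supplied list of candidate score vectors, and $g_{i,j}$ is taken to be the standard BTL link. The step size, the iteration count, and all function names are hypothetical. Since $\phi$ is linear in $\boldsymbol{\lambda}$, a subgradient of $F$ at $\boldsymbol{\lambda}$ is the gradient of $\phi$ at the maximizing candidate.

```python
import math


def btl_prob(theta, i, j):
    """Assumed standard BTL link: probability that candidate i beats candidate j."""
    return 1.0 / (1.0 + math.exp(theta[j] - theta[i]))


def divergence_terms(theta_bar, theta_tilde, pairs):
    """Per-pair terms g(theta_bar) * log(g(theta_bar) / g(theta_tilde)) from (72)."""
    terms = []
    for i, j in pairs:
        g_bar = btl_prob(theta_bar, i, j)
        g_tilde = btl_prob(theta_tilde, i, j)
        terms.append(g_bar * math.log(g_bar / g_tilde))
    return terms


def mirror_descent(theta_bar, candidates, pairs, steps=200, eta=0.5):
    """Approximately minimize F(lam) = max over candidates of phi(lam, candidate).

    Returns the average iterate, which lies on the probability simplex over `pairs`.
    """
    m = len(pairs)
    all_terms = [divergence_terms(theta_bar, c, pairs) for c in candidates]
    lam = [1.0 / m] * m
    average = [0.0] * m
    for _ in range(steps):
        # phi(lam, candidate) = -<lam, terms>; pick the candidate attaining the maximum.
        worst = max(all_terms, key=lambda t: -sum(l * x for l, x in zip(lam, t)))
        subgrad = [-x for x in worst]
        # Multiplicative update followed by renormalization onto the simplex.
        weights = [l * math.exp(-eta * gk) for l, gk in zip(lam, subgrad)]
        norm = sum(weights)
        lam = [w / norm for w in weights]
        average = [a + l / steps for a, l in zip(average, lam)]
    return average
```

With a single candidate, the iterates concentrate on the pair whose term in (72) is largest, that is, the comparison that best separates $\boldsymbol{\bar{\theta}}_{S}$ from that alternative.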

Input: the probability mass function $g$, the incomplete knowledge $\boldsymbol{p}$, the uncertainty radius $\gamma$.
1. Obtain the robust estimation based on the partial observation by solving (62): $\boldsymbol{\theta}=\textbf{RobustEstimation}(g,\boldsymbol{p},\gamma)$.
2. Solve the generation rule $\boldsymbol{\lambda}$ via the min-max problem (67): $\boldsymbol{\lambda}=\textbf{MirrorDescent}(g,\boldsymbol{\theta})$.
3. Select pairwise comparison $c$ according to the categorical distribution $\boldsymbol{\lambda}$.
Output: a pairwise comparison $c$.
Algorithm 2 Adversarial Generation
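The final step of Algorithm 2, drawing a pairwise comparison from the categorical distribution $\boldsymbol{\lambda}$, can be sketched with the Python standard library as follows. The list-of-pairs representation and the function name are assumptions made for illustration only.

```python
import random


def select_comparison(pairs, lam, rng=None):
    """Draw one pair from `pairs` with probabilities proportional to `lam`.

    pairs: list of (i, j) tuples, one per entry of lam.
    lam: non-negative weights with a positive sum.
    rng: optional random.Random instance for reproducibility.
    """
    if len(pairs) != len(lam):
        raise ValueError("pairs and lam must have the same length")
    if any(w < 0 for w in lam) or sum(lam) <= 0:
        raise ValueError("lam must be non-negative with a positive sum")
    rng = rng or random.Random()
    return rng.choices(pairs, weights=lam, k=1)[0]
```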

5 Experiments

In this section, we present three examples with both simulated and real-world data to illustrate the validity of the proposed online attack strategy against the Bernoulli method for rank aggregation approaches such as HodgeRank [29] and RankCentrality [41]. The first example uses simulated data, while the latter two exploit real-world datasets involving election and crowdsourcing.

5.1 General Setting

We treat the online manipulation against rank aggregation as the interaction between the original and the adversarial data sources, as in Algorithm 1. In each turn of this game, the original and adversarial data sources generate pairwise comparisons separately. The generation process of the original data source is always a black box for the adversary, and the action of the original data source (line 3 of Algorithm 1) is replaced by a random procedure. Based on the analysis in Sec. 3.3, the sampling method can be $(\epsilon,\delta)$-representative w.r.t. the mixed data source $\boldsymbol{\mathcal{C}'}$ when the sampling parameters are sufficiently large. In most cases, the sampling methods for rank aggregation satisfy these conditions. Even if the sampling methods reject the samples generated by the adversary, he/she can still try repeatedly until such samples affect the final aggregation result. As a consequence, we do not consider the effects of sampling methods (lines 5 and 11 of Algorithm 1) in the experimental studies. It is noteworthy that the attacker only has incomplete knowledge in the adversarial game; that is, he/she only observes partial weights of the comparison graph (line 7 of Algorithm 1). The action of the attacker is the adversarial generation process with the target ranking list $\boldsymbol{\pi}'$, which is discussed in Sec. 4 and summarized as Algorithm 2. In each turn, the adversarial generation process inserts $S_{0}$ pairwise comparisons into the mixed data stream, where $S_{0}$ is the stopping time.
We establish the asymptotic optimality of the proposed stopping times $S_{1},S_{2}$ (49) under the complete knowledge condition. With incomplete knowledge, we set the stopping time $S_{0}$ empirically. At the end of the adversarial game, we finish the collection of pairwise comparisons and obtain the weighted comparison graph $\boldsymbol{\mathcal{G}}^{(T)}$ (the output of Algorithm 1). Then the rank aggregation methods leverage $\boldsymbol{\mathcal{G}}^{(T)}$ to create the ranking list $\boldsymbol{\pi}''$. We evaluate the similarity between $\boldsymbol{\pi}'$ and $\boldsymbol{\pi}''$. The more similar the two orders are, the more successful the manipulation is.

[Figure 2 panels: (a) R. Rank of HodgeRank; (b) K. $\tau$ of HodgeRank; (c) R. Rank of RankCentrality; (d) K. $\tau$ of RankCentrality]
Figure 2: Comparative results of different sequential manipulation methods against HodgeRank and RankCentrality on simulated data. The box plot illustrates the results of $50$ trials with different data sequences which will make HodgeRank and RankCentrality generate $\boldsymbol{\pi}_{0}=(10,9,8,7,6,5,4,3,2,1)$. The target list of the adversary is $\boldsymbol{\pi}'=(8,9,10,7,5,6,4,3,2,1)$. The proposed method provides a stable manipulation in the form of sequential action. All metrics of the proposed method will be $1$ with rare outliers. Meanwhile the three competitors fail to manipulate HodgeRank and RankCentrality with sequential actions. The ‘Greedy’ perturbation only focuses on the top-$1$ candidate but can’t guarantee the designation of a winner. The result of ‘Straightforward’ strategy is inferior to the proposed method when the number of actions is the same.
[Figure 3 panels: (a) R. Rank of HodgeRank; (b) K. $\tau$ of HodgeRank; (c) R. Rank of RankCentrality; (d) K. $\tau$ of RankCentrality]
Figure 3: Change of evaluation metrics on simulated data for different sequential manipulation methods against HodgeRank and RankCentrality. The horizontal axis lists the turns of game. When the interaction proceeds, the proposed method is able to generate malicious pairwise comparisons with incomplete knowledge and manipulate the victim, whose aggregated results are consistent with the attacker’s target.

5.2 Evaluation Metrics

Here we adopt the reciprocal rank and the Kendall $\tau$ coefficient to evaluate the correlation between the target ranking list $\boldsymbol{\pi}'$ and the final aggregated result $\boldsymbol{\pi}''$. These measurements fall into two categories. The reciprocal rank metric reflects whether the first candidates of the two ranking lists are the same. The Kendall $\tau$ coefficient considers the consistency of every pairwise comparison in the two ranking lists.

Reciprocal Rank (R. Rank). The reciprocal rank is a statistical measure for evaluating any process that produces an ordered list of possible responses to a series of queries, ordered by the probability of correctness or by the ranking scores. The reciprocal rank of a ranking list is the multiplicative inverse of the leading object’s position in the new ordered list. The R. Rank between $\boldsymbol{\pi}'$ and $\boldsymbol{\pi}''$ is defined as

$$\textbf{R. Rank}(\boldsymbol{\pi}',\boldsymbol{\pi}'')=\frac{1}{\ (\boldsymbol{\pi}'')^{-1}\big[\boldsymbol{\pi}'(1)\big]\ }, \tag{73}$$

where $\boldsymbol{\pi}(i)$ refers to the index of the item in the $i$-th position of the ranking list $\boldsymbol{\pi}$, and $\boldsymbol{\pi}^{-1}[i]$ indicates the position of item $i$ in the ranking list $\boldsymbol{\pi}$. If $\boldsymbol{\pi}'(1)=\boldsymbol{\pi}''(1)$ holds, $\textbf{R. Rank}(\boldsymbol{\pi}',\boldsymbol{\pi}'')$ achieves its maximum value $1$. A larger reciprocal rank value indicates a better manipulation result.
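The definition in (73) translates into a short function. The sketch below assumes that each ranking list is given as a Python list of item identifiers ordered from the first position to the last; the function name is illustrative.

```python
def reciprocal_rank(target, aggregated):
    """Reciprocal rank as in (73): inverse of the position, in `aggregated`,
    of the item that leads `target`. Positions are counted from 1."""
    leader = target[0]
    position = aggregated.index(leader) + 1
    return 1.0 / position
```

For instance, with the target list $(8,9,10,7,5,6,4,3,2,1)$ and the unmanipulated list $(10,9,8,7,6,5,4,3,2,1)$ used in the simulated study, item $8$ sits in the third position of the latter, so the reciprocal rank is $1/3$.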

Kendall $\boldsymbol{\tau}$ Coefficient (K. $\boldsymbol{\tau}$). The Kendall rank correlation coefficient evaluates the degree of similarity between two ranking lists over the same objects. This coefficient depends on the number of inversions of pairs of objects that would be needed to transform one rank order into the other. The Kendall $\tau$ coefficient is defined as the normalization of (43). A larger Kendall $\tau$ value indicates a better purposeful attack result. If $d_{\tau}(\boldsymbol{\pi}',\boldsymbol{\pi}'')=1$, we have $\boldsymbol{\pi}'=\boldsymbol{\pi}''$.
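As a hedged illustration, the sketch below computes the textbook Kendall $\tau$ coefficient, that is, the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. Whether this matches the paper's normalization of (43) exactly is an assumption, since (43) is defined elsewhere; the sketch does share the property that the value $1$ is reached only when the two lists coincide.

```python
from itertools import combinations


def kendall_tau(list_a, list_b):
    """Textbook Kendall tau between two rankings of the same items (no ties).

    Each ranking is a list of item identifiers ordered from first to last.
    """
    if len(list_a) < 2 or sorted(list_a) != sorted(list_b):
        raise ValueError("rankings must contain the same items, at least two of them")
    pos_a = {item: rank for rank, item in enumerate(list_a)}
    pos_b = {item: rank for rank, item in enumerate(list_b)}
    concordant = 0
    discordant = 0
    for x, y in combinations(list_a, 2):
        # The pair is concordant when both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```

For the two lists of the simulated study, $(10,9,8,7,6,5,4,3,2,1)$ and $(8,9,10,7,5,6,4,3,2,1)$, four of the $45$ pairs are inverted, which gives $(41-4)/45=37/45$ under this convention.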

(a) HodgeRank
(b) RankCentrality
Figure 4: The victims' aggregation results under different manipulation methods on simulated data. The original ranking list is $\boldsymbol{\pi}_0=[10,9,8,7,6,5,4,3,2,1]$ and the target one is $\boldsymbol{\pi}'=[8,9,10,7,5,6,4,3,2,1]$ (refer to Target in the figure). The dark dots represent the candidates with large IDs and vice versa. If a candidate is not in the same position in both ranking lists, there will be an intersection, and the width of the line represents the degree of inconsistent influence on the aggregation results. We mark the inconsistent dots with red circles. When there are multiple intersections and the key locations (top-$3$) are marked by red circles, the attacker has failed to achieve his/her goal. (a) The victim is HodgeRank. The proposed method accomplishes the manipulation of the complete ranking list (no intersection or red circle). (b) The victim is RankCentrality.

5.3 Competitors

To the best of our knowledge, the proposed method is the first attempt at online manipulation strategies against pairwise ranking algorithms. We compare the proposed methods with the following three competitors: the random strategy (referred to as 'Random'), the greedy strategy (referred to as 'Greedy') and the straightforward strategy (referred to as 'Straight').

Random perturbation involves a random data source which generates every pairwise comparison with the same probability. This method does not rely on any information about the desired ranking $\boldsymbol{\pi}'$. We conjecture that the purposeless behavior of the random perturbation will not sculpt the desired results out of the mixed data stream $\boldsymbol{C}'$. However, this strategy still serves as evidence that a sophisticated attacker is necessary for online manipulation against rank aggregation methods.

Greedy manipulation generates the mixed data stream $\boldsymbol{C}'$ in a greedy way with the help of the target ranking list $\boldsymbol{\pi}'$. Specifically, this method only inserts $\boldsymbol{\pi}'(1)\succ\boldsymbol{\pi}'(j),\ j=2,\dots,n$, each with the same probability. The greedy manipulation can designate the leading position of the aggregated list. Compared with the proposed method, the greedy method lacks the ability to manipulate the full ranking list.

Straightforward strategy implements the adversarial data source with the so-called "Matthew Principle": increasing the number of pairwise comparisons that are consistent with the desired ranking list. There exist $n(n-1)/2$ pairwise comparisons which are consistent with $\boldsymbol{\pi}'$, and they have an equal chance of being inserted into the mixed data stream $\boldsymbol{C}'$. Obviously, this strategy can achieve the goal given sufficiently many turns in the adversarial game between the two data sources. Compared with the proposed method, the straightforward method wastes some opportunities and needs more actions to achieve the goal.
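The three competitors can be sketched as simple samplers. This is a minimal illustration based only on the descriptions above, not the authors' implementation; the function names and the `(winner, loser)` tuple encoding of a comparison are assumptions.

```python
import random


def random_strategy(n, rng):
    """'Random': any ordered pair (winner, loser) with equal probability."""
    winner, loser = rng.sample(range(1, n + 1), 2)
    return winner, loser


def greedy_strategy(target, rng):
    """'Greedy': only pi'(1) beats pi'(j), j = 2..n, chosen uniformly."""
    return target[0], rng.choice(target[1:])


def straight_strategy(target, rng):
    """'Straight': uniform over the n(n-1)/2 pairs consistent with pi'."""
    i, j = sorted(rng.sample(range(len(target)), 2))
    return target[i], target[j]  # target[i] is ranked above target[j]


rng = random.Random(0)
pi_target = [8, 9, 10, 7, 5, 6, 4, 3, 2, 1]
# One hypothetical turn in which 'Straight' inserts five comparisons.
batch = [straight_strategy(pi_target, rng) for _ in range(5)]
```

Note that none of the three samplers looks at the data already observed, which is the inefficiency the comparison with the proposed method is meant to expose.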

5.4 Simulated Study

Description. We validate the proposed sequential manipulation strategies against HodgeRank and RankCentrality on simulated data. We generate the data stream as follows. First, we build a complete graph $\boldsymbol{\mathcal{G}}=(\boldsymbol{V},\boldsymbol{E})$ where $\boldsymbol{V}=[n]$. Then a latent preference score is assigned to each candidate (vertex) of $\boldsymbol{V}$, and the true ranking is obtained from these scores. We set $n=10$, and the true ranking is $\boldsymbol{\pi}_0=(10,9,8,7,6,5,4,3,2,1)$. Next, we randomly sample $744$ pairwise comparisons from $\boldsymbol{V}\times\boldsymbol{V}$ based on the preference scores. Notice that the samples may contain comparisons which are inconsistent with the true ranking. We regard these samples as coming from the original data source (line 3 of Algorithm 1) and construct $50$ sequences with different orders. Each sequence serves as a trial for the adversary. The goal of the adversary is to make HodgeRank and RankCentrality produce $\boldsymbol{\pi}'=(8,9,10,7,5,6,4,3,2,1)$. The number of turns in the adversarial game ($T=75$ in Algorithm 1) is $10$ percent of the length of the complete sequence. In each turn, the sample from the original data source has an $80\%$ chance of being observed by the adversary (line $7$ of Algorithm 1). If his/her knowledge is not updated, the attacker takes no action and waits for another sample from the original data source. Moreover, the attacker can insert $S_0=5$ pairwise comparisons to construct the comparison graph in each turn (line $8$ of Algorithm 1).
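The data generation described above can be sketched as follows. Several details are assumptions made only for illustration: the comparisons are drawn from a Bradley-Terry-Luce (logistic) model, the pair to compare is chosen uniformly, the score values are hypothetical, and $T$ is obtained by rounding up $10\%$ of the sequence length (which reproduces $T=75$).

```python
import math
import random


def sample_btl_comparisons(scores, m, rng):
    """Draw m comparisons (winner, loser) from an assumed BTL model.

    scores maps item -> latent score; the pair is chosen uniformly and
    P(i beats j) = 1 / (1 + exp(-(s_i - s_j))).
    """
    items = list(scores)
    data = []
    for _ in range(m):
        i, j = rng.sample(items, 2)
        p_i_wins = 1.0 / (1.0 + math.exp(-(scores[i] - scores[j])))
        data.append((i, j) if rng.random() < p_i_wins else (j, i))
    return data


rng = random.Random(0)
n = 10
# Hypothetical scores: a larger ID gets a larger score, so the true
# ranking is (10, 9, ..., 1).
scores = {item: 0.5 * item for item in range(1, n + 1)}
original = sample_btl_comparisons(scores, 744, rng)

# 50 trials: the same comparisons presented in different orders.
trials = []
for _ in range(50):
    seq = original[:]
    rng.shuffle(seq)
    trials.append(seq)

T = math.ceil(0.10 * len(original))  # number of turns in the game
```

Because the outcome of each comparison is random, some sampled comparisons contradict the true ranking, as noted in the description.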

(a) Random vs. HodgeRank
(b) Straight vs. HodgeRank
(c) Greedy vs. HodgeRank
(d) Ours vs. HodgeRank
(e) Random vs. RankCentrality
(f) Straight vs. RankCentrality
(g) Greedy vs. RankCentrality
(h) Ours vs. RankCentrality
Figure 5: Data distribution generated by different methods on simulated data. The vertical axis lists the number of rounds in the adversarial games and the horizontal axis displays all possible pairwise comparisons. For the same victim, all results are based on the same observed data. The original ranking list is $\boldsymbol{\pi}_0=[10,9,8,7,6,5,4,3,2,1]$ and the target one is $\boldsymbol{\pi}'=[8,9,10,7,5,6,4,3,2,1]$.

Comparative Results. The results of each method against HodgeRank on simulated data are reported in Fig. 2 (a)-(b). Our method exhibits a higher success rate due to the parsimonious mechanism and consistently outperforms all the competitors by a significant margin. Concerning the performance of the three competitors, we observe the following:

  • Due to the existence of non-modifiable data, the blind attack strategies cannot even interfere with the aggregation results of the victim with limited actions. The random perturbation ('Random') fails to boost the position of candidate $8$ in the aggregated results. Although the mean Kendall $\tau$ coefficient over $50$ trials is $0.77$, the degree of consistency between Random and $\boldsymbol{\pi}_0$ remains higher than that between Random and $\boldsymbol{\pi}'$. These arguments are supported by the visualization of the ranking lists in Figure 4.

  • The greedy manipulation ('Greedy') can have an impact on the winner of the final ranking lists. However, this method fails to consistently manipulate the aggregation results within the specified number of actions. The interquartile range of the reciprocal rank is large and the median is $0.5$ when the adversary executes Greedy against HodgeRank. As all the insertions of Greedy are consistent with the target list, the Kendall $\tau$ coefficient of Greedy is higher than that of Random. This phenomenon does not imply that Greedy has complete control over the victim's result, as its generation mechanism only promotes the desired winner over the other candidates.

  • In principle, the straightforward strategy ('Straight') has the potential to manipulate the complete aggregation results of the victim. The efficiency of the method is a concern as it ignores the existing data. We suspect that the method will only work under the conditions of the so-called "flooding attack", where $T$ and $S_0$ are sufficiently large.

We report the results of each method against RankCentrality on simulated data in Fig. 2 (c)-(d). The best performance and the median of our method consistently surpass all the competitors. The Kendall $\tau$ coefficient of the proposed method is not $1$. We speculate that this phenomenon comes from the challenge of controlling the spectral structure of the comparison graph. However, the proposed method is the only one that is able to designate the winner of the aggregation results. Furthermore, we show the specific behavior of the proposed method in every turn of the adversarial game in Fig. 3. Despite the existence of unknown data, all metrics of the proposed method grow and eventually remain stable. This phenomenon implies that the proposed method can reach an equilibrium state which favors the adversary even if the details of the victims are not involved. The ranking lists of different methods are shown in Fig. 4. We illustrate every pair of manipulation result and target ranking list. All results are based on the same observed data sequence. The proposed method can designate the winner of the aggregation result ($8$ is the top-$1$ candidate in our results) and keep a high correlation with the target ranking list (there exist only two intersections in our result).

TABLE I: Numeric results of different attack methods on crowdsourcing data. The best results are highlighted with bold text.

| Methods | HodgeRank | HodgeRank | HodgeRank | HodgeRank | HodgeRank | HodgeRank | RankCentrality | RankCentrality | RankCentrality | RankCentrality | RankCentrality | RankCentrality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Target top-4 | $(20,29,8,13)$ | $(20,29,8,13)$ | $(8,29,20,13)$ | $(8,29,20,13)$ | $(13,29,20,8)$ | $(13,29,20,8)$ | $(20,29,8,13)$ | $(20,29,8,13)$ | $(8,29,20,13)$ | $(8,29,20,13)$ | $(13,29,20,8)$ | $(13,29,20,8)$ |
| Metric | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ |
| Random | 0.50 | 0.37 | 0.50 | 0.35 | 0.20 | 0.25 | 0.33 | 0.39 | 0.33 | 0.33 | 0.25 | 0.14 |
| Straight | 0.50 | 0.64 | 0.50 | 0.59 | 0.25 | 0.52 | 0.50 | 0.44 | 0.33 | 0.35 | 0.25 | 0.23 |
| Greedy | 1.00 | 0.52 | 1.00 | 0.49 | 0.50 | 0.38 | 1.00 | 0.55 | 0.33 | 0.55 | 0.25 | 0.24 |
| Ours | 1.00 | 0.55 | 1.00 | 0.54 | 1.00 | 0.44 | 1.00 | 0.67 | 1.00 | 0.55 | 1.00 | 0.36 |
TABLE II: Numeric results of different attack methods on Dublin election data. The best results are highlighted with bold text.

| Methods | HodgeRank | HodgeRank | HodgeRank | HodgeRank | HodgeRank | HodgeRank | RankCentrality | RankCentrality | RankCentrality | RankCentrality | RankCentrality | RankCentrality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Target top-4 | $(2,4,13,5)$ | $(2,4,13,5)$ | $(13,4,2,5)$ | $(13,4,2,5)$ | $(5,4,2,13)$ | $(5,4,2,13)$ | $(2,4,13,5)$ | $(2,4,13,5)$ | $(13,4,2,5)$ | $(13,4,2,5)$ | $(5,4,2,13)$ | $(5,4,2,13)$ |
| Metric | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ | R. Rank | K. $\tau$ |
| Random | 0.50 | 0.93 | 0.50 | 0.91 | 0.25 | 0.58 | 0.50 | 0.93 | 0.33 | 0.91 | 0.25 | 0.58 |
| Straight | 0.50 | 0.97 | 0.50 | 0.93 | 0.25 | 0.58 | 0.50 | 0.93 | 0.33 | 0.91 | 0.25 | 0.58 |
| Greedy | 1.00 | 0.98 | 1.00 | 0.96 | 0.50 | 0.87 | 1.00 | 0.93 | 0.50 | 0.91 | 0.33 | 0.49 |
| Ours | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

The data distributions generated by different methods are illustrated in Fig. 5. With the help of the data distribution, we can better understand why the proposed method achieves sequential manipulation despite the interference of the original data source. The victim in the first row is HodgeRank and the victim in the second row is RankCentrality. The vertical axis lists $5$ representative turns in the adversarial games, and the horizontal axis lists all possible pairwise comparisons. We index the comparisons as follows: No. $(i-1)\ast 9+1$ to No. $i\ast 9$ are the comparisons $\{(i,j)\ |\ j\in[10],\ j\neq i\}$. The proposed method manipulates efficiently and with a specific purpose. We observe that the bins which represent the desired winner defeating the other candidates are higher after the attack procedure, especially No. $72$ ($8\succ 10$) and No. $71$ ($8\succ 9$). Such behavior ensures that the aggregation results of the victims are resistant to the original data source. The 'Greedy' method also increases the counts of No. $64$ to No. $72$. However, this is not sufficient to guarantee that 'Greedy' can manipulate the victims' ranking lists. The 'Straight' method disperses its power and fails to promote the position of candidate $8$. The 'Random' method uniformly generates all kinds of pairwise comparisons, but this does not help to achieve the goal.
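The indexing of the horizontal axis can be made concrete with a small helper. This is a sketch with a hypothetical function name; it assumes that, within each block of nine, the opponents $j$ are listed in increasing order, which agrees with the two examples given above (No. $71$ and No. $72$).

```python
def comparison_index(i, j, n=10):
    """1-based index of 'i beats j' when pairs are listed by i, then by j."""
    if i == j or not (1 <= i <= n and 1 <= j <= n):
        raise ValueError("i and j must be distinct items in 1..n")
    offset = j if j < i else j - 1  # skip the missing (i, i) slot
    return (i - 1) * (n - 1) + offset


# Bins in which the desired winner, candidate 8, defeats another candidate.
winner_bins = [comparison_index(8, j) for j in range(1, 11) if j != 8]
```

With $n=10$, `winner_bins` covers exactly the range No. $64$ to No. $72$ discussed above.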

5.5 Crowdsourcing

Description. $30$ images from the human age dataset FGNET (https://yanweifu.github.io/FG_NET_data/) are annotated by a group of volunteer users on a crowdsourcing platform (http://www.chinacrowds.com/). The ground-truth age ranking is known to us. The annotator is presented with two images and given a binary choice of which one is older. In total, we obtain $8,017$ pairwise comparisons from $94$ annotators. The top-$4$ candidates of the true ranking are $(29,20,8,13)$. The goals of the adversary are to make HodgeRank and RankCentrality produce $(20,29,8,13)$, $(8,29,20,13)$ and $(13,29,20,8)$ as the top-$4$ candidates. The rest of the ranking list remains unchanged. The number of turns in the adversarial game is $5\%$ of the length of the complete sequence. In each turn, the sample from the original data source has a $90\%$ chance of being observed by the adversary. If his/her knowledge is not updated, the attacker takes no action and waits for another sample from the original data source. Moreover, the attacker can insert $S_0=10$ pairwise comparisons to construct the comparison graph in each turn.
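The turn structure described above can be sketched as a simple loop. This is one possible reading of the prose, not a transcription of Algorithm 1: it assumes each turn consumes exactly one original sample, and the names `run_game` and `constant_attacker` as well as the placeholder data are hypothetical.

```python
import random


def run_game(original, attacker, num_turns, p_observe, s0, rng):
    """Interleave original samples with attacker insertions.

    attacker(observed, s0) returns fabricated (winner, loser) pairs;
    at most s0 of them are inserted per turn.
    """
    mixed, observed = [], []
    for sample in original[:num_turns]:
        mixed.append(sample)        # original data is never modified
        if rng.random() >= p_observe:
            continue                # not observed: the attacker waits
        observed.append(sample)     # the attacker's knowledge is updated
        mixed.extend(attacker(observed, s0)[:s0])
    return mixed, observed


def constant_attacker(observed, s0):
    # Hypothetical attacker: always claims candidate 20 beats candidate 29.
    return [(20, 29)] * s0


rng = random.Random(0)
original = [(29, 20)] * 100         # placeholder original stream
T = 5                               # 5% of the 100 placeholder samples
mixed, observed = run_game(original, constant_attacker, T, 0.9, 10, rng)
```

Any strategy, including the baselines of Section 5.3, can be plugged in through the `attacker` callback, which keeps the observation model separate from the generation rule.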

Comparative Results. It is worth mentioning that this real-world data has a high percentage of outliers (about $20\%$ of all comparisons conflict with the correct age ranking). The proposed methods still achieve promising manipulation results against HodgeRank and RankCentrality, as shown in Table I. It is more challenging to change $(29,20,8,13)$ to $(8,29,20,13)$ than to $(20,29,8,13)$. Consequently, the values of the Kendall $\tau$ coefficient decrease as the difficulty of the manipulation increases.

5.6 Election

Description. The Dublin election data set (http://www.preflib.org/data/election/irish/) contains a complete record of votes for elections held in county Meath, Dublin, Ireland in 2002. This set contains $64,081$ votes over $14$ candidates. Each vote can be a complete or partial list of the candidate set. The ground-truth ranking of the $14$ candidates is based on the first preference votes they obtained (https://electionsireland.org/result.cfm?election=2002&cons=178&sort=first). The five candidates who receive the most first preference votes are the winners of the election. The top-$4$ of the ground-truth ranking is $\boldsymbol{\pi}_0=(4,2,13,5)$. These votes are then converted into pairwise comparisons. The total number of comparisons is $652,817$. The goals of the adversary are to make HodgeRank and RankCentrality produce $(2,4,13,5)$, $(13,4,2,5)$ and $(5,4,2,13)$ as the top-$4$ candidates. The number of turns in the adversarial game is $1\%$ of the length of the complete sequence. In each turn, the sample from the original data source has an $80\%$ chance of being observed by the adversary. If his/her knowledge is not updated, the attacker takes no action and waits for another sample from the original data source. Moreover, the attacker can insert $S_0=5$ pairwise comparisons to construct the comparison graph in each turn.
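The conversion of ballots into pairwise comparisons is not detailed here. A common convention, assumed in the sketch below, is that every candidate listed earlier on a ballot beats every candidate listed later on the same ballot, while unlisted candidates generate no comparisons. The function name and the toy ballots are hypothetical.

```python
from itertools import combinations


def ballot_to_comparisons(ballot):
    """Each earlier-listed candidate beats each later-listed one.

    Candidates missing from a partial ballot are ignored (assumption).
    """
    # combinations() preserves the ballot order, so a precedes b.
    return list(combinations(ballot, 2))


ballots = [[4, 2, 13], [2, 4], [13]]   # toy partial ballots
comparisons = [pair for b in ballots for pair in ballot_to_comparisons(b)]
```

Under this convention a ballot that lists $k$ candidates contributes $k(k-1)/2$ comparisons, so a single-candidate ballot contributes none.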

Comparative Results. It is worth noting that the election result is not obtained by pairwise ranking aggregation. However, the ordered list aggregated from the induced comparisons still shows a positive correlation with the actual election result. Once the attackers generate a successful manipulation strategy against the ballot collection process, this attack plan could be adopted to manipulate the election in the real world. Consequently, the proposed sequential strategy is still able to manipulate the election results. The aggregation results of HodgeRank and RankCentrality are manipulated by the proposed method, as shown in Table II.

6 Conclusion

In this paper, we establish, to the best of our knowledge, the first study of sequential manipulation in the context of rank aggregation with pairwise comparisons. We find that the data collection process is the Achilles' heel of rank aggregation. The sequential attack problem is formulated as a distributionally robust game between two players, the online manipulator and the ranker who possesses the original data source. Furthermore, we introduce the sampling algorithms to analyze the properties of the underlying distributionally robust Nash equilibrium. Like the two sides of a coin, we prove that the representation ability of sampling methods can turn into a vulnerability when the mixed data source supports the goal of an adversary. With the help of Bayesian decision theory, we develop the manipulation policy with complete knowledge, which achieves asymptotic optimality. Then a distributionally robust generation rule is proposed to resist the uncertainty of the observed sequence. Our empirical studies show that the proposed sequential manipulation methods can achieve the attacker's goal in the sense that the leading candidate of the aggregated ranking list is the one designated by the adversary.

References

  • [1] Arpit Agarwal, Shivani Agarwal, Sanjeev Khanna, and Prathamesh Patil. Rank aggregation from pairwise comparisons in the presence of adversarial corruptions. In International Conference on Machine Learning, pages 85–95, 2020.
  • [2] K.J. Arrow and E.S. Maskin. Social Choice and Individual Values: Third Edition. Yale University Press, 2012.
  • [3] Kazuoki Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19(3):357–367, 1967.
  • [4] Marcus A Badgeley, Stuart C Sealfon, and Maria D Chikina. Hybrid bayesian-rank integration approach improves the predictive power of genomic dataset aggregation. Bioinformatics, 31(2):209–215, 2015.
  • [5] Bernd Bank, Jürgen Guddat, Diethard Klatte, Bernd Kummer, and Klaus Tammer. Non-linear Parametric Optimization. Springer, 1982.
  • [6] John Bartholdi, Craig A Tovey, and Michael A Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6:157–165, 1989.
  • [7] Amir Beck and Marc Teboulle. Mirror descent and non-linear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
  • [8] Omri Ben-Eliezer and Eylon Yogev. The adversarial robustness of sampling. In ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 49–62, 2020.
  • [9] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
  • [10] Jose Blanchet and Karthyek Murthy. Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600, 2019.
  • [11] Heejong Bong and Alessandro Rinaldo. Generalized results for the existence and consistency of the MLE in the bradley-terry-luce model. In International Conference on Machine Learning, pages 2160–2177, 2022.
  • [12] Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3):324–345, 1952.
  • [13] Krishnendu Chatterjee, Rupak Majumdar, and Marcin Jurdziński. On nash equilibria in stochastic games. In International Workshop on Computer Science Logic, pages 26–40, 2004.
  • [14] Pinhan Chen, Chao Gao, and Anderson Y. Zhang. Optimal full ranking from pairwise comparisons. The Annals of Statistics, 50(3):1775 – 1805, 2022.
  • [15] Pinhan Chen, Chao Gao, and Anderson Y. Zhang. Partial Recovery for Top-k Ranking: Optimality of MLE and Sub-optimality of the Spectral Method. The Annals of Statistics, 50(3):1618 – 1652, 2022.
  • [16] Xi Chen, Yunxiao Chen, and Xiaoou Li. Asymptotically optimal sequential design for rank aggregation. Mathematics of Operations Research, 47(3):2310–2332, 2022.
  • [17] Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. Spectral method and regularized mle are both optimal for top-k ranking. Annals of statistics, 47(4):2204, 2019.
  • [18] Herman Chernoff. Sequential design of experiments. The Annals of Mathematical Statistics, 30(3):755–770, 1959.
  • [19] Fan Chung and Linyuan Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79 – 127, 2006.
  • [20] Prithviraj Dasgupta and Joseph B. Collins. A survey of game theoretic approaches for adversarial machine learning in cybersecurity tasks. AI Mag., 40(2):31–43, 2019.
  • [21] John C. Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the $\ell_1$-ball for learning in high dimensions. In International Conference on Machine Learning, pages 272–279, 2008.
  • [22] Jianqing Fan, Chunming Zhang, and Jian Zhang. Generalized likelihood ratio statistics and wilks phenomenon. The Annals of Statistics, 29(1):153–193, 2001.
  • [23] David A. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100–118, 1975.
  • [24] Andrew Frohmader and Hans Volkmer. 1-wasserstein distance on the standard simplex. Algebraic Statistics, 12(1):43–56, 2021.
  • [25] Rui Gao and Anton Kleywegt. Distributionally robust stochastic optimization with wasserstein distance. Mathematics of Operations Research, 49(2):1–59, 2023.
  • [26] Alexander Goldenshluger and Assaf Zeevi. Optimal stopping of a random sequence with unknown distribution. Mathematics of Operations Research, 47(1):29–49, 2022.
  • [27] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms I: Fundamentals, volume 305. Springer, 2013.
  • [28] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
  • [29] Xiaoye Jiang, Lek-Heng Lim, Yuan Yao, and Yinyu Ye. Statistical ranking and combinatorial hodge theory. Mathematical Programming, 127(1):203–244, 2011.
  • [30] Shizuo Kakutani. A generalization of brouwer’s fixed point theorem. Duke Mathematical Journal, 8(3):457–459, 1941.
  • [31] James P Keener. The perron–frobenius theorem and the ranking of football teams. SIAM Review, 35(1):80–93, 1993.
  • [32] Gilad Lerman and Yunpeng Shi. Robust group synchronization via cycle-edge message passing. Foundations of Computational Mathematics, 22:1665–1741, 2022.
  • [33] Wanshan Li, Shamindra Shrotriya, and Alessandro Rinaldo. $\ell_{\infty}$-bounds of the mle in the btl model under general comparison graphs. In International Conference on Uncertainty in Artificial Intelligence, pages 1178–1187, 2022.
  • [34] Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3):516–527, 2001.
  • [35] Yongchao Liu, Huifu Xu, Shu-Jung Sunny Yang, and Jin Zhang. Distributionally robust equilibrium for continuous games: Nash and stackelberg models. European Journal of Operational Research, 265(2):631–643, 2018.
  • [36] Yue Liu, Ethan X. Fang, and Junwei Lu. Lagrangian inference for ranking problems. Operations Research, 71(1):202–223, 2023.
  • [37] Ke Ma, Qianqian Xu, Jinshan Zeng, Xiaochun Cao, and Qingming Huang. Poisoning attack against estimating from pairwise comparisons. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6393–6408, 2022.
  • [38] Ke Ma, Qianqian Xu, Jinshan Zeng, Guorong Li, Xiaochun Cao, and Qingming Huang. A tale of hodgerank and spectral method: Target attack against rank aggregation is the fixed point of adversarial game. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–18, 2022.
  • [39] Colin McDiarmid. Concentration, pages 195–248. Springer Berlin Heidelberg, 1998.
  • [40] Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris, pages 666–704, 1781.
  • [41] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Rank centrality: Ranking from pairwise comparisons. Operations Research, 65(1):266–287, 2017.
  • [42] Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
  • [43] Filip Radlinski and Thorsten Joachims. Active exploration for learning rankings from click-through data. In ACM International Conference on Knowledge Discovery and Data Mining, pages 570–579, 2007.
  • [44] J. B. Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica, 33(3):520–534, 1965.
  • [45] Donald G Saari. The mathematics of voting: Democratic symmetry. Economist, 83, 2000.
  • [46] Anders Skrondal and Sophia Rabe-Hesketh. Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68:267–287, 2003.
  • [47] M. Talagrand. Sharper Bounds for Gaussian and Empirical Processes. The Annals of Probability, 22(1):28–76, 1994.
  • [48] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
  • [49] Cédric Villani. Optimal Transport: Old and New. Springer, 2008.
  • [50] Jingyan Wang, Nihar B. Shah, and R. Ravi. Stretching the effectiveness of MLE from accuracy to bias for pairwise comparisons. In International Conference on Artificial Intelligence and Statistics, pages 66–76, 2020.
  • [51] Xingxing Wei, Ying Guo, and Jie Yu. Adversarial sticker: A stealthy attack method in the physical world. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):2711–2725, 2023.
  • [52] Xingxing Wei, Ying Guo, Jie Yu, and Bo Zhang. Simultaneously optimizing perturbations and positions for black-box adversarial patch attacks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7):9041–9054, 2023.
  • [53] Xingxing Wei, Songping Wang, and Huanqian Yan. Efficient robustness assessment via adversarial spatial-temporal focus on videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10898–10912, 2023.
[Uncaptioned image] Ke Ma is an associate professor with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences (UCAS), Bei**g, China. He received the B.S. degree in mathematics from Tian** University in 2009, M.E. degree in software engineering from Beihang University (BUAA) in 2013, and the Ph.D. degree in computer science from the Key Laboratory of Information Security (SKLOIS), Institute of Information Engineering (IIE), Chinese Academy of Sciences (CAS), in 2019. His research interests include rank aggregation and algorithmic game theory.
[Uncaptioned image] Qianqian Xu received the B.S. degree in computer science from China University of Mining and Technology in 2007 and the Ph.D. degree in computer science from University of Chinese Academy of Sciences in 2013. She is currently a Professor with the Institute of Computing Technology, Chinese Academy of Sciences, Bei**g, China. Her research interests include statistical machine learning, with applications in multimedia and computer vision. She has authored or coauthored 70+ academic papers in prestigious international journals and conferences (including T-PAMI, IJCV, T-IP, NeurIPS, ICML, CVPR, AAAI, etc). Moreover, she serves as an associate editor of IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, and ACM Transactions on Multimedia Computing, Communications, and Applications.
[Uncaptioned image] **shan Zeng received the Ph.D. degree in applied mathematics from Xi’an Jiaotong University, Xi’an, China, in 2015. He is currently a professor with the School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China. His research interests include non-convex optimization and machine learning.
[Uncaptioned image] Wei Liu is currently a Distinguished Scientist of Tencent and the Director of Ads Multimedia AI at Tencent Data Platform. Prior to that, he has been a research staff member of IBM T. J. Watson Research Center, USA. Dr. Liu has long been devoted to fundamental research and technology development in core fields of AI, including deep learning, machine learning, computer vision, pattern recognition, information retrieval, big data, etc. To date, he has published extensively in these fields with more than 270 peer-reviewed technical papers, and also issued 23 US patents. He currently serves on the editorial boards of IEEE TPAMI, TNNLS, IEEE Intelligent Systems, and Transactions on Machine Learning Research. He is an Area Chair of top-tier computer science and AI conferences, e.g., NeurIPS, ICML, IEEE CVPR, IEEE ICCV, IJCAI, and AAAI. Dr. Liu is a Fellow of the IEEE, IAPR, AAIA, IMA, RSA, and BCS, and an Elected Member of the ISI.
[Uncaptioned image] Xiaochun Cao is a Professor of School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University. He received the B.E. and M.E. degrees both in computer science from Beihang University (BUAA), China, and the Ph.D. degree in computer science from the University of Central Florida, USA, with his dissertation nominated for the university level Outstanding Dissertation Award. After graduation, he spent about three years at ObjectVideo Inc. as a Research Scientist. From 2008 to 2012, he was a professor at Tian** University. Before joining SYSU, he was a professor at Institute of Information Engineering, Chinese Academy of Sciences. He has authored and coauthored over 200 journal and conference papers. In 2004 and 2010, he was the recipients of the Piero Zamperoni best student paper award at the International Conference on Pattern Recognition. He is on the editorial boards of IEEE Trans. on Image Processing and IEEE Trans. on Multimedia, and was on the editorial board of IEEE Trans. on Circuits and Systems for Video Technology.
Yingfei Sun received the Ph.D. degree in applied mathematics from the Beijing Institute of Technology in 1999. He is currently a Full Professor with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences. His current research interests include machine learning and pattern recognition.
Qingming Huang is a chair professor at the University of Chinese Academy of Sciences and an adjunct research professor at the Institute of Computing Technology, Chinese Academy of Sciences. He graduated with a bachelor's degree in Computer Science in 1988 and a Ph.D. degree in Computer Engineering in 1994, both from Harbin Institute of Technology, China. His research areas include multimedia computing, image processing, computer vision and pattern recognition. He has authored or coauthored more than 400 academic papers in prestigious international journals and top-level international conferences. He was an associate editor of IEEE Trans. on CSVT and Acta Automatica Sinica, and a reviewer for various international journals including IEEE Trans. on PAMI, IEEE Trans. on Image Processing, IEEE Trans. on Multimedia, etc. He is a Fellow of IEEE and has served as general chair, program chair, track chair and TPC member for various conferences, including ACM Multimedia, CVPR, ICCV, ICME, ICMR, PCM, BigMM, PSIVT, etc.

The first part reviews two ranking algorithms tailored to the BTL model, namely HodgeRank [29] and the spectral ranking algorithm (RankCentrality) [41].

The second part is the proof details of Theorem 1, which states the existence of a distributionally robust Nash equilibrium. This result tells us that there exists at least one stable state for both ranker and attacker. To prove the existence of the distributionally robust Nash equilibrium, we need a proposition from [35], which shows that the distributionally robust Nash equilibrium is a global minimizer of the reformulation of (22). This reformulation is also well known for the deterministic Nash equilibrium problem [44]. Based on this proposition, we show the existence results of DRNE for the adversarial game. This result is an extension of the famous Kakutani’s fixed point theorem [30].

The third part is the proof details of Theorem 2. We need some important lemmas. Lemma 1 is a martingale concentration inequality that handles the case where the maximum value $M$ of $|x_{t+1}-x_{t}|$ is large but rarely attained (making the variance much smaller than $M^{2}$) [23, 39, 19]. The following two lemmas assert that, for any given subset $\boldsymbol{S}_{\mathcal{A}}$ of the universe $\boldsymbol{\mathcal{C}}$, the fraction of elements from $\boldsymbol{S}_{\mathcal{A}}$ within the sample typically does not differ by much from the corresponding fraction in the whole stream. Lemma 2 covers Bernoulli sampling, and Lemma 3 covers reservoir sampling. Theorem 2 shows the vulnerability of the Bernoulli and reservoir sampling methods: when the sampling parameters of these two methods fulfill the stated conditions, the data available to the ranker will be $(\epsilon,\delta)$-representative with respect to the sequence fabricated by the adversary. The adversary is then able to obtain the desired ranking list with these pairwise comparisons.
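To make the representativeness claim of Lemmas 2 and 3 concrete, the following minimal Python sketch draws a Bernoulli sample and a reservoir sample from a stream and compares the fraction of elements of a subset $\boldsymbol{S}_{\mathcal{A}}$ in each sample with the fraction in the whole stream. It is an illustration only: the stream is static and randomly generated rather than adaptively fabricated, and the integer item IDs, the subset, the sampling rate `p`, the reservoir size `k`, and all function names are assumptions introduced for this example, not quantities from the paper.

```python
import random

def bernoulli_sample(stream, p, rng):
    """Keep each element independently with probability p."""
    return [x for x in stream if rng.random() < p]

def reservoir_sample(stream, k, rng):
    """Classic reservoir sampling: a uniform sample of k elements from the stream."""
    reservoir = []
    for t, x in enumerate(stream):
        if t < k:
            reservoir.append(x)
        else:
            j = rng.randrange(t + 1)      # uniform index in [0, t]
            if j < k:
                reservoir[j] = x          # replace a stored element
    return reservoir

def fraction_in(items, subset):
    """Fraction of items that belong to the subset."""
    return sum(x in subset for x in items) / len(items)

rng = random.Random(0)
stream = [rng.randrange(100) for _ in range(20000)]   # hypothetical stream over 100 item IDs
subset = set(range(30))                               # hypothetical subset S_A

stream_frac = fraction_in(stream, subset)
bern_sample = bernoulli_sample(stream, 0.2, rng)
res_sample = reservoir_sample(stream, 4000, rng)
bern_gap = abs(fraction_in(bern_sample, subset) - stream_frac)
res_gap = abs(fraction_in(res_sample, subset) - stream_frac)
print(f"stream: {stream_frac:.4f}, Bernoulli gap: {bern_gap:.4f}, reservoir gap: {res_gap:.4f}")
```

With samples of a few thousand elements, both gaps are expected to be small, which is the intuition behind the $(\epsilon,\delta)$-representativeness used in Theorem 2; the precise conditions on the sampling parameters are those stated in the lemmas, not the values chosen here.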

The fourth part proves the asymptotic optimality of the proposed adversarial policies (49) and (54) with complete knowledge. We first state some important assumptions. Under Assumptions 1-5, Theorem 4 establishes a lower bound on the minimal Bayesian risk with the help of Lemmas 4-7. Theorem 5 provides asymptotic upper bounds on the expected Kendall tau of the proposed manipulation policy with complete information, with the help of Lemmas 8 and 9. Theorem 6 shows the asymptotic optimality of the expected stopping time of the proposed manipulation policy, with the help of Lemmas 10 and 11. Together, these three theorems establish the asymptotic optimality of the proposed stopping time (49) and the generation rule solved by (54) with complete knowledge, given the identifiability of the BTL model when the MLE is adopted to obtain the preference scores.

The fifth part gives the solution of (59), which yields the stopping time (65) and the generation rule (67) for an adversary with incomplete knowledge. Proposition 2 shows that, by strong duality, the inner supremum of (59) admits a reformulation as a simple univariate optimization problem. Theorem 3 gives the equivalent form of (59), which can be solved efficiently using the mirror descent algorithm. The detailed solution procedure is given in Algorithms 3, 4, and 5.

HodgeRank and RankCentrality

HodgeRank

The HodgeRank method discussed in [29] finds the relative ranking scores by solving the following least-squares problem:

$$\underset{\boldsymbol{\theta}\in\mathbb{R}^{n}}{\text{minimize}}\ \ \frac{1}{2}\sum_{(i,j)\in\boldsymbol{E}} w^{*}_{ij}\big(y_{ij}-\theta_{j}+\theta_{i}\big)^{2} \tag{74}$$

where $\boldsymbol{y}=[y_{12},y_{13},\dots,y_{n,n-1}]^{\top}$ represents the directions of the edges. As $\boldsymbol{\mathcal{G}}$ is a complete graph, we set $y_{ij}=1$, which indicates a directed edge from node $i$ to node $j$. Based on combinatorial Hodge theory [29], the minimal-norm solution of (74) is simply given as

$$\boldsymbol{\bar{\theta}}=-\mathcal{L}^{\dagger}_{0}\cdot\textbf{div}(\boldsymbol{y}), \tag{75}$$

where $\mathcal{L}^{\dagger}_{0}$ is the Moore-Penrose pseudo-inverse of $\mathcal{L}_{0}$, and the divergence operator $\textbf{div}$ is defined as

$$[\textbf{div}(\boldsymbol{y})](i)=\sum_{j:(i,j)\in\boldsymbol{E}} w^{*}_{ij}\,y_{ij},\quad \forall\ i\in[n]. \tag{76}$$
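As a small illustration of (74) and (75), the sketch below computes the minimal-norm least-squares scores with NumPy. It rests on assumptions that should be read as such: $\mathcal{L}_{0}$ is defined in the main text rather than here, so the code takes it to be the graph Laplacian with symmetrized weights $w^{*}_{ij}+w^{*}_{ji}$, and it takes the divergence to be the net weighted outflow of each node, which is what the normal equations of (74) give when $y_{ij}=1$ on every recorded directed edge. The function names and the toy weight matrix are hypothetical.

```python
import numpy as np

def hodgerank_scores(w):
    """Minimal-norm minimizer of (74) with y_ij = 1 on every recorded edge.

    w[i, j] >= 0 is the weight w*_ij of the directed edge (i, j); the diagonal is ignored.
    """
    w = np.asarray(w, dtype=float).copy()
    np.fill_diagonal(w, 0.0)
    sym = w + w.T                                   # total weight between i and j
    laplacian = np.diag(sym.sum(axis=1)) - sym      # assumed form of L_0
    div = w.sum(axis=1) - w.sum(axis=0)             # net weighted outflow of each node
    return -np.linalg.pinv(laplacian) @ div         # the form of (75)

def objective(theta, w):
    """Value of the least-squares objective (74) for y_ij = 1."""
    n = len(theta)
    return 0.5 * sum(w[i][j] * (1.0 - theta[j] + theta[i]) ** 2
                     for i in range(n) for j in range(n) if i != j)

# Toy data: edge (0, 1) has weight 3 and edge (1, 0) has weight 1.
w = np.array([[0.0, 3.0], [1.0, 0.0]])
theta = hodgerank_scores(w)
print(theta)
```

For this toy input the scores sum to zero, as expected of a minimal-norm solution on a connected graph, and under the sign convention of (74) the heavier edge $(0,1)$ pushes $\theta_{1}$ above $\theta_{0}$.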

Rank Centrality

The spectral ranking algorithm, or RankCentrality [41], is motivated by the connection between the pairwise comparisons $\boldsymbol{w}^{*}$ and a random walk over a directed graph $\boldsymbol{\mathcal{G}}$. The spectral method constructs a random walk on $\boldsymbol{\mathcal{G}}$ in which, at each step, the walk can move from vertex $i$ to vertex $j$ only if items $i$ and $j$ were ever compared; if so, the likelihood of going from $i$ to $j$ depends on how often $i$ lost to $j$. That is, the random walk is more likely to move to a neighbor that is more likely to win. How frequently this walk visits a particular node in the long run, or equivalently the stationary distribution, is the score of the corresponding item. Thus, this algorithm effectively captures the preference of the given item versus all the others, not just its immediate neighbors: the global effect induced by the transitivity of comparisons is captured through the stationary distribution.

A random walk can be represented by a time-independent transition matrix $\boldsymbol{P}=\{P_{i,j}\}_{1\leq i,j\leq n}\in\mathbb{R}^{n\times n}_{+}$, where $P_{i,j}=\mathbb{P}(X_{t+1}=j\,|\,X_{t}=i)$ and $X_{t}$ represents the state of the process (the node reached) at time $t$. By definition, the entries of a transition matrix are nonnegative and satisfy

$$P_{i,j}+P_{j,i}=1,\quad \forall\ i,j\in[n],\ i\neq j. \tag{77}$$

One way to define a valid transition matrix $\boldsymbol{P}$ of a random walk on $\boldsymbol{\mathcal{G}}$ is to scale all the edge weights by the maximum out-degree of a node, denoted by $d_{\text{max}}$. This re-scaling ensures that each row sum is at most one. Finally, to ensure that each row sum is exactly one, the spectral method adds a self-loop to each node of $\boldsymbol{V}$. Concretely, the transition matrix $\boldsymbol{P}^{*}$ is constructed from the pairwise comparison data $\boldsymbol{w}^{*}$ as

$$P^{*}_{i,j}=\begin{dcases} 0, & \text{if}\ i\neq j,\ w^{*}_{ij}+w^{*}_{ji}=0,\\ \frac{1}{d_{\text{max}}}\,\frac{w^{*}_{ij}}{w^{*}_{ji}+w^{*}_{ij}}, & \text{if}\ i\neq j,\ w^{*}_{ij}+w^{*}_{ji}\neq 0,\\ 1-\frac{1}{d_{\text{max}}}\sum_{k\neq i}\frac{w^{*}_{ik}}{w^{*}_{ik}+w^{*}_{ki}}, & \text{otherwise}. \end{dcases} \tag{78}$$

Rank centrality estimates the probability distribution obtained by applying the matrix $\boldsymbol{P}^{*}$ repeatedly, starting from any initial condition. Precisely, let $\theta_{t}(i)=\mathbb{P}(X_{t}=i)$ denote the distribution of the random walk at time $t$, with $\boldsymbol{\theta}_{0}=\{\theta_{0}(i)\}\in\mathbb{R}^{n}_{+}$ an arbitrary starting distribution on $[n]$. Then the random walk satisfies

$$\boldsymbol{\theta}^{\top}_{t+1}=\boldsymbol{\theta}^{\top}_{t}\boldsymbol{P}^{*}. \tag{79}$$

One expects the stationary distribution of the sample version $\boldsymbol{P}^{*}$ to form a good estimate of the true relative ranking scores, provided the sample size is sufficiently large. (Footnote 8: The original paper assumes that the true relative scores are generated from a logistic pairwise comparison model, e.g., the Bradley-Terry-Luce (BTL) model, the multinomial logit (MNL) model, or the Plackett-Luce (PL) model.) When the transition matrix has a unique left eigenvector $\boldsymbol{\theta}^{*}$ associated with the largest eigenvalue, then starting from any initial distribution $\boldsymbol{\theta}_{0}$, the limiting distribution of $\boldsymbol{\theta}_{t+1}$ is unique. This stationary distribution $\lim_{t\rightarrow\infty}\boldsymbol{\theta}_{t}$ is the top left eigenvector of $\boldsymbol{P}^{*}$, i.e.,

$$\lim_{t\rightarrow\infty}\boldsymbol{\theta}_{t}=\boldsymbol{\bar{\theta}}\quad\text{and}\quad\boldsymbol{\bar{\theta}}^{\top}=\boldsymbol{\bar{\theta}}^{\top}\boldsymbol{P}^{*}, \tag{80}$$

which only involves a simple eigenvector computation.
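The construction (78) and the iteration (79) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions rather than the reference implementation of [41]: `w[i, j]` is read literally as $w^{*}_{ij}$ in (78), $d_{\text{max}}$ is assumed to be the largest number of distinct comparison partners of any item, the comparison graph is assumed connected with at least one comparison, and the stationary distribution is approximated by plain power iteration. The function name and the toy data are hypothetical.

```python
import numpy as np

def rank_centrality(w, iters=1000):
    """Build P* as in (78) and iterate theta^T <- theta^T P* as in (79)."""
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    total = w + w.T                              # w_ij + w_ji
    compared = total > 0                         # pairs that were ever compared
    np.fill_diagonal(compared, False)
    d_max = compared.sum(axis=1).max()           # assumed definition of d_max
    P = np.zeros((n, n))
    P[compared] = w[compared] / total[compared] / d_max
    P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)   # self-loops make rows sum to one
    theta = np.full(n, 1.0 / n)                  # arbitrary starting distribution
    for _ in range(iters):
        theta = theta @ P                        # left multiplication, as in (79)
    return P, theta

# Toy data consistent with BTL scores proportional to (1, 2, 3): w_ij = score of item j.
w = np.array([[0.0, 2.0, 3.0],
              [1.0, 0.0, 3.0],
              [1.0, 2.0, 0.0]])
P, theta = rank_centrality(w)
print(theta)
```

In this toy example the ratios $w_{ij}/(w_{ij}+w_{ji})$ match a BTL model exactly, so the random walk is reversible and the stationary distribution is proportional to the underlying scores.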

Proof of Theorem 1

We first recall the definition of weak compactness of $\boldsymbol{\mathcal{P}}$.

Definition 6.

A set of probability distributions (measures) $\boldsymbol{\mathcal{P}}$ is said to be weakly compact if every sequence $\{\mathbb{P}_{N}\}\subset\boldsymbol{\mathcal{P}}$ contains a sub-sequence $\{\mathbb{P}_{N_{0}}\}$ and a point $\mathbb{P}_{0}$ such that $\{\mathbb{P}_{N_{0}}\}$ converges to $\mathbb{P}_{0}$ weakly.

To prove the existence of the distributionally robust Nash equilibrium, we need the following proposition [35], which shows that the distributionally robust Nash equilibrium is a global minimizer of the reformulation of (22). This reformulation is also well known for the deterministic Nash equilibrium problem [44].

Proposition 1.

Let

$$\phi(\boldsymbol{A}',\boldsymbol{A})=\sum_{r=1}^{R}\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}'_{r},\boldsymbol{a}_{-r},\boldsymbol{w})\Big], \tag{81}$$

where $\boldsymbol{A}'=[\boldsymbol{a}'_{1},\dots,\boldsymbol{a}'_{R}]$. Under the conditions of Theorem 1, $\boldsymbol{A}^{*}$ is a distributionally robust Nash equilibrium in the sense of (25) if and only if

$$\boldsymbol{A}^{*}\in\underset{\boldsymbol{A}'}{\operatorname{arg\,min}}\ \phi(\boldsymbol{A}',\boldsymbol{A}^{*}). \tag{82}$$
Proof.
  1. The ‘if’ part. If $\boldsymbol{A}^{*}$ is not an equilibrium state, there exists at least one $\boldsymbol{a}'_{r_{0}}$ which satisfies

    $$\begin{aligned} &\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r_{0}}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r_{0}}(\boldsymbol{a}'_{r_{0}},\boldsymbol{a}^{*}_{-r_{0}},\boldsymbol{w})\Big]\\ <\ &\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r_{0}}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r_{0}}(\boldsymbol{a}^{*}_{r_{0}},\boldsymbol{a}^{*}_{-r_{0}},\boldsymbol{w})\Big]. \end{aligned} \tag{83}$$

    Let $\boldsymbol{A}''$ be

    $$\boldsymbol{A}''=[\boldsymbol{a}^{*}_{1},\dots,\boldsymbol{a}^{*}_{r_{0}-1},\boldsymbol{a}'_{r_{0}},\boldsymbol{a}^{*}_{r_{0}+1},\dots,\boldsymbol{a}^{*}_{R}] \tag{84}$$

    and (83) then yields the following inequality, which contradicts (82):

    $$\phi(\boldsymbol{A}'',\boldsymbol{A}^{*})<\phi(\boldsymbol{A}^{*},\boldsymbol{A}^{*}). \tag{85}$$
  2. The ‘only if’ part. By the definition of the distributionally robust Nash equilibrium, for any $\boldsymbol{A}'$, it holds that

    $$\begin{aligned} &\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}'_{r},\boldsymbol{a}^{*}_{-r},\boldsymbol{w})\Big]\\ \geq\ &\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}^{*}_{r},\boldsymbol{a}^{*}_{-r},\boldsymbol{w})\Big],\quad \forall\ r\in[R]. \end{aligned} \tag{86}$$

    Summing both sides of the above inequality over $r\in[R]$, we have

    $$\begin{aligned} \phi(\boldsymbol{A}',\boldsymbol{A}^{*}) &= \sum_{r=1}^{R}\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}'_{r},\boldsymbol{a}^{*}_{-r},\boldsymbol{w})\Big]\\ &\geq \sum_{r=1}^{R}\max_{\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}}\ \mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}\Big[f_{r}(\boldsymbol{a}^{*}_{r},\boldsymbol{a}^{*}_{-r},\boldsymbol{w})\Big]\\ &= \phi(\boldsymbol{A}^{*},\boldsymbol{A}^{*}), \end{aligned} \tag{87}$$

    which implies that $\boldsymbol{A}^{*}$ is a globally optimal solution of (82).
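The equivalence in Proposition 1 can be checked numerically on a toy game. The sketch below is an illustration under loudly stated assumptions: two players, a shared finite action set, a two-point support for $\boldsymbol{w}$, finite ambiguity sets, and the cost functions are all hypothetical choices made for this example. The argument of the proof relies only on $\phi$ being a sum of terms that each depend on a single $\boldsymbol{a}'_{r}$, so the equivalence is expected to hold here even though the finite action sets do not satisfy the convexity conditions that Theorem 1 uses for existence. Exact rational arithmetic avoids spurious ties from rounding.

```python
import itertools
from fractions import Fraction as F

ACTIONS = [0, 1, 2]          # hypothetical finite action set shared by both players
W_VALUES = [-1, 1]           # hypothetical support of the uncertain parameter w
AMBIGUITY = [                # hypothetical ambiguity sets (probabilities over W_VALUES)
    [(F(1, 2), F(1, 2)), (F(4, 5), F(1, 5))],
    [(F(3, 10), F(7, 10)), (F(3, 5), F(2, 5))],
]
PROFILES = list(itertools.product(ACTIONS, repeat=2))

def cost(r, a_r, a_other, w):
    """Hypothetical cost f_r of player r in {0, 1}."""
    if r == 0:
        return (a_r - a_other) ** 2 + w * a_r
    return (a_r + a_other - 2) ** 2 - w * a_r

def worst_case(r, a_r, a_other):
    """Max over the ambiguity set of E_w[f_r(a_r, a_{-r}, w)]."""
    return max(sum(p * cost(r, a_r, a_other, w) for p, w in zip(dist, W_VALUES))
               for dist in AMBIGUITY[r])

def phi(a_prime, a):
    """phi(A', A) from (81): player r deviates to a'_r while the opponent keeps a_{-r}."""
    return sum(worst_case(r, a_prime[r], a[1 - r]) for r in range(2))

def is_drne(a):
    """No player can lower its worst-case expected cost by a unilateral deviation."""
    return all(worst_case(r, a[r], a[1 - r]) <= worst_case(r, dev, a[1 - r])
               for r in range(2) for dev in ACTIONS)

def minimizes_phi(a):
    """Condition (82): A is a minimizer of phi(., A)."""
    return phi(a, a) == min(phi(ap, a) for ap in PROFILES)

equilibria = [a for a in PROFILES if is_drne(a)]
agreement = all(is_drne(a) == minimizes_phi(a) for a in PROFILES)
print("equilibria:", equilibria, "| Proposition 1 check passes:", agreement)
```

Whether this particular toy game has an equilibrium at all is not guaranteed; the check only confirms that the two characterizations select the same action profiles.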

Based on Proposition 1, we show the existence of a DRNE for the adversarial game. This result is an extension of the famous Kakutani fixed-point theorem [30].

See Theorem 1.

Proof.

We know that the ‘supremum’ operator preserves convexity. Moreover, with the weak compactness of $\boldsymbol{\mathcal{P}}_{r}$, the ‘supremum’ operator also preserves continuity. As a consequence, for any given $\boldsymbol{a}_{-r}$, $\mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}[f_{r}(\cdot,\boldsymbol{a}_{-r},\boldsymbol{w})]$ is continuous and convex for every $\mathbb{P}\in\boldsymbol{\mathcal{P}}_{r}$, $r=1,\dots,R$. By the definition of $\phi$ in (81), for any given $\boldsymbol{A}$, $\phi(\boldsymbol{A}',\boldsymbol{A})$ is continuous and convex w.r.t. $\boldsymbol{A}'$. Besides, the existence of an optimal solution to

min𝑨ϕ(𝑨,𝑨)superscript𝑨minitalic-ϕsuperscript𝑨𝑨\underset{\boldsymbol{A}^{\prime}}{\textbf{{min}}}\ \phi(\boldsymbol{A}^{% \prime},\boldsymbol{A})start_UNDERACCENT bold_italic_A start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_UNDERACCENT start_ARG min end_ARG italic_ϕ ( bold_italic_A start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_italic_A ) (88)

with given 𝒂𝒂\boldsymbol{a}bold_italic_a is guaranteed by assuming that 𝔼𝒘[fr(,𝒂r,𝒘)]subscript𝔼similar-to𝒘delimited-[]subscript𝑓𝑟subscript𝒂𝑟𝒘\mathbb{E}_{\boldsymbol{w}\sim\mathbb{P}}[f_{r}(\cdot,\boldsymbol{a}_{-r},% \boldsymbol{w})]blackboard_E start_POSTSUBSCRIPT bold_italic_w ∼ blackboard_P end_POSTSUBSCRIPT [ italic_f start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( ⋅ , bold_italic_a start_POSTSUBSCRIPT - italic_r end_POSTSUBSCRIPT , bold_italic_w ) ] only has finite values.

The remaining part is to show the existence of $\boldsymbol{A}^{*}$ which satisfies

$$\boldsymbol{A}^{*}\in\underset{\boldsymbol{A}^{\prime}}{\textbf{arg min}}\ \phi(\boldsymbol{A}^{\prime},\boldsymbol{A}^{*}). \tag{89}$$

Let $\Gamma(\boldsymbol{A})$ be the solution set of $\textbf{min}\ \phi(\boldsymbol{A}^{\prime},\boldsymbol{A})$ with given $\boldsymbol{A}$. By the convexity of $\phi(\cdot,\boldsymbol{A})$, $\Gamma(\boldsymbol{A})$ is a convex set. $\Gamma(\boldsymbol{A})$ is also a closed set: for any $\{\boldsymbol{A}_{t}\}\rightarrow\boldsymbol{\bar{A}}$ as $t\rightarrow\infty$ and $\boldsymbol{A}^{\prime}_{t}\in\Gamma(\boldsymbol{A}_{t})$ with $\{\boldsymbol{A}^{\prime}_{t}\}\rightarrow\boldsymbol{\bar{A}}^{\prime}$, it holds that

$$\boldsymbol{\bar{A}}^{\prime}\in\Gamma(\boldsymbol{\bar{A}}). \tag{90}$$

By [5], $\Gamma(\boldsymbol{A})$ is upper semi-continuous on $\mathbb{R}^{N\times R}$. By the well-known Kakutani’s fixed point theorem [30], there exists $\boldsymbol{A}^{*}$ such that

$$\boldsymbol{A}^{*}\in\underset{\boldsymbol{A}^{\prime}}{\textbf{arg min}}\ \phi(\boldsymbol{A}^{\prime},\boldsymbol{A}^{*}). \tag{91}$$

With Proposition 1, we know that $\boldsymbol{A}^{*}$ is a distributionally robust Nash equilibrium. ∎

Proof of Theorem 2

In this section, we prove the main technical result for Bernoulli sampling. First, we need the following well-known martingale inequalities.

Definition 7.

A martingale is a sequence $\boldsymbol{X}=(x_{1},\dots,x_{T})$ of random variables with finite means, such that

$$\mathbb{E}\big[x_{t+1}\,|\,x_{1},\dots,x_{t}\big]=x_{t},\ \ \forall\ t\in[T]. \tag{92}$$

The most basic and well-known concentration result for martingales is Azuma’s (or Hoeffding’s) inequality, which asserts that martingales with bounded differences $|x_{t+1}-x_{t}|$ are well concentrated around their mean. However, we need another concentration inequality, one that handles the case where the maximum value $M$ of $|x_{t+1}-x_{t}|$ is large but rarely attained (making the variance much smaller than $M^{2}$) [23, 39, 19]. The martingales that we investigate in this paper exhibit this behavior.

Lemma 1.

Let $\boldsymbol{X}=(x_{1},\dots,x_{T})$ be a martingale, and let the variance of $x_{t+1}$ given $x_{1},\dots,x_{t}$ be bounded as

$$\textbf{Var}(x_{t+1}\,|\,x_{1},\dots,x_{t})\leq\sigma^{2}_{t+1},\ \ \forall\ t\in[T], \tag{93}$$

where $\sigma_{t}\geq 0$. If there exists a constant $M$ such that

$$|x_{t+1}-x_{t}|\leq M, \tag{94}$$

we have

$$\mathbb{P}\big(\boldsymbol{X}-\mathbb{E}[\boldsymbol{X}]\geq\lambda\big)\leq\textbf{exp}\left(-\frac{\lambda^{2}}{\sum_{t=1}^{T}\sigma^{2}_{t}+\frac{\lambda M}{3}}\right), \tag{95}$$

where $\mathbb{E}[\boldsymbol{X}]$ is defined as

$$\mathbb{E}[\boldsymbol{X}]=\sum_{t=1}^{T}\mathbb{E}[x_{t}]. \tag{96}$$

In particular,

$$\mathbb{P}\Big(\big|\boldsymbol{X}-\mathbb{E}[\boldsymbol{X}]\big|\geq\lambda\Big)\leq 2\cdot\textbf{exp}\left(-\frac{\lambda^{2}}{\sum_{t=1}^{T}\sigma^{2}_{t}+\frac{\lambda M}{3}}\right). \tag{97}$$
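To get a feel for how the bound of Lemma 1 behaves, the right-hand sides of (95) and (97) can be evaluated numerically. The following is a minimal sketch that uses the constants exactly as written in the lemma; the function names `one_sided_tail_bound` and `two_sided_tail_bound` are hypothetical and not part of the paper.

```python
import math
from typing import Sequence


def one_sided_tail_bound(lam: float, sigma_sq: Sequence[float], M: float) -> float:
    """Right-hand side of (95): exp(-lam^2 / (sum_t sigma_t^2 + lam * M / 3)).

    lam      -- deviation level lambda (assumed > 0)
    sigma_sq -- per-step conditional variance bounds sigma_t^2
    M        -- uniform bound on the increments |x_{t+1} - x_t| (assumed > 0)
    """
    if lam <= 0 or M <= 0:
        raise ValueError("lam and M are assumed to be strictly positive")
    denominator = sum(sigma_sq) + lam * M / 3.0
    return math.exp(-lam ** 2 / denominator)


def two_sided_tail_bound(lam: float, sigma_sq: Sequence[float], M: float) -> float:
    """Right-hand side of (97), capped at 1 because it bounds a probability."""
    return min(1.0, 2.0 * one_sided_tail_bound(lam, sigma_sq, M))
```

The cap at 1 in the two-sided version is a convenience of this sketch: for small $\lambda$ the expression in (97) exceeds 1 and the bound is vacuous.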

The following lemmas assert that for any given subset $\boldsymbol{S}_{\mathcal{A}}$ of the universe $\boldsymbol{\mathcal{C}}$, the fraction of elements from $\boldsymbol{S}_{\mathcal{A}}$ within the sample typically does not differ by much from the corresponding fraction among the whole stream. Lemma 2 corresponds to Bernoulli sampling, and Lemma 3 corresponds to reservoir sampling.

Lemma 2.

For any dynamic stream $\boldsymbol{C}=\{c_{t}\}_{t=1}^{\infty}$ from $\boldsymbol{\mathcal{C}}^{\prime}$, if the parameter $\varrho$ of the Bernoulli method satisfies

$$\varrho\geq 10\cdot\frac{\textbf{ln}(4/\delta)}{\epsilon^{2}T}, \tag{98}$$

we have

$$\mathbb{P}\big(|d_{\boldsymbol{\mathcal{C}}^{\prime}}(\boldsymbol{C})-d_{\boldsymbol{\mathcal{C}}^{\prime}}(\boldsymbol{C}^{\prime})|\geq\epsilon\big)\leq\delta, \tag{99}$$

where $\boldsymbol{C}^{\prime}$ is a sequence sampled from $\boldsymbol{C}$ by the Bernoulli method.
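As a rough illustration of the statement of Lemma 2, the following Python sketch samples a stream with the Bernoulli method and compares the density of a subset in the stream with its density in the sample. It rests on several assumptions that are not taken from the paper: the density of a subset in a sequence is read as the fraction of the sequence's elements that belong to the subset, the stream is static and random rather than adaptively chosen by an adversary, and all names (`min_sampling_rate`, `bernoulli_sample`, `density`, `demo_gap`) are hypothetical.

```python
import math
import random
from typing import Hashable, List, Sequence, Set


def min_sampling_rate(eps: float, delta: float, T: int) -> float:
    """Right-hand side of condition (98); a value above 1 means T is too small."""
    return 10.0 * math.log(4.0 / delta) / (eps ** 2 * T)


def bernoulli_sample(stream: Sequence[Hashable], rho: float,
                     rng: random.Random) -> List[Hashable]:
    """Keep each element independently with probability rho."""
    return [c for c in stream if rng.random() < rho]


def density(subset: Set[Hashable], seq: Sequence[Hashable]) -> float:
    """Fraction of the elements of seq (with repetitions) that lie in subset."""
    if not seq:
        return 0.0
    return sum(1 for c in seq if c in subset) / len(seq)


def demo_gap(T: int = 20000, eps: float = 0.1, delta: float = 0.05,
             seed: int = 0) -> float:
    """Density gap between a random stream of comparisons and its sample."""
    rng = random.Random(seed)
    # A pairwise comparison is modelled as an ordered pair (winner, loser).
    pairs = [(i, j) for i in range(5) for j in range(5) if i != j]
    stream = [rng.choice(pairs) for _ in range(T)]
    subset = {p for p in pairs if p[0] == 0}  # comparisons won by item 0
    rho = min_sampling_rate(eps, delta, T)
    sample = bernoulli_sample(stream, rho, rng)
    return abs(density(subset, stream) - density(subset, sample))
```

Calling `demo_gap()` returns the absolute difference between the two densities for one random run; the lemma asserts that this gap is below $\epsilon$ with probability at least $1-\delta$ when (98) holds.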

Proof.

At any given time step $t\in[T]$ along the sampling process, let $\boldsymbol{C}_{t}=(c_{1},\dots,c_{t})$ be the sequence of pairwise comparisons submitted to the Bernoulli method until time $t$, and let $\boldsymbol{C}_{t}^{\prime}\subseteq\boldsymbol{C}_{t}$ be the sub-sequence of pairwise comparisons sampled from $\boldsymbol{C}_{t}$. Note that $\boldsymbol{C}_{T}=\boldsymbol{C}$ and $\boldsymbol{C}^{\prime}_{T}=\boldsymbol{C}^{\prime}$, and hence, to prove the lemma, we need to show that

$$|d_{\boldsymbol{\mathcal{C}}^{\prime}}(\boldsymbol{C}_{T})-d_{\boldsymbol{\mathcal{C}}^{\prime}}(\boldsymbol{C}_{T}^{\prime})|\leq\epsilon.$$

Given a $\boldsymbol{\mathcal{C}}^{\prime}$, we define the random variables

$$\begin{aligned}
A_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= \frac{t}{T}\cdot d_{\boldsymbol{\mathcal{C}}^{\prime}}(\boldsymbol{C}_{t})=\frac{|\boldsymbol{\mathcal{C}}^{\prime}\cap\boldsymbol{C}_{t}|}{T},\\
B_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= \frac{|\boldsymbol{\mathcal{C}}^{\prime}\cap\boldsymbol{C}^{\prime}_{t}|}{\varrho T},\\
Z_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= B_{t}(\boldsymbol{\mathcal{C}}^{\prime})-A_{t}(\boldsymbol{\mathcal{C}}^{\prime}),
\end{aligned} \tag{100}$$

where the intersection between a set $\boldsymbol{\mathcal{C}}^{\prime}$ and a sequence $\boldsymbol{C}_{t}$ is the sub-sequence of $\boldsymbol{C}_{t}$ consisting of all pairwise comparisons (repetitions are allowed) that also belong to $\boldsymbol{\mathcal{C}}^{\prime}$. Next we show that $(Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{T}(\boldsymbol{\mathcal{C}}^{\prime}))$ is a martingale. Suppose that $\boldsymbol{C}^{\prime}_{t-1}$ is fixed, and hence the values of $Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})$ are fixed. Now a new pairwise comparison $c_{t}$ is ready to be submitted. Note that $c_{t}$ may be either the original data or a comparison perturbed by the adversary.

If $c_{t}\notin\boldsymbol{\mathcal{C}}^{\prime}$, we have

$$\begin{aligned}
A_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= A_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),\\
B_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= B_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),\\
Z_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),
\end{aligned} \tag{101}$$

and it holds that

$$\mathbb{E}\big[\ Z_{t}(\boldsymbol{\mathcal{C}}^{\prime})\ \big|\ Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),\ c_{t}\notin\boldsymbol{\mathcal{C}}^{\prime}\ \big]=Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}). \tag{102}$$

When $c_{t}\in\boldsymbol{\mathcal{C}}^{\prime}$, we have

$$\begin{aligned}
A_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= A_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})+\frac{1}{T},\\
B_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= \begin{cases}
B_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}), & \text{if}\ c_{t}\ \text{is not sampled},\\[5pt]
B_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})+\dfrac{1}{\varrho T}, & \text{otherwise,}
\end{cases}\\
Z_{t}(\boldsymbol{\mathcal{C}}^{\prime}) &= \begin{cases}
Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})-\dfrac{1}{T}, & \text{if}\ c_{t}\ \text{is not sampled},\\[7.5pt]
Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})+\dfrac{1}{\varrho T}-\dfrac{1}{T}, & \text{otherwise.}
\end{cases}
\end{aligned} \tag{103}$$

Recall that each pairwise comparison is sampled by the Bernoulli method independently with probability $\varrho$. Therefore, we have that

$$\begin{aligned}
&\ \mathbb{E}\big[\ Z_{t}(\boldsymbol{\mathcal{C}}^{\prime})\ \big|\ Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),\ c_{t}\in\boldsymbol{\mathcal{C}}^{\prime}\ \big]\\
=&\ Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})+\varrho\cdot\left(\frac{1}{\varrho T}-\frac{1}{T}\right)+(1-\varrho)\cdot\left(-\frac{1}{T}\right)\\
=&\ Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}).
\end{aligned} \tag{104}$$

Combining the two cases, we know that $(Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{T}(\boldsymbol{\mathcal{C}}^{\prime}))$ is a martingale.

Next we show that the variance of $Z_{t}(\boldsymbol{\mathcal{C}}^{\prime})$ conditioned on $Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})$ is bounded by $1/(\varrho T^{2})$. If $c_{t}\notin\boldsymbol{\mathcal{C}}^{\prime}$, a simple calculation from (101) and (102) shows that the variance of $Z_{t}(\boldsymbol{\mathcal{C}}^{\prime})$ given $Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime})$ equals zero:

$$\textbf{Var}\big(\ Z_{t}(\boldsymbol{\mathcal{C}}^{\prime})\ \big|\ Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),\ c_{t}\notin\boldsymbol{\mathcal{C}}^{\prime}\ \big)=0. \tag{105}$$

When $c_{t}\in\boldsymbol{\mathcal{C}}^{\prime}$, we have

$$\begin{aligned}
&\ \textbf{Var}\big(\ Z_{t}(\boldsymbol{\mathcal{C}}^{\prime})\ \big|\ Z_{0}(\boldsymbol{\mathcal{C}}^{\prime}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}}^{\prime}),\ c_{t}\in\boldsymbol{\mathcal{C}}^{\prime}\ \big)\\
=&\ (1-\varrho)\cdot\left(\frac{1}{T}\right)^{2}+\varrho\cdot\left(\frac{1}{\varrho T}-\frac{1}{T}\right)^{2}\\
=&\ \frac{1}{T^{2}}\cdot\left(\frac{1}{\varrho}-1\right)\leq\frac{1}{\varrho T^{2}}.
\end{aligned} \tag{106}$$
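The simplification in the last step of (106) can be spelled out as follows. This expansion is added only for readability and uses no quantities beyond $\varrho$ and $T$; it relies on the fact that the conditional mean of the increment is zero by (104), so the variance equals the second moment of the increment.

```latex
\begin{aligned}
(1-\varrho)\cdot\frac{1}{T^{2}}
  +\varrho\cdot\left(\frac{1}{\varrho T}-\frac{1}{T}\right)^{2}
&= \frac{1}{T^{2}}\left[(1-\varrho)+\frac{(1-\varrho)^{2}}{\varrho}\right] \\
&= \frac{1-\varrho}{T^{2}}\cdot\frac{\varrho+(1-\varrho)}{\varrho} \\
&= \frac{1}{T^{2}}\cdot\left(\frac{1}{\varrho}-1\right).
\end{aligned}
```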

Combine the two cases, we know that the variance of Zt(𝓒)subscript𝑍𝑡superscript𝓒bold-′Z_{t}({\boldsymbol{\mathcal{C}^{\prime}}})italic_Z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ) conditioned on Z0(𝓒),,Zt1(𝓒)subscript𝑍0superscript𝓒bold-′subscript𝑍𝑡1superscript𝓒bold-′Z_{0}({\boldsymbol{\mathcal{C}^{\prime}}}),\dots,Z_{t-1}({\boldsymbol{\mathcal% {C}^{\prime}}})italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ) , … , italic_Z start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ( bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ) is bounded by 1/ϱT21italic-ϱsuperscript𝑇21/\varrho T^{2}1 / italic_ϱ italic_T start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

It always holds that

$$
\big|Z_t(\boldsymbol{\mathcal{C}'})-Z_{t-1}(\boldsymbol{\mathcal{C}'})\big|\leq\max\left\{\frac{1}{T},\ \frac{1}{\varrho T}-\frac{1}{T}\right\}\leq\frac{1}{\varrho T}.
\tag{107}
$$

Finally, we complete the proof of this lemma by proving the following two inequalities for any $\varrho$ satisfying condition (98).

$$
\mathbb{P}\left(\big|A_T(\boldsymbol{\mathcal{C}'})-B_T(\boldsymbol{\mathcal{C}'})\big|\geq\frac{\epsilon}{2}\right)\leq\frac{\delta}{2}
\tag{108a}
$$

$$
\mathbb{P}\left(\big|B_T(\boldsymbol{\mathcal{C}'})-d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_T)\big|\geq\frac{\epsilon}{2}\right)\leq\frac{\delta}{2}
\tag{108b}
$$

We can choose $\lambda=\epsilon/2$, $\sigma^{2}_{t}=1/(\varrho T^{2})$ and $M=1/(\varrho T)$, and apply Lemma 1 to $(Z_0(\boldsymbol{\mathcal{C}'}),\dots,Z_T(\boldsymbol{\mathcal{C}'}))$. As $Z_0(\boldsymbol{\mathcal{C}'})=0$, we have

$$
\big|A_T(\boldsymbol{\mathcal{C}'})-B_T(\boldsymbol{\mathcal{C}'})\big|=\big|Z_T(\boldsymbol{\mathcal{C}'})-Z_0(\boldsymbol{\mathcal{C}'})\big|,
\tag{109}
$$

and (108a) holds since

$$
\begin{aligned}
&\mathbb{P}\left(\big|A_T(\boldsymbol{\mathcal{C}'})-B_T(\boldsymbol{\mathcal{C}'})\big|\geq\frac{\epsilon}{2}\right) \\
\leq{}& 2\cdot\exp\left(-\frac{(\epsilon/2)^{2}}{2T\cdot\big(1/(\varrho T^{2})\big)+\epsilon/(6\varrho T)}\right) \\
<{}& 2\cdot\exp\left(-\frac{\epsilon^{2}\varrho T}{9}\right).
\end{aligned}
\tag{110}
$$

When

$$
\varrho\geq 9\cdot\frac{\ln(4/\delta)}{\epsilon^{2}T},
\tag{111}
$$

the right-hand side of (110) is further bounded by $\delta/2$.
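The chain of bounds above can be evaluated numerically. The sketch below is illustrative only: it computes the middle expression of (110), its simplified form, and the smallest sampling rate for which the simplified form drops to $\delta/2$. The logarithm is taken as $\ln(4/\delta)$ so that the threshold is positive, and the function names and parameter values are hypothetical.

```python
import math


def tail_bound_110(eps: float, rho: float, T: int) -> float:
    """Middle expression of (110), with the denominator written as in the text."""
    denom = 2 * T * (1.0 / (rho * T**2)) + eps / (6 * rho * T)
    return 2.0 * math.exp(-((eps / 2) ** 2) / denom)


def simplified_bound(eps: float, rho: float, T: int) -> float:
    """Last expression of (110): 2 * exp(-eps^2 * rho * T / 9)."""
    return 2.0 * math.exp(-(eps**2) * rho * T / 9)


def min_rate(eps: float, delta: float, T: int) -> float:
    """Smallest rho with simplified_bound <= delta / 2, using ln(4 / delta)."""
    return 9 * math.log(4 / delta) / (eps**2 * T)


eps, delta, T = 0.1, 0.05, 10**6  # illustrative values, not from the paper
rho = min_rate(eps, delta, T)
print(rho, tail_bound_110(eps, rho, T), simplified_bound(eps, rho, T), delta / 2)
```

At the threshold rate the simplified bound equals $\delta/2$ up to floating-point error, and the unsimplified expression is smaller still.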

To prove (108b), we observe that

$$
B_T(\boldsymbol{\mathcal{C}'})=d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_T)\cdot\frac{|\boldsymbol{C}'_T|}{\varrho T},
\tag{112}
$$

and each pairwise comparison is selected by the Bernoulli method with probability $\varrho$, independently of the other pairwise comparisons. Hence the size of $\boldsymbol{C}'_T$ follows the binomial distribution $\mathrm{Binomial}(T,\varrho)$, whose expectation is $\varrho T$, regardless of the adversary's strategy. Applying the Chernoff inequality with relative deviation $\epsilon/2$, we have

$$
\begin{aligned}
&\mathbb{P}\left(\Big||\boldsymbol{C}'_T|-\varrho T\Big|\geq\frac{\epsilon\varrho T}{2}\right) \\
\leq{}& 2\cdot\exp\left(-\frac{(\epsilon/2)^{2}\varrho T}{2+\epsilon/3}\right) \\
<{}& 2\cdot\exp\left(-\frac{\epsilon^{2}\varrho T}{10}\right).
\end{aligned}
\tag{113}
$$

When

$$
\varrho\geq 10\cdot\frac{\ln(4/\delta)}{\epsilon^{2}T},
\tag{114}
$$

the above probability is further bounded by $\delta/2$. Conditioning on the complementary event $\big||\boldsymbol{C}'_T|-\varrho T\big|<\epsilon\varrho T/2$, we have

$$
\begin{aligned}
&\big|d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_T)-B_T(\boldsymbol{\mathcal{C}'})\big| \\
={}& \left|1-\frac{|\boldsymbol{C}'_T|}{\varrho T}\right|\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_T) \\
\leq{}& \left|1-\frac{|\boldsymbol{C}'_T|}{\varrho T}\right|\leq\frac{\epsilon}{2},
\end{aligned}
\tag{115}
$$

where the first inequality follows from the fact that $d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_T)$ is always bounded by $1$, and the second inequality follows from the condition $\big||\boldsymbol{C}'_T|-\varrho T\big|<\epsilon\varrho T/2$. This completes the proof of (108b).

Finally, taking a union bound over (108a) and (108b), applying the triangle inequality and observing that $A_T(\boldsymbol{\mathcal{C}'})=d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_T)$, we obtain the desired conclusion (99). ∎
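The statement just proved can be illustrated empirically. The sketch below is a simplification: it uses a fixed, randomly generated stream rather than an adaptive adversary, and the subset $\boldsymbol{\mathcal{C}'}$ (`target`), the comparison identifiers, and all parameter values are hypothetical. It estimates how often the density of $\boldsymbol{\mathcal{C}'}$ in a Bernoulli sample deviates from its density in the stream by at least $\epsilon$, with the threshold (114) evaluated using $\ln(4/\delta)$.

```python
import math
import random

T, RHO, EPS, DELTA = 20000, 0.2, 0.1, 0.1  # illustrative values only


def bernoulli_sample(stream, rho, rng):
    """Keep each element independently with probability rho."""
    return [c for c in stream if rng.random() < rho]


def density(seq, target):
    """Fraction of elements of seq that lie in target (0.0 for an empty seq)."""
    return sum(c in target for c in seq) / len(seq) if seq else 0.0


def failure_rate(trials=50, seed=0):
    """Empirical frequency of |d(stream) - d(sample)| >= EPS over repeated sampling."""
    rng = random.Random(seed)
    target = set(range(300))                          # hypothetical subset C'
    stream = [rng.randrange(1000) for _ in range(T)]  # hypothetical fixed stream
    d_stream = density(stream, target)
    fails = 0
    for _ in range(trials):
        sample = bernoulli_sample(stream, RHO, rng)
        if abs(d_stream - density(sample, target)) >= EPS:
            fails += 1
    return fails / trials


rho_min = 10 * math.log(4 / DELTA) / (EPS**2 * T)  # threshold (114) with ln(4/delta)
rate = failure_rate()
print(rho_min, rate)
```

With these values the sampling rate exceeds the threshold, so the lemma predicts a failure frequency of at most $\delta$. A simulation on an oblivious stream cannot confirm robustness against adaptive adversaries, which is what the martingale argument provides.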

Lemma 3.

For any dynamic stream $\boldsymbol{C}=\{c_t\}_{t=1}^{\infty}$ from $\boldsymbol{\mathcal{C}'}$, if the parameter $\varrho$ of the reservoir method satisfies

$$
\varrho\geq 2\cdot\frac{\ln(2/\delta)}{\epsilon^{2}},
\tag{116}
$$

we have

$$
\mathbb{P}\big(\big|d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C})-d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}')\big|\geq\epsilon\big)\leq\delta,
\tag{117}
$$

where $\boldsymbol{C}'$ is a sequence sampled from $\boldsymbol{C}$ by the reservoir method.
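To make the role of $\varrho$ concrete, the following sketch computes the reservoir size suggested by (116) and runs the classical reservoir sampling scheme, in which the $t$-th element enters the reservoir with probability $\varrho/t$ and evicts a uniformly chosen slot. It assumes that this standard scheme is the reservoir method meant here. The fixed stream, the subset `target` and the parameter values are hypothetical.

```python
import math
import random


def reservoir_size(eps: float, delta: float) -> int:
    """Smallest integer rho satisfying (116)."""
    return math.ceil(2 * math.log(2 / delta) / eps**2)


def reservoir_sample(stream, k, rng):
    """Classical reservoir sampling: keep a uniform sample of size k from the stream."""
    reservoir = []
    for t, c in enumerate(stream, start=1):
        if t <= k:
            reservoir.append(c)           # the first k elements fill the cache
        elif rng.random() < k / t:
            reservoir[rng.randrange(k)] = c  # evict a uniformly chosen slot
    return reservoir


def density(seq, target):
    """Fraction of elements of seq that lie in target."""
    return sum(c in target for c in seq) / len(seq)


EPS, DELTA = 0.1, 0.05  # illustrative values only
k = reservoir_size(EPS, DELTA)
rng = random.Random(1)
target = set(range(300))                             # hypothetical subset C'
stream = [rng.randrange(1000) for _ in range(5000)]  # hypothetical fixed stream
trials = 100
fails = sum(
    abs(density(stream, target) - density(reservoir_sample(stream, k, rng), target)) >= EPS
    for _ in range(trials)
)
print(k, fails / trials)
```

Unlike the Bernoulli threshold, the reservoir size in (116) does not depend on the stream length $T$.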

Proof.

The proof of this lemma follows the same lines as that of Lemma 2, except that it adopts a different martingale. Specifically, we define

$$
\begin{aligned}
A_t(\boldsymbol{\mathcal{C}'}) &= t\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_t)=|\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_t|, \\
B_t(\boldsymbol{\mathcal{C}'}) &= t\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}'_t)=\frac{t}{\varrho}\,|\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}'_t|, \\
Z_t(\boldsymbol{\mathcal{C}'}) &= B_t(\boldsymbol{\mathcal{C}'})-A_t(\boldsymbol{\mathcal{C}'}),
\end{aligned}
\tag{118}
$$

for $t\in(\varrho,T]$. When $t\leq\varrho$, we define

$$
A_t(\boldsymbol{\mathcal{C}'})=B_t(\boldsymbol{\mathcal{C}'})=|\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_t|.
\tag{119}
$$
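The one-step behavior of $Z_t$ under these definitions can be checked exactly. The sketch below assumes the classical reservoir update: for $t>\varrho$, the new comparison is sampled with probability $\varrho/t$ and evicts a uniformly chosen cached element. The arguments `k` (reservoir size $\varrho$), `m` (number of cached comparisons from $\boldsymbol{\mathcal{C}'}$ at time $t-1$) and `a` (the value $A_{t-1}$) are hypothetical names. It enumerates the outcomes and compares the conditional expectation of $Z_t$ with $Z_{t-1}$ in rational arithmetic.

```python
from fractions import Fraction


def current_Z(t: int, k: int, m: int, a: int) -> Fraction:
    """Z_{t-1} = B_{t-1} - A_{t-1}, with B_{t-1} = ((t-1)/k) * m."""
    return Fraction(t - 1, k) * m - a


def expected_next_Z(t: int, k: int, m: int, a: int, in_target: bool) -> Fraction:
    """E[Z_t | state at t-1, membership of c_t in C'] under the assumed update."""
    p_sample = Fraction(k, t)        # c_t enters the cache
    p_evict_target = Fraction(m, k)  # the evicted element belongs to C'
    gain = 1 if in_target else 0     # c_t itself contributes to the count
    outcomes = [
        (1 - p_sample, m),                                 # not sampled
        (p_sample * p_evict_target, m - 1 + gain),         # evicts an element of C'
        (p_sample * (1 - p_evict_target), m + gain),       # evicts an element outside C'
    ]
    a_next = a + gain                # update of A_t
    return sum(p * (Fraction(t, k) * m_next - a_next) for p, m_next in outcomes)


k = 5
all_equal = all(
    expected_next_Z(t, k, m, a, flag) == current_Z(t, k, m, a)
    for t in range(k + 1, k + 6)
    for m in range(k + 1)
    for a in range(t)
    for flag in (True, False)
)
print(all_equal)
```

The check covers both membership cases of $c_t$, so it mirrors the case analysis used to establish the martingale property.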

The next step is similar to Lemma 2. We first show that $(Z_0(\boldsymbol{\mathcal{C}'}),\dots,Z_T(\boldsymbol{\mathcal{C}'}))$ is a martingale. By (119), $Z_t(\boldsymbol{\mathcal{C}'})=0$ for $t\leq\varrho$, so the martingale property holds trivially there. When $t>\varrho$, the sample $\boldsymbol{C}'_{t-1}$ is fixed and hence the values of $Z_0(\boldsymbol{\mathcal{C}'}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}'})$ are fixed. Let $c_t$ be the next pairwise comparison presented to the reservoir sampling method, which could come either from the original data source or from the adversarial data source controlled by the adversary. It is easy to check that

$$
A_t(\boldsymbol{\mathcal{C}'})=\begin{cases}
A_{t-1}(\boldsymbol{\mathcal{C}'}), & c_t\notin\boldsymbol{\mathcal{C}'},\\
A_{t-1}(\boldsymbol{\mathcal{C}'})+1, & c_t\in\boldsymbol{\mathcal{C}'}.
\end{cases}
\tag{120}
$$

For $B_t(\boldsymbol{\mathcal{C}'})$, we consider three factors:

  • i)

    Is $c_t\in\boldsymbol{\mathcal{C}'}$ or not?

  • ii)

    Is $c_t$ sampled or not?

  • iii)

    Conditioning on $c_t$ being sampled, does it replace an element $r_t$ in the sample that belongs to $\boldsymbol{\mathcal{C}'}$, or an element $r_t$ that does not belong to $\boldsymbol{\mathcal{C}'}$?

  • Case 1.

    Suppose $c_t\notin\boldsymbol{\mathcal{C}'}$. When $c_t$ is either not sampled, or sampled but replaces an element $r_t\notin\boldsymbol{\mathcal{C}'}$, no pairwise comparison from $\boldsymbol{\mathcal{C}'}$ is added to or removed from the cache of the reservoir method. Consequently, we have

    $$
    \boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_t=\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_{t-1}.
    \tag{121}
    $$

    By the definition of $B_t(\boldsymbol{\mathcal{C}'})$,

    $$
    \begin{aligned}
    B_t(\boldsymbol{\mathcal{C}'}) &= \frac{t}{\varrho}\cdot|\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_t| \\
    &= \frac{t-1}{\varrho}\cdot|\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_{t-1}|+\frac{1}{\varrho}\cdot|\boldsymbol{\mathcal{C}'}\cap\boldsymbol{C}_{t-1}| \\
    &= B_{t-1}(\boldsymbol{\mathcal{C}'})+d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}),
    \end{aligned}
    \tag{122}
    $$

    where the third equality holds because the number of sampled pairwise comparisons is $|\boldsymbol{C}_{t-1}|=\varrho$ when $t>\varrho$. If instead $c_t$ is sampled and replaces an element $r_t\in\boldsymbol{\mathcal{C}'}$, which happens with probability $(\varrho/t)\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$, one pairwise comparison from $\boldsymbol{\mathcal{C}'}$ leaves the cache and $B_t(\boldsymbol{\mathcal{C}'})$ is smaller by $t/\varrho$. Therefore, conditioned on $c_t\notin\boldsymbol{\mathcal{C}'}$, the expectation of $B_t(\boldsymbol{\mathcal{C}'})$ is

    $$
    \begin{aligned}
    &\mathbb{E}\big[B_t(\boldsymbol{\mathcal{C}'})\,\big|\,c_t\notin\boldsymbol{\mathcal{C}'}\big] \\
    ={}& \left(1-\frac{\varrho}{t}\,d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\right)\cdot\Big(B_{t-1}(\boldsymbol{\mathcal{C}'})+d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\Big)+\frac{\varrho}{t}\,d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\left(B_{t-1}(\boldsymbol{\mathcal{C}'})+d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})-\frac{t}{\varrho}\right) \\
    ={}& B_{t-1}(\boldsymbol{\mathcal{C}'}).
    \end{aligned}
    \tag{123}
    $$

    Moreover, $A_t(\boldsymbol{\mathcal{C}'})=A_{t-1}(\boldsymbol{\mathcal{C}'})$ when $c_t\notin\boldsymbol{\mathcal{C}'}$, and we deduce that

    $$
    \mathbb{E}\big[\,Z_t(\boldsymbol{\mathcal{C}'})\,\big|\,Z_0(\boldsymbol{\mathcal{C}'}),\dots,Z_{t-1}(\boldsymbol{\mathcal{C}'}),\ c_t\notin\boldsymbol{\mathcal{C}'}\,\big]=Z_{t-1}(\boldsymbol{\mathcal{C}'}).
    \tag{124}
    $$
  • Case 2.

    Now $c_t\in\boldsymbol{\mathcal{C}'}$. When $c_t$ is not added to the cache of the reservoir method, we have $|\boldsymbol{C}_t|=|\boldsymbol{C}_{t-1}|$ and $B_t(\boldsymbol{\mathcal{C}'})=B_{t-1}(\boldsymbol{\mathcal{C}'})+d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$. If $c_t$ is added to the cache and the replaced element satisfies $r_t\notin\boldsymbol{\mathcal{C}'}$, which has probability $(\varrho/t)\cdot\big(1-d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\big)$, we have

    |𝓒𝑪t|superscript𝓒bold-′subscript𝑪𝑡\displaystyle|\boldsymbol{\mathcal{C}^{\prime}}\cap\boldsymbol{C}_{t}|| bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ∩ bold_italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | =\displaystyle== |𝓒𝑪t1|+1superscript𝓒bold-′subscript𝑪𝑡11\displaystyle\ \ |\boldsymbol{\mathcal{C}^{\prime}}\cap\boldsymbol{C}_{t-1}|+1| bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ∩ bold_italic_C start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | + 1 (125)
    Bt(𝓒)subscript𝐵𝑡superscript𝓒bold-′\displaystyle B_{t}({\boldsymbol{\mathcal{C}^{\prime}}})italic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ) =\displaystyle== tϱ|𝓒𝑪t|𝑡italic-ϱsuperscript𝓒bold-′subscript𝑪𝑡\displaystyle\ \ \frac{t}{\varrho}\cdot|\boldsymbol{\mathcal{C}^{\prime}}\cap% \boldsymbol{C}_{t}|divide start_ARG italic_t end_ARG start_ARG italic_ϱ end_ARG ⋅ | bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ∩ bold_italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT |
    =\displaystyle== tϱ|𝓒𝑪t1|+tϱ𝑡italic-ϱsuperscript𝓒bold-′subscript𝑪𝑡1𝑡italic-ϱ\displaystyle\ \ \frac{t}{\varrho}\cdot|\boldsymbol{\mathcal{C}^{\prime}}\cap% \boldsymbol{C}_{t-1}|+\frac{t}{\varrho}divide start_ARG italic_t end_ARG start_ARG italic_ϱ end_ARG ⋅ | bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ∩ bold_italic_C start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | + divide start_ARG italic_t end_ARG start_ARG italic_ϱ end_ARG
    =\displaystyle== Bt1(𝓒)+d𝓒(𝑪t1)+tϱ.subscript𝐵𝑡1superscript𝓒bold-′subscript𝑑superscript𝓒bold-′subscript𝑪𝑡1𝑡italic-ϱ\displaystyle\ \ B_{t-1}({\boldsymbol{\mathcal{C}^{\prime}}})+d_{\boldsymbol{% \mathcal{C}^{\prime}}}(\boldsymbol{C}_{t-1})+\frac{t}{\varrho}.italic_B start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ( bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT ) + italic_d start_POSTSUBSCRIPT bold_caligraphic_C start_POSTSUPERSCRIPT bold_′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( bold_italic_C start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) + divide start_ARG italic_t end_ARG start_ARG italic_ϱ end_ARG .

Then the expectation of $B_t(\boldsymbol{\mathcal{C}'})$ conditioned on $c_t \in \boldsymbol{\mathcal{C}'}$ is

$$
\begin{aligned}
\mathbb{E}[B_t(\boldsymbol{\mathcal{C}'}) \mid c_t \in \boldsymbol{\mathcal{C}'}]
&= B_{t-1}(\boldsymbol{\mathcal{C}'}) + d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) + \left(\frac{\varrho}{t}\cdot\big(1 - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\big)\right)\cdot\frac{t}{\varrho} \\
&= B_{t-1}(\boldsymbol{\mathcal{C}'}) + 1.
\end{aligned}
\tag{126}
$$

Furthermore, with the definition of $A_t(\boldsymbol{\mathcal{C}'})$ when $c_t \in \boldsymbol{\mathcal{C}'}$, we know that

$$
\mathbb{E}\big[\, Z_t(\boldsymbol{\mathcal{C}'}) \,\big|\, Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_{t-1}(\boldsymbol{\mathcal{C}'}),\ c_t \in \boldsymbol{\mathcal{C}'} \,\big] = Z_{t-1}(\boldsymbol{\mathcal{C}'}).
\tag{127}
$$

The analysis of the above two cases implies that $(Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_t(\boldsymbol{\mathcal{C}'}))$ is indeed a martingale.
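The martingale property can be checked exactly for a single reservoir step. The following is a minimal sketch, assuming the standard reservoir update (the new element enters the cache with probability $\varrho/t$ and evicts a uniformly chosen cached element) and taking $Z_t = B_t - A_t$ with $B_t = (t/\varrho)\cdot|\boldsymbol{\mathcal{C}'} \cap \boldsymbol{C}_t|$. The function names and the argument `k` (the number of cached elements in $\boldsymbol{\mathcal{C}'}$ before the step) are illustrative and do not come from the paper.

```python
from fractions import Fraction


def z_increments(rho, t, k, in_set):
    """Return [(probability, Z_t - Z_{t-1})] for one reservoir step.

    rho    : cache size (assumed t > rho, so the cache is full)
    t      : index of the arriving element c_t
    k      : |C' ∩ C_{t-1}|, cached elements belonging to C'
    in_set : True if c_t ∈ C', False otherwise
    """
    rho_f, t_f = Fraction(rho), Fraction(t)
    d = Fraction(k, rho)               # density d_{C'}(C_{t-1})
    b_prev = (t_f - 1) / rho_f * k     # B_{t-1}
    a_inc = 1 if in_set else 0         # A_t - A_{t-1}
    p_ins = rho_f / t_f                # probability that c_t enters the cache
    outcomes = [
        (1 - p_ins, k),                    # c_t is not inserted
        (p_ins * d, k - 1 + a_inc),        # evicted element was in C'
        (p_ins * (1 - d), k + a_inc),      # evicted element was not in C'
    ]
    return [
        (p, t_f / rho_f * k_new - b_prev - a_inc)
        for p, k_new in outcomes
        if p > 0
    ]


def mean_increment(rho, t, k, in_set):
    """Exact conditional expectation of Z_t - Z_{t-1}."""
    return sum(p * dz for p, dz in z_increments(rho, t, k, in_set))


if __name__ == "__main__":
    for in_set in (False, True):
        print(in_set, mean_increment(rho=4, t=10, k=3, in_set=in_set))
```

Exact rational arithmetic is used so that the zero conditional mean is verified without floating-point tolerance.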

The second part of the proof bounds the difference $|Z_t(\boldsymbol{\mathcal{C}'}) - Z_{t-1}(\boldsymbol{\mathcal{C}'})|$ and the variance of $Z_t(\boldsymbol{\mathcal{C}'})$ given $Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_{t-1}(\boldsymbol{\mathcal{C}'})$. With the above analysis, we know that

$$
\begin{aligned}
A_t(\boldsymbol{\mathcal{C}'}) &=
\begin{cases}
A_{t-1}(\boldsymbol{\mathcal{C}'}), & c_t \notin \boldsymbol{\mathcal{C}'}, \\[5pt]
A_{t-1}(\boldsymbol{\mathcal{C}'}) + 1, & c_t \in \boldsymbol{\mathcal{C}'},
\end{cases} \\
B_t(\boldsymbol{\mathcal{C}'}) &\in
\begin{cases}
\left[B_{t-1}(\boldsymbol{\mathcal{C}'}),\ B_{t-1}(\boldsymbol{\mathcal{C}'}) + 1\right], & c_t \notin \boldsymbol{\mathcal{C}'}, \\[5pt]
\left[B_{t-1}(\boldsymbol{\mathcal{C}'}),\ B_{t-1}(\boldsymbol{\mathcal{C}'}) + 1 + \dfrac{t}{\varrho}\right], & c_t \in \boldsymbol{\mathcal{C}'}.
\end{cases}
\end{aligned}
\tag{128}
$$

By the definition of $Z_t(\boldsymbol{\mathcal{C}'})$, we conclude that

$$
|Z_t(\boldsymbol{\mathcal{C}'}) - Z_{t-1}(\boldsymbol{\mathcal{C}'})| \leq \frac{t}{\varrho}.
\tag{129}
$$

We next bound the variance of $Z_t(\boldsymbol{\mathcal{C}'})$ conditioned on $Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_{t-1}(\boldsymbol{\mathcal{C}'})$ and $d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$. When $c_t \notin \boldsymbol{\mathcal{C}'}$, with probability $(\varrho/t)\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$, it holds that

$$
\mathbb{E}[Z_t(\boldsymbol{\mathcal{C}'})] - Z_t(\boldsymbol{\mathcal{C}'}) = \frac{t}{\varrho} - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}).
\tag{130}
$$

Otherwise, with probability $1 - (\varrho/t)\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$, we have

$$
Z_t(\boldsymbol{\mathcal{C}'}) - \mathbb{E}[Z_t(\boldsymbol{\mathcal{C}'})] = d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}).
\tag{131}
$$

Therefore,

$$
\begin{aligned}
&\mathrm{Var}\big(\, Z_t(\boldsymbol{\mathcal{C}'}) \,\big|\, Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_{t-1}(\boldsymbol{\mathcal{C}'}),\ c_t \notin \boldsymbol{\mathcal{C}'},\ d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) \,\big) \\
&\quad= \frac{\varrho}{t}\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\cdot\left(\frac{t}{\varrho} - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\right)^{2} + \left(1 - \frac{\varrho}{t}\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})\right)\cdot d^{2}_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) \\
&\quad= \frac{t}{\varrho}\cdot d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) - d^{2}_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) \\
&\quad\leq \frac{t}{\varrho}.
\end{aligned}
\tag{132}
$$

When $c_t \in \boldsymbol{\mathcal{C}'}$, with probability $(\varrho/t)\cdot(1 - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}))$, that is, when $c_t$ enters the cache and the replaced element is not in $\boldsymbol{\mathcal{C}'}$, it holds that

$$
Z_t(\boldsymbol{\mathcal{C}'}) - \mathbb{E}[Z_t(\boldsymbol{\mathcal{C}'})] = \frac{t}{\varrho} + d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) - 1.
\tag{133}
$$

Otherwise, with probability $1 - (\varrho/t)\cdot(1 - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}))$, we have

$$
\mathbb{E}[Z_t(\boldsymbol{\mathcal{C}'})] - Z_t(\boldsymbol{\mathcal{C}'}) = 1 - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}).
\tag{134}
$$

Thus, writing $\bar{d} = 1 - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$ for brevity,

$$
\begin{aligned}
&\mathrm{Var}\big(\, Z_t(\boldsymbol{\mathcal{C}'}) \,\big|\, Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_{t-1}(\boldsymbol{\mathcal{C}'}),\ c_t \in \boldsymbol{\mathcal{C}'},\ d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1}) \,\big) \\
&\quad= \frac{\varrho}{t}\cdot\bar{d}\cdot\left(\frac{t}{\varrho} - \bar{d}\right)^{2} + \left(1 - \frac{\varrho}{t}\cdot\bar{d}\right)\cdot\bar{d}^{\,2} \\
&\quad= \frac{t}{\varrho}\cdot\bar{d} - \bar{d}^{\,2} \\
&\quad\leq \frac{t}{\varrho}.
\end{aligned}
\tag{135}
$$

(132) and (135) indicate that the conditional variance of $Z_t(\boldsymbol{\mathcal{C}'})$ is bounded by $t/\varrho$. Moreover, the bound remains intact when we remove the condition on $d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}_{t-1})$.
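The increment bound (129) and the variance bound $t/\varrho$ can also be verified numerically from first principles. The sketch below is an illustration under the same assumptions as the proof (a full cache of size $\varrho$, insertion probability $\varrho/t$, uniform eviction); it builds the one-step distribution of $Z_t - Z_{t-1}$ and computes its exact variance. The helper names are hypothetical.

```python
from fractions import Fraction


def step_distribution(rho, t, k, in_set):
    """One-step law of Z_t - Z_{t-1} as [(probability, increment)].

    k is |C' ∩ C_{t-1}| and in_set indicates whether c_t ∈ C'.
    """
    rho_f, t_f = Fraction(rho), Fraction(t)
    d = Fraction(k, rho)               # density d_{C'}(C_{t-1})
    b_prev = (t_f - 1) / rho_f * k     # B_{t-1}
    a_inc = 1 if in_set else 0         # A_t - A_{t-1}
    p_ins = rho_f / t_f                # insertion probability
    outcomes = [
        (1 - p_ins, k),                    # no insertion
        (p_ins * d, k - 1 + a_inc),        # evicted element in C'
        (p_ins * (1 - d), k + a_inc),      # evicted element not in C'
    ]
    return [
        (p, t_f / rho_f * k_new - b_prev - a_inc)
        for p, k_new in outcomes
        if p > 0
    ]


def conditional_variance(rho, t, k, in_set):
    """Exact variance of the increment given the previous state."""
    dist = step_distribution(rho, t, k, in_set)
    mean = sum(p * dz for p, dz in dist)
    return sum(p * (dz - mean) ** 2 for p, dz in dist)


def max_abs_increment(rho, t, k, in_set):
    """Largest |Z_t - Z_{t-1}| over outcomes with positive probability."""
    return max(abs(dz) for _, dz in step_distribution(rho, t, k, in_set))


if __name__ == "__main__":
    rho, t, k = 4, 10, 3
    for in_set in (False, True):
        print(in_set,
              conditional_variance(rho, t, k, in_set),
              max_abs_increment(rho, t, k, in_set),
              Fraction(t, rho))
```

Because the variance is computed directly from the transition probabilities, the check does not rely on the closed-form expressions derived above.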

Now we come to the conclusion of the whole lemma. Observe that

$$
\begin{aligned}
\mathbb{P}\big(|d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}) - d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}')| \geq \epsilon\big)
&= \mathbb{P}\big(|B_T(\boldsymbol{\mathcal{C}'}) - A_T(\boldsymbol{\mathcal{C}'})| \geq \epsilon\cdot T\big) \\
&= \mathbb{P}\big(|Z_T(\boldsymbol{\mathcal{C}'}) - Z_0(\boldsymbol{\mathcal{C}'})| \geq \epsilon\cdot T\big).
\end{aligned}
\tag{136}
$$

Then we apply Lemma 1 to the martingale $Z(\boldsymbol{\mathcal{C}'}) = (Z_0(\boldsymbol{\mathcal{C}'}), \dots, Z_T(\boldsymbol{\mathcal{C}'}))$ with $\lambda = \epsilon T$, $\sigma^2_t = t/\varrho$ for $t \geq \varrho$ and $\sigma^2_t = 0$ for $t < \varrho$, and $M = T/\varrho$:

$$
\begin{aligned}
\mathbb{P}\big(|Z_T(\boldsymbol{\mathcal{C}'}) - Z_0(\boldsymbol{\mathcal{C}'})| \geq \epsilon\cdot T\big)
&\leq 2\exp\left(-\frac{\lambda^2}{2\sum_{t=1}^{T}\sigma^2_t + \frac{\lambda M}{3}}\right) \\
&= 2\exp\left(-\frac{\epsilon^2 T^2}{2\sum_{t=1}^{T}\frac{t}{\varrho} + \frac{\epsilon T^2}{3\varrho}}\right) \\
&= 2\exp\left(-\frac{\epsilon^2 T^2 \varrho}{T(T+1) + \frac{\epsilon T^2}{3}}\right) \\
&\leq 2\exp\left(-\frac{\epsilon^2 T^2 \varrho}{2T^2}\right) = 2\exp\left(-\frac{\epsilon^2 \varrho}{2}\right).
\end{aligned}
\tag{137}
$$

Therefore, it suffices to have

$$\varrho\geq\frac{2}{\epsilon^{2}}\ln\left(\frac{2}{\delta}\right) \tag{138}$$

to obtain the desired result. ∎
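As a quick numerical illustration of condition (138), the following minimal sketch computes the smallest admissible $\varrho$ for a given accuracy $\epsilon$ and failure probability $\delta$, and checks that plugging it into the final bound $2\exp(-\epsilon^{2}\varrho/2)$ returns exactly $\delta$. The function names `min_sampling_parameter` and `tail_bound` are hypothetical and chosen only for this example.

```python
import math


def min_sampling_parameter(epsilon: float, delta: float) -> float:
    """Smallest varrho with varrho >= (2 / epsilon^2) * ln(2 / delta), as in (138)."""
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return 2.0 / epsilon**2 * math.log(2.0 / delta)


def tail_bound(epsilon: float, varrho: float) -> float:
    """Final bound of the derivation: 2 * exp(-epsilon^2 * varrho / 2)."""
    return 2.0 * math.exp(-(epsilon**2) * varrho / 2.0)


# Illustrative values only (not taken from the paper).
varrho = min_sampling_parameter(epsilon=0.1, delta=0.05)
bound = tail_bound(epsilon=0.1, varrho=varrho)
```

At the threshold value of $\varrho$ the bound equals $\delta$, and any larger $\varrho$ only tightens it.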

See 2

Proof.

For $i)$, the results of the static case have been discussed by [47, 34, 48]. Notice that $\log n(n-1)$ is the VC-dimension of $\boldsymbol{\mathcal{C}'}$. For $ii)$, we start with the Bernoulli sampling method. For any dynamic stream $\boldsymbol{C}$ from $\boldsymbol{\mathcal{C}'}$, we apply the first part of Lemma 2 with $\epsilon$ and $|\boldsymbol{\mathcal{C}'}|$ to obtain

$$P\Big(\big|d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C})-d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}')\big|\geq\epsilon\Big)\leq\frac{\delta}{|\boldsymbol{\mathcal{C}'}|}, \tag{139}$$

where $\boldsymbol{C}'$ is the sequence sampled by the Bernoulli sampling method. In the event

$$\big|d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C})-d_{\boldsymbol{\mathcal{C}'}}(\boldsymbol{C}')\big|\leq\epsilon,\quad\forall\ \boldsymbol{C}\in\boldsymbol{\mathcal{C}'}, \tag{140}$$

we know by definition that $\boldsymbol{C}'$ is an $\epsilon$-approximation of $\boldsymbol{C}$. Taking a union bound over all $\boldsymbol{C'}\in\boldsymbol{\mathcal{C}'}$, we conclude that the probability that this event fails to hold is bounded by

$$\frac{\delta}{|\boldsymbol{\mathcal{C}'}|}\cdot|\boldsymbol{\mathcal{C}'}|=\delta, \tag{141}$$

meaning that the Bernoulli method with $\varrho$ as above is $(\epsilon,\delta)$-representative.

The proof for the reservoir method is identical, except that we apply Lemma 3. ∎
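For readers less familiar with the two samplers analyzed above, here is a minimal sketch of their textbook forms: Bernoulli sampling keeps each stream element independently with a fixed probability, and reservoir sampling (Algorithm R) maintains a uniform sample of fixed size. How the parameter $\varrho$ enters each sampler is not restated in this proof, so the arguments `p` and `k` below are assumptions made for illustration, as are all function names.

```python
import random
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(stream: Iterable[T], p: float, rng: random.Random) -> List[T]:
    """Keep each element independently with probability p."""
    return [x for x in stream if rng.random() < p]


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: keep a uniform random subset of size k from the stream."""
    reservoir: List[T] = []
    for t, x in enumerate(stream, start=1):
        if len(reservoir) < k:
            reservoir.append(x)      # fill the reservoir with the first k elements
        else:
            j = rng.randrange(t)     # uniform index in {0, ..., t - 1}
            if j < k:
                reservoir[j] = x     # replace a stored element with probability k / t
    return reservoir


def density(sample: List[T], predicate: Callable[[T], bool]) -> float:
    """Fraction of sampled elements satisfying the predicate (0.0 if empty)."""
    if not sample:
        return 0.0
    return sum(1 for x in sample if predicate(x)) / len(sample)


rng = random.Random(0)
stream = list(range(1000))
kept = bernoulli_sample(stream, p=0.2, rng=rng)
res = reservoir_sample(stream, k=50, rng=rng)
```

Comparing `density(stream, f)` with `density(kept, f)` or `density(res, f)` for a predicate `f` gives an empirical feel for the $\epsilon$-approximation property that the proof bounds.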

Asymptotic Optimality of the Proposed Policy with Complete Knowledge

We discuss the asymptotic optimality of the proposed stopping time and generation rule with complete knowledge. We first describe the assumptions used in the theoretical analysis, starting with some regularity conditions on the prior distribution $\rho_{\boldsymbol{\theta}'}$. Without loss of generality, we set $\theta'_{1}=0$, so that the unknown model parameter satisfies $\boldsymbol{\theta}=[\theta_{2},\dots,\theta_{n}]\in\mathbb{R}^{n-1}$. The following assumptions have been applied in the sequential design for rank aggregation [16]. It is noteworthy that these assumptions serve mainly the theoretical analysis: the proposed sequential manipulation method does not depend on the conditions on $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$ or on the probability mass function $g$.

Assumption 1.

The support

$$\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})=\overline{\left\{\boldsymbol{\theta}\in\mathbb{R}^{n-1}\ \middle|\ \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})>0\right\}} \tag{142}$$

is a compact set, where $\overline{\{\cdot\}}$ denotes the closure of $\{\cdot\}$. Besides, for any full ranking list $\boldsymbol{\pi}_{0}$ that does not include the candidate corresponding to $\theta'_{1}$,

$$\left\{\left\{\boldsymbol{\theta}\in\mathbb{R}^{n-1}\ \middle|\ \boldsymbol{\pi}(\boldsymbol{\theta})=\boldsymbol{\pi}_{0}\right\}\cap\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})\right\}^{\circ}\neq\varnothing, \tag{143}$$

where $\{\cdot\}^{\circ}$ represents the interior of $\{\cdot\}$.

This assumption assigns a bounded support $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$ to the prior distribution $\rho_{\boldsymbol{\theta}'}$. The second part tells us that, for every full ranking list, the set of parameters in $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$ that induce this ranking has a non-empty interior.

Assumption 2.

For all $\epsilon>0$, there exists a constant $\delta>0$ such that

$$\mathcal{L}\left(\boldsymbol{B}(\boldsymbol{\theta},\epsilon)\cap\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})\right)\geq\min\{\delta\epsilon^{n-1},1\}, \tag{144}$$

where $\boldsymbol{B}(\boldsymbol{\theta},\epsilon)$ denotes the open ball centered at $\boldsymbol{\theta}$ with radius $\epsilon$, and $\mathcal{L}(\cdot)$ denotes the Lebesgue measure.

This assumption ensures that the support $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$ is non-singular.

Assumption 3.

The log probability mass function $\log g_{ij}(\boldsymbol{\theta})$ is uniformly continuously differentiable w.r.t. $\boldsymbol{\theta}$ for all pairwise comparisons $(i,j)$, that is,

$$\sup_{\substack{\boldsymbol{\theta}\in\operatorname{Supp}(\rho_{\boldsymbol{\theta}'}),\\ (i,j)\in\boldsymbol{\mathfrak{C}}}}\ \left\|\nabla_{\boldsymbol{\theta}}\log g_{ij}(\boldsymbol{\theta})\right\|<\infty. \tag{145}$$

This assumption requires smoothness of the likelihood function. The BTL model satisfies this assumption.
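To see why the BTL model fits, the following short derivation may help. It assumes the standard BTL parameterization $g_{ij}(\boldsymbol{\theta})=e^{\theta_{i}}/(e^{\theta_{i}}+e^{\theta_{j}})$, which is not restated in this appendix, and writes $\boldsymbol{e}_{i}$ for the $i$-th standard basis vector (a symbol introduced here only for illustration).

```latex
\begin{aligned}
\log g_{ij}(\boldsymbol{\theta}) &= \theta_{i}-\log\left(e^{\theta_{i}}+e^{\theta_{j}}\right),\\
\nabla_{\boldsymbol{\theta}}\log g_{ij}(\boldsymbol{\theta})
  &= \left(1-g_{ij}(\boldsymbol{\theta})\right)\left(\boldsymbol{e}_{i}-\boldsymbol{e}_{j}\right),\\
\left\|\nabla_{\boldsymbol{\theta}}\log g_{ij}(\boldsymbol{\theta})\right\|
  &\leq \sqrt{2}\,\left(1-g_{ij}(\boldsymbol{\theta})\right) < \sqrt{2}.
\end{aligned}
```

Under this parameterization the gradient norm is bounded by $\sqrt{2}$ for every $\boldsymbol{\theta}$ and every pair $(i,j)$, so the supremum in (145) is finite. Fixing $\theta'_{1}=0$ only drops one coordinate and cannot increase the norm.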

Assumption 4.

The probability mass function $g$ satisfies:

$$\min_{\substack{\boldsymbol{\theta},\tilde{\boldsymbol{\theta}}\in\operatorname{Supp}(\rho_{\boldsymbol{\theta}'}),\\ \boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\tilde{\boldsymbol{\theta}})}}\ \max_{(i,j)}\ g_{i,j}(\boldsymbol{\theta})\cdot\log\frac{g_{i,j}(\boldsymbol{\theta})}{g_{i,j}(\tilde{\boldsymbol{\theta}})}>0. \tag{146}$$

This assumption requires distinguishability between any pair of candidates in $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$, i.e., no tie exists between any pair of candidates. Assumption 4 is a standard assumption in sequential hypothesis testing, known as the “indifference zone” assumption. The “indifference zone” condition tells us that the null and alternative hypotheses are separated, in the sense that the Kullback-Leibler divergence between the two hypotheses is positive. Here the assumption excludes the case in which the true preference score lies in between the two hypotheses ($i\succ j$ and $j\succ i$). Furthermore, it means that selecting $i\succ j$ and selecting $j\succ i$ (the null and alternative hypotheses) lead to different outcomes.
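The inner quantity of (146) can be evaluated directly for two concrete score vectors. The sketch below does so under two assumptions that are not stated in this appendix: $g$ is the standard BTL probability $g_{i,j}(\boldsymbol{\theta})=e^{\theta_{i}}/(e^{\theta_{i}}+e^{\theta_{j}})$, and the maximum runs over all ordered pairs $(i,j)$ with $i\neq j$. The names `btl_prob` and `separation` are hypothetical.

```python
import math
from itertools import permutations
from typing import Sequence


def btl_prob(theta: Sequence[float], i: int, j: int) -> float:
    """BTL probability that candidate i is preferred over candidate j (assumed model)."""
    return 1.0 / (1.0 + math.exp(theta[j] - theta[i]))


def separation(theta: Sequence[float], theta_tilde: Sequence[float]) -> float:
    """max over ordered pairs (i, j) of g_ij(theta) * log(g_ij(theta) / g_ij(theta_tilde))."""
    n = len(theta)
    return max(
        btl_prob(theta, i, j)
        * math.log(btl_prob(theta, i, j) / btl_prob(theta_tilde, i, j))
        for i, j in permutations(range(n), 2)
    )


# Two illustrative score vectors that induce different rankings of candidates 1 and 2.
theta = [0.0, 1.0, 2.0]
theta_tilde = [0.0, 2.0, 1.0]
gap = separation(theta, theta_tilde)
```

When the two vectors swap the order of some pair, the term for that pair is strictly positive, so `gap > 0`; Assumption 4 asks that this gap stay bounded away from zero uniformly over the support.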

Assumption 5.

The prior distribution $\rho_{\boldsymbol{\theta}'}$ satisfies

$$\begin{aligned}
\inf_{\boldsymbol{\theta}\in\{\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})\}^{\circ}}\ \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})&>0,\\
\sup_{\boldsymbol{\theta}\in\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})}\ \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})&<\infty.
\end{aligned} \tag{147}$$

This assumption requires that the density function of the prior distribution $\rho_{\boldsymbol{\theta}'}$ is bounded away from zero on the interior of $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$ and bounded above on $\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})$.

Under Assumptions 1-5, the following theorem establishes a lower bound on the minimal Bayesian risk (45)

$$\mathfrak{R}^{*}=\inf_{\boldsymbol{\Lambda},S}\ \mathfrak{R}(\boldsymbol{\Lambda},S). \tag{45}$$
Theorem 4.

If Assumptions 1-5 hold, we have

$$\liminf_{\chi\rightarrow 0}\ \frac{\mathfrak{R}^{*}}{\chi\,\mathbb{E}\left[\tau_{\chi}(\Theta)\right]}\geq 1, \tag{148}$$

where

$$\tau_{\chi}(\boldsymbol{\theta})=|\log\chi|\cdot\left(\max_{\boldsymbol{\lambda}\in\boldsymbol{\Delta}}\ \min_{\substack{\tilde{\boldsymbol{\theta}}\in\operatorname{Supp}(\rho_{\boldsymbol{\theta}'}),\\ \boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\tilde{\boldsymbol{\theta}})}}\ \sum_{(i,j)}\lambda_{i,j}\,g_{i,j}(\boldsymbol{\theta})\cdot\log\frac{g_{i,j}(\boldsymbol{\theta})}{g_{i,j}(\tilde{\boldsymbol{\theta}})}\right)^{-1}, \tag{149}$$

and

$$\mathbb{E}\left[\tau_{\chi}(\Theta)\right]=\int_{\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})}\tau_{\chi}(\boldsymbol{\theta})\,\rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})\,d\boldsymbol{\theta}. \tag{150}$$

To avoid confusion, we write $\Theta$ when $\boldsymbol{\theta}$ is viewed as a random variable.

Proof.

For any manipulation policy $(\boldsymbol{\Lambda},S)$ and any prior probability density function $\rho_{\boldsymbol{\theta}'}$, there are only two cases:

  • $\boldsymbol{E}[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))]\geq\chi|\log\chi|^{2}.$ (151)
  • $\boldsymbol{E}[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))]<\chi|\log\chi|^{2}.$ (152)

For the first case, the Bayesian risk

$$\mathfrak{R}(\boldsymbol{\Lambda},S)=\mathbb{E}[\mathfrak{R}(S)+\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))] \tag{44}$$

satisfies

$$\mathfrak{R}(\boldsymbol{\Lambda},S)\geq\chi|\log\chi|^{2}\geq(1+o(1))\,\chi\,\mathbb{E}\left[\tau_{\chi}(\Theta)\right]. \tag{153}$$

For the second case, it is easy to see that

$$\mathfrak{R}(\boldsymbol{\Lambda},S)\geq\mathbb{E}[\mathfrak{R}(S)]=\chi\,\mathbb{E}[S]. \tag{154}$$

Therefore, proving the result is equivalent to showing that

$$\liminf_{\chi\rightarrow 0}\ \frac{\chi\,\mathbb{E}[S]}{\chi\,\mathbb{E}\left[\tau_{\chi}(\Theta)\right]}\geq 1. \tag{155}$$

In other words, for any $\delta>0$, there exists a $\chi_{0}>0$ such that when $\chi<\chi_{0}$,

$$\mathbb{E}[S]\geq(1-\delta)\,\mathbb{E}\left[\tau_{\chi}(\Theta)\right]. \tag{156}$$

For each $\delta>0$, we set

$$\tau_{\chi,\delta}(\boldsymbol{\theta})=\left(1-\frac{2}{3}\delta\right)\tau_{\chi}(\boldsymbol{\theta}), \tag{157}$$

then

$$\begin{aligned}
\mathbb{E}[S]\ \geq\ &\mathbb{E}[S\,|\,S>\tau_{\chi,\delta}(\Theta)]\\
\geq\ &\int_{\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})}\rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})\,\tau_{\chi,\delta}(\boldsymbol{\theta})\,\mathbb{P}(S>\tau_{\chi,\delta}(\Theta)\,|\,\Theta=\boldsymbol{\theta})\,d\boldsymbol{\theta}\\
=\ &\mathbb{E}\left[\tau_{\chi,\delta}(\Theta)\right]-\int_{\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})}\rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})\,\tau_{\chi,\delta}(\boldsymbol{\theta})\,\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta)\,|\,\Theta=\boldsymbol{\theta})\,d\boldsymbol{\theta}\\
\geq\ &\mathbb{E}\left[\tau_{\chi,\delta}(\Theta)\right]-\tau^{\max}_{\chi,\delta}\cdot\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta)),
\end{aligned} \tag{158}$$

where $\tau^{\max}_{\chi,\delta}$ is defined as

$$\tau^{\max}_{\chi,\delta}=\max_{\boldsymbol{\theta}\in\operatorname{Supp}(\rho_{\boldsymbol{\theta}'})}\ \tau_{\chi,\delta}(\boldsymbol{\theta}). \tag{159}$$

According to Assumption 4, we know that

$$\tau^{\max}_{\chi,\delta}=O(|\log\chi|)=O(\mathbb{E}\left[\tau_{\chi}(\Theta)\right]). \tag{160}$$

Furthermore, to prove (156), it is sufficient to show

$$\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta))=o(1). \tag{161}$$
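The sufficiency claim can be spelled out in one line. The following worked step combines the last line of (158) with the definition (157), the order estimate (160), and the target (161); it introduces no new symbols.

```latex
\begin{aligned}
\mathbb{E}[S]
  &\geq \mathbb{E}\left[\tau_{\chi,\delta}(\Theta)\right]
        -\tau^{\max}_{\chi,\delta}\cdot\mathbb{P}\left(S\leq\tau_{\chi,\delta}(\Theta)\right)\\
  &= \left(1-\tfrac{2}{3}\delta\right)\mathbb{E}\left[\tau_{\chi}(\Theta)\right]
        -O\left(\mathbb{E}\left[\tau_{\chi}(\Theta)\right]\right)\cdot o(1)\\
  &= \left(1-\tfrac{2}{3}\delta-o(1)\right)\mathbb{E}\left[\tau_{\chi}(\Theta)\right]\\
  &\geq (1-\delta)\,\mathbb{E}\left[\tau_{\chi}(\Theta)\right]
        \quad\text{for all sufficiently small } \chi .
\end{aligned}
```

The last inequality holds as soon as the $o(1)$ term drops below $\delta/3$, which is what determines the threshold $\chi_{0}$ in (156).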

The next step is to establish an upper bound for $\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta))$. Given a full ranking list $\boldsymbol{\pi}_{0}$, we write

$$\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}=\{\boldsymbol{\theta}\ |\ \boldsymbol{\pi}(\boldsymbol{\theta})=\boldsymbol{\pi}_{0}\} \tag{162}$$

for the set of all preference scores that generate the full ranking $\boldsymbol{\pi}_{0}$. Then

$$\begin{aligned}
\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta))\ =\ &\sum_{\boldsymbol{\pi}_{0}}\ \mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\ \Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})\\
=\ &O(1)\cdot\max_{\boldsymbol{\pi}_{0}}\ \mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\ \Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}).
\end{aligned} \tag{163}$$

Then we derive an upper bound on $\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\ \Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})$ for any $\boldsymbol{\pi}_{0}$. Define an event $\boldsymbol{E}_{\boldsymbol{\pi}_{0}}$:

$$\boldsymbol{E}_{\boldsymbol{\pi}_{0}}=\left\{\frac{\mathbb{P}(\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}\,|\,\mathcal{F}_{S})}{\displaystyle\max_{(i,j):\ \boldsymbol{\Theta}_{i,j}\cap\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}=\varnothing}\ \mathbb{P}(\Theta\in\boldsymbol{\Theta}_{i,j}\,|\,\mathcal{F}_{S})}>\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\right\}, \tag{164}$$

where $\mathcal{F}_{S}=\sigma(c_{1},\dots,c_{S})$ denotes the $\sigma$-algebra generated by $c_{1},\dots,c_{S}$, $\epsilon>0$ is a constant, and $\boldsymbol{\Theta}_{i,j}$ is defined as

$$\boldsymbol{\Theta}_{i,j}=\big\{\boldsymbol{\theta}\in\mathbb{R}^{n}_{+}\ |\ \theta_{i}\geq\theta_{j}\big\}\cap\operatorname{Supp}(\rho_{\boldsymbol{\theta}'}). \tag{52}$$

For the probability $\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})$, we have

\begin{equation}
\begin{aligned}
&\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})\\
=\ &\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\,\boldsymbol{E}_{\boldsymbol{\pi}_{0}})+\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\,\boldsymbol{E}^{c}_{\boldsymbol{\pi}_{0}}),
\end{aligned}
\tag{165}
\end{equation}

where $\boldsymbol{E}^{c}_{\boldsymbol{\pi}_{0}}$ is the complement of $\boldsymbol{E}_{\boldsymbol{\pi}_{0}}$. Dropping the event $\{S\leq\tau_{\chi,\delta}(\Theta)\}$ from the second term, $\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})$ can be further bounded by

\begin{equation}
\begin{aligned}
&\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})\\
\leq\ &\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\,\boldsymbol{E}_{\boldsymbol{\pi}_{0}})+\mathbb{P}(\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\,\boldsymbol{E}^{c}_{\boldsymbol{\pi}_{0}}).
\end{aligned}
\tag{166}
\end{equation}

The first term $\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\,\boldsymbol{E}_{\boldsymbol{\pi}_{0}})$ equals

\begin{equation}
\mathbb{P}(S\leq\tau_{\chi,\delta}(\Theta),\,\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\,\boldsymbol{E}_{\boldsymbol{\pi}_{0}})=\int_{\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}}\mathbb{P}(S\leq\tau_{\chi,\delta}(\boldsymbol{\theta}),\,\boldsymbol{E}_{\boldsymbol{\pi}_{0}}\mid\Theta=\boldsymbol{\theta})\,\rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})\,d\boldsymbol{\theta}.
\tag{167}
\end{equation}

Notice that the intersection of $\{S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})\}$ and $\boldsymbol{E}_{\boldsymbol{\pi}_{0}}$ satisfies

\begin{equation}
\{S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})\}\cap\boldsymbol{E}_{\boldsymbol{\pi}_{0}}\subset\left\{\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \frac{\mathbb{P}(\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}\mid\mathcal{F}_{S})}{\underset{(i,j):\,\boldsymbol{\Theta}_{i,j}\cap\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}=\varnothing}{\textbf{max}}\ \mathbb{P}(\Theta\in\boldsymbol{\Theta}_{i,j}\mid\mathcal{F}_{S})}>\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\right\}.
\tag{168}
\end{equation}

Then

\begin{equation}
\mathbb{P}(S\leq\tau_{\chi,\delta}(\boldsymbol{\theta}),\,\boldsymbol{E}_{\boldsymbol{\pi}_{0}}\mid\Theta=\boldsymbol{\theta})\leq\mathbb{P}\left(\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \frac{\mathbb{P}(\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}\mid\mathcal{F}_{S})}{\underset{(i,j):\,\boldsymbol{\Theta}_{i,j}\cap\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}=\varnothing}{\textbf{max}}\ \mathbb{P}(\Theta\in\boldsymbol{\Theta}_{i,j}\mid\mathcal{F}_{S})}>\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right).
\tag{169}
\end{equation}

The next step is to bound the right-hand side of this inequality by a probability involving a martingale parameterized by $\boldsymbol{\theta}$. The corresponding result is given by Lemma 3 in [16].

Lemma 4.

Suppose that $\mathcal{M}_{S}(\boldsymbol{\theta}'')$ is a martingale w.r.t. the filtration $\{\mathcal{F}_{S}:S\geq 1\}$ and probability measure $\mathbb{P}(\cdot\mid\Theta=\boldsymbol{\theta})$ for any $\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}$,

\begin{equation}
\begin{aligned}
\mathcal{M}_{S}(\boldsymbol{\theta}'')=\ &\ell_{S}(\boldsymbol{\theta}'')-\ell_{S}(\boldsymbol{\theta}^{*}_{S})-\sum_{s=1}^{S}\sum_{(i,j)}\lambda_{i,j}^{(s)}g_{i,j}(\boldsymbol{\theta})\cdot\textbf{log}\frac{g_{i,j}(\boldsymbol{\theta})}{g_{i,j}(\boldsymbol{\theta}^{*}_{S})}\\
&+\sum_{s=1}^{S}\sum_{(i,j)}\lambda_{i,j}^{(s)}g_{i,j}(\boldsymbol{\theta})\cdot\textbf{log}\frac{g_{i,j}(\boldsymbol{\theta})}{g_{i,j}(\boldsymbol{\theta}'')},
\end{aligned}
\tag{170}
\end{equation}

where

\begin{equation}
\ell_{S}(\boldsymbol{\theta})=\textbf{log}\ \prod_{s=1}^{S}g_{ij}(\boldsymbol{\theta}),
\tag{171}
\end{equation}

and

\begin{equation}
\boldsymbol{\theta}^{*}_{S}\in\underset{\substack{\tilde{\boldsymbol{\theta}}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'}),\\ \boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\tilde{\boldsymbol{\theta}})}}{\textbf{arg min}}\ \sum_{s=1}^{S}\sum_{(i,j)}\lambda^{(s)}_{i,j}g_{i,j}(\boldsymbol{\theta})\,\textbf{log}\frac{g_{i,j}(\boldsymbol{\theta})}{g_{i,j}(\tilde{\boldsymbol{\theta}})}.
\tag{172}
\end{equation}

There exists a $\chi_{0}>0$ such that when $\chi$ satisfies $0<\chi<\chi_{0}$, it holds that

\begin{equation}
\begin{aligned}
&\mathbb{P}\left(\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \frac{\mathbb{P}(\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}\mid\mathcal{F}_{S})}{\underset{(i,j):\,\boldsymbol{\Theta}_{i,j}\cap\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}=\varnothing}{\textbf{max}}\ \mathbb{P}(\Theta\in\boldsymbol{\Theta}_{i,j}\mid\mathcal{F}_{S})}>\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right)\\
\leq\ &\mathbb{P}\left(\underset{\substack{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta}),\\ \boldsymbol{\theta}''\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}}}{\textbf{max}}\ \mathcal{M}_{S}(\boldsymbol{\theta}'')\geq\frac{\delta}{2}|\textbf{log}\ \chi|\ \Bigg|\ \Theta=\boldsymbol{\theta}\right).
\end{aligned}
\tag{173}
\end{equation}
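To make the weighted log-ratio sums in (170) and (172) concrete, the following minimal sketch evaluates $\sum_{(i,j)}\lambda_{i,j}\,g_{i,j}(\boldsymbol{\theta})\,\textbf{log}\frac{g_{i,j}(\boldsymbol{\theta})}{g_{i,j}(\tilde{\boldsymbol{\theta}})}$ for a single round $s$. It assumes a Bradley-Terry-Luce link $g_{i,j}(\boldsymbol{\theta})=\theta_{i}/(\theta_{i}+\theta_{j})$ and symmetric weights $\lambda_{i,j}=\lambda_{j,i}$; both choices, as well as the function names `btl_prob` and `weighted_kl`, are illustrative assumptions and are not taken from the lemma.

```python
import numpy as np

def btl_prob(theta):
    """Assumed link: g[i, j] = theta_i / (theta_i + theta_j) (Bradley-Terry-Luce)."""
    theta = np.asarray(theta, dtype=float)
    return theta[:, None] / (theta[:, None] + theta[None, :])

def weighted_kl(theta, theta_tilde, lam):
    """Sum over ordered pairs i != j of lam[i, j] * g_ij(theta) * log(g_ij(theta) / g_ij(theta_tilde))."""
    g, g_tilde = btl_prob(theta), btl_prob(theta_tilde)
    lam = np.asarray(lam, dtype=float)
    off_diag = ~np.eye(len(g), dtype=bool)  # exclude self-comparisons (i, i)
    return float(np.sum(lam[off_diag] * g[off_diag] * np.log(g[off_diag] / g_tilde[off_diag])))

theta = np.array([3.0, 2.0, 1.0])          # true scores, ranking 1 > 2 > 3
theta_swapped = np.array([2.0, 3.0, 1.0])  # alternative with a different ranking
lam = np.full((3, 3), 1.0 / 6.0)           # uniform, symmetric pair weights (assumption)

print(weighted_kl(theta, theta, lam))          # zero: identical comparison probabilities
print(weighted_kl(theta, theta_swapped, lam))  # positive: the two rankings are distinguishable
```

With symmetric weights, the terms for $(i,j)$ and $(j,i)$ combine into a Kullback-Leibler divergence between two Bernoulli distributions, so the sum is non-negative and vanishes only when the two parameters induce the same comparison probabilities.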

By Lemma 4, an upper bound on (169) follows from an upper bound on the right-hand side of (173). It is noteworthy that the right-hand side of (173) is the probability that a stochastic process, indexed by $\boldsymbol{\theta}''$ and $S$, exceeds a certain level ($\delta|\textbf{log}\ \chi|/2$).

To handle such level-crossing probabilities, we introduce the Azuma-Hoeffding inequality [3, 28] (Lemma 5) and derive the level-crossing probability by aggregating the marginal tail bounds of a random field (Lemma 6).

Lemma 5 (Azuma-Hoeffding Inequality).

Let $\mathcal{M}_{S}$ be a martingale w.r.t. the filtration $\{\mathcal{F}_{S}:S\geq 1\}$ and $\Delta\mathcal{M}_{S}=\mathcal{M}_{S}-\mathcal{M}_{S-1}$. Suppose that $\Delta\mathcal{M}_{S}\in[a_{S},b_{S}]$, where $a_{S}$ and $b_{S}$ are deterministic constants. Then, for each $S>0$, we have

\begin{equation}
\mathbb{P}\left(\underset{1\leq s\leq S}{\textbf{max}}\ \mathcal{M}_{s}\geq S_{0}\right)\leq\textbf{exp}\left(-\frac{2S_{0}^{2}}{\sum_{s=1}^{S}(b_{s}-a_{s})^{2}}\right).
\tag{174}
\end{equation}
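As a numerical sanity check on Lemma 5, the sketch below compares a Monte Carlo estimate of the level-crossing probability with the bound in (174). It uses a simple $\pm 1$ random walk, whose increments lie in $[-1,1]$ so that $b_{s}-a_{s}=2$. The random walk, the trial count, and the function names are illustrative assumptions; this is not the martingale $\mathcal{M}_{S}(\boldsymbol{\theta}'')$ of Lemma 4.

```python
import numpy as np

def azuma_hoeffding_bound(level, widths):
    """Right-hand side of (174): exp(-2 * level^2 / sum_s (b_s - a_s)^2)."""
    widths = np.asarray(widths, dtype=float)
    return float(np.exp(-2.0 * level**2 / np.sum(widths**2)))

def empirical_crossing_prob(n_steps, level, n_trials=20000, seed=0):
    """Monte Carlo estimate of P(max_{1<=s<=S} M_s >= level) for a +/-1 random walk."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1.0, 1.0], size=(n_trials, n_steps))  # martingale differences
    running_max = np.cumsum(steps, axis=1).max(axis=1)         # max over s of M_s, per trial
    return float(np.mean(running_max >= level))

n_steps, level = 100, 25.0
bound = azuma_hoeffding_bound(level, widths=np.full(n_steps, 2.0))
estimate = empirical_crossing_prob(n_steps, level)
print(f"empirical crossing probability = {estimate:.4f}, Azuma-Hoeffding bound = {bound:.4f}")
```

The estimate should fall below the bound; the gap reflects that the inequality only uses the range of the increments, not their distribution.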
Lemma 6.

Let $\{\zeta(\boldsymbol{\theta}):\boldsymbol{\theta}\in\boldsymbol{\Theta}\}$ be a random field over a compact set $\boldsymbol{\Theta}\subset\mathbb{R}^{n}$ that satisfies Assumption 2, where $\zeta(\cdot)$ has a continuous sample path almost surely under a probability measure $\mathbb{P}$. Moreover, $\zeta(\cdot)$ has a Lipschitz-continuous sample path in the sense that there exists a constant $\kappa$ such that for all $\boldsymbol{\theta},\boldsymbol{\theta}'\in\boldsymbol{\Theta}$,

\begin{equation}
\big|\zeta(\boldsymbol{\theta})-\zeta(\boldsymbol{\theta}')\big|\leq\kappa\|\boldsymbol{\theta}-\boldsymbol{\theta}'\|
\tag{175}
\end{equation}

almost surely under $\mathbb{P}$. Define

\begin{equation}
\beta(\boldsymbol{\theta},b)=\mathbb{P}\big(\zeta(\boldsymbol{\theta})\geq b\big).
\tag{176}
\end{equation}

Then, for all $\gamma>0$, it holds that

\begin{equation}
\mathbb{P}\left(\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}}{\textbf{max}}\ \zeta(\boldsymbol{\theta})\geq b\right)\leq\frac{\kappa^{n-1}_{L}}{\gamma^{n-1}\delta_{b}}\times\int_{\boldsymbol{\Theta}}\beta(\boldsymbol{\theta},b-\gamma)\,d\boldsymbol{\theta},
\tag{177}
\end{equation}

where $\delta_{b}$ is the constant in Assumption 2.
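The mechanism behind bounds of this type can be illustrated in one dimension. If a $\kappa$-Lipschitz sample path on $[0,1]$ reaches the level $b$ at some point, it stays above $b-\gamma$ on a set of length at least $\min(\gamma/\kappa,1)$ around that point; taking expectations gives $\mathbb{P}(\textbf{max}\ \zeta\geq b)\leq\int_{0}^{1}\beta(\theta,b-\gamma)\,d\theta\,/\min(\gamma/\kappa,1)$. The sketch below checks this simplified one-dimensional analogue by simulation. The toy field, the parameter values, and the constant are assumptions made for illustration; this is not the statement of Lemma 6, whose constant depends on Assumption 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_grid = 5000, 201
grid = np.linspace(0.0, 1.0, n_grid)

# Toy random field (assumption): zeta(theta) = A * (1 - |theta - U|) on [0, 1],
# with A, U ~ Uniform(0, 1). Every sample path is Lipschitz with constant A <= 1.
A = rng.uniform(0.0, 1.0, size=(n_samples, 1))
U = rng.uniform(0.0, 1.0, size=(n_samples, 1))
zeta = A * (1.0 - np.abs(grid[None, :] - U))  # shape (n_samples, n_grid)

kappa, b, gamma = 1.0, 0.9, 0.5

# Left-hand side: probability that the path maximum reaches level b.
crossing_prob = float(np.mean(zeta.max(axis=1) >= b))

# Right-hand side: marginal tails beta(theta, b - gamma), averaged over the grid
# as a Riemann approximation of the integral over [0, 1].
beta = np.mean(zeta >= b - gamma, axis=0)
beta_integral = float(np.mean(beta))
bound = beta_integral / min(gamma / kappa, 1.0)

print(f"P(max zeta >= b) ~ {crossing_prob:.3f}, aggregated marginal bound ~ {bound:.3f}")
```

The bound is loose here, but it only requires the marginal tail probabilities $\beta(\theta,b-\gamma)$, which is what makes the pointwise Azuma-Hoeffding bound usable for a maximum over a continuum of parameters.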

With Lemma 5, we set

\begin{equation}
\begin{aligned}
\mathbb{P}\ &=\ \mathbb{P}(\cdot\mid\Theta=\boldsymbol{\theta}),\\
S\ &=\ \tau_{\chi,\delta}(\boldsymbol{\theta}),\\
S_{0}\ &=\ \frac{\delta}{2}|\textbf{log}\ \chi|-1,\\
\mathcal{M}_{S}\ &=\ \mathcal{M}_{S}(\boldsymbol{\theta}''),\\
a_{S}\ =\ b_{S}\ &=\ 2\ \underset{\substack{\boldsymbol{\theta}\in\textbf{Supp}(\rho_{\boldsymbol{\theta}'}),\\ (i,j)\in\mathfrak{C}}}{\textbf{max}}\ |\textbf{log}\ g_{i,j}(\boldsymbol{\theta})|,
\end{aligned}
\tag{178}
\end{equation}

and for each $\boldsymbol{\theta}''$, we have

\begin{equation}
\mathbb{P}\left(\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \mathcal{M}_{S}(\boldsymbol{\theta}'')\geq\frac{\delta}{2}|\textbf{log}\ \chi|-1\ \Bigg|\ \Theta=\boldsymbol{\theta}\right)\leq\textbf{exp}\left(-\frac{2\left(\frac{\delta}{2}|\textbf{log}\ \chi|-1\right)^{2}}{\tau_{\chi,\delta}(\boldsymbol{\theta})\cdot a_{1}^{2}}\right).
\tag{179}
\end{equation}

By Assumptions 1 and 3, $a_{1}<\infty$ holds and

\begin{equation}
\mathbb{P}\left(\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \mathcal{M}_{S}(\boldsymbol{\theta}'')\geq\frac{\delta}{2}|\textbf{log}\ \chi|-1\ \Bigg|\ \Theta=\boldsymbol{\theta}\right)\leq\textbf{exp}\left(-\Omega(\delta^{2}|\textbf{log}\ \chi|)\right),
\tag{180}
\end{equation}

where $\Omega(\cdot)$ denotes a quantity of the same order. Notice that

\begin{equation}
\begin{aligned}
&\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \mathcal{M}_{S}(\boldsymbol{\theta}'')-\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ \mathcal{M}_{S}(\boldsymbol{\theta}''')\\
\leq\ &\underset{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta})}{\textbf{max}}\ |\mathcal{M}_{S}(\boldsymbol{\theta}'')-\mathcal{M}_{S}(\boldsymbol{\theta}''')|\\
\leq\ &\tau_{\chi,\delta}(\boldsymbol{\theta})\cdot\kappa_{0}\cdot\|\boldsymbol{\theta}''-\boldsymbol{\theta}'''\|,\qquad\forall\ \boldsymbol{\theta}'',\boldsymbol{\theta}'''\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},
\end{aligned}
\tag{181}
\end{equation}

where

$$
\kappa_{0} = 4 \max_{\substack{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'}), \\ (i,j)\in\mathfrak{C}}} \big|\nabla \log g_{ij}(\boldsymbol{\theta})\big| < \infty
\tag{182}
$$

is the Lipschitz constant of $\mathcal{M}_{1}(\boldsymbol{\theta}'')$. Consequently, $\mathcal{M}_{S}(\boldsymbol{\theta}'')$ is a Lipschitz-continuous random field w.r.t. $\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}$. Then adding the constraint $\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}$ into (180) and adopting Lemma 6 give

$$
\begin{aligned}
& \mathbb{P}\left( \max_{\substack{1\leq S\leq\tau_{\chi,\delta}(\boldsymbol{\theta}), \\ \boldsymbol{\theta}''\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}}} \mathcal{M}_{S}(\boldsymbol{\theta}'') \geq \frac{\delta}{2}|\log\chi| - 1 \ \Bigg|\ \Theta=\boldsymbol{\theta} \right) \\
\leq\ & \exp\left(-\Omega(\delta^{2}|\log\chi|)\right)\cdot\mathcal{L}(\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}})\cdot\frac{\tau_{\chi,\delta}(\boldsymbol{\theta})^{n-1}\kappa_{0}^{n-1}}{\delta_{b}} \\
=\ & \exp\left(-\Omega(\delta^{2}|\log\chi|)\right)\cdot O(|\log\chi|^{n-1}).
\end{aligned}
\tag{183}
$$

Combining (167), (169), (173) and (183), we have

$$
\mathbb{P}\big(S\leq\tau_{\chi,\delta}(\Theta),\ \Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\ \boldsymbol{E}_{\boldsymbol{\pi}_{0}}\big) \leq \exp\left(-\Omega(\delta^{2}|\log\chi|)\right)\cdot O(|\log\chi|^{n-1}).
\tag{184}
$$

We have thus established an upper bound for the first term on the right-hand side of (166). To bound the second term, we adopt the following lemma [16].

Lemma 7.

For every full ranking $\boldsymbol{\pi}_{0}$ involving $n$ candidates generated by $(\boldsymbol{\Lambda},S)$, the corresponding risk is

$$
\mathfrak{R}_{\Theta}(\boldsymbol{R}(\boldsymbol{\Lambda},S)) = \sum_{(i,j)\in\boldsymbol{\mathfrak{C}}} \mathbbm{I}[\theta'_{i}<\theta'_{j}]\, r_{i,j} + \mathbbm{I}[\theta'_{i}>\theta'_{j}]\,(1-r_{i,j}).
\tag{43}
$$

If

$$
\mathbb{E}[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))] \leq \epsilon,
\tag{185}
$$

we have

$$
\mathbb{P}\big(\Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}},\ \boldsymbol{E}^{c}_{\boldsymbol{\pi}_{0}}\big) \leq \left(1+\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\right)\epsilon.
\tag{186}
$$
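
To make the risk in (43) concrete, the following Python sketch evaluates it on a three-candidate example. It rests on two assumptions that are not stated in this passage: the comparison set $\boldsymbol{\mathfrak{C}}$ contains every pair $i<j$, and $r_{i,j}=1$ means that the aggregated ranking places candidate $i$ ahead of candidate $j$. The function name `pairwise_risk` and the sample data are hypothetical.

```python
from itertools import combinations


def pairwise_risk(theta_prime, r):
    """Evaluate the sum in (43) for one score vector and one set of decisions.

    theta_prime: list of scores; theta_prime[i] is the score of candidate i.
    r: dict mapping (i, j) with i < j to r_ij in [0, 1]. Assumed meaning:
       r_ij = 1 if candidate i is placed ahead of candidate j.
    """
    total = 0.0
    for i, j in combinations(range(len(theta_prime)), 2):
        r_ij = r[(i, j)]
        if theta_prime[i] < theta_prime[j]:
            total += r_ij          # counted when i is placed ahead of j
        elif theta_prime[i] > theta_prime[j]:
            total += 1.0 - r_ij    # counted when j is placed ahead of i
    return total


theta_prime = [0.3, 1.2, -0.5]                  # hypothetical scores
r_agree = {(0, 1): 0, (0, 2): 1, (1, 2): 1}     # decisions consistent with the scores
r_flip = {(0, 1): 1, (0, 2): 1, (1, 2): 1}      # one pair reversed

print(pairwise_risk(theta_prime, r_agree))  # 0.0
print(pairwise_risk(theta_prime, r_flip))   # 1.0
```

Under these assumptions the risk counts the pairs on which the decisions disagree with the order of the scores, which is why reversing a single pair raises it from 0 to 1.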

Consequently, (166) is bounded by

$$
\mathbb{P}\big(S\leq\tau_{\chi,\delta}(\Theta),\ \Theta\in\boldsymbol{\Theta}_{\boldsymbol{\pi}_{0}}\big) \leq \exp\left(-\Omega(\delta^{2}|\log\chi|)\right)\cdot O(|\log\chi|^{n-1}) + \left(1+\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\right)\epsilon.
\tag{187}
$$

Returning to (163), it holds that

$$
\mathbb{P}\big(S\leq\tau_{\chi,\delta}(\Theta)\big) \leq O(1)\times\left\{ \exp\left(-\Omega(\delta^{2}|\log\chi|)\right)\cdot O(|\log\chi|^{n-1}) + \left(1+\frac{\chi^{\frac{\delta}{10}}}{\epsilon}\right)\epsilon \right\}.
\tag{188}
$$

Therefore, when $\chi\rightarrow 0$,

$$
\mathbb{P}\big(S\leq\tau_{\chi,\delta}(\Theta)\big) = o(1).
\tag{189}
$$
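
The passage from (188) to (189) can be unpacked term by term. In the sketch below, $c>0$ is a hypothetical name for the constant hidden in $\Omega(\cdot)$, $0<\chi<1$ so that $|\log\chi|=-\log\chi$, and $\epsilon\rightarrow 0$ as $\chi\rightarrow 0$ is assumed (for example $\epsilon=O(\chi)$).

```latex
\begin{align*}
\exp\!\big(-c\,\delta^{2}|\log\chi|\big)\cdot|\log\chi|^{n-1}
  &= \chi^{c\delta^{2}}\,|\log\chi|^{n-1}
  \;\xrightarrow[\chi\to 0]{}\; 0, \\
\Big(1+\frac{\chi^{\delta/10}}{\epsilon}\Big)\,\epsilon
  &= \epsilon+\chi^{\delta/10}
  \;\xrightarrow[\chi\to 0]{}\; 0 .
\end{align*}
```

The first limit holds because any positive power of $\chi$ decays faster than any fixed power of $|\log\chi|$ grows; the second is immediate once $\epsilon$ vanishes with $\chi$.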

This completes the proof. ∎

By the definition of asymptotic optimality of the manipulation policy $(\boldsymbol{\Lambda},S)$,

$$
\underset{\chi\rightarrow 0}{\inf}\ \frac{\mathfrak{R}(\boldsymbol{\Lambda},S)}{\mathfrak{R}^{*}} = 1,
\tag{46}
$$

we know that $(\boldsymbol{\Lambda},S)$ is asymptotically optimal when $\chi\rightarrow 0$ if

$$
\mathfrak{R}(\boldsymbol{\Lambda},S) = (1+o(1))\cdot\mathfrak{R}^{*}.
\tag{190}
$$

Theorem 4 tells us that showing

$$
\mathfrak{R}(\boldsymbol{\Lambda},S) = (1+o(1))\cdot\mathfrak{R}^{*} = (1+o(1))\,\chi\,\mathbb{E}_{\boldsymbol{\theta}\sim\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})}\left[\tau_{\chi}(\boldsymbol{\theta})\right]
\tag{191}
$$

as $\chi\rightarrow 0$ is sufficient to establish the asymptotic optimality of $(\boldsymbol{\Lambda},S)$. To prove the asymptotic optimality, with complete knowledge, of the proposed stopping time (49) and the generation rule solved by (54), we require the model to be identifiable so that the MLE can be adopted to obtain the preference scores. It is worth noting that the BTL model satisfies this requirement.
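
As an illustration of the BTL model mentioned above, the following Python sketch assumes its standard exponential-score form $g_{ij}(\boldsymbol{\theta}) = e^{\theta_i}/(e^{\theta_i}+e^{\theta_j})$. It checks two properties: the comparison probabilities do not change when every score is shifted by the same constant, so identifiability is understood after fixing a normalisation (for instance, scores summing to zero), and the gradient of $\log g_{ij}$ has Euclidean norm at most $\sqrt{2}$, which is consistent with a finite constant of the kind defined in (182). The helper names and the sample scores are hypothetical.

```python
import math


def btl_prob(theta, i, j):
    """BTL probability that candidate i beats candidate j."""
    return 1.0 / (1.0 + math.exp(theta[j] - theta[i]))


def grad_log_btl(theta, i, j):
    """Gradient of log g_ij(theta) with respect to theta.

    Since log g_ij = theta_i - log(exp(theta_i) + exp(theta_j)), only the
    entries i and j are nonzero: +g_ji at position i and -g_ji at position j.
    """
    g_ji = btl_prob(theta, j, i)
    grad = [0.0] * len(theta)
    grad[i] = g_ji
    grad[j] = -g_ji
    return grad


theta = [0.5, -1.0, 2.0]               # hypothetical preference scores
shifted = [t + 3.0 for t in theta]     # same scores shifted by a constant

p = btl_prob(theta, 0, 1)
p_shifted = btl_prob(shifted, 0, 1)
grad_norm = math.sqrt(sum(x * x for x in grad_log_btl(theta, 0, 1)))

print(math.isclose(p, p_shifted))      # shift invariance of the probabilities
print(grad_norm <= math.sqrt(2.0))     # bounded gradient of log g_ij
```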

The following theorem provides the asymptotic upper bounds for the expected Kendall tau of the proposed manipulation policy with complete information.

Theorem 5.

Consider the proposed stopping time (49) and the generation rule solved by (54) with complete knowledge, and let the exploration probability $p\in(0,1)$ satisfy

$$
p \propto |\log\chi|^{-\frac{1}{2}+\delta_{0}}
\tag{192}
$$

for some $\delta_{0}\in\big(0,\frac{1}{2}\big)$. Then we have

$$
\mathbb{E}[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S))] = O(\chi).
\tag{193}
$$
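
The schedule in (192) can be sketched numerically as follows. The proportionality constant `scale`, the default `delta0`, and the clipping that keeps the value below 1 are hypothetical choices made only for illustration; the theorem fixes the exponent, not these details.

```python
import math


def exploration_probability(chi, delta0=0.25, scale=1.0):
    """Return p proportional to |log chi|^(-1/2 + delta0), kept inside (0, 1).

    chi:    cost parameter, assumed to lie in (0, 1).
    delta0: exponent offset, required to lie in (0, 1/2).
    scale:  hypothetical proportionality constant.
    """
    if not 0.0 < chi < 1.0:
        raise ValueError("chi must lie in (0, 1)")
    if not 0.0 < delta0 < 0.5:
        raise ValueError("delta0 must lie in (0, 1/2)")
    p = scale * abs(math.log(chi)) ** (-0.5 + delta0)
    return min(p, 1.0 - 1e-12)  # clip so that p stays strictly below 1


# The probability shrinks as chi decreases, but only polynomially in |log chi|.
for chi in (1e-2, 1e-4, 1e-8):
    print(chi, exploration_probability(chi))
```

Because the exponent $-\frac{1}{2}+\delta_0$ is negative, $p$ tends to zero as $\chi\rightarrow 0$, while $p\,|\log\chi|^{1/2}=|\log\chi|^{\delta_0}$ still diverges.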
Proof.

We first discuss the second stopping time $S_{2}$. By the generation rule solved through (54), the expected Kendall tau at $S_{2}$ is

$$
\begin{aligned}
\mathbb{E}[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S_{2}))]
=\ & \mathbb{E}\sum_{(i,j)\in\boldsymbol{\mathfrak{C}}} \mathbbm{I}[\theta'_{i}<\theta'_{j}]\, r_{i,j} \\
=\ & \int_{\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \sum_{(i,j):\,\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}} \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}_{1}) > \sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}} \ell_{S_{2}}(\boldsymbol{\theta}_{2}) \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\} \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})\, d\boldsymbol{\theta} \\
=\ & \int_{\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \sum_{(i,j):\,\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}} \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}_{1}) - \sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}} \ell_{S_{2}}(\boldsymbol{\theta}_{2}) > z_{\alpha}(\chi) \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\} \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta})\, d\boldsymbol{\theta},
\end{aligned}
\tag{194}
$$

where $\boldsymbol{R}(\boldsymbol{\Lambda},S)$, $r_{i,j}$, $\boldsymbol{\Theta}_{i,j}$, $z_{\alpha}(\chi)$ and $\ell_{S}(\boldsymbol{\theta})$ are defined in (41), (42), (52), (50) and (171), respectively. The above expression can be further bounded by

$$
\begin{aligned}
& \mathbb{E}[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda},S_{2}))] \\
\leq\ & \frac{n(n-1)}{2}\cdot\mathcal{L}\big(\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})\big)\times \sup_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta}) \\
& \times \sup_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})}\ \max_{(i,j):\,\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}} \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}'') - \ell_{S_{2}}(\boldsymbol{\theta}) > z_{\alpha}(\chi) \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\}.
\end{aligned}
\tag{195}
$$

Here the inequality holds as

$$
\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}_{1}) \geq \ell_{S_{2}}(\boldsymbol{\theta}_{2}), \quad \text{if}\ \ \boldsymbol{\theta}_{2}\notin\boldsymbol{\Theta}_{i,j}
\tag{196}
$$

and

$$
\sup_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta}) < \infty
\tag{197}
$$

by Assumption (5). The last factor of (195) can be decomposed as

$$
\begin{aligned}
& \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}'') - \ell_{S_{2}}(\boldsymbol{\theta}) > z_{\alpha}(\chi) \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\} \\
\leq\ & \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}'') - \ell_{S_{2}}(\boldsymbol{\theta}) > z_{\alpha}(\chi)\ \text{and}\ S_{2}\leq\tau \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\} + \mathbb{P}\left\{ S_{2}\geq\tau \ \Big|\ \Theta=\boldsymbol{\theta} \right\}.
\end{aligned}
\tag{198}
$$

For the first term on the right-hand side above, we introduce $S_{2}\wedge\tau = \min(S_{2},\tau)$ and write

$$
\begin{aligned}
& \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}}(\boldsymbol{\theta}'') - \ell_{S_{2}}(\boldsymbol{\theta}) > z_{\alpha}(\chi)\ \text{and}\ S_{2}\leq\tau \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\} \\
\leq\ & \mathbb{P}\left\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_{2}\wedge\tau}(\boldsymbol{\theta}'') - \ell_{S_{2}\wedge\tau}(\boldsymbol{\theta}) > z_{\alpha}(\chi) \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\}.
\end{aligned}
\tag{199}
$$
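
The inequality in (199) is an event inclusion, which the following worked step spells out. The shorthand $A_{T}$ is a hypothetical name introduced here for the exceedance event evaluated at a time $T$.

```latex
\begin{align*}
A_{T} &:= \Big\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}}
          \ell_{T}(\boldsymbol{\theta}'') - \ell_{T}(\boldsymbol{\theta})
          > z_{\alpha}(\chi) \Big\}, \\
A_{S_{2}} \cap \{S_{2}\leq\tau\}
      &= A_{S_{2}\wedge\tau} \cap \{S_{2}\leq\tau\}
      \;\subseteq\; A_{S_{2}\wedge\tau}.
\end{align*}
```

The equality holds because $S_{2}\wedge\tau = S_{2}$ on the event $\{S_{2}\leq\tau\}$, and taking conditional probabilities of both sides gives the stated bound.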

Let $\eta(\boldsymbol{\theta}'')$ be a random field

$$
\eta(\boldsymbol{\theta}'') = \ell_{S_{2}\wedge\tau}(\boldsymbol{\theta}'') - \ell_{S_{2}\wedge\tau}(\boldsymbol{\theta}), \quad \forall\ \boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}.
\tag{200}
$$

The marginal tail probability of $\eta(\boldsymbol{\theta}'')$ can be obtained by the following lemma [16].

Lemma 8.

For all $\boldsymbol{\theta}''\neq\boldsymbol{\theta}$ and any constant $L>0$, we have

$$
\mathbb{P}\left\{ \sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S\wedge\tau}(\boldsymbol{\theta}'') - \ell_{S\wedge\tau}(\boldsymbol{\theta}) > L \ \Bigg|\ \Theta=\boldsymbol{\theta} \right\} \leq \exp(-L).
\tag{201}
$$

We can take $L = z_{\alpha}(\chi)-1$ and obtain

$$
\mathbb{P}\left\{ \eta(\boldsymbol{\theta}'') > z_{\alpha}(\chi) \ \Big|\ \Theta=\boldsymbol{\theta} \right\} \leq \exp\big(-z_{\alpha}(\chi)+1\big).
\tag{202}
$$

Moreover, $\eta(\boldsymbol{\theta}'')$ is a Lipschitz-continuous function as

$$
\eta(\boldsymbol{\theta}'') - \eta(\boldsymbol{\theta}''') \leq \big|\ell_{S_{2}\wedge\tau}(\boldsymbol{\theta}'') - \ell_{S_{2}\wedge\tau}(\boldsymbol{\theta}''')\big| \leq \tau\cdot\kappa_{0}\,\|\boldsymbol{\theta}''-\boldsymbol{\theta}'''\|.
\tag{203}
$$

Combining Lemma 6 and 7, we arrive at

\begin{equation}
\mathbb{P}\left\{\sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \eta(\boldsymbol{\theta}'') > z_{\alpha}(\chi)\ \Big|\ \Theta=\boldsymbol{\theta}\right\} \leq O\Big(\tau^{n-1}\exp\big(-z_{\alpha}(\chi)\big)\Big).
\tag{204}
\end{equation}
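One way to read the passage from the pointwise bound (202) to the uniform bound (204) is a covering argument. The sketch below is a reconstruction under assumptions, not the argument of Lemmas 6 and 7 themselves: it assumes that $\boldsymbol{\Theta}_{i,j}$ is a bounded subset of an $(n-1)$-dimensional parameter space, that $\kappa_0$ is a constant, and that the tail bound (202) holds at every level $t$ in place of $z_{\alpha}(\chi)$. The net $\mathcal{N}$, its radius $r$, and the net point $\boldsymbol{\theta}_c$ are notation introduced here for illustration.

```latex
\begin{align*}
& r = \frac{1}{\tau\kappa_0}, \qquad
  |\mathcal{N}| = O\big(r^{-(n-1)}\big) = O\big(\tau^{n-1}\big), \\
& \eta(\boldsymbol{\theta}'') \leq \eta(\boldsymbol{\theta}_c)
  + \tau\kappa_0\,\|\boldsymbol{\theta}'' - \boldsymbol{\theta}_c\|
  \leq \eta(\boldsymbol{\theta}_c) + 1
  \quad \text{for the nearest } \boldsymbol{\theta}_c \in \mathcal{N}
  \text{ (by (203))}, \\
& \mathbb{P}\Big\{\sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}}
  \eta(\boldsymbol{\theta}'') > z_{\alpha}(\chi)\ \Big|\ \Theta=\boldsymbol{\theta}\Big\}
  \leq \sum_{\boldsymbol{\theta}_c\in\mathcal{N}}
  \mathbb{P}\big\{\eta(\boldsymbol{\theta}_c) > z_{\alpha}(\chi) - 1\ \big|\ \Theta=\boldsymbol{\theta}\big\} \\
& \qquad \leq |\mathcal{N}|\cdot\exp\big(-z_{\alpha}(\chi) + 2\big)
  = O\Big(\tau^{n-1}\exp\big(-z_{\alpha}(\chi)\big)\Big).
\end{align*}
```

Under these assumptions the factor $\tau^{n-1}$ in (204) is the size of the net, and the Lipschitz constant $\tau\kappa_0$ in (203) is what fixes the net radius.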

To bound $\mathbb{P}\left\{S_2 \geq \tau\ \big|\ \Theta=\boldsymbol{\theta}\right\}$, we need the following lemma.

Lemma 9.

When $\tau$ satisfies

\begin{equation}
\tau = \Omega\big(|\log\chi|^{3}\big),
\tag{205}
\end{equation}

we have

\begin{equation}
\mathbb{P}\left\{S_2 \geq \tau\ \Big|\ \Theta=\boldsymbol{\theta}\right\} \leq \chi^{2}.
\tag{206}
\end{equation}

For the stopping time $S_1$, we have a similar result:

\begin{equation}
\mathbb{P}\left\{S_1 \geq \tau\ \Big|\ \Theta=\boldsymbol{\theta}\right\} \leq \chi^{2},\ \ \text{if}\ \tau = \Omega\big(|\log\chi|^{3}\big).
\tag{207}
\end{equation}

Combining (198) with (204) and Lemma 8, we have

\begin{equation}
\begin{aligned}
& \mathbb{P}\left\{\sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_2}(\boldsymbol{\theta}'') - \ell_{S_2}(\boldsymbol{\theta}) > z_{\alpha}(\chi)\ \Big|\ \Theta=\boldsymbol{\theta}\right\} \\
\leq\ & O(\chi^{2}) + O\Big(\tau^{n-1}\exp\big(-z_{\alpha}(\chi)\big)\Big) \\
=\ & O(\chi^{2}) + O\Big(\exp\big(-|\log\chi| - |\log\chi|^{1-\alpha} + (n-1)\log\tau\big)\Big) \\
=\ & O(\chi^{2}) + O\Big(\chi\exp\big(-|\log\chi|^{1-\alpha} + 3(n-1)\log|\log\chi|\big)\Big) \\
=\ & o(\chi)
\end{aligned}
\tag{208}
\end{equation}

and finish the analysis of $S_2$.
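The last step of (208) relies on $|\log\chi|^{1-\alpha}$ eventually dominating $3(n-1)\log|\log\chi|$. A minimal numerical sketch of that claim follows; the function name `log_ratio_to_chi` and the sample values of $n$ and $\alpha$ are illustrative assumptions, and the computation works with $x = |\log\chi|$ directly to avoid underflow.

```python
import math


def log_ratio_to_chi(abs_log_chi, n, alpha):
    """Logarithm of the second term in (208) divided by chi.

    With x = |log chi|, the term is chi * exp(-x**(1 - alpha) + 3 (n - 1) log x),
    so its ratio to chi has logarithm -x**(1 - alpha) + 3 (n - 1) log x.
    A value tending to -infinity means the term is o(chi).
    """
    x = abs_log_chi
    return -x ** (1.0 - alpha) + 3.0 * (n - 1) * math.log(x)


if __name__ == "__main__":
    # Hypothetical setting: n = 3 candidates, alpha = 0.5.
    for x in (1e2, 1e4, 1e6):
        print(x, log_ratio_to_chi(x, n=3, alpha=0.5))
```

For moderate $|\log\chi|$ the logarithmic term can still dominate, so the $o(\chi)$ statement is genuinely asymptotic: in this toy setting the log-ratio is positive at $|\log\chi| = 10^{2}$ and negative from $10^{4}$ onward.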

For $S_1$, we have

\begin{equation}
\begin{aligned}
& \max_{(i,j):1\leq i<j\leq n} \exp\left\{\min\left(\sup_{\boldsymbol{\theta}_1\in\boldsymbol{\Theta}_{i,j}} \ell_{S_1}(\boldsymbol{\theta}_1) - \sup_{\boldsymbol{\theta}_2\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \ell_{S_1}(\boldsymbol{\theta}_2),\ \sup_{\boldsymbol{\theta}_1\in\boldsymbol{\Theta}_{j,i}} \ell_{S_1}(\boldsymbol{\theta}_1) - \sup_{\boldsymbol{\theta}_2\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \ell_{S_1}(\boldsymbol{\theta}_2)\right)\right\} \\
\leq\ & \sum_{(i,j):1\leq i<j\leq n} \exp\left\{\min\left(\sup_{\boldsymbol{\theta}_1\in\boldsymbol{\Theta}_{i,j}} \ell_{S_1}(\boldsymbol{\theta}_1) - \sup_{\boldsymbol{\theta}_2\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \ell_{S_1}(\boldsymbol{\theta}_2),\ \sup_{\boldsymbol{\theta}_1\in\boldsymbol{\Theta}_{j,i}} \ell_{S_1}(\boldsymbol{\theta}_1) - \sup_{\boldsymbol{\theta}_2\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \ell_{S_1}(\boldsymbol{\theta}_2)\right)\right\} \\
\leq\ & \exp\big(-z_{\alpha}(\chi)\big).
\end{aligned}
\tag{209}
\end{equation}

Taking the logarithm of both sides of the inequality gives

\begin{equation}
\min\left(\sup_{\boldsymbol{\theta}_2\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \ell_{S_1}(\boldsymbol{\theta}_2) - \min\left(\sup_{\boldsymbol{\theta}_1\in\boldsymbol{\Theta}_{i,j}} \ell_{S_1}(\boldsymbol{\theta}_1),\ \sup_{\boldsymbol{\theta}_1\in\boldsymbol{\Theta}_{j,i}} \ell_{S_1}(\boldsymbol{\theta}_1)\right)\right) \geq z_{\alpha}(\chi).
\tag{210}
\end{equation}

Then the expected Kendall tau at $S_1$ is bounded by

\begin{equation}
\begin{aligned}
& \mathbb{E}\big[\mathfrak{R}(\boldsymbol{R}(\boldsymbol{\Lambda}, S_1))\big] \\
\leq\ & \frac{n(n-1)}{2}\cdot\mathcal{L}\big(\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})\big) \times \sup_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \rho_{\boldsymbol{\theta}'}(\boldsymbol{\theta}) \\
& \times \sup_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})}\ \max_{(i,j):\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}}\ \mathbb{P}\left\{\sup_{\boldsymbol{\theta}''\in\boldsymbol{\Theta}_{i,j}} \ell_{S_1}(\boldsymbol{\theta}'') - \ell_{S_1}(\boldsymbol{\theta}) > z_{\alpha}(\chi)\ \Big|\ \Theta=\boldsymbol{\theta}\right\}.
\end{aligned}
\tag{211}
\end{equation}

The rest of the proof is similar to that for $S_2$. ∎

Now we arrive at the asymptotic optimality of the expected stopping time of the proposed manipulation policy.

Theorem 6.

Consider the proposed stopping time (49) and the generation rule obtained by solving (54) with complete knowledge. Let the exploration probability $p\in(0,1)$ satisfy

\begin{equation}
p \propto |\log\chi|^{-\frac{1}{2}+\delta_0}
\tag{212}
\end{equation}

for some $\delta_0\in\big(0,\frac{1}{2}\big)$. Then we have

\begin{equation}
\limsup_{\chi\rightarrow 0}\ \frac{\mathbb{E}[S]}{\mathbb{E}\left[\tau_{\chi}(\Theta)\right]} \leq 1.
\tag{213}
\end{equation}
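The scaling (212) makes the exploration probability vanish as $\chi \rightarrow 0$ while the expected number of exploration steps over a horizon of order $|\log\chi|$ still grows. The sketch below illustrates this; the function name `exploration_probability`, the proportionality constant `scale`, and the clipping at 1 are assumptions made for illustration and are not part of the theorem.

```python
import math


def exploration_probability(chi, delta0, scale=1.0):
    """Exploration probability p proportional to |log chi|**(-1/2 + delta0), as in (212).

    `scale` is a hypothetical proportionality constant; the result is clipped
    at 1.0 so that it remains a valid probability for chi close to 1.
    """
    if not 0.0 < chi < 1.0:
        raise ValueError("chi must lie in (0, 1)")
    if not 0.0 < delta0 < 0.5:
        raise ValueError("delta0 must lie in (0, 1/2)")
    x = abs(math.log(chi))
    return min(1.0, scale * x ** (-0.5 + delta0))


if __name__ == "__main__":
    for chi in (1e-3, 1e-6, 1e-12):
        p = exploration_probability(chi, delta0=0.25)
        # p shrinks, but p * |log chi| (a proxy for exploration steps) grows.
        print(chi, p, p * abs(math.log(chi)))
```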
Proof.

First, we establish an upper bound on the numerator $\mathbb{E}[S]$, the expectation of the stopping time $S$.

\begin{equation}
\begin{aligned}
\mathbb{E}[S] =\ & \sum_{m=0}^{\infty} \mathbb{E}\left[S\ \big|\ m(1+\delta)\tau_{\chi}(\Theta) \leq S < (m+1)(1+\delta)\tau_{\chi}(\Theta)\right] \\
\leq\ & (1+\delta)\,\mathbb{E}[\tau_{\chi}(\Theta)] + \sum_{m=1}^{\infty} \mathbb{E}\left[S\ \big|\ m(1+\delta)\tau_{\chi}(\Theta) \leq S < (m+1)(1+\delta)\tau_{\chi}(\Theta)\right] \\
\leq\ & (1+\delta)\,\mathbb{E}[\tau_{\chi}(\Theta)] + (1+\delta)\cdot\max_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \tau_{\chi}(\boldsymbol{\theta})\cdot \sum_{m=1}^{\infty} (m+1)\,\mathbb{P}\big(m(1+\delta)\tau_{\chi}(\Theta) \leq S < (m+1)(1+\delta)\tau_{\chi}(\Theta)\big) \\
\leq\ & (1+\delta)\,\mathbb{E}[\tau_{\chi}(\Theta)] + (1+\delta)\cdot\max_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \tau_{\chi}(\boldsymbol{\theta})\cdot \sum_{m=1}^{\infty} (m+1) \max_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})} \mathbb{P}\left(m(1+\delta)\tau_{\chi}(\Theta) \leq S < (m+1)(1+\delta)\tau_{\chi}(\Theta)\ \Big|\ \Theta=\boldsymbol{\theta}\right).
\end{aligned}
\tag{214}
\end{equation}
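The slicing step behind (214) can be checked on a toy sample: on the event that $S$ falls in the $m$-th slice, $S$ is at most $(m+1)$ times the slice width. The sketch below is a simplified illustration, assuming a fixed deterministic width `width` in place of the random quantity $(1+\delta)\tau_{\chi}(\Theta)$; the name `peeling_upper_bound` is hypothetical.

```python
import math


def peeling_upper_bound(samples, width):
    """Empirical version of the slicing bound: sum over m of (m + 1) * width * P(slice m).

    Each sample s in slice m = floor(s / width) satisfies s < (m + 1) * width,
    so the returned value is never smaller than the sample mean.
    """
    n = len(samples)
    counts = {}
    for s in samples:
        m = math.floor(s / width)
        counts[m] = counts.get(m, 0) + 1
    return sum((m + 1) * width * k / n for m, k in counts.items())


if __name__ == "__main__":
    stopping_times = [1, 2, 3, 7, 12]  # toy stopping times
    mean = sum(stopping_times) / len(stopping_times)
    print(mean, peeling_upper_bound(stopping_times, width=5))
```

The slice $m = 0$ contributes at most one width, which corresponds to the leading term $(1+\delta)\mathbb{E}[\tau_{\chi}(\Theta)]$ in (214); the remaining slices are controlled through their probabilities.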

We start with the stopping time $S_2$. For $m \geq 1$,

\begin{equation}
\begin{aligned}
& \mathbb{P}\left(m(1+\delta)\tau_{\chi}(\Theta) \leq S_2 < (m+1)(1+\delta)\tau_{\chi}(\Theta)\ \Big|\ \Theta=\boldsymbol{\theta}\right) \\
\leq\ & \mathbb{P}\left(m(1+\delta)\tau_{\chi}(\Theta) \leq S_2 < (m+1)(1+\delta)\tau_{\chi}(\Theta),\ \max_{\substack{S/(1+\delta)\tau_{\chi}(\Theta) \\ \in[\delta_2 m,\, m+1]}} \|\hat{\boldsymbol{\theta}}_{S} - \boldsymbol{\theta}\| \leq |\log\chi|^{-\delta_1}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right) \\
& + \mathbb{P}\left(\max_{\substack{S/(1+\delta)\tau_{\chi}(\Theta) \\ \in[\delta_2 m,\, m+1]}} \|\hat{\boldsymbol{\theta}}_{S} - \boldsymbol{\theta}\| \geq |\log\chi|^{-\delta_1}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right),
\end{aligned}
\tag{215}
\end{equation}

where $\hat{\boldsymbol{\theta}}_{S}$ is the MLE

\begin{equation}
\hat{\boldsymbol{\theta}}_{S} = \underset{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}'})}{\arg\sup}\ L\Big(\boldsymbol{\theta}, \boldsymbol{w}^{(S)}_{\mathcal{A}}\Big),
\tag{48}
\end{equation}

and $\delta_1$ and $\delta_2$ are two constants related to the exploration probability $p \propto |\log\chi|^{-\frac{1}{2}+\delta_0}$:

\begin{equation}
\delta_1 = \frac{\delta_0}{8},\ \ \delta_2 = |\log\chi|^{-\frac{\delta_0}{2}}.
\tag{216}
\end{equation}
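To make the estimator (48) concrete, the following sketch computes an MLE over a finite candidate set. It is only an illustration under stated assumptions: the Bradley-Terry-Luce likelihood is used as one example of a parametric pairwise comparison model, the support is replaced by a finite list of candidate score vectors, and the names `btl_log_likelihood` and `mle_over_support` are hypothetical.

```python
import math


def btl_log_likelihood(theta, comparisons):
    """Log-likelihood of (winner, loser) index pairs under a BTL model.

    Assumes P(i beats j) = sigmoid(theta[i] - theta[j]).
    """
    total = 0.0
    for winner, loser in comparisons:
        diff = theta[winner] - theta[loser]
        # Numerically stable log(sigmoid(diff)) for both signs of diff.
        if diff >= 0:
            total += -math.log1p(math.exp(-diff))
        else:
            total += diff - math.log1p(math.exp(diff))
    return total


def mle_over_support(support, comparisons):
    """Return the candidate in `support` that maximizes the log-likelihood."""
    return max(support, key=lambda theta: btl_log_likelihood(theta, comparisons))


if __name__ == "__main__":
    candidates = [(0.0, 1.0), (1.0, 0.0)]       # toy stand-in for the support
    observed = [(0, 1), (0, 1), (1, 0)]         # item 0 wins twice, item 1 once
    print(mle_over_support(candidates, observed))
```

The proof only needs $\hat{\boldsymbol{\theta}}_{S}$ to concentrate around $\boldsymbol{\theta}$ at rate $|\log\chi|^{-\delta_1}$; the model choice in the sketch does not affect that argument.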

The following lemma gives an upper bound on the second term in the preceding display.

Lemma 10.

Suppose $\boldsymbol{\lambda}^{(S)} = \{\lambda^{(S)}_{i,j}\}$ is a generation rule at step $S$, and $\{\epsilon_{\boldsymbol{\lambda},T_1,T_2}\}$ and $\{\delta_{T_1,T_2}\}$ are two sequences of real numbers such that

\begin{equation}
\begin{aligned}
\min_{\substack{T_1\leq S\leq T_2, \\ (i,j)\in\mathfrak{C}}} \lambda^{(S)}_{i,j} &\geq \epsilon_{\boldsymbol{\lambda},T_1,T_2}, \\
\lim_{T_1\rightarrow\infty}\ T_1\cdot\epsilon_{\boldsymbol{\lambda},T_1,T_2}\cdot\delta^{2}_{T_1,T_2} &= \infty.
\end{aligned}
\tag{217}
\end{equation}

Then, it holds that

\begin{equation}
\mathbb{P}\left(\max_{T_1\leq S\leq T_2} \|\hat{\boldsymbol{\theta}}_{S} - \boldsymbol{\theta}\| \geq \delta_{T_1,T_2}\ \Big|\ \Theta=\boldsymbol{\theta}\right) \leq \exp\Big(-\Omega\big(T_1\,\epsilon^{2}_{\boldsymbol{\lambda},T_1,T_2}\,\delta^{4}_{T_1,T_2}\big)\Big) \times O\big(T_2^{n}\big).
\tag{218}
\end{equation}

When

\begin{equation}
T_1 = m(1+\delta)\delta_2\tau_{\chi}(\boldsymbol{\theta}),\ \ T_2 = m(1+\delta)\tau_{\chi}(\boldsymbol{\theta}),\ \ \epsilon_{\boldsymbol{\lambda},T_1,T_2} = \Omega\Big(|\log\chi|^{-\frac{1}{2}+\delta_0}\Big),\ \ \delta^{2}_{T_1,T_2} = |\log\chi|^{-\delta_1},
\tag{219}
\end{equation}

we have

\begin{equation}
\begin{aligned}
&\mathbb{P}\left(\max_{\substack{S/(1+\delta)\tau_{\chi}(\Theta)\\ \in[\delta_{2}m,\,m+1)}}\ \|\boldsymbol{\hat{\theta}}_{S}-\boldsymbol{\theta}\|\geq|\log\chi|^{-\delta_{1}}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right)\\
\leq\ &\exp\left(-\Omega\Big(m(1+\delta)\delta_{2}\tau_{\chi}(\boldsymbol{\theta})|\log\chi|^{-4\delta_{1}}|\log\chi|^{-1+2\delta_{0}}\Big)\right)\times O\Big(m^{n-1}|\log\chi|^{n-1}\Big)\\
=\ &\exp\left(-\Omega\Big(m|\log\chi|^{2\delta_{0}-4\delta_{1}}\delta_{2}\Big)\right)\times O\Big(m^{n-1}|\log\chi|^{n-1}\Big)\\
=\ &\exp\left(-\Omega\Big(m|\log\chi|^{\delta_{0}}\Big)\right)\times O\Big(m^{n-1}|\log\chi|^{n-1}\Big).
\end{aligned}
\tag{220}
\end{equation}

Now we analyze the first term on the right-hand side of (215). For $m\geq 1$, $S_{2}>m(1+\delta)\tau_{\chi}(\boldsymbol{\theta})$ implies that there exists a pairwise comparison $(i,j)$ such that

\begin{equation}
\left|\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}}\ \ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{2})\right|\leq z_{\alpha}(\chi),
\tag{221}
\end{equation}

where $S=m(1+\delta)\tau_{\chi}(\boldsymbol{\theta})$. Without loss of generality, let $\boldsymbol{\theta}\in\boldsymbol{\Theta}_{i,j}$. Since $\ell_{S}(\boldsymbol{\theta})\leq\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}}\ell_{S}(\boldsymbol{\theta}_{1})$, the event $S_{2}>m(1+\delta)\tau_{\chi}(\boldsymbol{\theta})$ further implies that

\begin{equation}
\ell_{S}(\boldsymbol{\theta})-\sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{2})\leq z_{\alpha}(\chi).
\tag{222}
\end{equation}

Consequently, the first term on the right-hand side of (215) is bounded as

\begin{equation}
\begin{aligned}
&\mathbb{P}\left(m(1+\delta)\tau_{\chi}(\Theta)\leq S_{2}<(m+1)(1+\delta)\tau_{\chi}(\Theta),\ \max_{\substack{S/(1+\delta)\tau_{\chi}(\Theta)\\ \in[\delta_{2}m,\,m+1)}}\ \|\boldsymbol{\hat{\theta}}_{S}-\boldsymbol{\theta}\|\leq|\log\chi|^{-\delta_{1}}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right)\\
\leq\ &\mathbb{P}\left(\ell_{S}(\boldsymbol{\theta})-\sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{2})\leq z_{\alpha}(\chi),\ \max_{\substack{S/(1+\delta)\tau_{\chi}(\Theta)\\ \in[\delta_{2}m,\,m+1)}}\ \|\boldsymbol{\hat{\theta}}_{S}-\boldsymbol{\theta}\|\leq|\log\chi|^{-\delta_{1}}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right).
\end{aligned}
\tag{223}
\end{equation}

The preceding display can be bounded by the following lemma.

Lemma 11.

Suppose that the generation rule $\boldsymbol{\lambda}^{*}(\boldsymbol{\hat{\theta}}_{S})$ is obtained by solving (54):

\begin{equation}
\boldsymbol{\lambda}^{*}(\boldsymbol{\hat{\theta}}_{S})\in\underset{\boldsymbol{\lambda}\in\boldsymbol{\Delta}}{\arg\max}\ \min_{\substack{\boldsymbol{\tilde{\theta}}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})\\ \boldsymbol{\pi}(\boldsymbol{\hat{\theta}}_{S})\neq\boldsymbol{\pi}(\boldsymbol{\tilde{\theta}})}}\ \sum_{(i,j)}\lambda_{i,j}\cdot g_{i,j}(\boldsymbol{\hat{\theta}}_{S})\cdot\log\frac{g_{i,j}(\boldsymbol{\hat{\theta}}_{S})}{g_{i,j}(\boldsymbol{\tilde{\theta}})}.
\tag{54}
\end{equation}

If $\boldsymbol{\lambda}^{*}(\boldsymbol{\hat{\theta}}_{S})$ is adopted with probability $1-o(1)$ uniformly for $S\in[m(1+\delta)\delta_{2}\tau_{\chi}(\boldsymbol{\theta}),\,m(1+\delta)\tau_{\chi}(\boldsymbol{\theta})]$, we have

\begin{equation}
\begin{aligned}
&\mathbb{P}\left(\ell_{S}(\boldsymbol{\theta})-\sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{2})\leq z_{\alpha}(\chi),\ \max_{\substack{S/(1+\delta)\tau_{\chi}(\Theta)\\ \in[\delta_{2}m,\,m+1)}}\ \|\boldsymbol{\hat{\theta}}_{S}-\boldsymbol{\theta}\|\leq|\log\chi|^{-\delta_{1}}\ \Bigg|\ \Theta=\boldsymbol{\theta}\right)\\
\leq\ &\exp\Big(-\Omega(m|\log\chi|)\Big)\times O\Big(|\log\chi|^{n-1}m^{n-1}\Big),
\end{aligned}
\tag{224}
\end{equation}

where $S=m(1+\delta)\tau_{\chi}(\boldsymbol{\theta})$.

Combining the results of Lemma 9 and Lemma 10, (215) is bounded as

\begin{equation}
\begin{aligned}
&\mathbb{P}\left(m(1+\delta)\tau_{\chi}(\Theta)\leq S_{2}<(m+1)(1+\delta)\tau_{\chi}(\Theta)\ \Big|\ \Theta=\boldsymbol{\theta}\right)\\
\leq\ &\left(\exp\Big(-\Omega(m|\log\chi|)\Big)+\exp\Big(-\Omega(m|\log\chi|^{\delta_{0}})\Big)\right)\times O\Big(|\log\chi|^{n-1}m^{n-1}\Big).
\end{aligned}
\tag{225}
\end{equation}

Combining the preceding display with (214), the expectation of the stopping time $S_{2}$ is bounded as

\begin{equation}
\begin{aligned}
\mathbb{E}[S_{2}]\ \leq\ &(1+\delta)\mathbb{E}[\tau_{\chi}(\Theta)]+(1+\delta)\cdot\max_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}\ \tau_{\chi}(\boldsymbol{\theta})\ \cdot\\
&\sum_{m=1}^{\infty}(m+1)\max_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}\mathbb{P}\left(m(1+\delta)\tau_{\chi}(\Theta)\leq S_{2}<(m+1)(1+\delta)\tau_{\chi}(\Theta)\ \Big|\ \Theta=\boldsymbol{\theta}\right)\\
\leq\ &(1+\delta)\mathbb{E}[\tau_{\chi}(\Theta)]+O(|\log\chi|)\ \times\\
&\sum_{m=1}^{\infty}(m+1)\left(\exp\Big(-\Omega(m|\log\chi|)\Big)+\exp\Big(-\Omega(m|\log\chi|^{\delta_{0}})\Big)\right)\times O\Big(|\log\chi|^{n-1}m^{n-1}\Big)\\
\leq\ &(1+\delta)\mathbb{E}[\tau_{\chi}(\Theta)]+o(|\log\chi|).
\end{aligned}
\tag{226}
\end{equation}

The preceding display is the desired conclusion for the asymptotic optimality of $S_{2}$.
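The last inequality in (226) can be checked with a short calculation. The sketch below writes $L=|\log\chi|$ and lets $c>0$ stand for the smaller of the two constants hidden in the $\Omega(\cdot)$ terms; both names are introduced here for illustration only. It assumes $0<\delta_{0}\leq 1$, a fixed $n$, and $L$ large enough that $L\geq 1$ and $cL^{\delta_{0}}\geq 1$.

\begin{equation*}
\begin{aligned}
\sum_{m\geq 1}(m+1)\,m^{n-1}\Big(e^{-cmL}+e^{-cmL^{\delta_{0}}}\Big)
&\leq 4\sum_{m\geq 1}m^{n}e^{-cmL^{\delta_{0}}} && \big(m+1\leq 2m,\ L\geq L^{\delta_{0}}\big)\\
&\leq 4e^{-cL^{\delta_{0}}}\sum_{m\geq 1}m^{n}e^{-(m-1)} && \big(cL^{\delta_{0}}\geq 1\big)\\
&= 4C_{n}\,e^{-cL^{\delta_{0}}}, && C_{n}:=\sum_{m\geq 1}m^{n}e^{-(m-1)}<\infty.
\end{aligned}
\end{equation*}

Multiplying by the prefactors $O(L)$ and $O(L^{n-1})$ gives a term of order $L^{n}e^{-cL^{\delta_{0}}}$, which tends to zero as $L\rightarrow\infty$ and is therefore $o(|\log\chi|)$.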

Next we proceed to the case of $S_{1}$. Notice that the event $S_{1}>S$ implies that

\begin{equation}
\sum_{(i,j)}\exp\left(\min\left(\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}}\ \ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ \ell_{S}(\boldsymbol{\theta}_{2}),\ \sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ \ell_{S}(\boldsymbol{\theta}_{2})\right)\right)>\exp(-z_{\alpha}(\chi)),
\tag{227}
\end{equation}

which further yields

\begin{equation}
\begin{aligned}
&n(n-1)\cdot\max_{(i,j)}\ \exp\left(\min\left(\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}}\ \ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ \ell_{S}(\boldsymbol{\theta}_{2}),\ \sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ \ell_{S}(\boldsymbol{\theta}_{2})\right)\right)\\
>\ &\exp(-z_{\alpha}(\chi)).
\end{aligned}
\tag{228}
\end{equation}

This means that there exists a pairwise comparison $(i,j)$ such that

\begin{equation}
\left|\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}}\ \ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}_{j,i}}\ \ell_{S}(\boldsymbol{\theta}_{2})\right|\leq z_{\alpha}(\chi)+\log\big(n(n-1)\big).
\tag{229}
\end{equation}

The analysis of $S_{1}$ then follows that of $S_{2}$, with $z_{\alpha}(\chi)$ replaced by $z_{\alpha}(\chi)+\log\big(n(n-1)\big)$. ∎
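The step from (228) to (229) is stated without detail; one way to fill it in is sketched here. The shorthand $A_{i,j}$ and $B_{i,j}$ is introduced for illustration only, and the sketch assumes that $\boldsymbol{\Theta}_{i,j}$ and $\boldsymbol{\Theta}_{j,i}$ are subsets of the support whose union covers it, so that $A_{i,j},B_{i,j}\leq 0$ and $\max(A_{i,j},B_{i,j})=0$.

\begin{equation*}
\begin{aligned}
A_{i,j}&:=\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{i,j}}\ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ell_{S}(\boldsymbol{\theta}_{2}),\qquad
B_{i,j}:=\sup_{\boldsymbol{\theta}_{1}\in\boldsymbol{\Theta}_{j,i}}\ell_{S}(\boldsymbol{\theta}_{1})-\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ell_{S}(\boldsymbol{\theta}_{2}),\\
\min(A_{i,j},B_{i,j})&=\max(A_{i,j},B_{i,j})-|A_{i,j}-B_{i,j}|=-|A_{i,j}-B_{i,j}|,\\
\log\big(n(n-1)\big)+\min(A_{i,j},B_{i,j})>-z_{\alpha}(\chi)
\ &\Longrightarrow\ |A_{i,j}-B_{i,j}|<z_{\alpha}(\chi)+\log\big(n(n-1)\big).
\end{aligned}
\end{equation*}

The third line takes logarithms in (228) at the maximizing pair $(i,j)$. Since the common term $\sup_{\boldsymbol{\theta}_{2}\in\mathrm{Supp}(\boldsymbol{\theta}^{\prime})}\ell_{S}(\boldsymbol{\theta}_{2})$ cancels in $A_{i,j}-B_{i,j}$, the final inequality is exactly the bound in (229).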

Proof of Theorem 3

The following proposition [10] establishes strong duality for the Wasserstein DRO problem investigated in this paper:

\begin{equation}
\max_{\boldsymbol{\theta}\in\mathrm{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}\ \sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\left[L(\boldsymbol{\theta},\boldsymbol{q})\right].
\tag{59}
\end{equation}

This strong duality result ensures that the inner supremum can be reformulated as a simple univariate optimization problem. Note that another strong duality result for Wasserstein DRO is given in [25].

Proposition 2.

Let $d:\mathbb{R}\times\mathbb{R}\rightarrow[0,\infty]$ be a lower semi-continuous cost function satisfying $d(p,q)=0$ whenever $p=q$. For $\lambda\geq 0$ and loss function $\ell$

\begin{equation}
\ell(\boldsymbol{\theta},p_{i,j})=p_{i,j}\cdot\log g_{i,j}(\boldsymbol{\theta}),
\tag{230}
\end{equation}

where $g_{i,j}(\boldsymbol{\theta})$ is the probability mass function of pairwise comparison $(i,j)$, we define

\begin{equation}
\psi_{\lambda,\ell}(\boldsymbol{\theta},q_{i,j}):=\sup_{q_{i,j}\in\mathbb{R}_{+}}\ \Big\{\ \ell(\boldsymbol{\theta},q_{i,j})-\lambda\cdot d(p_{i,j},q_{i,j})\ \Big\}.
\tag{231}
\end{equation}

Then it holds that

\begin{equation}
\sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\big[L\big(\boldsymbol{\theta},\boldsymbol{q}\big)\big]=\min_{\lambda\geq 0}\ \left\{\ \lambda\gamma+\sum_{(i,j)}\psi_{\lambda,\ell}(\boldsymbol{\theta},q_{i,j})\ \right\}.
\tag{232}
\end{equation}

With the strong duality, we have the following result.

See Theorem 3.

Proof.

Let $\Delta_{ij}=q_{ij}-p_{ij}$, where $\boldsymbol{q}=(q_{1,2},\dots,q_{n,n-1})\in\mathbb{R}^{N}_{+}$. We define $\psi_{\lambda,\ell}(\boldsymbol{\theta})$ as

$$\begin{aligned}
\psi_{\lambda,\ell}(\boldsymbol{\theta}) &= \sup_{\boldsymbol{q}}\ \sum_{(i,j)}\Big\{\ell(\boldsymbol{\theta},q_{ij})-\lambda\big[d(p_{i,j},q_{i,j})\big]^{2}\Big\}\\
&= \sup_{\boldsymbol{q}}\ \sum_{(i,j)}\Big\{q_{ij}\cdot\log g_{i,j}(\boldsymbol{\theta})-\lambda\big|p_{ij}-q_{ij}\big|^{2}\Big\}\\
&= \sum_{(i,j)}\ \sup_{\Delta_{ij}\in\mathbb{R}}\Big(\Delta_{ij}b_{ij}-\lambda\Delta_{ij}^{2}+p_{ij}b_{ij}\Big),
\end{aligned} \tag{233}$$

where

$$b_{i,j}=\log g_{i,j}(\boldsymbol{\theta}), \tag{234}$$

and the third equality holds because $\psi_{\lambda,\ell}(\boldsymbol{\theta})$ is a decomposable function: the objective separates across the pairs $(i,j)$, so the supremum can be taken term by term. Expanding (233), we can simplify $\psi_{\lambda,\ell}(\boldsymbol{\theta})$ as follows:

$$\begin{aligned}
\psi_{\lambda,\ell}(\boldsymbol{\theta}) &= \big\langle\boldsymbol{p},\boldsymbol{b}\big\rangle+\sum_{(i,j)}\ \sup_{\Delta_{ij}\in\mathbb{R}}\big(\Delta_{ij}b_{ij}-\lambda\Delta_{ij}^{2}\big)\\
&= \begin{cases}
\displaystyle\langle\boldsymbol{p},\boldsymbol{b}\rangle+\frac{1}{4\lambda}\|\boldsymbol{b}\|^{2}_{2}, & \text{if } \lambda>0,\\[10pt]
\infty, & \text{if } \lambda=0.
\end{cases}
\end{aligned} \tag{235}$$
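The case distinction in (235) follows from a one-line scalar maximization. As a worked step (using only the symbols already defined above), each summand is a concave quadratic in $\Delta_{ij}$ when $\lambda>0$ and a linear function when $\lambda=0$:

```latex
% Per-pair maximization behind (235), for a fixed pair (i,j).
\begin{aligned}
\lambda>0:\quad
&\frac{\partial}{\partial\Delta_{ij}}\big(\Delta_{ij}b_{ij}-\lambda\Delta_{ij}^{2}\big)
 = b_{ij}-2\lambda\Delta_{ij}=0
 \;\Longrightarrow\;
 \Delta_{ij}^{*}=\frac{b_{ij}}{2\lambda},\\
&\sup_{\Delta_{ij}\in\mathbb{R}}\big(\Delta_{ij}b_{ij}-\lambda\Delta_{ij}^{2}\big)
 = \frac{b_{ij}^{2}}{2\lambda}-\frac{b_{ij}^{2}}{4\lambda}
 = \frac{b_{ij}^{2}}{4\lambda},
 \qquad
 \sum_{(i,j)}\frac{b_{ij}^{2}}{4\lambda}=\frac{1}{4\lambda}\|\boldsymbol{b}\|_{2}^{2};\\
\lambda=0:\quad
&\sup_{\Delta_{ij}\in\mathbb{R}}\Delta_{ij}b_{ij}=\infty
 \quad\text{whenever } b_{ij}\neq 0 .
\end{aligned}
```

Since $b_{ij}=\log g_{i,j}(\boldsymbol{\theta})$ is strictly negative whenever $g_{i,j}(\boldsymbol{\theta})<1$, the $\lambda=0$ branch is indeed unbounded in that case.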

Next, we investigate the duality of (59) with Proposition 2. Since $\psi_{\lambda,\ell}(\boldsymbol{\theta},q_{i,j})=\infty$ when $\lambda=0$, the dual formulation of the supremum in (59) is

$$\begin{aligned}
\sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\big[L(\boldsymbol{\theta},\boldsymbol{q})\big]
&= \min_{\lambda\geq 0}\ \Big\{\lambda\gamma+\psi_{\lambda,\ell}(\boldsymbol{\theta})\Big\}\\
&= \min_{\lambda>0}\ \Big\{\lambda\gamma+\langle\boldsymbol{p},\boldsymbol{b}\rangle+\frac{1}{4\lambda}\|\boldsymbol{b}\|^{2}_{2}\Big\}.
\end{aligned} \tag{236}$$

By the definition of $\boldsymbol{b}$, we know that

$$L(\boldsymbol{\theta},\boldsymbol{p})=\big\langle\boldsymbol{p},\boldsymbol{b}\big\rangle. \tag{237}$$

Moreover, the objective on the right-hand side of (236) is a convex function of $\lambda$ that approaches infinity as $\lambda\rightarrow\infty$, so its global minimizer is unique and can be obtained from the first-order optimality condition

$$\frac{\partial}{\partial\lambda}\Big\{\lambda\gamma+\langle\boldsymbol{p},\boldsymbol{b}\rangle+\frac{1}{4\lambda}\|\boldsymbol{b}\|^{2}_{2}\Big\}=0, \tag{238}$$

and the optimal dual variable is

$$\lambda^{*}_{\gamma}=\frac{\|\boldsymbol{b}\|_{2}}{2\sqrt{\gamma}}. \tag{239}$$

Substituting $\lambda^{*}_{\gamma}$ and $\boldsymbol{b}$ into (236), we have

$$\begin{aligned}
\sup_{\mathbb{Q}\in\boldsymbol{\mathfrak{U}}^{\gamma}(\mathbb{P})}\ \mathbb{E}_{\boldsymbol{q}\sim\mathbb{Q}}\big[L(\boldsymbol{\theta},\boldsymbol{q})\big]
&= \sqrt{\gamma}\cdot\|\boldsymbol{b}\|_{2}+\langle\boldsymbol{p},\boldsymbol{b}\rangle\\
&= \sqrt{\gamma\sum_{(i,j)}\big[\log g_{i,j}(\boldsymbol{\theta})\big]^{2}}+\sum_{(i,j)}p_{i,j}\log g_{i,j}(\boldsymbol{\theta}).
\end{aligned} \tag{240}$$

When we specify the probability mass function of $(i,j)$ as

$$g_{i,j}(\boldsymbol{\theta})=\frac{e^{\theta_{i}}}{e^{\theta_{i}}+e^{\theta_{j}}},$$

we obtain the formulation of $h$ for the BTL model. ∎
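The closed form in (240) can be sanity-checked numerically. The following is a minimal sketch, not part of the proof: it assumes NumPy, and the score vector `theta`, the weights `p`, and the radius `gamma` are arbitrary illustrative values. It builds $\boldsymbol{b}$ from the BTL model, evaluates the dual objective of (236) on a grid of $\lambda$, and compares the grid minimum with $\sqrt{\gamma}\|\boldsymbol{b}\|_2+\langle\boldsymbol{p},\boldsymbol{b}\rangle$. All function names are hypothetical.

```python
import numpy as np

def btl_log_probs(theta):
    """Return b_{i,j} = log g_{i,j}(theta) for all ordered pairs i != j (BTL model)."""
    diff = theta[:, None] - theta[None, :]          # theta_i - theta_j
    log_g = -np.log1p(np.exp(-diff))                # log(e^{t_i} / (e^{t_i} + e^{t_j}))
    off_diagonal = ~np.eye(len(theta), dtype=bool)  # drop the pairs (i, i)
    return log_g[off_diagonal]

def dual_objective(lam, gamma, p, b):
    """Objective of (236) for lam > 0; lam may be a scalar or an array."""
    return lam * gamma + p @ b + (b @ b) / (4.0 * lam)

def closed_form(gamma, p, b):
    """Right-hand side of (240)."""
    return np.sqrt(gamma) * np.linalg.norm(b) + p @ b

# Illustrative inputs (assumptions, not values from the paper).
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
b = btl_log_probs(theta)
p = rng.random(b.size)
gamma = 0.3

lam_star = np.linalg.norm(b) / (2.0 * np.sqrt(gamma))          # equation (239)
grid = np.linspace(0.2 * lam_star, 5.0 * lam_star, 20001)
grid_min = dual_objective(grid, gamma, p, b).min()

print("value at lambda*:", dual_objective(lam_star, gamma, p, b))
print("grid minimum    :", grid_min)
print("closed form     :", closed_form(gamma, p, b))
```

The three printed numbers should agree up to the grid resolution, which illustrates that the minimizer in (239) attains the closed-form value.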

Details of Algorithm 3 and 4

Input: the probability mass function $g$, the support set $\operatorname{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})$, the incomplete knowledge $\boldsymbol{S}$, and the solution accuracy $\epsilon$.
Initialization:
    $\mu_{\min}=0$,
    $\mu_{\max}=\mu_{\infty}=\max\left\{m\cdot\|\boldsymbol{z}\|_{\infty},\ \sqrt{\frac{m}{2\beta}}\cdot\|\boldsymbol{z}\|_{2}\right\}$,
    where $\boldsymbol{z}=[z_{1,2},\dots,z_{n,n-1}]$ and $z_{ij}=\underset{\boldsymbol{\theta}\in\operatorname{Supp}(\rho_{\boldsymbol{\theta}^{\prime}})}{\max}\ -\log g_{i,j}(\boldsymbol{\theta})$.
while $|\mu_{\max}-\mu_{\min}|>\epsilon\cdot\mu_{\infty}$ do
    $\mu=\frac{1}{2}(\mu_{\min}+\mu_{\max})$,
    $\boldsymbol{\theta}(\mu)=\operatorname{SimplexProjection}(\boldsymbol{\theta}^{(0)},\mu)$,
    $\nabla\mathcal{H}(\mu)=\frac{1}{2}\|\boldsymbol{\theta}(\mu)-\boldsymbol{\theta}_{\mathcal{A}}\|^{2}_{2}-\beta$.
    if $\nabla\mathcal{H}(\mu)>0$ then
        $\mu_{\min}=\mu$,
    else
        $\mu_{\max}=\mu$.
    end if
end while
Update $\mu=\frac{1}{2}\big(\mu_{\min}+\mu_{\max}\big)$.
Solve the distributionally robust estimation $\boldsymbol{\hat{\theta}}=\operatorname{SimplexProjection}(\boldsymbol{\theta}^{(0)},\mu)$.
Output: the distributionally robust estimation $\boldsymbol{\hat{\theta}}$.
Algorithm 3 Robust Estimation
Input: the partial dual problem (241).
Initialization: the initial step size $\eta_{0}$, the maximum iteration number $T_{1}$, $\boldsymbol{U}=[n]$, $s=0$, $v=0$.
for $t=0$ to $T_{1}-1$ do
    $\boldsymbol{\theta}^{(t+1)}=\boldsymbol{\theta}^{(t)}-\eta_{t}\nabla\mathcal{L}(\boldsymbol{\theta}^{(t)},\mu)$.
end for
while $\boldsymbol{U}\neq\varnothing$ do
    Pick $k\in\boldsymbol{U}$ at random and separate $\boldsymbol{U}$ as
        $\boldsymbol{G}=\{\, j\in\boldsymbol{U}\ |\ \theta^{(T_{1})}_{j}\geq\theta^{(T_{1})}_{k}\,\}$,
        $\boldsymbol{L}=\{\, j\in\boldsymbol{U}\ |\ \theta^{(T_{1})}_{j}<\theta^{(T_{1})}_{k}\,\}$.
    Set $\Delta v=|\boldsymbol{G}|$, $\Delta s=\sum_{j\in\boldsymbol{G}}\theta^{(T_{1})}_{j}$.
    if $(s+\Delta s)-(v+\Delta v)\,\theta^{(T_{1})}_{k}<1$ then
        $s\leftarrow s+\Delta s$, $v\leftarrow v+\Delta v$, $\boldsymbol{U}\leftarrow\boldsymbol{L}$.
    else
        $\boldsymbol{U}\leftarrow\boldsymbol{G}\setminus\{k\}$.
    end if
end while
$\boldsymbol{\hat{\theta}}^{(T_{1})}=\big[\boldsymbol{\theta}^{(T_{1})}-\gamma\cdot\boldsymbol{1}\big]_{+}$, where $\gamma=\frac{s-1}{v}$.
Output: $\boldsymbol{\hat{\theta}}^{(T_{1})}$.
Algorithm 4 $\operatorname{SimplexProjection}(\boldsymbol{\theta}^{(0)},\mu)$
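The pivot loop in the second half of Algorithm 4 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `project_simplex_pivot` and the `seed` argument are hypothetical, only the standard library is used, and the acceptance test multiplies the running count by the pivot value, which is the form consistent with the threshold condition in (248).

```python
import random

def project_simplex_pivot(phi, seed=0):
    """Euclidean projection of phi onto {theta >= 0, sum(theta) = 1} via random pivots."""
    rng = random.Random(seed)
    U = list(range(len(phi)))     # indices whose status is still undecided
    s, v = 0.0, 0                 # running sum and count of entries kept positive
    while U:
        k = rng.choice(U)                          # random pivot index
        G = [j for j in U if phi[j] >= phi[k]]     # entries at least as large as the pivot
        L = [j for j in U if phi[j] < phi[k]]      # entries strictly smaller than the pivot
        ds, dv = sum(phi[j] for j in G), len(G)
        if (s + ds) - (v + dv) * phi[k] < 1.0:
            # The pivot and everything above it stay positive; keep searching below.
            s, v, U = s + ds, v + dv, L
        else:
            # The pivot is clipped to zero; only larger entries remain candidates.
            U = [j for j in G if j != k]
    shift = (s - 1.0) / v         # the threshold called gamma in Algorithm 4
    return [max(x - shift, 0.0) for x in phi]

if __name__ == "__main__":
    print(project_simplex_pivot([0.5, 0.9, -0.2, 0.3]))
```

The expected running time is linear in $n$ because each pivot discards a constant fraction of the undecided indices on average, whereas the sort-based formula costs $O(n\log n)$.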
Definition 8 ($p$-Wasserstein distance).

Let $p\in[1,\infty]$. The $p$-Wasserstein distance between distributions $\mathbb{P},\ \mathbb{Q}\in\mathcal{P}(\boldsymbol{\Omega})$ is defined as

$$\mathcal{W}_{p}(\mathbb{P},\mathbb{Q})=\begin{cases}
\displaystyle\Bigg(\min_{\gamma\in\Gamma(\mathbb{P},\mathbb{Q})}\int_{\boldsymbol{\Omega}\times\boldsymbol{\Omega}}\big[d(\boldsymbol{p},\boldsymbol{q})\big]^{p}\,\gamma\big(\mathrm{d}\boldsymbol{p},\mathrm{d}\boldsymbol{q}\big)\Bigg)^{\frac{1}{p}}, & p<\infty,\\[20pt]
\displaystyle\inf_{\gamma\in\Gamma(\mathbb{P},\mathbb{Q})}\ \underset{\boldsymbol{\Omega}\times\boldsymbol{\Omega}}{\gamma\text{-}\operatorname{ess\,sup}}\ d\big(\boldsymbol{p},\boldsymbol{q}\big), & p=\infty,
\end{cases}$$

where $\Gamma(\mathbb{P},\mathbb{Q})$ denotes the set of all Borel probability distributions on $\boldsymbol{\Omega}\times\boldsymbol{\Omega}$ with marginal distributions $\mathbb{P}$ and $\mathbb{Q}$, $d:\boldsymbol{\Omega}\times\boldsymbol{\Omega}\rightarrow\mathbb{R}_{+}$ is a nonnegative function, and $\gamma\text{-}\operatorname{ess\,sup}$ denotes the essential supremum of $d(\cdot,\cdot)$ with respect to the measure $\gamma$.

The Wasserstein distance arises in the problem of optimal transport [40, 49]: for any coupling $\gamma\in\Gamma(\mathbb{P},\mathbb{Q})$, the conditional distribution $\gamma_{\boldsymbol{w}^{\prime}|\boldsymbol{w}}$ can be viewed as a randomized policy for 'transporting' a unit quantity of some material from a random location $\boldsymbol{w}\sim\mathbb{P}$ to another location $\boldsymbol{w}^{\prime}\sim\mathbb{Q}$. If the cost of transportation from $\boldsymbol{w}\in\boldsymbol{\Omega}$ to $\boldsymbol{w}^{\prime}\in\boldsymbol{\Omega}$ is given by $[d(\boldsymbol{w},\boldsymbol{w}^{\prime})]^{p}$, then for $p<\infty$ the quantity $[\mathcal{W}_{p}(\mathbb{P},\mathbb{Q})]^{p}$ is the minimum expected transport cost [42].
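For intuition, the definition becomes fully explicit in one special case: two empirical distributions on the real line with the same number of equally weighted samples and $d(x,y)=|x-y|$. There the optimal coupling matches sorted samples in order, so no optimization is needed. The sketch below covers only that special case (it is an assumption of the example, not the setting of the paper), assumes NumPy, and uses the hypothetical name `wasserstein_1d`.

```python
import numpy as np

def wasserstein_1d(x, y, p=2.0):
    """W_p between two equally sized, equally weighted samples on the real line."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes the same number of samples on both sides")
    gaps = np.abs(x - y)            # transport distances under the sorted (monotone) coupling
    if np.isinf(p):
        return float(gaps.max())    # essential supremum of the distance
    return float(np.mean(gaps ** p) ** (1.0 / p))

if __name__ == "__main__":
    a, b = [0.0, 0.0], [0.0, 2.0]
    print(wasserstein_1d(a, b, p=1))
    print(wasserstein_1d(a, b, p=2))
    print(wasserstein_1d(a, b, p=np.inf))
```

Increasing $p$ puts more weight on the largest displacement, which is why the $p=\infty$ case reduces to the maximum gap.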

Introducing a dual variable $\mu$ for the constraint $\frac{1}{2}\|\boldsymbol{\theta}-\boldsymbol{\theta}_{\mathcal{A}}\|_{2}^{2}\leq\beta$, we solve the original optimization problem (62) by maximizing its dual. Strong duality holds for (62) because the Slater condition is satisfied by $\boldsymbol{\theta}_{\mathcal{A}}$. The standard min-max swap then gives

$$\begin{aligned}
\max_{\mu\geq 0}\ \mathcal{H}(\mu),\qquad \mathcal{H}(\mu) &:= \inf_{\boldsymbol{\theta}\in\mathbb{R}^{n}}\ \Big\{\mathcal{L}_{1}(\boldsymbol{\theta},\mu)\ \Big|\ \boldsymbol{\theta}\succeq 0,\ \boldsymbol{1}^{\top}\boldsymbol{\theta}=1\Big\}\\
&= \inf_{\boldsymbol{\theta}\in\mathbb{R}^{n}}\ \Big\{\frac{\mu}{2}\big\|\boldsymbol{\theta}-\boldsymbol{\theta}_{\mathcal{A}}\big\|^{2}_{2}-\mu\beta+h(\boldsymbol{\theta})\ \Big|\ \boldsymbol{\theta}\succeq 0,\ \boldsymbol{1}^{\top}\boldsymbol{\theta}=1\Big\}.
\end{aligned} \tag{241}$$

By Corollary 4.4.5 of Chapter VI in [27], we know that

$$\begin{aligned}
\nabla\mathcal{H}(\mu) &= \nabla_{\mu}\ \mathcal{L}_{1}(\boldsymbol{\theta}(\mu),\mu)\\
&= \nabla_{\mu}\ \Big\{\frac{\mu}{2}\big\|\boldsymbol{\theta}(\mu)-\boldsymbol{\theta}_{\mathcal{A}}\big\|^{2}_{2}-\mu\beta+h(\boldsymbol{\theta}(\mu))\Big\}\\
&= \frac{1}{2}\big\|\boldsymbol{\theta}(\mu)-\boldsymbol{\theta}_{\mathcal{A}}\big\|^{2}_{2}-\beta.
\end{aligned} \tag{242}$$

If $\nabla\mathcal{H}(\mu)$ can be evaluated efficiently, $\mu^{*}$ can be found by binary search: on the order of $\log(1/\varepsilon)$ iterations suffice to reach accuracy $\varepsilon$, i.e., $\|\nabla\mathcal{H}(\mu^{*})\|\leq\varepsilon$.
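The binary search on $\mu$ can be illustrated on a toy instance. The sketch below is not the paper's estimator: it assumes NumPy and replaces $h$ by a hypothetical linear function $h(\boldsymbol{\theta})=\langle\boldsymbol{c},\boldsymbol{\theta}\rangle$, chosen only because the inner minimizer of (241) then has the closed form $\boldsymbol{\theta}(\mu)=\operatorname{Proj}_{\Delta}(\boldsymbol{\theta}_{\mathcal{A}}-\boldsymbol{c}/\mu)$. The upper bound $\|\boldsymbol{c}\|_2/\sqrt{2\beta}$ is specific to this toy problem (it follows from the non-expansiveness of the projection) and is an assumption of the example. A compact sort-based simplex projection is included so that the block runs on its own.

```python
import numpy as np

def project_simplex(phi):
    """Sort-based Euclidean projection onto {theta >= 0, sum(theta) = 1}."""
    psi = np.sort(phi)[::-1]                     # entries in descending order
    shifted = np.cumsum(psi) - 1.0
    j = np.arange(1, phi.size + 1)
    rho = j[psi - shifted / j > 0][-1]           # largest index that stays positive
    return np.maximum(phi - shifted[rho - 1] / rho, 0.0)

def grad_H(theta, theta_A, beta):
    """Dual gradient: half squared distance to theta_A minus beta, as in (242)."""
    return 0.5 * np.sum((theta - theta_A) ** 2) - beta

def bisect_mu(theta_A, c, beta, eps=1e-9):
    """Bisection on mu driven by the sign of the dual gradient (toy linear h)."""
    inner = lambda mu: project_simplex(theta_A - c / mu)   # closed-form inner minimizer
    mu_min = 0.0
    mu_max = mu_inf = np.linalg.norm(c) / np.sqrt(2.0 * beta)
    while mu_max - mu_min > eps * mu_inf:
        mu = 0.5 * (mu_min + mu_max)
        if grad_H(inner(mu), theta_A, beta) > 0:
            mu_min = mu      # constraint violated: increase the penalty
        else:
            mu_max = mu      # constraint satisfied: decrease the penalty
    mu = 0.5 * (mu_min + mu_max)
    return mu, inner(mu)

# Illustrative inputs (assumptions, not values from the paper).
theta_A = np.full(4, 0.25)
c = np.array([1.0, -0.5, 0.2, 0.1])
beta = 0.005
mu_star, theta_star = bisect_mu(theta_A, c, beta)
print("mu*:", mu_star)
print("theta(mu*):", theta_star)
print("dual gradient at mu*:", grad_H(theta_star, theta_A, beta))
```

Because $\mathcal{H}$ is concave, its gradient is non-increasing in $\mu$, so a positive gradient means the root lies to the right of the current midpoint; this is the same update rule as in Algorithm 3.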

Given $\mu^{*}$, and denoting the probability simplex by

$$\Delta=\{\,\boldsymbol{\theta}\in\mathbb{R}^{n}\ |\ \boldsymbol{\theta}\succeq 0,\ \boldsymbol{1}^{\top}\boldsymbol{\theta}=1\,\}, \tag{243}$$

the corresponding $\boldsymbol{\theta}(\mu^{*})$ can be obtained by projected sub-gradient descent, which minimizes the objective $\mathcal{L}$ through the sequence $\{\boldsymbol{\theta}^{(t)}\}_{t=1}^{T}$,

$$\boldsymbol{\theta}^{(t+1)}=\operatorname{Proj}_{\Delta}\big(\boldsymbol{\theta}^{(t)}-\eta_{t}\nabla\mathcal{L}(\boldsymbol{\theta}^{(t)},\mu^{*})\big), \tag{244}$$

where $\nabla\mathcal{L}(\boldsymbol{\theta}^{(t)},\mu^{*})$ is the (sub)gradient of $\mathcal{L}(\boldsymbol{\theta},\mu^{*})$ at $\boldsymbol{\theta}^{(t)}$, $\eta_{t}$ is the positive step size, and $\operatorname{Proj}_{\Delta}(\boldsymbol{\varphi})$ is the Euclidean projection of $\boldsymbol{\varphi}$ onto $\Delta$, i.e., the solution of

$$\min_{\boldsymbol{\theta}\in\mathbb{R}^{n}}\ \frac{1}{2}\|\boldsymbol{\varphi}-\boldsymbol{\theta}\|^{2}_{2},\quad \textit{s.t.}\ \boldsymbol{\theta}\succeq 0,\ \boldsymbol{1}^{\top}\boldsymbol{\theta}=1. \tag{245}$$

The projection 𝜽(t+1)superscript𝜽𝑡1\boldsymbol{\theta}^{(t+1)}bold_italic_θ start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT has the form

𝜽(t+1)superscript𝜽𝑡1\displaystyle\boldsymbol{\theta}^{(t+1)}bold_italic_θ start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT =\displaystyle== [𝝋(t+1)γ𝟏]+subscriptdelimited-[]superscript𝝋𝑡1𝛾1\displaystyle\ \ [\boldsymbol{\varphi}^{(t+1)}-\gamma\boldsymbol{1}]_{+}[ bold_italic_φ start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT - italic_γ bold_1 ] start_POSTSUBSCRIPT + end_POSTSUBSCRIPT (246)

where []+=max{,0}subscriptdelimited-[]max0[\cdot]_{+}=\textbf{{max}}\{\cdot,0\}[ ⋅ ] start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = max { ⋅ , 0 } and γ𝛾\gammaitalic_γ holds

$$\gamma=\frac{1}{\rho}\left(\sum_{i=1}^{\rho}\varphi^{(t+1)}_{i}-1\right), \tag{247}$$

$$\rho=\textbf{max}\left\{j\in[n]\ \left|\ \psi_{j}-\frac{1}{j}\left(\sum_{r=1}^{j}\psi_{r}-1\right)>0\right.\right\}, \tag{248}$$

where $\boldsymbol{\psi}$ denotes $\boldsymbol{\varphi}^{(t+1)}$ with its entries sorted in descending order.
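The projection step in (245)-(248) can be illustrated with a minimal NumPy sketch. The function name `project_onto_simplex` is hypothetical and not from the paper; the sketch follows the standard sort-based procedure, in which the threshold $\gamma$ is computed from the $\rho$ largest entries, i.e., the leading entries of $\boldsymbol{\psi}$.

```python
import numpy as np

def project_onto_simplex(phi):
    """Euclidean projection of phi onto {theta >= 0, sum(theta) = 1}."""
    phi = np.asarray(phi, dtype=float)
    n = phi.size
    psi = np.sort(phi)[::-1]          # entries of phi in descending order
    cumsum = np.cumsum(psi)           # partial sums of the sorted entries
    j = np.arange(1, n + 1)
    # rho: largest j with psi_j - (sum_{r<=j} psi_r - 1) / j > 0, cf. (248).
    # The condition always holds for j = 1, so the selection is never empty.
    rho = j[psi - (cumsum - 1.0) / j > 0][-1]
    # Threshold built from the rho largest entries, cf. (247).
    gamma = (cumsum[rho - 1] - 1.0) / rho
    # Shift and clip: [phi - gamma * 1]_+, cf. (246).
    return np.maximum(phi - gamma, 0.0)

theta = project_onto_simplex([0.5, 0.2, 1.3])
```

For the input above, the sorted vector is $(1.3, 0.5, 0.2)$, which gives $\rho=2$ and $\gamma=0.4$, so the projection is $(0.1, 0, 0.9)$.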

Details of Algorithm 5

Algorithm 5 $\textbf{MirrorDescent}(\boldsymbol{S},f,\boldsymbol{\Theta}^{\beta}_{\mathcal{A}},\boldsymbol{\theta}^{(m)})$

Input: The MLE estimator $\boldsymbol{\hat{\theta}}$ and the total number of iterations $L$.
Initialization: A starting point $\boldsymbol{\lambda}^{0}$ and a constant $c_{0}>0$.
for $l=1$ to $L$ do
    Solve the maximizing sub-problem
    $$\boldsymbol{\theta}(\boldsymbol{\lambda}^{l-1})\in\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}:\,r(\boldsymbol{\theta})\neq r(\boldsymbol{\hat{\theta}})}{\textbf{arg max}}\ -g(\boldsymbol{\theta};\boldsymbol{\lambda}^{l-1},\boldsymbol{\hat{\theta}}),$$
    where
    $$g(\boldsymbol{\theta};\boldsymbol{\lambda}^{l-1},\boldsymbol{\hat{\theta}})=\sum_{(i,j)\in\mathcal{A}}\lambda^{l-1}_{i,j}D^{i,j}(\boldsymbol{\hat{\theta}}\|\boldsymbol{\theta}).$$
    Calculate the sub-gradient $d(\boldsymbol{\lambda}^{l-1})$
    $$d(\boldsymbol{\lambda}^{l-1})=\big(d(\boldsymbol{\lambda}^{l-1})_{1,2},\dots,d(\boldsymbol{\lambda}^{l-1})_{n,n-1}\big),$$
    where
    $$d(\boldsymbol{\lambda}^{l-1})_{i,j}=-D^{i,j}(\boldsymbol{\hat{\theta}}\|\boldsymbol{\theta}(\boldsymbol{\lambda}^{l-1})).$$
    Solve the minimizing sub-problem
    $$\boldsymbol{\lambda}^{l}\in\underset{\boldsymbol{\lambda}\in\Delta}{\textbf{arg min}}\ \eta_{l}\langle d(\boldsymbol{\lambda}^{l-1}),\boldsymbol{\lambda}\rangle+D(\boldsymbol{\lambda}\|\boldsymbol{\lambda}^{l-1}),$$
    where $\eta_{l}=c_{0}/\sqrt{l}$ and
    $$D(\boldsymbol{\lambda}\|\boldsymbol{\lambda}^{l-1})=\sum_{(i,j)\in\mathcal{A}}\lambda_{i,j}\log\frac{\lambda_{i,j}}{\lambda^{l-1}_{i,j}}.$$
end for
The categorical distribution is obtained by averaging the sequence $\{\boldsymbol{\lambda}^{1},\dots,\boldsymbol{\lambda}^{L}\}$ as
$$\boldsymbol{\hat{\lambda}}=\frac{1}{L}\sum_{l=1}^{L}\boldsymbol{\lambda}^{l}.$$
Output: The categorical distribution $\boldsymbol{\hat{\lambda}}$.
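The loop of Algorithm 5 can be sketched in a few lines of NumPy. The minimizing sub-problem with the KL term has the well-known closed form $\lambda^{l}_{i,j}\propto\lambda^{l-1}_{i,j}\exp(-\eta_{l}\,d_{i,j})$, which the sketch uses directly. Everything else is an assumption made for illustration: the names `entropic_mirror_step`, `mirror_descent` and `subgradient` are hypothetical, and the inner maximizing sub-problem is replaced by a search over a small finite candidate set whose per-pair divergences are stored in the toy matrix `D`. This is not the authors' solver for that sub-problem.

```python
import numpy as np

def entropic_mirror_step(lam_prev, d, eta):
    """arg min over the simplex of eta * <d, lam> + KL(lam || lam_prev).

    Closed form: lam_i is proportional to lam_prev_i * exp(-eta * d_i).
    Requires strictly positive lam_prev.
    """
    logits = np.log(lam_prev) - eta * d
    logits -= logits.max()            # stabilise the exponentials
    w = np.exp(logits)
    return w / w.sum()

def mirror_descent(subgradient, lam0, num_iters, c0=1.0):
    """Run L mirror steps with eta_l = c0 / sqrt(l) and average the iterates."""
    lam = np.asarray(lam0, dtype=float)
    total = np.zeros_like(lam)
    for l in range(1, num_iters + 1):
        d = subgradient(lam)          # oracle for the inner sub-problem
        lam = entropic_mirror_step(lam, d, c0 / np.sqrt(l))
        total += lam
    return total / num_iters

# Toy stand-in for the inner sub-problem (an assumption, not from the paper):
# row k holds the per-pair divergences D^{i,j}(theta_hat || theta_k) of the
# k-th candidate parameter whose ranking differs from that of theta_hat.
D = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def subgradient(lam):
    k = np.argmin(D @ lam)            # candidate hardest to distinguish
    return -D[k]                      # d_{i,j} = -D^{i,j}(theta_hat || theta(lam))

lam_hat = mirror_descent(subgradient, np.array([0.5, 0.5]), num_iters=400)
```

In this symmetric toy problem the max-min solution puts equal mass on both pairs, and the averaged iterate `lam_hat` stays close to $(0.5, 0.5)$.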
$$\boldsymbol{\hat{\lambda}}^{(m)}=\underset{\boldsymbol{\lambda}\in\Delta}{\textbf{arg max}}\ \underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}:\,\boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\boldsymbol{\hat{\theta}}^{(m)})}{\textbf{min}}\ g(\boldsymbol{\theta},\boldsymbol{\lambda};\boldsymbol{\hat{\theta}}^{(m)}),$$

where $\boldsymbol{\lambda}=(\lambda_{1,2},\dots,\lambda_{n,n-1})$,

$$g(\boldsymbol{\theta},\boldsymbol{\lambda};\boldsymbol{\hat{\theta}}^{(m)})=\sum_{(i,j)\in\mathcal{A}}\lambda_{i,j}D^{i,j}(\boldsymbol{\hat{\theta}}^{(m)}\|\boldsymbol{\theta}),$$

and $D^{i,j}(\boldsymbol{\hat{\theta}}\|\boldsymbol{\theta})$ is the Kullback-Leibler (KL) divergence from $f(\boldsymbol{\theta};(i,j),y)$ to $f(\boldsymbol{\hat{\theta}};(i,j),y)$ as

$$D^{i,j}(\boldsymbol{\hat{\theta}}\|\boldsymbol{\theta})=\sum_{y\in\{-1,1\}}f(\boldsymbol{\hat{\theta}};(i,j),y)\log\frac{f(\boldsymbol{\hat{\theta}};(i,j),y)}{f(\boldsymbol{\theta};(i,j),y)}.$$
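As a concrete instance of $D^{i,j}$ and $g$, the sketch below assumes a Bradley-Terry-Luce (BTL) model, in which item $i$ beats item $j$ with probability $\theta_{i}/(\theta_{i}+\theta_{j})$. The BTL choice and the helper names `btl_win_prob`, `pairwise_kl` and `weighted_divergence` are assumptions for illustration; the text only requires some parametric comparison model $f$. Scores are assumed strictly positive so that the logarithms are finite.

```python
import math

def btl_win_prob(theta, i, j):
    """P(y = 1) on the pair (i, j) under the assumed BTL model."""
    return theta[i] / (theta[i] + theta[j])

def pairwise_kl(theta_hat, theta, i, j):
    """D^{i,j}(theta_hat || theta): sum over y in {-1, 1} of f_hat * log(f_hat / f)."""
    p = btl_win_prob(theta_hat, i, j)   # f(theta_hat; (i, j), y = 1)
    q = btl_win_prob(theta, i, j)       # f(theta; (i, j), y = 1)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def weighted_divergence(lam, theta_hat, theta):
    """g(theta, lam; theta_hat) = sum over pairs of lam_{i,j} * D^{i,j}."""
    return sum(w * pairwise_kl(theta_hat, theta, i, j)
               for (i, j), w in lam.items())

theta_hat = [0.5, 0.5]
theta_alt = [0.75, 0.25]
kl_01 = pairwise_kl(theta_hat, theta_alt, 0, 1)
g_val = weighted_divergence({(0, 1): 1.0}, theta_hat, theta_alt)
```

Here `kl_01` equals $\tfrac{1}{2}\log\tfrac{4}{3}$, and the divergence vanishes when both arguments are the same parameter vector.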
$$\boldsymbol{\theta}(\boldsymbol{\lambda}^{(l-1)})\in\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}:\,r(\boldsymbol{\theta})\neq r(\boldsymbol{\hat{\theta}})}{\textbf{arg max}}\ -g(\boldsymbol{\theta};\boldsymbol{\lambda}^{(l-1)},\boldsymbol{\hat{\theta}}),$$

$$\boldsymbol{\theta}(\boldsymbol{\lambda}^{(l-1)})\in\underset{\boldsymbol{\theta}\in\boldsymbol{\Theta}^{\beta}_{\mathcal{A}}:\,\boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\boldsymbol{\hat{\theta}})}{\textbf{arg min}}\ g(\boldsymbol{\theta};\boldsymbol{\lambda}^{(l-1)},\boldsymbol{\hat{\theta}}),$$

$$\begin{aligned}
\underset{\delta\geq 0}{\textbf{max}}\ \ \mathcal{G}(\delta)\ &:=\ \underset{\boldsymbol{\theta}\in\mathbb{R}^{n}}{\textbf{inf}}\ \Big\{\ \mathcal{L}_{2}(\boldsymbol{\theta},\delta)\ \Big|\ \boldsymbol{\theta}\succeq 0,\ \boldsymbol{1}^{\top}\boldsymbol{\theta}=1,\ \boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\boldsymbol{\hat{\theta}})\ \Big\}\\
&=\ \underset{\boldsymbol{\theta}\in\mathbb{R}^{n}}{\textbf{inf}}\ \left\{\ \frac{\delta}{2}\big\|\boldsymbol{\theta}-\boldsymbol{\theta}_{\mathcal{A}}\big\|^{2}_{2}-\delta\beta+g(\boldsymbol{\theta};\boldsymbol{\lambda}^{(l-1)},\boldsymbol{\hat{\theta}})\ \left|\ \boldsymbol{\theta}\succeq 0,\ \boldsymbol{1}^{\top}\boldsymbol{\theta}=1,\ \boldsymbol{\pi}(\boldsymbol{\theta})\neq\boldsymbol{\pi}(\boldsymbol{\hat{\theta}})\right.\right\}
\end{aligned}$$

Balance the choice probability of the selection rule

$$\lambda^{(m)}_{i,j}=p\cdot\frac{2}{n(n-1)}+(1-p)\hat{\lambda}^{(m)}_{i,j},\ \ i,j\in[n],\ i\neq j.$$
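The balancing rule is a convex combination of the uniform distribution over pairs and $\boldsymbol{\hat{\lambda}}^{(m)}$. A minimal sketch follows, assuming that $\boldsymbol{\hat{\lambda}}^{(m)}$ is stored as a dictionary over all $n(n-1)/2$ unordered pairs, which is what makes the uniform weight $2/(n(n-1))$ sum to one. The function name `balance_selection` is hypothetical.

```python
def balance_selection(lam_hat, n, p):
    """Mix lam_hat with the uniform distribution over the n(n-1)/2 pairs.

    lam_hat: dict mapping each unordered pair (i, j), i < j, to its probability.
    p: mixing weight in [0, 1] placed on the uniform distribution.
    """
    uniform = 2.0 / (n * (n - 1))
    return {pair: p * uniform + (1.0 - p) * w for pair, w in lam_hat.items()}

lam_hat = {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.0}
lam_bal = balance_selection(lam_hat, n=3, p=0.3)
```

With $p=0.3$ and $n=3$, every pair receives at least probability $0.1$, so no comparison is excluded from selection, while the result remains a probability distribution.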

The threshold $\gamma$ in (246) is chosen such that $\sum_{i}\theta^{(t+1)}_{i}=1$. By Lemma 2 of [21], determining $\gamma$ is equivalent to finding the unique index $i$ such that

$$\begin{aligned}
\sum_{j=1}^{i}\Big(\varphi^{(t+1)}_{j}-\varphi^{(t+1)}_{i}\Big)&<1,\\
\sum_{j=1}^{i+1}\Big(\varphi^{(t+1)}_{j}-\varphi^{(t+1)}_{i+1}\Big)&\geq 1,
\end{aligned} \tag{249}$$

or $i=n$ if no index satisfies (249).

The projection $\boldsymbol{\theta}(\lambda)=[\theta_{1}(\lambda),\dots,\theta_{n}(\lambda)]$ has the form

$$\theta_{i}(\lambda)=\left[\tilde{\theta}_{i}-\frac{1}{\lambda}\frac{\partial F(\boldsymbol{\theta})}{\partial\theta_{i}}\right], \tag{250}$$

which amounts to finding the Euclidean projection of the vector $\boldsymbol{v}(\lambda)=[v_{1},\dots,v_{n}]\in\mathbb{R}^{n}$,

$$v_{i}= \tag{251}$$

onto the probability simplex. The projection $\boldsymbol{\theta}(\lambda)$ then has the form $\theta_{i}(\lambda)=(v_{i}-\eta)_{+}$ for some $\eta\in\mathbb{R}$, where $\eta$ is chosen so that $\boldsymbol{\theta}(\lambda)$ satisfies the constraint $\boldsymbol{1}^{\top}\boldsymbol{\theta}(\lambda)=1$. This is equivalent to finding the unique index $i$ such that

$$\sum_{j=1}^{i}(v_{j}-v_{i})<1,\ \ \text{and}\ \ \sum_{j=1}^{i+1}(v_{j}-v_{i+1})\geq 1, \tag{252}$$

and $i=n$ if no such index exists. By (252), we know that

$$v_{j}-\eta\left\{\begin{array}{ll}\geq 0,&j\leq i,\\ \leq 0,&j>i\end{array}\right. \tag{253}$$
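A short check, assuming the entries are indexed so that $v_{1}\geq\dots\geq v_{n}$ (the ordering is implicit in (252)), shows how the index condition determines the threshold and matches the sort-based rule (248):

$$\sum_{j=1}^{i}(v_{j}-v_{i})<1\iff v_{i}-\frac{1}{i}\left(\sum_{j=1}^{i}v_{j}-1\right)>0,\qquad \eta=\frac{1}{i}\left(\sum_{j=1}^{i}v_{j}-1\right).$$

With this $\eta$, the first inequality in (252) reads $v_{i}-\eta>0$, and the second is equivalent to $\sum_{j=1}^{i}v_{j}-1\geq i\,v_{i+1}$, i.e., $v_{i+1}-\eta\leq 0$. The sign pattern for the remaining indices follows from the ordering of the entries.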