Eclipse Attack Detection on a Blockchain Network as a Non-Parametric Change Detection Problem

Anurag Gupta, Vikram Krishnamurthy, and Brian Sadler

Anurag Gupta is with the School of Electrical & Computer Engineering, Cornell University, Ithaca, NY 14853, USA (e-mail: [email protected]). Vikram Krishnamurthy is with the School of Electrical & Computer Engineering, Cornell University, Ithaca, NY 14853, USA (e-mail: [email protected]). Brian Sadler is with DEVCOM Army Research Laboratory, Adelphi, Maryland, USA (e-mail: [email protected]).

Abstract

This paper introduces a novel non-parametric change detection algorithm to identify eclipse attacks on a blockchain network; the non-parametric algorithm relies only on the empirical mean and variance of the dataset, making it highly adaptable. An eclipse attack occurs when malicious actors isolate blockchain users, disrupting their ability to reach consensus with the broader network, thereby distorting their local copy of the ledger. To detect an eclipse attack, we monitor changes in the Fréchet mean and variance of the evolving blockchain communication network connecting blockchain users. First, we leverage the Johnson-Lindenstrauss lemma to project large-dimensional networks into a lower-dimensional space, preserving essential statistical properties. Subsequently, we employ a non-parametric change detection procedure, leading to a test statistic that converges weakly to a Brownian bridge process in the absence of an eclipse attack. This enables us to quantify the false alarm rate of the detector. Our detector can be implemented as a smart contract on the blockchain, offering a tamper-proof and reliable solution. Finally, we use numerical examples to compare the proposed eclipse attack detector with a detector based on the random forest model.

I Introduction

Blockchain, an immutable ledger distributed across multiple users [1], relies on consensus among its users to share data. This paper studies adversarial attacks on blockchains, with a specific focus on eclipse attacks [2]. In an eclipse attack, malicious users isolate a victim user, disrupting the victim's ability to reach consensus with the rest of the network. For example, if a user has eight incoming connections from other users and an attacker controls all eight of those nodes, the attacker can refuse to relay any new blocks that the rest of the network produces. Hence, detecting eclipse attacks is crucial for safeguarding blockchain networks.

Main Results and Organization

To detect an eclipse attack, we propose a non-parametric change detection algorithm that identifies changes in the Fréchet mean and variance [3] (topological generalizations of mathematical expectation and variance) within a sequence of randomly evolving blockchain communication networks (BCNs). We exploit the Johnson-Lindenstrauss (JL) lemma [4] to extract essential features from the large-dimensional BCN while approximately preserving the test statistic. In a blockchain, a smart contract is a computer program that automatically executes a task based on pre-specified conditions. Our proposed detector can be implemented as a smart contract on the blockchain that detects an eclipse attack using a network monitor; this information can then be relayed to the blockchain users.

Sec. II formulates eclipse attack detection as a change detection problem on a space of directed graphs and describes our proposed detector. In Sec. III, we analyze the performance of the detector using weak convergence methods. Specifically, Theorem 1 shows that the scaled detector statistic converges weakly to a Brownian bridge process. As a result, we can explicitly determine the false alarm rate by calculating the quantiles of a Brownian bridge. In the presence of an eclipse attack, Theorem 2 estimates the onset of the eclipse attack. Finally, Theorem 3 shows the effect of the JL lemma on the false alarm rate of the detector.

Sec. IV assesses the performance of our eclipse attack detector using numerical examples on simulated datasets. We also provide numerical examples comparing the proposed eclipse attack detector and a detector based on the random forest model (RFM).

Related Works

In the literature, several detectors have been proposed for detecting eclipse attacks. [5] and [6] utilize random forest classification to analyze communication traffic and train their models on eclipse attack datasets. [7] employs a deep learning technique for detecting eclipse attacks. [8] uses the blockchain's block creation rate as a detection metric. [9] monitors changes in the proof-of-work difficulty levels to identify eclipse attacks.

Related to attack mitigation, a peer selection strategy introduced by [10] offers a way to reduce the likelihood of eclipse attacks. Eclipse attacks share similarities with Sybil attacks [11] and routing attacks [12], both of which can impact the integrity of the blockchain consensus protocol.

Our eclipse attack detection approach distinguishes itself by not requiring training data. Instead, we employ statistical tools from [3], which offer a generalized solution for change detection in arbitrary object spaces. We identify eclipse attacks by tracking changes in the Fréchet mean and variance within the sequence of randomly evolving BCNs.

II Detecting Eclipse Attack on a Blockchain Network

In this section, we formulate detecting eclipse attacks on a blockchain network as a change detection problem and present our detection algorithm.

II-A Model for Eclipse Attack

We begin by modeling the BCN as a directed graph and defining its adjacency matrix.

Definition 1.

A BCN is represented as a directed graph $G=(V,E)\in\mathcal{G}$, where $\mathcal{G}$ is the graph space comprising $p$ vertices. Each vertex has $q$ outgoing edges.

The adjacency matrix $A_G\in\mathbb{R}^{|V|\times|V|}$ of the BCN $G$ is defined as follows:

$$A_G(i,j)=\begin{cases}1, & \exists\text{ an edge from the vertex } j \text{ to the vertex } i\\ 0, & \text{otherwise}\end{cases} \tag{1}$$
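As a minimal sketch of the convention in (1), the following Python/NumPy snippet builds $A_G$ from an edge list. The helper name `adjacency_matrix`, the 0-based vertex labels, and the toy edge list are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def adjacency_matrix(p, edges):
    """Build A_G as in (1): A[i, j] = 1 iff there is an edge from vertex j to vertex i."""
    A = np.zeros((p, p), dtype=int)
    for src, dst in edges:
        A[dst, src] = 1  # column index = source j, row index = destination i
    return A

# Toy BCN with p = 4 users, each with q = 2 outgoing edges (0-based labels).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 0), (3, 0), (3, 1)]
A = adjacency_matrix(4, edges)

out_degree = A.sum(axis=0)  # column sums: outgoing edges of each vertex j
in_degree = A.sum(axis=1)   # row sums: incoming edges of each vertex i
```

With this convention, column sums give the number of outgoing edges of each user (equal to $q$ in Definition 1), while row sums give the number of incoming connections.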

Consensus in blockchain relies on peer-to-peer (P2P) communication. The BCN describes the flow of information among blockchain users. In the absence of an eclipse attack, the BCN at each time $t$ resembles a random graph with uniform distribution: each user selects $q$ neighbors uniformly at random to share information. During an eclipse attack, however, malicious users target victim users with substantial computational power. These malicious users choose their neighbors in a non-uniform manner to disrupt the consensus of the victim users.

In this work, we assume: (1) When all users select their neighbors honestly using the blockchain communication protocol, the random BCN $G$ follows an unknown but fixed distribution $P_1$. (2) The eclipse attack strategy is time-invariant (see Footnote 1). Consequently, in the presence of an eclipse attack, the BCN $G$ follows an unknown but fixed distribution $P_2$. For instance, a common eclipse attack strategy employed by malicious users is to choose the victim users as their neighbors with a significantly higher probability compared to other users. We now give a formal definition of our model for the eclipse attack.

Footnote 1: For an eclipse attack, multiple malicious users must communicate continuously with the victim user(s). A complex eclipse attack strategy can slow the communication rate and make the attack ineffective. Therefore, the assumption of a time-invariant eclipse attack strategy is justified.

Definition 2 (Eclipse attack).

A blockchain is free from an eclipse attack if the random BCN, as represented by the graph $G$ (Definition 1), is sampled from $\mathcal{G}$ following the distribution $P_1$. Conversely, a blockchain is under an eclipse attack if the random graph $G$ is sampled from $\mathcal{G}$ following the distribution $P_2$.

Example. Consider a blockchain network with $p$ blockchain users, each of whom selects $q$ neighbors for consensus. An example of the distribution $P_1$ is when each of the neighbors in the BCN is selected uniformly at random, i.e.,

$$\Pr\left(A_G(i,j)=1\right)=\frac{q}{p}\quad\text{s.t. }\sum_{i}A_G(i,j)=q,\ \forall j$$

where $q$ is defined in Definition 1.

Now consider an example of the distribution $P_2$. Here, $r$ malicious blockchain users, denoted as $v_{p-r-1},v_{p-r},\ldots,v_{p}\in V$, choose $v_1\in V$ as their victim.

For $j=v_{p-r-1},v_{p-r},\ldots,v_{p}\in V$:

$$\begin{aligned}
\Pr\left(A_G(i,j)=1\right) &> \frac{q}{p},\quad i=v_1\\
\Pr\left(A_G(i,j)=1\right) &< \frac{q}{p},\quad\text{otherwise}
\end{aligned}$$

For $j\neq v_{p-r-1},v_{p-r},\ldots,v_{p}\in V$:

$$\Pr\left(A_G(i,j)=1\right)=\frac{q}{p}\quad\text{s.t. }\sum_{i}A_G(i,j)=q,\ \forall j$$

In this example, the victim user heavily depends on information provided by attackers to keep up with the current state of the blockchain. As a result, the victim’s local copy of the distributed ledger no longer aligns with the majority consensus of the blockchain network.
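The two example distributions can be simulated as follows. This is only a sketch under stated assumptions: the function `sample_bcn`, the parameter `bias`, and the specific attacker rule (connect to the victim with probability `bias`, then fill the remaining slots uniformly) are hypothetical choices that satisfy the inequalities of the $P_2$ example; they are not the attack model used in the paper's experiments. Neighbors are drawn from all $p$ vertices so that the honest marginal probability is exactly $q/p$.

```python
import numpy as np

def sample_bcn(p, q, rng, attackers=(), victim=0, bias=0.9):
    """Sample one BCN adjacency matrix with exactly q outgoing edges per user."""
    A = np.zeros((p, p), dtype=int)
    for j in range(p):
        if j in attackers and rng.random() < bias:
            # Attacker (hypothetical strategy): always link to the victim,
            # then fill the other q - 1 slots uniformly among the rest.
            rest = np.array([i for i in range(p) if i != victim])
            nbrs = np.append(rng.choice(rest, size=q - 1, replace=False), victim)
        else:
            # Honest behaviour: q distinct neighbors chosen uniformly at random.
            nbrs = rng.choice(p, size=q, replace=False)
        A[nbrs, j] = 1  # edge from user j to each selected neighbor i
    return A

rng = np.random.default_rng(0)
p, q, r = 20, 4, 5
attackers = set(range(p - r, p))  # last r users act maliciously (0-based labels)

honest = [sample_bcn(p, q, rng) for _ in range(200)]
attacked = [sample_bcn(p, q, rng, attackers=attackers) for _ in range(200)]

# Average number of incoming connections of the victim (row 0) in each regime.
victim_in_honest = np.mean([A[0].sum() for A in honest])
victim_in_attacked = np.mean([A[0].sum() for A in attacked])
print(victim_in_honest, victim_in_attacked)
```

Under these assumptions the victim's average in-degree is noticeably larger in the attacked samples, which is the kind of distributional shift the detector is meant to pick up.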

II-B Eclipse Attack Detection Problem

We now formulate the eclipse attack detection problem as a change detection problem. The proposed detector operates on an offline dataset of BCNs (see Footnote 2); the BCN can be observed using a network monitor. Specifically, we pose eclipse attack detection as a hypothesis testing problem.

Footnote 2: Changes in the BCN occur at a faster rate than the addition of new blocks to the blockchain. This allows us to observe a substantial sample of BCNs before a double-spend attack resulting from an eclipse attack is achieved. Therefore, using offline datasets for eclipse attack detection is practical.

Definition 3 (Eclipse attack detection problem).

Let the sequence of random graphs $\{G_i\in\mathcal{G},\;i=1,2,\ldots,N\}$ denote the sequence of BCNs observed. The eclipse attack detection problem on a blockchain network is the following hypothesis testing problem:

$$\begin{aligned}
H_0 &: G_1,G_2,\ldots,G_N\sim P_1\\
H_1 &: \exists\;\tau\in\{1,\ldots,N\}\\
&\text{s.t. }\left\{\begin{array}{l}G_1,G_2,\ldots,G_{\tau-1}\sim P_1\\ G_{\tau},G_{\tau+1},\ldots,G_N\sim P_2\end{array}\right.
\end{aligned} \tag{2}$$

Here, $\tau$ denotes the onset of the eclipse attack on the blockchain network. The eclipse attack detection problem (2) is a change detection problem on a space of directed graphs.
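To make the indexing in (2) concrete, the sketch below generates a sequence with a change at $\tau$, using 1-based indices as in the definition. The function `bcn_sequence` and the label-returning stand-in samplers are hypothetical; in practice the two callables would return BCN samples from $P_1$ and $P_2$.

```python
def bcn_sequence(N, tau, sample_p1, sample_p2):
    """Draw G_1, ..., G_N as in (2), using 1-based indices.

    tau=None corresponds to H0 (every sample from P1); otherwise
    G_1..G_{tau-1} come from P1 and G_tau..G_N come from P2.
    """
    if tau is not None and not 1 <= tau <= N:
        raise ValueError("tau must lie in {1, ..., N}")
    return [
        sample_p1() if tau is None or i < tau else sample_p2()
        for i in range(1, N + 1)
    ]

# Stand-in samplers that only return labels, to show which regime each index uses.
seq = bcn_sequence(6, 4, lambda: "P1", lambda: "P2")
print(seq)  # ['P1', 'P1', 'P1', 'P2', 'P2', 'P2']
```

Note that $\tau=1$ means every observed BCN is already drawn from $P_2$.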

II-C Test Statistic for Detecting Eclipse Attack

In this section, we present a test statistic to solve the eclipse attack detection problem (2). The proposed test statistic estimates changes in the mean and variance of the sequence of BCNs. However, BCNs do not lie in a Euclidean space. We therefore use the Fréchet mean and variance [3], a topological generalization of mean and variance (see Footnote 3). To calculate the Fréchet mean and variance, we define a distance metric $d$ on the space $\mathcal{G}$ of BCNs (Definition 1). This metric measures the dissimilarity between two BCNs $G_1$ and $G_2$ using the Frobenius norm.

Footnote 3: The Fréchet mean $\mu$ and Fréchet variance $V$ of a probability measure $P$ are defined as follows:

$$\mu=\arg\min_{\omega\in\mathcal{G}}\mathbb{E}\left[d^2(G,\omega)\right],\quad V=\min_{\omega\in\mathcal{G}}\mathbb{E}\left[d^2(G,\omega)\right]$$

Here, $G\sim P$ denotes a random object with probability measure $P$; $\mathcal{G}$ denotes the sample space of the random object $G$; and $d$ denotes a suitable choice of distance metric on the space $\mathcal{G}$.

Definition 4.

The distance $d$ between two BCNs $G_1,G_2\in\mathcal{G}$ (Definition 1) is defined as the Frobenius norm of the difference between their adjacency matrices (1) (see Footnote 4):

$$d(G_1,G_2)=\left(\sum_{i,j}\left|A_{G_1}(i,j)-A_{G_2}(i,j)\right|^2\right)^{\frac{1}{2}} \tag{3}$$

Footnote 4: Our model assumes that the number of users in the blockchain is fixed. Hence, the distance between two BCNs is well-defined.
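The distance (3) can be computed directly with NumPy. The following is a minimal sketch; the helper name `bcn_distance` and the two small matrices are illustrative assumptions.

```python
import numpy as np

def bcn_distance(A1, A2):
    """Distance (3): Frobenius norm of the difference of two adjacency matrices."""
    diff = np.asarray(A1, dtype=float) - np.asarray(A2, dtype=float)
    return np.linalg.norm(diff, ord="fro")

A1 = np.array([[0, 1],
               [1, 0]])
A2 = np.array([[0, 1],
               [0, 1]])

dist = bcn_distance(A1, A2)
differing_entries = int((A1 != A2).sum())
print(dist**2, differing_entries)
```

Because adjacency matrices have 0/1 entries, $d^2(G_1,G_2)$ equals the number of entries in which the two matrices differ, i.e., the number of directed edges present in exactly one of the two BCNs.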

The test statistic partitions the sequence of BCNs into two parts. The goal is to determine whether the BCNs in these two parts are sampled from the same distribution or from distinct distributions. To achieve this, the test statistic examines the Fréchet mean and variance of the BCNs in each part.

Under the null hypothesis $H_0$ in (2), i.e., in the absence of an eclipse attack on the blockchain network, the Fréchet mean and variance of the BCNs in both parts are the same. Conversely, under the alternative hypothesis $H_1$ in (2), the Fréchet mean and variance of the BCNs in these parts differ, signaling the presence of an eclipse attack.

Before introducing our test statistic for eclipse attack detection, we define several quantities that depend on the adjacency matrices $A_{G_i},\;i=1,2,\ldots,N$ of the sequence of BCNs. We also introduce the index $n$, which represents an estimate for the change point $\tau$ in the eclipse attack detection problem (2). For each $n\in\{1,\ldots,N-1\}$, we define:

$$\begin{aligned}
\hat{\mu}_{n} &:= \arg\min_{\omega\in\mathcal{G}}\frac{1}{n}\sum_{i=1}^{n}d^{2}\left(G_i,G_{\omega}\right)\\
\hat{V}_{n} &:= \frac{1}{n}\sum_{i=1}^{n}d^{2}\left(G_i,\hat{\mu}_{n}\right)\\
\hat{\mu}_{N-n} &:= \arg\min_{\omega\in\mathcal{G}}\frac{1}{(N-n)}\sum_{i=n+1}^{N}d^{2}\left(G_i,G_{\omega}\right)\\
\hat{V}_{N-n} &:= \frac{1}{(N-n)}\sum_{i=n+1}^{N}d^{2}\left(G_i,\hat{\mu}_{N-n}\right)\\
\hat{V}_{n}^{C} &:= \frac{1}{n}\sum_{i=1}^{n}d^{2}\left(G_i,\hat{\mu}_{N-n}\right)\\
\hat{V}_{N-n}^{C} &:= \frac{1}{N-n}\sum_{i=n+1}^{N}d^{2}\left(G_i,\hat{\mu}_{n}\right)\\
\hat{\mu} &:= \arg\min_{\omega\in\mathcal{G}}\frac{1}{N}\sum_{i=1}^{N}d^{2}\left(G_i,G_{\omega}\right)\\
\hat{V} &:= \frac{1}{N}\sum_{i=1}^{N}d^{2}\left(G_i,\hat{\mu}\right)\\
\hat{\sigma}^{2} &:= \frac{1}{N}\left[\sum_{i=1}^{N}d^{4}\left(G_i,\hat{\mu}\right)-\hat{V}^{2}\right]
\end{aligned} \tag{4}$$

The test statistic compares the Fréchet mean and variance of the BCNs $G_1,G_2,\ldots,G_n$ and $G_{n+1},G_{n+2},\ldots,G_N$.

Definition 5 (Test statistic for detecting an eclipse attack).

Let $n$ denote the estimate for the change point $\tau$ in the eclipse attack detection problem (2). The test statistic $S_{n,N}$ is defined as follows:

$$\begin{aligned}
S_{n,N} &= \frac{n(N-n)}{N^{2}\hat{\sigma}^{2}}\Big\{\left(\hat{V}_{n}-\hat{V}_{N-n}\right)^{2}+\\
&\quad\left(\hat{V}_{n}^{C}-\hat{V}_{n}+\hat{V}_{N-n}^{C}-\hat{V}_{N-n}\right)^{2}\Big\}
\end{aligned} \tag{5}$$

Here, $\hat{V}_{n},\hat{V}_{N-n},\hat{V}_{n}^{C},\hat{V}_{N-n}^{C},\hat{\sigma}^{2}$ are defined in (4).

The test statistic $S_{n,N}$ in (5) comprises two terms: the first term estimates the change in the Fréchet variance, while the second term estimates the change in the Fréchet mean of the BCNs in the two parts.
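The following sketch evaluates $S_{n,N}$ for every $n$ on vectorized data. It rests on several assumptions that are ours rather than the paper's: the Fréchet means are replaced by entrywise sample means (a Euclidean relaxation in which $\omega$ ranges over real vectors instead of $\mathcal{G}$); $\hat{\sigma}^2$ is taken to be the empirical variance of the squared distances, i.e., the mean of $d^4$ minus $\hat{V}^2$; and Gaussian rows stand in for vectorized BCNs. Under this relaxation, $\hat{V}_n^C-\hat{V}_n=\|\hat{\mu}_n-\hat{\mu}_{N-n}\|^2$, which is why the second term of (5) tracks a change in the mean. The names `mean_sq_dist` and `eclipse_statistic` are hypothetical.

```python
import numpy as np

def mean_sq_dist(Z, center):
    """Average squared Euclidean (Frobenius) distance from the rows of Z to `center`."""
    return ((Z - center) ** 2).sum(axis=1).mean()

def eclipse_statistic(X):
    """Return an array S with S[n-1] = S_{n,N} from (5), for n = 1, ..., N-1.

    X has shape (N, m); row i is the vectorized adjacency matrix of G_{i+1}.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    # Assumption: sigma^2 is the empirical variance of the squared distances.
    sigma2 = (d2 ** 2).mean() - d2.mean() ** 2
    S = np.empty(N - 1)
    for n in range(1, N):
        head, tail = X[:n], X[n:]
        mu_h, mu_t = head.mean(axis=0), tail.mean(axis=0)  # relaxed Frechet means
        V_h, V_t = mean_sq_dist(head, mu_h), mean_sq_dist(tail, mu_t)
        Vc_h, Vc_t = mean_sq_dist(head, mu_t), mean_sq_dist(tail, mu_h)
        S[n - 1] = n * (N - n) / (N**2 * sigma2) * (
            (V_h - V_t) ** 2 + (Vc_h - V_h + Vc_t - V_t) ** 2
        )
    return S

# Synthetic stand-in data: the mean shifts after the first 60 of N = 100 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
X[60:] += 1.0
S = eclipse_statistic(X)
n_hat = int(np.argmax(S)) + 1  # split point with the largest statistic
print(n_hat)
```

Taking the maximizer of $S_{n,N}$ over $n$ as the estimated split is a common way to use such statistics; the thresholds that control the false alarm rate are the subject of Sec. III and are not reproduced here.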

II-D Dimensionality Reduction of the Adjacency Matrices

In this section, we use the JL lemma to reduce the dimension of the adjacency matrix. Recall that the proposed test statistic (5) is computed using the sequence of adjacency matrices of the BCNs. The number of elements in the adjacency matrix grows as the square of the number of blockchain users. Hence, it is necessary to reduce the dimension of the adjacency matrices $A_G$ to decrease the computational cost of the test statistic $S_{n,N}$ (5). To this end, we leverage the JL lemma to project the adjacency matrices of BCNs into a lower-dimensional subspace while approximately preserving the test statistic.

Lemma 1 (Johnson-Lindenstrauss (JL) lemma).

Given any $\epsilon\in(0,1)$ and an integer $N$, let $k$ be a positive integer satisfying $k\geq\frac{24}{3\epsilon^{2}-2\epsilon^{3}}\log N$. For any set $A$ containing $N$ points in $\mathbb{R}^{m}$, there exists a mapping $f:\mathbb{R}^{m}\rightarrow\mathbb{R}^{k}$ such that for all $x,y\in A$, the following inequality holds:

$$(1-\epsilon)\|x-y\|^{2}\leq\|f(x)-f(y)\|^{2}\leq(1+\epsilon)\|x-y\|^{2}$$

The linear map $f$ in Lemma 1 can be found using random projections in randomized polynomial time [13]. Now, we apply the JL lemma to the adjacency matrices to obtain the projected adjacency matrices.
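To make Lemma 1 concrete, the following minimal sketch computes the smallest admissible $k$ for a given $\epsilon$ and $N$ and applies one random linear map. It assumes a Gaussian random projection with i.i.d. $\mathcal{N}(0,1/k)$ entries, which is a common construction but not necessarily the one used in [13]. The helper names and the sizes `N`, `m`, and `eps` are hypothetical choices for illustration only.

```python
import math
import random


def jl_min_dim(num_points, eps):
    """Smallest integer k with k >= 24 / (3 eps^2 - 2 eps^3) * log(N), as in Lemma 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(24.0 / (3.0 * eps ** 2 - 2.0 * eps ** 3) * math.log(num_points))


def gaussian_projection(m, k, rng):
    """k x m matrix Q with i.i.d. N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, scale) for _ in range(m)] for _ in range(k)]


def project(Q, x):
    """Apply the linear map f(x) = Q x."""
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]


def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


rng = random.Random(0)
N, m, eps = 20, 200, 0.5              # hypothetical sizes, not taken from the paper
k = jl_min_dim(N, eps)                # 144 for these values
# Sparse 0/1 vectors standing in for vectorized adjacency matrices.
points = [[float(rng.random() < 0.05) for _ in range(m)] for _ in range(N)]
Q = gaussian_projection(m, k, rng)
projected = [project(Q, x) for x in points]

# Ratio of projected to original squared distance for every distinct pair.
ratios = [
    sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
    for i in range(N)
    for j in range(i + 1, N)
    if sq_dist(points[i], points[j]) > 0
]
```

Lemma 1 guarantees that some map keeps every ratio within $[1-\epsilon, 1+\epsilon]$. A single random draw satisfies this only with some probability, so in practice one would check `min(ratios)` and `max(ratios)` and redraw `Q` if the bound is violated.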

Definition 6 (Projected adjacency matrices).

The projected adjacency matrices, denoted as $\tilde{A}_{G_{i}},\;i=1,2,\ldots,N$, for the BCNs $G_{i}\in\mathcal{G},\;i=1,2,\ldots,N$, are obtained by applying the JL lemma (Lemma 1) to the adjacency matrices $A_{G_{i}},\;i=1,2,\ldots,N$ (1), with an appropriately chosen value of $\epsilon$. Equivalently,

$$\tilde{A}_{G_{i}}=f(A_{G_{i}}),\quad i=1,2,\ldots,N \tag{6}$$

where the linear map $f$ satisfies Lemma 1.

Comparison between the Adjacency Matrix $A_{G}$ of the BCN and the Projected Adjacency Matrix $\tilde{A}_{G}$

To apply the JL lemma (Lemma 1), we first vectorize the adjacency matrix $A_{G}$. Denote the vectorized $A_{G}$ as $X$. Then, we compute a suitable linear transformation $Q$ that satisfies the JL lemma for the chosen value of $\epsilon$. Now, $Y=QX\Rightarrow\mathbb{E}[Y]=Q\,\mathbb{E}[X]\Rightarrow\Sigma_{Y}=Q\Sigma_{X}Q^{\intercal}$. As the proposed non-parametric statistical detector detects a change in the mean and the variance of the adjacency matrices of the random BCNs, we can use the projected adjacency matrix $\tilde{A}_{G}$ to compute the test statistic. This is because the mean of the projected adjacency matrix $\tilde{A}_{G}$ is a linear transformation of the mean of the adjacency matrix $A_{G}$, and the variance of the projected adjacency matrix $\tilde{A}_{G}$ is similar to the variance of the adjacency matrix $A_{G}$.
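The two implications above follow from linearity of expectation. A short worked derivation, assuming only that $Q$ is a fixed (non-random) matrix once it has been drawn and that $X$ has finite second moments:

```latex
\begin{align*}
\mathbb{E}[Y] &= \mathbb{E}[QX] = Q\,\mathbb{E}[X], \\
\Sigma_{Y} &= \mathbb{E}\big[(Y-\mathbb{E}[Y])(Y-\mathbb{E}[Y])^{\intercal}\big] \\
           &= \mathbb{E}\big[Q\,(X-\mathbb{E}[X])(X-\mathbb{E}[X])^{\intercal}Q^{\intercal}\big] \\
           &= Q\,\mathbb{E}\big[(X-\mathbb{E}[X])(X-\mathbb{E}[X])^{\intercal}\big]\,Q^{\intercal}
            = Q\,\Sigma_{X}\,Q^{\intercal}.
\end{align*}
```

The same identities hold exactly for the empirical mean and the empirical covariance of a finite sample, since both are linear and bilinear in the data, respectively.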

II-E Algorithm for Detecting Eclipse Attack

Having developed the necessary mathematical tools, we present the eclipse attack detection algorithm. Algorithm 1 outlines the steps in this algorithm. Given the large dimension of the BCN, we initially employ the JL lemma to reduce dimensionality while approximately preserving the test statistic defined in (5). We also assume that the eclipse attack does not occur near the endpoints (footnote 5), i.e., $\tau\in\mathbb{I}^{+}$, $\frac{\tau}{N}\in(\delta,1-\delta)$ for some $\delta>0$.

Footnote 5: We assume that the eclipse attack does not occur near the endpoints of the sequence of communication networks. To detect an eclipse attack near the endpoints, the detector can use an overlapping sequence of BCNs, ensuring that the attack takes place away from the endpoints for at least one batch. Alternatively, one can refine the test statistic to detect an eclipse attack near the endpoints (as explored in [14]), a topic we plan to investigate in future research.

Algorithm 1 Algorithm for detecting an eclipse attack on a blockchain network
Input: Sequence of adjacency matrices $A_{G_{i}},\;i=1,2,\ldots,N$ of the random BCNs $G_{i}\in\mathcal{G},\;i=1,2,\ldots,N$ at time $t=1,2,\ldots,N$, respectively (Definition 3).
1: Dimensionality reduction: Compute the projected adjacency matrices $\tilde{A}_{G_{i}},\;i=1,2,\ldots,N$ (6) using the JL lemma.
2: Test statistic: Compute the test statistic $S_{n,N}$ (5) using the projected adjacency matrices $\tilde{A}_{G_{i}},\;i=1,2,\ldots,N$ for $n=1,2,\ldots,N-1$, $\frac{n}{N}\in(\delta,1-\delta)$ for some $\delta>0$.
3: Asymptotic quantile: Choose a level of significance $\alpha\in[0,1]$. Compute $q_{1-\alpha}$, the $(1-\alpha)$ quantile of the distribution of $\max_{n\in\{1,2,\ldots,N-1\},\,\frac{n}{N}\in(\delta,1-\delta)}\mathcal{B}^{2}\left(\frac{n}{N}\right)$. Here, $\mathcal{B}(t)$ is a Brownian bridge process on $[0,1]$ with the covariance function $C(t_{1},t_{2})=1$ for $0\leq t_{1}\leq t_{2}\leq 1$.
4: if $\max_{n\in\{1,2,\ldots,N-1\},\,\frac{n}{N}\in(\delta,1-\delta)}NS_{n,N}<q_{1-\alpha}$ then return No eclipse attack detected.
5: else return Eclipse attack detected at time $n^{*}:=\operatorname*{arg\,max}_{n\in\{1,2,\ldots,N-1\},\,\frac{n}{N}\in(\delta,1-\delta)}S_{n,N}$
6: end if

Let $k$ denote the dimension of the adjacency matrices. Then, the complexity of Algorithm 1 is $O\left(N^{2}k+N|\mathcal{G}|\right)$, where $\mathcal{G}$ is defined in (1).
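The decision rule at the end of Algorithm 1 can be sketched independently of how the test statistic is computed. The sketch below assumes the values $S_{n,N}$ from (5) have already been computed and are passed in as a list; the function name `detect_eclipse` and its argument layout are hypothetical.

```python
def detect_eclipse(S, q, delta):
    """Decision rule of Algorithm 1 applied to precomputed statistics.

    S     : list with S[n - 1] = S_{n,N} for n = 1, ..., N - 1 (assumed precomputed).
    q     : threshold q_{1 - alpha}.
    delta : trimming fraction; only n with delta < n / N < 1 - delta are used.

    Returns (attack_detected, n_star); n_star is None when no attack is declared.
    """
    N = len(S) + 1
    candidates = [n for n in range(1, N) if delta < n / N < 1.0 - delta]
    if not candidates:
        raise ValueError("no index n satisfies delta < n/N < 1 - delta")
    # The arg max of S_{n,N} is also the arg max of the scaled statistic N * S_{n,N}.
    n_star = max(candidates, key=lambda n: S[n - 1])
    if N * S[n_star - 1] < q:
        return False, None
    return True, n_star


# Toy usage with made-up statistic values (N = 10, peak at n = 6).
toy_S = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 1.0, 0.3, 0.0]
decision = detect_eclipse(toy_S, q=9.05, delta=0.15)
```

Restricting the search to `candidates` mirrors the assumption that the attack does not start near the endpoints of the sequence.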

To summarize, we designed an algorithm to detect an eclipse attack on a blockchain network. The proposed test statistic was based on Fréchet change detection [3]. We also used the JL lemma to reduce the dimension of the BCNs.

III Weak Convergence Analysis of Eclipse Attack Detector

In this section, we analyze the test statistic for the proposed eclipse attack detector (Algorithm 1). Under $H_{0}$ (absence of an eclipse attack), we prove that a scaled test statistic converges weakly to the square of a Brownian bridge process. Under $H_{1}$ (presence of an eclipse attack), we show that the peak of the test statistic estimates the onset of the eclipse attack.

III-A Weak Convergence of Test Statistic

Our first result (Theorem 1) analyzes the asymptotics of the test statistic $S_{n,N}$ (5) under the null hypothesis $H_{0}$ (2), i.e., absence of an eclipse attack on the blockchain network. Note that $\{S_{n,N},\,n=1,2,\ldots,N-1\}$ (5) represents a discrete-time stochastic process. As is customary in weak convergence analysis [15, 16], we first construct a continuous-time stochastic process $S_{N}(Nt)$ by interpolating the discrete-time test statistic process $\{S_{n,N}\}$:

$$S_{N}(Nt)=S_{n,N}\quad\text{for } Nt\in[n,n+1),\;n=0,1,\ldots,N-1 \tag{7}$$

The continuous-time process $S_{N}(Nt)$ has sample paths in the function space $D[0,1]$, namely the space of functions that are continuous on the right with limits on the left (càdlàg functions). We define a scaled continuous-time test statistic process $T_{N}(t)$ as follows:

$$T_{N}(t):=NS_{N}(Nt) \tag{8}$$

Theorem 1 shows that as $N\rightarrow\infty$, the scaled continuous-time test statistic process $T_{N}(t)$ converges weakly (in the Skorohod metric [17]) to the square of a Brownian bridge stochastic process. Note that the weak convergence approach deals with the convergence of scaled sequences of the test statistic that are treated as stochastic processes rather than random variables. Thus, the weak convergence approach specifies convergence for the entire trajectory of the test statistic of the detection algorithm.
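The piecewise-constant interpolation (7) and the scaling (8) can be written in a few lines. This is a minimal sketch: since the discrete statistic is defined for $n=1,\ldots,N-1$ while (7) also uses $n=0$, the code assumes $S_{0,N}=0$, which is our convention and is not stated in the text. The function names are hypothetical.

```python
import math


def interpolated_statistic(S, t):
    """S_N(Nt) from (7): equals S_{n,N} whenever Nt lies in [n, n + 1).

    S[n - 1] holds S_{n,N} for n = 1, ..., N - 1.
    S_{0,N} is taken to be 0 (an assumption made for this sketch).
    """
    N = len(S) + 1
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    n = math.floor(N * t)
    return 0.0 if n == 0 else S[n - 1]


def scaled_statistic(S, t):
    """T_N(t) = N * S_N(Nt) from (8)."""
    return (len(S) + 1) * interpolated_statistic(S, t)
```

Because the interpolation holds the value $S_{n,N}$ over the whole interval $[n/N,(n+1)/N)$, the resulting path is right-continuous with left limits, which is why the process lives in $D[0,1]$.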

Theorem 1.

Assume that the eclipse attack does not occur near the endpoints, i.e., $\frac{\tau}{N}\in[\delta,1-\delta]$ for some $\delta>0$, where $\tau$ is defined in (2). Then, under $H_{0}$ (absence of an eclipse attack), the scaled test statistic process (8) converges weakly:

$$T_{N}(t)\Rightarrow\mathcal{B}^{2}(t)$$

Also, the continuous mapping theorem implies

$$\max_{t\in[\delta,1-\delta]}T_{N}(t)\Rightarrow\max_{t\in[\delta,1-\delta]}\mathcal{B}^{2}(t)$$

Here $\Rightarrow$ denotes weak convergence (footnote 6), and $\mathcal{B}$ is a standardized Brownian bridge process (footnote 7) with the covariance function $C(t_{1},t_{2})=1,\;0\leq t_{1}\leq t_{2}\leq 1$.

Footnote 6: Weak convergence in function space is a generalization of convergence in distribution for random variables. A sequence of probability measures $\mu_{n}$ converges weakly to the probability measure $\mu$ if, for all bounded and continuous test functionals $f$, the expected value of $f$ with respect to $\mu_{n}$ converges to the expected value of $f$ with respect to $\mu$, i.e., $\mathbb{E}_{\mu_{n}}[f]\rightarrow\mathbb{E}_{\mu}[f]$.

Footnote 7: A standardized Brownian bridge on $[0,T]$ is a continuous-time stochastic process whose probability distribution is the conditional probability of the Wiener process $W(t)$ subject to the condition that $W(0)=W(T)=0$.

Proof.

Appendix A of the supplementary material. ∎

Convergence to a Brownian bridge instead of a Brownian motion in Theorem 1 is intuitive because $T_{N}(t)\propto t(1-t)$. Theorem 1 is used in steps 3-4 of Algorithm 1 to detect an eclipse attack. In practice, we declare the presence of an eclipse attack on a blockchain network if the maximum of the scaled test statistic exceeds the $0.95$ quantile, denoted as $q_{0.95}$, of the distribution of $\max_{t\in[\delta,1-\delta]}\mathcal{B}^{2}(t)$.
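The quantile $q_{1-\alpha}$ has no simple closed form on a trimmed interval, so one way to approximate it is Monte Carlo simulation. The sketch below rests on two assumptions that the text does not confirm: that "standardized" means the bridge $W(t)-tW(1)$ divided by its pointwise standard deviation $\sqrt{t(1-t)}$, and that $\delta=0.1$. The function name, grid size, and number of simulations are hypothetical, and the sketch is not claimed to reproduce the threshold reported in the numerical section.

```python
import math
import random


def sup_sq_bridge_quantile(alpha, delta, grid=200, n_sim=1000, seed=0, standardize=True):
    """Monte Carlo estimate of the (1 - alpha) quantile of max_t B(t)^2 over (delta, 1 - delta).

    B(t) = W(t) - t * W(1) is a Brownian bridge on a uniform grid.
    If standardize is True, B(t) is divided by sqrt(t * (1 - t))
    (our reading of "standardized"; an assumption).
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(1.0 / grid)
    idx = [i for i in range(1, grid) if delta < i / grid < 1.0 - delta]
    sups = []
    for _ in range(n_sim):
        # Simulate a Wiener path W(i / grid), i = 0, ..., grid.
        w = 0.0
        path = [0.0]
        for _ in range(grid):
            w += rng.gauss(0.0, step_sd)
            path.append(w)
        w_end = path[-1]
        best = 0.0
        for i in idx:
            t = i / grid
            b = path[i] - t * w_end           # bridge value at time t
            val = b * b
            if standardize:
                val /= t * (1.0 - t)
            best = max(best, val)
        sups.append(best)
    sups.sort()
    # Empirical quantile: order statistic of rank ceil((1 - alpha) * n_sim).
    rank = min(n_sim, max(1, math.ceil((1.0 - alpha) * n_sim)))
    return sups[rank - 1]


q95 = sup_sq_bridge_quantile(alpha=0.05, delta=0.1)
```

A finer grid and more simulations reduce the discretization and sampling error of the estimate, at a cost that grows linearly in both.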

Our second result (Theorem 2 below) investigates the test statistic $S_{N}(Nt)$ (5) under $H_{1}$ (2), i.e., presence of an eclipse attack on the blockchain network. This result estimates the onset of the eclipse attack. Before presenting the theorem, we need to define the limiting test statistic $S(t)$:

$$S(t):=\lim_{N\rightarrow\infty}S_{N}(Nt) \tag{9}$$

Here, the test statistic $S_{N}(Nt)$ converges to $S(t)$ in probability [3].

Theorem 2.

Assume that the eclipse attack does not occur near the endpoints, i.e., $\frac{\tau}{N}\in[\delta,1-\delta]$ for some $\delta>0$, where $\tau$ is defined in (2). Then, under $H_{1}$ (presence of an eclipse attack), the maximum of the limiting test statistic $S(t)$ defined in (9) occurs at the onset $\tau$ of the eclipse attack:

$$\lim_{N\rightarrow\infty}\frac{\tau}{N}=\operatorname*{arg\,max}_{t\in[\delta,1-\delta]}S(t)$$

Let $\tau_{N}=N\arg\max_{t\in[\delta,1-\delta]}S_{N}(Nt)$, where $S_{N}(Nt)$ is defined in (7). Then, for $\gamma>0$, the following holds:

$$\operatorname{Pr}\left(\left|\frac{\tau_{N}-\tau}{N}\right|>\gamma\right)\rightarrow 0$$
Proof.

Appendix B of supplementary material. ∎

The second statement of Theorem 2 shows that the onset estimate computed from finite samples of BCNs is consistent: the normalized estimation error $|\tau_{N}-\tau|/N$ converges to zero in probability. Theorem 2 is used in step 5 of Algorithm 1 to estimate the onset of the eclipse attack using the discrete-time test statistic $S_{n,N}$ (7).

III-B Effect of the Projected Adjacency Matrices on the Test Statistic

Our final result compares the test statistic computed using the projected adjacency matrices and the original adjacency matrices of the BCN (footnote 8). It shows that the false positive alarm rate of the detector is higher when the test statistic is computed using the projected adjacency matrix $\tilde{A}_{G}$.

Footnote 8: See Sec. IV-D of the supplementary material for a numerical example illustrating Theorem 3.

Theorem 3.

Let $S_{N}(Nt)$ defined in (7) denote the test statistic computed using the original adjacency matrices (1). Let $\tilde{S}_{N}(Nt)$ denote the test statistic computed using the projected adjacency matrices (6). Under $H_{0}$ (2), as $N\rightarrow\infty$, using projected adjacency matrices to compute the test statistic leads to a higher false positive alarm rate:

$$\lim_{N\rightarrow\infty}\tilde{S}_{N}(Nt)\geq\lim_{N\rightarrow\infty}S_{N}(Nt)$$

Here, the convergence to the limit is in probability. Furthermore,

$$\lim_{N\rightarrow\infty}\tilde{S}_{N}(Nt)\geq\frac{5\epsilon\,t(1-t)V}{\hat{\sigma}^{2}}$$

where $V=\lim_{N\rightarrow\infty}\hat{V}$ ($\hat{V}$ is defined in (II-C)); $\epsilon\in(0,1)$; and $\hat{\sigma}^{2}$ is the empirical variance computed in (II-C).

Proof.

Appendix C of supplementary material. ∎

In summary, we have presented three key results on the test statistic (5) for detecting an eclipse attack: 1) Under the null hypothesis $H_{0}$ (2), the first result ensures weak convergence of the maximum of the scaled test statistic to the maximum of the square of the Brownian bridge process. 2) Under the alternate hypothesis $H_{1}$ (2), the second result estimates the onset of the eclipse attack on the blockchain network. 3) The third result investigates the impact on the false alarm rate of the detector when using the projected adjacency matrices (6) to compute the test statistic.

IV Numerical Examples

(a) Absence of an eclipse attack.
(b) Presence of an eclipse attack at $\frac{\tau}{N}=0.6$.
Figure 1: Scaled test statistic $T_{N}(t)$ (8) vs. $t$ in the absence/presence of an eclipse attack on the blockchain network (100 simulations). We used the projected adjacency matrices (6) of dimension 100 to compute $T_{N}(t)$. When there is an eclipse attack, the peak of the scaled test statistic is well above the $0.95$ quantile, $q_{0.95}=9.05$, of the distribution of $\mathcal{B}^{2}(t)$ (Theorem 1). Moreover, the peak of the scaled test statistic gives the onset of the eclipse attack. Therefore, using the projected adjacency matrices decreases the computational cost of the detector while preserving the test statistic (see Sec. IV-D of the supplementary material for a numerical example comparing the test statistics computed using the original and projected adjacency matrices).

In this section, we illustrate our eclipse attack detection algorithm (Algorithm 1) on a simulated dataset. Sec. IV-A describes the process of generating a simulated dataset using the eclipse attack model in Definition 2. Sec. IV-B studies the performance of the proposed eclipse detector when applied to the simulated dataset. Sec. IV-C plots the ROC curve for the proposed eclipse detector on a noisy dataset. Sec. IV-D studies the effect of projected adjacency matrices on the false alarm rate of the detector. Sec. IV-E compares the proposed eclipse attack detector against an eclipse attack detector based on the RFM. Sec. IV-F implements an RFM-based regressor to estimate the onset of the eclipse attack. Sec. IV-G studies the sensitivity of the RFM-based detector to variations in the training dataset (footnote 9).

Footnote 9: All numerical examples use Matlab. Our source codes are available in the supplementary material.

IV-A Simulation Setup

We use Definition 2 to generate a simulated dataset to illustrate the performance of the eclipse attack detector (Algorithm 1). Our dataset represents a large-dimensional (footnote 10) blockchain network with 100 users; it consists of a sequence of $1000$ adjacency matrices (1) for the BCNs. In the absence of an eclipse attack, each blockchain user selects five neighbors uniformly at random. To simulate an eclipse attack, we introduce one victim user and two malicious users into the blockchain. The malicious users always include the victim user as one of their neighbors, and the other four neighbors are chosen uniformly at random. Equivalently, $P_{1},P_{2}$ in Definition 2 are given by

$$P_{1}(A_{G}(i,j)=1)=\frac{5}{100}$$

$$P_{2}(A_{G}(i,j)=1)=\begin{cases}1, & i=1,\;j=99,100\\ \frac{4}{99}, & i\neq 1,\;j=99,100\\ \frac{5}{100}, & \text{otherwise}\end{cases}$$

Footnote 10: The number of elements in the adjacency matrix for the BCN is $10^{4}$.

Here, the blockchain user with index 1 is the victim, and those with indexes 99 and 100 are the attackers. We assume that the nodes in the graph are labeled in descending order of their computation power. Since eclipse attacks target users with high computational power, our numerical examples focus on the first four rows of the adjacency matrix to reduce computational cost. Henceforth, with an abuse of notation, $A_{G}$ refers to the first four rows of the adjacency matrix.
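A minimal sketch of this simulation setup follows. It assumes that the entries of each adjacency matrix are drawn as independent Bernoulli variables with the marginal probabilities $P_{1}$ and $P_{2}$ given above, and that the first `onset` networks follow $P_{1}$ while the remaining ones follow $P_{2}$; Definition 2 may specify a different joint law or onset convention. All function names are hypothetical, and indices are shifted to 0-based.

```python
import random

NUM_USERS = 100
VICTIM = 0                 # user 1 in the paper's 1-based indexing
ATTACKERS = (98, 99)       # users 99 and 100 in the paper's 1-based indexing


def edge_probabilities(attack):
    """Matrix of P(A_G(i, j) = 1): P_1 without an attack, P_2 with an attack."""
    p = [[5.0 / 100.0] * NUM_USERS for _ in range(NUM_USERS)]
    if attack:
        for j in ATTACKERS:
            for i in range(NUM_USERS):
                p[i][j] = 1.0 if i == VICTIM else 4.0 / 99.0
    return p


def sample_adjacency(p, rng):
    """Draw one 0/1 adjacency matrix with independent entries (assumed)."""
    return [[1 if rng.random() < pij else 0 for pij in row] for row in p]


def sample_sequence(length, onset, rng):
    """First `onset` matrices from P_1, remaining ones from P_2 (assumed convention)."""
    p1 = edge_probabilities(attack=False)
    p2 = edge_probabilities(attack=True)
    return [sample_adjacency(p1 if t < onset else p2, rng) for t in range(length)]


def vectorize_first_rows(A, rows=4):
    """Flatten the first `rows` rows, matching the restriction used in the examples."""
    return [entry for row in A[:rows] for entry in row]


rng = random.Random(0)
sequence = sample_sequence(length=10, onset=6, rng=rng)   # short toy sequence
X = vectorize_first_rows(sequence[0])                      # 400 entries
```

The paper's experiments use 1000 matrices per sequence; the toy call above uses 10 only to keep the example quick.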

IV-B Numerical Examples for the Proposed Eclipse Attack Detector

We employ Algorithm 1 to detect an eclipse attack on the blockchain network. In step 1, we used the projected adjacency matrices (6) with the number of elements equal to 100. In step 2, we used the projected adjacency matrices to compute the scaled test statistic $T_{N}(t)$ (8). In step 3, we set a significance level of 0.05 for rejecting the null hypothesis $H_{0}$. We computed the $0.95$ quantile $q_{0.95}$ of the distribution of $\max_{t\in[\delta,1-\delta]}\mathcal{B}^{2}(t)$ to be 9.05. Fig. 1 plots the scaled test statistic $T_{N}(t)$ defined in (8) for both the absence and presence of an eclipse attack. In the presence of an eclipse attack, the peak of the test statistic surpasses the threshold $q_{0.95}$. Moreover, the peak of the scaled test statistic gives the onset of the eclipse attack (Theorem 2).

IV-C ROC Curves for the Proposed Eclipse Attack Detector

In this section, we investigate how the signal-to-noise ratio (SNR) of the dataset impacts the performance of the proposed eclipse attack detector (Algorithm 1). We added noise to the adjacency matrix as follows:

$$Y=X\wedge N,\quad N=\mathbb{1}\left\{U>\operatorname{SNR}^{-1}\right\} \tag{10}$$

Here, $X,Y$ denote the noise-free and noisy adjacency matrices, respectively; $U$ denotes a uniform random variable on $[0,1]$; and $\wedge$ denotes the logical AND operator. This noise simulates scenarios where the network monitor misses communication between two nodes.
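The noise model (10) can be sketched as follows, assuming that an independent uniform variable $U$ is drawn for every matrix entry (the text does not say whether $U$ is drawn per entry or per matrix). The function name is hypothetical.

```python
import random


def add_missed_edge_noise(X, snr, rng):
    """Y = X AND N with N = 1{U > 1 / SNR}, U ~ Uniform[0, 1], as in (10).

    An edge present in X is kept only when its own draw of U exceeds 1 / SNR,
    so a larger SNR means fewer missed edges. One U per entry is an assumption.
    """
    threshold = 1.0 / snr                      # equals 0.0 when snr is infinite
    return [[x if rng.random() > threshold else 0 for x in row] for row in X]


rng = random.Random(0)
clean = [[1, 0, 1], [0, 1, 1]]
noisy = add_missed_edge_noise(clean, snr=2.0, rng=rng)
```

Under this reading, each existing edge is dropped with probability $1/\operatorname{SNR}$ and no spurious edges are ever added, which matches the interpretation of a monitor that misses communications.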

Figure 2: ROC curve of the proposed eclipse attack detector for various SNR values (10). As observed, the detector performs well with noisy datasets.

Fig. 2 displays the ROC curve [18] for the proposed eclipse attack detector with various SNR values. As observed, the eclipse attack detector is robust to noise.
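For readers who want to reproduce this kind of plot, the sketch below shows one way to build an empirical ROC curve from detector scores. It assumes the score of each simulated sequence is the maximum of the scaled test statistic, and that an attack is declared when the score is at least the threshold; the function names and the toy scores are hypothetical and are not data from the paper.

```python
def roc_points(scores_h0, scores_h1):
    """(FPR, TPR) pairs obtained by sweeping the threshold over all observed scores.

    scores_h0 : scores of attack-free sequences.
    scores_h1 : scores of sequences containing an attack.
    A sequence is flagged when its score >= threshold.
    """
    thresholds = sorted(set(scores_h0) | set(scores_h1), reverse=True)
    points = [(0.0, 0.0)]
    for thr in thresholds:
        fpr = sum(s >= thr for s in scores_h0) / len(scores_h0)
        tpr = sum(s >= thr for s in scores_h1) / len(scores_h1)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy scores, made up for illustration.
toy_h0 = [1.2, 3.4, 2.2, 8.0]
toy_h1 = [15.0, 7.5, 22.1, 30.3]
toy_curve = roc_points(toy_h0, toy_h1)
toy_auc = area_under_curve(toy_curve)
```

Repeating this for datasets generated at different SNR values gives one curve per noise level.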

IV-D Comparison of the Test Statistic Computed using Original and Projected Adjacency Matrices

(a) Absence of an eclipse attack on the blockchain network.
(b) Presence of an eclipse attack on the blockchain network.
Figure 3: Comparison of the scaled test statistic $T_{N}(t)$ computed using original and projected adjacency matrices. The scaled test statistic is averaged over 100 simulations. As in Sec. IV-B, we use the first four rows of the adjacency matrices. Therefore, the number of elements in the original adjacency matrix is 400. We used the JL lemma to obtain the projected adjacency matrices of dimension 100. As observed, computing the scaled test statistic using the projected adjacency matrices leads to higher false positive and false negative alarm rates.

Recall that in Theorem 3, we showed that using the projected adjacency matrices to compute the scaled test statistic leads to a higher false positive rate. Fig. 3(a) illustrates the impact of the projected adjacency matrices on the false alarm rate. As in Sec. IV-B, we use the first four rows of the adjacency matrices. Therefore, the number of elements in the original adjacency matrix is 400. We used the JL lemma to obtain the projected adjacency matrices of dimension 100.

Moreover, in Fig. 3(b), we show using a numerical example that computing the scaled test statistic using the projected adjacency matrices leads to a higher false negative rate.

The two numerical examples justify the heuristic that the JL lemma approximately preserves the test statistic.
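To make the dimension reduction from 400 to 100 concrete, the sketch below projects random binary vectors with a scaled Gaussian random matrix and compares squared pairwise distances before and after projection. This is an assumption-laden illustration: the Gaussian construction is one common variant of the JL map and is not necessarily the map used in our experiments, and the sparse binary vectors are synthetic stand-ins for the vectorized first four rows of an adjacency matrix.

```python
import math
import random

def jl_matrix(d, k, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (a common JL construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Apply the linear map x -> R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(1)
d, k, num_vectors = 400, 100, 20
R = jl_matrix(d, k, rng)

# Synthetic sparse 0/1 vectors standing in for vectorized adjacency rows.
vectors = [[1.0 if rng.random() < 0.1 else 0.0 for _ in range(d)]
           for _ in range(num_vectors)]
projected = [project(R, v) for v in vectors]

# Ratio of projected to original squared distance for every pair.
ratios = [sq_dist(projected[i], projected[j]) / sq_dist(vectors[i], vectors[j])
          for i in range(num_vectors) for j in range(i + 1, num_vectors)]
mean_ratio = sum(ratios) / len(ratios)
```

The ratios scatter around one rather than equaling it, which is consistent with the distortion seen in Fig. 3: distances, and hence the empirical variances entering the test statistic, are preserved only up to a multiplicative factor.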

IV-E Comparison of the Proposed Eclipse Attack Detector with a RFM based Detector

This section compares the performance of our proposed eclipse attack detector with an RFM [19] based detector.

To begin, we trained a random forest classifier to detect an eclipse attack on a blockchain network. The training dataset consisted of 390 data points, each corresponding to a sequence of 1000 adjacency matrices for the BCNs (Sec. IV-A). The simulated dataset was free from noise. If a sequence of BCNs was free from an eclipse attack, it was labeled as ‘0’; otherwise, it was labeled as ‘1’. Following the training of the random forest classifier, we validated its performance on a test dataset of size 287. The accuracy of the RFM based detector and the proposed eclipse attack detector (Algorithm 1) is summarized in Table I. (Footnote 11: The RFM based detector requires a separate regressor to detect the onset of the eclipse attack. We study its performance in Appendix IV-F of the supplementary material.) Fig. 4 plots the ROC curve of the two detectors.

Figure 4: ROC curve of the proposed eclipse attack detector and the RFM based detector for a dataset with $\operatorname{SNR}=\infty$ (10). The proposed detector outperforms the RFM based detector when the false positive rate is high. Note that the RFM based detector requires a training dataset and is sensitive to the choice of training dataset (see Appendix IV-G for a study of this sensitivity). In contrast, the proposed detector did not require a training dataset.
| Detector | Accuracy |
| --- | --- |
| Proposed Detector (Algorithm 1) | 97.49% |
| Random Forest Model | 85.31% |

TABLE I: Accuracy of eclipse attack detectors on a dataset with $\operatorname{SNR}=\infty$ (10) (100 simulations).

IV-F RFM based Detector to Estimate the Onset of the Eclipse Attack

Recall that in Sec. IV-E, we employed an RFM based detector to detect an eclipse attack on the blockchain network. In this section, we implement a random forest based regressor to estimate the onset of the eclipse attack $\tau$ under $H_1$ (2), i.e., in the presence of an eclipse attack on a blockchain network. Our training dataset comprised 81 data points, each associated with a sequence of $1000$ adjacency matrices denoted as $A_{G_i},\; i=1,2,\ldots,N$ for the BCNs (Sec. IV-A). The labels assigned to these data points corresponded to the onset of the eclipse attack. To compare the accuracy in predicting the onset of the eclipse attack, we computed the root mean squared error for both the proposed eclipse attack detector (Algorithm 1) and the eclipse attack detector based on the RFM. The results are summarized in Table II.

| Detector | RMSE |
| --- | --- |
| Proposed Detector (Algorithm 1) | 1.55 |
| Random Forest Model | 38.63 |

TABLE II: Comparison of root mean squared error (RMSE) in estimating the onset of the eclipse attack on a blockchain network. Our test dataset consisted of 83 data points, each corresponding to a sequence of 1000 adjacency matrices for the BCNs (Sec. IV-A). The RMSE values were averaged over 5 runs.

The proposed eclipse attack detector outperforms the eclipse attack detector based on the RFM without requiring a training dataset.
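For completeness, the RMSE metric reported in Table II can be computed as in the short sketch below. The onset values are hypothetical and chosen only to illustrate the calculation; they are not taken from our test dataset.

```python
import math

def rmse(estimates, truths):
    """Root mean squared error between estimated and true onset times."""
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true and estimated onsets (time indices in a sequence of 1000 BCNs).
true_onsets = [500, 620, 710]
estimated_onsets = [501, 618, 711]
onset_rmse = rmse(estimated_onsets, true_onsets)
```

Because the errors are squared before averaging, a few grossly wrong onset estimates dominate the RMSE.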

IV-G Sensitivity of the RFM based Detector to Training Dataset

Recall that in Sec. IV-E, we implemented an RFM based detector to detect an eclipse attack on the blockchain network. In this section, we study the sensitivity of the eclipse attack detector, based on the RFM, to variations in the training dataset. To investigate this, we generated a training dataset and a test dataset consisting of 390 and 287 data points, respectively. The procedure for generating the dataset is outlined in Sec. IV-A. The primary distinction between the training and test datasets lies in the number of malicious users. Specifically, the training dataset was designed with 4 malicious users, while the test dataset was configured to include only 2 malicious users.

Once we trained the random forest classifier, we validated its performance on the test dataset. We observed a decrease in overall accuracy to 72.25%. Consequently, achieving a precise random forest regressor requires careful feature extraction from the dataset, with an emphasis on selecting features that remain consistent with the parameters in the eclipse attack model (Definition 2).

To summarize, we used a simulated dataset to test the proposed eclipse attack detector (Algorithm 1). We also compared the proposed detector with an eclipse detector based on the RFM. Our model stood out by concurrently addressing the two aspects of eclipse attack detection: 1) detecting the presence of an eclipse attack, and 2) estimating the onset of the eclipse attack. Moreover, the proposed eclipse attack detector did not require a training dataset.

V Conclusion

This paper addressed the problem of detecting an eclipse attack on a blockchain network by designing a non-parametric change detection algorithm. In an eclipse attack, malicious users isolate a victim user, disrupting their ability to reach a consensus with the rest of the network. Our eclipse attack detection approach involved estimating changes in the Fréchet mean and variance of the BCN. We showed that the test statistic for the proposed eclipse attack detector weakly converges to a Brownian bridge process. This allowed us to quantify the false alarm rate of the detector. The proposed statistical detector can be implemented as a smart contract on top of the blockchain to mitigate the impact of an eclipse attack. Finally, we used ROC curves to characterize the performance of the proposed eclipse attack detector and the RFM based detector. It is also worthwhile to explore the detection of jump Markov dynamics and the resulting weakly convergent statistic; see [20].

In future work, we will explore: (1) detecting an eclipse attack on a blockchain network with time-varying blockchain users, (2) theoretical bounds on the accuracy of the test statistic when the BCNs are observed in noise, (3) refining the proposed test statistic to effectively detect an eclipse attack near endpoints, and (4) generalizing the change detection algorithm to address time-varying eclipse attack strategies. These extensions will improve the applicability and effectiveness of the proposed eclipse attack detection algorithm.

Acknowledgments

This research was supported in part by the U.S. Army Research Office grant W911NF-21-1-0093, National Science Foundation grant CCF-2112457, and the Army Research Laboratory under Cooperative Agreement Number W911NF-23-2-0124.

References

  • [1] M. Nofer, P. Gomber, O. Hinz, and D. Schiereck, “Blockchain,” Business & Information Systems Engineering, vol. 59, pp. 183–187, 2017.
  • [2] E. Heilman, A. Kendler, A. Zohar, and S. Goldberg, “Eclipse attacks on bitcoin’s peer-to-peer network,” in 24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 129–144.
  • [3] P. Dubey and H. Müller, “Fréchet change-point detection,” The Annals of Statistics, vol. 48, no. 6, pp. 3312–3335, 2020. [Online]. Available: https://doi.org/10.1214/19-AOS1930
  • [4] J. Matoušek, “On variants of the Johnson-Lindenstrauss lemma,” Random Structures & Algorithms, vol. 33, no. 2, pp. 142–156, 2008.
  • [5] D. Bhumichai and R. Benton, “Detection of Ethereum eclipse attack based on hybrid method and dynamic weighted entropy,” in SoutheastCon 2023, 2023, pp. 779–786.
  • [6] G. Xu, B. Guo, C. Su, X. Zheng, K. Liang, D. Wong, and H. Wang, “Am I eclipsed? a smart detector of eclipse attacks for ethereum,” Computers & Security, vol. 88, p. 101604, 2020.
  • [7] Q. Dai, B. Zhang, and S. Dong, “Eclipse attack detection for blockchain network layer based on deep feature extraction,” Wireless Communications and Mobile Computing, vol. 2022, 2022.
  • [8] B. Alangot, D. Reijsbergen, S. Venugopalan, and P. Szalachowski, “Decentralized lightweight detection of eclipse attacks on bitcoin clients,” in 2020 IEEE International Conference on Blockchain (Blockchain), 2020, pp. 337–342.
  • [9] H. Zheng, T. Tran, and O. Arden, “Total eclipse of the enclave: Detecting eclipse attacks from inside tees,” in 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2021, pp. 1–5.
  • [10] A. Yıldız, A. Atmaca, A. Solak, Y. Tursun, and S. Bahtiyar, “A trust based dns system to prevent eclipse attack on blockchain networks,” in 2022 15th International Conference on Security of Information and Networks (SIN), 2022, pp. 01–08.
  • [11] M. Iqbal and R. Matulevičius, “Exploring sybil and double-spending risks in blockchain systems,” IEEE Access, vol. 9, pp. 76 153–76 177, 2021.
  • [12] R. Chaganti, R. Boppana, V. Ravi, K. Munir, M. Almutairi, F. Rustam, E. Lee, and I. Ashraf, “A comprehensive review of denial of service attacks in blockchain ecosystem and open challenges,” IEEE Access, 2022.
  • [13] J. Gill, “Computational complexity of probabilistic turing machines,” in Proceedings of the sixth annual ACM symposium on Theory of computing, 1974, pp. 91–95.
  • [14] L. Horváth, C. Miller, and G. Rice, “A new class of change point test statistics of Rényi type,” Journal of Business & Economic Statistics, vol. 38, no. 3, pp. 570–579, 2020.
  • [15] H. Kushner, Approximation and weak convergence methods for random processes, with applications to stochastic systems theory.   MIT press, 1984, vol. 6.
  • [16] S. Ethier and T. Kurtz, Markov processes: characterization and convergence.   John Wiley & Sons, 2009.
  • [17] P. Billingsley, Convergence of probability measures.   John Wiley & Sons, 2013.
  • [18] V. Bewick, L. Cheek, and J. Ball, “Statistics review 13: receiver operating characteristic curves,” Critical care, vol. 8, no. 6, pp. 1–5, 2004.
  • [19] J. Speiser, M. Miller, J. Tooze, and E. Ip, “A comparison of random forest variable selection methods for classification prediction modeling,” Expert systems with applications, vol. 134, pp. 93–101, 2019.
  • [20] G. Yin, C. Ion, and V. Krishnamurthy, “How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths,” Mathematical programming B. (Special Issue dedicated to B.T. Polyak’s 70th Birthday), vol. 120, no. 1, pp. 67–99, 2009.

Appendix A Proof of Theorem 1 in Sec.III

The outline of the proof is as follows. Observe that

$$\begin{aligned}
N S_N(Nt) &= N S_N^A(Nt) + N S_N^B(Nt), \\
N S_N^A(Nt) &:= \frac{Nt(1-t)}{\hat{\sigma}^2}\left(\hat{V}_{Nt} - \hat{V}_{N(1-t)}\right)^2, \\
N S_N^B(Nt) &:= \frac{Nt(1-t)}{\hat{\sigma}^2}\left(\hat{V}_{Nt}^C - \hat{V}_{Nt} + \hat{V}_{N(1-t)}^C - \hat{V}_{N(1-t)}\right)^2.
\end{aligned}$$

Step 1: Show that $N S_N^A(Nt) \xrightarrow{w} \mathcal{B}^2(t)$, $t \in [\delta, 1-\delta]$.
To show this, first define $Z_N(t) = \sqrt{N S_N^A(Nt)}$. Then, for $t_0 = \delta \leq t_1 \leq \ldots \leq t_k \leq 1 = t_{k+1}$, show that

$$\left(Z_N(t_1), Z_N(t_2), \ldots, Z_N(t_k)\right) \xrightarrow{w} \mathcal{N}(0, \Sigma)$$

where,

$$\Sigma_{t_1,t_2} = \mathbb{1}(t_1 = t_2) + \left[\frac{t_1(1-t_2)}{t_2(1-t_1)}\right]^{\frac{1}{2}} \mathbb{1}(t_1 \neq t_2), \quad t_1 \leq t_2.$$

Finally, show that $Z_N(t)$ is asymptotically equicontinuous in probability. Step 1 follows from Donsker’s theorem.
Step 2: Show that $N S_N^B(Nt) \xrightarrow{w} 0$ by proving the consistency of estimators under $H_0$.
Theorem 1 follows from combining Step 1 and Step 2 using Slutsky’s theorem. Refer to [3] for the detailed proof of Theorem 1.
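The matrix $\Sigma$ in Step 1 has the form of the covariance of a Brownian bridge divided by its standard deviation $\sqrt{t(1-t)}$. As a sanity check, the sketch below estimates this covariance by Monte Carlo and compares it with the closed-form expression. The function names, the time points $t_1 = 0.3$, $t_2 = 0.6$, and the number of replications are illustrative choices and do not come from the proof.

```python
import math
import random

def standardized_bridge_pair(t1, t2, rng):
    """Sample (B(t1)/sd, B(t2)/sd) for a Brownian bridge B, with t1 < t2."""
    w1 = rng.gauss(0.0, math.sqrt(t1))                # W(t1)
    w2 = w1 + rng.gauss(0.0, math.sqrt(t2 - t1))      # W(t2)
    w_end = w2 + rng.gauss(0.0, math.sqrt(1.0 - t2))  # W(1)
    b1 = w1 - t1 * w_end                              # B(t) = W(t) - t W(1)
    b2 = w2 - t2 * w_end
    return (b1 / math.sqrt(t1 * (1.0 - t1)),
            b2 / math.sqrt(t2 * (1.0 - t2)))

def limiting_covariance(t1, t2):
    """Closed-form entry Sigma_{t1,t2} from Step 1."""
    if t1 == t2:
        return 1.0
    s, t = min(t1, t2), max(t1, t2)
    return math.sqrt(s * (1.0 - t) / (t * (1.0 - s)))

rng = random.Random(0)
t1, t2 = 0.3, 0.6
samples = [standardized_bridge_pair(t1, t2, rng) for _ in range(20000)]
empirical = sum(z1 * z2 for z1, z2 in samples) / len(samples)
theoretical = limiting_covariance(t1, t2)
```

Since both coordinates have zero mean and unit variance, the average product estimates the covariance directly, and the two values should agree up to Monte Carlo error.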

Appendix B Proof of Theorem 2 in Sec.III.

Proof.

The outline of the proof is as follows. Consider two cases: (1) $n \geq \tau$ and (2) $n \leq \tau$. The proof of case (2) is similar to case (1). For case (1), one can show that

$$\begin{aligned}
S(Nt) &\leq \frac{t(1-t)}{\sigma^2}\Big(\max\big(\alpha^2 (V_1 - V_2)^2,\; (\alpha(V_1 - V_2) + \min(\alpha\Delta_1, (1-\alpha)\Delta_2))^2\big)\Big) \\
&\leq \alpha (V_1 - V_2)^2 + \alpha (\Delta_1 + \Delta_2)^2 = S(N\tau)
\end{aligned}$$

where,

$$\begin{aligned}
\alpha &= \frac{\tau}{n} \\
\Delta_1 &= \mathbb{E}_{P_1}\left[d^2\left(A_G, \mu_2\right)\right] - \mathbb{E}_{P_1}\left[d^2\left(A_G, \mu_1\right)\right] \\
\Delta_2 &= \mathbb{E}_{P_2}\left[d^2\left(A_G, \mu_1\right)\right] - \mathbb{E}_{P_2}\left[d^2\left(A_G, \mu_2\right)\right] \\
\sigma^2 &= \tau\, \mathbb{E}_{P_1}\left[d^4(A_G, \tilde{\mu})\right] + (1-\tau)\, \mathbb{E}_{P_2}\left[d^4(A_G, \tilde{\mu})\right] - \tilde{V}^2 \\
\tilde{\mu} &= \arg\min_{\omega \in \mathcal{G}} \left\{\tau\, \mathbb{E}_{P_1}\left[d^2(A_G, A_\omega)\right] + (1-\tau)\, \mathbb{E}_{P_2}\left[d^2(A_G, A_\omega)\right]\right\} \\
\tilde{V} &= \tau\, \mathbb{E}_{P_1}\left[d^2(A_G, \tilde{\mu})\right] + (1-\tau)\, \mathbb{E}_{P_2}\left[d^2(A_G, \tilde{\mu})\right] \\
\mu_i &= \operatorname*{arg\,min}_{\omega \in \mathcal{G}} \mathbb{E}_{P_i}\left[d^2(A_G, A_\omega)\right], \; i = 1, 2 \\
V_i &= \min_{\omega \in \mathcal{G}} \mathbb{E}_{P_i}\left[d^2(A_G, A_\omega)\right], \; i = 1, 2
\end{aligned}$$

The second inequality is obtained from the first inequality by considering multiple sub-cases. Refer to [3] for the detailed proof of Theorem 2. ∎
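The inequality above states that the population statistic is largest at the change point, which motivates estimating the onset by the maximizer of its sample version. The sketch below illustrates this on synthetic Euclidean data. It rests on several assumptions that are not part of the proof: the metric $d$ is taken to be the Euclidean distance (so the Fréchet mean is the coordinate-wise mean), the contaminated variance $\hat{V}^C$ of a segment is taken to be the mean squared distance to the Fréchet mean of the other segment, following our reading of [3], and all function names and data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def frechet_mean(points):
    """For the squared Euclidean metric, the coordinate-wise sample mean."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def mean_sq_dist(points, center):
    return sum(sq_dist(p, center) for p in points) / len(points)

def scan_statistic(points, trim=0.1):
    """Return {n: S_N(n)} for split indices n kept away from the endpoints."""
    N = len(points)
    mu = frechet_mean(points)
    d2 = [sq_dist(p, mu) for p in points]
    V = sum(d2) / N
    sigma2 = sum(v * v for v in d2) / N - V * V  # empirical variance of d^2
    first_n = max(1, int(trim * N))
    last_n = min(N - 1, int((1 - trim) * N))
    stats = {}
    for n in range(first_n, last_n + 1):
        head, tail = points[:n], points[n:]
        mu1, mu2 = frechet_mean(head), frechet_mean(tail)
        V1, V2 = mean_sq_dist(head, mu1), mean_sq_dist(tail, mu2)
        # Contaminated variances: each segment measured around the other's mean.
        V1c, V2c = mean_sq_dist(head, mu2), mean_sq_dist(tail, mu1)
        t = n / N
        stats[n] = (t * (1 - t) / sigma2) * (
            (V1 - V2) ** 2 + (V1c - V1 + V2c - V2) ** 2)
    return stats

# Synthetic sequence: the mean shifts from 0 to 3 after the 40th observation.
rng = random.Random(0)
true_onset = 40
before = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(true_onset)]
after = [[rng.gauss(3.0, 1.0) for _ in range(5)] for _ in range(40)]
stats = scan_statistic(before + after)
estimated_onset = max(stats, key=stats.get)
```

The trimming parameter plays the role of $\delta$: splits too close to either endpoint leave one segment with very few observations, where the variance estimates are unreliable.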

Appendix C Proof of Theorem 3 in Sec.III

Proof.

To prove Theorem 3, we first derive an upper and a lower bound on the variance of the projected adjacency matrices $(\tilde{A}_{G_i})_i$ (6) in terms of the variance of the adjacency matrices of the BCNs $(A_{G_i})_i$ (1). Then, we compute the value of the test statistics $\tilde{S}_N(Nt)$ and $S_N(Nt)$ for $N \rightarrow \infty$ under $H_0$.

Step 1: Comparing the variances: Using the triangle inequality, Lemma 1, and the fact that $\operatorname*{arg\,min}_{\lambda} \mathbb{E}\left[\|A-\lambda\|^2\right] = \mathbb{E}\left[A\right]$, we can compare the variance of the projected adjacency matrices $(\tilde{A}_{G_i})_i$ (6) and the variance of the adjacency matrices of the BCNs $(A_{G_i})_i$ (1). Let $\alpha$ be an adjacency matrix such that $f(\alpha) = \mathbb{E}\left[\tilde{A}_G\right]$. We obtain

$$\begin{aligned}
(1-\epsilon)\left\|A_{G_i} - \alpha\right\|^2 &\leq \left\|\tilde{A}_{G_i} - \mathbb{E}\left[\tilde{A}_G\right]\right\|^2 \\
\Rightarrow (1-\epsilon)\sum_i \left\|A_{G_i} - \alpha\right\|^2 &\leq \sum_i \left\|\tilde{A}_{G_i} - \mathbb{E}\left[\tilde{A}_G\right]\right\|^2 \\
\Rightarrow (1-\epsilon)\sum_i \left\|A_{G_i} - \mathbb{E}\left[A_G\right]\right\|^2 &\leq \sum_i \left\|\tilde{A}_{G_i} - \mathbb{E}\left[\tilde{A}_G\right]\right\|^2
\end{aligned}$$

Let $\beta$ be such that the linear map obtained from the JL lemma yields $f(\mathbb{E}\left[A_G\right]) = \beta$.

$$\begin{aligned}
\left\|\tilde{A}_{G_{i}}-\beta\right\|^{2} &\leq (1+\epsilon)\left\|A_{G_{i}}-\mathbb{E}\left[A_{G}\right]\right\|^{2} \\
\Rightarrow\ \sum_{i}\left\|\tilde{A}_{G_{i}}-\beta\right\|^{2} &\leq (1+\epsilon)\sum_{i}\left\|A_{G_{i}}-\mathbb{E}\left[A_{G}\right]\right\|^{2} \\
\Rightarrow\ \sum_{i}\left\|\tilde{A}_{G_{i}}-\mathbb{E}\left[\tilde{A}_{G}\right]\right\|^{2} &\leq (1+\epsilon)\sum_{i}\left\|A_{G_{i}}-\mathbb{E}\left[A_{G}\right]\right\|^{2}
\end{aligned}$$
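The two-sided bound on the sum of squared deviations can be checked numerically. The following is a minimal sketch under stated assumptions, not the construction used in this paper: the graphs are synthetic random directed graphs, the map is a scaled Gaussian random matrix (one standard way to realize a JL-type linear map), and all function names, dimensions, and the loose tolerance `epsilon = 0.5` are hypothetical choices made for illustration. The sketch also checks that, because the map is linear, the projection of the mean coincides with the mean of the projections.

```python
import math
import random


def random_adjacency_vector(num_nodes, edge_prob, rng):
    """Vectorized adjacency matrix of a synthetic random directed graph."""
    return [1.0 if rng.random() < edge_prob else 0.0
            for _ in range(num_nodes * num_nodes)]


def gaussian_projection(dim_in, dim_out, rng):
    """dim_out x dim_in matrix with i.i.d. N(0, 1/dim_out) entries."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, vector):
    """Apply the linear map to one vector."""
    return [sum(r * v for r, v in zip(row, vector)) for row in matrix]


def mean_vector(vectors):
    """Coordinate-wise empirical mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sum_sq_dev(vectors, center):
    """Sum over vectors of the squared Euclidean distance to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(v, center)) for v in vectors)


rng = random.Random(0)          # fixed seed so the run is reproducible
num_nodes, num_graphs = 20, 30  # illustrative sizes (assumption)
dim_in, dim_out = num_nodes * num_nodes, 200
epsilon = 0.5                   # deliberately loose tolerance (assumption)

graphs = [random_adjacency_vector(num_nodes, 0.3, rng) for _ in range(num_graphs)]
proj = gaussian_projection(dim_in, dim_out, rng)
projected = [project(proj, g) for g in graphs]

mean_orig = mean_vector(graphs)
mean_proj = mean_vector(projected)
beta = project(proj, mean_orig)  # image of the original mean under the map

# Linearity: the image of the mean equals the mean of the images.
linearity_gap = max(abs(b - m) for b, m in zip(beta, mean_proj))

# Ratio of projected to original sum of squared deviations from the mean.
ratio = sum_sq_dev(projected, mean_proj) / sum_sq_dev(graphs, mean_orig)

print("linearity gap:", linearity_gap)
print("variance ratio:", ratio)
print("within [1 - eps, 1 + eps]:", 1 - epsilon <= ratio <= 1 + epsilon)
```

Increasing `dim_out` tightens the ratio around 1, which is the qualitative behavior that the factor $(1\pm\epsilon)$ expresses; the sketch makes no claim about the specific projection dimension required in the paper.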

Step 2: Comparing the value of the test statistic: Under $H_{0}$ (2), as $n\rightarrow\infty$, $\hat{V}_{n}\rightarrow V$, $\hat{V}_{n}^{C}\rightarrow V$, $\hat{V}_{N-n}\rightarrow V$, and $\hat{V}_{N-n}^{C}\rightarrow V$, where the convergence is in probability. This implies $S_{N}(Nt)=0$. Using the previous inequalities, one obtains

$$\tilde{S}_{N}(Nt)\geq\frac{5\epsilon\,t(1-t)V}{\hat{\sigma}^{2}}$$
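As a quick illustration of how the right-hand side of this bound depends on $t$ (a worked evaluation only, with no additional assumptions about $\epsilon$, $V$, or $\hat{\sigma}^{2}$): the factor $t(1-t)$ vanishes at $t=0$ and $t=1$ and is largest at $t=1/2$, where

$$\frac{5\epsilon\,t(1-t)V}{\hat{\sigma}^{2}}\bigg|_{t=1/2}=\frac{5\epsilon V}{4\hat{\sigma}^{2}}.$$

The right-hand side is therefore at most $5\epsilon V/(4\hat{\sigma}^{2})$ for any $t\in[0,1]$, and it scales linearly in the distortion parameter $\epsilon$.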