License: arXiv.org perpetual non-exclusive license
arXiv:2403.06068v1 [math.ST] 10 Mar 2024

Hypothesis testing for homogenous of nodes in $\beta$-models

Kang Fu,   Jianwei Hu   and   Meng Sun
Central China Normal University
School of Mathematics and Statistics, and Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan 430079, China. Email: [email protected].
School of Mathematics and Statistics, and Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China. Email: [email protected].
School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China. Email: [email protected].
Abstract

The $\beta$-model has been extensively utilized to model degree heterogeneity in networks, wherein each node is assigned a unique parameter. In this article, we consider the hypothesis testing problem that two nodes $i$ and $j$ of a $\beta$-model have the same node parameter. We prove that, under the null hypothesis, the proposed statistic converges in distribution to the standard normal distribution. Further, we investigate the homogeneous test for the $\beta$-model by combining individual $p$-values to aggregate small effects of multiple tests. Both simulation studies and real-world data examples indicate that the proposed method works well.

Keywords: $\beta$-model; Combination $p$-values; Hypothesis testing; Network data

1 Introduction

Network models are widely used to characterize the interactions between different entities (Scott, 2000). Studies of network data have attracted considerable attention in many fields, such as computer science, social science, and biology. For example, in a social network, an interaction between two individuals may represent a friendship (Hunter et al., 2012). In general, an undirected and unweighted network $\mathcal{G}$ with $n$ nodes can be represented by an $n\times n$ adjacency matrix $A\in\{0,1\}^{n\times n}$, where the $(i,j)$-th entry indicates whether there is a connection between node $i$ and node $j$, i.e., $A_{ij}=1$ if there is a connection between node $i$ and node $j$ and $A_{ij}=0$ otherwise. In network data analysis, the $\beta$-model, proposed by Chatterjee et al. (2011), is a special case of a class of models known as node-parameter models, in which each node's degree is associated with a corresponding parameter. Specifically, the $\beta$-model assumes that the edge between node $i$ and node $j$ exists with probability

$$\mathbb{P}\{A_{ij}=1\}=p_{ij}=\dfrac{e^{\beta_{i}+\beta_{j}}}{1+e^{\beta_{i}+\beta_{j}}},$$

independently of all other edges, where $\beta_{i}$ is the node parameter (also known as the “attractiveness”) of node $i$. The $\beta$-model is an exponential random graph model and can be seen as an undirected version of the $p_{1}$-model (Holland and Leinhardt, 1981). An advantage of the $\beta$-model is that the degree sequence is the unique sufficient statistic. Hence, the $\beta$-model is widely used to model networks with degree heterogeneity. It is not difficult to see that the probability of an edge between node $i$ and node $j$ depends only on the parameters of these two nodes. When all $\beta_{i}$'s are equal, the $\beta$-model reduces to the Erdős-Rényi (E-R) model. To fit a sparse network, Mukherjee et al. (2018) proposed the adjusted $\beta$-model

$$p_{ij}=\dfrac{\lambda}{n}\dfrac{e^{\beta_{i}+\beta_{j}}}{1+e^{\beta_{i}+\beta_{j}}},$$

where $\lambda\in(1,n)$ is used to measure the sparsity of the graph. Since the $\beta$-model can capture important features of real-world networks, the $\beta$-model and its variations have been studied widely in recent years (Chatterjee et al., 2011; Yan and Xu, 2013; Rinaldo et al., 2013; Ogawa et al., 2013; Yan et al., 2015, 2016).
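As a rough illustration of the two edge-probability formulas above, the following sketch draws an adjacency matrix from a $\beta$-model. The function names `edge_probability` and `sample_beta_model`, the `scale` argument (standing in for $\lambda/n$), and the seeding scheme are assumptions made for this example; they do not come from the paper.

```python
import math
import random


def edge_probability(beta_i, beta_j, scale=1.0):
    """p_ij = scale * e^{b_i+b_j} / (1 + e^{b_i+b_j}).

    scale=1.0 gives the plain beta-model; scale=lambda/n gives the
    adjusted (sparse) variant. Assumes moderate |beta_i + beta_j|.
    """
    s = beta_i + beta_j
    return scale / (1.0 + math.exp(-s))  # equals scale * e^s / (1 + e^s)


def sample_beta_model(beta, lam=None, seed=0):
    """Draw a symmetric 0/1 adjacency matrix without self-loops."""
    rng = random.Random(seed)
    n = len(beta)
    scale = 1.0 if lam is None else lam / n
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each edge is drawn independently of all other edges.
            if rng.random() < edge_probability(beta[i], beta[j], scale):
                A[i][j] = A[j][i] = 1
    return A
```

Setting all entries of `beta` to the same value yields an E-R graph, which matches the degenerate case described above.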

Hypothesis testing plays a critical role in the study of network data (Fu et al., 2022, 2023). One significant application is to recover the community structure of a network. Bickel and Sarkar (2016) and Dong et al. (2020) used the spectral statistic of the normalized adjacency matrix to test whether the network has a community structure, i.e., $H_{0}:k=1$ for stochastic block models. Then, Cammarata and Ke (2023) considered the global testing problem under the framework of degree-corrected mixed membership models. Further, a number of goodness-of-fit tests for stochastic block models have also been proposed; see, e.g., Lei (2016); Hu et al. (2021); ** et al. (2023). Under the setting of degree-corrected mixed membership models, Fan et al. (2022) studied hypothesis testing for the equality of membership vectors between two nodes, up to a possible scaling. Similarly, Du and Tang (2023) investigated the equality of latent positions between two nodes. Their method is based on the Mahalanobis distance between two vectors and generalizes the corresponding results in Fan et al. (2022).

Hypothesis testing for $\beta$-models is a nascent research area. Motivated by the problem of testing the equality of two nodes, we consider the hypothesis testing problem that two nodes $i$ and $j$ of a $\beta$-model have the same node parameter. Specifically, we consider the following test:

$$H_{0}:\beta_{i}=\beta_{j}\quad \text{v.s.}\quad H_{1}:\beta_{i}\neq\beta_{j}, \tag{1.1}$$

for any $i,j\in[n]$, where $[n]=\{1,\cdots,n\}$. A further significant problem is the homogeneous test, i.e.,

$$H_{0}^{\prime}:\beta_{1}=\beta_{2}=\cdots=\beta_{n}\quad \text{v.s.}\quad H_{1}^{\prime}:\beta_{i}\neq\beta_{j}\ \text{for at least one pair of}\ i,j. \tag{1.2}$$

For test (1.2), the null hypothesis $H_{0}^{\prime}$ implies that there is no heterogeneity in the network, and the network can be seen as an E-R graph. For an adjusted $\beta$-model, Mukherjee et al. (2018) considered a homogeneous null hypothesis with all $\beta_{i}$ being equal to 0 against an alternative hypothesis with a subset of $\{\beta_{i}:i\in[n]\}$ strictly greater than 0. They proposed three explicit degree-based test statistics: $\sum_{i}d_{i}$, $\max_{i}d_{i}$, and a higher criticism test based on $(d_{i}-\lambda/2)/(\lambda(1-\lambda/2n))^{1/2}$, and established their asymptotic null distributions under some mild conditions. Similarly, under the $\beta$-model, Yan et al. (2022) investigated two testing problems: for a fixed $r$, the specified null $H_{0}:\beta_{i}=\beta_{i}^{0},\ i=1,\cdots,r$ and the homogeneous null $H_{0}:\beta_{1}=\cdots=\beta_{r}$, where $\beta_{i}^{0}$ is known. For the two nulls, they established Wilks' theorem for $\beta$-models, i.e., the log-likelihood ratio statistic $2[\ell(\hat{\bm{\beta}})-\ell(\hat{\bm{\beta}}^{res})]$ converges in distribution to a chi-square distribution with $r$ and $r-1$ degrees of freedom, respectively, where $\hat{\bm{\beta}}$ and $\hat{\bm{\beta}}^{res}$ are the unrestricted and restricted maximum likelihood estimators of $\bm{\beta}$, and $\ell(\cdot)$ is the log-likelihood function. Compared with their settings, the advantages of our setting are as follows. First, our null hypothesis covers a wider range of parameters than that in Mukherjee et al. (2018), since we do not require all parameters to be equal to zero. Second, we only need the unrestricted maximum likelihood estimate, which saves computational cost.

The rest of this article is organized as follows. In Section 2, we present our main method and theorems about the test for equality of node parameters. The homogeneous test for the $\beta$-model is investigated in Section 3. Additional simulation studies and real-world data examples are given in Sections 4 and 5. Section 6 concludes the article. Technical proofs are given in the Appendix.

2 Hypothesis testing for equality of node parameters

Formally, suppose that $A\in\{0,1\}^{n\times n}$ is the adjacency matrix of an undirected graph $\mathcal{G}$ generated from the $\beta$-model with parameter $\bm{\beta}=(\beta_{1},\ldots,\beta_{n})^{\top}\in\mathbb{R}^{n}$, where $\bm{\beta}$ is unknown. Throughout this article, we assume that self-loops are not allowed, i.e., $A_{ii}=0$ for $1\leq i\leq n$. Let $d_{i}=\sum_{j\neq i}A_{ij}$ be the degree of node $i$. Then, the logarithm of the likelihood function can be written as:

$$\ell(\bm{\beta}|A)=\sum_{i}\beta_{i}d_{i}-\sum_{i<j}\log\left(1+e^{\beta_{i}+\beta_{j}}\right).$$
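The log-likelihood above depends on the data only through the degrees, which a direct evaluation makes visible. The sketch below is illustrative only; the name `log_likelihood` and the list-of-lists representation of $A$ are assumptions of this example.

```python
import math


def log_likelihood(beta, A):
    """Evaluate sum_i beta_i d_i - sum_{i<j} log(1 + e^{beta_i + beta_j})."""
    n = len(beta)
    degrees = [sum(row) for row in A]  # d_i = sum_{j != i} A_ij (A_ii = 0)
    linear = sum(b * d for b, d in zip(beta, degrees))
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = beta[i] + beta[j]
            # log(1 + e^s) rewritten so that it stays finite for large |s|
            penalty += max(s, 0.0) + math.log1p(math.exp(-abs(s)))
    return linear - penalty
```

Because the first term uses only `degrees`, two graphs with the same degree sequence receive the same log-likelihood for any `beta`.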

Denote $\hat{\bm{\beta}}=\mathop{\arg\max}_{\bm{\beta}}\ell(\bm{\beta}|A)$ as the maximum likelihood estimator (MLE). The MLE can be obtained by solving the following equations:

$$d_{i}=\sum_{j\neq i}\dfrac{e^{\beta_{i}+\beta_{j}}}{1+e^{\beta_{i}+\beta_{j}}},\quad (i=1,\cdots,n). \tag{2.1}$$

Chatterjee et al. (2011) showed that a fixed-point iterative algorithm can be used to compute $\hat{\bm{\beta}}$. Under the framework of the $\beta$-model, Chatterjee et al. (2011) established the consistency of $\hat{\bm{\beta}}$. Specifically, let $L_{n}=\max_{i}|\beta_{i}|$; then there is a constant $C(L_{n})$ depending only on $L_{n}$ such that $\mathbb{P}\{\max_{1\leq i\leq n}|\hat{\beta}_{i}-\beta_{i}|\leq C(L_{n})\sqrt{n^{-1}\log n}\}\geq 1-C(L_{n})n^{-2}$. Further, by approximating the inverse of the Fisher information matrix, Yan and Xu (2013) proved the asymptotic normality of $\hat{\bm{\beta}}$. Then, Rinaldo et al. (2013) gave necessary and sufficient conditions for the existence and uniqueness of $\hat{\bm{\beta}}$.
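One way to picture a fixed-point scheme for (2.1) is to rearrange each equation as $\beta_{i}=\log d_{i}-\log\sum_{j\neq i}\{e^{-\beta_{j}}+e^{\beta_{i}}\}^{-1}$ and iterate. The sketch below follows that rearrangement; the function name `fit_beta_mle`, the zero starting point, the stopping rule, and the degree check are assumptions of this example rather than details taken from the paper, and convergence is not guaranteed when the MLE does not exist.

```python
import math


def fit_beta_mle(A, max_iter=1000, tol=1e-10):
    """Fixed-point iteration for the likelihood equations (2.1).

    Returns (beta_hat, converged). Illustrative sketch only.
    """
    n = len(A)
    d = [sum(row) for row in A]
    # Degrees 0 or n-1 rule out a finite MLE; this check is necessary,
    # not sufficient, for existence.
    if any(di == 0 or di == n - 1 for di in d):
        raise ValueError("a node has degree 0 or n-1; no finite MLE")
    beta = [0.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # sum_{j != i} e^{b_j} / (1 + e^{b_i + b_j}), written as 1/(e^{-b_j} + e^{b_i})
            denom = sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                        for j in range(n) if j != i)
            new.append(math.log(d[i]) - math.log(denom))
        change = max(abs(a - b) for a, b in zip(new, beta))
        beta = new
        if change < tol:
            return beta, True
    return beta, False
```

For a regular graph such as the complete bipartite graph $K_{3,3}$, every $\hat{\beta}_{i}$ equals $\tfrac{1}{2}\log\{p/(1-p)\}$ with $p=3/5$, which gives a simple sanity check for the iteration.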

Denote the Fisher information matrix for $\bm{\beta}$ as $V=(v_{ij})_{n\times n}$, where

$$v_{ij}=\dfrac{e^{\beta_{i}+\beta_{j}}}{\{1+e^{\beta_{i}+\beta_{j}}\}^{2}}\ (1\leq i\neq j\leq n),\qquad v_{ii}=\sum_{j\neq i}v_{ij}.$$

Note that $V$ is also the covariance matrix of the degree sequence $\bm{d}=(d_{1},\cdots,d_{n})$. Then, Yan and Xu (2013) established the following central limit theorem:

Lemma 1.

If $L_{n}=o(\log\log n)$, then for any fixed $r\geq 1$, the vector consisting of the first $r$ elements of $G^{1/2}(\hat{\bm{\beta}}-\bm{\beta})$ is asymptotically standard multivariate normal as $n\rightarrow\infty$, where $G=\mathrm{diag}(v_{11},\cdots,v_{nn})$ and $G^{1/2}=\mathrm{diag}(v_{11}^{1/2},\cdots,v_{nn}^{1/2})$.

Lemma 1 implies that, for any $i\in[n]$, the following result holds:

$$v_{ii}^{1/2}(\hat{\beta}_{i}-\beta_{i})\stackrel{d}{\longrightarrow}N(0,1),$$

and $\hat{\beta}_{i}$ and $\hat{\beta}_{j}$ are asymptotically independent for any $1\leq i\neq j\leq n$. Then, for a pair of nodes $(i,j)$, we have

$$\hat{\beta}_{i}-\hat{\beta}_{j}\stackrel{d}{\longrightarrow}N(\beta_{i}-\beta_{j},v_{ii}^{-1}+v_{jj}^{-1}).$$

Under the null hypothesis $H_{0}$ of test (1.1), we have

$$\hat{\beta}_{i}-\hat{\beta}_{j}\stackrel{d}{\longrightarrow}N(0,v_{ii}^{-1}+v_{jj}^{-1}). \tag{2.2}$$

Consider the statistic $U_{ij}=\dfrac{\hat{\beta}_{i}-\hat{\beta}_{j}}{\sqrt{v_{ii}^{-1}+v_{jj}^{-1}}}$. Then, under $H_{0}$, we have $U_{ij}\stackrel{d}{\longrightarrow}N(0,1)$. Notice that the statistic $U_{ij}$ involves unknown parameters $v_{ii}$ and $v_{jj}$. Hence, we can consider a natural estimate of $U_{ij}$ by plugging in the estimated parameters $\hat{v}_{ii}$ and $\hat{v}_{jj}$, where

$$\hat{v}_{ij}=\dfrac{e^{\hat{\beta}_{i}+\hat{\beta}_{j}}}{\{1+e^{\hat{\beta}_{i}+\hat{\beta}_{j}}\}^{2}}\ (1\leq i\neq j\leq n),\qquad \hat{v}_{ii}=\sum_{j\neq i}\hat{v}_{ij}.$$

Denote the empirical estimate of $U_{ij}$ by $\hat{U}_{ij}=\dfrac{\hat{\beta}_{i}-\hat{\beta}_{j}}{\sqrt{\hat{v}_{ii}^{-1}+\hat{v}_{jj}^{-1}}}$. It is natural to conjecture that when the estimates $\hat{v}_{ii}$ and $\hat{v}_{jj}$ are accurate enough, the convergence in (2.2) will still hold for $\hat{U}_{ij}$.

Formally, we have the following theorem:

Theorem 1.

Let $A$ be an adjacency matrix generated from a $\beta$-model with parameter $\bm{\beta}=(\beta_{1},\cdots,\beta_{n})$. Under $H_{0}$, when $\max_{i}|\beta_{i}|=o(\log\log n)$, we have the following result:

$$\hat{U}_{ij}\stackrel{d}{\longrightarrow}N(0,1). \tag{2.3}$$

Under $H_{1}$, we assume that $\beta_{i}-\beta_{j}=\mu$. Then, we have

$$\hat{U}_{ij}\stackrel{d}{\longrightarrow}N(\mu,1). \tag{2.4}$$

We postpone the proof to the Appendix. Theorem 1 is an intuitive result. The method is similar to the two-sample test of means when the variance is unknown. It can be seen that the statistic $\hat{U}_{ij}$ has different means under the null and the alternative. Using this result, we can carry out the hypothesis test. Specifically, given a nominal level $\alpha$, we have the rejection rule:

$$\mathrm{Reject}\ H_{0},\ \mathrm{if}\ |\hat{U}_{ij}|\geq u_{1-\alpha/2}, \tag{2.5}$$

where $u_{1-\alpha/2}$ is the $(1-\alpha/2)$-th quantile (equivalently, the upper $\alpha/2$ quantile) of the standard normal distribution.
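The pairwise procedure of this section can be sketched in a few lines once an estimate $\hat{\bm{\beta}}$ is available. The function name `pairwise_test`, its return format, and the two-sided $p$-value $2\{1-\Phi(|\hat{U}_{ij}|)\}$ are assumptions made for this illustration; the $p$-value is chosen to agree with rejection rule (2.5) and is not spelled out in the text above.

```python
import math
from statistics import NormalDist


def pairwise_test(beta_hat, i, j, alpha=0.05):
    """Return (U_hat_ij, two-sided p-value, reject H0?) for nodes i and j."""
    n = len(beta_hat)

    def v_hat(a, b):
        # e^s / (1 + e^s)^2 equals p * (1 - p) with p = e^s / (1 + e^s)
        p = 1.0 / (1.0 + math.exp(-(beta_hat[a] + beta_hat[b])))
        return p * (1.0 - p)

    v_ii = sum(v_hat(i, k) for k in range(n) if k != i)
    v_jj = sum(v_hat(j, k) for k in range(n) if k != j)
    u = (beta_hat[i] - beta_hat[j]) / math.sqrt(1.0 / v_ii + 1.0 / v_jj)

    std_normal = NormalDist()
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(u)))       # assumed two-sided p-value
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)     # u_{1 - alpha/2}
    return u, p_value, abs(u) >= critical
```

Only the unrestricted estimate `beta_hat` enters the computation, so no restricted model has to be refitted for each pair.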

3 Hypothesis testing for homogeneity

In this section, we consider the homogeneous test for the $\beta$-model. Under the null hypothesis of test (1.2), the $\beta$-model reduces to the E-R model. Hence, the homogeneous test makes it possible to evaluate heterogeneity among the nodes of the network. For test (1.2), the alternative hypothesis implies that there is a pair of nodes $(i,j)$ with unequal node parameters. Hence, applying test (1.1) to that pair $(i,j)$ should lead to rejection of the null hypothesis. Intuitively, we can consider all pairs of nodes $(i,j)$ for $1\leq i<j\leq n$ and apply test (1.1) to each pair, which leads to $n(n-1)/2$ testing results. Two significant problems are that the statistics $U_{ij}$'s are correlated and that the information from the $n(n-1)/2$ results must be combined.

In meta-analysis, methods for combining multiple test statistics are widely used for massive data analysis. Specifically, suppose we independently test the same hypothesis using $K$ different statistical tests and obtain $p$-values $p_{1},\cdots,p_{K}$. An important issue is how to combine them into a single $p$-value. Notice that, under the null hypothesis, all $p_{i}$'s should follow the uniform distribution on the interval $[0,1]$. Hence, the null hypothesis can be rewritten as

$$H_0'': p_i\sim U[0,1]\ \text{for}\ i=1,\cdots,K.$$

The six simplest and most commonly used statistics for combining $p$-values are: $T_F=\sum_i\log p_i$ (Fisher, 1932), $T_P=-\sum_i\log(1-p_i)$ (Pearson, 1933), $T_G=T_F+T_P=\sum_i\log\{p_i/(1-p_i)\}$ (Mudholkar and George, 1979), $T_E=\sum_i p_i$ (Edgington, 1972), $T_S=\sum_i\Phi^{-1}(p_i)$ (Stouffer et al., 1949), and $T_T=\min_i p_i$ (Tippett, 1931).
However, an obvious deficiency is that none of these six methods remains valid when the $p_i$'s are dependent. To address this, Liu and Xie (2020) proposed a Cauchy combination method that takes advantage of the Cauchy distribution. They established a nonasymptotic result showing that the tail of the null distribution can be well approximated by a Cauchy distribution under arbitrary dependency structures. Specifically, the Cauchy test statistic has the form $T_L=\sum_i w_i\tan\{(0.5-p_i)\cdot\pi\}$, where the weights $w_i$ are nonnegative and $\sum_i w_i=1$.
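The combination statistics listed above can be sketched in a few lines of Python. This is only an illustration of the formulas as written in the text: the function names, the equal default weights, and the example $p$-values are assumptions made here, not part of the original method.

```python
import math
from statistics import NormalDist


def classical_combinations(pvals):
    """Six classical combination statistics, written as in the text."""
    if not pvals or any(not 0.0 < p < 1.0 for p in pvals):
        raise ValueError("p-values must lie strictly between 0 and 1")
    t_f = sum(math.log(p) for p in pvals)            # Fisher
    t_p = -sum(math.log(1.0 - p) for p in pvals)     # Pearson
    return {
        "T_F": t_f,
        "T_P": t_p,
        "T_G": t_f + t_p,                            # Mudholkar and George
        "T_E": sum(pvals),                           # Edgington
        "T_S": sum(NormalDist().inv_cdf(p) for p in pvals),  # Stouffer
        "T_T": min(pvals),                           # Tippett
    }


def cauchy_combination(pvals, weights=None):
    """Cauchy statistic T_L; equal weights are assumed when none are given."""
    if weights is None:
        weights = [1.0 / len(pvals)] * len(pvals)
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvals))


pvals = [0.01, 0.20, 0.65, 0.03]  # made-up p-values, for illustration only
print(classical_combinations(pvals))
print(cauchy_combination(pvals))
```

Small $p$-values map to large positive tangent terms, so a few strong signals dominate $T_L$ regardless of how the remaining $p$-values are correlated.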

Recall the homogeneity test (1.2). For any pair of nodes $(i,j)$, we can calculate the statistic $\hat{U}_{ij}$ and the $p$-value $p_{ij}^{value}=2\mathbb{P}_{N(0,1)}\{X\geq|\hat{U}_{ij}|\}$. Under the null $H_0'$, all the $p_{ij}^{value}$'s should follow the uniform distribution on the interval $[0,1]$, but they are not independent. Hence, we consider the Cauchy combination statistic:

$$T_n=\sum_{i<j}w_{ij}\tan\{(0.5-p_{ij}^{value})\cdot\pi\}.$$

According to the results in Liu and Xie (2020), the test statistic $T_n$ has approximately a Cauchy tail even when the $p_{ij}^{value}$'s are dependent, i.e.,

$$\lim_{t\rightarrow+\infty}\dfrac{\mathbb{P}\{|T_n|\geq t\}}{\mathbb{P}\{|C_0|\geq t\}}=1,$$

where $C_0$ denotes a standard Cauchy random variable. Then, for a given nominal level $\alpha$, we have the rejection rule:

$$\text{Reject}\ H_0'\ \text{if}\ |T_n|>c_{1-\alpha/2},$$

where $c_{1-\alpha/2}$ is the $(1-\alpha/2)$-th quantile (i.e., the upper $\alpha/2$ quantile) of the standard Cauchy distribution.
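As a rough sketch of the whole procedure, the snippet below turns pairwise statistics into two-sided $p$-values, forms $T_n$, and applies the rejection rule. The input dictionary `u_hat`, the function names, and the equal weights $w_{ij}=2/\{n(n-1)\}$ are assumptions made for illustration; the text only requires nonnegative weights that sum to one. The critical value uses the standard Cauchy quantile function $\tan\{\pi(q-0.5)\}$.

```python
import math
from statistics import NormalDist


def two_sided_pvalue(u):
    """p-value 2 * P{X >= |u|} for X ~ N(0, 1)."""
    return 2.0 * NormalDist().cdf(-abs(u))


def cauchy_quantile(q):
    """Quantile function of the standard Cauchy distribution."""
    return math.tan(math.pi * (q - 0.5))


def homogeneity_test(u_hat, alpha=0.05):
    """u_hat maps each pair (i, j), i < j, to its statistic U_ij-hat.

    Returns (T_n, critical value, reject?) with equal weights (an assumption).
    """
    w = 1.0 / len(u_hat)
    t_n = sum(w * math.tan((0.5 - two_sided_pvalue(u)) * math.pi)
              for u in u_hat.values())
    critical = cauchy_quantile(1.0 - alpha / 2.0)
    return t_n, critical, abs(t_n) > critical


# Hypothetical pairwise statistics for a 3-node toy example.
u_hat = {(1, 2): 0.4, (1, 3): -3.1, (2, 3): 2.8}
print(homogeneity_test(u_hat))
```

A statistic exactly equal to zero gives a $p$-value of one and an extremely large negative tangent term, so in practice $p$-values very close to one may need to be truncated before combining.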

Remark. Compared with the results in Yan et al. (2022), the proposed method can test the homogeneity of all $n$ parameters. By Lemma 1, when $r$ diverges to infinity, the first $r$ elements of $\hat{\bm{\beta}}$ may not be independent. Our test procedure, however, only uses two estimators $\hat{\beta}_i$ and $\hat{\beta}_j$ at a time, which can be regarded as independent, and then combines the information from the $n(n-1)/2$ pairwise tests.

4 Simulation

In this section, we carry out extensive simulation studies to evaluate the performance of the proposed method. All simulations were performed on a PC with a single processor of 2.3 GHz 8‐Core Intel Core i9.

4.1 The empirical distribution for statistic $\hat{U}_{ij}$

In this simulation, we examine the finite-sample empirical distribution of the test statistic $\hat{U}_{ij}$ under the null and alternative hypotheses and verify the result in Theorem 1. We set $n=300,500$ and $\beta_i=iL_n/(n-1)$, where $L_n=0$, $\log(\log n)$, and $(\log n)^{1/2}$. When $L_n=0$, all $\beta_i$'s are equal, which corresponds to $H_0$. When $L_n=\log(\log n)$ or $(\log n)^{1/2}$, the nodes are heterogeneous, which corresponds to $H_1$.
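The simulation design above can be sketched as follows. The edge probability $e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$ is the standard $\beta$-model probability; the function names and the random seed are assumptions made for this illustration, and the pure-Python loops favour clarity over speed.

```python
import math
import random


def beta_vector(n, L_n):
    """beta_i = i * L_n / (n - 1) for i = 1, ..., n, as in the setting above."""
    return [i * L_n / (n - 1) for i in range(1, n + 1)]


def sample_beta_model(beta, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent edges."""
    n = len(beta)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability of the beta-model for the pair (i, j).
            p = 1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A


n = 300
beta = beta_vector(n, math.log(math.log(n)))  # one of the alternative settings
A = sample_beta_model(beta, random.Random(0))
degrees = [sum(row) for row in A]
print(min(degrees), max(degrees))
```

Setting `L_n = 0.0` in `beta_vector` gives the null case, in which every edge appears with probability $1/2$.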

In Figures 1-3, we plot the empirical density of the statistic $\hat{U}_{ij}$ from 1000 data replications. For $L_n=0$, $\log(\log n)$, and $(\log n)^{1/2}$, the plots show that the simulation results closely match the prediction of Theorem 1. Under the null ($L_n=0$) and the alternative ($L_n=\log(\log n)$ and $(\log n)^{1/2}$), the test statistic has different means.

Figure 1: The histogram of the statistic $\hat{U}_{ij}$ under $n=300$ (upper row) and $n=500$ (lower row) when $L_n=0$. The red solid line indicates the density of the standard normal distribution.

Figure 2: The histogram of the statistic $\hat{U}_{ij}$ under $n=300$ (upper row) and $n=500$ (lower row) when $L_n=\log\log(n)$. The red solid line indicates the density of the normal distribution with $\mu=\beta_i-\beta_j$ and $\sigma^2=1$.

Figure 3: The histogram of the statistic $\hat{U}_{ij}$ under $n=300$ (upper row) and $n=500$ (lower row) when $L_n=(\log(n))^{1/2}$. The red solid line indicates the density of the normal distribution with $\mu=\beta_i-\beta_j$ and $\sigma^2=1$.

4.2 The empirical size and power for test (1.1)

In this subsection, we investigate the empirical size and power for test (1.1); the settings are similar to those in Section 4.1. The proportion of rejections at nominal level 0.05 is summarized in Table 1. The type I error is close to the nominal level. Under the alternative hypothesis, the power depends on the gap between the two parameters. When the difference between $\beta_i$ and $\beta_j$ is small ($(i,j)=(1,50)$ or $(50,100)$), the distribution of $\hat{U}_{ij}$ is close to the standard normal distribution, so the power can be much less than 1. When the difference between $\beta_i$ and $\beta_j$ is large ($(i,j)=(1,100)$ or $(50,200)$), however, the empirical powers are close to 1. These results are consistent with those in Section 4.1. In addition, we observe that for most node pairs the power decreases as $n$ increases. The main reason is that, under this parameter generation scheme, the difference between the parameters of two fixed nodes becomes smaller as $n$ increases.
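The link between the mean shift and the power can be made concrete with a short calculation. Assuming the statistic is exactly $N(\mu,1)$, which is an idealization of the reference curves in Figures 2 and 3, the rejection probability of the two-sided rule is $\Phi(-u_{1-\alpha/2}+\mu)+\Phi(-u_{1-\alpha/2}-\mu)$. The shift values $\mu$ below are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def two_sided_power(mu, alpha=0.05):
    """P{|Z + mu| > u_{1 - alpha/2}} for Z ~ N(0, 1)."""
    z = NormalDist()
    u = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(-u + mu) + z.cdf(-u - mu)


# Hypothetical mean shifts: zero shift recovers the nominal level.
for mu in (0.0, 1.0, 2.0, 4.0):
    print(mu, round(two_sided_power(mu), 3))
```

A shift close to zero leaves the rejection probability near $\alpha$, which matches the low power observed for pairs with similar parameters.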

Table 1: The proportion of rejection at nominal level 0.05 over 200 independent samples.

| $(i,j)$ | $n=300$, $L_n=0$ | $n=300$, $L_n=\log\log n$ | $n=300$, $L_n=(\log n)^{1/2}$ | $n=500$, $L_n=0$ | $n=500$, $L_n=\log\log n$ | $n=500$, $L_n=(\log n)^{1/2}$ |
|---|---|---|---|---|---|---|
| $(1,50)$ | 0.05 | 0.34 | 0.47 | 0.05 | 0.25 | 0.35 |
| $(1,100)$ | 0.05 | 0.86 | 0.96 | 0.05 | 0.73 | 0.87 |
| $(50,100)$ | 0.06 | 0.31 | 0.41 | 0.06 | 0.27 | 0.31 |
| $(50,200)$ | 0.07 | 0.95 | 0.99 | 0.05 | 0.96 | 0.99 |

4.3 The empirical size and power for test (1.2)

In this subsection, we investigate the homogeneity test for the $\beta$-model. We also set $\beta_i=(i-1)L_n/(n-1)$, except that we now set $\beta_1=\cdots=\beta_r=0$, where $r$ takes five values: $n$, $n-1$, $n-2$, $n-5$, and $n-10$. Clearly, $r=n$ corresponds to the null $H_0'$, and the other four cases correspond to the alternative $H_1'$. For $L_n$, we consider two classes of settings: (i) $L_n=(\log(\log n))^{1/2}$, $\log(\log n)$, and $(\log n)^{1/2}$; (ii) $L_n=c\log n$, where $c=0.1$, $0.2$, and $0.5$. The results are given in Tables 2 and 3.
Under the null $H_0'$ ($r=n$), the type I errors are close to or below the nominal level. Under the alternative $H_1'$, the empirical power generally increases with $n-r$, $L_n$, and $n$, and the proposed method is superior to the method in Yan et al. (2022) when $r$ is close to $n$. Overall, the simulation results show that the proposed method is effective and efficient.
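The parameter vectors used in this subsection can be generated as in the sketch below. It reads the setting as $\beta_i=0$ for $i\leq r$ and $\beta_i=(i-1)L_n/(n-1)$ for $i>r$; that reading, and the function name, are assumptions made here.

```python
import math


def beta_with_r_zeros(n, r, L_n):
    """beta_i = 0 for i <= r and beta_i = (i - 1) * L_n / (n - 1) for i > r."""
    return [0.0 if i <= r else (i - 1) * L_n / (n - 1)
            for i in range(1, n + 1)]


n = 100
for r in (n, n - 1, n - 2, n - 5, n - 10):
    beta = beta_with_r_zeros(n, r, 0.2 * math.log(n))  # setting (ii), c = 0.2
    print(r, sum(1 for b in beta if b != 0.0))
```

With $r=n$ every parameter is zero, which is the null case; each smaller $r$ leaves $n-r$ nodes with nonzero parameters.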

Table 2: The proportion of rejection at nominal level 0.05 over 200 independent samples.

| $n$ | $r$ | $L_n=(\log(\log n))^{1/2}$ | $L_n=\log(\log n)$ | $L_n=(\log n)^{1/2}$ |
|---|---|---|---|---|
| 100 | $n$ | 0.05 (0.07) | 0.03 (0.08) | 0.05 (0.05) |
| | $n-1$ | 0.97 (0.54) | 1 (0.81) | 1 (0.99) |
| | $n-2$ | 1 (0.96) | 1 (1) | 1 (1) |
| | $n-5$ | 1 (1) | 1 (1) | 1 (1) |
| | $n-10$ | 1 (1) | 1 (1) | 1 (1) |
| 200 | $n$ | 0.04 (0.08) | 0.03 (0.05) | 0.04 (0.09) |
| | $n-1$ | 0.94 (0.52) | 1 (0.82) | 1 (0.99) |
| | $n-2$ | 1 (0.94) | 1 (0.99) | 1 (1) |
| | $n-5$ | 1 (1) | 1 (1) | 1 (1) |
| | $n-10$ | 1 (1) | 1 (1) | 1 (1) |
| 500 | $n$ | 0.01 (0.05) | 0.03 (0.07) | 0.03 (0.05) |
| | $n-1$ | 0.98 (0.60) | 1 (0.82) | 1 (0.98) |
| | $n-2$ | 0.99 (0.96) | 1 (1) | 1 (1) |
| | $n-5$ | 1 (1) | 1 (1) | 1 (1) |
| | $n-10$ | 1 (1) | 1 (1) | 1 (1) |

Table 3: The proportion of rejection at nominal level 0.05 over 200 independent samples.

| $n$ | $r$ | $c=0.1$ | $c=0.2$ | $c=0.5$ |
|---|---|---|---|---|
| 100 | $n$ | 0.05 (0.07) | 0.03 (0.08) | 0.05 (0.05) |
| | $n-1$ | 0.09 (0.06) | 0.73 (0.23) | 1 (1) |
| | $n-2$ | 0.17 (0.10) | 0.92 (0.67) | 1 (1) |
| | $n-5$ | 0.34 (0.35) | 1 (1) | 1 (1) |
| | $n-10$ | 0.57 (0.78) | 1 (1) | 1 (1) |
| 200 | $n$ | 0.04 (0.05) | 0.04 (0.06) | 0.05 (0.07) |
| | $n-1$ | 0.46 (0.11) | 1 (0.67) | 1 (1) |
| | $n-2$ | 0.67 (0.28) | 1 (0.98) | 1 (1) |
| | $n-5$ | 0.95 (0.83) | 1 (1) | 1 (1) |
| | $n-10$ | 1 (1) | 1 (1) | 1 (1) |
| 500 | $n$ | 0.04 (0.05) | 0.02 (0.06) | 0.03 (0.06) |
| | $n-1$ | 0.99 (0.33) | 1 (0.99) | 1 (1) |
| | $n-2$ | 1 (0.84) | 1 (1) | 1 (1) |
| | $n-5$ | 1 (1) | 1 (1) | 1 (1) |
| | $n-10$ | 1 (1) | 1 (1) | 1 (1) |

5 Real example analysis

In this section, we apply the proposed method to a real network dataset. The food web dataset is from Baird and Ulanowicz (1989) and is available in Blitzstein and Diaconis (2011); it contains data on 33 organisms (such as bacteria, oysters, and catfish) in the Chesapeake Bay during the summer. The degree sequence of this network is $\bm{d}=(7,8,5,1,1,2,8,10,4,2,4,5,3,6,7,3,2,7,6,1,2,9,6,1,3,4,6,3,3,3,2,4,4)$. We observe that some nodes have identical degrees, and the heterogeneity of the network does not seem very pronounced. To investigate the equality of node parameters, we consider nodes 4, 6, 13, 11, 12, 14, 15, 2, 22, and 8, which have degrees 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, respectively. Table 4 shows the $p$-values for test problem (1.1). As the degree difference between two nodes increases, the $p$-value decreases, so the test tends to reject the null hypothesis. Finally, we consider the homogeneity test (1.2). The $p$-values obtained by the proposed method and the likelihood-ratio test are 0.698 and 0.998, respectively. Neither test rejects the null hypothesis at the 0.05 level, so the data provide no significant evidence against homogeneity of the network.
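Because the MLE of the $\beta$-model depends on the network only through its degree sequence, the pairwise analysis can be sketched from $\bm{d}$ alone. The snippet below solves the likelihood equations $d_i=\sum_{j\neq i}e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$ with the fixed-point iteration of Chatterjee et al. (2011) and then forms $\hat{U}_{ij}$ and its two-sided $p$-value. The function names, starting point, and tolerances are assumptions, and the sketch is not guaranteed to reproduce Table 4 digit for digit.

```python
import math
from statistics import NormalDist

# Degree sequence of the food web reported above (33 nodes).
d = [7, 8, 5, 1, 1, 2, 8, 10, 4, 2, 4, 5, 3, 6, 7, 3, 2, 7, 6, 1, 2,
     9, 6, 1, 3, 4, 6, 3, 3, 3, 2, 4, 4]


def edge_prob(b, i, j):
    """beta-model edge probability for nodes i and j (0-based indices)."""
    return 1.0 / (1.0 + math.exp(-(b[i] + b[j])))


def fit_beta(d, tol=1e-10, max_iter=10000):
    """Fixed-point iteration b_i <- log d_i - log sum_j 1/(e^{-b_j} + e^{b_i})."""
    n = len(d)
    b = [0.0] * n
    for _ in range(max_iter):
        new = [
            math.log(d[i])
            - math.log(sum(1.0 / (math.exp(-b[j]) + math.exp(b[i]))
                           for j in range(n) if j != i))
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, b)) < tol
        b = new
        if converged:
            break
    return b


def v_diag(b, i):
    """v_ii = sum_{j != i} f(b_i + b_j), where f(x) = e^x / (1 + e^x)^2."""
    return sum(edge_prob(b, i, j) * (1.0 - edge_prob(b, i, j))
               for j in range(len(b)) if j != i)


def pair_pvalue(b, i, j):
    """Two-sided p-value of U_ij-hat for nodes i and j (1-based labels)."""
    i, j = i - 1, j - 1
    u = (b[i] - b[j]) / math.sqrt(1.0 / v_diag(b, i) + 1.0 / v_diag(b, j))
    return 2.0 * NormalDist().cdf(-abs(u))


b_hat = fit_beta(d)
for i, j in [(4, 6), (4, 8), (22, 8)]:
    print(i, j, round(pair_pvalue(b_hat, i, j), 3))
```

Nodes with the same degree receive the same estimate, so their pairwise statistic is zero and the corresponding $p$-value equals one.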

Table 4: The $p$-values of the test statistic $\hat{U}_{ij}$ under the test problem (1.1).

| $i \backslash j$ | 4 | 6 | 13 | 11 | 12 | 14 | 15 | 2 | 22 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | -- | 0.277 | 0.156 | 0.090 | 0.053 | 0.031 | 0.019 | 0.012 | 0.007 | 0.004 |
| 6 | 0.277 | -- | 0.316 | 0.189 | 0.110 | 0.063 | 0.035 | 0.019 | 0.011 | 0.006 |
| 13 | 0.156 | 0.316 | -- | 0.337 | 0.213 | 0.128 | 0.074 | 0.042 | 0.023 | 0.012 |
| 11 | 0.090 | 0.189 | 0.337 | -- | 0.350 | 0.230 | 0.143 | 0.085 | 0.049 | 0.027 |
| 12 | 0.053 | 0.110 | 0.213 | 0.350 | -- | 0.360 | 0.243 | 0.156 | 0.095 | 0.055 |
| 14 | 0.031 | 0.063 | 0.128 | 0.230 | 0.360 | -- | 0.367 | 0.254 | 0.167 | 0.104 |
| 15 | 0.019 | 0.035 | 0.074 | 0.143 | 0.243 | 0.367 | -- | 0.373 | 0.263 | 0.176 |
| 2 | 0.012 | 0.019 | 0.042 | 0.085 | 0.156 | 0.254 | 0.373 | -- | 0.378 | 0.271 |
| 22 | 0.007 | 0.011 | 0.023 | 0.049 | 0.095 | 0.167 | 0.263 | 0.378 | -- | 0.382 |
| 8 | 0.004 | 0.006 | 0.012 | 0.027 | 0.055 | 0.104 | 0.176 | 0.271 | 0.382 | -- |

6 Conclusion

In this article, we have proposed a novel statistic for testing the equality of two node parameters in the $\beta$-model. Based on the central limit theorem, we have proved that the limiting distribution of the proposed statistic is the standard normal distribution. Then, plugging in the MLE of the parameters, we have proved that the limiting distribution of the empirical counterpart of the test statistic is also the standard normal distribution under some mild conditions. Under the alternative hypothesis, the limiting distribution of the test statistic has also been proven to be a normal distribution with a mean different from that of the null distribution. Further, based on the method of combining $p$-values, we have investigated the homogeneity test for the $\beta$-model. Empirically, extensive simulation studies show that the test maintains the nominal size and has reasonable power.

It is worth noting that the proposed test method works well when the difference between the parameters of two nodes is large. However, the power decreases when the difference between the parameters of two nodes is small. Hence, we need to consider how to improve the power of the proposed test for hypothesis test (1.1) in the case $0<\beta_i-\beta_j\leq\varepsilon$ for a small constant $\varepsilon>0$. Next, we can also consider extending the single-sample problem to the multi-sample problem, such as $H_0:\bm{\beta}_1=\bm{\beta}_2$ for two $\beta$-models with parameters $\bm{\beta}_1$ and $\bm{\beta}_2$. We will continue to study this issue in future work.

7 Appendix

7.1 Proof of Theorem 1

First, we consider the case of $H_0$. According to the Taylor expansion, we have, for any $1\leq i\leq n$,

$$\hat{v}_{ii}^{-1}-v_{ii}^{-1}=-v_{ii}^{-2}(\hat{v}_{ii}-v_{ii}).$$

Following the definition of $v_{ij}$, it is easy to see that

$$\dfrac{n-1}{4}e^{-2L_n}\leq v_{ii}\leq\dfrac{n-1}{4}\quad\text{and}\quad\dfrac{16}{(n-1)^2}\leq v_{ii}^{-2}\leq\dfrac{16}{(n-1)^2}e^{4L_n}.\tag{7.1}$$

Next, we bound the term $\hat{v}_{ii}-v_{ii}$. Define $f(x)=e^x/\{1+e^x\}^2$; then $f'(x)=-e^x(e^x-1)/\{1+e^x\}^3$. For any $1\leq i\leq n$,

$$\begin{aligned}
\hat{v}_{ii}-v_{ii} &\leq\sum_{j\neq i}|\hat{v}_{ij}-v_{ij}|\\
&=\sum_{j\neq i}|f(\hat{\beta}_i+\hat{\beta}_j)-f(\beta_i+\beta_j)|\\
&=\sum_{j\neq i}|f'(\beta_i+\beta_j)(\hat{\beta}_i+\hat{\beta}_j-\beta_i-\beta_j)|\\
&\leq\sum_{j\neq i}2|f'(\beta_i+\beta_j)|\cdot|\hat{\beta}_i-\beta_i|.
\end{aligned}$$

Notice that $|f'(x)|\leq 1/(6\sqrt{3})$ and the convergence rate of $\hat{\beta}_i$ is between $O_p(n^{-1/2}e^{L_n})$ and $O_p(n^{-1/2})$. Hence, we have $|\hat{v}_{ii}-v_{ii}|=O_p(n^{1/2}e^{L_n})$. Combining with (7.1), we have, for any $1\leq i\leq n$,

$$|\hat{v}_{ii}^{-1}-v_{ii}^{-1}|=O_p(n^{-3/2}e^{5L_n}).$$
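For readers who want the intermediate steps, the rate above can be traced as follows. This is a reading of the argument rather than part of the original proof: it writes $p=e^x/(1+e^x)$, so that $f'(x)=p(1-p)(1-2p)$, and it uses the lower bound $v_{ii}\geq(n-1)e^{-2L_n}/4$, which gives $v_{ii}^{-2}\leq 16e^{4L_n}/(n-1)^2$.

```latex
\begin{align*}
|f'(x)| &= p(1-p)\,|1-2p| \leq \frac{1}{6\sqrt{3}},
  \qquad \text{with equality at } p=\tfrac{1}{2}\pm\tfrac{1}{2\sqrt{3}},\\
|\hat{v}_{ii}-v_{ii}| &\leq \sum_{j\neq i}2|f'(\beta_i+\beta_j)|\cdot O_p(n^{-1/2}e^{L_n})
  = O_p(n^{1/2}e^{L_n}),\\
|\hat{v}_{ii}^{-1}-v_{ii}^{-1}| &= v_{ii}^{-2}\,|\hat{v}_{ii}-v_{ii}|
  \leq \frac{16e^{4L_n}}{(n-1)^2}\cdot O_p(n^{1/2}e^{L_n})
  = O_p(n^{-3/2}e^{5L_n}).
\end{align*}
```

The second line sums $n-1$ terms, each of order $n^{-1/2}e^{L_n}$, which is where the factor $n^{1/2}$ comes from.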

Thus, we have, for any $1\leq i\neq j\leq n$,

$$
\begin{aligned}
\hat{U}_{ij} &= \dfrac{\hat{\beta}_i-\hat{\beta}_j}{\sqrt{\hat{v}_{ii}^{-1}+\hat{v}_{jj}^{-1}}} \\
&= \dfrac{\hat{\beta}_i-\hat{\beta}_j}{\sqrt{v_{ii}^{-1}+v_{jj}^{-1}}}\times\dfrac{\sqrt{v_{ii}^{-1}+v_{jj}^{-1}}}{\sqrt{\hat{v}_{ii}^{-1}+\hat{v}_{jj}^{-1}}} \\
&= \dfrac{\hat{\beta}_i-\hat{\beta}_j}{\sqrt{v_{ii}^{-1}+v_{jj}^{-1}}}\times\left(1+O_p(n^{-3/4}e^{5L_n/2})\right).
\end{aligned}
$$

By Slutsky's theorem, we have $\hat{U}_{ij}\stackrel{d}{\longrightarrow}N(0,1)$.
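As an illustration, the statistic $\hat{U}_{ij}$ can be computed from a vector of estimates $\hat{\beta}$ as in the minimal Python sketch below. It assumes the usual β-model form $\hat{v}_{ii}=\sum_{k\neq i}e^{\hat{\beta}_i+\hat{\beta}_k}/(1+e^{\hat{\beta}_i+\hat{\beta}_k})^{2}$, which is not restated in this section. The function names and the two-sided normal $p$-value helper are hypothetical and are not part of the proof.

```python
import math


def fisher_diag(beta):
    """Return [v_11, ..., v_nn] with v_ii = sum_{k != i} p_ik (1 - p_ik),

    where p_ik = e^{b_i + b_k} / (1 + e^{b_i + b_k}) (assumed beta-model form).
    """
    n = len(beta)
    v = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            if k != i:
                p = 1.0 / (1.0 + math.exp(-(beta[i] + beta[k])))
                total += p * (1.0 - p)
        v.append(total)
    return v


def u_statistic(beta_hat, i, j):
    """U_ij = (b_i - b_j) / sqrt(1 / v_ii + 1 / v_jj), evaluated at beta_hat."""
    v = fisher_diag(beta_hat)
    return (beta_hat[i] - beta_hat[j]) / math.sqrt(1.0 / v[i] + 1.0 / v[j])


def two_sided_p_value(u):
    """Two-sided p-value under N(0, 1): 2 * (1 - Phi(|u|)) = erfc(|u| / sqrt(2))."""
    return math.erfc(abs(u) / math.sqrt(2.0))


# Toy input (made-up values, not data from the paper).
beta_hat = [0.5, -0.5, 0.0]
u = u_statistic(beta_hat, 0, 1)
print(u, two_sided_p_value(u))
```

The sketch takes $\hat{\beta}$ as given and does not compute the maximum likelihood estimate itself. The normal approximation applies only under the conditions of the theorem.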

The proof under the alternative $H_1$ is similar to that under the null $H_0$, so we omit the details.

Acknowledgments

Hu is partially supported by the National Natural Science Foundation of China (nos. 12171187, 12371261).
