Class-Imbalanced Graph Learning without Class Rebalancing

Zhining Liu    Ruizhong Qiu    Zhichen Zeng    Hyunsik Yoo    David Zhou    Zhe Xu    Yada Zhu    Kommy Weldemariam    Jingrui He    Hanghang Tong
Abstract

Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models. Most existing studies are rooted in a class-rebalancing (CR) perspective and address class imbalance with class-wise reweighting or resampling. In this work, we approach the root cause of class-imbalance bias from a topological paradigm. Specifically, we theoretically reveal two fundamental phenomena in the graph topology that greatly exacerbate the predictive bias stemming from class imbalance. On this basis, we devise a lightweight topological augmentation framework Bat to mitigate the class-imbalance bias without class rebalancing. Being orthogonal to CR, Bat can function as an efficient plug-and-play module that can be seamlessly combined with and significantly boost existing CR techniques. Systematic experiments on real-world imbalanced graph learning tasks show that Bat can deliver up to 46.27% performance gain and up to 72.74% bias reduction over existing techniques. Code, examples, and documentation are available at https://github.com/ZhiningLiu1998/BAT.

Machine Learning, ICML


1 Introduction

(a) Left: concept of AMP. Right: relative performance loss with respect to the non-self-class neighbor ratio.
(b) Left: concept of DMP. Right: relative performance loss with respect to the distance to the nearest same-class labeled node.
Figure 1: Concepts of ambivalent message-passing (AMP) and distant message-passing (DMP) and their impact in real-world imbalanced node classification tasks (Park et al., 2022). Both factors lead to a substantial increase in prediction errors, and further, a larger performance disparity/bias (i.e., the gap between the blue and orange curves) between the majority and minority classes.

Node classification stands as one of the most fundamental tasks in graph machine learning, holding significant relevance in various real-world applications (Akoglu et al., 2015; Tang & Liu, 2010). Graph Neural Networks (GNNs) have demonstrated great success in tackling related tasks due to their robust representation learning capabilities (Song et al., 2022b; Fu & He, 2021). However, real-world graphs are often inherently class-imbalanced, i.e., the sizes of unique classes vary significantly, and a few majority classes have overwhelming numbers in the training set. In Class-Imbalanced Graph Learning (CIGL), GNNs are prone to suffer from severe performance degradation on minority class nodes (Park et al., 2022). This results in a pronounced predictive bias characterized by a large performance disparity between the majority and minority classes.

Traditional imbalance-handling techniques rely on class rebalancing (CR) such as class reweighting and resampling (Chawla et al., 2002; Cui et al., 2019), which works well for non-graph data. Recent studies propose more graph-specific CR strategies tailored for CIGL, e.g., neighborhood-aware reweighting (Li et al., 2022; Huang et al., 2022) and oversampling (Zhao et al., 2021b; Park et al., 2022). Nonetheless, these works are restricted to the class-rebalancing paradigm. Parallel to class imbalance, another emerging line of research studies topology imbalance, characterized by “the asymmetric topological properties of the labeled nodes” (Chen et al., 2021). It is considered an orthogonal problem to class imbalance, and hence few works theoretically investigate how topological structure affects learning on class-imbalanced graphs. To fill this gap, we conduct an in-depth analysis of the role that topology plays in class-imbalanced graph learning. We theoretically show that topological differences between minority and majority classes significantly amplify the class-imbalance bias, imposing great challenges to CIGL. This reveals an unexplored avenue that limits the performance of existing CIGL techniques: mitigating class-imbalance bias arising from imbalanced topology structure through topological operations. Following this novel perspective, we devise a lightweight practical solution for CIGL that can be seamlessly combined with and further boost existing CR techniques.

In this work, we formally define and theoretically investigate two fundamental local topological phenomena that greatly hinder CIGL: (i) ambivalent message-passing (AMP), i.e., a high ratio of non-self-class neighbors in the node receptive field, and (ii) distant message-passing (DMP), i.e., poor connectivity with self-class labeled nodes. Intuitively, AMP leads to a higher influx of noisy information and DMP leads to poor reception of effective supervision signals in message-passing. Both result in lower signal-to-noise ratios and thus induce higher classification errors. Our theoretical findings reveal that the minority class is inherently more susceptible to both AMP and DMP (Theorems 2.1 and 2.2), which leads to a more pronounced predictive bias. Such bias induced by the graph topology escalates as the level of class imbalance increases. We emphasize that AMP/DMP is defined for all nodes based on the local neighborhood, while influence conflict has no formal definition and influence insufficiency is defined only on labeled nodes with global PageRank score (Chen et al., 2021). To distinguish our concepts from these, we use the terminology AMP/DMP instead. Further discussions can be found in Section 5. Fig. 1 visually illustrates the concepts of AMP and DMP, highlighting their distinct impacts on the predictive performance of majority and minority classes.

Following our theoretical and empirical findings, we devise Bat (BAlanced Topological augmentation), a model-agnostic and efficient technique to mitigate class-imbalance bias in CIGL via topological augmentation. Bat dynamically locates and rectifies nodes critically influenced by AMP and DMP during learning, thereby effectively reducing the errors and biases in CIGL. Being orthogonal to class rebalancing, our solution is able to work hand-in-hand with existing techniques based on reweighting (Japkowicz & Stephen, 2002; Chen et al., 2021) and resampling (Zhao et al., 2021b; Park et al., 2022) and further boost their performance. Systematic experiments on real-world CIGL tasks show that Bat delivers a significant performance boost (up to 46.27%) and bias reduction (up to 72.74%) over various CIGL baselines with diverse GNN architectures.

Our contributions: (i) Novel Perspective. We demonstrate the feasibility of taming class-imbalance bias without class rebalancing, which provides a new avenue that is orthogonal to the predominant class-rebalancing practice in CIGL. (ii) Theoretical Insights. We theoretically reveal the topological difference between minority and majority classes and its role in shaping predictive bias in CIGL, shedding light on future CIGL research. Empirical results validate our findings. (iii) Practical Solution. Motivated by our theoretical and empirical findings, we devise a lightweight and versatile framework Bat to handle topological challenges in CIGL. Being complementary to class rebalancing, it can be seamlessly combined with and significantly boost existing CIGL techniques. (iv) Empirical Study. Systematic experiments and analysis across a diverse range of real-world tasks and GNN architectures show that Bat consistently demonstrates superior performance in both promoting classification and mitigating predictive bias.

2 Class Imbalance and Local Topology

(a) Distribution of node AMP/DMP coefficients.
(b) Impact of AMP/DMP on predictive performance.
Figure 2: Node-level distribution of AMP and DMP coefficients and their impact on learning.

In this section, we delve into the impact of graph topology on the predictive bias in class-imbalanced node classification. We theoretically unveil that compared to the majority class, the minority class is inherently more susceptible to both Ambivalent Message Passing (AMP) and Distant Message Passing (DMP). This significantly worsens minority-class performance and leads to a more pronounced predictive bias stemming from class imbalance. After that, we present an empirical analysis to validate our theoretical findings, and to provide insights on how to mitigate the bias induced by AMP and DMP in practice. Detailed proofs can be found in Appendix A.

Theoretical analysis on local topology. Consider a graph $\mathcal{G}:(\mathcal{V},\mathcal{E})$ from a stochastic block model (Holland et al., 1983) $\text{SBM}(n,p,q)$, where $n$ is the total number of nodes, and $p$ and $q$ are the intra- and inter-class node connection probabilities. To facilitate analysis, we call node $u$ homo-connected to node $v$ if there is a path $[u,v_1,\dots,v_k,v]$ where $v_1,\dots,v_k,v$ are of the same class, and let $\mathcal{H}(u,k)$ denote the set of $k$-hop homo-connected neighbors of $u$. For binary node classification, we denote the number of nodes of class $i$ as $n_i$ ($n_1+n_2=n$); without loss of generality, let class $1$/$2$ be the minority/majority class (thus $n_1 \ll n_2$).
We denote class $i$’s node set as $\mathcal{V}_i$ and labeled node set as $\mathcal{V}^{\textnormal{L}}_i$ ($\subset \mathcal{V}_i$). For asymptotic analysis, we adopt conventional assumptions: $n_1 \cdot p = \beta + \mathcal{O}\big(\frac{1}{n}\big)$ (i.e., $\beta$ is the average intra-class node degree of class 1); $p/q = \mathcal{O}(1)$ (Decelle et al., 2011).
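For readers who want to experiment with this setting, the following is a minimal sketch of sampling a two-class SBM in pure Python. The helper name `sample_sbm`, the class sizes, and the probabilities are illustrative assumptions, not values or code from the paper.

```python
import random
from itertools import combinations


def sample_sbm(n1, n2, p, q, seed=0):
    """Sample an undirected two-class SBM (hypothetical helper).

    Returns (labels, adj): labels[u] is 1 (minority) or 2 (majority),
    adj[u] is the set of neighbors of node u.
    """
    rng = random.Random(seed)
    labels = [1] * n1 + [2] * n2
    adj = {u: set() for u in range(n1 + n2)}
    for u, v in combinations(range(n1 + n2), 2):
        # Intra-class pairs connect with probability p, inter-class with q.
        prob = p if labels[u] == labels[v] else q
        if rng.random() < prob:
            adj[u].add(v)
            adj[v].add(u)
    return labels, adj


# Illustrative sizes only: imbalance ratio rho = n2 / n1 = 4.
labels, adj = sample_sbm(n1=20, n2=80, p=0.2, q=0.05)
```

The quadratic loop over node pairs keeps the sketch short; it is only suitable for small graphs.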

We now give formal definitions of AMP and DMP. For a node $u$ from class $i$, we define its (i) $k$-hop AMP coefficient $\alpha^k(u)\in[0,\infty)$ as the ratio of the expected number of non-self-class nodes to self-class nodes in its $k$-hop neighborhood $\mathcal{H}(u,k)$, i.e., $\alpha^k(u) := \frac{|\{v \mid v\notin\mathcal{V}_i,\, v\in\mathcal{H}(u,k)\}|}{|\{v \mid v\in\mathcal{V}_i,\, v\in\mathcal{H}(u,k)\}|}$; (ii) $k$-hop DMP coefficient $\delta^k(u)\in\{0,1\}$ as the indicator of whether all labeled nodes in its $k$-hop neighborhood are NON-self-class, i.e., $\delta^k(u) := \mathbb{1}\big(L^k_i(u)=0,\ \sum_j L^k_j(u)>0\big)$, where $L^k_j(u) = |\{v \mid v\in\mathcal{V}^{\textnormal{L}}_j,\, v\in\mathcal{H}(u,k)\}|$. For an intuitive example, the target node (marked by the dashed box) in Fig. 1(a) has $\alpha^1(u)=3/1,\ \delta^1(u)=0$ and the node in Fig. 1(b) has $\alpha^1(u)=1/1,\ \delta^1(u)=1$.
Further, to characterize the level of AMP/DMP for different classes, for class $i$ we define $\alpha^k_i := \frac{\mathbb{E}_{u\in\mathcal{V}_i}[|\{v \mid v\notin\mathcal{V}_i,\, v\in\mathcal{H}(u,k)\}|]}{\mathbb{E}_{u\in\mathcal{V}_i}[|\{v \mid v\in\mathcal{V}_i,\, v\in\mathcal{H}(u,k)\}|]}$ and $\delta^k_i := \mathbb{P}(\delta^k(u)=1)$, where $u$ is a node of class $i$. Intuitively, a higher $\alpha_i$ or $\delta_i$ indicates that class $i$ is more susceptible to AMP or DMP.
Building on these metrics, we analyze the disparities in $\alpha$ and $\delta$ between minority and majority classes, thereby providing insights into how the graph topology induces additional class-imbalance bias.
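The node-level definitions can be made concrete with a small sketch. It is restricted to $k=1$ and, as a simplifying assumption, treats the direct neighbors of $u$ as $\mathcal{H}(u,1)$; the function name `amp_dmp_1hop` and the toy graph are hypothetical, and returning infinity when a node has no same-class neighbor is a convention chosen here, not one stated in the text.

```python
def amp_dmp_1hop(u, adj, labels, labeled):
    """Return (alpha, delta) for node u, using direct neighbors as H(u, 1).

    adj: dict node -> set of neighbors; labels: dict node -> class id;
    labeled: set of nodes whose labels are available for training.
    """
    same = sum(1 for v in adj[u] if labels[v] == labels[u])
    other = len(adj[u]) - same
    # Assumption: report infinity when there is no same-class neighbor.
    alpha = other / same if same else float("inf")
    lab_same = sum(1 for v in adj[u] if v in labeled and labels[v] == labels[u])
    lab_total = sum(1 for v in adj[u] if v in labeled)
    # delta = 1 iff labeled neighbors exist and none shares u's class.
    delta = int(lab_same == 0 and lab_total > 0)
    return alpha, delta


# Toy graph with two target nodes (0 and 5), both of class 1.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0},
       5: {6, 7}, 6: {5}, 7: {5}}
labels = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 2}
labeled = {1, 7}

result_a = amp_dmp_1hop(0, adj, labels, labeled)  # three other-class, one same-class neighbor
result_b = amp_dmp_1hop(5, adj, labels, labeled)  # only labeled neighbor is other-class
```

Node 0 has a labeled same-class neighbor, so it is affected by AMP only; node 5 sees a single labeled neighbor of the other class, so its DMP indicator is set.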

To simplify notation, we define the imbalance ratio $\rho := n_2/n_1$. The larger $\rho$ is, the more imbalanced the dataset is. Then for AMP, we have the following Theorem 2.1.

Theorem 2.1 (AMP-sourced bias).

For a large $n$, the ratio of AMP coefficients $\alpha$ for the minority class to the majority class grows polynomially with the imbalance ratio $\rho$ and exponentially with $k$:

$$\frac{\alpha_1^k}{\alpha_2^k} = \bigg(\rho\cdot\frac{\sum_{t=1}^{k}(\rho\beta)^{t-1}}{\sum_{t=1}^{k}\beta^{t-1}}\bigg)^{2} + \mathcal{O}\Big(\frac{1}{n}\Big). \tag{1}$$
Proof.

Please see Appendix A.2.∎
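To get a feel for the scale of Eq. (1), its leading term can be evaluated directly. The sketch below drops the $\mathcal{O}(1/n)$ remainder; the function name `amp_ratio` and the chosen values of $\rho$, $\beta$, and $k$ are illustrative assumptions.

```python
def amp_ratio(rho, beta, k):
    """Leading term of Eq. (1): (rho * sum_t (rho*beta)^(t-1) / sum_t beta^(t-1))^2."""
    num = sum((rho * beta) ** (t - 1) for t in range(1, k + 1))
    den = sum(beta ** (t - 1) for t in range(1, k + 1))
    return (rho * num / den) ** 2


# Hypothetical setting: imbalance ratio 4, average intra-class degree 2.
one_hop = amp_ratio(rho=4, beta=2, k=1)   # reduces to rho ** 2
two_hop = amp_ratio(rho=4, beta=2, k=2)   # (4 * (1 + 8) / (1 + 2)) ** 2
```

For $k=1$ both sums equal 1, so the ratio reduces to $\rho^2$; each additional hop multiplies in further powers of $\rho$.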

Theorem 2.1 shows that the same-class neighbor proportion of minority-class nodes is significantly smaller than that of majority-class nodes, i.e., the minority class is more susceptible to AMP. As the imbalance ratio $\rho$ increases, this issue becomes even more pronounced and introduces a higher bias into the learning process. Moving on to DMP, we have the following Theorem 2.2.

Theorem 2.2 (DMP-sourced bias).

Let $r^{\textnormal{L}}_i := \frac{|\mathcal{V}^{\textnormal{L}}_i|}{|\mathcal{V}_i|}$ denote the label rate of class $i$. For a large $n$, the ratio of DMP coefficients $\delta$ of the minority class over the majority class grows exponentially with $\rho$:

$$\frac{\delta_1^k}{\delta_2^k} \approx \frac{1-r^{\textnormal{L}}_1}{1-r^{\textnormal{L}}_2}\,\mathrm{e}^{(\rho-1)\beta} + \mathcal{O}\Big(\frac{1}{n}\Big). \tag{2}$$
Proof.

Please see Appendix A.3.∎

Similarly, the result shows that the minority class exhibits a significantly higher susceptibility to DMP than the majority class. Theorem 2.2 also has several interesting implications: (i) The imbalance ratio greatly affects the bias induced by DMP, as $\delta^k_1/\delta^k_2$ grows exponentially with $\rho$. (ii) Labeling more minority-class nodes can mitigate, but hardly solve the problem. Enlarging the minority-class label rate $r^{\textnormal{L}}_1$ can linearly shrink $\delta_1^k/\delta_2^k$, but it can hardly eliminate the bias induced by DMP (i.e., to have $\delta_1^k \leq \delta_2^k$) as $\mathrm{e}^{(\rho-1)\beta}$ is usually very large in practice.
Take the Cora dataset (Sen et al., 2008) as an example (let class $1$/class $2$ denote the smallest/largest class): eliminating the DMP bias requires the minority-class label rate $r^{\textnormal{L}}_1 \geq 1 - \frac{1-r^{\textnormal{L}}_2}{\mathrm{e}^{(\rho-1)\beta}} > 1 - 5.05\times 10^{-8}$, which is practically infeasible.
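The label-rate bound above follows from setting the leading term of Eq. (2) to at most 1 and solving for $r^{\textnormal{L}}_1$. The sketch below encodes both expressions; the function names and the numeric inputs are hypothetical and are not the Cora statistics used in the text.

```python
import math


def dmp_ratio(rho, beta, r1, r2):
    """Leading term of Eq. (2): (1 - r1) / (1 - r2) * exp((rho - 1) * beta)."""
    return (1 - r1) / (1 - r2) * math.exp((rho - 1) * beta)


def min_minority_label_rate(rho, beta, r2):
    """Smallest r1 for which the leading term of Eq. (2) is at most 1."""
    return 1 - (1 - r2) / math.exp((rho - 1) * beta)


# Hypothetical values: rho = 3, beta = 2, majority label rate 5%.
threshold = min_minority_label_rate(rho=3, beta=2.0, r2=0.05)
ratio_at_threshold = dmp_ratio(rho=3, beta=2.0, r1=threshold, r2=0.05)
```

Even for this mild hypothetical imbalance, the threshold is already close to 1, which illustrates why labeling more minority nodes alone does not remove the DMP-induced gap.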

Our theoretical findings show that both AMP and DMP affect the minority and majority classes differently, and the difference is primarily determined by the imbalance ratio $\rho$. However, directly manipulating $\rho$ is tricky in practice as it requires sampling new nodes and edges from an unknown underlying graph generation model, or at least, simulating the process by oversampling.

Figure 3: The proposed Bat (BAlanced Topological augmentation) framework, best viewed in color.

A closer look at AMP & DMP in practice. To verify the theoretical results, and to provide more insights on how to mitigate the bias brought about by AMP and DMP in practice, we conduct a fine-grained empirical analysis on a real-world task. Results are detailed in Fig. 2 (obtained by training a GCN on the PubMed dataset). Starting from Fig. 2(a), we can first observe that the minority class $1$ has a larger proportion of nodes with high $\alpha$ or $\delta$ than the majority class $2$, i.e., minority class $1$ has higher average $\alpha$ and $\delta$ (specifically, $\alpha_1/\alpha_2 = 1.357/0.179$, $\delta_1/\delta_2 = 0.040/0.004$), which is consistent with our theoretical findings. Further, Fig. 2(b) shows that both AMP and DMP significantly reduce the prediction accuracy, especially for the minority class that inherently has poorer data representation. This can be explained from a graph signal denoising perspective (Nt & Maehara, 2019): AMP introduces additional noise from dissimilar nodes, and DMP leads to less efficient label propagation/denoising; thus their impact is particularly significant on minority classes that are more susceptible to noise (Johnson & Khoshgoftaar, 2019) due to poor representation in the feature space. We further notice an intriguing fact that, at the sample level, the impact of AMP/DMP is concentrated on a small fraction of minority-class nodes with large $\alpha$ or $\delta$ (e.g., the $\alpha \geq 1$ / $\delta = 1$ part in Fig. 2(a)).
In other words, one can sidestep the tricky manipulation of $\rho$ and instead directly mitigate AMP/DMP by locating and rectifying a small number of critical nodes, and this exactly motivates our subsequent studies.

3 Handling Class Imbalance from a Topological Perspective

Armed with the findings from Section 2, we now discuss how to devise a practical strategy to mitigate the error and bias induced by graph topology in CIGL. Earlier analyses have shown that this can be achieved by identifying and rectifying the critical nodes that are highly influenced by AMP/DMP. This naturally poses two challenging questions: (i) How can critical nodes be located, given that the direct calculation of $\alpha$/$\delta$ using ground-truth labels is not possible? (ii) Subsequently, how can critical nodes be rectified to minimize the negative impact caused by AMP and DMP?

In answering the above questions, we devise a lightweight framework Bat (BAlanced Topological augmentation) for handling the topology-sourced errors and biases in CIGL. Specifically, for locating the misclassified nodes, Bat leverages model-based prediction uncertainty to assess the risk of potential misclassification caused by AMP/DMP for each node (§ 3.1). Then to rectify a misclassified node, we estimate a posterior likelihood of each node being in each class (§ 3.2) and dynamically augment the misclassified node’s topological context based on our risk scores and posterior likelihoods (§ 3.3) thereby mitigating the impact of AMP and DMP. An overview of the proposed Bat framework is shown in Fig. 3.

3.1 Node Misclassification Risk Estimation

We now elaborate on the technical details of Bat. As discussed earlier, our first step is to locate the critical nodes that are highly influenced by AMP/DMP. Given the unavailability of ground-truth labels, direct computation of the AMP/DMP coefficients is infeasible in practice. Fortunately, recent studies have shown that conflicting or missing information from the neighborhood can disturb GNN learning and the associated graph-denoising process for affected nodes (Nt & Maehara, 2019; Wu et al., 2019; Ma et al., 2021). This further yields high vacuity or dissonance uncertainty (Stadler et al., 2021; Zhao et al., 2020) in the prediction. This motivates us to exploit the model prediction uncertainty to estimate nodes’ risk of being misclassified due to AMP/DMP.

Uncertainty quantification. While there exist many techniques for uncertainty quantification (e.g., Bayesian-based (Zhang et al., 2019; Hasanzadeh et al., 2020), Jackknife sampling (Kang et al., 2022a)), they often require modifying the model architecture and/or impose additional computational overhead. In this study, we aim to streamline the design of Bat for optimal efficiency and adaptability. To this end, we employ an efficient and highly effective approach to uncertainty quantification. Formally, let $C$ be the number of classes. For a node $v$, consider model $F(\cdot;\Theta)$’s predicted probability vector $\hat{\bm{p}}_v = F(\bm{A},\bm{X};\Theta)_v$, i.e., $\hat{p}_v^{(j)} = \mathbb{P}(y_v = j \mid \bm{A},\bm{X},\Theta)$. Let $\hat{y}_v$ be the predicted label. We measure the uncertainty score $\mathbb{U}_\Theta(v)$ by the total variation (TV) distance:

$$\mathbb{U}_\Theta(v) := d_{\text{TV}}\big(\hat{\bm{p}}_v, \mathbb{1}_{\hat{y}_v}\big) = \frac{1}{2}\sum_{j=1}^{C}\big|\hat{p}_v^{(j)} - \mathbb{1}_{\hat{y}_v}^{(j)}\big| \in [0,1]. \tag{3}$$

Intuitively, a node has higher uncertainty if the model is less confident about its current prediction. We remark that this metric can be naturally replaced by other uncertainty measures (e.g., information entropy or more complex ones) with additional computation cost, yet the impact on performance is marginal. Please refer to the ablation study provided in Appendix C.1 for more details.
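As a minimal sketch of Eq. (3), the function below computes the TV distance between a predicted probability vector and the one-hot vector of its argmax. The name `tv_uncertainty` and the example vector are illustrative assumptions; the sketch presumes the probabilities are non-negative and sum to 1.

```python
def tv_uncertainty(probs):
    """TV distance between probs and the one-hot vector of the predicted class."""
    y_hat = max(range(len(probs)), key=probs.__getitem__)  # predicted label
    one_hot = [1.0 if j == y_hat else 0.0 for j in range(len(probs))]
    return 0.5 * sum(abs(p - o) for p, o in zip(probs, one_hot))


# Hypothetical softmax output for one node over three classes.
score = tv_uncertainty([0.7, 0.2, 0.1])
```

For a normalized probability vector, this quantity equals one minus the largest predicted probability, which is why it adds essentially no overhead beyond a forward pass.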

Imbalance-calibrated misclassification risk. Due to the lack of training instances, minority classes generally exhibit higher uncertainty. Therefore, using $\mathbb{U}_\Theta(\cdot)$ directly as the risk score would treat most minority-class nodes as high-risk, which is contrary to our intention of rectifying the false negatives (i.e., minority nodes wrongly predicted as majority-class) that cause bias in CIGL. To cope with this, we propose imbalance-aware calibration for misclassification risk scores. For each class $i$, let $\hat{\mathcal{V}}_i := \{u\in\mathcal{V} \mid \hat{y}_u = i\}$ and $\hat{\mathcal{V}}^{\textnormal{L}}_i := \{u\in\mathcal{V}^{\textnormal{L}} \mid y_u = i\}$. For node $v$ with predicted label $\hat{y}_v$, we define its risk $r_v$ as:

$$
r_v := \frac{\mathbb{U}_{\Theta}(v)}{\max_{j=1}^{C} |\mathcal{V}^{\textnormal{L}}_j| / |\mathcal{V}^{\textnormal{L}}_{\hat{y}_v}|} \in [0, 1]. \tag{4}
$$

Intuitively speaking, Eq. (4) calibrates $v$’s prediction uncertainty by a label imbalance score $\max_{j=1}^{C} |\mathcal{V}^{\textnormal{L}}_j| / |\mathcal{V}^{\textnormal{L}}_{\hat{y}_v}|$. Minority classes with smaller labeled sets $\mathcal{V}^{\textnormal{L}}_i$ will be discounted more.
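To make the calibration in Eq. (4) concrete, the following minimal Python sketch computes $r_v$ from precomputed uncertainty scores. The function name `calibrated_risk` and its argument layout are illustrative assumptions rather than the released implementation, and the uncertainty values $\mathbb{U}_{\Theta}(v)$ are taken as given.

```python
from typing import Dict, List


def calibrated_risk(uncertainty: List[float],
                    pred: List[int],
                    labeled_count: Dict[int, int]) -> List[float]:
    """Imbalance-calibrated risk of Eq. (4) (illustrative sketch).

    uncertainty[v]   : U_Theta(v) in [0, 1], assumed to be precomputed
    pred[v]          : predicted label y_hat_v
    labeled_count[i] : |V^L_i|, number of labeled training nodes of class i
    """
    largest = max(labeled_count.values())  # max_j |V^L_j|
    risks = []
    for u, y_hat in zip(uncertainty, pred):
        imbalance = largest / labeled_count[y_hat]  # label imbalance score, >= 1
        risks.append(u / imbalance)
    return risks


# Toy example: class 0 is the majority (20 labels), class 1 the minority (2 labels).
risks = calibrated_risk([0.4, 0.4], [0, 1], {0: 20, 1: 2})
print(risks)
```

In this toy case both nodes have the same uncertainty, but the node predicted as the minority class is discounted by a factor of ten, which matches the behavior described above.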

Empirical validation. We validate the effectiveness of the proposed node risk assessment method, as shown in Fig. 4. The results indicate that our approach can accurately estimate node misclassification risk across various real-world CIGL tasks while enjoying computational efficiency.

Figure 4: The negative correlation between the estimated node risk (x-axis) and the prediction accuracy (y-axis). We apply 10 sliding windows to compute the mean and deviation of the accuracy.

3.2 Posterior Likelihood Estimation

With the estimated risk scores of being affected by AMP/DMP, we move to the next question: how to rectify high-risk nodes with topological augmentation? As high-risk nodes are prone to misclassification, their true labels are more likely to be among the non-predicted classes $j \neq \hat{y}_v$. This motivates us to investigate schemes to harness information from these non-predicted classes. Since uniformly drawing from all classes probably introduces noise to learning, we propose to estimate the posterior likelihood $\hat{s}_v^{(j)}$ that a high-risk node $v$ belongs to each class $j$ after observing the current predictions. To estimate $\hat{s}_v^{(j)}$, we introduce a zeroth-order scheme and a first-order scheme with $\mathcal{O}(|\mathcal{V}|C)$ and $\mathcal{O}(|\mathcal{E}|C)$ time complexity, respectively. Please refer to §4 for a practical complexity analysis. We do not employ higher-order schemes due to the $\mathcal{O}(|\mathcal{V}|^{k-1}|\mathcal{E}|C)$ time complexity of the $k$th-order scheme.

Zeroth-order estimation. A natural approach is to utilize the predicted probabilities $\hat{p}_v^{(j)}$. As we have shown, the predicted label $\hat{y}_v$ of a high-risk node $v$ is very likely to be wrong. Thus, we define the posterior likelihood $\hat{s}_v^{(j)}$ as the conditional probability given that the class is not $\hat{y}_v$, i.e.,

$$
\begin{aligned}
\hat{s}_v^{(j)} :={} & \mathbb{P}_{y \sim \hat{\bm{p}}_v}[y = j \mid y \neq \hat{y}_v] \\
={} & \begin{cases} \hat{p}_v^{(j)} / (1 - \hat{p}_v^{(\hat{y}_v)}), & \text{if } j \neq \hat{y}_v, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned} \tag{5}
$$

Intuitively, the zeroth-order posterior likelihoods $\hat{s}_v^{(j)}$ are consistent with the predicted probabilities $\hat{p}_v^{(j)}$ except for the wrongly predicted label $j = \hat{y}_v$. This can be computed efficiently on GPU in matrix form.
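As a small illustration of Eq. (5), the sketch below renormalizes each node's predicted probabilities after zeroing out the predicted class. It is written with plain Python lists for readability rather than in the GPU matrix form mentioned above; the function name and the handling of fully confident predictions are assumptions made for this example.

```python
from typing import List


def zeroth_order_posterior(probs: List[List[float]]) -> List[List[float]]:
    """Zeroth-order posterior likelihood of Eq. (5) (illustrative sketch).

    probs[v][j] is the predicted probability p_hat_v^(j); each row sums to 1.
    """
    posterior = []
    for p in probs:
        y_hat = max(range(len(p)), key=p.__getitem__)  # predicted label (argmax)
        rest = 1.0 - p[y_hat]                          # mass on non-predicted classes
        if rest <= 0.0:
            # Assumption: a fully confident prediction leaves no mass to
            # redistribute, so the row is set to all zeros.
            posterior.append([0.0] * len(p))
            continue
        posterior.append([0.0 if j == y_hat else p[j] / rest
                          for j in range(len(p))])
    return posterior


s_hat = zeroth_order_posterior([[0.5, 0.25, 0.25]])
print(s_hat)  # [[0.0, 0.5, 0.5]]
```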

First-order estimation via random walk. We further explore the local topology for $\hat{s}_v^{(j)}$ estimation. Since neighboring nodes on a homophily graph tend to share labels, we consider a 1-step random walk starting from node $v$. Let $\mathcal{N}(v)$ be the neighboring node set of $v$, and let $v' \sim \mathcal{N}(v)$ denote the ending node of the random walk. We define $\hat{s}_v^{(j)}$ as the conditional probability that $v'$ is predicted as class $j$ given that $v'$ is not predicted as class $\hat{y}_v$, i.e.,

$$
\begin{aligned}
\hat{s}_v^{(j)} :={} & \mathbb{P}_{v' \sim \mathcal{N}(v)}[\hat{y}_{v'} = j \mid \hat{y}_{v'} \neq \hat{y}_v] \\
={} & \begin{cases} \dfrac{|\{v' \in \mathcal{N}(v) \mid \hat{y}_{v'} = j\}|}{|\mathcal{N}(v)| - |\{v' \in \mathcal{N}(v) \mid \hat{y}_{v'} = \hat{y}_v\}|}, & \text{if } j \neq \hat{y}_v, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned} \tag{6}
$$

With first-order estimation, $\hat{s}_v^{(j)}$ is proportional to the label frequency among adjacent nodes. Different from the zeroth-order scheme, this scheme relies on both node-level predictions and local connectivity patterns. The computation can be done via sparse matrix operation with $\mathcal{O}(|\mathcal{E}|C)$ time complexity. As a remark, although this scheme can extend to $k$-step random walks, we do not employ them due to the $\mathcal{O}(|\mathcal{V}|^{k-1}|\mathcal{E}|C)$ complexity of exact computation and the high variance of stochastic computation.
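The first-order scheme of Eq. (6) can be sketched as a neighbor-label count followed by a renormalization over the non-predicted classes. The adjacency-list representation, the function name, and the all-zero fallback for an undefined ratio are assumptions made for this example; a practical implementation would use the sparse matrix operations mentioned above.

```python
from typing import Dict, List


def first_order_posterior(neighbors: Dict[int, List[int]],
                          pred: List[int],
                          num_classes: int) -> List[List[float]]:
    """First-order posterior likelihood of Eq. (6) (illustrative sketch).

    neighbors[v] : list of nodes adjacent to v, i.e. N(v)
    pred[v]      : predicted label y_hat_v
    """
    posterior = []
    for v, y_hat in enumerate(pred):
        nbrs = neighbors.get(v, [])
        counts = [0] * num_classes
        for u in nbrs:
            counts[pred[u]] += 1              # predicted-label frequency in N(v)
        denom = len(nbrs) - counts[y_hat]     # neighbors that disagree with y_hat_v
        if denom == 0:
            # Assumption: Eq. (6) is undefined when no neighbor disagrees
            # with y_hat_v; this sketch falls back to an all-zero row.
            posterior.append([0.0] * num_classes)
            continue
        posterior.append([0.0 if j == y_hat else counts[j] / denom
                          for j in range(num_classes)])
    return posterior


# Node 0 is predicted as class 0; its neighbors are predicted as 0, 1, 1, 2.
neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
pred = [0, 0, 1, 1, 2]
s_hat = first_order_posterior(neighbors, pred, num_classes=3)
print(s_hat[0])
```

For node 0, three of the four neighbors disagree with its predicted class, so the likelihood splits as 2/3 for class 1 and 1/3 for class 2.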

Empirical validation. Figure 5 compares the two schemes in practice. Results show that all high-risk ($r_v > 0$) minority nodes are misclassified, and both schemes can effectively find alternatives with significantly higher chances to be the ground truth class for high-risk nodes.

Figure 5: The minority-class accuracy of model prediction $\hat{y}_v = F(\bm{A}, \bm{X}; \Theta)$, and max-likelihood-based candidate selection $\hat{y}^s_v = \operatorname*{arg\,max}(\hat{\bm{s}}_v)$, on PubMed dataset. Note that this is just an illustrative example using $\operatorname*{arg\,max}(\hat{\bm{s}}_v)$. In practice, we consider the whole $\hat{\bm{s}}_v$ when sampling virtual edges, as described in Section 3.3.

3.3 Virtual Topology Augmentation

Finally, we discuss how to mitigate AMP and DMP via topology augmentation using our node risk scores and the posterior likelihoods. The general idea is to augment the local topology of high-risk nodes so as to integrate information from nodes that share similar patterns (mitigate AMP), even if they are not closely adjacent to each other in the graph topology (mitigate DMP), thus achieving less-biased CIGL. A straightforward way is to connect high-risk nodes to nodes from high-likelihood classes in the original graph. However, this can be problematic in practice as a massive number of possible edges could be generated, greatly disturbing the original topology structure.

To achieve efficient augmentation without disrupting the graph topology, we create virtual nodes (one per class) as “shortcuts” connecting to high-risk nodes according to posterior likelihoods. These shortcuts aggregate and pass class information to high-risk nodes from nodes that exhibit similar patterns (even if they are distant in the original graph), thus mitigating both AMP and DMP. Formally, for each class $j$, we build a virtual node $v^*_j$ with feature $\bm{x}_{v_j^*} := \sum_{v \in \hat{\mathcal{V}}_j} \bm{x}_v / |\hat{\mathcal{V}}_j|$ and label $y_{v_j^*} := j$, and compute the average risk $\bar{r}_j := \sum_{v \in \hat{\mathcal{V}}_j} r_v / |\hat{\mathcal{V}}_j|$. Then for each node $v$, we connect a virtual edge between $v$ and virtual node $v^*_j$ with probability proportional to the posterior likelihood $\hat{s}_v^{(j)}$. However, if the connection probability is exactly $\hat{s}_v^{(j)}$, there will be many unnecessary virtual edges for low-risk nodes. Hence, we introduce a discount factor $\gamma_v$ based on risk scores and connect the virtual edge with probability $q_v^{(j)} := \gamma_v \hat{s}_v^{(j)}$. To design the optimal $\gamma_v$, we propose to solve the following constrained quadratic program:

$$
\min_{\bm{\gamma} \geq \bm{0}} \bigg( -\sum_{v \in \mathcal{V}} (r_v - \bar{r}_{\hat{y}_v}) \gamma_v + \frac{1}{2} \|\bm{\gamma}\|_2^2 \bigg), \tag{7}
$$

where the first term encourages virtual edges for high-risk nodes, and the second term is to minimize the number of virtual edges. The closed-form solution is $\gamma_v = \max(r_v - \bar{r}_{\hat{y}_v}, 0)$ (Antoniadis & Fan, 2001), which avoids virtual edges for low-risk nodes as we desire. We now summarize the procedure of Bat in Algorithm 1.
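Since the objective in Eq. (7) separates over nodes, each $\gamma_v$ can be found independently by minimizing $-(r_v - \bar{r}_{\hat{y}_v})\gamma_v + \frac{1}{2}\gamma_v^2$ over $\gamma_v \geq 0$: the unconstrained minimizer is $r_v - \bar{r}_{\hat{y}_v}$, and clipping at zero enforces the constraint. The sketch below applies this closed form and then forms the link probabilities $q_v^{(j)}$; the function names and the toy inputs are assumptions made for illustration.

```python
from collections import defaultdict
from typing import List


def discount_factors(risk: List[float], pred: List[int]) -> List[float]:
    """Closed-form solution of Eq. (7): gamma_v = max(r_v - r_bar_{y_hat_v}, 0)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for r, y_hat in zip(risk, pred):
        total[y_hat] += r
        count[y_hat] += 1
    # r_bar_j: average risk over the nodes predicted as class j
    mean_risk = {c: total[c] / count[c] for c in total}
    return [max(r - mean_risk[y_hat], 0.0) for r, y_hat in zip(risk, pred)]


def link_probabilities(gamma: List[float],
                       posterior: List[List[float]]) -> List[List[float]]:
    """Virtual-edge probabilities q_v^(j) = gamma_v * s_hat_v^(j)."""
    return [[g * s for s in row] for g, row in zip(gamma, posterior)]


risk = [0.125, 0.375, 0.5]   # toy risk scores r_v
pred = [0, 0, 1]             # predicted labels y_hat_v
gamma = discount_factors(risk, pred)
q = link_probabilities(gamma, [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
print(gamma)  # [0.0, 0.125, 0.0]
print(q)      # [[0.0, 0.0], [0.0, 0.125], [0.0, 0.0]]
```

Only the node whose risk exceeds the average risk of its predicted class receives a positive discount factor, so it is the only node that can obtain a virtual edge.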

Algorithm 1 Bat: Topological Balanced Augmentation
Input: Class-imbalanced graph $\mathcal{G}: \{\bm{A}, \bm{X}\}$;
1:  Initialize: node classifier $F(\cdot; \Theta)$;
2:  while not converged do
3:     $\hat{\bm{P}} \leftarrow F(\bm{A}, \bm{X}; \Theta)$;
4:     $\hat{\bm{y}} \leftarrow \text{argmax}_{axis=1}(\hat{\bm{P}})$; ▷ Model predictions $\hat{\bm{y}}$
5:     $\bm{r} \leftarrow \texttt{NodeRiskEst}(\hat{\bm{P}}, \hat{\bm{y}})$; ▷ Eq. (3) - (4)
6:     $\hat{\bm{S}} \leftarrow \texttt{PosteriorEst}(\bm{A}, \hat{\bm{P}}, \hat{\bm{y}})$; ▷ Eq. (5) - (6)
7:     for class $j = 1$ to $C$ do
8:        $\bm{x}_{v_j^*} \leftarrow \sum_{v \in \hat{\mathcal{V}}_j} \bm{x}_v / |\hat{\mathcal{V}}_j|$;
9:        $v^*_j : (\bm{x}_{v_j^*}, j)$ ▷ Virtual node $v^*_j$ for class $j$
10:     end for
11:     $\mathcal{V}^* \leftarrow \{v^*_j \mid 1 \leq j \leq C\}$ ▷ Virtual node set $\mathcal{V}^*$
12:     $\bm{Q}^* \leftarrow \hat{\bm{S}} \odot \bm{\gamma}$ ▷ Virtual link prob. $\bm{Q}^*$ by Eq. (7)
13:     $\mathcal{E}^* \sim \bm{Q}^*$; ▷ Sample virtual edges $\mathcal{E}^*$ w.r.t. $\bm{Q}^*$
14:     Derive $\bm{X}^*, \bm{A}^*$ from $\mathcal{V} \cup \mathcal{V}^*, \mathcal{E} \cup \mathcal{E}^*$;
15:     Update $\Theta$ with augmented graph $\mathcal{G}^*: \{\bm{A}^*, \bm{X}^*\}$;
16:  end while
17:  Return: a balanced node classifier $F(\bm{A}, \bm{X}; \Theta)$;
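Steps 7 to 14 of Algorithm 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the released implementation: virtual node $v^*_j$ is given index $n + j$ (with $n$ real nodes and zero-based class indices), each virtual edge is drawn by an independent Bernoulli trial with probability $q_v^{(j)}$, and a class with no predicted member receives a zero feature vector.

```python
import random
from typing import List, Tuple


def augment_topology(features: List[List[float]],
                     pred: List[int],
                     link_prob: List[List[float]],
                     num_classes: int,
                     rng: random.Random
                     ) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Build virtual nodes and sample virtual edges (illustrative sketch).

    Returns the virtual-node features and the sampled virtual edges, where
    virtual node j has index n + j (n = number of real nodes).
    """
    n, dim = len(features), len(features[0])
    virtual_features = []
    for j in range(num_classes):
        members = [v for v in range(n) if pred[v] == j]  # nodes predicted as j
        if not members:
            # Assumption: a class with no predicted member gets a zero feature.
            virtual_features.append([0.0] * dim)
            continue
        virtual_features.append(
            [sum(features[v][d] for v in members) / len(members)
             for d in range(dim)])
    # Assumption: one independent Bernoulli draw per (node, class) pair.
    virtual_edges = [(v, n + j)
                     for v in range(n)
                     for j in range(num_classes)
                     if rng.random() < link_prob[v][j]]
    return virtual_features, virtual_edges


features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
pred = [0, 0, 1]
link_prob = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # only node 1 links to class 1
x_virtual, e_virtual = augment_topology(features, pred, link_prob, 2,
                                        random.Random(0))
print(x_virtual)  # [[2.0, 0.0], [0.0, 2.0]]
print(e_virtual)  # [(1, 4)]
```

The augmented graph is then obtained by appending `x_virtual` to the feature matrix and `e_virtual` to the edge list before the next parameter update.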

Complexity Analysis of Bat. Bat with 0th/1st-order estimation scales linearly with the number of nodes/edges (i.e., with $\mathcal{O}(|\mathcal{V}|C)$ or $\mathcal{O}(|\mathcal{E}|C)$ complexity). This makes Bat highly efficient and allows dynamic graph augmentation in each training step. Specifically, Bat introduces $C$ (the number of classes, usually small) virtual nodes with $\mathcal{O}(n)$ edges. Because of the long-tail distribution of node uncertainty and the discount factor obtained by solving Eq. (7), only a small portion of nodes have positive risks, with relatively few (empirically around 1-3%) virtual edges introduced. We provide the scalability results of Bat later in the experiment section. In short, Bat takes milliseconds for a single topological augmentation. Please refer to Table 3 and the corresponding discussion for further details. We also discuss how to further speed up Bat in practice in Appendix C.2.

Table 1: Bat significantly boosts existing CIGL techniques, achieving better classification performance (Balanced Acc./Macro-F1) with reduced bias (PerfStd). For each CIGL baseline, we report its performance before and after collaborating with Bat. The average and the best score of all CIGL methods under three settings (base, +Bat0, +Bat1) are also provided, with the performance gain of Bat highlighted by $\Delta$. Due to space limitations, we report the key results and omit error bars; full results can be found in Appendix D.
| Dataset | Backbone | Setting | Balanced Acc.↑ ERM | Balanced Acc.↑ RW | Balanced Acc.↑ RN | Balanced Acc.↑ RS | Balanced Acc.↑ SM | Balanced Acc.↑ GS | Balanced Acc.↑ GE | Balanced Acc.↑ Avg (Δ) | Balanced Acc.↑ Best (Δ) | Macro-F1↑ Avg (Δ) | Macro-F1↑ Best (Δ) | PerfStd↓ Avg (Δ) | PerfStd↓ Best (Δ) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cora | GCN | base | 61.6 | 67.7 | 66.6 | 59.5 | 58.3 | 68.0 | 70.1 | 64.5 | 70.1 | 63.7 | 70.0 | 25.7 | 20.0 |
| Cora | GCN | +Bat0 | 65.5 | 71.0 | 71.4 | 72.5 | 72.2 | 68.5 | 72.2 | 70.5 (+5.9) | 72.5 (+2.4) | 69.2 (+5.5) | 71.6 (+1.7) | 16.7 (-9.0) | 14.4 (-5.6) |
| Cora | GCN | +Bat1 | 69.8 | 72.1 | 71.8 | 74.2 | 73.9 | 71.6 | 72.6 | 72.3 (+7.8) | 74.2 (+4.1) | 71.1 (+7.5) | 72.8 (+2.9) | 17.6 (-8.0) | 15.2 (-4.8) |
| Cora | GAT | base | 61.5 | 66.9 | 66.8 | 57.8 | 58.8 | 64.7 | 69.8 | 63.8 | 69.8 | 63.1 | 70.0 | 26.0 | 20.1 |
| Cora | GAT | +Bat0 | 66.3 | 71.8 | 72.1 | 71.9 | 70.5 | 69.3 | 70.6 | 70.4 (+6.6) | 72.1 (+2.4) | 69.0 (+6.0) | 70.9 (+0.9) | 17.2 (-8.9) | 15.1 (-5.0) |
| Cora | GAT | +Bat1 | 70.1 | 71.6 | 70.3 | 73.3 | 72.2 | 71.1 | 71.0 | 71.4 (+7.6) | 73.3 (+3.5) | 70.2 (+7.1) | 72.3 (+2.4) | 18.0 (-8.0) | 17.3 (-2.8) |
| Cora | SAGE | base | 59.2 | 63.8 | 65.3 | 57.8 | 58.8 | 61.6 | 68.8 | 62.2 | 68.8 | 60.9 | 68.2 | 27.1 | 19.8 |
| Cora | SAGE | +Bat0 | 66.2 | 70.1 | 71.3 | 71.2 | 70.3 | 69.9 | 69.8 | 69.8 (+7.7) | 71.3 (+2.5) | 68.9 (+8.0) | 70.4 (+2.2) | 16.4 (-10.7) | 13.3 (-6.5) |
| Cora | SAGE | +Bat1 | 66.5 | 71.1 | 71.5 | 73.0 | 73.0 | 72.3 | 71.9 | 71.4 (+9.2) | 73.0 (+4.2) | 70.1 (+9.2) | 71.7 (+3.5) | 16.6 (-10.5) | 14.9 (-4.9) |
| CiteSeer | GCN | base | 37.6 | 42.5 | 42.6 | 39.2 | 39.3 | 45.1 | 56.0 | 43.2 | 56.0 | 36.1 | 54.5 | 27.0 | 16.9 |
| CiteSeer | GCN | +Bat0 | 52.7 | 57.9 | 57.5 | 57.9 | 60.1 | 57.7 | 60.6 | 57.8 (+14.6) | 60.6 (+4.6) | 56.9 (+20.8) | 59.9 (+5.4) | 17.6 (-9.4) | 13.8 (-3.1) |
| CiteSeer | GCN | +Bat1 | 55.4 | 58.4 | 59.3 | 58.8 | 62.0 | 57.6 | 62.7 | 59.2 (+16.0) | 62.7 (+6.7) | 58.4 (+22.3) | 62.5 (+8.0) | 19.3 (-7.7) | 13.9 (-3.0) |
| CiteSeer | GAT | base | 39.2 | 41.3 | 43.2 | 36.0 | 37.0 | 41.8 | 51.5 | 41.4 | 51.5 | 34.1 | 48.3 | 29.0 | 25.2 |
| CiteSeer | GAT | +Bat0 | 55.7 | 59.3 | 58.3 | 60.1 | 60.6 | 56.1 | 60.9 | 58.7 (+17.3) | 60.9 (+9.4) | 58.1 (+23.9) | 60.0 (+11.7) | 16.3 (-12.6) | 10.7 (-14.5) |
| CiteSeer | GAT | +Bat1 | 60.3 | 61.2 | 59.1 | 60.3 | 62.4 | 57.7 | 63.5 | 60.6 (+19.2) | 63.5 (+12.0) | 59.9 (+25.8) | 62.5 (+14.2) | 17.7 (-11.3) | 13.2 (-12.0) |
| CiteSeer | SAGE | base | 43.0 | 45.9 | 48.6 | 39.4 | 38.4 | 42.2 | 52.6 | 44.3 | 52.6 | 37.9 | 51.0 | 27.1 | 19.8 |
| CiteSeer | SAGE | +Bat0 | 55.0 | 58.0 | 56.3 | 61.4 | 64.1 | 60.9 | 64.4 | 60.0 (+15.7) | 64.4 (+11.8) | 59.4 (+21.6) | 63.9 (+12.8) | 17.5 (-9.7) | 13.2 (-6.6) |
| CiteSeer | SAGE | +Bat1 | 53.2 | 55.9 | 56.5 | 61.9 | 66.3 | 62.3 | 63.8 | 60.0 (+15.7) | 66.3 (+13.8) | 59.3 (+21.4) | 65.9 (+14.9) | 18.6 (-8.6) | 12.8 (-7.0) |
| PubMed | GCN | base | 64.2 | 71.2 | 71.5 | 65.0 | 64.4 | 74.0 | 73.7 | 69.1 | 74.0 | 63.5 | 71.3 | 23.2 | 11.9 |
| PubMed | GCN | +Bat0 | 68.6 | 74.2 | 73.2 | 72.5 | 73.2 | 73.1 | 76.1 | 73.0 (+3.8) | 76.1 (+2.1) | 72.0 (+8.5) | 75.8 (+4.5) | 9.1 (-14.1) | 3.4 (-8.6) |
| PubMed | GCN | +Bat1 | 67.6 | 73.4 | 72.5 | 72.9 | 73.1 | 76.6 | 76.9 | 73.3 (+4.1) | 76.9 (+2.9) | 72.6 (+9.1) | 76.9 (+5.6) | 9.9 (-13.4) | 5.1 (-6.8) |
| PubMed | GAT | base | 65.5 | 68.4 | 71.2 | 65.1 | 64.8 | 68.7 | 73.1 | 68.1 | 73.1 | 62.4 | 71.8 | 24.8 | 10.3 |
| PubMed | GAT | +Bat0 | 73.2 | 75.3 | 75.6 | 73.3 | 73.9 | 74.7 | 74.3 | 74.3 (+6.2) | 75.6 (+2.4) | 73.4 (+11.1) | 75.1 (+3.3) | 6.5 (-18.2) | 3.0 (-7.3) |
| PubMed | GAT | +Bat1 | 74.8 | 74.5 | 75.2 | 73.9 | 74.1 | 74.4 | 75.7 | 74.6 (+6.5) | 75.7 (+2.5) | 73.9 (+11.5) | 75.0 (+3.2) | 6.9 (-17.8) | 4.6 (-5.7) |
| PubMed | SAGE | base | 67.6 | 68.0 | 69.1 | 69.2 | 65.0 | 71.5 | 71.4 | 68.8 | 71.5 | 64.5 | 70.1 | 22.1 | 11.8 |
| PubMed | SAGE | +Bat0 | 75.3 | 74.6 | 74.2 | 74.9 | 74.6 | 74.7 | 75.9 | 74.9 (+6.1) | 75.9 (+4.3) | 74.2 (+9.7) | 75.3 (+5.3) | 7.6 (-14.4) | 3.4 (-8.4) |
| PubMed | SAGE | +Bat1 | 77.4 | 75.4 | 75.3 | 75.8 | 77.3 | 76.1 | 76.5 | 76.3 (+7.4) | 77.4 (+5.8) | 75.8 (+11.3) | 76.9 (+6.9) | 6.8 (-15.3) | 4.1 (-7.7) |
  • *Bat0/Bat1: Bat with $0^{\textnormal{th}}$/$1^{\textnormal{st}}$-order posterior likelihood estimation. ERM: Empirical Risk Minimization (standard training), RW: Reweight (Japkowicz & Stephen, 2002), RN: ReNode (Chen et al., 2021), RS: Resampling (Japkowicz & Stephen, 2002), SM: SMOTE (Chawla et al., 2002), GS: GraphSMOTE (Zhao et al., 2021b), GE: GraphENS (Park et al., 2022).

4 Experiments

We carry out systematic experiments and analysis to validate Bat in the following aspects: (i) Effectiveness in both promoting imbalanced node classification and mitigating the prediction bias between different classes. (ii) Versatility in cooperating with and further boosting various CIGL techniques and GNN backbones. (iii) Robustness to extreme class imbalance. (iv) Efficiency in real-world applications.

Experiment Protocol. We validate Bat on five benchmark datasets for semi-supervised node classification, including Cora, CiteSeer, and PubMed from the Planetoid graphs (Sen et al., 2008), and the larger-scale CS and Physics from co-author networks (Shchur et al., 2018) with high-dimensional features. Following the same setting as prior studies (Park et al., 2022; Song et al., 2022a; Zhao et al., 2021b), we select half of the classes as minority. The imbalance ratio (IR) $\rho = n_{max}/n_{min} \geq 1$ is the ratio of the size of the largest class to that of the smallest class, i.e., more imbalance $\Leftrightarrow$ higher IR. Detailed data statistics and class distributions can be found in Appendix B.1. We test Bat with six CIGL techniques (Park et al., 2022; Chen et al., 2021; Zhao et al., 2021b; Chawla et al., 2002; Japkowicz & Stephen, 2002) and three GNN backbones (Veličković et al., 2018; Hamilton et al., 2017; Welling & Kipf, 2016) under all possible combinations to fully validate Bat’s effectiveness and versatility in practice. Note that although there are other techniques available for CIGL (Hong et al., 2021; Kang et al., 2019; Shi et al., 2020), previous studies (Park et al., 2022; Song et al., 2022a) have shown they are generally outperformed by the baselines we use. Detailed settings can be found in Appendix B.2. To ensure a comprehensive evaluation, we employ three metrics to assess both the classification performance (Balanced Accuracy, Macro-F1) and the model predictive bias (PerfStd, i.e., the standard deviation of accuracy scores across all classes). Lower PerfStd indicates a smaller performance gap between all majority and minority classes, and thus smaller predictive bias. For clarity, we use $\uparrow$/$\downarrow$ to denote that larger/smaller is better for each metric.
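For readers who want to reproduce the evaluation quantities, the following sketch computes the imbalance ratio, Balanced Accuracy, and PerfStd from label lists. The helper names are illustrative, every class is assumed to appear at least once in `y_true`, and the use of the population standard deviation for PerfStd is an assumption, since the text above does not specify which estimator is used.

```python
from statistics import mean, pstdev
from typing import List


def per_class_accuracy(y_true: List[int], y_pred: List[int],
                       num_classes: int) -> List[float]:
    """Accuracy computed separately on the nodes of each true class."""
    acc = []
    for c in range(num_classes):
        idx = [i for i, y in enumerate(y_true) if y == c]
        acc.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return acc


def balanced_accuracy(class_acc: List[float]) -> float:
    """Balanced Accuracy: unweighted mean of the per-class accuracies."""
    return mean(class_acc)


def perf_std(class_acc: List[float]) -> float:
    """PerfStd: standard deviation of the per-class accuracies.

    Assumption: the population standard deviation is used in this sketch.
    """
    return pstdev(class_acc)


def imbalance_ratio(class_sizes: List[int]) -> float:
    """Imbalance ratio rho = n_max / n_min >= 1."""
    return max(class_sizes) / min(class_sizes)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
acc = per_class_accuracy(y_true, y_pred, num_classes=2)
print(acc)                      # [1.0, 0.5]
print(balanced_accuracy(acc))   # 0.75
print(perf_std(acc))            # 0.25
print(imbalance_ratio([4, 2]))  # 2.0
```

Plain accuracy on this toy example would be 5/6, whereas Balanced Accuracy drops to 0.75 and PerfStd exposes the gap between the majority and minority class.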

Bat significantly boosts various CIGL techniques. We report the main results in Table 1 (IR=10). In all settings (3 datasets $\times$ 3 backbones $\times$ 7 baselines $\times$ 3 metrics), Bat achieves significant and consistent performance improvements over other CIGL techniques, which also yields new state-of-the-art performance. Specifically: (1) By mitigating AMP and DMP, Bat further boosts the best CIGL baseline by a large margin, e.g., it boosts the best balanced accuracy score by 4.1/13.8/5.8 on Cora/CiteSeer/PubMed datasets. (2) In addition to better classification performance, Bat also greatly reduces the predictive bias in CIGL, with up to 10.7/14.5/18.2 average performance deviation reduction on Cora/CiteSeer/PubMed. (3) Compared with Bat0, Bat1 achieves better classification performance with first-order posterior likelihood estimation. But we also note that Bat0 performs better in terms of reducing predictive bias and is more computationally efficient, as we will discuss later in the scalability experiments (Table 3).

Table 2: Bat delivers consistent and significant performance gains to CIGL methods under varying types and levels of class imbalance. The numbers in brackets are the performance gains brought by Bat over the Base/BestCIGL method. We report Balanced Accuracy here; full results with other metrics can be found in Appendix D.3.
| Dataset | Cora | Cora | CiteSeer | CiteSeer | PubMed | PubMed | CS | CS | Physics | Physics |
|---|---|---|---|---|---|---|---|---|---|---|
| Step IR | 10 | 20 | 10 | 20 | 10 | 20 | 10 | 20 | 10 | 20 |
| Base | 61.6 | 52.7 | 37.6 | 34.2 | 64.2 | 60.8 | 75.4 | 65.3 | 80.1 | 67.7 |
| + Bat | 69.8 (+8.2) | 71.3 (+18.5) | 55.4 (+17.7) | 51.3 (+17.1) | 68.6 (+4.4) | 63.3 (+2.5) | 82.6 (+7.2) | 79.9 (+14.5) | 87.6 (+7.5) | 88.0 (+20.2) |
| BestCIGL | 70.1 | 66.5 | 56.0 | 47.2 | 74.0 | 71.1 | 84.1 | 81.3 | 89.4 | 85.7 |
| + Bat | 74.2 (+4.1) | 71.6 (+5.1) | 62.7 (+6.7) | 62.5 (+15.3) | 76.9 (+2.9) | 75.7 (+4.6) | 86.3 (+2.2) | 85.6 (+4.3) | 91.2 (+1.9) | 90.9 (+5.2) |
| Natural IR | 50 | 100 | 50 | 100 | 50 | 100 | 50 | 100 | 50 | 100 |
| Base | 60.1 | 47.0 | 28.1 | 21.9 | 55.1 | 46.4 | 72.7 | 59.2 | 80.7 | 64.7 |
| + Bat | 68.7 (+8.6) | 69.6 (+22.6) | 54.9 (+26.9) | 48.9 (+27.0) | 67.2 (+12.1) | 60.7 (+14.3) | 78.6 (+5.9) | 74.7 (+15.5) | 88.8 (+8.1) | 87.8 (+23.2) |
| BestCIGL | 70.0 | 66.2 | 54.5 | 45.0 | 71.3 | 68.9 | 83.9 | 80.9 | 89.5 | 86.2 |
| + Bat | 72.8 (+2.9) | 70.2 (+4.0) | 62.5 (+8.0) | 62.1 (+17.1) | 76.9 (+5.6) | 74.9 (+6.0) | 85.4 (+1.6) | 84.6 (+3.7) | 90.7 (+1.2) | 90.0 (+3.8) |
  • * Base: vanilla GCN model; Base+Bat: applying Bat without any other CIGL method; BestCIGL: best CIGL baseline w/o Bat; BestCIGL+Bat: best CIGL baseline w/ Bat.

Bat is robust even under extreme class imbalance. We further extend Table 1 and test Bat’s robustness to varying types and levels of imbalance, as reported in Table 2. In this experiment, we extend the step imbalance ratio from 10 (used in Table 1) to 20 to test Bat under even more challenging class imbalance scenarios. In addition, we consider the natural (long-tail) class imbalance (Park et al., 2022) that is commonly observed in real-world graphs, with IRs of 50 and 100. The datasets from Shchur et al. (2018) (CS, Physics) are also included to test Bat on large-scale tasks. Results show that: (1) Bat is robust to extreme class imbalance, and it consistently boosts the CIGL performance by a significant margin under varying types and levels of imbalance. (2) The performance drop caused by increasing IR is significantly reduced by Bat, i.e., applying Bat improves the model’s robustness to extreme class imbalance. (3) Bat’s advantage is even more prominent under higher class imbalance, e.g., on Cora with step IR, the performance gain of applying Bat to Base rises from 8.2 to 18.5 when IR increases from 10 to 20, and similar patterns can be observed in other settings.

Bat effectively alleviates both AMP and DMP. We further design experiments to verify to what extent Bat can handle the topological challenges identified in this paper, i.e., ambivalent and distant message-passing. Specifically, we investigate whether Bat can improve the prediction accuracy of minority-class nodes that are highly influenced by AMP/DMP, i.e., nodes with a high heterophilic neighbor ratio or a long distance to supervision. Results are shown in Fig. 6 (5 independent runs with a GCN classifier, IR=10). As can be observed, Bat effectively alleviates the negative impact of AMP and DMP and helps node classifiers achieve better performance on minority classes.
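The two per-node quantities used to group nodes in this analysis can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' evaluation code: the graph is an undirected adjacency-list dictionary, ground-truth labels are available for analysis, "distance to supervision" is read as the hop distance to the nearest labeled node of the same class, and all function names are hypothetical.

```python
from collections import deque


def heterophilic_neighbor_ratio(adj, labels, u):
    """Fraction of u's neighbors whose class differs from u's class (AMP proxy)."""
    nbrs = adj[u]
    if not nbrs:
        return 0.0
    return sum(labels[v] != labels[u] for v in nbrs) / len(nbrs)


def distance_to_supervision(adj, labels, labeled, u):
    """Hop distance from u to the nearest labeled same-class node (DMP proxy).

    Returns None if no such node is reachable from u.
    """
    targets = {v for v in labeled if labels[v] == labels[u]}
    seen, queue = {u}, deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for v in adj[node]:
            if v not in seen:
                seen.add(v)
                queue.append((v, dist + 1))
    return None


# Toy graph: path 0-1-2-3 with an extra leaf 4 attached to node 1.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
labels = {0: 0, 1: 1, 2: 0, 3: 0, 4: 1}
labeled = {0, 4}  # nodes whose labels are visible during training

print(heterophilic_neighbor_ratio(adj, labels, 1))       # 2 of 3 neighbors differ
print(distance_to_supervision(adj, labels, labeled, 3))  # nearest class-0 label is node 0
```

Binning minority-class nodes by these two values and comparing accuracy per bin, with and without augmentation, gives the kind of curves summarized in Fig. 6.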

Figure 6: Bat effectively alleviates both AMP and DMP. (a) Bat mitigates AMP in multiple CIGL tasks. (b) Bat mitigates DMP in multiple CIGL tasks.

Bat is computationally efficient. As previously discussed in the complexity analysis, Bat with 0th/1st-order estimation scales linearly with the number of nodes/edges (i.e., with $\mathcal{O}(|\mathcal{V}|C)$ or $\mathcal{O}(|\mathcal{E}|C)$ complexity). Since all the operations can be executed in parallel in matrix form, Bat0/Bat1 has $\mathcal{O}(\frac{|\mathcal{V}|C}{D})$/$\mathcal{O}(\frac{|\mathcal{E}|C}{D})$ time complexity, where $D$ is the number of available computational units and is usually large for modern GPUs. Table 3 reports the number of virtual nodes/edges introduced by Bat relative to the original graph, together with the running time of Bat. It can be observed that Bat introduces only a small number of virtual nodes/edges and is highly efficient in practice (taking milliseconds for augmentation).

Table 3: Efficiency results of Bat0/Bat1.
| Dataset | $\Delta$ Nodes (%) | $\Delta$ Edges (%), Bat0/Bat1 | $\Delta$ Time (ms), Bat0/Bat1 |
|---|---|---|---|
| Cora | 0.258 | 2.842/1.509 | 4.50/4.65 |
| CiteSeer | 0.180 | 3.715/1.081 | 4.72/4.97 |
| PubMed | 0.015 | 3.175/1.464 | 6.23/6.64 |
| CS | 0.082 | 1.395/1.053 | 16.97/18.61 |
| Physics | 0.014 | 0.797/0.527 | 30.68/31.91 |
  • * Results obtained on an NVIDIA® Tesla V100 32GB GPU.
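As a rough sanity check on the node overhead in Table 3, the reported percentages can be compared with what one virtual node per class would give. This is an interpretation offered for illustration, not a statement made in this section, and the node and class counts below are the commonly reported statistics of these benchmarks (the paper's own numbers are in Appendix B.1).

```python
# (num_nodes, num_classes): commonly reported benchmark statistics (assumption,
# not taken from this section; see Appendix B.1 for the paper's own numbers).
DATASETS = {
    "Cora": (2708, 7),
    "CiteSeer": (3327, 6),
    "PubMed": (19717, 3),
    "CS": (18333, 15),
    "Physics": (34493, 5),
}

# Node overhead (%) as reported in Table 3.
REPORTED_DELTA_NODES = {
    "Cora": 0.258,
    "CiteSeer": 0.180,
    "PubMed": 0.015,
    "CS": 0.082,
    "Physics": 0.014,
}


def virtual_node_ratio(num_nodes, num_virtual):
    """Percentage of added virtual nodes relative to the original node count."""
    return 100.0 * num_virtual / num_nodes


for name, (n, c) in DATASETS.items():
    # Hypothesis being checked: one virtual node per class.
    print(f"{name}: {virtual_node_ratio(n, c):.3f}% vs reported {REPORTED_DELTA_NODES[name]:.3f}%")
```

Under these assumptions the computed ratios agree with the table to about three decimal places, which illustrates why the node overhead shrinks as graphs grow: the number of added nodes depends on the number of classes rather than on the graph size.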

Further discussions. We refer readers to the Appendix for reproducibility details (§B), the ablation study and extended discussions (§C), and additional empirical results (§D).

5 Related Works

Imbalanced graph learning. Class imbalance is ubiquitous in many machine-learning tasks and has been extensively studied (He & Garcia, 2008; Krawczyk, 2016). However, most existing works focus on i.i.d. scenarios and may not be tailored to the unique characteristics of graph data. To handle imbalanced graph learning, several techniques have been proposed in recent studies, e.g., adversarial training (Shi et al., 2020; Qu et al., 2021), new GNN architectures (Wang et al., 2020; Liu et al., 2021), or new loss functions (Song et al., 2022a). We review the most closely related model-agnostic CR methods here. One of the early works, GraphSMOTE (Zhao et al., 2021b), adopts SMOTE (Chawla et al., 2002) oversampling in the node embedding space to synthesize minority nodes and complements the topology with a learnable edge predictor. A more recent work, GraphENS (Park et al., 2022), synthesizes the ego network through saliency-based ego network mixing to handle the neighbor-overfitting problem. Most of these studies are rooted in a class-rebalancing perspective and address the imbalance by node/class-wise reweighting or resampling.

Topology-imbalance in graphs. Topology imbalance was first discussed in Chen et al. (2021). They found that “the unequal structure role of labeled nodes” can cause influence conflict, and proposed to re-weight the labeled nodes based on a conflict detection measure. Other works further discussed how to better address the issue via position-aware structure learning (Sun et al., 2022), and how to handle topology imbalance in fake news detection (Gao et al., 2022) and bankruptcy prediction (Liu et al., 2023). These studies discussed concepts related to “influence conflict/insufficiency” (Chen et al., 2021), which motivated us to investigate AMP/DMP in this work.

6 Conclusion

In this paper, we study class-imbalanced graph learning from a novel topological perspective. We theoretically reveal that two fundamental topological phenomena, i.e., ambivalent and distant message-passing, can greatly exacerbate the predictive bias stemming from class imbalance. Our findings reveal an unexplored avenue that limits the performance of existing class-rebalancing-based CIGL techniques. In light of this, we propose Bat to handle the topological challenges in CIGL by dynamic topological augmentation. Bat is a swift and model-agnostic framework that can seamlessly complement other CIGL techniques, augmenting their performance and mitigating predictive bias. Systematic experiments validate Bat’s superior effectiveness, versatility, robustness, and efficiency across various CIGL tasks.

Acknowledgements

This work is supported by NSF (1947135), the NSF Program on Fairness in AI in collaboration with Amazon (1939725), NIFA (2020-67021-32799), DHS (17STQAC00001-07-00), the C3.ai Digital Transformation Institute, MIT-IBM Watson AI Lab, and IBM-Illinois Discovery Accelerator Institute. The content of the information in this document does not necessarily reflect the position or the policy of the Government or Amazon, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Impact Statements

This paper presents work whose goal is to advance the field of Graph Data Mining. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

  • Akoglu et al. (2015) Akoglu, L., Tong, H., and Koutra, D. Graph based anomaly detection and description: a survey. Data mining and knowledge discovery, 29(3):626–688, 2015.
  • Antoniadis & Fan (2001) Antoniadis, A. and Fan, J. Regularization of wavelet approximations. Journal of the American Statistical Association, 96(455):939–967, 2001.
  • Bojchevski & Günnemann (2017) Bojchevski, A. and Günnemann, S. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. arXiv preprint arXiv:1707.03815, 2017.
  • Cai et al. (2021) Cai, L., Li, J., Wang, J., and Ji, S. Line graph neural networks for link prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5103–5113, 2021.
  • Chawla et al. (2002) Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
  • Chen et al. (2021) Chen, D., Lin, Y., Zhao, G., Ren, X., Li, P., Zhou, J., and Sun, X. Topology-imbalance learning for semi-supervised node classification. Advances in Neural Information Processing Systems, 34:29885–29897, 2021.
  • Chien et al. (2020) Chien, E., Peng, J., Li, P., and Milenkovic, O. Adaptive universal generalized pagerank graph neural network. arXiv preprint arXiv:2006.07988, 2020.
  • Cui et al. (2019) Cui, Y., Jia, M., Lin, T.-Y., Song, Y., and Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  9268–9277, 2019.
  • Decelle et al. (2011) Decelle, A., Krzakala, F., Moore, C., and Zdeborová, L. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.
  • Fey & Lenssen (2019) Fey, M. and Lenssen, J. E. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019.
  • Fu & He (2021) Fu, D. and He, J. SDG: A simplified and dynamic graph neural network. In Diaz, F., Shah, C., Suel, T., Castells, P., Jones, R., and Sakai, T. (eds.), SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, pp.  2273–2277. ACM, 2021. doi: 10.1145/3404835.3463059. URL https://doi.org/10.1145/3404835.3463059.
  • Fu et al. (2023) Fu, D., Zhou, D., Maciejewski, R., Croitoru, A., Boyd, M., and He, J. Fairness-aware clique-preserving spectral clustering of temporal graphs. In Ding, Y., Tang, J., Sequeda, J. F., Aroyo, L., Castillo, C., and Houben, G. (eds.), Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, pp.  3755–3765. ACM, 2023. doi: 10.1145/3543507.3583423. URL https://doi.org/10.1145/3543507.3583423.
  • Fu et al. (2024) Fu, D., Hua, Z., Xie, Y., Fang, J., Zhang, S., Sancak, K., Wu, H., Malevich, A., He, J., and Long, B. Vcr-graphormer: A mini-batch graph transformer via virtual connections. CoRR, abs/2403.16030, 2024. doi: 10.48550/ARXIV.2403.16030. URL https://doi.org/10.48550/arXiv.2403.16030.
  • Gao et al. (2022) Gao, L., Song, L., Liu, J., Chen, B., and Shang, X. Topology imbalance and relation inauthenticity aware hierarchical graph attention networks for fake news detection. In Proceedings of the 29th International Conference on Computational Linguistics, pp.  4687–4696, 2022.
  • Gasteiger et al. (2018) Gasteiger, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.
  • Hamilton et al. (2017) Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.
  • Hasanzadeh et al. (2020) Hasanzadeh, A., Hajiramezanali, E., Boluki, S., Zhou, M., Duffield, N., Narayanan, K., and Qian, X. Bayesian graph neural networks with adaptive connection sampling. In International conference on machine learning, pp.  4094–4104. PMLR, 2020.
  • He & Garcia (2008) He, H. and Garcia, E. A. Learning from imbalanced data. IEEE Transactions on Knowledge & Data Engineering, (9):1263–1284, 2008.
  • Holland et al. (1983) Holland, P. W., Laskey, K. B., and Leinhardt, S. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
  • Hong et al. (2021) Hong, Y., Han, S., Choi, K., Seo, S., Kim, B., and Chang, B. Disentangling label distribution for long-tailed visual recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  6626–6636, 2021.
  • Hu et al. (2020) Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., and Leskovec, J. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems, 33:22118–22133, 2020.
  • Huang et al. (2022) Huang, Z., Tang, Y., and Chen, Y. A graph neural network-based node classification model on class-imbalanced graph data. Knowledge-Based Systems, 244:108538, 2022.
  • Japkowicz & Stephen (2002) Japkowicz, N. and Stephen, S. The class imbalance problem: A systematic study. Intelligent data analysis, 6(5):429–449, 2002.
  • Johnson & Khoshgoftaar (2019) Johnson, J. M. and Khoshgoftaar, T. M. Survey on deep learning with class imbalance. Journal of Big Data, 6(1):1–54, 2019.
  • Kang et al. (2019) Kang, B., Xie, S., Rohrbach, M., Yan, Z., Gordo, A., Feng, J., and Kalantidis, Y. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217, 2019.
  • Kang et al. (2020) Kang, J., He, J., Maciejewski, R., and Tong, H. Inform: Individual fairness on graph mining. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp.  379–389, 2020.
  • Kang et al. (2022a) Kang, J., Zhou, Q., and Tong, H. Jurygcn: quantifying jackknife uncertainty on graph convolutional networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  742–752, 2022a.
  • Kang et al. (2022b) Kang, J., Zhu, Y., Xia, Y., Luo, J., and Tong, H. Rawlsgcn: Towards rawlsian difference principle on graph convolutional network. In Proceedings of the ACM Web Conference 2022, pp.  1214–1225, 2022b.
  • Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Krawczyk (2016) Krawczyk, B. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4):221–232, 2016.
  • Li et al. (2022) Li, X., Wen, L., Deng, Y., Feng, F., Hu, X., Wang, L., and Fan, Z. Graph neural network with curriculum learning for imbalanced node classification. arXiv preprint arXiv:2202.02529, 2022.
  • Liu et al. (2021) Liu, Y., Ao, X., Qin, Z., Chi, J., Feng, J., Yang, H., and He, Q. Pick and choose: a gnn-based imbalanced learning approach for fraud detection. In Proceedings of the Web Conference 2021, pp.  3168–3177, 2021.
  • Liu et al. (2023) Liu, Y., Gao, Z., Liu, X., Luo, P., Yang, Y., and Xiong, H. Qtiah-gnn: Quantity and topology imbalance-aware heterogeneous graph neural network for bankruptcy prediction. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  1572–1582, 2023.
  • Liu et al. (2020) Liu, Z.-Y., Li, S.-Y., Chen, S., Hu, Y., and Huang, S.-J. Uncertainty aware graph gaussian process for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp.  4957–4964, 2020.
  • Ma et al. (2021) Ma, Y., Liu, X., Zhao, T., Liu, Y., Tang, J., and Shah, N. A unified view on graph neural networks as graph signal denoising. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp.  1202–1211, 2021.
  • Nt & Maehara (2019) Nt, H. and Maehara, T. Revisiting graph neural networks: All we have is low-pass filters. arXiv preprint arXiv:1905.09550, 2019.
  • Pandey et al. (2019) Pandey, B., Bhanodia, P. K., Khamparia, A., and Pandey, D. K. A comprehensive survey of edge prediction in social networks: Techniques, parameters and challenges. Expert Systems with Applications, 124:164–181, 2019.
  • Park et al. (2022) Park, J., Song, J., and Yang, E. Graphens: Neighbor-aware ego network synthesis for class-imbalanced node classification. In International Conference on Learning Representations, 2022.
  • Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • Qu et al. (2021) Qu, L., Zhu, H., Zheng, R., Shi, Y., and Yin, H. Imgagn: Imbalanced network embedding via generative adversarial graph networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp.  1390–1398, 2021.
  • Sen et al. (2008) Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI magazine, 29(3):93–93, 2008.
  • Shchur et al. (2018) Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018.
  • Shi et al. (2020) Shi, M., Tang, Y., Zhu, X., Wilson, D., and Liu, J. Multi-class imbalanced graph convolutional network learning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), 2020.
  • Song et al. (2022a) Song, J., Park, J., and Yang, E. Tam: Topology-aware margin loss for class-imbalanced node classification. In International Conference on Machine Learning, pp.  20369–20383. PMLR, 2022a.
  • Song et al. (2022b) Song, Z., Yang, X., Xu, Z., and King, I. Graph-based semi-supervised learning: A comprehensive review. IEEE Transactions on Neural Networks and Learning Systems, 2022b.
  • Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.
  • Stadler et al. (2021) Stadler, M., Charpentier, B., Geisler, S., Zügner, D., and Günnemann, S. Graph posterior network: Bayesian predictive uncertainty for node classification. Advances in Neural Information Processing Systems, 34:18033–18048, 2021.
  • Sun et al. (2022) Sun, Q., Li, J., Yuan, H., Fu, X., Peng, H., Ji, C., Li, Q., and Yu, P. S. Position-aware structure learning for graph topology-imbalance by relieving under-reaching and over-squashing. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp.  1848–1857, 2022.
  • Tang & Liu (2010) Tang, L. and Liu, H. Community detection and mining in social media. Synthesis lectures on data mining and knowledge discovery, 2(1):1–137, 2010.
  • Veličković et al. (2018) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018.
  • Wang et al. (2020) Wang, Z., Ye, X., Wang, C., Cui, J., and Philip, S. Y. Network embedding with completely-imbalanced labels. IEEE Transactions on Knowledge and Data Engineering, 33(11):3634–3647, 2020.
  • Welling & Kipf (2016) Welling, M. and Kipf, T. N. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR 2017), 2016.
  • Wu et al. (2019) Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. Simplifying graph convolutional networks. In International conference on machine learning, pp.  6861–6871. PMLR, 2019.
  • Wu et al. (2022) Wu, L., Xia, J., Gao, Z., Lin, H., Tan, C., and Li, S. Z. Graphmixup: Improving class-imbalanced node classification by reinforcement mixup and self-supervised context prediction. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.  519–535. Springer, 2022.
  • Xu et al. (2023) Xu, Z., Chen, Y., Zhou, Q., Wu, Y., Pan, M., Yang, H., and Tong, H. Node classification beyond homophily: Towards a general solution. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  2862–2873, 2023.
  • Yan et al. (2021a) Yan, Y., Liu, L., Ban, Y., Jing, B., and Tong, H. Dynamic knowledge graph alignment. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pp.  4564–4572, 2021a.
  • Yan et al. (2021b) Yan, Y., Zhang, S., and Tong, H. Bright: A bridging algorithm for network alignment. In Proceedings of the web conference 2021, pp.  3907–3917, 2021b.
  • Yan et al. (2024) Yan, Y., Chen, Y., Chen, H., Xu, M., Das, M., Yang, H., and Tong, H. From trainable negative depth to edge heterophily in graphs. Advances in Neural Information Processing Systems, 36, 2024.
  • Yun et al. (2022) Yun, S., Kim, K., Yoon, K., and Park, C. Lte4g: Long-tail experts for graph neural networks. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp.  2434–2443, 2022.
  • Zeng et al. (2023a) Zeng, Z., Zhang, S., Xia, Y., and Tong, H. Parrot: Position-aware regularized optimal transport for network alignment. In Proceedings of the ACM Web Conference 2023, pp.  372–382, 2023a.
  • Zeng et al. (2023b) Zeng, Z., Zhu, R., Xia, Y., Zeng, H., and Tong, H. Generative graph dictionary learning. In International Conference on Machine Learning, pp.  40749–40769. PMLR, 2023b.
  • Zeng et al. (2024) Zeng, Z., Du, B., Zhang, S., Xia, Y., Liu, Z., and Tong, H. Hierarchical multi-marginal optimal transport for network alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp.  16660–16668, 2024.
  • Zhang et al. (2019) Zhang, Y., Pal, S., Coates, M., and Ustebay, D. Bayesian graph convolutional neural networks for semi-supervised classification. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pp.  5829–5836, 2019.
  • Zhao et al. (2021a) Zhao, J., Dong, Y., Ding, M., Kharlamov, E., and Tang, J. Adaptive diffusion in graph neural networks. Advances in Neural Information Processing Systems, 34:23321–23333, 2021a.
  • Zhao et al. (2021b) Zhao, T., Zhang, X., and Wang, S. Graphsmote: Imbalanced node classification on graphs with graph neural networks. In Proceedings of the 14th ACM international conference on web search and data mining, pp.  833–841, 2021b.
  • Zhao et al. (2022) Zhao, T., Luo, D., Zhang, X., and Wang, S. Topoimb: Toward topology-level imbalance in learning from graphs. In Learning on Graphs Conference, pp.  37–1. PMLR, 2022.
  • Zhao et al. (2020) Zhao, X., Chen, F., Hu, S., and Cho, J.-H. Uncertainty aware semi-supervised learning on graph data. Advances in Neural Information Processing Systems, 33:12827–12836, 2020.
  • Zhu et al. (2020) Zhu, J., Yan, Y., Zhao, L., Heimann, M., Akoglu, L., and Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in Neural Information Processing Systems, 33:7793–7804, 2020.

Appendix

  • Section A: Proofs of Theoretical Results

    • A.1 - Limiting distributions of $H_{ij}^{k}$.

    • A.2 - Proof of Theorem 2.1 (AMP).

    • A.3 - Proof of Theorem 2.2 (DMP).

  • Section B: Reproducibility Details

    • B.1 - Statistics of the used datasets.

    • B.2 - Implementation details of baselines.

    • B.3 - Evaluation protocols.

  • Section C: Further Discussions

    • C.1 - Ablation study of Bat.

    • C.2 - Further speedup of Bat in practice.

    • C.3 - Remarks on choosing between Bat0/Bat1.

    • C.4 - Limitation and future works.

  • Section D: Additional Experiments, Results, and Analysis

    • D.1 - Experiments on additional large-scale graphs

    • D.2 - Comparison with additional independent CIGL baselines.

    • D.3 - Full results with all metrics, error bar, and additional GNN backbones.

Appendix A Proofs of Theoretical Results

Let the random variable $H^{k}_{ij} := |\mathcal{V}_j \cap \mathcal{H}(u,k)|$ denote the number of class-$j$ $k$-hop homo-connected neighbors of a node $u \in \mathcal{V}_i$. Note that the results of both Theorems 2.1 & 2.2 depend only on the distributions of $H^{k}_{ij}$. Thus, we first derive the limiting distributions of $H^{k}_{ij}$ as a technical lemma, and then give the proofs of Theorems 2.1 & 2.2.

A.1 Limiting Distributions of $H^{k}_{ij}$

To count the number of homo-connected neighbors, consider the breadth-first search (BFS) tree rooted at node $u \in \mathcal{V}_1$. By enumerating the numbers of $1,\ldots,k$-hop homo-connected neighbors in the BFS tree respectively, we can calculate the exact joint distribution of $(H^{k}_{11}, H^{k}_{12})$:

\begin{align*}
\mathbb{P}\{H^{k}_{11}=s,\, H^{k}_{12}=s'\} = \sum_{\substack{a_1+\cdots+a_k=s \\ b_1+\cdots+b_k=s'}} & \binom{n_1-1}{a_1,\dots,a_k,\,n_1-1-s}\binom{n_2}{b_1,\dots,b_k,\,n_2-s'} \\
& \cdot\, p^{a_1}\bigg(\prod_{t=2}^{k}(1-p)^{a_t(1+a_1+\cdots+a_{t-2})}\big(1-(1-p)^{a_{t-1}}\big)^{a_t}\bigg)(1-p)^{(n_1-1-s)(1+s-a_k)} \\
& \cdot\, q^{b_1}\bigg(\prod_{t=2}^{k}(1-q)^{b_t}(1-p)^{b_t(b_1+\cdots+b_{t-2})}\big(1-(1-p)^{b_{t-1}}\big)^{b_t}\bigg)(1-q)^{n_2-s'}(1-p)^{(n_2-s')(s'-b_k)}.
\end{align*}
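As a numerical sanity check, the class-1 factor of the expression above (the terms that involve only $a_1,\dots,a_k$, $n_1$, and $p$) can be evaluated by brute-force enumeration for small $n_1$. The sketch below is illustrative only: the function names are hypothetical, and the enumeration is exponential in $k$, so it is meant for toy sizes. It checks two properties that follow directly from the formula: the values sum to one over $s = 0,\dots,n_1-1$, and for $k=1$ the expression reduces to a $\mathrm{Binomial}(n_1-1, p)$ probability.

```python
from itertools import product
from math import comb, factorial


def multinomial(n, parts):
    """Multinomial coefficient n! / (parts[0]! * ... * parts[-1]!)."""
    assert sum(parts) == n
    out = factorial(n)
    for c in parts:
        out //= factorial(c)
    return out


def layer_weight(a, n1, p):
    """Probability that the BFS layers 1..k around the root have sizes a = (a_1..a_k)."""
    k, s = len(a), sum(a)
    rest = n1 - 1 - s  # class-1 nodes that are not within k hops of the root
    w = multinomial(n1 - 1, list(a) + [rest]) * p ** a[0]
    for i in range(1, k):  # 0-based index i corresponds to layer t = i + 1
        earlier = 1 + sum(a[: i - 1])  # root plus layers 1..t-2
        w *= (1 - p) ** (a[i] * earlier)           # no edge to the root or earlier layers
        w *= (1 - (1 - p) ** a[i - 1]) ** a[i]     # at least one edge to layer t-1
    w *= (1 - p) ** (rest * (1 + s - a[-1]))       # remaining nodes avoid root and layers 1..k-1
    return w


def class1_pmf(s, k, n1, p):
    """Sum of layer weights over all (a_1..a_k) with a_1 + ... + a_k = s."""
    return sum(
        layer_weight(a, n1, p)
        for a in product(range(s + 1), repeat=k)
        if sum(a) == s
    )


n1, p = 5, 0.3
for k in (1, 2, 3):
    print(k, sum(class1_pmf(s, k, n1, p) for s in range(n1)))
```

Note that Python evaluates `0.0 ** 0` as `1.0`, which matches the convention needed here: an empty layer contributes a factor of one, while a non-empty layer following an empty one contributes zero.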

Thus, $H^{k}_{11}$ and $H^{k}_{12}$ are independent, and their marginal distributions are:

{H11k=s}=a1++ak=ssubscriptsuperscript𝐻𝑘11𝑠subscriptsubscript𝑎1subscript𝑎𝑘𝑠\displaystyle\mathbb{P}\{H^{k}_{11}=s\}=\sum_{a_{1}+\cdots+a_{k}=s}blackboard_P { italic_H start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT = italic_s } = ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_s end_POSTSUBSCRIPT (n11a1,,ak,n11s)binomialsubscript𝑛11subscript𝑎1subscript𝑎𝑘subscript𝑛11𝑠\displaystyle\binom{n_{1}-1}{a_{1},\dots,a_{k},n_{1}-1-s}( FRACOP start_ARG italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 end_ARG start_ARG italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 - italic_s end_ARG )
pa1(t=2k(1p)at(1+a1++at2)(1(1p)at1)at)(1p)(n11s)(1+sak),superscript𝑝subscript𝑎1superscriptsubscriptproduct𝑡2𝑘superscript1𝑝subscript𝑎𝑡1subscript𝑎1subscript𝑎𝑡2superscript1superscript1𝑝subscript𝑎𝑡1subscript𝑎𝑡superscript1𝑝subscript𝑛11𝑠1𝑠subscript𝑎𝑘\displaystyle p^{a_{1}}\bigg{(}\prod_{t=2}^{k}(1-p)^{a_{t}(1+a_{1}+\cdots+a_{t% -2})}(1-(1-p)^{a_{t-1}})^{a_{t}}\bigg{)}(1-p)^{(n_{1}-1-s)(1+s-a_{k})},italic_p start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ( ∏ start_POSTSUBSCRIPT italic_t = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ( 1 - italic_p ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( 1 + italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_a start_POSTSUBSCRIPT italic_t - 2 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ( 1 - ( 1 - italic_p ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) ( 1 - italic_p ) start_POSTSUPERSCRIPT ( italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 - italic_s ) ( 1 + italic_s - italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ,
{H12k=s}=b1++bk=ssubscriptsuperscript𝐻𝑘12𝑠subscriptsubscript𝑏1subscript𝑏𝑘𝑠\displaystyle\mathbb{P}\{H^{k}_{12}=s\}=\sum_{b_{1}+\cdots+b_{k}=s}blackboard_P { italic_H start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT = italic_s } = ∑ start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_s end_POSTSUBSCRIPT (n2b1,,bk,n2s)binomialsubscript𝑛2subscript𝑏1subscript𝑏𝑘subscript𝑛2𝑠\displaystyle\binom{n_{2}}{b_{1},\dots,b_{k},n_{2}-s}( FRACOP start_ARG italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_s end_ARG )
qb1(t=2k(1q)bt(1p)bt(b1++bt2)(1(1p)bt1)bt)(1q)n2s(1p)(n2s)(sbk).superscript𝑞subscript𝑏1superscriptsubscriptproduct𝑡2𝑘superscript1𝑞subscript𝑏𝑡superscript1𝑝subscript𝑏𝑡subscript𝑏1subscript𝑏𝑡2superscript1superscript1𝑝subscript𝑏𝑡1subscript𝑏𝑡superscript1𝑞subscript𝑛2𝑠superscript1𝑝subscript𝑛2𝑠𝑠subscript𝑏𝑘\displaystyle q^{b_{1}}\bigg{(}\prod_{t=2}^{k}(1-q)^{b_{t}}(1-p)^{b_{t}(b_{1}+% \cdots+b_{t-2})}(1-(1-p)^{b_{t-1}})^{b_{t}}\bigg{)}(1-q)^{n_{2}-s}(1-p)^{(n_{2% }-s)(s-b_{k})}.italic_q start_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ( ∏ start_POSTSUBSCRIPT italic_t = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ( 1 - italic_q ) start_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ( 1 - italic_p ) start_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_b start_POSTSUBSCRIPT italic_t - 2 end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ( 1 - ( 1 - italic_p ) start_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) ( 1 - italic_q ) start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_s end_POSTSUPERSCRIPT ( 1 - italic_p ) start_POSTSUPERSCRIPT ( italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_s ) ( italic_s - italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT .

Now consider $n\to\infty$. Let

$$
\begin{aligned}
\beta_{11}&:=\lim_{n\to\infty}n_1\cdot p=\beta,\\
\beta_{22}&:=\lim_{n\to\infty}n_2\cdot p=\rho\beta,\\
\beta_{21}&:=\lim_{n\to\infty}n_1\cdot q=\beta\frac{q}{p},\\
\beta_{12}&:=\lim_{n\to\infty}n_2\cdot q=\rho\beta\frac{q}{p}.
\end{aligned}
$$

Then, the limiting distributions of $H^k_{11}$ and $H^k_{12}$ are:

$$
\begin{aligned}
\mathbb{P}\{H^k_{11}=s\}\sim{}&\sum_{a_1+\cdots+a_k=s}\frac{n_1^{s}}{a_1!\cdots a_k!}\,p^{a_1}\bigg(\prod_{t=2}^{k}(a_{t-1}p)^{a_t}\bigg)(1-p)^{n_1(1+s-a_k)}\\
\to{}&\mathrm{e}^{-\beta_{11}}\sum_{a_1+\cdots+a_k=s}\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_1}}{a_1!}\bigg(\prod_{t=2}^{k-1}\frac{(a_{t-1}\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_t}}{a_t!}\bigg)\frac{(a_{k-1}\beta_{11})^{a_k}}{a_k!},\\
\mathbb{P}\{H^k_{12}=s\}\sim{}&\sum_{b_1+\cdots+b_k=s}\frac{n_2^{s}}{b_1!\cdots b_k!}\,q^{b_1}\bigg(\prod_{t=2}^{k}(b_{t-1}p)^{b_t}\bigg)(1-q)^{n_2}(1-p)^{n_2(s-b_k)}\\
\to{}&\mathrm{e}^{-\beta_{12}}\sum_{b_1+\cdots+b_k=s}\frac{(\beta_{12}\mathrm{e}^{-\beta_{22}})^{b_1}}{b_1!}\bigg(\prod_{t=2}^{k-1}\frac{(b_{t-1}\beta_{22}\mathrm{e}^{-\beta_{22}})^{b_t}}{b_t!}\bigg)\frac{(b_{k-1}\beta_{22})^{b_k}}{b_k!}.
\end{aligned}
$$
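As an informal sanity check on these limits, the following Python sketch evaluates the $k=2$ case of both limiting formulas numerically. The helper name `limiting_pmf_k2`, its two-parameter form, and the value $\beta_{11}=4$ are illustrative assumptions and do not come from the paper. For $k=2$ the middle product is empty, so each formula reduces to a single sum over $a_1+a_2=s$, with the convention $0^0=1$.

```python
import math


def limiting_pmf_k2(s, beta_first, beta_next):
    """Limiting P{H^2 = s} for k = 2 (hypothetical helper, not from the paper).

    Evaluates  e^{-beta_first} * sum over a1 + a2 = s of
        (beta_first * e^{-beta_next})^{a1} / a1!  *  (a1 * beta_next)^{a2} / a2!
    Pass (beta_11, beta_11) for H_11 and (beta_12, beta_22) for H_12.
    """
    total = 0.0
    for a1 in range(s + 1):
        a2 = s - a1
        if a1 == 0:
            # (0 * beta_next)^{a2} is 1 only when a2 == 0 (convention 0^0 = 1).
            if a2 == 0:
                total += 1.0
            continue
        # Work in log space so that large factorials do not overflow.
        log_term = (
            a1 * (math.log(beta_first) - beta_next)
            - math.lgamma(a1 + 1)
            + a2 * math.log(a1 * beta_next)
            - math.lgamma(a2 + 1)
        )
        total += math.exp(log_term)
    return math.exp(-beta_first) * total


beta_11 = 4.0  # assumed example value, not taken from the paper
pmf = [limiting_pmf_k2(s, beta_11, beta_11) for s in range(201)]
print("total mass:", sum(pmf))
print("mean:", sum(s * p for s, p in enumerate(pmf)))
```

Under these assumptions the truncated mass should be very close to 1 and the mean very close to $\beta_{11}+\beta_{11}^2$, which suggests the limit is a proper probability distribution in the $k=2$ case.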

Figure 7 shows that the limiting distributions are a good approximation for finite $n$.

Figure 7: Distributions of $H^k_{ij}$. Simulated with $n=2000$, $\rho=4$, $p=0.01$, $q=0.002$, $k=2$.

A.2 Proof of Theorem 2.1

Proof.

Note that for any $t'=1,\dots,k$,

$$
\mathrm{e}^{-\beta_{11}}\sum_{a_1=0}^{\infty}\cdots\sum_{a_k=0}^{\infty}a_{t'}\cdot\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_1}}{a_1!}\bigg(\prod_{t=2}^{k-1}\frac{(a_{t-1}\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_t}}{a_t!}\bigg)\frac{(a_{k-1}\beta_{11})^{a_k}}{a_k!}=\beta_{11}^{t'}.
$$

Thus,

$$
\begin{aligned}
&\lim_{n\to\infty}\mathbb{E}[H_{11}^{k}]=\sum_{s=0}^{\infty}s\cdot\lim_{n\to\infty}\mathbb{P}\{H^{k}_{11}=s\}\\
={}&\sum_{s=0}^{\infty}s\cdot\mathrm{e}^{-\beta_{11}}\sum_{a_1+\cdots+a_k=s}\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_1}}{a_1!}\bigg(\prod_{t=2}^{k-1}\frac{(a_{t-1}\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_t}}{a_t!}\bigg)\frac{(a_{k-1}\beta_{11})^{a_k}}{a_k!}\\
={}&\sum_{s=0}^{\infty}\mathrm{e}^{-\beta_{11}}\sum_{a_1+\cdots+a_k=s}s\cdot\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_1}}{a_1!}\bigg(\prod_{t=2}^{k-1}\frac{(a_{t-1}\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_t}}{a_t!}\bigg)\frac{(a_{k-1}\beta_{11})^{a_k}}{a_k!}\\
={}&\mathrm{e}^{-\beta_{11}}\sum_{a_1=0}^{\infty}\cdots\sum_{a_k=0}^{\infty}(a_1+\cdots+a_k)\cdot\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_1}}{a_1!}\bigg(\prod_{t=2}^{k-1}\frac{(a_{t-1}\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_t}}{a_t!}\bigg)\frac{(a_{k-1}\beta_{11})^{a_k}}{a_k!}\\
={}&\sum_{t'=1}^{k}\mathrm{e}^{-\beta_{11}}\sum_{a_1=0}^{\infty}\cdots\sum_{a_k=0}^{\infty}a_{t'}\cdot\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_1}}{a_1!}\bigg(\prod_{t=2}^{k-1}\frac{(a_{t-1}\beta_{11}\mathrm{e}^{-\beta_{11}})^{a_t}}{a_t!}\bigg)\frac{(a_{k-1}\beta_{11})^{a_k}}{a_k!}\\
={}&\sum_{t'=1}^{k}\beta_{11}^{t'}.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
\lim_{n\to\infty}\mathbb{E}[H_{22}^{k}]&=\sum_{t=1}^{k}\beta_{22}^{t},\\
\lim_{n\to\infty}\mathbb{E}[H_{12}^{k}]&=\sum_{t=1}^{k}\beta_{12}\beta_{22}^{t-1},\\
\lim_{n\to\infty}\mathbb{E}[H_{21}^{k}]&=\sum_{t=1}^{k}\beta_{21}\beta_{11}^{t-1}.
\end{aligned}
$$

It follows that

$$
\begin{aligned}
\lim_{n\to\infty}\frac{\alpha_{1}^{k}}{\alpha_{2}^{k}}
&=\lim_{n\to\infty}\frac{\mathbb{E}[H^{k}_{12}]/\mathbb{E}[H^{k}_{11}]}{\mathbb{E}[H^{k}_{21}]/\mathbb{E}[H^{k}_{22}]}\\
&=\frac{\sum_{t=1}^{k}\beta_{12}\beta_{22}^{t-1}\big/\sum_{t=1}^{k}\beta_{11}^{t}}{\sum_{t=1}^{k}\beta_{21}\beta_{11}^{t-1}\big/\sum_{t=1}^{k}\beta_{22}^{t}}\\
&=\frac{\beta_{12}\beta_{22}}{\beta_{21}\beta_{11}}\cdot\frac{\big(\sum_{t=1}^{k}\beta_{22}^{t-1}\big)^{2}}{\big(\sum_{t=1}^{k}\beta_{11}^{t-1}\big)^{2}}\\
&=\bigg(\rho\cdot\frac{\sum_{t=1}^{k}(\rho\beta)^{t-1}}{\sum_{t=1}^{k}\beta^{t-1}}\bigg)^{\!2}.\qed
\end{aligned}
$$
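As a quick numerical illustration of the closed form that ends this derivation, the following minimal sketch evaluates $\big(\rho\cdot\sum_{t=1}^{k}(\rho\beta)^{t-1}/\sum_{t=1}^{k}\beta^{t-1}\big)^{2}$ for given values. The function name `amp_ratio_limit` is hypothetical, and $\rho$ and $\beta$ are treated simply as the positive parameters that appear in the final expression.

```python
def amp_ratio_limit(rho: float, beta: float, k: int) -> float:
    """Evaluate (rho * sum_t (rho*beta)^(t-1) / sum_t beta^(t-1))^2 for t = 1..k.

    Hypothetical helper for illustration only; rho and beta are the
    parameters of the closed-form limit, k is the number of hops.
    """
    numerator = sum((rho * beta) ** (t - 1) for t in range(1, k + 1))
    denominator = sum(beta ** (t - 1) for t in range(1, k + 1))
    return (rho * numerator / denominator) ** 2


if __name__ == "__main__":
    # For k = 1 both sums equal 1, so the ratio reduces to rho^2.
    for k in (1, 2, 3):
        print(k, amp_ratio_limit(rho=2.0, beta=1.0, k=k))
```

For $k=1$ the expression reduces to $\rho^{2}$. For $\rho>1$ the bracketed ratio grows with $k$, since the numerator sum is dominated by higher powers of $\rho\beta$.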

A.3 Proof of Theorem 2.2

Proof.

For $k=2$, note the identity:

$$
\sum_{s=0}^{\infty}\sum_{a=0}^{s}\frac{\lambda^{a}(\mu a)^{s-a}}{a!(s-a)!}=\sum_{a=0}^{\infty}\frac{\lambda^{a}}{a!}\sum_{b=0}^{\infty}\frac{(\mu a)^{b}}{b!}=\sum_{a=0}^{\infty}\frac{\lambda^{a}\mathrm{e}^{\mu a}}{a!}=\mathrm{e}^{\lambda\mathrm{e}^{\mu}}.
$$

It follows that (with $\lambda=(1-r_{1}^{\text{L}})\beta_{11}\mathrm{e}^{-\beta_{11}}$ and $\mu=(1-r_{1}^{\text{L}})\beta_{11}$)

$$
\begin{aligned}
\lim_{n\to\infty}\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{2}_{11}}]={}&\sum_{s=0}^{\infty}(1-r_{1}^{\text{L}})^{s}\cdot\lim_{n\to\infty}\mathds{P}\{H^{2}_{11}=s\}\\
={}&\sum_{s=0}^{\infty}(1-r_{1}^{\text{L}})^{s}\cdot\mathrm{e}^{-\beta_{11}}\sum_{a=0}^{s}\frac{(\beta_{11}\mathrm{e}^{-\beta_{11}})^{a}(\beta_{11}a)^{s-a}}{a!(s-a)!}\\
={}&\mathrm{e}^{-\beta_{11}}\sum_{s=0}^{\infty}\sum_{a=0}^{s}\frac{((1-r_{1}^{\text{L}})\beta_{11}\mathrm{e}^{-\beta_{11}})^{a}((1-r_{1}^{\text{L}})\beta_{11}a)^{s-a}}{a!(s-a)!}\\
={}&\mathrm{e}^{-\beta_{11}}\mathrm{e}^{(1-r_{1}^{\text{L}})\beta_{11}\mathrm{e}^{-\beta_{11}}\mathrm{e}^{(1-r_{1}^{\text{L}})\beta_{11}}}\\
={}&\mathrm{e}^{-(1-(1-r_{1}^{\text{L}})\mathrm{e}^{-r_{1}^{\text{L}}\beta_{11}})\beta_{11}}\\
\approx{}&\mathrm{e}^{-\beta_{11}}.
\end{aligned}
$$

For $k=3$, similarly,

$$
\lim_{n\to\infty}\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{3}_{11}}]=\mathrm{e}^{-\big(1-(1-r_{1}^{\text{L}})\beta_{11}\mathrm{e}^{-\big(1-(1-r_{1}^{\text{L}})\beta_{11}\mathrm{e}^{-r_{1}^{\text{L}}\beta_{11}}\big)\beta_{11}}\big)\beta_{11}}\approx\mathrm{e}^{-\beta_{11}}.
$$

In general, the result for $k$ has $k$ nested exponentiations, but we still have:

$$
\lim_{n\to\infty}\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{k}_{11}}]\approx\mathrm{e}^{-\beta_{11}}.
$$

Similarly,

$$
\begin{aligned}
\lim_{n\to\infty}\mathbb{E}[(1-r_{2}^{\text{L}})^{H^{k}_{12}}]&\approx\mathrm{e}^{-\beta_{12}},\\
\lim_{n\to\infty}\mathbb{E}[(1-r_{2}^{\text{L}})^{H^{k}_{22}}]&\approx\mathrm{e}^{-\beta_{22}},\\
\lim_{n\to\infty}\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{k}_{21}}]&\approx\mathrm{e}^{-\beta_{21}}.
\end{aligned}
$$

By the law of total probability and the independence of $H^{k}_{i1}$ and $H^{k}_{i2}$,

$$
\begin{aligned}
\frac{\delta^{k}_{1}}{\delta^{k}_{2}}={}&\frac{\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{k}_{11}+1}(1-(1-r_{2}^{\text{L}})^{H^{k}_{12}})]}{\mathbb{E}[(1-r_{2}^{\text{L}})^{H^{k}_{22}+1}(1-(1-r_{1}^{\text{L}})^{H^{k}_{21}})]}\\
={}&\frac{(1-r_{1}^{\text{L}})\,\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{k}_{11}}]\,(1-\mathbb{E}[(1-r_{2}^{\text{L}})^{H^{k}_{12}}])}{(1-r_{2}^{\text{L}})\,\mathbb{E}[(1-r_{2}^{\text{L}})^{H^{k}_{22}}]\,(1-\mathbb{E}[(1-r_{1}^{\text{L}})^{H^{k}_{21}}])}.
\end{aligned}
$$

It follows that

$$
\begin{aligned}
\lim_{n\to\infty}\frac{\delta^{k}_{1}}{\delta^{k}_{2}}\approx{}&\frac{(1-r_{1}^{\text{L}})\mathrm{e}^{-\beta_{11}}(1-\mathrm{e}^{-\beta_{12}})}{(1-r_{2}^{\text{L}})\mathrm{e}^{-\beta_{22}}(1-\mathrm{e}^{-\beta_{21}})}\\
\approx{}&\frac{1-r_{1}^{\text{L}}}{1-r_{2}^{\text{L}}}\mathrm{e}^{\beta_{22}-\beta_{11}}=\frac{1-r_{1}^{\text{L}}}{1-r_{2}^{\text{L}}}\mathrm{e}^{(\rho-1)\beta}.\qed
\end{aligned}
$$
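The double-series identity invoked at the start of this proof for $k=2$ can be sanity-checked numerically. The sketch below truncates the outer sum and compares it with $\mathrm{e}^{\lambda\mathrm{e}^{\mu}}$. The function name `double_series` and the truncation length are illustrative assumptions, and the check is not part of the proof.

```python
import math


def double_series(lam: float, mu: float, terms: int = 60) -> float:
    """Truncated sum over s < terms, a <= s of lam^a (mu*a)^(s-a) / (a! (s-a)!).

    Hypothetical helper; `terms` is an arbitrary truncation point chosen
    large enough for small lam and mu. Uses the convention 0**0 == 1.
    """
    total = 0.0
    for s in range(terms):
        for a in range(s + 1):
            total += (lam ** a) * ((mu * a) ** (s - a)) / (
                math.factorial(a) * math.factorial(s - a)
            )
    return total


if __name__ == "__main__":
    lam, mu = 0.3, 0.5
    lhs = double_series(lam, mu)
    rhs = math.exp(lam * math.exp(mu))
    print(lhs, rhs)  # the two values should agree to many digits
```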

Appendix B Reproducibility

In this section, we describe the detailed experimental settings including (§B.1) data statistics, (§B.2) baseline settings, and (§B.3) evaluation protocols. The source code for implementing and evaluating Bat and all the CIGL baseline methods will be released after the paper is published.

B.1 Data Statistics

As previously described, we adopt five benchmark graph datasets: the Cora, CiteSeer, and PubMed citation networks (Sen et al., 2008), and the CS and Physics coauthor networks (Shchur et al., 2018) to test Bat on large graphs with more nodes and high-dimensional features. All datasets are publicly available (https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html). Table 4 summarizes the dataset statistics.

Table 4: Statistics of datasets.
Dataset #nodes #edges #features #classes
Cora 2,708 10,556 1,433 7
CiteSeer 3,327 9,104 3,703 6
PubMed 19,717 88,648 500 3
CS 18,333 163,788 6,805 15
Physics 34,493 495,924 8,415 5

We follow previous works (Zhao et al., 2021b; Park et al., 2022; Song et al., 2022a) to construct and adjust the class-imbalanced node classification tasks. For step imbalance, we select half of the classes ($\lfloor m/2\rfloor$) as minority classes and the rest as majority classes. We follow the public split (Sen et al., 2008) for semi-supervised node classification, where each class has 20 training nodes, then randomly remove minority-class training nodes until the given imbalance ratio (IR) is met. The imbalance ratio is defined as $\text{IR}=\frac{\#\text{majority training nodes}}{\#\text{minority training nodes}}\in[1,\infty)$, i.e., more imbalanced data has a higher IR. For natural imbalance, we simulate the long-tail class imbalance present in real-world data using a power-law distribution. Specifically, for a given IR, the largest head class has $n_{\text{head}}=\text{IR}$ training nodes, and the smallest tail class has 1 training node. The number of training nodes of the $k$-th class is determined by $n_{k}=\lfloor n_{\text{head}}^{\lambda_{k}}\rfloor$, with $\lambda_{k}=\frac{m-k}{m-1}$.
We set the IR (largest class to smallest class) to 50 and 100 to test Bat’s robustness under natural and extreme class imbalance. We show the training data distribution under step and natural imbalance in Fig. 8.
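The two imbalance protocols described above can be sketched as follows. This is a minimal illustration rather than the released implementation: the function names are hypothetical, the choice of the last $\lfloor m/2\rfloor$ class indices as minority classes is an assumption, and the small epsilon before the floor is only a guard against floating-point rounding.

```python
import math


def step_imbalance_counts(num_classes: int, n_major: int = 20, ir: float = 10) -> list:
    """Per-class training-set sizes under step imbalance.

    Assumption: the last floor(m/2) class indices are the minority classes,
    and each minority class keeps n_major / IR training nodes (at least 1).
    """
    n_minor_classes = num_classes // 2
    n_minor = max(1, int(n_major / ir))
    return [n_major] * (num_classes - n_minor_classes) + [n_minor] * n_minor_classes


def natural_imbalance_counts(num_classes: int, ir: int) -> list:
    """Per-class sizes n_k = floor(n_head ** lambda_k), lambda_k = (m - k) / (m - 1)."""
    n_head = ir  # the head class has IR training nodes, the tail class has 1
    counts = []
    for k in range(1, num_classes + 1):
        lam = (num_classes - k) / (num_classes - 1)
        # 1e-9 guards against results such as 9.999999999 for exact powers
        counts.append(math.floor(n_head ** lam + 1e-9))
    return counts


if __name__ == "__main__":
    print(step_imbalance_counts(7, n_major=20, ir=10))
    print(natural_imbalance_counts(7, ir=100))
```

With seven classes (as in Cora) and IR = 100, the natural protocol yields a head class of 100 training nodes that decays geometrically to a single node for the tail class.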

Figure 8: Class distribution of training datasets under step and natural imbalance.

B.2 Baseline Settings

To fully validate Bat’s performance and compatibility with existing CIGL techniques and GNN backbones, we include six baseline methods with five popular GNN backbones in our experiments, and combine Bat with them under all possible combinations. The included CIGL baselines can be generally divided into two categories: reweighting-based (i.e., Reweight (Japkowicz & Stephen, 2002) and ReNode (Chen et al., 2021)) and augmentation-based (i.e., Oversample (Japkowicz & Stephen, 2002), SMOTE (Chawla et al., 2002), GraphSMOTE (Zhao et al., 2021b), and GraphENS (Park et al., 2022)).

  • Reweight (Japkowicz & Stephen, 2002) assigns higher misclassification costs (i.e., weights in the loss function) to minority classes, set by the inverse of the class frequency in the training set.

  • ReNode (Chen et al., 2021) measures the influence conflict of training nodes and performs instance-wise node reweighting to alleviate the topology imbalance.

  • Oversample (Japkowicz & Stephen, 2002) augments minority classes with additional synthetic nodes by replication-based oversampling.

  • SMOTE (Chawla et al., 2002) synthesizes minority nodes by 1) randomly selecting a seed node, 2) finding its $k$-nearest neighbors in the feature space, and 3) performing linear interpolation between the seed and one of its $k$-nearest neighbors.

  • GraphSMOTE (Zhao et al., 2021b) extends SMOTE (Chawla et al., 2002) to graph-structured data by 1) performing SMOTE in the low-dimensional embedding space of GNN and 2) utilizing a learnable edge predictor to generate better topology connections for synthetic nodes.

  • GraphENS (Park et al., 2022) directly synthesizes the whole ego network (a node with its 1-hop neighbors) for minority classes by similarity-based ego network combining and saliency-based node mixing to prevent neighbor memorization.
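To make the two simplest baselines in the list above concrete, the following minimal sketch computes inverse-frequency class weights (Reweight) and one SMOTE-style synthetic sample. Both functions are hypothetical illustrations, not the public implementations: the normalization $N/(m\,n_c)$ of the class weights is an assumption, and the SMOTE sketch works on plain feature vectors without any graph structure.

```python
import math
import random


def inverse_frequency_weights(class_counts):
    """Class weights proportional to 1 / class frequency.

    Assumption: weights are normalized as N / (m * n_c), so that a
    perfectly balanced training set gives weight 1 to every class.
    """
    total, m = sum(class_counts), len(class_counts)
    return [total / (m * n_c) for n_c in class_counts]


def smote_sample(points, k=2, rng=None):
    """Synthesize one point from a list of minority-class feature vectors.

    1) pick a random seed point, 2) find its k nearest neighbors
    (Euclidean distance), 3) interpolate linearly toward one of them.
    """
    rng = rng or random.Random(0)
    seed_idx = rng.randrange(len(points))
    seed = points[seed_idx]
    others = [p for i, p in enumerate(points) if i != seed_idx]
    neighbors = sorted(others, key=lambda p: math.dist(seed, p))[:k]
    neighbor = rng.choice(neighbors)
    lam = rng.random()  # interpolation coefficient in [0, 1)
    return [s + lam * (q - s) for s, q in zip(seed, neighbor)]


if __name__ == "__main__":
    print(inverse_frequency_weights([20, 20, 2]))
    print(smote_sample([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]))
```

Because the synthetic point lies on a segment between two existing minority points, it always stays inside their bounding box.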

Baseline implementation details. We use the public implementations of the baseline methods (https://github.com/victorchen96/renode, https://github.com/TianxiangZhao/GraphSmote, https://github.com/JoonHyung-Park/GraphENS) for a fair comparison. For ReNode (Chen et al., 2021), we use its transductive version and search hyperparameters among the lower bound of cosine annealing $w_{min}\in\{0.25,0.5,0.75\}$ and the upper bound of cosine annealing $w_{max}\in\{1.25,1.5,1.75\}$ following the original paper. We set the teleport probability of PageRank to $\alpha=0.15$, as given by the default setting in the released implementation. As Oversample (Cui et al., 2019) and SMOTE (Chawla et al., 2002) were not proposed to handle graph data, we adopt their enhanced versions provided by GraphSMOTE (Zhao et al., 2021b), which also duplicate the edges from the seed nodes to the synthesized nodes in order to connect them to the graph. For GraphSMOTE (Zhao et al., 2021b), we use the version that predicts edges with binary values, as it performs better than the variant with continuous edge predictions on many datasets. For GraphENS (Park et al., 2022), we follow the settings in the paper: a $\text{Beta}(2,2)$ distribution is used to sample $\lambda$, the feature masking hyperparameter $k$ and temperature $\tau$ are tuned among $\{1,5,10\}$ and $\{1,2\}$, respectively, and the number of warmup epochs is set to 5.

Combining Bat and baseline CIGL techniques. Since Bat only manipulates the graph data and remains independent of the model architecture, it seamlessly integrates with the aforementioned CIGL techniques. During each training epoch, Bat enhances the original graph $\mathcal{G}$ using the current model $f$ and yields the augmented graph $\mathcal{G}^{*}$. Subsequently, other CIGL methods operate on the augmented graph $\mathcal{G}^{*}$. Specifically, loss function engineering methods (Reweight and ReNode) perform loss computation and backpropagation based on $\mathcal{G}^{*}$, and data augmentation methods (Oversample, SMOTE, GraphSMOTE, and GraphENS) carry out additional class-balancing operations on $\mathcal{G}^{*}$, generating new minority nodes based on its structure.
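The per-epoch order of operations described above can be sketched as a small training loop. This is only a schematic under stated assumptions: `bat_augment`, `rebalance`, and `train_step` are hypothetical callables supplied by the caller, not functions from the released code, and loss-based methods such as Reweight would live inside `train_step` rather than in `rebalance`.

```python
def train_with_bat(graph, model, epochs, bat_augment, rebalance, train_step):
    """Schematic epoch loop: augment the ORIGINAL graph, then apply class rebalancing.

    bat_augment(graph, model) -> augmented graph G* (hypothetical callable)
    rebalance(graph)          -> graph after class-balancing augmentation,
                                 or None to skip this stage (hypothetical)
    train_step(model, graph)  -> one optimization step, including any
                                 reweighted loss (hypothetical)
    """
    for _ in range(epochs):
        # G* is recomputed every epoch from the original graph and current model
        g_star = bat_augment(graph, model)
        # augmentation-based CIGL methods then operate on G*
        g_train = rebalance(g_star) if rebalance is not None else g_star
        train_step(model, g_train)
    return model


if __name__ == "__main__":
    calls = []
    train_with_bat(
        "G", "f", 2,
        bat_augment=lambda g, f: calls.append("augment") or g + "*",
        rebalance=lambda g: calls.append("rebalance") or g + "+",
        train_step=lambda f, g: calls.append("step on " + g),
    )
    print(calls)
```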

GNN backbone implementation details. We use PyTorch (Paszke et al., 2019) and torch_geometric (Fey & Lenssen, 2019) to implement all five GNN backbones used in this paper, i.e., GCN (Welling & Kipf, 2016), GAT (Veličković et al., 2018), GraphSAGE (Hamilton et al., 2017), APPNP (Gasteiger et al., 2018), and GPRGNN (Chien et al., 2020). Most of our settings are aligned with prevailing works (Park et al., 2022; Chen et al., 2021; Song et al., 2022a) to obtain fair and comparable results. Specifically, we implement all GNNs’ convolution layers with ReLU activation and dropout (Srivastava et al., 2014) with a dropping rate of 0.5 before the last layer. For GAT, we set the number of attention heads to 4. For APPNP and GPRGNN, we follow the best setting in the original paper and use 2 APPNP/GPR_prop convolution layers with 64 hidden units, with the teleport probability set to 0.1 and the number of power iteration steps set to K = 10. Note that GraphENS’s official implementation requires modifying the graph convolution for resampling and thus cannot be directly combined with APPNP and GPRGNN. We search for the best architecture for the other backbones from #layers $l\in\{1,2,3\}$ and hidden dimension $d\in\{64,128,256\}$ based on the average of validation accuracy and F1 score. We train each GNN for 2,000 epochs using the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.01. To achieve better convergence, we follow Park et al. (2022) and use a weight decay of 5e-4 together with the ReduceLROnPlateau scheduler in PyTorch, which halves the learning rate if the validation loss does not improve for 100 epochs.

B.3 Evaluation Protocol

To evaluate the predictive performance on class-imbalanced data, we use two balanced metrics, i.e., balanced accuracy (BAcc.) and macro-averaged F1 score (Macro-F1). They compute the accuracy/F1 score for each class independently and use the unweighted mean as the final score, i.e., BAcc. $=\frac{1}{m}\sum_{i=1}^{m}Acc(c_{i})$ and Macro-F1 $=\frac{1}{m}\sum_{i=1}^{m}F1(c_{i})$. Additionally, we use the performance standard deviation (PerfStd) to evaluate the level of model predictive bias. Formally, let $Acc(c_{i})$ be the classification accuracy of class $c_{i}$; PerfStd is then defined as the standard deviation of the accuracy scores of all classes, i.e., $\sqrt{\frac{1}{m}\sum_{i=1}^{m}(Acc(c_{i})-\textit{BAcc.})^{2}}$.
All the experiments are conducted on a Linux server with an Intel® Xeon® Gold 6240R CPU and an NVIDIA® Tesla V100 32GB GPU.
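The three metrics of this protocol can be computed directly from label and prediction lists. The following is a minimal sketch under the definitions given in this section; the function names are hypothetical, and the set of classes is assumed to be the set of labels present in the ground truth.

```python
import math


def per_class_accuracy(y_true, y_pred, classes):
    """Acc(c): fraction of class-c nodes that are predicted as c."""
    acc = {}
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        acc[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return acc


def per_class_f1(y_true, y_pred, classes):
    """F1(c) computed one-vs-rest for each class c."""
    f1 = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1


def balanced_metrics(y_true, y_pred):
    """Return (BAcc, Macro-F1, PerfStd) as defined in the text."""
    classes = sorted(set(y_true))
    m = len(classes)
    acc = per_class_accuracy(y_true, y_pred, classes)
    f1 = per_class_f1(y_true, y_pred, classes)
    bacc = sum(acc.values()) / m
    macro_f1 = sum(f1.values()) / m
    # Population standard deviation of per-class accuracy around BAcc.
    perf_std = math.sqrt(sum((a - bacc) ** 2 for a in acc.values()) / m)
    return bacc, macro_f1, perf_std


if __name__ == "__main__":
    # Toy imbalanced example: 8 majority nodes, 2 minority nodes.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 8 + [0, 1]
    print(balanced_metrics(y_true, y_pred))
```

In the toy example plain accuracy is 0.9, while the balanced accuracy is only 0.75 because half of the minority class is misclassified; PerfStd makes that per-class disparity explicit.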

Appendix C Extended Discussions

In this section, we present an ablation study (C.1) to validate the effectiveness and efficiency of the key modules. We then discuss how to further speed up Bat in practice (C.2), how to choose between Bat0 and Bat1 (C.3), and finally, the limitations and future work (C.4).

C.1 Ablation Study

We present an ablation study to validate the effectiveness and efficiency of the key modules in Bat. Specifically, for node risk estimation, we compare our total-variation-distance-based uncertainty with (i) a naïve random assignment that draws the uncertainty score from a uniform distribution $U(0,1)$ and (ii) the information entropy $\mathbf{H}(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log_{2}(p(y))$. We substitute the original uncertainty metric with each of these alternatives in Bat0 and assess their impact on performance as well as the computational time required for uncertainty estimation. It is worth noting that in our implementation, the computation is parallelized on GPU (assuming sufficient GPU memory). Therefore, the computational time of a given uncertainty measure remains consistent across the three datasets we employed (Cora, CiteSeer, PubMed with IR=10). The detailed results are presented in Table 5, revealing that: (i) Randomly assigned uncertainty scores significantly impede the performance of Bat, resulting in a large drop in both balanced accuracy and Macro-F1. (ii) Compared with our approach, employing information entropy as the node uncertainty score requires $\sim$2.3x the computation time, yet its influence on performance remains marginal.
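To make the three alternatives compared in this ablation concrete, the sketch below computes each score for a single node's predicted class distribution. The entropy and random scores follow the definitions stated above. The total-variation score is written under an assumption: that it is the total variation distance between the predicted distribution and the one-hot vector of the predicted class, which reduces to one minus the largest probability. The exact formulation used by Bat is given in the main paper and is not reproduced in this appendix, so the function `tv_uncertainty` should be read as illustrative only. All function names are hypothetical.

```python
import math
import random


def random_uncertainty(probs, rng=random):
    """Baseline (i): a score drawn from U(0, 1), ignoring the prediction."""
    return rng.random()


def entropy_uncertainty(probs):
    """Baseline (ii): Shannon entropy (base 2) of the predicted distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def tv_uncertainty(probs):
    """ASSUMED form: TV distance between probs and the one-hot vector of
    the predicted (argmax) class. Equals 1 - max(probs)."""
    k = max(range(len(probs)), key=probs.__getitem__)
    one_hot = [1.0 if i == k else 0.0 for i in range(len(probs))]
    return 0.5 * sum(abs(p - q) for p, q in zip(probs, one_hot))


if __name__ == "__main__":
    confident = [0.9, 0.05, 0.05]
    ambiguous = [0.4, 0.35, 0.25]
    for name, probs in [("confident", confident), ("ambiguous", ambiguous)]:
        print(name, entropy_uncertainty(probs), tv_uncertainty(probs))
```

Under this assumed form, the score needs only a maximum per node and no logarithms, which is one plausible reason for it being cheaper than entropy; the timing numbers in Table 5 are the authoritative comparison.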

Table 5: Ablation study on node risk estimation of Bat.
| Uncertainty | Cora BAcc | Cora Macro-F1 | CiteSeer BAcc | CiteSeer Macro-F1 | PubMed BAcc | PubMed Macro-F1 | Computation Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random | 61.64±1.89 | 59.44±1.71 | 46.59±2.29 | 44.37±3.23 | 61.60±1.69 | 58.13±1.81 | 0.0249 |
| Information Entropy | 65.18±1.68 | 63.11±1.91 | 51.87±2.96 | 50.36±3.43 | 67.72±1.27 | 67.19±1.57 | 0.1257 |
| TVDistance (ours) | 65.54±1.25 | 63.28±1.07 | 52.65±1.08 | 51.55±1.28 | 68.62±0.77 | 67.16±1.53 | 0.0543 |

Further, we conduct an ablation study for our posterior likelihood estimation strategy by comparing our $0^{\text{th}}$-order (Bat0) and $1^{\text{st}}$-order (Bat1) likelihood estimation methods with a random method that assigns (unnormalized) node-class likelihoods by drawing from a uniform distribution $U(0,1)$. Results are shown in Table 6. We observe that the random method significantly worsens the predictive performance on all CIGL tasks. Altogether, the ablation study results confirm the effectiveness and efficiency of the design of Bat, showcasing its ability to deliver strong performance with minimal computational overhead.

Table 6: Ablation study on posterior likelihood estimation of Bat.
| Estimation | Cora BAcc | Cora Macro-F1 | CiteSeer BAcc | CiteSeer Macro-F1 | PubMed BAcc | PubMed Macro-F1 | Computation Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random | 63.85±2.17 | 61.94±2.68 | 46.51±3.27 | 41.70±4.33 | 64.32±1.23 | 53.58±2.48 | 0.0883 |
| 0th-order (Bat0) | 65.54±1.25 | 63.28±1.07 | 52.65±1.08 | 51.55±1.28 | 68.62±0.77 | 67.16±1.53 | 0.1251 |
| 1st-order (Bat1) | 69.80±1.30 | 68.68±1.49 | 55.37±1.39 | 54.94±1.44 | 67.57±3.22 | 64.40±3.68 | 0.3030 |

C.2 On the Further Speedup of Bat

As stated in the paper, thanks to its simple and efficient design, Bat can be integrated into the GNN training process to perform dynamic topology augmentation based on the training state. By default, we run Bat in every iteration of GNN training, i.e., the granularity of applying Bat is 1, as described in Alg. 1. In practice, however, this granularity can be increased to further reduce the cost of applying Bat. Doing so yields a linear speedup: setting the granularity to $N$ reduces the computational overhead of Bat to $1/N$ of the original (i.e., an $N$x speedup), with minor performance degradation. This could be helpful for scaling Bat to large-scale graphs. In this section, we design experiments to validate the influence of different Bat granularities (i.e., the number of training iterations per use of Bat) in real-world CIGL tasks. We set the granularity to 1/5/10/50/100 and test the performance of BatT with a vanilla GCN classifier on the Cora/CiteSeer/PubMed datasets with an imbalance ratio of 10. Fig. 9 shows the empirical results from 10 independent runs. The red horizontal line in each subfigure represents the baseline (vanilla GCN) performance. The results show that a larger Bat granularity is an effective way to further speed up Bat in practice: the performance drop is relatively minor, especially considering the linear speedup it brings, and the predictive performance boost brought by Bat remains significant even at a granularity of 100 (i.e., with a 100x Bat speedup).
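The granularity mechanism described above amounts to refreshing the augmented graph only every N training iterations and reusing it in between. The following is a minimal sketch of that control flow; `train_with_granularity`, `augment`, and `train_step` are hypothetical placeholders and do not correspond to the interface of the released Bat code.

```python
def train_with_granularity(num_iters, granularity, augment, train_step):
    """Run `num_iters` training iterations, calling `augment` only once
    every `granularity` iterations and reusing its result in between.

    Returns the number of times `augment` was invoked.
    """
    augmented_graph = None
    augment_calls = 0
    for it in range(num_iters):
        if it % granularity == 0:
            # Refresh the augmented topology from the current training state.
            augmented_graph = augment(it)
            augment_calls += 1
        # Every iteration still trains, but on the most recent augmentation.
        train_step(augmented_graph)
    return augment_calls


if __name__ == "__main__":
    # Dummy callables that stand in for the real augmentation and GNN update.
    dummy_augment = lambda it: {"refreshed_at": it}
    dummy_train_step = lambda graph: None

    for n in (1, 5, 10, 50, 100):
        calls = train_with_granularity(2000, n, dummy_augment, dummy_train_step)
        print(f"granularity={n:>3}: augmentation ran {calls} times")
```

With 2,000 training iterations, a granularity of 100 invokes the augmentation 20 times instead of 2,000, which is the 100x reduction in augmentation cost referred to in the text; the training cost itself is unchanged.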

(a) Influence of Bat granularity on BAcc.
(b) Influence of Bat granularity on Macro-F1.
Figure 9: Influence of the Bat granularity (i.e., the number of training iterations per use of Bat). Note that this brings a linear speedup ratio in practice, e.g., granularity = 100 $\Leftrightarrow$ 100x speedup.

C.3 Choosing between Bat0 and Bat1

In this section, we summarize the strengths and limitations of Bat0 and Bat1, and give suggestions for choosing between them in practice. In short, we recommend using Bat1 to achieve better classification performance. When computational resources are limited, however, Bat0 can serve as a more efficient alternative. The reasons are as follows:

Performance. We observe a noticeable performance gap between Bat0 and Bat1, wherein Bat1 generally demonstrates superior classification performance due to its incorporation of local topological structure. Across the 15 scenarios outlined in Table 1 (the best scores for 3 datasets x 5 GNN backbones): (1) Bat1 outperforms Bat0 in 12 out of 15 scenarios for BAcc and in 11 out of 15 for Macro-F1, with an average F1 advantage of 1.692 in the 11 leading cases. (2) Conversely, Bat0 exhibits a less pronounced advantage in the instances where it outperforms Bat1, with an average advantage of 0.518 in the 4 leading scenarios.
Footnote 6: Despite Bat1 holding a relative performance edge, both Bat0 and Bat1 substantially enhance the performance of the best-performing CIGL baseline methods. Over the 15 scenarios, Bat0 yields an average improvement of 4.789/5.848 in the best BAcc/F1, while Bat1 brings an average improvement of 5.805/6.950.
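As a worked check of the scenario count for BAcc, the comparison can be reproduced from the "Best" rows of Table 9 in Appendix D.3 (which, as stated there, complements Table 1). The sketch below copies those best scores and counts the scenarios in which each variant leads; the dictionary layout and function name are illustrative choices, and only the BAcc values are covered.

```python
# (best BAcc with Bat0, best BAcc with Bat1) per (backbone, dataset),
# copied from the "Best" rows of Table 9.
BEST_BACC = {
    ("GCN", "Cora"): (72.51, 74.24),
    ("GCN", "CiteSeer"): (60.60, 62.67),
    ("GCN", "PubMed"): (76.11, 76.91),
    ("GAT", "Cora"): (72.14, 73.29),
    ("GAT", "CiteSeer"): (60.95, 63.49),
    ("GAT", "PubMed"): (75.55, 75.65),
    ("SAGE", "Cora"): (71.31, 73.02),
    ("SAGE", "CiteSeer"): (64.36, 66.35),
    ("SAGE", "PubMed"): (75.89, 77.38),
    ("APPNP", "Cora"): (75.02, 73.78),
    ("APPNP", "CiteSeer"): (66.62, 65.57),
    ("APPNP", "PubMed"): (73.37, 74.90),
    ("GPRGNN", "Cora"): (74.01, 74.89),
    ("GPRGNN", "CiteSeer"): (64.16, 63.89),
    ("GPRGNN", "PubMed"): (75.69, 77.49),
}


def leading_margins(best_scores):
    """Split scenarios by which variant leads and return the margins."""
    bat1_margins = [b1 - b0 for b0, b1 in best_scores.values() if b1 > b0]
    bat0_margins = [b0 - b1 for b0, b1 in best_scores.values() if b0 > b1]
    return bat1_margins, bat0_margins


if __name__ == "__main__":
    bat1_margins, bat0_margins = leading_margins(BEST_BACC)
    print("Bat1 leads in", len(bat1_margins), "scenarios")
    print("Bat0 leads in", len(bat0_margins), "scenarios")
    print("mean Bat1 margin:", sum(bat1_margins) / len(bat1_margins))
    print("mean Bat0 margin:", sum(bat0_margins) / len(bat0_margins))
```

The same procedure applies to Macro-F1 using the "Best" rows of Table 10.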

Efficiency. On the other hand, it’s worth noting that Bat1 generally incurs higher time and space complexity compared to Bat0. Specifically, Bat0 demonstrates linear complexity concerning the number of nodes, whereas Bat1 exhibits linear growth in complexity with the number of edges. Given that real-world graph data often features a significantly larger number of edges than nodes, Bat0 is usually the more efficient option (especially for densely connected graphs).

C.4 Limitations and Future Works

One potential limitation of the proposed Bat framework is its reliance on exploiting model prediction for risk and likelihood estimation. This strategy may not provide accurate estimation when the model itself exhibits extremely poor predictive performance. However, this rarely occurs in practice and can be prevented by more careful fine-tuning of parameters and model architectures. In addition to this, as described in Section 3, we adopt several fast measures to estimate node uncertainty, prediction risk, and posterior likelihood for the sake of efficiency. Other techniques for such purposes (e.g., deterministic (Liu et al., 2020; Zhao et al., 2020)/Bayesian (Zhang et al., 2019; Hasanzadeh et al., 2020)/Jackknife (Kang et al., 2022a) uncertainty estimation) can be easily integrated into the proposed Bat framework, although the computational efficiency might be a major bottleneck. How to exploit alternative uncertainty/risk measures while retaining computational efficiency is an interesting future direction.

Beyond the class imbalance in the node label distribution, graph data can also exhibit multi-facet skewness in other aspects. For instance, class imbalance may also exist at the edge level (e.g., in edge classification/prediction (Cai et al., 2021; Pandey et al., 2019)) and at the graph level (e.g., in graph classification/alignment (Zeng et al., 2023a, 2024; Yan et al., 2021b)). Handling class imbalance can be even more challenging on evolving/dynamic graphs (Fu & He, 2021; Yan et al., 2021a). Beyond the quantity imbalance among classes, skewness may also exist in the topological structure, such as degree imbalance (Kang et al., 2022b) and motif-level imbalance (Zhao et al., 2022; Zeng et al., 2023b). How to jointly consider the multi-facet node/edge/graph-level imbalance to benefit more graph learning tasks is an exciting future direction. Finally, recent work on fairness-aware graph learning (Kang et al., 2020; Fu et al., 2023) has observed that the topology of the graph can also introduce bias/discrimination towards certain groups; extending Bat's concept to mitigate group unfairness through topology augmentation is an intriguing future direction.

Appendix D Additional Experimental Results and Analysis

D.1 Results on Additional Large-scale Graphs

To further test the scalability of Bat and the baseline CIGL methods, we extend the experiments to two large-scale graph datasets, CoraFull (19,793 nodes, 126,842 edges) (Bojchevski & Günnemann, 2017) and arXiv (169,343 nodes, 1,166,243 edges) (Hu et al., 2020). To ensure comprehensive and fair comparisons, we follow the same protocol as Table 1 to evaluate six baseline methods and the performance gains brought by Bat. We use GCN as the backbone. It is worth noting that CIGL on these large graphs is a highly challenging task due to (i) the inherent task complexity (with 70/172 classes), (ii) the label scarcity for numerous minority classes, and (iii) the requirement for the algorithm's scalability. This may be the reason why most existing works (ReNode (Chen et al., 2021), GraphSMOTE (Zhao et al., 2021b), GraphENS (Park et al., 2022), TAM (Song et al., 2022a), LTE4G (Yun et al., 2022), etc.) did not consider these large datasets. Nevertheless, we conduct the experiments and report the results in Table 7. We can observe that:

  • Bat can scale to large-scale CIGL tasks, consistently and significantly enhancing the performance of various CIGL baselines (overall with roughly 10% relative performance gain).

  • The linear complexity of Bat makes it scalable to large graphs, while GraphENS, ReNode, and GraphSMOTE face scalability issues when applied to the arXiv dataset (with 169,343 nodes) due to their $O(n^{2})$ space complexity.

Table 7: Experiments on large-scale graphs. The numbers in parentheses are the performance gains brought by Bat.
| Metric | Dataset | Setting | ERM | Reweight | ReNode | Resample | SMOTE | GraphSMOTE | GraphENS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BAcc.↑ | CoraFull | Base | 36.12 | 40.93 | 40.10 | 35.71 | 35.59 | 39.25 | 43.76 |
| BAcc.↑ | CoraFull | +Bat0 | 39.40 (+3.28) | 43.81 (+2.88) | 42.78 (+2.68) | 40.77 (+5.05) | 40.31 (+4.72) | 43.71 (+4.46) | 46.20 (+2.44) |
| BAcc.↑ | CoraFull | +Bat1 | 40.88 (+4.76) | 43.77 (+2.84) | 42.82 (+2.72) | 40.67 (+4.95) | 41.04 (+5.46) | 43.70 (+4.45) | 47.19 (+3.44) |
| BAcc.↑ | OGBN-arXiv | Base | 32.20 | 35.69 | OOM | 32.24 | 32.18 | OOM | OOM |
| BAcc.↑ | OGBN-arXiv | +Bat0 | 34.36 (+2.16) | 37.59 (+1.90) | OOM | 37.20 (+4.96) | 36.88 (+4.69) | OOM | OOM |
| BAcc.↑ | OGBN-arXiv | +Bat1 | 36.57 (+4.37) | 39.28 (+3.60) | OOM | 37.36 (+5.12) | 37.50 (+5.32) | OOM | OOM |
| Macro-F1↑ | CoraFull | Base | 33.54 | 38.86 | 37.98 | 32.84 | 32.70 | 37.70 | 41.28 |
| Macro-F1↑ | CoraFull | +Bat0 | 37.25 (+3.71) | 41.21 (+2.35) | 40.58 (+2.61) | 39.04 (+6.19) | 38.35 (+5.65) | 41.25 (+3.55) | 43.90 (+2.61) |
| Macro-F1↑ | CoraFull | +Bat1 | 38.58 (+5.04) | 41.16 (+2.30) | 40.37 (+2.39) | 38.48 (+5.64) | 39.18 (+6.48) | 41.75 (+4.05) | 44.61 (+3.33) |
| Macro-F1↑ | OGBN-arXiv | Base | 29.90 | 32.14 | OOM | 30.16 | 29.96 | OOM | OOM |
| Macro-F1↑ | OGBN-arXiv | +Bat0 | 32.42 (+2.52) | 34.51 (+2.38) | OOM | 34.50 (+4.34) | 34.44 (+4.48) | OOM | OOM |
| Macro-F1↑ | OGBN-arXiv | +Bat1 | 33.99 (+4.08) | 34.78 (+2.64) | OOM | 34.55 (+4.38) | 34.69 (+4.74) | OOM | OOM |
  • *OOM: Out-Of-Memory on an NVIDIA® Tesla V100 32GB GPU.
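A back-of-the-envelope calculation illustrates why quadratic space complexity becomes a problem on arXiv but not on CoraFull. The sketch below assumes, purely for illustration, that a method materializes one dense n x n matrix of 4-byte (float32) entries; what the affected baselines actually store is not specified here, so the numbers indicate an order of magnitude rather than measured usage.

```python
GPU_MEMORY_GB = 32  # Tesla V100 32GB, as used in the experiments


def dense_matrix_gb(num_nodes, bytes_per_entry=4):
    """Size in GB (1 GB = 1e9 bytes) of a dense num_nodes x num_nodes matrix.

    The 4-byte entry size is an assumption (float32).
    """
    return num_nodes * num_nodes * bytes_per_entry / 1e9


if __name__ == "__main__":
    # Node counts as reported in Section D.1.
    for name, n in [("CoraFull", 19_793), ("arXiv", 169_343)]:
        size = dense_matrix_gb(n)
        verdict = "fits" if size < GPU_MEMORY_GB else "exceeds GPU memory"
        print(f"{name}: {size:.2f} GB ({verdict})")
```

Under these assumptions a single dense matrix takes roughly 1.6 GB for CoraFull and roughly 115 GB for arXiv, which is consistent with the OOM entries appearing only on the arXiv rows.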

D.2 Comparison with Additional CIGL Baselines

In the main results, we chose representative model-independent CIGL baselines to test Bat’s ability to cooperate and boost various CIGL techniques. In this section, we compare Bat with other representative CIGL techniques that are either model-dependent or with design constraints making them incompatible with Bat. Specifically, we further include three CIGL baselines LTE4G (Yun et al., 2022), GraphMixup (Wu et al., 2022), TAM (Song et al., 2022a), and compare them with Bat on seven datasets (including the newly introduced large-scale graphs CoraFull and arXiv in Section D.1). We use GCN as the backbone, and follow the main experiment protocol used in Table 1 to ensure fair and comparable results.

The results are reported in Table 8. In short, we observe that: (i) Bat consistently demonstrates better performance and scalability compared to the additional baselines. (ii) Some methods (such as GraphMixup and LTE4G) exhibit higher space complexity and thus cannot scale to large datasets. (iii) Among the baseline methods, LTE4G achieves the overall best performance due to its three-stage training (encoder pre-training, expert training, student training) and divide-and-conquer strategy. We note that its complex training and inference strategies introduce additional computational overhead and make it challenging to integrate with other methods. In comparison, Bat has better compatibility and can further achieve better performance with existing class-rebalancing techniques.

Table 8: Comparison with independent CIGL baselines. We use bold/italics to mark the best/second-best results.
| Metric | Method | Cora | CiteSeer | PubMed | CS | Physics | CoraFull | arXiv |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BAcc.↑ | GCN | 61.6 | 37.6 | 64.2 | 75.4 | 80.1 | 32.2 | 36.1 |
| BAcc.↑ | GraphMixup | 64.1 | 50.8 | OOM | OOM | OOM | OOM | OOM |
| BAcc.↑ | LTE4G | 68.0 | 51.7 | 63.7 | 77.2 | 81.7 | 36.6 | OOM |
| BAcc.↑ | TAM | 65.3 | 50.1 | 65.7 | 77.0 | 82.8 | 35.0 | 36.3 |
| BAcc.↑ | Bat (Ours) | 69.8 | 55.4 | 68.6 | 82.6 | 87.6 | 36.6 | 40.1 |
| Macro-F1↑ | GCN | 60.1 | 28.1 | 55.1 | 72.7 | 80.7 | 33.5 | 29.9 |
| Macro-F1↑ | GraphMixup | 62.7 | 47.3 | OOM | OOM | OOM | OOM | OOM |
| Macro-F1↑ | LTE4G | 67.1 | 49.7 | 61.7 | 76.7 | 82.1 | 37.7 | OOM |
| Macro-F1↑ | TAM | 65.7 | 43.4 | 63.5 | 75.2 | 82.3 | 32.8 | 31.2 |
| Macro-F1↑ | Bat (Ours) | 68.7 | 54.9 | 67.2 | 78.6 | 88.8 | 38.6 | 34.0 |
  • *OOM: Out-Of-Memory on an NVIDIA® Tesla V100 32GB GPU.

D.3 Full Empirical Results with Additional GNN Backbones

Due to space limitations, we report only the key results of our experiments in Tables 1 and 2. We now provide complete results for all settings with the standard error of 5 independent runs. Specifically, Tables 9, 10, and 11 complement Table 1, and Table 12 complements Table 2. We further include APPNP (Gasteiger et al., 2018) and GPRGNN (Chien et al., 2020) as additional GNN backbones for supported CIGL techniques. Note that the official codebase of Park et al. (2022) only implements GraphENS using modified (to enable saliency-based mixup) GCN, GAT, and GraphSAGE backbones, and extending it to other GNN backbones is difficult. The results indicate that Bat can consistently boost various CIGL baselines with all GNN backbones under different types and levels of class imbalance, which aligns with our conclusions in the paper.

Table 9: Balanced accuracy of combining Bat with 6 IGL baselines × 5 GNN backbones.
Dataset (IR=10) Cora CiteSeer PubMed
Metric: BAcc.↑ Base + Bat0 + Bat1 Base + Bat0 + Bat1 Base + Bat0 + Bat1
GCN Vanilla 61.56±1.24 65.54±1.25 69.80±1.30 37.62±1.61 52.65±1.08 55.37±1.39 64.23±1.55 68.62±0.77 67.57±3.22
Reweight 67.65±0.64 70.97±1.28 72.14±0.72 42.49±2.66 57.91±0.98 58.36±1.09 71.20±2.33 74.19±1.12 73.37±0.96
ReNode 66.60±1.33 71.37±0.62 71.84±1.25 42.57±1.05 57.47±0.62 59.28±0.59 71.52±2.16 73.20±0.71 72.53±1.62
Resample 59.48±1.53 72.51±0.68 74.24±0.91 39.15±2.05 57.90±0.33 58.78±1.44 64.97±1.94 72.53±0.85 72.87±1.16
SMOTE 58.27±1.05 72.16±0.53 73.89±1.06 39.27±1.90 60.06±0.81 61.97±1.19 64.41±1.95 73.17±0.84 73.13±0.77
GSMOTE 67.99±1.37 68.52±0.81 71.55±0.50 45.05±1.95 57.68±1.03 57.65±1.18 73.99±0.88 73.09±1.30 76.57±0.42
GENS 70.12±0.43 72.22±0.57 72.58±0.58 56.01±1.17 60.60±0.63 62.67±0.42 73.66±1.04 76.11±0.60 76.91±1.03
Best 70.12 72.51 74.24 56.01 60.60 62.67 73.99 76.11 76.91
GAT Vanilla 61.53±1.13 66.27±0.83 70.13±1.07 39.25±1.84 55.66±1.23 60.34±1.66 65.46±0.69 73.19±0.86 74.75±1.18
Reweight 66.94±1.24 71.80±0.48 71.61±0.85 41.29±3.39 59.33±0.51 61.23±0.99 68.37±1.41 75.30±1.07 74.52±1.14
ReNode 66.81±0.98 72.14±1.24 70.31±1.38 43.25±1.78 58.26±1.98 59.05±0.88 71.18±2.13 75.55±1.01 75.22±0.84
Resample 57.76±1.73 71.90±0.88 73.29±1.08 35.97±1.42 60.10±1.26 60.33±0.75 65.14±0.86 73.27±0.61 73.89±0.40
SMOTE 58.81±0.64 70.50±0.44 72.19±0.75 36.95±1.86 60.59±1.19 62.36±1.18 64.81±1.47 73.90±0.68 74.08±0.51
GSMOTE 64.68±1.02 69.29±1.82 71.14±0.96 41.82±1.14 56.11±1.23 57.71±2.58 68.72±1.69 74.65±0.65 74.41±1.57
GENS 69.76±0.45 70.63±0.40 71.02±1.22 51.50±2.21 60.95±1.51 63.49±0.75 73.13±1.18 74.34±0.35 75.65±0.82
Best 69.76 72.14 73.29 51.50 60.95 63.49 73.13 75.55 75.65
SAGE Vanilla 59.17±1.23 66.24±0.92 66.53±0.80 42.96±0.28 54.99±2.51 53.18±2.90 67.56±0.84 75.31±0.93 77.38±0.68
Reweight 63.76±0.89 70.15±1.15 71.14±0.84 45.91±2.05 57.95±0.73 55.90±0.93 68.03±1.69 74.56±0.41 75.39±0.38
ReNode 65.32±1.07 71.31±1.29 71.54±0.85 48.55±2.31 56.32±0.40 56.49±1.73 69.08±2.04 74.24±0.20 75.28±0.69
Resample 57.77±1.35 71.24±1.08 73.01±1.02 39.37±1.40 61.41±1.11 61.93±1.40 69.22±1.28 74.91±1.09 75.80±0.39
SMOTE 58.81±1.97 70.31±1.35 73.02±2.29 38.42±1.69 64.14±0.75 66.35±0.70 64.96±1.56 74.59±0.96 77.31±0.45
GSMOTE 61.57±1.78 69.88±0.96 72.28±1.48 42.21±2.12 60.91±1.33 62.32±1.06 71.55±0.64 74.74±0.81 76.14±0.21
GENS 68.84±0.41 69.78±1.18 71.92±0.71 52.57±1.78 64.36±0.68 63.84±0.68 71.38±0.99 75.89±1.17 76.46±1.29
Best 68.84 71.31 73.02 52.57 64.36 66.35 71.55 75.89 77.38
APPNP Vanilla 55.37±1.65 58.13±1.69 61.71±1.66 35.69±0.14 35.68±0.15 36.02±0.25 59.30±0.50 55.62±0.31 57.82±0.29
Reweight 72.62±0.47 73.62±0.89 72.51±0.87 50.88±3.64 63.54±1.02 65.57±1.11 72.00±0.81 72.15±0.60 71.22±1.10
ReNode 73.74±1.12 75.02±1.54 72.15±0.76 50.50±3.51 63.73±0.54 65.13±0.40 72.76±1.37 71.54±0.96 71.88±0.70
Resample 65.78±1.72 73.14±0.94 73.57±0.92 40.79±1.87 66.54±0.49 59.51±4.16 67.74±1.94 72.25±0.81 74.41±0.95
SMOTE 65.34±1.68 73.18±1.02 72.88±0.90 40.79±2.05 66.62±0.33 58.82±4.59 67.24±2.10 72.67±1.65 73.33±1.37
GSMOTE 71.13±0.72 73.37±0.82 73.78±0.71 45.37±2.75 64.95±0.11 62.95±2.58 69.57±2.20 73.37±0.95 74.90±1.27
Best 73.74 75.02 73.78 50.88 66.62 65.57 72.76 73.37 74.90
GPRGNN Vanilla 67.97±0.51 71.99±1.14 73.38±1.18 42.31±2.16 55.85±0.89 58.82±1.91 67.04±1.82 57.92±0.45 77.49±1.15
Reweight 72.15±0.57 72.90±1.33 73.22±0.55 53.22±2.89 59.78±0.76 61.00±1.82 73.35±1.07 75.22±1.02 76.86±0.76
ReNode 73.38±0.67 73.71±0.57 73.93±1.60 54.66±2.82 59.69±0.73 60.34±1.31 73.56±0.98 75.69±1.10 76.25±0.67
Resample 67.00±1.33 72.94±1.02 74.89±0.86 42.27±2.15 64.16±0.62 63.89±0.98 70.42±1.51 73.79±0.83 75.31±0.54
SMOTE 66.99±1.33 74.01±1.51 74.41±1.05 40.97±2.02 63.88±0.55 62.60±1.72 70.29±1.47 73.89±0.69 75.48±1.02
GSMOTE 70.94±0.57 73.63±1.25 74.02±0.90 48.01±3.28 63.03±0.92 61.68±0.86 71.51±1.91 72.16±0.58 74.77±0.83
Best 73.38 74.01 74.89 54.66 64.16 63.89 73.56 75.69 77.49
Table 10: Macro-F1 score of combining Bat with 6 IGL baselines × 5 GNN backbones.
Dataset (IR=10) Cora CiteSeer PubMed
Metric: Macro-F1↑ Base + Bat0 + Bat1 Base + Bat0 + Bat1 Base + Bat0 + Bat1
GCN Vanilla 60.10±1.53 63.28±1.07 68.68±1.49 28.05±2.53 51.55±1.28 54.94±1.44 55.09±2.48 67.16±1.53 64.40±3.68
Reweight 67.85±0.62 69.41±1.01 70.31±0.82 36.59±3.66 56.84±1.06 57.54±1.08 67.07±3.42 72.94±0.81 73.24±0.90
ReNode 66.66±1.59 69.79±0.79 70.59±1.25 34.64±1.54 56.69±0.64 58.07±0.77 67.86±3.99 72.61±0.41 72.25±0.89
Resample 57.34±2.27 71.36±0.39 72.82±1.13 29.73±2.77 57.17±0.48 58.03±1.42 56.74±3.54 71.19±0.83 73.13±1.33
SMOTE 55.65±1.62 71.04±0.16 72.82±0.86 29.39±2.81 59.53±0.88 61.53±1.24 56.14±3.74 71.72±0.60 72.83±1.20
GSMOTE 67.60±1.67 68.01±1.00 70.28±0.48 40.07±3.02 56.64±1.09 56.25±1.50 70.60±1.17 72.95±1.39 75.70±0.35
GENS 69.96±0.29 71.62±0.64 72.28±0.65 54.45±1.69 59.89±0.68 62.46±0.43 71.28±1.84 75.77±0.55 76.86±0.93
Best 69.96 71.62 72.82 54.45 59.89 62.46 71.28 75.77 76.86
GAT Vanilla 60.71±1.61 64.27±0.95 68.93±0.79 31.12±3.15 54.71±1.18 59.42±1.55 57.32±1.55 71.27±1.11 74.03±1.08
Reweight 66.49±1.34 69.84±0.91 69.79±0.77 34.94±4.09 58.53±0.68 60.28±1.12 63.82±1.60 75.13±1.13 73.88±1.38
ReNode 67.27±1.23 70.61±0.83 68.24±1.48 37.72±2.61 57.64±2.11 58.57±0.75 67.38±3.22 74.88±0.99 74.96±1.18
Resample 55.36±2.47 70.87±0.94 72.31±1.07 25.71±1.97 59.77±1.31 59.66±0.95 57.24±1.54 72.53±0.66 73.09±0.83
SMOTE 57.49±0.60 69.68±0.66 71.74±1.03 26.05±2.30 59.83±1.33 61.75±1.30 55.66±2.76 73.33±1.00 73.30±0.16
GSMOTE 64.34±1.69 68.23±1.80 69.77±1.08 35.07±1.77 55.86±1.10 57.13±2.69 63.35±2.92 74.23±0.84 73.34±2.06
GENS 69.96±0.62 69.83±0.41 70.71±1.16 48.34±2.19 60.04±1.85 62.55±0.86 71.78±1.19 72.69±0.84 74.42±1.12
Best 69.96 70.87 72.31 48.34 60.04 62.55 71.78 75.13 74.96
SAGE Vanilla 57.36±1.77 64.90±0.87 65.61±0.97 36.07±1.06 54.76±2.47 51.86±3.25 63.75±1.24 74.35±0.74 76.92±0.63
Reweight 63.72±1.10 69.06±0.90 69.59±0.53 39.64±2.57 57.17±0.76 54.83±0.74 62.83±2.57 73.88±0.40 75.42±0.48
ReNode 65.59±1.44 69.99±1.35 69.86±1.27 44.20±3.68 55.41±0.48 55.78±1.63 64.97±3.00 74.33±0.20 74.88±0.53
Resample 55.29±2.12 70.40±1.11 71.49±0.79 30.14±2.20 60.71±1.25 61.29±1.48 65.23±2.26 74.28±0.96 75.48±0.44
SMOTE 56.72±2.69 69.42±1.29 71.71±1.94 29.22±2.33 63.61±0.87 65.91±0.68 57.60±3.22 72.98±0.69 76.45±0.77
GSMOTE 59.44±2.25 69.10±0.95 71.30±1.47 34.86±3.46 60.53±1.27 61.96±1.12 67.23±0.61 74.36±1.02 75.68±0.31
GENS 68.23±0.72 69.76±0.95 71.11±0.81 51.05±2.03 63.87±0.82 63.41±0.57 70.06±0.86 75.33±1.46 76.01±1.14
Best 68.23 70.40 71.71 51.05 63.87 65.91 70.06 75.33 76.92
APPNP Vanilla 50.39±2.81 54.19±2.58 59.99±2.49 22.21±0.13 22.54±0.25 22.89±0.22 44.50±0.21 44.67±0.07 44.59±0.06
Reweight 72.63±0.53 72.71±0.60 70.61±0.65 45.25±4.85 63.08±1.03 65.20±1.20 69.53±1.14 72.24±0.58 72.26±0.80
ReNode 73.67±0.98 73.67±1.18 69.79±0.72 44.91±4.99 62.97±0.78 64.47±0.40 70.65±1.66 72.33±0.90 72.18±0.55
Resample 65.20±2.08 72.25±0.82 72.72±0.97 31.04±2.76 66.06±0.54 54.57±6.08 62.42±3.62 72.32±0.93 74.27±1.08
SMOTE 64.70±2.06 72.90±0.83 72.31±0.94 30.90±2.86 66.18±0.37 53.90±6.26 61.83±3.65 72.55±1.61 73.87±1.37
GSMOTE 71.20±0.67 73.02±0.74 73.22±0.92 37.90±4.29 64.56±0.18 60.41±3.84 65.65±3.06 72.54±0.85 74.61±1.36
Best 73.67 73.67 73.22 45.25 66.18 65.20 70.65 72.55 74.61
GPRGNN Vanilla 67.86±0.79 70.80±1.16 72.32±1.18 35.00±2.96 55.06±0.89 56.31±2.87 59.01±3.62 50.12±1.46 77.62±1.04
Reweight 71.66±0.85 70.46±0.98 71.24±0.49 49.19±3.61 59.11±0.73 60.30±2.04 71.18±0.95 75.47±0.90 77.01±0.52
ReNode 73.08±0.66 71.52±0.50 71.72±1.51 50.34±3.18 59.10±0.75 58.94±1.36 71.45±1.19 75.08±1.06 75.76±0.84
Resample 66.42±1.65 71.70±0.86 73.54±0.83 32.60±2.71 63.59±0.65 63.12±1.06 66.58±2.08 73.66±0.86 75.42±0.35
SMOTE 66.43±1.74 72.89±1.23 73.47±1.12 31.38±2.70 63.41±0.55 61.23±2.59 66.78±1.97 73.98±0.70 75.63±0.91
GSMOTE 70.87±0.53 72.53±0.85 73.12±0.95 42.82±4.52 62.09±1.04 60.82±0.88 67.93±3.01 72.72±0.72 74.66±0.74
Best 73.08 72.89 73.54 50.34 63.59 63.12 71.45 75.47 77.62
Table 11: Performance deviation of combining Bat with 6 IGL baselines × 5 GNN backbones.
Dataset (IR=10) Cora CiteSeer PubMed
Metric: PerfStd ↓ Base + Bat0 + Bat1 Base + Bat0 + Bat1 Base + Bat0 + Bat1
GCN Vanilla 27.88±1.79 21.27±1.76 18.49±2.68 29.93±1.38 13.82±2.06 13.93±0.80 34.73±2.14 9.23±2.78 21.81±4.51
Reweight 22.29±1.41 14.43±2.51 18.32±2.20 25.47±1.78 19.10±1.48 22.64±0.79 19.33±5.26 10.21±1.58 5.88±0.83
ReNode 22.88±1.64 14.65±2.07 17.00±2.13 30.31±1.51 20.22±0.88 22.99±1.09 18.14±5.79 12.99±1.57 10.96±1.92
Resample 31.57±1.85 15.13±2.14 15.25±2.79 31.00±1.32 16.30±1.89 20.79±0.43 30.90±5.67 11.63±3.20 7.82±0.80
SMOTE 33.32±1.38 16.33±1.12 17.95±2.50 32.61±1.45 17.27±0.86 18.25±0.89 31.79±5.21 10.56±1.82 11.66±2.58
GSMOTE 21.78±1.79 17.90±2.75 18.44±2.20 22.64±2.69 21.37±1.25 21.01±1.45 15.87±2.34 3.35±1.08 5.83±1.27
GENS 20.04±1.12 16.98±3.02 18.02±2.23 16.95±2.64 14.94±0.75 15.54±0.60 11.93±3.46 5.95±1.85 5.15±0.80
Best 20.04 14.43 15.25 16.95 13.82 13.93 11.93 3.35 5.15
GAT Vanilla 27.38±1.71 19.23±0.80 17.97±2.65 28.32±2.07 15.62±0.77 15.90±0.95 30.94±1.27 10.77±2.04 8.51±2.48
Reweight 22.90±1.67 16.44±2.71 17.32±2.83 30.27±1.26 18.64±1.30 20.83±1.06 24.92±1.88 3.01±0.96 5.44±1.33
ReNode 23.13±1.54 15.05±1.59 18.96±1.65 25.21±1.85 20.48±0.82 20.51±0.49 18.15±4.37 4.77±1.22 6.17±0.42
Resample 32.73±2.12 17.87±2.04 17.64±2.57 32.59±0.89 17.76±1.79 18.73±1.02 31.67±0.98 6.18±1.35 4.58±1.14
SMOTE 31.17±0.69 18.40±1.03 18.26±1.87 33.32±0.88 10.68±0.68 13.24±1.12 32.79±2.13 8.14±2.00 7.56±1.05
GSMOTE 24.84±1.60 15.48±2.08 18.23±2.00 26.74±1.53 18.34±1.64 19.76±0.42 24.50±2.78 5.12±1.38 8.54±2.03
GENS 20.08±1.56 17.75±2.40 17.88±2.50 26.49±1.18 12.89±0.77 15.09±0.95 10.29±2.75 7.83±2.27 7.55±2.38
Best 20.08 15.05 17.32 25.21 10.68 13.24 10.29 3.01 4.58
SAGE Vanilla 29.94±1.75 18.62±2.13 19.49±1.67 26.75±1.58 14.56±1.16 18.13±1.32 21.09±3.43 10.96±1.99 4.09±1.17
Reweight 25.61±1.60 15.24±2.66 17.54±2.45 29.95±1.83 19.05±1.60 22.94±0.49 25.47±3.49 3.35±0.72 8.09±0.19
ReNode 24.12±1.73 13.32±3.03 15.45±2.41 22.41±4.31 22.20±0.97 22.75±0.87 22.92±4.36 7.63±1.23 5.77±1.55
Resample 31.66±1.47 15.77±2.75 15.08±2.74 30.29±1.16 18.72±0.90 18.48±2.00 21.41±2.88 4.68±1.42 4.76±1.09
SMOTE 30.86±2.64 17.30±2.09 14.87±3.23 32.07±1.00 13.17±1.33 12.78±0.43 31.62±2.86 13.88±1.44 11.63±2.28
GSMOTE 27.71±1.86 17.28±2.25 16.10±2.94 28.77±2.61 18.69±0.76 18.05±1.21 20.10±0.90 5.37±1.21 4.64±1.61
GENS 19.81±1.65 17.50±2.05 17.63±2.11 19.76±2.07 15.99±0.81 16.99±0.85 11.76±2.91 7.63±1.51 8.31±1.64
Best 19.81 13.32 14.87 19.76 13.17 12.78 11.76 3.35 4.09
APPNP Vanilla 38.32±1.94 35.50±2.10 32.18±1.68 36.82±0.10 36.67±0.25 36.83±0.36 42.13±0.27 40.45±0.11 41.34±0.16
Reweight 19.83±1.46 17.33±2.88 18.46±2.42 26.19±2.93 20.96±0.58 19.38±0.72 16.96±2.84 8.04±1.94 9.13±1.05
ReNode 18.09±2.52 16.87±2.95 19.47±2.00 25.95±3.66 22.09±1.43 20.42±1.57 14.49±3.81 10.25±1.93 3.95±1.02
Resample 27.28±2.13 18.37±2.34 18.72±2.32 32.71±1.23 15.87±1.02 23.72±4.14 25.86±4.38 13.60±1.68 9.49±1.65
SMOTE 27.86±1.78 18.61±2.35 19.42±1.81 33.26±1.10 14.91±0.93 22.90±4.49 26.37±4.47 13.37±1.97 8.70±2.29
GSMOTE 20.98±1.45 18.19±2.59 18.55±2.28 29.39±2.20 16.49±1.12 19.19±3.44 22.32±4.21 11.53±3.00 10.69±2.27
Best 18.09 16.87 18.46 25.95 14.91 19.19 14.49 8.04 3.95
GPRGNN Vanilla 22.96±1.20 18.12±2.29 17.00±2.98 27.57±1.32 17.10±1.17 20.94±2.58 29.94±3.68 36.57±1.46 5.30±0.91
Reweight 20.94±1.21 17.83±2.82 19.67±1.81 22.43±2.39 21.52±1.06 20.03±1.81 16.12±1.84 7.54±0.49 5.48±1.49
ReNode 18.84±2.19 16.78±2.53 17.89±2.96 24.14±1.47 19.84±1.79 22.83±1.40 14.40±3.18 9.75±2.20 6.61±1.47
Resample 25.62±1.80 19.23±2.26 17.61±2.77 33.08±0.66 17.04±0.78 15.98±0.93 22.59±2.75 7.62±2.50 7.76±0.85
SMOTE 25.44±1.88 16.97±3.19 17.38±2.78 32.85±0.95 15.09±1.23 16.85±2.51 21.35±2.76 9.41±2.67 6.09±0.62
GSMOTE 21.23±1.48 18.02±2.62 19.06±2.28 24.21±3.06 14.83±0.95 19.11±1.73 20.08±3.77 5.99±1.49 8.27±0.75
Best 18.84 16.78 17.00 22.43 14.83 15.98 14.40 5.99 5.30
Table 12: Performance of Bat under varying types and levels of class imbalance. For each setting, we report the relative gain over base and mark the best/second-best score in bold/underlined.
Dataset Cora CiteSeer PubMed CS Physics
Step IR 10 20 10 20 10 20 10 20 10 20
BAcc. ↑ Base 61.6 52.7 37.6 34.2 64.2 60.8 75.4 65.3 80.1 67.7
+ Bat 69.8 (+13.4%) 71.3 (+35.2%) 55.4 (+47.2%) 51.3 (+49.9%) 68.6 (+6.8%) 63.3 (+4.1%) 82.6 (+9.6%) 79.9 (+22.2%) 87.6 (+9.4%) 88.0 (+29.9%)
BestIGL 70.1 (+13.9%) 66.5 (+26.2%) 56.0 (+48.9%) 47.2 (+38.0%) 74.0 (+15.2%) 71.1 (+17.0%) 84.1 (+11.6%) 81.3 (+24.4%) 89.4 (+11.6%) 85.7 (+26.6%)
+ Bat 74.2 (+20.6%) 71.6 (+35.9%) 62.7 (+66.6%) 62.5 (+82.6%) 76.9 (+19.7%) 75.7 (+24.5%) 86.3 (+14.5%) 85.6 (+31.0%) 91.2 (+13.9%) 90.9 (+34.2%)
Macro-F1 ↑ Base 60.1 47.0 28.1 21.9 55.1 46.4 72.7 59.2 80.7 64.7
+ Bat 68.7 (+14.3%) 69.6 (+48.1%) 54.9 (+95.8%) 48.9 (+123.5%) 67.2 (+21.9%) 60.7 (+30.8%) 78.6 (+8.1%) 74.7 (+26.1%) 88.8 (+10.0%) 87.8 (+35.8%)
BestIGL 70.0 (+16.4%) 66.2 (+40.9%) 54.5 (+94.1%) 45.0 (+105.6%) 71.3 (+29.4%) 68.9 (+48.3%) 83.9 (+15.3%) 80.9 (+36.7%) 89.5 (+10.9%) 86.2 (+33.2%)
+ Bat 72.8 (+21.2%) 70.2 (+49.4%) 62.5 (+122.7%) 62.1 (+183.6%) 76.9 (+39.5%) 74.9 (+61.2%) 85.4 (+17.5%) 84.6 (+43.0%) 90.7 (+12.4%) 90.0 (+39.2%)
PerfStd ↓ Base 27.9 39.0 29.9 35.1 34.7 41.5 21.2 32.1 22.2 36.0
+ Bat 21.3 (-23.7%) 24.4 (-37.5%) 13.9 (-53.5%) 16.7 (-52.5%) 21.8 (-37.2%) 29.1 (-29.9%) 17.4 (-18.2%) 22.9 (-28.8%) 11.5 (-48.3%) 25.6 (-29.0%)
BestIGL 20.0 (-28.1%) 21.9 (-43.8%) 16.9 (-43.4%) 18.0 (-48.6%) 11.9 (-65.6%) 14.2 (-65.7%) 8.9 (-58.3%) 12.3 (-61.8%) 6.3 (-71.7%) 12.4 (-65.5%)
+ Bat 15.2 (-45.3%) 17.5 (-55.2%) 13.9 (-53.5%) 16.7 (-52.5%) 5.1 (-85.2%) 4.6 (-89.0%) 7.9 (-62.7%) 10.1 (-68.5%) 6.6 (-70.2%) 6.9 (-80.8%)
Natural IR 50 100 50 100 50 100 50 100 50 100
BAcc. ↑ Base 58.1 61.8 44.9 44.7 52.0 51.1 73.8 71.4 76.0 77.7
+ Bat 69.1 (+18.9%) 68.3 (+10.6%) 58.4 (+29.9%) 57.4 (+28.5%) 55.6 (+7.0%) 56.5 (+10.4%) 82.1 (+11.3%) 81.9 (+14.8%) 86.9 (+14.3%) 84.1 (+8.3%)
BestIGL 71.0 (+22.3%) 73.8 (+19.5%) 56.3 (+25.3%) 56.3 (+26.0%) 72.7 (+39.8%) 72.8 (+42.5%) 81.2 (+10.0%) 81.4 (+14.0%) 85.8 (+12.9%) 87.2 (+12.2%)
+ Bat 73.1 (+25.8%) 76.9 (+24.5%) 62.1 (+38.2%) 61.3 (+37.3%) 75.8 (+45.7%) 75.9 (+48.5%) 85.0 (+15.1%) 84.5 (+18.5%) 88.6 (+16.5%) 89.7 (+15.4%)
Macro-F1 ↑ Base 58.7 61.4 37.5 36.2 47.3 45.1 75.3 73.2 78.0 79.8
+ Bat 68.7 (+17.1%) 67.5 (+10.0%) 57.1 (+52.6%) 55.8 (+54.3%) 52.8 (+11.6%) 52.0 (+15.4%) 82.6 (+9.7%) 82.6 (+12.8%) 87.6 (+12.3%) 85.2 (+6.8%)
BestIGL 71.1 (+21.2%) 73.4 (+19.5%) 54.3 (+44.8%) 53.8 (+48.8%) 72.9 (+53.9%) 73.7 (+63.6%) 82.5 (+9.5%) 82.4 (+12.6%) 87.7 (+12.4%) 88.3 (+10.6%)
+ Bat 72.7 (+23.9%) 76.0 (+23.9%) 60.2 (+60.8%) 59.4 (+64.3%) 75.3 (+59.2%) 76.1 (+68.8%) 85.7 (+13.7%) 85.1 (+16.2%) 88.8 (+13.8%) 89.4 (+12.0%)
PerfStd ↓ Base 28.8 31.0 38.7 39.8 36.2 38.2 26.3 28.2 23.8 21.0
+ Bat 18.3 (-36.4%) 25.4 (-18.1%) 24.9 (-35.6%) 33.1 (-17.0%) 33.3 (-8.1%) 35.9 (-6.2%) 19.0 (-27.9%) 19.5 (-30.9%) 17.0 (-28.7%) 19.6 (-6.7%)
BestIGL 18.9 (-34.4%) 17.3 (-44.4%) 28.7 (-25.9%) 29.7 (-25.3%) 6.0 (-83.4%) 9.6 (-75.0%) 14.4 (-45.4%) 15.4 (-45.5%) 11.2 (-53.1%) 9.7 (-53.8%)
+ Bat 15.9 (-44.8%) 14.7 (-52.8%) 21.9 (-43.4%) 19.8 (-50.3%) 4.2 (-88.3%) 5.6 (-85.3%) 12.2 (-53.5%) 12.8 (-54.7%) 7.4 (-68.9%) 7.2 (-65.7%)
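The relative gains attached to each score in Table 12 can be approximately recomputed from the printed entries. The sketch below assumes the gain is the signed percentage change with respect to the Base score in the same column; the helper name `relative_change` is ours and is not part of the released code. Because the table prints scores rounded to one decimal, a recomputed value may differ from the printed one in the last digit.

```python
def relative_change(base: float, score: float) -> float:
    """Signed relative change of `score` with respect to `base`, in percent."""
    if base == 0:
        raise ValueError("base score must be non-zero")
    return (score - base) / base * 100.0


# Rounded entries copied from Table 12 (Cora, step imbalance, IR = 10).
bacc_base, bacc_bat = 61.6, 69.8   # balanced accuracy, higher is better
std_base, std_bat = 27.9, 21.3     # PerfStd, lower is better

print(f"BAcc.:   {relative_change(bacc_base, bacc_bat):+.1f}%")
print(f"PerfStd: {relative_change(std_base, std_bat):+.1f}%")
```

For PerfStd a negative change is the desirable direction, which is why the reductions in the table carry a minus sign.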