Networked Inequality: Preferential Attachment Bias in
Graph Neural Network Link Prediction

Arjun Subramonian    Levent Sagun    Yizhou Sun
Abstract

Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group (e.g., queer women) fairness and “rich get richer” dynamics of link prediction remain underexplored. However, these aspects have significant consequences for degree and power imbalances in networks. In this paper, we shed light on how degree bias in networks affects Graph Convolutional Network (GCN) link prediction. In particular, we theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group preferential attachment bias. We validate our theoretical analysis on real-world citation, collaboration, and online social networks. We further bridge GCN’s preferential attachment bias with unfairness in link prediction and propose a new within-group fairness metric. This metric quantifies disparities in link prediction scores within social groups, towards combating the amplification of degree and power disparities. Finally, we propose a simple training-time strategy to alleviate within-group unfairness, and we show that it is effective on citation, social, and credit networks.

graph learning, fairness, link prediction

1 Introduction

[Figure 1 graphic omitted; node labels: $\textsc{CS}_1$, $\textsc{CS}_2$, $\textsc{CS}_3$, $\textsc{CS}_4$, $\textsc{CS}_5$, $\textsc{Edu}_1$.]
Figure 1: An academic collaboration network where nodes are Computer Science (CS) and Education (Edu) researchers, solid edges are current or past collaborations, and dashed edges are collaborations recommended by a GCN. Circular nodes are women and square nodes are men.

Link prediction (LP) using GNNs is increasingly leveraged to recommend friends in social networks (Fan et al., 2019; Sankar et al., 2021), as well as by scholarly tools to recommend academic literature in citation networks (Xie et al., 2021). In recent years, graph learning researchers have raised concerns about the unfairness of GNN LP (Li et al., 2021; Current et al., 2022; Li et al., 2022). This unfairness is often attributed to graph structure, including the stratification of social groups; for example, online networks are usually segregated by ethnicity (Hofstra et al., 2017). However, most fair GNN LP research has focused on dyadic fairness, i.e., satisfying some notion of parity between inter-group and intra-group link predictions. This formulation neglects: 1) LP dynamics within social groups (Kasy & Abebe, 2021); and 2) the “rich get richer” effect, i.e., the prediction of links at a higher rate with high-degree nodes (Barabási & Albert, 1999). In the context of friend recommendation systems, the “rich get richer” effect can increase the number of links formed with high-degree individuals, which boosts their influence on other individuals in the network, and thus their power (Bashardoust et al., 2022).

In this paper, we shed light on how degree bias in networks affects GCN LP (Kipf & Welling, 2017). We theoretically and empirically find that GCNs with a symmetric normalized graph filter have a within-group preferential attachment (PA) bias in LP. Specifically, GCNs often output LP scores that are approximately proportional to the geometric mean of the (within-group) degrees of the incident nodes when the nodes belong to the same social group. (We elaborate on PA and our motivation in §J.) We focus on GCNs with symmetric and random walk normalized graph filters because they are popular architectures for graph deep learning, and they provide us with a reasonable setting to develop a rigorous theory of PA bias in GNN LP while leveraging tools from spectral graph theory.

Our finding can have significant implications for the fairness of GCN LP. For example, consider links within the CS social group in the toy academic collaboration network in Figure 1. Because men in CS, on average, have a higher within-group degree ($\text{deg} = 3$) than women in CS ($\text{deg} = 1.25$), due to gender discrimination, a collaboration recommender system that uses a GCN can suggest men as collaborators at a higher rate. This has the detrimental effect of further concentrating research collaborations among men, thereby reducing the influence of women in CS and reinforcing their marginalization in the field (Yamamoto & Frachtenberg, 2022). Furthermore, considering this marginalization in the context of CS is important, as such marginalization may be less severe or different in Edu.

Our contributions are as follows:

  1. We theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group PA bias in LP (§4.1). We validate our theoretical analysis on diverse real-world network datasets (e.g., citation, collaboration, online social networks) of varying size (§6.1). In doing so, we lay a foundation to study this previously-unexplored PA bias in the GNN setting.

  2. We bridge GCN’s PA bias with unfairness in LP (§4.2, §6.2). We contribute a new within-group fairness metric for LP, which quantifies disparities in LP scores within social groups, towards combating the amplification of degree and power disparities. To our knowledge, we are the first to study the within-group fairness of GNNs.

  3. We propose a training-time strategy to alleviate within-group unfairness (§5), and we assess its effectiveness on citation, online social, and credit networks (§6.3). Our experiments reveal that even for this new form of unfairness, simple regularization approaches can be successful.

2 Related Work

Degree Bias in GNNs

Numerous papers have investigated how GNN performance is degraded for low-degree nodes on node representation learning and classification tasks (Tang et al., 2020; Liu et al., 2021; Kang et al., 2022; Xu et al., 2023; Shomer et al., 2023). Liu et al. (2023) present a generalized notion of degree bias that considers different multi-hop structures around nodes and propose a framework to address it; in contrast to prior work, which focuses on degree equal opportunity (i.e., similar accuracy for nodes with the same degree), Liu et al. (2023) also study degree statistical parity (i.e., similar prediction rates of each class for nodes with the same degree). Beyond node classification, Wang & Derr (2022) find GNN LP performance disparities across nodes with different degrees: low-degree nodes often benefit from higher performance than high-degree nodes. In this paper, we find that GCNs have a PA bias in LP, and present a new fairness metric which quantifies disparities in GNN LP scores within social groups. We focus on group fairness (i.e., parity between groups) rather than individual fairness (i.e., treating similar individuals similarly); this is because producing similar LP scores for similar-degree individuals does not prevent high-degree individuals from unfairly amassing links, and thus power (cf. Figure 1). We further compare our work to prior degree bias works in §K.

Fair Link Prediction

Prior work has investigated the unfairness of GNN LP (Li et al., 2021; Current et al., 2022; Li et al., 2022), often attributing it to graph structure (e.g., the stratification of social groups). However, most of this research has focused on dyadic fairness, i.e., satisfying some notion of parity between inter-group and intra-group links. Like Wang & Derr (2022), we examine how degree bias impacts GNN LP; however, rather than focusing on performance disparities across nodes with different degrees, we study GCN’s PA bias and LP score disparities across (sub)groups.

Within-Group Fairness

Much previous work has studied within-group fairness, i.e., fairness over social subgroups (e.g., Black women, Indigenous men) defined over multiple axes (e.g., race, gender) (Kearns et al., 2017; Foulds et al., 2020; Ghosh et al., 2021; Wang et al., 2022). The motivation of this work is that classifiers can be fair with respect to two social axes separately, but be unfair to subgroups defined over both these axes. While prior research has termed this phenomenon intersectional unfairness, we opt for within-group unfairness to distinguish it from the critical framework of Intersectionality (Ovalle et al., 2023). We study within-group fairness in the GNN setting. In particular, our theoretical and empirical findings reveal that GCN LP can further marginalize social subgroups; this relates to the “complexity” tenet of Intersectionality, which expresses that the marginalization faced by, e.g., Black women, is non-additive and distinct from the marginalization faced by Black men and white women (Collins & Bilge, 2020).

Bias and Power in Networks

A wealth of literature outside fair graph learning has examined how network structure enables discrimination and disparities in capital (Fish et al., 2019; Stoica et al., 2020; Zhang et al., 2021; Bashardoust et al., 2022). Boyd et al. (2014) describe how an individual’s position in a social network affects their access to jobs and public health information, as well as how they are surveilled. Stoica et al. (2018) observe that high-degree accounts on Instagram overwhelmingly belong to men and recommendation algorithms further boost these accounts; complementarily, the authors find that even a simple, random walk-based recommendation algorithm can amplify degree disparities between social groups in networks modeled by PA dynamics. Similarly, we investigate how GCN LP can amplify degree disparities in networks and further concentrate power among high-degree individuals.

3 Preliminaries

We have a simple, undirected $n$-node graph ${\cal G} = ({\cal V}, {\cal E})$ with doubly-weighted self-loops. The nodes have features $\left({\bm{x}}_i\right)_{i \in {\cal V}}$, with each ${\bm{x}}_i \in \mathbb{R}^d$. We denote the adjacency matrix of ${\cal G}$ as ${\bm{A}} \in \{0, 1\}^{n \times n}$ and the degree matrix as ${\bm{D}} = \text{diag}\left(\left(\sum_{j \in {\cal V}} {\bm{A}}_{ij}\right)_{i \in {\cal V}}\right)$, with ${\bm{D}} \in \mathbb{N}^{n \times n}$.
We consider two $L$-layer GCN encoders: (1) $\Phi_s : \mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d'}$ (Kipf & Welling, 2017), which uses a symmetric normalized filter, and (2) $\Phi_r : \mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d'}$, which uses a random walk normalized filter. $\Phi_s$ and $\Phi_r$ compute node representations as, $\forall i \in {\cal V}$:

\begin{align}
\Phi_s\left(\left({\bm{x}}_j\right)_{j \in {\cal V}}\right)_i &= {\bm{s}}_i^{(L)}, \quad \Phi_r\left(\left({\bm{x}}_j\right)_{j \in {\cal V}}\right)_i = {\bm{r}}_i^{(L)} \tag{1} \\
\forall l \in [L], \quad {\bm{s}}_i^{(l)} &= \sigma^{(l)}\left(\sum_{j \in \Gamma(i)} \frac{{\bm{W}}_s^{(l)} {\bm{s}}_j^{(l-1)}}{\sqrt{{\bm{D}}_{ii} {\bm{D}}_{jj}}}\right), \tag{2} \\
\forall l \in [L], \quad {\bm{r}}_i^{(l)} &= \sigma^{(l)}\left(\sum_{j \in \Gamma(i)} \frac{{\bm{W}}_r^{(l)} {\bm{r}}_j^{(l-1)}}{{\bm{D}}_{ii}}\right), \tag{3}
\end{align}

where $\left({\bm{s}}_i^{(0)}\right)_{i \in {\cal V}} = \left({\bm{r}}_i^{(0)}\right)_{i \in {\cal V}} = \left({\bm{x}}_i\right)_{i \in {\cal V}}$; $\Gamma(i)$ is the 1-hop neighborhood of $i$; ${\bm{W}}_s^{(l)}$ and ${\bm{W}}_r^{(l)}$ are the weight matrices corresponding to layer $l$ of $\Phi_s$ and $\Phi_r$, respectively; for $l \in [L-1]$, $\sigma^{(l)}$ is a ReLU non-linearity; and $\sigma^{(L)}$ is the identity function.
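A minimal pure-Python sketch of a single layer under each filter may help make Eqs. (2) and (3) concrete. The helper names (`gcn_layer`, `matvec`, `relu`), the toy path graph, and the scalar weight are illustrative assumptions rather than the paper's implementation; the adjacency matrix is assumed to be 0/1 and to already contain self-loops.

```python
import math

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Multiply weight matrix W (list of rows) by vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def gcn_layer(A, H, W, norm="sym", activation=relu):
    """One GCN layer in the spirit of Eqs. (2)-(3).

    A: n x n 0/1 adjacency matrix with self-loops (assumption of this sketch).
    H: list of n input representations, each a list of floats.
    W: weight matrix shared by all nodes.
    norm: "sym" scales messages by 1/sqrt(D_ii * D_jj); "rw" scales by 1/D_ii.
    """
    n = len(A)
    deg = [sum(row) for row in A]
    out = []
    for i in range(n):
        agg = [0.0] * len(W)
        for j in range(n):
            if A[i][j] == 0:
                continue  # j is not in the 1-hop neighborhood of i
            if norm == "sym":
                scale = 1.0 / math.sqrt(deg[i] * deg[j])
            else:
                scale = 1.0 / deg[i]
            msg = matvec(W, H[j])
            agg = [a + scale * m for a, m in zip(agg, msg)]
        out.append(activation(agg))
    return out

# Toy path graph 0 - 1 - 2 with self-loops (hypothetical data).
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
X = [[1.0], [2.0], [3.0]]
W = [[1.0]]

sym = gcn_layer(A, X, W, norm="sym")  # symmetric normalized filter
rw = gcn_layer(A, X, W, norm="rw")    # random walk normalized filter
```

With the random walk filter each node simply averages its neighborhood, whereas the symmetric filter also depends on the degree of the sending node. For the last layer, passing `activation=lambda v: v` would mimic the identity $\sigma^{(L)}$.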
We now consider the first-order Taylor expansions of $\Phi_s$ and $\Phi_r$ around $\left(\mathbf{0}\right)_{i \in {\cal V}}$:

\begin{align}
{\bm{s}}_i^{(L)} &= \sum_{j \in {\cal V}} \left[\frac{\partial {\bm{s}}_i^{(L)}}{\partial {\bm{x}}_j}\right] {\bm{x}}_j + \xi\left({\bm{s}}_i^{(L)}\right), \tag{4} \\
{\bm{r}}_i^{(L)} &= \sum_{j \in {\cal V}} \left[\frac{\partial {\bm{r}}_i^{(L)}}{\partial {\bm{x}}_j}\right] {\bm{x}}_j + \xi\left({\bm{r}}_i^{(L)}\right), \tag{5}
\end{align}

where $\xi$ is the error of the first-order approximations. This error is low when $\left({\bm{x}}_i\right)_{i \in {\cal V}}$ are close to $\mathbf{0}$, which we validate empirically in §6.1. Furthermore, we consider an inner-product LP score function $f_{LP} : \mathbb{R}^{d'} \times \mathbb{R}^{d'} \to \mathbb{R}$:

$$f_{LP}\left({\bm{h}}_i^{(L)}, {\bm{h}}_j^{(L)}\right) = \left({\bm{h}}_i^{(L)}\right)^{\intercal} {\bm{h}}_j^{(L)}, \tag{6}$$

where ${\bm{h}}_i^{(L)}$ is the last-layer representation for node $i$. While it is common to use a vanilla GCN and inner-product score function for LP (Fey, 2019), researchers have proposed methods to improve the expressivity of node representations for LP by capturing subgraph information (Zhang & Chen, 2018; Li et al., 2020; Chamberlain et al., 2023). Our theoretical findings remain relevant to methods that ultimately use a GCN to predict links (e.g., Zhang & Chen (2018); Li et al. (2020)), as we do not make assumptions about the features passed to the GCN (i.e., they could be distance encodings, SEAL node embeddings, etc.). Our results may also generalize to GNN architectures that use a degree-normalized graph filter, e.g., Graph Attention Networks (Veličković et al., 2018). Studying the fairness of more expressive LP methods is an interesting direction for future research. Furthermore, although we only consider an inner-product LP score function in our theoretical analysis, we also run experiments with a Hadamard product and MLP score function (cf. §G.2), and we find that our theoretical analysis is still relevant to and reasonably supports the experimental results.
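The inner-product score of Eq. (6) can be sketched in a few lines. The function name `lp_score` and the representations in `H` are hypothetical and chosen only for illustration; in practice the representations would come from the last GCN layer.

```python
def lp_score(h_i, h_j):
    """Inner-product link prediction score between two node representations."""
    if len(h_i) != len(h_j):
        raise ValueError("representations must have the same dimension")
    return sum(a * b for a, b in zip(h_i, h_j))

# Hypothetical last-layer representations for three nodes.
H = [[0.5, -1.0],
     [2.0, 0.25],
     [0.0, 1.0]]

# Score every unordered node pair once; the score is symmetric in (i, j).
scores = {
    (i, j): lp_score(H[i], H[j])
    for i in range(len(H))
    for j in range(i + 1, len(H))
}
```

Because the score is a plain inner product, any systematic scaling of representations with node degree carries over directly to the scores, which is why the analysis focuses on the representations themselves.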

4 Theoretical Analysis

We leverage spectral graph theory to study how degree bias affects GCN LP. Theoretically, we find that GCNs with a symmetric normalized graph filter have a within-group PA bias (§4.1), but GCNs with a random walk normalized filter may lack such a bias (§4.3). We further bridge GCN’s PA bias with unfairness in GCN LP, proposing a new LP within-group fairness metric (§4.2) and a simple training-time strategy to alleviate unfairness (§5). We empirically validate our theoretical results and fairness strategy in §6. We provide proofs for all theoretical results in §A.

Our ultimate goal is to bound the expected LP scores $\mathbb{E}\left[f_{LP}\left({\bm{s}}_i^{(L)}, {\bm{s}}_j^{(L)}\right)\right]$ and $\mathbb{E}\left[f_{LP}\left({\bm{r}}_i^{(L)}, {\bm{r}}_j^{(L)}\right)\right]$ for nodes $i, j$ in the same social group in terms of the degrees of $i, j$. We begin with Lemma 4.1, which expresses GCN representations (in expectation) as a linear combination of the initial node features. In doing so, we decouple the computation of GCN representations from the non-linearities $\sigma^{(l)}$.

Lemma 4.1.

Similarly to Xu et al. (2018), assume that each path from node $i \to j$ in the computation graph of $\Phi_s$ is independently activated with probability $\rho_s(i)$, and similarly, $\rho_r(i)$ for $\Phi_r$ (cf. §L). Furthermore, suppose that $\mathbb{E}\left[\xi\left({\bm{s}}_i^{(L)}\right)\right] = \mathbb{E}\left[\xi\left({\bm{r}}_i^{(L)}\right)\right] = \mathbf{0}$, where the expectations are taken over the probability distributions of paths activating.
We define $\alpha_j = \left(\prod_{l=L}^{1} {\bm{W}}_s^{(l)}\right) {\bm{x}}_j$, and $\beta_j = \left(\prod_{l=L}^{1} {\bm{W}}_r^{(l)}\right) {\bm{x}}_j$. Then, $\forall i \in {\cal V}$:

\begin{align}
\mathbb{E}\left[{\bm{s}}_i^{(L)}\right] &= \sum_{j \in {\cal V}} \rho_s(i) \left({\bm{D}}^{-\frac{1}{2}} {\bm{A}} {\bm{D}}^{-\frac{1}{2}}\right)^{L}_{ij} \alpha_j, \tag{7} \\
\mathbb{E}\left[{\bm{r}}_i^{(L)}\right] &= \sum_{j \in {\cal V}} \rho_r(i) \left({\bm{D}}^{-1} {\bm{A}}\right)^{L}_{ij} \beta_j. \tag{8}
\end{align}

Lemma 4.1 demonstrates that under certain assumptions (which we show to be reasonable in §6.1), the expected GCN representations can be expressed as a linear combination of the node features that depends on a normalized version of the adjacency matrix.
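As a sanity check on the form of Eq. (7), consider the special case in which every $\sigma^{(l)}$ is the identity. The encoder is then linear, so the Taylor error vanishes and, taking $\rho_s(i) = 1$, the closed form should match a layer-by-layer pass exactly. The sketch below checks this on a toy graph; the helper names, the graph, the features, and the weights are all hypothetical, and the ReLU case (where the lemma holds only in expectation under its path-activation assumption) is not simulated.

```python
import math

def matmul(X, Y):
    """Dense matrix product of X (n x k) and Y (k x m), both lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def sym_norm(A):
    """Symmetric normalized adjacency D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in A]
    n = len(A)
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def closed_form(A, X, weights, rho=None):
    """Row i is rho[i] * sum_j (P^L)_{ij} (W^(L) ... W^(1)) x_j, as in Eq. (7)."""
    n = len(A)
    rho = rho if rho is not None else [1.0] * n
    P = sym_norm(A)
    P_L = P
    for _ in range(len(weights) - 1):
        P_L = matmul(P_L, P)
    W_prod = weights[0]            # accumulate W^(L) ... W^(1)
    for W in weights[1:]:
        W_prod = matmul(W, W_prod)
    out = matmul(matmul(P_L, X), transpose(W_prod))
    return [[rho[i] * v for v in out[i]] for i in range(n)]

def linear_forward(A, X, weights):
    """Layer-by-layer pass of Eq. (2) with every sigma set to the identity."""
    P = sym_norm(A)
    H = X
    for W in weights:
        H = matmul(matmul(P, H), transpose(W))
    return H

A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[[1.0, 2.0], [0.0, 1.0]],   # W^(1)
           [[0.5, 0.0], [1.0, 1.0]]]   # W^(2)

expected = closed_form(A, X, weights)
actual = linear_forward(A, X, weights)
max_gap = max(abs(e - a) for er, ar in zip(expected, actual)
              for e, a in zip(er, ar))
```

In this linear setting `max_gap` is zero up to floating-point error, which illustrates why the representations depend on the graph only through powers of the normalized adjacency matrix.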

We now introduce social groups in ${\cal G}$ into our analysis. Suppose that ${\cal V}$ can be partitioned into $B$ disjoint social groups $\{S^{(b)}\}_{b \in [B]}$, such that $\bigcup_{b \in [B]} S^{(b)} = {\cal V}$ and $\bigcap_{b \in [B]} S^{(b)} = \emptyset$. Furthermore, we define ${\cal G}^{(b)}$ as the induced connected subgraph of ${\cal G}$ formed from $S^{(b)}$. (If a group comprises $C > 1$ connected components, it can be treated as $C$ separate groups.) Let $\widehat{{\bm{A}}}$ be a within-group adjacency matrix that contains links between nodes in the same group, i.e., $\widehat{{\bm{A}}}$ contains the link $(i, j)$ if and only if for some group $S^{(b)}$, $i, j \in S^{(b)}$. Without loss of generality, we reorder the rows and columns of $\widehat{{\bm{A}}}$ and ${\bm{A}}$ such that $\widehat{{\bm{A}}}$ is a block matrix.
Let $\widehat{{\bm{D}}}$ be the degree matrix of $\widehat{{\bm{A}}}$.
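The construction of $\widehat{{\bm{A}}}$ and $\widehat{{\bm{D}}}$ can be sketched as follows. The sketch assumes that $\widehat{{\bm{A}}}$ keeps exactly those links of ${\bm{A}}$ whose endpoints share a group, and that nodes are already ordered by group; the five-node graph, the group labels, and the function names are hypothetical.

```python
def within_group_adjacency(A, groups):
    """Keep only the links of A whose two endpoints share a group label."""
    n = len(A)
    return [[A[i][j] if groups[i] == groups[j] else 0 for j in range(n)]
            for i in range(n)]

def degrees(M):
    """Diagonal of the degree matrix: row sums of the adjacency matrix."""
    return [sum(row) for row in M]

# Hypothetical graph with self-loops: nodes 0-2 form group "a", nodes 3-4 group "b".
# The only cross-group link is (1, 3).
A = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 1],
     [0, 0, 0, 1, 1]]
groups = ["a", "a", "a", "b", "b"]

A_hat = within_group_adjacency(A, groups)
D = degrees(A)          # overall degrees
D_hat = degrees(A_hat)  # within-group degrees

# Because nodes are ordered by group, A_hat is block diagonal:
# every entry linking group "a" to group "b" is zero.
off_block = [A_hat[i][j] for i in range(3) for j in range(3, 5)]
```

Here nodes 1 and 3 each lose one unit of degree when the cross-group link is dropped, so `D_hat` differs from `D` only at those two nodes.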

4.1 Symmetric Normalized Filter

We first focus on analyzing $\Phi_s$. We introduce the notation $\bm{P}=\bm{D}^{-\frac{1}{2}}\bm{A}\bm{D}^{-\frac{1}{2}}$ for the symmetric normalized adjacency matrix. We further define $\widehat{\bm{P}}=\widehat{\bm{D}}^{-\frac{1}{2}}\widehat{\bm{A}}\widehat{\bm{D}}^{-\frac{1}{2}}$, which has the form $\begin{bmatrix}\widehat{\bm{P}}^{(1)}&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&\widehat{\bm{P}}^{(B)}\end{bmatrix}$. Each $\widehat{\bm{P}}^{(b)}$ admits the orthonormal spectral decomposition $\widehat{\bm{P}}^{(b)}=\sum_{k=1}^{\left|S^{(b)}\right|}\lambda_k^{(b)}\bm{v}_k^{(b)}\left(\bm{v}_k^{(b)}\right)^{\intercal}$. Let $\left(\lambda^{(b)}_k\right)_{1\leq k\leq\left|S^{(b)}\right|}$ be the eigenvalues of $\widehat{\bm{P}}^{(b)}$ sorted in non-increasing order; the eigenvalues fall in the range $(-1,1]$. By the spectral properties of $\widehat{\bm{P}}^{(b)}$, $\lambda^{(b)}_1=1$. Following Lovász (2001), we denote the spectral gap of $\widehat{\bm{P}}^{(b)}$ as $\lambda^{(b)}=\max\left\{\lambda^{(b)}_2,\left|\lambda^{(b)}_{\left|S^{(b)}\right|}\right|\right\}<1$; $\lambda^{(b)}_2$ corresponds to the smallest non-zero eigenvalue of the symmetric normalized graph Laplacian. Let $\bm{P}=\widehat{\bm{P}}+\Xi^{(0)}$. If $\mathcal{G}$ is highly modular or approximately disconnected, then $\Xi^{(0)}\approxeq\mathbf{0}$, albeit with positive and non-positive entries. Finally, we define the volume $\text{vol}\left(\mathcal{G}^{(b)}\right)=\sum_{k\in S^{(b)}}\widehat{\bm{D}}_{kk}$.
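The notation above can be made concrete on a small example. The following is a minimal NumPy sketch, not the authors' code: the toy graph (two triangles joined by one cross-group link), the group labels, and the helper names `sym_norm` and `group_spectrum` are all assumptions introduced for illustration.

```python
import numpy as np

def sym_norm(adj):
    """Return D^{-1/2} A D^{-1/2}; assumes a symmetric matrix with no zero-degree node."""
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

# Hypothetical toy graph: two triangles (groups 0 and 1) joined by one cross-group link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = np.array([0, 0, 0, 1, 1, 1])

A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Within-group adjacency: keep only links whose endpoints share a group label.
same_group = groups[:, None] == groups[None, :]
A_hat = A * same_group
D_hat = np.diag(A_hat.sum(axis=1))

P = sym_norm(A)          # symmetric normalized adjacency of the full graph
P_hat = sym_norm(A_hat)  # block-diagonal within-group counterpart
Xi0 = P - P_hat          # cross-group perturbation, so that P = P_hat + Xi0

def group_spectrum(b):
    """Return (lambda^(b), vol(G^(b))) for group b."""
    idx = np.flatnonzero(groups == b)
    eigvals = np.linalg.eigvalsh(P_hat[np.ix_(idx, idx)])  # ascending order
    lam = max(eigvals[-2], abs(eigvals[0]))  # max of lambda_2 and |smallest eigenvalue|
    vol = D_hat[idx, idx].sum()              # sum of within-group degrees
    return float(lam), float(vol)

for b in (0, 1):
    lam, vol = group_spectrum(b)
    print(f"group {b}: lambda={lam:.3f}, vol={vol:.0f}")
print("operator norm of Xi0:", np.linalg.norm(Xi0, 2))
```

In this toy graph, the cross-group link makes `Xi0` non-zero both at the bridging entry and at within-group entries of the two bridging nodes, because their full-graph degrees differ from their within-group degrees.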

In Lemma 4.2, we present an inequality for the entries of $\bm{P}^L$ in terms of the spectral properties of $\widehat{\bm{P}}$. We can then combine this inequality with Lemma 4.1 to bound $\mathbb{E}\left[\bm{s}_i^{(L)}\right]$, and subsequently $\mathbb{E}\left[f_{LP}\left(\bm{s}_i^{(L)},\bm{s}_j^{(L)}\right)\right]$.

Lemma 4.2.

For $i,j\in S^{(b)}$:

\begin{align}
&\left|\bm{P}^L_{ij}-\frac{\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)}\right| \tag{9}\\
&\leq\zeta_s=\left(\lambda^{(b)}\right)^L+\sum_{l=1}^{L}\binom{L}{l}\left\|\Xi^{(0)}\right\|^{l}_{op}\left\|\widehat{\bm{P}}\right\|^{L-l}_{op}, \tag{10}
\end{align}

where $\|\cdot\|_{op}$ is the operator norm. And for $i\in S^{(b)},j\notin S^{(b)}$, $\left|\bm{P}^L_{ij}-0\right|\leq\sum_{l=1}^{L}\binom{L}{l}\left\|\Xi^{(0)}\right\|^{l}_{op}\left\|\widehat{\bm{P}}\right\|^{L-l}_{op}\leq\zeta_s$.

The proof of Lemma 4.2 is similar to spectral proofs of random walk convergence. When $L$ is small (e.g., 2 for many GCNs (Kipf & Welling, 2017)) and $\left\|\Xi^{(0)}\right\|_{op}\approxeq 0$, $\sum_{l=1}^{L}\binom{L}{l}\left\|\Xi^{(0)}\right\|^{l}_{op}\left\|\widehat{\bm{P}}\right\|^{L-l}_{op}\approxeq 0$. Furthermore, with significant stratification between social groups (Hofstra et al., 2017) and high expansion within groups (Malliaros & Megalooikonomou, 2011; Leskovec et al., 2008), $\lambda^{(b)}\ll 1$. In this case, $\zeta_s\approxeq 0$ and $\bm{P}^L_{ij}\approxeq\frac{\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)}$ for $i,j\in S^{(b)}$. Combining Lemmas 4.1 and 4.2, $\Phi_s$ can oversmooth the expected representations to $\mathbb{E}\left[\bm{s}_i^{(L)}\right]\approxeq\rho_s(i)\sqrt{\widehat{\bm{D}}_{ii}}\cdot\sum_{j\in S^{(b)}}\frac{\sqrt{\widehat{\bm{D}}_{jj}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)}\alpha_j$ (Keriven, 2022; Giovanni et al., 2023). We use this knowledge to bound $\mathbb{E}\left[f_{LP}\left(\bm{s}_i^{(L)},\bm{s}_j^{(L)}\right)\right]$ in terms of the degrees of $i,j$.
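The inequality in Lemma 4.2 can be checked numerically on a small graph. The sketch below is illustrative only: the two-clique graph and the helper names `two_cliques` and `lemma_4_2_check` are assumptions, and the operator norm is computed as the spectral norm via `np.linalg.norm(..., 2)`.

```python
import numpy as np
from math import comb

def sym_norm(adj):
    """Return D^{-1/2} A D^{-1/2}; assumes a symmetric matrix with no zero-degree node."""
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def two_cliques(n, bridge=True):
    """Hypothetical graph: two n-cliques (groups 0 and 1), optionally joined by one link."""
    adj = np.zeros((2 * n, 2 * n))
    adj[:n, :n] = 1.0
    adj[n:, n:] = 1.0
    np.fill_diagonal(adj, 0.0)
    if bridge:
        adj[n - 1, n] = adj[n, n - 1] = 1.0
    return adj, np.repeat([0, 1], n)

def lemma_4_2_check(adj, groups, b, L):
    """Return (largest within-group deviation of P^L from its degree-based limit, zeta_s)."""
    adj_hat = adj * (groups[:, None] == groups[None, :])
    P, P_hat = sym_norm(adj), sym_norm(adj_hat)
    xi_norm = np.linalg.norm(P - P_hat, 2)   # operator norm of Xi^(0)
    p_hat_norm = np.linalg.norm(P_hat, 2)    # operator norm of P_hat

    idx = np.flatnonzero(groups == b)
    deg_hat = adj_hat.sum(axis=1)[idx]       # within-group degrees
    eig = np.linalg.eigvalsh(P_hat[np.ix_(idx, idx)])  # ascending order
    lam = max(eig[-2], abs(eig[0]))

    zeta = lam**L + sum(
        comb(L, l) * xi_norm**l * p_hat_norm ** (L - l) for l in range(1, L + 1)
    )
    limit = np.sqrt(np.outer(deg_hat, deg_hat)) / deg_hat.sum()
    P_L = np.linalg.matrix_power(P, L)[np.ix_(idx, idx)]
    return float(np.abs(P_L - limit).max()), float(zeta)

adj, groups = two_cliques(5)
gap, zeta = lemma_4_2_check(adj, groups, b=0, L=2)
print(f"with bridge:    max deviation={gap:.4f}, zeta_s={zeta:.4f}")

adj0, groups0 = two_cliques(5, bridge=False)
gap0, zeta0 = lemma_4_2_check(adj0, groups0, b=0, L=2)
print(f"without bridge: max deviation={gap0:.4f}, zeta_s={zeta0:.4f}")
```

Without the bridge, the perturbation term vanishes and the bound reduces to $\left(\lambda^{(b)}\right)^L$; with the bridge, the binomial sum dominates $\zeta_s$, which illustrates why the approximation relies on the cross-group perturbation being small.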

Theorem 4.3.

Following a relaxed assumption from Xu et al. (2018), for nodes $i,j\in S^{(b)}$, we assume that $\rho_s(i)=\rho_s(j)=\overline{\rho}_s(b)$. Then:

\begin{align}
&\left|\mathbb{E}\left[f_{LP}\left(\bm{s}_i^{(L)},\bm{s}_j^{(L)}\right)\right]-C_0\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}\right| \tag{11}\\
&\leq\zeta_s\overline{\rho}_s^2(b)\left(\sqrt{\widehat{\bm{D}}_{ii}}+\sqrt{\widehat{\bm{D}}_{jj}}\right)C_1C_2+\zeta_s^2\overline{\rho}_s^2(b)C_2^2, \tag{12}\\
&\text{where:} \tag{13}\\
&C_0=\overline{\rho}_s^2(b)C_1^2, \tag{14}\\
&C_1=\left\|\sum_{k\in S^{(b)}}\frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)}\alpha_k\right\|_2, \tag{15}\\
&C_2=\sum_{k\in\mathcal{V}}\|\alpha_k\|_2. \tag{16}
\end{align}

In simpler terms, Theorem 4.3 states that with social stratification and expansion, the expected LP score satisfies $\mathbb{E}\left[f_{LP}\left(\bm{s}_i^{(L)},\bm{s}_j^{(L)}\right)\right]\propto\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}$ approximately when $i,j$ belong to the same social group. This is because, as explained before Theorem 4.3, $\zeta_s\approxeq 0$, so the RHS of the bound is $\approxeq 0$. This demonstrates that in LP, GCNs with a symmetric normalized graph filter have a within-group PA bias. If $\Phi_s$ positively influences the formation of links over time, this PA bias can drive “rich get richer” dynamics within social groups (Stoica et al., 2018). As illustrated in Figure 1 and discussed in §4.2, such “rich get richer” dynamics can engender group unfairness when nodes’ degrees are statistically associated with their group membership. An association between node degree and group membership depends on group size and homophily; in particular, when a group has many nodes and intra-links (i.e., is homophilous), there may be more nodes with a high within-group degree. Beyond fairness, Theorem 4.3 reveals that GCNs do not align with theories that social rank influences link formation, i.e., the likelihood of a link forming between nodes is proportional to their degree difference (Gu et al., 2018).

4.2 Within-Group Fairness

We further investigate the fairness implications of the PA bias of $\Phi_s$ in LP. We first introduce an additional set of social groups. Suppose that $\mathcal{V}$ can also be partitioned into $D$ disjoint social groups $\{T^{(d)}\}_{d\in[D]}$; then, we can consider intersections of $\{S^{(b)}\}_{b\in[B]}$ and $\{T^{(d)}\}_{d\in[D]}$. For example, revisiting Figure 1, $S$ may correspond to academic discipline (e.g., CS, Edu) and $T$ may correspond to gender (e.g., men, women). For simplicity, we let $D=2$. We measure the unfairness $\Delta^{(b)}:\mathbb{R}^{d'}\times\mathbb{R}^{d'}\to\mathbb{R}$ of LP for group $b$ as:

\begin{align}
&\Delta^{(b)}\left(\bm{h}_i^{(L)},\bm{h}_j^{(L)}\right):= \tag{17}\\
&\Biggl|\mathbb{E}_{i,j\sim U\left(\left(S^{(b)}\cap T^{(1)}\right)\times S^{(b)}\right)}f_{LP}\left(\bm{h}_i^{(L)},\bm{h}_j^{(L)}\right) \tag{18}\\
&-\mathbb{E}_{i,j\sim U\left(\left(S^{(b)}\cap T^{(2)}\right)\times S^{(b)}\right)}f_{LP}\left(\bm{h}_i^{(L)},\bm{h}_j^{(L)}\right)\Biggr|, \tag{19}
\end{align}

where $U(\cdot)$ is a discrete uniform distribution over the input set. $\Delta^{(b)}$ quantifies disparities in GCN LP scores within $S^{(b)}$ (with respect to $T^{(1)}$ and $T^{(2)}$). In other words, $\Delta^{(b)}$ measures differences in how GCNs allocate LP scores across subgroups, i.e., are links with nodes in one subgroup predicted at a higher rate than links with nodes in the other subgroup? Our metric is motivated by how GNN link predictions influence real-world link formation (e.g., GNN-based recommender systems use LP scores to rank suggested social connections), which has consequences for degree and power disparities. Based on Theorem 4.3 and §B.1, when $\zeta_s\approxeq 0$, we can estimate $\Delta^{(b)}\left(\bm{s}_i^{(L)},\bm{s}_j^{(L)}\right)$ as:

\begin{align}
&\widehat{\Delta}^{(b)}\left(\bm{s}_i^{(L)},\bm{s}_j^{(L)}\right) \tag{20}\\
&=\frac{\overline{\rho}_s^2(b)}{\left|S^{(b)}\right|}\left\|\sum_{k\in S^{(b)}}\frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)}\alpha_k\right\|^2_2\Biggl|\sum_{j\in S^{(b)}}\sqrt{\widehat{\bm{D}}_{jj}}\times \tag{21}\\
&\underbrace{\left(\mathbb{E}_{i\sim U\left(S^{(b)}\cap T^{(1)}\right)}\sqrt{\widehat{\bm{D}}_{ii}}-\mathbb{E}_{i\sim U\left(S^{(b)}\cap T^{(2)}\right)}\sqrt{\widehat{\bm{D}}_{ii}}\right)}_{\text{degree disparity}}\Biggr| \tag{22}
\end{align}

This suggests that a large disparity in the degree of nodes in $S^{(b)}\cap T^{(1)}$ vs. $S^{(b)}\cap T^{(2)}$ can greatly increase the unfairness $\Delta^{(b)}$ of $\Phi_s$ LP. For example, in Figure 1, the large degree disparity within CS (between men and women) entails that a GCN collaboration recommender system applied to the network will have a large $\Delta^{(b)}$. We empirically validate these fairness implications on diverse network datasets in §6.2. While we consider pre-activation LP scores in Eqn. 17 (in line with prior work, e.g., Li et al. (2021)), we consider post-sigmoid scores $\sigma\left(f_{LP}\left(\bm{h}_i^{(L)},\bm{h}_j^{(L)}\right)\right)$ (where $\sigma$ is the sigmoid function) in §6.2 and §6.3, as this simulates how LP scores may be processed in practice.
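To make the metric concrete, the following minimal sketch computes $\Delta^{(b)}$ from a matrix of pairwise LP scores and compares it with the degree-disparity form of the estimate. It is an illustration under stated assumptions, not the authors' implementation: the star graph, the subgroup assignment, and the function names `within_group_unfairness` and `sqrt_degree_disparity` are hypothetical, and the scores are set to exactly $\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}$, i.e., the idealized case in which the proportionality constant equals 1.

```python
import numpy as np

def within_group_unfairness(scores, in_group, in_t1, in_t2):
    """|mean score over (S n T1) x S  minus  mean score over (S n T2) x S|.

    scores[i, j] is the LP score of the pair (i, j); the masks are boolean arrays.
    The uniform distribution over the product set includes pairs with i == j.
    """
    cols = np.flatnonzero(in_group)
    rows_1 = np.flatnonzero(in_group & in_t1)
    rows_2 = np.flatnonzero(in_group & in_t2)
    mean_1 = scores[np.ix_(rows_1, cols)].mean()
    mean_2 = scores[np.ix_(rows_2, cols)].mean()
    return float(abs(mean_1 - mean_2))

def sqrt_degree_disparity(deg_hat, in_group, in_t1, in_t2):
    """Difference of mean sqrt within-group degree between the two subgroups."""
    root = np.sqrt(deg_hat)
    return float(root[in_group & in_t1].mean() - root[in_group & in_t2].mean())

# Hypothetical group S: a star with hub node 0 (subgroup T1) and four leaves (subgroup T2).
adj_hat = np.zeros((5, 5))
adj_hat[0, 1:] = adj_hat[1:, 0] = 1.0
deg_hat = adj_hat.sum(axis=1)

in_group = np.ones(5, dtype=bool)
in_t1 = np.array([True, False, False, False, False])
in_t2 = ~in_t1

# Idealized preferential-attachment scores: sqrt(d_i * d_j), with constant 1 (assumption).
scores = np.sqrt(np.outer(deg_hat, deg_hat))

delta = within_group_unfairness(scores, in_group, in_t1, in_t2)
disparity = sqrt_degree_disparity(deg_hat, in_group, in_t1, in_t2)
estimate = np.sqrt(deg_hat[in_group]).sum() / in_group.sum() * abs(disparity)
print(delta, estimate)
```

Because the scores here are exactly proportional to $\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}$, the directly computed metric and the degree-disparity form coincide; for scores produced by a trained model, the two would agree only approximately.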

Ultimately, within-group unfairness is characteristic of all GNN link prediction methods that: (1) predict scores for links with magnitudes that are positively associated with the degrees of their incident nodes, and (2) are applied to graphs where within-group membership is associated with node degree.

4.3 Random Walk Normalized Filter

We now follow similar steps as with $\Phi_s$ to understand how degree bias affects LP scores for $\Phi_r$. We redefine $\bm{P}=\bm{D}^{-1}\bm{A}$, $\widehat{\bm{P}}=\widehat{\bm{D}}^{-1}\widehat{\bm{A}}$, and the remaining notation from §4.1 accordingly for the random walk setting.
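As a point of intuition for the random walk setting, the sketch below uses the standard fact that, on a connected non-bipartite graph, the rows of $\left(\bm{D}^{-1}\bm{A}\right)^L$ converge to the degree-proportional stationary distribution, so the limiting entry depends only on the degree of the column node. The toy graph (a triangle with one pendant node, treated as a single group with no cross-group links) and the variable names are assumptions for illustration; $L=40$ is chosen only to show convergence and is far larger than typical GCN depths.

```python
import numpy as np

# Hypothetical single-group graph: triangle (0, 1, 2) plus pendant node 3 attached to node 2.
adj = np.array(
    [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
deg = adj.sum(axis=1)

P_rw = adj / deg[:, None]                 # random walk normalization D^{-1} A
L = 40
P_rw_L = np.linalg.matrix_power(P_rw, L)

stationary = deg / deg.sum()              # D_jj / vol: the same limit for every row i
max_row_gap = float(np.abs(P_rw_L - stationary[None, :]).max())
print("max deviation from D_jj / vol:", max_row_gap)
```

Unlike the symmetric normalized case, the limit here does not involve the degree of the row node $i$.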

Theorem 4.4.

Let $\zeta_r=\max_{u,v\in\mathcal{V}}\sqrt{\frac{\widehat{\bm{D}}_{vv}}{\widehat{\bm{D}}_{uu}}}\left(\lambda^{(b)}\right)^L+\sum_{l=1}^{L}\binom{L}{l}\left\|\Xi^{(0)}\right\|^{l}_{op}\left\|\widehat{\bm{P}}\right\|^{L-l}_{op}$. Furthermore, for nodes $i,j\in S^{(b)}$, assume that $\rho_r(i)=\rho_r(j)=\overline{\rho}_r(b)$. Combining Lemmas 4.1 and A.1:

\begin{align}
&\left|\mathbb{E}\left[f_{LP}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right)\right] - C_0\right| \tag{23} \\
&\quad \leq \zeta_r \overline{\rho}_r^2(b) C_1 C_2 + \zeta_r^2 \overline{\rho}_r^2(b) C_2^2, \tag{24} \\
&\text{where:} \tag{25} \\
&C_0 = \overline{\rho}_r^2(b) C_1^2, \tag{26} \\
&C_1 = \left\|\sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_k\right\|, \tag{27} \\
&C_2 = \sum_{k \in \mathcal{V}} \|\beta_k\|_2. \tag{28}
\end{align}

In other words, if $\zeta_r \approxeq 0$, $\mathbb{E}\left[f_{LP}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right)\right]$ is approximately constant when $i, j$ belong to the same social group. Based on Theorem 4.4 and §B.2, we can estimate $\Delta^{(b)}\left(\bm{s}_i^{(L)}, \bm{s}_j^{(L)}\right)$ as $\widehat{\Delta}^{(b)}\left(\bm{s}_i^{(L)}, \bm{s}_j^{(L)}\right) = 0$. Theoretically, this would suggest that a large disparity in the degree of nodes in $S^{(b)} \cap T^{(1)}$ vs. $S^{(b)} \cap T^{(2)}$ does not increase the unfairness $\Delta^{(b)}$ of $\Phi_r$ LP. However, we find empirically that this is not the case (§6.1). Even so, we include theoretical results for the random walk filter to be more comprehensive with respect to filter choice, as well as be upfront about the limitations of our analysis in this case. We also seek to provide an example of how to apply our analysis to other filters, for researchers who would like to build on it in the future. For example, findings for the random walk filter could be relevant to the GAT filter (Veličković et al., 2018), which is also a row-stochastic matrix.

In summary, in §4, we build on prior analysis techniques for random walks and GNNs. At a high level, we: (1) simplify the GCN architecture to be a linear function by truncating its Taylor expansion and considering node representations in expectation; (2) analyze the convergence of node representations via a spectral analysis of the convergence of short random walks within subgraphs (corresponding to social groups); and (3) use norm inequalities to estimate link prediction scores. Our analysis comprises numerous novel elements including:

  1. Analyzing the convergence of random walks within subgraphs, which requires accounting for the rate at which probability mass escapes from the subgraphs. In contrast, random walk results in the literature usually concern the convergence of random walks over an entire graph.

  2. Uncovering properties of short random walks on graphs, since most GNNs are shallow. In contrast, random walk results in the literature often concern the stationary distribution of random walks.

  3. Concretely relating theoretical properties of random walks to the fairness of GCN link prediction.

5 Fairness Regularizer

We propose a simple training-time solution to alleviate within-group LP unfairness regardless of graph filter type and GNN architecture. In particular, we can add a fairness regularization term $\mathcal{L}_{\text{fair}}$ to our original GNN training loss (Kamishima et al., 2011):

\begin{align}
\mathcal{L}_{\text{new}} = \mathcal{L}_{\text{orig}} + \lambda_{\text{fair}} \mathcal{L}_{\text{fair}} = \mathcal{L}_{\text{orig}} + \frac{\lambda_{\text{fair}}}{B} \sum_{b \in [B]} \Delta^{(b)}, \tag{29}
\end{align}

where $\lambda_{\text{fair}}$ is a tunable hyperparameter; higher values push the GNN to learn fairer parameters. With our fairness strategy, we empirically observe a significant decrease in the average unfairness across groups $\frac{1}{B} \sum_{b \in [B]} \Delta^{(b)}$ without a severe drop in LP performance for GCN (§6.3).
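The following is a minimal sketch of how the value of the regularized loss in Eqn. 29 could be computed; it is not the released implementation. The definition of $\Delta^{(b)}$ is not reproduced in this section, so the sketch assumes it is the absolute gap between the mean post-sigmoid LP scores of sampled pairs from $S^{(b)} \cap T^{(1)}$ and from $S^{(b)} \cap T^{(2)}$. The tuple layout and all function names are hypothetical, and an actual training loop would compute the same quantity in an automatic differentiation framework so that gradients flow through the scores.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def delta_b(pairs, b):
    """Assumed stand-in for Delta^(b).

    pairs: list of (raw_lp_score, social_group, sensitive_group) for sampled
    edges, where sensitive_group is 1 or 2 (i.e., T^(1) or T^(2)).
    """
    t1 = [sigmoid(s) for s, group, t in pairs if group == b and t == 1]
    t2 = [sigmoid(s) for s, group, t in pairs if group == b and t == 2]
    if not t1 or not t2:
        return 0.0  # assumption: a group missing one subgroup contributes no gap
    return abs(mean(t1) - mean(t2))


def regularized_loss(loss_orig, pairs, num_groups, lambda_fair):
    """L_new = L_orig + (lambda_fair / B) * sum_b Delta^(b), as in Eqn. 29."""
    loss_fair = mean([delta_b(pairs, b) for b in range(num_groups)])
    return loss_orig + lambda_fair * loss_fair


# Hypothetical sampled pairs: group 0 has a score gap, group 1 does not.
example_pairs = [
    (math.log(3.0), 0, 1),  # sigmoid = 0.75
    (0.0, 0, 2),            # sigmoid = 0.5
    (0.0, 1, 1),
    (0.0, 1, 2),
]
example_loss = regularized_loss(1.0, example_pairs, num_groups=2, lambda_fair=2.0)
```

Setting `lambda_fair=0.0` recovers the original loss, which corresponds to the unregularized baseline rows reported in §6.3.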

6 Experiments

In this section, we empirically validate our theoretical analysis (§6.1) and the within-group fairness implications of GCN’s LP PA bias (§6.2) on diverse real-world network datasets of varying size. We further find that our simple training-time strategy to alleviate unfairness is effective on citation, online social, and credit networks (§6.3). We release our code and data in our GitHub repository (https://github.com/ArjunSubramonian/link_bias_amplification). We present experimental results with 4-layer GCN encoders and a Hadamard product with MLP LP score function in §G, with similar conclusions.

6.1 Validating Theoretical Analysis

We validate our theoretical analysis on 10 real-world network datasets (e.g., citation, collaboration, online social networks), which we describe in §C. Each dataset is natively intended for node classification; however, we adapt the datasets for LP, treating the connected components within the node classes as the social groups $S^{(b)}$. This design choice is reasonable, as in all the datasets, the classes naturally correspond to socially-relevant groupings of the nodes, or proxies thereof (e.g., in the LastFMAsia dataset, the classes are the home countries of users). Because we adopt the class labels for each dataset as the social group labels, the social groups are largely homophilic; this aligns with our assumptions when interpreting Theorems 4.3 and 4.4 that social groups are stratified in networks.

We train GCN encoders $\Phi_s$ and $\Phi_r$ for LP over 10 random seeds (cf. §E for more details). In Figure 2, we plot the theoretic LP score that we derive in §4 against the GCN LP score for pairs of test nodes belonging to the same social group (including positive and negative links). (While our theoretic scores result from our theoretical analysis in §4, we reiterate that our results in §4 rely on the assumptions that we state, and the theoretic score is not a ground-truth value.) In particular, for $\Phi_s$, the theoretic LP score is $\overline{\rho}_s^2(b) \sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}} \left\|\sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_k\right\|^2_2$ and the GCN LP score is $f_{LP}\left(\bm{s}_i^{(L)}, \bm{s}_j^{(L)}\right)$ (cf. Theorem 4.3). In contrast, for $\Phi_r$, the theoretic LP score is $\overline{\rho}_r^2(b) \left\|\sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_k\right\|^2_2$ and the GCN LP score is $f_{LP}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right)$ (cf. Theorem 4.4). For all the datasets, we estimate $\overline{\rho}_s^2(b)$ and $\overline{\rho}_r^2(b)$ separately for each social group $S^{(b)}$ as the slope of the least-squares regression line (through the data from $S^{(b)}$) that predicts the GCN score as a function of the theoretic score. Hence, we do not plot any pair of test nodes that is the only pair in $S^{(b)}$, as it is not possible to estimate $\overline{\rho}_s^2(b)$ or $\overline{\rho}_r^2(b)$ from a single point. Further, the test AUC is consistently high, indicating that the GCNs are well-trained. The large range of each color in the plots indicates a diversity of LP scores within each social group.
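The per-group slope estimation described above can be sketched as follows. This is a minimal illustration, not the released implementation: it assumes an ordinary least-squares line with an intercept, fit on the theoretic score with the $\overline{\rho}^2(b)$ factor left out, which is consistent with a slope being undefined for a group with a single pair. The function names and toy data are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line (with intercept) predicting ys from xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def estimate_rho_sq(pairs_by_group):
    """Map each social group to its slope estimate, skipping single-pair groups."""
    estimates = {}
    for group, pairs in pairs_by_group.items():
        if len(pairs) < 2:
            continue  # a single pair cannot determine a slope
        xs = [unscaled for unscaled, _ in pairs]
        ys = [gcn for _, gcn in pairs]
        estimates[group] = ols_slope(xs, ys)
    return estimates


# Hypothetical toy data: (unscaled theoretic score, GCN score) per test pair.
toy_pairs = {
    "group_a": [(1.0, 2.1), (2.0, 4.1), (3.0, 6.1)],
    "group_b": [(0.5, 0.3)],  # only one pair, so it is skipped
}
rho_sq = estimate_rho_sq(toy_pairs)
```

A regression constrained to pass through the origin would be an alternative design; it is not used here because it would yield an estimate even for a single pair.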


| Dataset | NRMSE (↓) | PCC (↑) | $\Phi_s$ Test AUC (↑) |
|---|---|---|---|
| CORA | 0.038 ± 0.006 | 0.884 ± 0.008 | 0.927 ± 0.008 |
| CITESEER | 0.080 ± 0.005 | 0.806 ± 0.007 | 0.943 ± 0.007 |
| DBLP | 0.026 ± 0.002 | 0.820 ± 0.014 | 0.948 ± 0.001 |
| PUBMED | 0.061 ± 0.008 | 0.774 ± 0.018 | 0.927 ± 0.010 |
| CS | 0.036 ± 0.006 | 0.917 ± 0.019 | 0.932 ± 0.008 |
| PHYSICS | 0.042 ± 0.003 | 0.822 ± 0.021 | 0.946 ± 0.003 |
| LASTFMASIA | 0.064 ± 0.003 | 0.889 ± 0.004 | 0.962 ± 0.001 |
| DE | 0.025 ± 0.003 | 0.795 ± 0.043 | 0.913 ± 0.003 |
| EN | 0.041 ± 0.002 | 0.542 ± 0.013 | 0.876 ± 0.003 |
| FR | 0.030 ± 0.002 | 0.743 ± 0.026 | 0.910 ± 0.005 |

| Dataset | NRMSE (↓) | PCC (↑) | $\Phi_r$ Test AUC (↑) |
|---|---|---|---|
| CORA | 0.101 ± 0.029 | 0.553 ± 0.024 | 0.942 ± 0.005 |
| CITESEER | 0.170 ± 0.016 | 0.363 ± 0.028 | 0.934 ± 0.003 |
| DBLP | 0.157 ± 0.012 | 0.235 ± 0.022 | 0.942 ± 0.002 |
| PUBMED | 0.155 ± 0.013 | 0.079 ± 0.029 | 0.896 ± 0.011 |
| CS | 0.101 ± 0.027 | 0.447 ± 0.070 | 0.939 ± 0.003 |
| PHYSICS | 0.107 ± 0.027 | 0.264 ± 0.038 | 0.951 ± 0.004 |
| LASTFMASIA | 0.123 ± 0.016 | 0.409 ± 0.017 | 0.949 ± 0.001 |
| DE | 0.024 ± 0.004 | 0.074 ± 0.016 | 0.862 ± 0.003 |
| EN | 0.065 ± 0.006 | 0.012 ± 0.005 | 0.850 ± 0.002 |
| FR | 0.028 ± 0.006 | 0.006 ± 0.003 | 0.865 ± 0.004 |

Figure 2: The plots display the theoretic vs. GCN LP scores for the Cora, CS, and LastFMAsia datasets over 10 random seeds. (We include the plots for the remaining datasets in §F.) The top row of plots corresponds to $\Phi_s$, the bottom row to $\Phi_r$. In the plots, each circle corresponds to a single pair of test nodes (between which we are predicting a link). The center of each circle represents the mean of the theoretic and GCN scores and its area captures the range of scores. The color of each circle indicates the social group to which the node pair belongs. The plots include: (1) the total number of test node pairs $N$; (2) the number of social groups $B$; (3) the dashed line of equality for easy comparison of the theoretic and GCN scores. For all the datasets, the tables display: (1) the mean/standard deviation of the GCN test AUC on LP; and (2) the mean/standard deviation of the range-normalized root-mean-square deviation (NRMSE; normalized by the sample range of the GCN LP scores, so values fall between 0 and 1) (Otto, 2019) and Pearson correlation coefficient (PCC) (Freedman et al., 2007) of the theoretic LP scores as predictors of the GCN scores. The first table corresponds to $\Phi_s$, the second to $\Phi_r$.
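The two agreement metrics reported in the tables can be sketched as follows. This is a minimal illustration under the definitions stated in the caption (RMSE divided by the sample range of the GCN scores, and the standard Pearson correlation coefficient), not the released implementation; the function names and toy scores are hypothetical.

```python
import math


def nrmse(predicted, observed):
    """Root-mean-square deviation normalized by the sample range of `observed`."""
    n = len(observed)
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    return math.sqrt(mse) / (max(observed) - min(observed))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy scores: theoretic scores as predictors of GCN scores.
theoretic_scores = [1.0, 2.0, 3.0, 4.0]
gcn_scores = [1.0, 2.0, 3.0, 5.0]
toy_nrmse = nrmse(theoretic_scores, gcn_scores)
toy_pcc = pearson(theoretic_scores, gcn_scores)
```

Lower NRMSE and higher PCC both indicate that the theoretic scores track the GCN scores more closely.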

We visually observe that the theoretic LP scores are strong predictors of the $\Phi_s$ scores for each dataset, validating our theoretical analysis. This strength is further confirmed by the generally low NRMSE and high PCC (except for the EN dataset). However, we observe a few cases in which our theoretical analysis does not line up with our experiments:

  1. Our theoretical analysis predicts that the LP score between two nodes $i, j$ that belong to the same social group $S^{(b)}$ will always be non-negative; however, $\Phi_s$ can predict negative scores for pairs of nodes in the same social group. In this case, it appears that $\Phi_s$ relies more on the dissimilarity of (transformed) features than node degree.

  2. For many network datasets (especially from the citation and online social domains), there exist node pairs (near the origin) for which the theoretic LP score underestimates the $\Phi_s$ score. Upon further analysis (cf. Appendix H), we find that the theoretic score is less predictive of the $\Phi_s$ score for nodes $i, j$ when the product of their degrees (i.e., their PA score) or similarity of their features is relatively low.

  3. It appears that the theoretic LP score tends to poorly estimate the $\Phi_s$ score when the $\Phi_s$ score is relatively high; this suggests that $\Phi_s$ may conservatively rely more on the (dis)similarity of node features than node degree when the degree is large.

Figure 3: The plots display $\widehat{\Delta}^{(b)}$ vs. $\Delta^{(b)}$ for $\Phi_s$ for the NBA, German, and DBLP-Fairness datasets over all $b \in [B]$ and 10 random seeds. Each point corresponds to a different random seed, and the color of the point corresponds to the social group $S^{(b)}$. We compute $\widehat{\Delta}^{(b)}$ and $\Delta^{(b)}$ post-sigmoid using only the LP scores over the sampled (positive and negative) test edges. The plots display the NRMSE and PCC of $\widehat{\Delta}^{(b)}$ as a predictor of $\Delta^{(b)}$.

We do not observe that the theoretic LP scores are strong predictors of the $\Phi_r$ scores, although there is still a moderate association between these variables. This could be because the error bound for the theoretic scores for $\Phi_r$, unlike for $\Phi_s$, has an extra dependence $\max_{u,v \in \mathcal{V}} \sqrt{\frac{\widehat{\bm{D}}_{vv}}{\widehat{\bm{D}}_{uu}}}$ on the degrees of the incident nodes (cf. $\zeta_r$ in Theorem 4.4). In contrast, the error bound for the theoretic scores for $\Phi_s$ (cf. $\zeta_s$ in Theorem 4.3) does not depend on this degree ratio. This ratio can be quite large in social networks (e.g., celebrities vs. new users in the Twitter follow network); we further confirm that this ratio is large for our datasets in §I.
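As a small illustration of the degree-ratio term, note that $\max_{u,v \in \mathcal{V}} \sqrt{\widehat{\bm{D}}_{vv} / \widehat{\bm{D}}_{uu}}$ equals the square root of the largest degree divided by the smallest degree. The sketch below is not the released implementation; it assumes that $\widehat{\bm{D}}$ counts one self-loop per node, and the helper names and the toy star graph are hypothetical.

```python
import math


def star_adjacency(num_leaves):
    """Adjacency matrix of a star graph: node 0 is the hub, all others are leaves."""
    n = num_leaves + 1
    adj = [[0] * n for _ in range(n)]
    for leaf in range(1, n):
        adj[0][leaf] = adj[leaf][0] = 1
    return adj


def degrees_with_self_loops(adj):
    """Diagonal of D_hat, assuming A_hat = A + I (one self-loop per node)."""
    return [sum(row) + 1 for row in adj]


def max_sqrt_degree_ratio(degrees):
    """max over (u, v) of sqrt(D_vv / D_uu) = sqrt(max degree / min degree)."""
    return math.sqrt(max(degrees) / min(degrees))


# Hub with 7 leaves: hub degree 8, leaf degree 2 (self-loops included).
star_ratio = max_sqrt_degree_ratio(degrees_with_self_loops(star_adjacency(7)))

# Every node has the same degree in a regular graph, so the ratio is 1.
regular_ratio = max_sqrt_degree_ratio([3, 3, 3, 3])
```

The ratio grows with the gap between hub and leaf degrees, which matches the intuition that heavy-tailed degree distributions loosen the bound for $\Phi_r$.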

6.2 Within-Group Fairness

We now empirically validate the implications of GCN’s PA bias for within-group unfairness in LP. We run experiments on three network datasets: (1) the NBA social network (Dai & Wang, 2021), (2) the German credit network (Agarwal et al., 2021), and (3) a new DBLP-Fairness citation network that we construct. We describe these datasets in §D, including $\{S^{(b)}\}_{b \in [B]}$ and $\{T^{(d)}\}_{d \in [D]}$.

We train 2-layer GCN encoders $\Phi_s$ for LP (cf. §E). In Figure 3, for all the datasets, we plot $\widehat{\Delta}^{(b)}$ vs. $\Delta^{(b)}$ (cf. Eqns. 17, 22) for each $b \in [B]$. We qualitatively and quantitatively observe that $\widehat{\Delta}^{(b)}$ is moderately predictive of $\Delta^{(b)}$ for each dataset. This confirms our theoretical intuition (§4.2) that a large disparity in the degree of nodes in $S^{(b)} \cap T^{(1)}$ vs. $S^{(b)} \cap T^{(2)}$ can greatly increase the unfairness $\Delta^{(b)}$ of $\Phi_s$ LP; such unfairness can amplify degree disparities, worsening power imbalances in the network. Many points deviate from the line of equality; these deviations can be explained by the reasons in §6.1 and the compounding of errors.

Table 1: $\frac{1}{B} \sum_{b \in [B]} \Delta^{(b)}$ and the test AUC for the NBA, German, and DBLP-Fairness datasets with various settings of $\lambda_{\text{fair}}$. The first table corresponds to $\Phi_s$, and the second to $\Phi_r$.

| Dataset | $\lambda_{\text{fair}}$ | $\frac{1}{B} \sum_{b \in [B]} \Delta^{(b)}$ (↓) | $\Phi_s$ Test AUC (↑) |
|---|---|---|---|
| NBA | 4.0 | 0.000 ± 0.001 | 0.753 ± 0.002 |
| NBA | 2.0 | 0.004 ± 0.003 | 0.752 ± 0.003 |
| NBA | 1.0 | 0.007 ± 0.004 | 0.752 ± 0.003 |
| NBA | 0.0 | 0.013 ± 0.005 | 0.752 ± 0.003 |
| DBLPFAIRNESS | 4.0 | 0.072 ± 0.018 | 0.741 ± 0.008 |
| DBLPFAIRNESS | 2.0 | 0.095 ± 0.025 | 0.756 ± 0.007 |
| DBLPFAIRNESS | 1.0 | 0.110 ± 0.033 | 0.770 ± 0.010 |
| DBLPFAIRNESS | 0.0 | 0.145 ± 0.020 | 0.778 ± 0.007 |
| GERMAN | 4.0 | 0.012 ± 0.006 | 0.876 ± 0.017 |
| GERMAN | 2.0 | 0.028 ± 0.017 | 0.889 ± 0.017 |
| GERMAN | 1.0 | 0.038 ± 0.016 | 0.897 ± 0.014 |
| GERMAN | 0.0 | 0.045 ± 0.013 | 0.912 ± 0.009 |

| Dataset | $\lambda_{\text{fair}}$ | $\frac{1}{B} \sum_{b \in [B]} \Delta^{(b)}$ (↓) | $\Phi_r$ Test AUC (↑) |
|---|---|---|---|
| NBA | 4.0 | 0.000 ± 0.000 | 0.585 ± 0.030 |
| NBA | 2.0 | 0.000 ± 0.000 | 0.584 ± 0.032 |
| NBA | 1.0 | 0.000 ± 0.000 | 0.581 ± 0.034 |
| NBA | 0.0 | 0.000 ± 0.000 | 0.583 ± 0.034 |
| DBLPFAIRNESS | 4.0 | 0.053 ± 0.015 | 0.715 ± 0.010 |
| DBLPFAIRNESS | 2.0 | 0.060 ± 0.016 | 0.731 ± 0.009 |
| DBLPFAIRNESS | 1.0 | 0.065 ± 0.022 | 0.746 ± 0.009 |
| DBLPFAIRNESS | 0.0 | 0.090 ± 0.028 | 0.758 ± 0.011 |
| GERMAN | 4.0 | 0.029 ± 0.011 | 0.830 ± 0.024 |
| GERMAN | 2.0 | 0.031 ± 0.019 | 0.843 ± 0.027 |
| GERMAN | 1.0 | 0.019 ± 0.012 | 0.864 ± 0.020 |
| GERMAN | 0.0 | 0.015 ± 0.005 | 0.883 ± 0.009 |

6.3 Fairness Regularizer

We evaluate our solution to alleviate LP unfairness (§4.2). In particular, we add our fairness regularization term $\mathcal{L}_{\text{fair}}$ to the original training loss for the 2-layer $\Phi_s$ and $\Phi_r$ encoders. During each training epoch, we compute $\Delta^{(b)}$ post-sigmoid using only the LP scores over the sampled (positive and negative) training edges. In Table 1, we summarize the link prediction fairness $\left(\frac{1}{B}\sum_{b\in[B]}\Delta^{(b)}\right)$ and performance (test AUC) for the NBA, German, and DBLP-Fairness datasets with various settings of $\lambda_{\text{fair}}$.

For both graph filter types, we generally observe a significant decrease in $\frac{1}{B}\sum_{b\in[B]}\Delta^{(b)}$ (without a severe drop in test AUC) for $\lambda_{\text{fair}} > 0.0$ over $\lambda_{\text{fair}} = 0.0$ (with the exception of $\Phi_r$ for German); however, the varying magnitudes by which $\frac{1}{B}\sum_{b\in[B]}\Delta^{(b)}$ decreases across the datasets suggest that $\lambda_{\text{fair}}$ may need to be tuned per dataset. As expected, we mostly observe a tradeoff between $\frac{1}{B}\sum_{b\in[B]}\Delta^{(b)}$ and the test AUC as $\lambda_{\text{fair}}$ increases. Our experiments reveal that, regardless of graph filter type, even simple regularization approaches can alleviate this new form of unfairness. As this form of unfairness has not been previously explored, we have no baselines.

Our fairness regularizer can be easily integrated into model training, does not require significant additional computation, and directly optimizes for LP fairness. The time complexity of calculating the regularization term is $\mathcal{O}\left(\sum_{b=1}^{B} |S^{(b)} \cap T^{(1)}| \cdot |S^{(b)}| + |S^{(b)} \cap T^{(2)}| \cdot |S^{(b)}|\right)$, as we have already computed the LP scores for the cross-entropy loss term and simply need to sum them appropriately with respect to the groups and subgroups. Furthermore, the time complexity of computing gradients for the regularization term is on the same order as backpropagation for the cross-entropy loss term.
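As a rough illustration of why this term is cheap to compute, the NumPy sketch below sums already-computed LP scores per group and subgroup. It rests on an assumption that should be checked against §4.2: that $\Delta^{(b)}$ is the absolute gap between the mean LP scores that members of $S^{(b)} \cap T^{(1)}$ and of $S^{(b)} \cap T^{(2)}$ have with nodes in $S^{(b)}$. The function names, the dense score matrix, the toy data, and the placeholder cross-entropy value are all hypothetical; actual training uses only the sampled training edges and a differentiable implementation.

```python
import numpy as np

def within_group_gap(scores, group, sub1, sub2):
    """Assumed form of Delta^(b): absolute gap in mean LP score inside one group.

    scores: (n, n) array of post-sigmoid LP scores.
    group:  node indices in S^(b); sub1, sub2: node indices in T^(1), T^(2).
    """
    group = np.asarray(group)
    members1 = np.intersect1d(group, sub1)  # S^(b) ∩ T^(1)
    members2 = np.intersect1d(group, sub2)  # S^(b) ∩ T^(2)
    # Each mean touches |S^(b) ∩ T^(k)| * |S^(b)| scores, matching the stated cost.
    mean1 = scores[np.ix_(members1, group)].mean()
    mean2 = scores[np.ix_(members2, group)].mean()
    return abs(mean1 - mean2)

def fairness_term(scores, groups, sub1, sub2):
    """Average of the assumed Delta^(b) over all B groups."""
    return float(np.mean([within_group_gap(scores, g, sub1, sub2) for g in groups]))

# Toy example (hypothetical): one group of four nodes, two subgroups of two.
w = np.array([0.9, 0.8, 0.3, 0.2])
scores = np.outer(w, w)          # nodes 0 and 1 receive systematically higher scores
groups = [[0, 1, 2, 3]]
sub1, sub2 = [0, 1], [2, 3]

l_fair = fairness_term(scores, groups, sub1, sub2)
lambda_fair = 1.0
ce_loss = 0.5                    # placeholder for the cross-entropy loss value
total_loss = ce_loss + lambda_fair * l_fair
```

In an actual training loop the same sums would be taken over tensors that carry gradients (for example in PyTorch), so that the penalty can be backpropagated together with the cross-entropy loss.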

However, our fairness regularizer is not applicable in settings where model parameters cannot be retrained or finetuned. Hence, we encourage future research to also explore post-processing fairness strategies. For example, for $\Phi_s$ models, based on our theory (cf. Theorem 4.3), for each pair of nodes $i, j$, we can decay the influence of GCN’s PA bias by scaling (pre-activation) LP scores by $\left(\sqrt{\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}}\right)^{-\alpha}$, where $0 < \alpha < 1$ is a hyperparameter that can be tuned to achieve a desirable balance between $\frac{1}{B}\sum_{b\in[B]}\Delta^{(b)}$ and the test AUC.
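The post-processing idea above can be sketched in a few lines of NumPy. This is a minimal illustration rather than an evaluated method: the function name `decay_pa_bias`, the toy star graph, and the choice $\alpha = 0.5$ are hypothetical, and we assume that $\widehat{\bm{D}}$ is the degree matrix of the graph with self-loops added (so every diagonal entry is at least 1).

```python
import numpy as np

def decay_pa_bias(pre_scores, adj, alpha=0.5):
    """Scale pre-activation LP scores by (sqrt(d_i * d_j)) ** (-alpha).

    pre_scores: (n, n) array of pre-sigmoid link prediction scores.
    adj:        (n, n) symmetric 0/1 adjacency matrix without self-loops.
    alpha:      decay strength, expected to lie strictly between 0 and 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha should lie in (0, 1)")
    # Assumption: D-hat counts one self-loop per node.
    deg_hat = adj.sum(axis=1) + 1.0
    scale = np.sqrt(np.outer(deg_hat, deg_hat)) ** (-alpha)
    return pre_scores * scale

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy star graph (hypothetical): node 0 is a hub linked to nodes 1, 2, 3.
adj = np.zeros((4, 4))
adj[0, 1:] = 1.0
adj[1:, 0] = 1.0

pre_scores = np.full((4, 4), 2.0)   # identical raw scores for every pair
adjusted = decay_pa_bias(pre_scores, adj, alpha=0.5)
probs = sigmoid(adjusted)           # post-sigmoid LP scores after the decay
```

With identical raw scores, pairs involving the hub end up with smaller adjusted scores than pairs of low-degree nodes. Note that the scaling shrinks scores toward zero, so a negative pre-activation score moves closer to zero as well; how this interacts with the sign of the scores would need to be checked empirically.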

Empirical evaluation of our fairness regularizer using existing LP fairness metrics, such as statistical parity and equal opportunity dyadic fairness (Li et al., 2021), or equal opportunity degree bias (Wang & Derr, 2022), is beyond the scope of our paper given that our algorithm and metric are designed to handle a different form of unfairness. For example, inter-group and intra-group links can be predicted at the same rate or with the same accuracy, but these links can be exclusively with high-degree nodes, thereby marginalizing low-degree nodes (cf. §J). Similarly, even if we consistently predict links with the same accuracy across nodes with different degrees, high-degree nodes can still receive higher LP scores than low-degree nodes (cf. §K).

7 Conclusion

We theoretically and empirically show that GCNs can have a PA bias in LP. We analyze how this bias can engender within-group unfairness, and amplify degree and power imbalances in networks. We further propose a simple training-time strategy to alleviate this unfairness. We encourage future work to: (1) explore PA bias in other GNN architectures and directed and heterophilic networks, (2) characterize the “rich get richer” evolution of networks affected by GCN’s PA bias, and (3) propose pre-processing and post-processing strategies for within-group LP unfairness.

Because this unfairness is at the level of dyads, we would like to explore new forms of unfairness that occur at the level of higher-order structures (e.g., prediction disparities between important coalitions of nodes). Moreover, node degree is a local property, and it would be valuable to theoretically and empirically relate higher-order graph properties (e.g., local clustering coefficient, different measures of centrality) to unfairness.

Acknowledgements

We would like to thank the anonymous reviewers for their feedback on this work. This work was partially supported by NSF 2211557, NSF 1937599, NSF 2119643, NSF 2303037, NSF 2312501, NASA, SRC JUMP 2.0 Center, Amazon Research Awards, and Snapchat Gifts.

Impact Statement

Our paper seeks to uncover and combat discrimination, bias, and unfairness in GNNs. Throughout, we tie our analysis back to issues of disparity and power, towards advancing justice in graph learning. While we propose a strategy to alleviate LP unfairness, we emphasize that it is not a ‘silver bullet’ solution; we encourage graph learning practitioners to adopt a sociotechnical approach to fairness and continually adapt their algorithms, datasets, and metrics in response to the ever-changing landscape of inequality and power. Furthermore, the fairness of GCN LP should not sidestep concerns about GCN LP being used at all in certain scenarios.

Some datasets that we use contain protected attribute information (detailed in §D). We avoid using datasets that enable carceral technology (e.g., Recidivism (Agarwal et al., 2021)). We release our code and data with an MIT license.

For transparency, we do our best to discuss limitations throughout the paper. For each lemma and theorem (§4), our assumptions are clearly explained and justified either before or in the statement thereof, and we include complete proofs of our theoretical claims in §A and §B.

For reproducibility, we provide all our code and data (including the raw DBLP-Fairness dataset) in our GitHub repository, along with a README. We detail our data processing steps in §D.3. Furthermore, our experiments (§6) are run with 10 random seeds and errors are reported. We provide model implementation details in §E.

References

  • Agarwal et al. (2021) Agarwal, C., Lakkaraju, H., and Zitnik, M. Towards a unified framework for fair and stable graph representation learning. In Conference on Uncertainty in Artificial Intelligence, 2021.
  • Barabási & Albert (1999) Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999. doi: 10.1126/science.286.5439.509. URL https://www.science.org/doi/abs/10.1126/science.286.5439.509.
  • Bashardoust et al. (2022) Bashardoust, A., Friedler, S. A., Scheidegger, C. E., Sullivan, B. D., and Venkatasubramanian, S. Reducing access disparities in networks using edge augmentation. ArXiv, abs/2209.07616, 2022.
  • Bojchevski & Günnemann (2018) Bojchevski, A. and Günnemann, S. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=r1ZdKJ-0W.
  • Boyd et al. (2014) Boyd, D., Levy, K., and Marwick, A. The networked nature of algorithmic discrimination. Data and Discrimination: Collected Essays. Open Technology Institute, 2014.
  • Chamberlain et al. (2023) Chamberlain, B. P., Shirobokov, S., Rossi, E., Frasca, F., Markovich, T., Hammerla, N. Y., Bronstein, M. M., and Hansmire, M. Graph neural networks for link prediction with subgraph sketching. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=m1oqEOAozQU.
  • Collins & Bilge (2020) Collins, P. H. and Bilge, S. Intersectionality. John Wiley & Sons, 2020.
  • Current et al. (2022) Current, S., He, Y., Gurukar, S., and Parthasarathy, S. FairEGM: Fair link prediction and recommendation via emulated graph modification. In Equity and Access in Algorithms, Mechanisms, and Optimization. ACM, oct 2022. doi: 10.1145/3551624.3555287. URL https://doi.org/10.1145%2F3551624.3555287.
  • Dai & Wang (2021) Dai, E. and Wang, S. Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, WSDM ’21, pp.  680–688, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450382977. doi: 10.1145/3437963.3441752. URL https://doi.org/10.1145/3437963.3441752.
  • Fan et al. (2019) Fan, W., Ma, Y., Li, Q., He, Y., Zhao, Y. E., Tang, J., and Yin, D. Graph neural networks for social recommendation. The World Wide Web Conference, 2019.
  • Fey (2019) Fey, M. link_pred.py, 2019. URL https://github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py.
  • Fey & Lenssen (2019) Fey, M. and Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
  • Fish et al. (2019) Fish, B., Bashardoust, A., Boyd, D., Friedler, S., Scheidegger, C., and Venkatasubramanian, S. Gaps in information access in social networks? In The World Wide Web Conference, WWW ’19, pp.  480–490, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450366748. doi: 10.1145/3308558.3313680. URL https://doi.org/10.1145/3308558.3313680.
  • Foulds et al. (2020) Foulds, J. R., Islam, R., Keya, K. N., and Pan, S. An intersectional definition of fairness. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp.  1918–1921, 2020. doi: 10.1109/ICDE48307.2020.00203.
  • Freedman et al. (2007) Freedman, D., Pisani, R., and Purves, R. Statistics: Fourth International Student Edition. Emersion: Emergent Village Resources for Communities of Faith Series. W.W. Norton & Company, 2007. ISBN 9780393930436.
  • Ghosh et al. (2021) Ghosh, A., Genuit, L., and Reagan, M. Characterizing intersectional group fairness with worst-case comparisons. In Artificial Intelligence Diversity, Belonging, Equity, and Inclusion, pp.  22–34. PMLR, 2021.
  • Giovanni et al. (2023) Giovanni, F. D., Rowbottom, J., Chamberlain, B. P., Markovich, T., and Bronstein, M. M. Understanding convolution on graphs via energies. 2023. URL https://openreview.net/forum?id=v5ew3FPTgb.
  • Gu et al. (2018) Gu, Y., Sun, Y., Li, Y., and Yang, Y. Rare: Social rank regulated large-scale network embedding. Proceedings of the 2018 World Wide Web Conference, 2018.
  • Hofstra et al. (2017) Hofstra, B., Corten, R., van Tubergen, F., and Ellison, N. B. Sources of segregation in social networks: A novel approach using facebook. American Sociological Review, 82(3):625–656, 2017. doi: 10.1177/0003122417705656. URL https://doi.org/10.1177/0003122417705656.
  • Kamishima et al. (2011) Kamishima, T., Akaho, S., and Sakuma, J. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pp.  643–650, 2011. doi: 10.1109/ICDMW.2011.83.
  • Kang et al. (2022) Kang, J., Zhu, Y., Xia, Y., Luo, J., and Tong, H. Rawlsgcn: Towards rawlsian difference principle on graph convolutional network. In Proceedings of the ACM Web Conference 2022, pp.  1214–1225, 2022.
  • Kasy & Abebe (2021) Kasy, M. and Abebe, R. Fairness, equality, and power in algorithmic decision-making. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pp.  576–586, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383097. doi: 10.1145/3442188.3445919. URL https://doi.org/10.1145/3442188.3445919.
  • Kearns et al. (2017) Kearns, M., Neel, S., Roth, A., and Wu, Z. S. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, 2017.
  • Keriven (2022) Keriven, N. Not too little, not too much: a theoretical analysis of graph (over)smoothing. ArXiv, abs/2205.12156, 2022.
  • Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  • Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
  • Leskovec et al. (2008) Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6:123–29, 2008.
  • Li et al. (2020) Li, P., Wang, Y., Wang, H., and Leskovec, J. Distance encoding: Design provably more powerful neural networks for graph representation learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.
  • Li et al. (2021) Li, P., Wang, Y., Zhao, H., Hong, P., and Liu, H. On dyadic fairness: Exploring and mitigating bias in graph connections. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=xgGS6PmzNq6.
  • Li et al. (2022) Li, Y., Wang, X., Ning, Y., and Wang, H. Fairlp: Towards fair link prediction on social network graphs. Proceedings of the International AAAI Conference on Web and Social Media, 16(1):628–639, May 2022. doi: 10.1609/icwsm.v16i1.19321. URL https://ojs.aaai.org/index.php/ICWSM/article/view/19321.
  • Liu et al. (2021) Liu, Z., Nguyen, T.-K., and Fang, Y. Tail-gnn: Tail-node graph neural networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, pp.  1109–1119, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383325. doi: 10.1145/3447548.3467276. URL https://doi.org/10.1145/3447548.3467276.
  • Liu et al. (2023) Liu, Z., Nguyen, T.-K., and Fang, Y. On generalized degree fairness in graph neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4):4525–4533, Jun. 2023. doi: 10.1609/aaai.v37i4.25574. URL https://ojs.aaai.org/index.php/AAAI/article/view/25574.
  • Lovász (2001) Lovász, L. M. Random walks on graphs: A survey. 2001.
  • Malliaros & Megalooikonomou (2011) Malliaros, F. D. and Megalooikonomou, V. Expansion properties of large social graphs. In DASFAA Workshops, 2011.
  • Nakkiran et al. (2019) Nakkiran, P., Kaplun, G., Kalimeris, D., Yang, T., Edelman, B. L., Zhang, F., and Barak, B. Sgd on neural networks learns functions of increasing complexity, 2019.
  • Oono & Suzuki (2020) Oono, K. and Suzuki, T. Graph neural networks exponentially lose expressive power for node classification. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1ldO2EFPr.
  • Otto (2019) Otto, S. How to normalize the rmse [blog post], 2019. URL https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/.
  • Ovalle et al. (2023) Ovalle, A., Subramonian, A., Gautam, V., Gee, G., and Chang, K.-W. Factoring the matrix of domination: A critical review and reimagination of intersectionality in ai fairness. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’23, pp.  496–511, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400702310. doi: 10.1145/3600211.3604705. URL https://doi.org/10.1145/3600211.3604705.
  • Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Curran Associates Inc., Red Hook, NY, USA, 2019.
  • Rozemberczki & Sarkar (2020) Rozemberczki, B. and Sarkar, R. Characteristic functions on graphs: Birds of a feather, from statistical descriptors to parametric models. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, pp.  1325–1334, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450368599. doi: 10.1145/3340531.3411866. URL https://doi.org/10.1145/3340531.3411866.
  • Rozemberczki et al. (2021) Rozemberczki, B., Allen, C., and Sarkar, R. Multi-Scale attributed node embedding. Journal of Complex Networks, 9(2):cnab014, 05 2021. ISSN 2051-1329. doi: 10.1093/comnet/cnab014. URL https://doi.org/10.1093/comnet/cnab014.
  • Sankar et al. (2021) Sankar, A., Liu, Y., Yu, J., and Shah, N. Graph neural networks for friend ranking in large-scale social platforms. Proceedings of the Web Conference 2021, 2021.
  • Shchur et al. (2018) Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. Pitfalls of graph neural network evaluation. ArXiv, abs/1811.05868, 2018.
  • Shomer et al. (2023) Shomer, H., Jin, W., Wang, W., and Tang, J. Toward degree bias in embedding-based knowledge graph completion. In Proceedings of the ACM Web Conference 2023, WWW ’23, pp.  705–715, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394161. doi: 10.1145/3543507.3583544. URL https://doi.org/10.1145/3543507.3583544.
  • Stoica et al. (2018) Stoica, A.-A., Riederer, C., and Chaintreau, A. Algorithmic glass ceiling in social networks: The effects of social recommendations on network diversity. In Proceedings of the 2018 World Wide Web Conference, WWW ’18, pp.  923–932, Republic and Canton of Geneva, CHE, 2018. International World Wide Web Conferences Steering Committee. ISBN 9781450356398. doi: 10.1145/3178876.3186140. URL https://doi.org/10.1145/3178876.3186140.
  • Stoica et al. (2020) Stoica, A.-A., Han, J. X., and Chaintreau, A. Seeding network influence in biased networks and the benefits of diversity. In Proceedings of The Web Conference 2020, WWW ’20, pp.  2089–2098, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450370233. doi: 10.1145/3366423.3380275. URL https://doi.org/10.1145/3366423.3380275.
  • Subramonian et al. (2022) Subramonian, A., Chang, K.-W., and Sun, Y. On the discrimination risk of mean aggregation feature imputation in graphs. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  32957–32973. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/d4c2f25bf0c33065b7d4fb9be2a9add1-Paper-Conference.pdf.
  • Tang et al. (2008) Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., and Su, Z. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pp.  990–998, New York, NY, USA, 2008. Association for Computing Machinery. ISBN 9781605581934. doi: 10.1145/1401890.1402008. URL https://doi.org/10.1145/1401890.1402008.
  • Tang et al. (2020) Tang, X., Yao, H., Sun, Y., Wang, Y., Tang, J., Aggarwal, C., Mitra, P., and Wang, S. Investigating and mitigating degree-related biases in graph convoltuional networks. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, pp.  1435–1444, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450368599. doi: 10.1145/3340531.3411872. URL https://doi.org/10.1145/3340531.3411872.
  • Valle-Pérez et al. (2019) Valle-Pérez, G., Camargo, C. Q., and Louis, A. A. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2019.
  • Veličković et al. (2018) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
  • Wang et al. (2022) Wang, A., Ramaswamy, V. V., and Russakovsky, O. Towards intersectionality in machine learning: Including more identities, handling underrepresentation, and performing evaluation. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pp.  336–349, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450393522. doi: 10.1145/3531146.3533101. URL https://doi.org/10.1145/3531146.3533101.
  • Wang & Derr (2022) Wang, Y. and Derr, T. Degree-related bias in link prediction. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW), pp.  757–758, 2022. doi: 10.1109/ICDMW58026.2022.00103.
  • Xie et al. (2021) Xie, Q., Zhu, Y., Huang, J., Du, P., and Nie, J.-Y. Graph neural collaborative topic model for citation recommendation. ACM Trans. Inf. Syst., 40(3), nov 2021. ISSN 1046-8188. doi: 10.1145/3473973. URL https://doi.org/10.1145/3473973.
  • Xu et al. (2023) Xu, H., Xiang, L., Huang, F., Weng, Y., Xu, R., Wang, X., and Zhou, C. Grace: Graph self-distillation and completion to mitigate degree-related biases. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pp.  2813–2824, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701030. doi: 10.1145/3580305.3599368. URL https://doi.org/10.1145/3580305.3599368.
  • Xu et al. (2021) Xu, H.-R., Bu, Y., Liu, M., Zhang, C., Sun, M., Zhang, Y., Meyer, E., Salas, E., and Ding, Y. Team power dynamics and team impact: New perspectives on scientific collaboration using career age as a proxy for team power. Journal of the Association for Information Science and Technology, 73:1489–1505, 2021.
  • Xu et al. (2018) Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, 2018.
  • Yamamoto & Frachtenberg (2022) Yamamoto, J. and Frachtenberg, E. Gender differences in collaboration patterns in computer science. Publications, 10(1), 2022. ISSN 2304-6775. doi: 10.3390/publications10010010. URL https://www.mdpi.com/2304-6775/10/1/10.
  • Zhang & Chen (2018) Zhang, M. and Chen, Y. Link prediction based on graph neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp.  5171–5181, Red Hook, NY, USA, 2018. Curran Associates Inc.
  • Zhang et al. (2021) Zhang, Y., Han, J. X., Mahajan, I., Bengani, P., and Chaintreau, A. Chasm in hegemony: explaining and reproducing disparities in homophilous networks. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(2):1–38, 2021.
  • Zhao et al. (2022) Zhao, B., Gu, Y., Forde, J. Z., and Saphra, N. One venue, two conferences: The separation of chinese and american citation networks, 2022.

Supplementary Text

Appendix A Proofs

A.1 Proof of Lemma 4.1

Proof.

Similarly to Xu et al. (2018); Tang et al. (2020), we compute the first-order partial derivatives of $\Phi_s$ and $\Phi_r$:

$$\frac{\partial \bm{s}_i^{(L)}}{\partial \bm{x}_j} = \sum_{p \in \Psi^{L+1}_{i \to j}} \prod_{l=L}^{1} \frac{\text{diag}\left(\mathbbm{1}_{\bm{z}^{(l)}_{p^{(l)}} > 0}\right) \bm{W}_s^{(l)}}{\sqrt{\bm{D}_{p^{(l)}p^{(l)}} \bm{D}_{p^{(l-1)}p^{(l-1)}}}}, \quad \frac{\partial \bm{r}_i^{(L)}}{\partial \bm{x}_j} = \sum_{p \in \Psi^{L+1}_{i \to j}} \prod_{l=L}^{1} \frac{\text{diag}\left(\mathbbm{1}_{\bm{z}^{(l)}_{p^{(l)}} > 0}\right) \bm{W}_r^{(l)}}{\bm{D}_{p^{(l)}p^{(l)}}} \tag{30}$$

$$\frac{\partial \bm{s}_i^{(L)}}{\partial \bm{x}_j} = \sqrt{\frac{\bm{D}_{ii}}{\bm{D}_{jj}}} \sum_{p \in \Psi^{L+1}_{i \to j}} \prod_{l=L}^{1} \frac{\text{diag}\left(\mathbbm{1}_{\bm{z}^{(l)}_{p^{(l)}} > 0}\right) \bm{W}_s^{(l)}}{\bm{D}_{p^{(l)}p^{(l)}}} \tag{31}$$

where $p^{(l)}$ is the $l$-th node on path $p$ in the computation graph of $\Phi_s$ or $\Phi_r$ ($p^{(L)}$ is node $i$ and $p^{(0)}$ is node $j$); $\Psi^{\gamma}_{i \to j}$ is the set of all $\gamma$-length random walk paths from node $i$ to $j$; and $\bm{z}^{(l)}_{p^{(l)}}$ is pre-activated $\bm{s}^{(l)}_{p^{(l)}}$ or $\bm{r}^{(l)}_{p^{(l)}}$.
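The rewriting of the derivative of $\bm{s}_i^{(L)}$ from Eq. (30) into Eq. (31) comes from telescoping the degree factors along a single path. As a short worked step (our own expansion, using only $p^{(L)} = i$ and $p^{(0)} = j$), for any fixed path $p$:

```latex
\prod_{l=L}^{1} \frac{1}{\sqrt{\bm{D}_{p^{(l)}p^{(l)}} \bm{D}_{p^{(l-1)}p^{(l-1)}}}}
  = \frac{1}{\sqrt{\bm{D}_{ii} \bm{D}_{jj}}} \prod_{l=L-1}^{1} \frac{1}{\bm{D}_{p^{(l)}p^{(l)}}}
  = \sqrt{\frac{\bm{D}_{ii}}{\bm{D}_{jj}}} \prod_{l=L}^{1} \frac{1}{\bm{D}_{p^{(l)}p^{(l)}}}
```

Each interior node $p^{(1)}, \ldots, p^{(L-1)}$ appears in two consecutive square-root factors and so contributes a full $1 / \bm{D}_{p^{(l)}p^{(l)}}$, while the endpoints $i$ and $j$ each appear once. Multiplying and dividing by $\bm{D}_{ii}$ then absorbs node $i$ into the product and leaves the prefactor $\sqrt{\bm{D}_{ii} / \bm{D}_{jj}}$, which does not depend on the path and can be pulled out of the sum.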

With our assumption that the path from node $i \to j$ in the computation graph of $\Phi_s$ is independently activated with probability $\rho_s(i)$, and similarly, $\rho_r(i)$ for $\Phi_r$:

\begin{align}
\mathbb{E}\left[\frac{\partial \bm{s}_i^{(L)}}{\partial \bm{x}_j}\right] &= \left(\bm{D}^{-\frac{1}{2}} \bm{A} \bm{D}^{-\frac{1}{2}}\right)^L_{ij} \rho_s(i) \left(\prod_{l=L}^{1} \bm{W}_s^{(l)}\right), \tag{32} \\
\mathbb{E}\left[\frac{\partial \bm{r}_i^{(L)}}{\partial \bm{x}_j}\right] &= \left(\bm{D}^{-1} \bm{A}\right)^L_{ij} \rho_r(i) \left(\prod_{l=L}^{1} \bm{W}_r^{(l)}\right). \tag{33}
\end{align}

Then, recalling Eqn. 5:

\begin{align}
\mathbb{E}\left[\bm{s}_i^{(L)}\right] &= \sum_{j \in \mathcal{V}} \left(\bm{D}^{-\frac{1}{2}} \bm{A} \bm{D}^{-\frac{1}{2}}\right)^L_{ij} \rho_s(i) \left(\prod_{l=L}^{1} \bm{W}_s^{(l)}\right) \bm{x}_j + \mathbf{0}, \tag{34} \\
\mathbb{E}\left[\bm{r}_i^{(L)}\right] &= \sum_{j \in \mathcal{V}} \left(\bm{D}^{-1} \bm{A}\right)^L_{ij} \rho_r(i) \left(\prod_{l=L}^{1} \bm{W}_r^{(l)}\right) \bm{x}_j + \mathbf{0} \tag{35} \\
\mathbb{E}\left[\bm{s}_i^{(L)}\right] &= \sum_{j \in \mathcal{V}} \rho_s(i) \left(\bm{D}^{-\frac{1}{2}} \bm{A} \bm{D}^{-\frac{1}{2}}\right)^L_{ij} \alpha_j, \quad \mathbb{E}\left[\bm{r}_i^{(L)}\right] = \sum_{j \in \mathcal{V}} \rho_r(i) \left(\bm{D}^{-1} \bm{A}\right)^L_{ij} \beta_j. \tag{36}
\end{align}
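The closed form in Eqn. 36 can be checked numerically in the special case where every path is active (no ReLU gating, so $\rho_s(i) = 1$). The sketch below is illustrative only: the graph, feature dimension, random weights, and the reading of $\alpha_j$ as $\left(\prod_{l=L}^{1} \bm{W}_s^{(l)}\right) \bm{x}_j$ (as Eqns. 34 and 36 suggest) are assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-node undirected graph: a triangle (0, 1, 2) plus a pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / np.sqrt(np.outer(deg, deg))  # D^{-1/2} A D^{-1/2}

n, d, L = 4, 3, 2                    # illustrative sizes
X = rng.normal(size=(n, d))          # row j holds x_j
Ws = [rng.normal(size=(d, d)) for _ in range(L)]  # Ws[l - 1] stands for W_s^{(l)}

# Layer-by-layer propagation with all paths active (a linear GCN).
S = X
for W in Ws:
    S = P @ S @ W.T                  # row i becomes W (sum_j P_ij s_j)

# Closed form: s_i = sum_j (P^L)_ij alpha_j with alpha_j = W^{(L)} ... W^{(1)} x_j.
W_prod = np.eye(d)
for W in Ws:
    W_prod = W @ W_prod              # accumulates W^{(L)} ... W^{(1)}
alpha = X @ W_prod.T                 # row j holds alpha_j
S_closed = np.linalg.matrix_power(P, L) @ alpha

print(np.allclose(S, S_closed))
```

With ReLU gating restored, the lemma scales this quantity by the activation probability $\rho_s(i)$ in expectation; the sketch does not simulate that step.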

A.2 Proof of Lemma 4.2

Proof.

For $j \in S^{(b)}$, we can re-express $\widehat{\bm{P}}^L_{ij} = \left(\widehat{\bm{P}}^{(b)}\right)^L_{ij} = \left(\bm{e}^{(i)}\right)^\intercal \left(\widehat{\bm{P}}^{(b)}\right)^L \bm{e}^{(j)}$. (Footnote 5: For simplicity, we abuse notation here: $\left(\widehat{\bm{P}}^{(b)}\right)^L_{ij}$ is not the entry at row $i$ and column $j$, but rather the entry at the row corresponding to node $i$ and column corresponding to node $j$. Similarly, $\bm{e}^{(i)}$ is the standard basis vector with a 1 at the entry corresponding to node $i$.)
By the spectral properties of $\widehat{\bm{P}}^{(b)}$, $\left(\bm{e}^{(i)}\right)^\intercal \bm{v}_1^{(b)} = \sqrt{\frac{\widehat{\bm{D}}_{ii}}{\text{vol}\left(\mathcal{G}^{(b)}\right)}}$ (Lovász, 2001). Hence:

\begin{align}
\widehat{\bm{P}}^L_{ij} &= \sum_{k=1}^{\left|S^{(b)}\right|} \left(\lambda_k^{(b)}\right)^L \left(\bm{e}^{(i)}\right)^\intercal \bm{v}^{(b)}_k \left(\bm{v}^{(b)}_k\right)^\intercal \bm{e}^{(j)} \tag{37} \\
&= \frac{\sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} + \sum_{k=2}^{\left|S^{(b)}\right|} \left(\lambda_k^{(b)}\right)^L \left(\bm{e}^{(i)}\right)^\intercal \bm{v}^{(b)}_k \left(\bm{v}^{(b)}_k\right)^\intercal \bm{e}^{(j)} \tag{38}
\end{align}

Then, by Cauchy-Schwarz:

\begin{align}
\left| \widehat{\bm{P}}^L_{ij} - \frac{\sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \right| &\leq \left(\lambda^{(b)}\right)^L \sum_{k=1}^{\left|S^{(b)}\right|} \left| \left(\bm{e}^{(i)}\right)^\intercal \bm{v}^{(b)}_k \right| \left| \left(\bm{e}^{(j)}\right)^\intercal \bm{v}^{(b)}_k \right| \tag{39} \\
&\leq \left(\lambda^{(b)}\right)^L \left( \sum_{k=1}^{\left|S^{(b)}\right|} \left| \left(\bm{e}^{(i)}\right)^\intercal \bm{v}^{(b)}_k \right|^2 \right)^{\frac{1}{2}} \left( \sum_{k=1}^{\left|S^{(b)}\right|} \left| \left(\bm{e}^{(j)}\right)^\intercal \bm{v}^{(b)}_k \right|^2 \right)^{\frac{1}{2}} \tag{40} \\
&= \left(\lambda^{(b)}\right)^L \left( \left(\bm{e}^{(i)}\right)^\intercal \bm{V}^{(b)} \left(\bm{V}^{(b)}\right)^\intercal \bm{e}^{(i)} \right)^{\frac{1}{2}} \left( \left(\bm{e}^{(j)}\right)^\intercal \bm{V}^{(b)} \left(\bm{V}^{(b)}\right)^\intercal \bm{e}^{(j)} \right)^{\frac{1}{2}} \tag{41} \\
&= \left(\lambda^{(b)}\right)^L \left\| \bm{e}^{(i)} \right\|_2 \left\| \bm{e}^{(j)} \right\|_2 \tag{42} \\
&= \left(\lambda^{(b)}\right)^L \tag{43}
\end{align}
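The bound in Eqns. 39 to 43 can be illustrated numerically on a single connected, non-bipartite community. This is a minimal sketch under stated assumptions: the 4-node graph is hypothetical, and $\lambda^{(b)}$ is taken to be the largest magnitude among the non-principal eigenvalues of the symmetric normalized adjacency matrix.

```python
import numpy as np

# Hypothetical community: a triangle (0, 1, 2) plus a pendant node 3
# (connected and non-bipartite, so the top eigenvalue 1 is simple).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
vol = deg.sum()                              # vol(G^(b)) as the sum of degrees
P_hat = A / np.sqrt(np.outer(deg, deg))      # D^{-1/2} A D^{-1/2}

eigvals, eigvecs = np.linalg.eigh(P_hat)     # eigenvalues in ascending order
lam = max(abs(eigvals[0]), abs(eigvals[-2])) # assumed definition of lambda^(b)

# The principal eigenvector has entries sqrt(D_ii / vol), up to a global sign.
v1_matches = np.allclose(np.abs(eigvecs[:, -1]), np.sqrt(deg / vol))

# Rank-one limit sqrt(D_ii D_jj) / vol and the entrywise gap for several depths L.
limit = np.sqrt(np.outer(deg, deg)) / vol
bound_holds = []
for L in range(1, 9):
    gap = np.abs(np.linalg.matrix_power(P_hat, L) - limit).max()
    bound_holds.append(gap <= lam ** L + 1e-12)

print(v1_matches, all(bound_holds))
```

The gap shrinks geometrically in $L$, which is why deeper filters push entries toward the degree-proportional term $\sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}} / \text{vol}\left(\mathcal{G}^{(b)}\right)$.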

Let $\bm{P}^L = \left(\widehat{\bm{P}} + \Xi^{(0)}\right)^L = \widehat{\bm{P}}^L + \Xi^{(L)}$. Then, by the triangle inequality:

\begin{align}
\left| \bm{P}^L_{ij} - \frac{\sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \right| &\leq \left(\lambda^{(b)}\right)^L + \left| \left(\bm{e}^{(i)}\right)^\intercal \Xi^{(L)} \bm{e}^{(j)} \right| \tag{44} \\
&\leq \left(\lambda^{(b)}\right)^L + \left\| \Xi^{(L)} \right\|_{op} \tag{45} \\
&\leq \left(\lambda^{(b)}\right)^L + \sum_{l=1}^{L} \binom{L}{l} \left\| \Xi^{(0)} \right\|^{l}_{op} \left\| \widehat{\bm{P}} \right\|^{L-l}_{op} \tag{46}
\end{align}

For $j \notin S^{(b)}$, $\widehat{\bm{P}}^L_{ij} = 0$. Then:

\begin{align}
\left| \bm{P}^L_{ij} - 0 \right| &\leq \left| \left(\bm{e}^{(i)}\right)^\intercal \Xi^{(L)} \bm{e}^{(j)} \right| \tag{47} \\
&\leq \sum_{l=1}^{L} \binom{L}{l} \left\| \Xi^{(0)} \right\|^{l}_{op} \left\| \widehat{\bm{P}} \right\|^{L-l}_{op} \tag{48}
\end{align}
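The perturbation bound in Eqns. 46 and 48 follows from expanding $\left(\widehat{\bm{P}} + \Xi^{(0)}\right)^L$ and applying submultiplicativity of the operator norm. The sketch below checks it on a toy example. The setup is an assumption made for illustration: two triangle communities joined by one inter-community edge, with $\widehat{\bm{P}}$ built from the within-community edges only.

```python
import math

import numpy as np


def sym_norm(adj):
    """Symmetric normalized adjacency D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    return adj / np.sqrt(np.outer(deg, deg))


def op_norm(mat):
    """Operator (spectral) norm, i.e. the largest singular value."""
    return np.linalg.norm(mat, 2)


# Hypothetical graph: two triangles (nodes 0-2 and 3-5) plus one cross edge (2, 3).
tri = np.ones((3, 3)) - np.eye(3)
zeros = np.zeros((3, 3))
A_hat = np.block([[tri, zeros], [zeros, tri]])  # within-community edges only
A = A_hat.copy()
A[2, 3] = A[3, 2] = 1.0

P_hat, P = sym_norm(A_hat), sym_norm(A)
Xi0 = P - P_hat
L = 4                                            # illustrative depth

XiL = np.linalg.matrix_power(P, L) - np.linalg.matrix_power(P_hat, L)
bound = sum(
    math.comb(L, l) * op_norm(Xi0) ** l * op_norm(P_hat) ** (L - l)
    for l in range(1, L + 1)
)

# Entries of P^L that cross communities (where P_hat^L is exactly zero).
cross_max = np.abs(np.linalg.matrix_power(P, L)[:3, 3:]).max()

print(op_norm(XiL) <= bound, cross_max <= bound)
```

The binomial bound is loose by construction, so the check is meant to confirm the direction of the inequality rather than its tightness.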

A.3 Proof of Theorem 4.3

Proof.

For $u, v \in \mathcal{V}$, let $\left| \delta_{uv} \right| \leq \zeta_s$. Combining Lemmas 4.1 and 4.2, by our assumption that the computation graph paths to $i, j$ are activated independently:

\begin{align}
\mathbb{E}\left[ f_{LP}\left( \bm{s}_i^{(L)}, \bm{s}_j^{(L)} \right) \right] &= \mathbb{E}\left[ \bm{s}_i^{(L)} \right]^\intercal \mathbb{E}\left[ \bm{s}_j^{(L)} \right] \tag{49} \\
&= \overline{\rho}_s^2(b) \left( \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \alpha_k + \sum_{k \in \mathcal{V}} \delta_{ik} \alpha_k \right)^\intercal \left( \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{jj} \widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \alpha_k + \sum_{k \in \mathcal{V}} \delta_{jk} \alpha_k \right) \tag{50} \\
&= \overline{\rho}_s^2(b) \sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}} \underbrace{\left\| \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \alpha_k \right\|^2_2}_{\geq 0} \tag{51} \\
&\quad + \overline{\rho}_s^2(b) \left( \sqrt{\widehat{\bm{D}}_{ii}} \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \alpha_k \right)^\intercal \left( \sum_{k \in \mathcal{V}} \delta_{jk} \alpha_k \right) \tag{52} \\
&\quad + \overline{\rho}_s^2(b) \left( \sum_{k \in \mathcal{V}} \delta_{ik} \alpha_k \right)^\intercal \left( \sqrt{\widehat{\bm{D}}_{jj}} \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}\left(\mathcal{G}^{(b)}\right)} \alpha_k \right) \tag{53} \\
&\quad + \overline{\rho}_s^2(b) \left( \sum_{k \in \mathcal{V}} \delta_{ik} \alpha_k \right)^\intercal \left( \sum_{k \in \mathcal{V}} \delta_{jk} \alpha_k \right) \tag{54}
\end{align}

Then, by Cauchy-Schwarz and the triangle inequality:

\begin{align}
&\left| \mathop{\mathbb{E}}\left[ f_{LP}\left( \bm{s}_{i}^{(L)}, \bm{s}_{j}^{(L)} \right) \right] - \underbrace{\overline{\rho}_{s}^{2}(b) \left\| \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_{k} \right\|_{2}^{2} \sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}}}_{\propto \sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}}} \right| \tag{55} \\
&\leq \zeta_{s} \overline{\rho}_{s}^{2}(b) \left( \sqrt{\widehat{\bm{D}}_{ii}} + \sqrt{\widehat{\bm{D}}_{jj}} \right) \left\| \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_{k} \right\|_{2} \left( \sum_{k \in \mathcal{V}} \| \alpha_{k} \|_{2} \right) + \zeta_{s}^{2} \overline{\rho}_{s}^{2}(b) \left( \sum_{k \in \mathcal{V}} \| \alpha_{k} \|_{2} \right)^{2} \tag{56}
\end{align}
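The passage from the three cross terms (52)-(54) to the bound (56) can be spelled out as follows. This is a sketch that assumes $|\delta_{uv}| \leq \zeta_{s}$ for all $u, v \in \mathcal{V}$ (mirroring the condition $|\delta_{uv}| \leq \zeta_{r}$ used in the proof of Theorem 4.4) and that $\overline{\rho}_{s}^{2}(b) \geq 0$; the shorthand $\bm{a}$ is introduced here only for readability.

```latex
\begin{align*}
\bm{a} &:= \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_{k},
\qquad
\left\| \sum_{k \in \mathcal{V}} \delta_{uk} \alpha_{k} \right\|_{2}
\leq \sum_{k \in \mathcal{V}} |\delta_{uk}| \, \| \alpha_{k} \|_{2}
\leq \zeta_{s} \sum_{k \in \mathcal{V}} \| \alpha_{k} \|_{2}
&& \text{(triangle inequality)} \\
\left| \left( \sqrt{\widehat{\bm{D}}_{ii}} \, \bm{a} \right)^{\intercal} \left( \sum_{k \in \mathcal{V}} \delta_{jk} \alpha_{k} \right) \right|
&\leq \sqrt{\widehat{\bm{D}}_{ii}} \, \| \bm{a} \|_{2} \, \zeta_{s} \sum_{k \in \mathcal{V}} \| \alpha_{k} \|_{2}
&& \text{(Cauchy-Schwarz on (52))} \\
\left| \left( \sum_{k \in \mathcal{V}} \delta_{ik} \alpha_{k} \right)^{\intercal} \left( \sqrt{\widehat{\bm{D}}_{jj}} \, \bm{a} \right) \right|
&\leq \sqrt{\widehat{\bm{D}}_{jj}} \, \| \bm{a} \|_{2} \, \zeta_{s} \sum_{k \in \mathcal{V}} \| \alpha_{k} \|_{2}
&& \text{(Cauchy-Schwarz on (53))} \\
\left| \left( \sum_{k \in \mathcal{V}} \delta_{ik} \alpha_{k} \right)^{\intercal} \left( \sum_{k \in \mathcal{V}} \delta_{jk} \alpha_{k} \right) \right|
&\leq \zeta_{s}^{2} \left( \sum_{k \in \mathcal{V}} \| \alpha_{k} \|_{2} \right)^{2}
&& \text{(Cauchy-Schwarz on (54))}
\end{align*}
```

Multiplying each line by $\overline{\rho}_{s}^{2}(b)$ and adding the three bounds gives the right-hand side of (56).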

A.4 Lemma A.1 and Proof

Lemma A.1.

We introduce the notation $\bm{P} = \bm{D}^{-1} \bm{A}$. We further define $\widehat{\bm{P}} = \widehat{\bm{D}}^{-1} \widehat{\bm{A}}$. Fix $i \in S^{(b)}$. Then, for $j \in S^{(b)}$:

\begin{align}
\left| \bm{P}^{L}_{ij} - \frac{\widehat{\bm{D}}_{jj}}{\text{vol}\left( \mathcal{G}^{(b)} \right)} \right| \leq \sqrt{\frac{\widehat{\bm{D}}_{jj}}{\widehat{\bm{D}}_{ii}}} \left( \lambda^{(b)} \right)^{L} + \sum_{l=1}^{L} \binom{L}{l} \left\| \Xi^{(0)} \right\|_{op}^{l} \left\| \widehat{\bm{P}} \right\|_{op}^{L-l} \tag{57}
\end{align}

And for $j \notin S^{(b)}$:

\begin{align}
\left| \bm{P}^{L}_{ij} - 0 \right| \leq \sum_{l=1}^{L} \binom{L}{l} \left\| \Xi^{(0)} \right\|_{op}^{l} \left\| \widehat{\bm{P}} \right\|_{op}^{L-l} \tag{58}
\end{align}
Proof.

Similar to the proof of Lemma 4.2:

\begin{align}
\widehat{\bm{P}}^{L}_{ij} = \frac{\widehat{\bm{D}}_{jj}}{\text{vol}\left( \mathcal{G}^{(b)} \right)} + \sqrt{\frac{\widehat{\bm{D}}_{jj}}{\widehat{\bm{D}}_{ii}}} \sum_{k=2}^{\left| S^{(b)} \right|} \left( \lambda_{k}^{(b)} \right)^{L} \left( \bm{e}^{(i)} \right)^{\intercal} \bm{v}^{(b)}_{k} \left( \bm{v}^{(b)}_{k} \right)^{\intercal} \bm{e}^{(j)} \tag{59}
\end{align}

Subsequently:

\begin{align}
\left| \widehat{\bm{P}}^{L}_{ij} - \frac{\widehat{\bm{D}}_{jj}}{\text{vol}\left( \mathcal{G}^{(b)} \right)} \right| \leq \sqrt{\frac{\widehat{\bm{D}}_{jj}}{\widehat{\bm{D}}_{ii}}} \left( \lambda^{(b)} \right)^{L} \tag{60}
\end{align}
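The bound in (60) can be checked numerically on a toy component. The following is a minimal sketch, not the authors' code: it assumes that $\widehat{\bm{A}}$ restricted to $S^{(b)}$ is the adjacency matrix of a connected, non-bipartite graph, and that $\lambda^{(b)}$ is the largest absolute value among the nontrivial eigenvalues of $\widehat{\bm{D}}^{-1/2} \widehat{\bm{A}} \widehat{\bm{D}}^{-1/2}$. The function name and the toy graph are hypothetical.

```python
import numpy as np

def walk_convergence_gap(A_hat, L):
    """Return both sides of (60) for every pair (i, j), plus lambda."""
    deg = A_hat.sum(axis=1)                  # diagonal of D_hat
    vol = deg.sum()                          # vol(G^(b))
    P_hat = A_hat / deg[:, None]             # D_hat^{-1} A_hat
    # D_hat^{-1/2} A_hat D_hat^{-1/2} is symmetric and shares its spectrum with P_hat.
    N = A_hat / np.sqrt(np.outer(deg, deg))
    abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(N)))[::-1]
    lam = abs_eigs[1]                        # largest nontrivial |eigenvalue| (assumed lambda^(b))
    lhs = np.abs(np.linalg.matrix_power(P_hat, L) - deg[None, :] / vol)
    rhs = np.sqrt(deg[None, :] / deg[:, None]) * lam ** L
    return lhs, rhs, lam

# Toy component: a triangle (0, 1, 2) with a tail 2-3-4, so it is connected and non-bipartite.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
A_hat = np.zeros((5, 5))
for u, v in edges:
    A_hat[u, v] = A_hat[v, u] = 1.0

lhs, rhs, lam = walk_convergence_gap(A_hat, L=4)
bound_holds = bool(np.all(lhs <= rhs + 1e-12))  # small tolerance for floating point
```

Entry `[i, j]` of `lhs` and `rhs` corresponds to the left and right sides of (60); increasing `L` shrinks `rhs` geometrically whenever `lam < 1`.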

Finally:

\begin{align}
\left| \bm{P}^{L}_{ij} - \frac{\widehat{\bm{D}}_{jj}}{\text{vol}\left( \mathcal{G}^{(b)} \right)} \right| \leq \zeta_{r} = \max_{u, v \in \mathcal{V}} \sqrt{\frac{\widehat{\bm{D}}_{vv}}{\widehat{\bm{D}}_{uu}}} \left( \lambda^{(b)} \right)^{L} + \sum_{l=1}^{L} \binom{L}{l} \left\| \Xi^{(0)} \right\|_{op}^{l} \left\| \widehat{\bm{P}} \right\|_{op}^{L-l} \tag{61}
\end{align}

For $j \notin S^{(b)}$, $\widehat{\bm{P}}^{L}_{ij} = 0$. Then:

\begin{align}
\left| \bm{P}^{L}_{ij} - 0 \right| \leq \sum_{l=1}^{L} \binom{L}{l} \left\| \Xi^{(0)} \right\|_{op}^{l} \left\| \widehat{\bm{P}} \right\|_{op}^{L-l} \leq \zeta_{r} \tag{62}
\end{align}
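The binomial sum in (61) and (62) controls how far $\bm{P}^{L}$ can drift from $\widehat{\bm{P}}^{L}$. The sketch below illustrates this on a toy graph under the assumption that $\Xi^{(0)} = \bm{P} - \widehat{\bm{P}}$ (its definition is not restated in this proof) and that $\widehat{\bm{A}}$ keeps only within-group edges; expanding $(\widehat{\bm{P}} + \Xi^{(0)})^{L}$ and applying submultiplicativity of the operator norm then gives the sum. All helper names and the two-triangle graph are hypothetical.

```python
import numpy as np
from math import comb

def op_norm(M):
    """Spectral (operator) norm of a matrix."""
    return np.linalg.norm(M, 2)

def perturbation_bound(P_hat, Xi, L):
    """sum_{l=1}^{L} C(L, l) * ||Xi||_op^l * ||P_hat||_op^(L - l)."""
    return sum(comb(L, l) * op_norm(Xi) ** l * op_norm(P_hat) ** (L - l)
               for l in range(1, L + 1))

def row_normalize(A):
    """D^{-1} A for an adjacency matrix with no isolated nodes."""
    return A / A.sum(axis=1, keepdims=True)

# Two groups, each a triangle: S^(1) = {0, 1, 2}, S^(2) = {3, 4, 5}.
A_hat = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A_hat[u, v] = A_hat[v, u] = 1.0
A = A_hat.copy()
A[2, 3] = A[3, 2] = 1.0                      # a single cross-group edge

L = 3
P, P_hat = row_normalize(A), row_normalize(A_hat)
Xi = P - P_hat                               # assumed definition of Xi^(0)
P_hat_L = np.linalg.matrix_power(P_hat, L)
gap = np.abs(np.linalg.matrix_power(P, L) - P_hat_L).max()
bound = perturbation_bound(P_hat, Xi, L)
```

Here `gap` is the largest entrywise difference between $\bm{P}^{L}$ and $\widehat{\bm{P}}^{L}$, which cannot exceed the operator norm of their difference and hence cannot exceed `bound`. The off-diagonal blocks of `P_hat_L` are zero, matching the case $j \notin S^{(b)}$.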

A.5 Proof of Theorem 4.4

Proof.

For $u, v \in \mathcal{V}$, let $\left| \delta_{uv} \right| \leq \zeta_{r}$. Combining Lemmas 4.1 and A.1, by our assumption that the computation graph paths to $i, j$ are activated independently:

\begin{align}
&\mathop{\mathbb{E}}\left[ f_{LP}\left( \bm{r}_{i}^{(L)}, \bm{r}_{j}^{(L)} \right) \right] = \mathop{\mathbb{E}}\left[ \bm{r}_{i}^{(L)} \right]^{\intercal} \mathop{\mathbb{E}}\left[ \bm{r}_{j}^{(L)} \right] \tag{63} \\
&= \overline{\rho}_{r}^{2}(b) \left( \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}\left( \mathcal{G}^{(b)} \right)} \beta_{k} + \sum_{k \in \mathcal{V}} \delta_{ik} \beta_{k} \right)^{\intercal} \left( \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}\left( \mathcal{G}^{(b)} \right)} \beta_{k} + \sum_{k \in \mathcal{V}} \delta_{jk} \beta_{k} \right) \tag{64} \\
&= \overline{\rho}_{r}^{2}(b) \underbrace{\left\| \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_{k} \right\|_{2}^{2}}_{\geq 0} + \overline{\rho}_{r}^{2}(b) \left( \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_{k} \right)^{\intercal} \left( \sum_{k \in \mathcal{V}} \delta_{jk} \beta_{k} \right) \tag{65} \\
&+ \overline{\rho}_{r}^{2}(b) \left( \sum_{k \in \mathcal{V}} \delta_{ik} \beta_{k} \right)^{\intercal} \left( \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_{k} \right) + \overline{\rho}_{r}^{2}(b) \left( \sum_{k \in \mathcal{V}} \delta_{ik} \beta_{k} \right)^{\intercal} \left( \sum_{k \in \mathcal{V}} \delta_{jk} \beta_{k} \right) \tag{66}
\end{align}

Then, by Cauchy-Schwarz and the triangle inequality:

\begin{align}
&\left| \mathop{\mathbb{E}}\left[ f_{LP}\left( \bm{r}_{i}^{(L)}, \bm{r}_{j}^{(L)} \right) \right] - \underbrace{\overline{\rho}_{r}^{2}(b) \left\| \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_{k} \right\|_{2}^{2}}_{\propto \text{ constant}} \right| \tag{67} \\
&\leq \zeta_{r} \overline{\rho}_{r}^{2}(b) \left\| \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_{k} \right\|_{2} \left( \sum_{k \in \mathcal{V}} \| \beta_{k} \|_{2} \right) + \zeta_{r}^{2} \overline{\rho}_{r}^{2}(b) \left( \sum_{k \in \mathcal{V}} \| \beta_{k} \|_{2} \right)^{2} \tag{68}
\end{align}

Appendix B Approximation of $\Delta^{(b)}$

B.1 Approximation of $\Delta^{(b)}$ for $\Phi_{s}$

Δ(b)(𝒔i(L),𝒔j(L))superscriptΔ𝑏superscriptsubscript𝒔𝑖𝐿superscriptsubscript𝒔𝑗𝐿\displaystyle\Delta^{(b)}\left({\bm{s}}_{i}^{(L)},{\bm{s}}_{j}^{(L)}\right)roman_Δ start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT ( bold_italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT , bold_italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT ) (69)
=|1|(S(b)T(1))×S(b)|iS(b)T(1)jS(b)fLP(𝒔i(L),𝒔j(L))\displaystyle=\big{|}\frac{1}{\left|(S^{(b)}\cap T^{(1)})\times S^{(b)}\right|% }\sum_{i\in S^{(b)}\cap T^{(1)}}\sum_{j\in S^{(b)}}f_{LP}\left({\bm{s}}_{i}^{(% L)},{\bm{s}}_{j}^{(L)}\right)= | divide start_ARG 1 end_ARG start_ARG | ( italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT ∩ italic_T start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ) × italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT | end_ARG ∑ start_POSTSUBSCRIPT italic_i ∈ italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT ∩ italic_T start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j ∈ italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_L italic_P end_POSTSUBSCRIPT ( bold_italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT , bold_italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT ) (70)
1|(S(b)T(2))×S(b)|iS(b)T(2)jS(b)fLP(𝒔i(L),𝒔j(L))|\displaystyle-\frac{1}{\left|(S^{(b)}\cap T^{(2)})\times S^{(b)}\right|}\sum_{% i\in S^{(b)}\cap T^{(2)}}\sum_{j\in S^{(b)}}f_{LP}\left({\bm{s}}_{i}^{(L)},{% \bm{s}}_{j}^{(L)}\right)\big{|}- divide start_ARG 1 end_ARG start_ARG | ( italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT ∩ italic_T start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT ) × italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT | end_ARG ∑ start_POSTSUBSCRIPT italic_i ∈ italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT ∩ italic_T start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j ∈ italic_S start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_L italic_P end_POSTSUBSCRIPT ( bold_italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT , bold_italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT ) | (71)
$\displaystyle \approxeq \big| \frac{1}{\left|S^{(b)} \cap T^{(1)}\right| \left|S^{(b)}\right|} \sum_{i \in S^{(b)} \cap T^{(1)}} \sum_{j \in S^{(b)}} \overline{\rho}_s^2(b) \sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}} \left\| \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_k \right\|_2^2$ (72)
$\displaystyle - \frac{1}{\left|S^{(b)} \cap T^{(2)}\right| \left|S^{(b)}\right|} \sum_{i \in S^{(b)} \cap T^{(2)}} \sum_{j \in S^{(b)}} \overline{\rho}_s^2(b) \sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}} \left\| \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_k \right\|_2^2 \big|$ (73)
$\displaystyle = \frac{\overline{\rho}_s^2(b)}{\left|S^{(b)}\right|} \left\| \sum_{k \in S^{(b)}} \frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})} \alpha_k \right\|_2^2 \left| \sum_{j \in S^{(b)}} \sqrt{\widehat{\bm{D}}_{jj}} \underbrace{\left( \mathop{\mathbb{E}}_{i \sim U(S^{(b)} \cap T^{(1)})} \sqrt{\widehat{\bm{D}}_{ii}} - \mathop{\mathbb{E}}_{i \sim U(S^{(b)} \cap T^{(2)})} \sqrt{\widehat{\bm{D}}_{ii}} \right)}_{\text{degree disparity}} \right|$ (74)
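
The closed form in Eqn. 74 can be evaluated directly from node degrees. The following is a minimal sketch, not the authors' implementation: the function names, the dictionary-based inputs, and the toy values are all hypothetical, and `rho_sq` and `feat_norm_sq` stand in for $\overline{\rho}_s^2(b)$ and the squared norm of the degree-weighted feature sum, which would have to be estimated separately.

```python
import math


def degree_disparity(deg, group, t1, t2):
    """Mean sqrt-degree over S ∩ T1 minus mean sqrt-degree over S ∩ T2."""
    s_t1 = [i for i in group if i in t1]
    s_t2 = [i for i in group if i in t2]
    if not s_t1 or not s_t2:
        raise ValueError("both intersections must be non-empty")
    mean_t1 = sum(math.sqrt(deg[i]) for i in s_t1) / len(s_t1)
    mean_t2 = sum(math.sqrt(deg[i]) for i in s_t2) / len(s_t2)
    return mean_t1 - mean_t2


def approx_delta_s(deg, group, t1, t2, rho_sq, feat_norm_sq):
    """Right-hand side of Eqn. 74 for one group S^(b).

    deg maps each node to its entry of the degree matrix (assumed here to be
    the degrees as they appear in D-hat); rho_sq and feat_norm_sq are
    placeholders for the two degree-independent factors.
    """
    sum_sqrt_deg = sum(math.sqrt(deg[j]) for j in group)
    disparity = degree_disparity(deg, group, t1, t2)
    return rho_sq / len(group) * feat_norm_sq * abs(sum_sqrt_deg * disparity)


# Toy example (hypothetical values): four nodes in one group.
deg = {0: 4, 1: 9, 2: 1, 3: 16}
group = {0, 1, 2, 3}
value = approx_delta_s(deg, group, t1={1, 3}, t2={0, 2}, rho_sq=1.0, feat_norm_sq=1.0)
print(value)
```

In this sketch the approximation vanishes exactly when the two subgroups have the same mean square-root degree, which matches the reading of the underbraced term as a degree disparity.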

B.2 Approximation of $\Delta^{(b)}$ for $\Phi_r$

$\displaystyle \Delta^{(b)}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right)$ (75)
$\displaystyle = \big| \frac{1}{\left|(S^{(b)} \cap T^{(1)}) \times S^{(b)}\right|} \sum_{i \in S^{(b)} \cap T^{(1)}} \sum_{j \in S^{(b)}} f_{LP}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right)$ (76)
$\displaystyle - \frac{1}{\left|(S^{(b)} \cap T^{(2)}) \times S^{(b)}\right|} \sum_{i \in S^{(b)} \cap T^{(2)}} \sum_{j \in S^{(b)}} f_{LP}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right) \big|$ (77)
$\displaystyle \approxeq \big| \frac{1}{\left|S^{(b)} \cap T^{(1)}\right| \left|S^{(b)}\right|} \sum_{i \in S^{(b)} \cap T^{(1)}} \sum_{j \in S^{(b)}} \overline{\rho}_r^2(b) \left\| \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_k \right\|_2^2$ (78)
$\displaystyle - \frac{1}{\left|S^{(b)} \cap T^{(2)}\right| \left|S^{(b)}\right|} \sum_{i \in S^{(b)} \cap T^{(2)}} \sum_{j \in S^{(b)}} \overline{\rho}_r^2(b) \left\| \sum_{k \in S^{(b)}} \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_k \right\|_2^2 \big|$ (79)
$\displaystyle = 0$ (80)
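
The step from Eqns. 78-79 to Eqn. 80 can be spelled out as a short sketch. The shorthand $C_b$ is introduced here only for illustration and is not part of the original notation.

```latex
% The summand in Eqns. 78-79 does not depend on i or j:
C_b := \overline{\rho}_r^2(b) \left\| \sum_{k \in S^{(b)}}
  \frac{\widehat{\bm{D}}_{kk}}{\text{vol}(\mathcal{G}^{(b)})} \beta_k \right\|_2^2 .

% Averaging a constant over any non-empty index set returns the constant:
\frac{1}{\left|S^{(b)} \cap T^{(d)}\right| \left|S^{(b)}\right|}
  \sum_{i \in S^{(b)} \cap T^{(d)}} \sum_{j \in S^{(b)}} C_b = C_b ,
  \qquad d \in \{1, 2\} .

% Hence the two averages cancel:
\Delta^{(b)}\left(\bm{r}_i^{(L)}, \bm{r}_j^{(L)}\right)
  \approxeq \left| C_b - C_b \right| = 0 .
```

By contrast, the summand in Eqns. 72-73 carries the factor $\sqrt{\widehat{\bm{D}}_{ii} \widehat{\bm{D}}_{jj}}$, which depends on $i$, so the two averages there differ whenever the subgroups differ in degree.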

Appendix C Datasets Used in §6.1

In our experiments in §6.1, we use 10 real-world network datasets from Bojchevski & Günnemann (2018), Shchur et al. (2018), Rozemberczki & Sarkar (2020), and Rozemberczki et al. (2021), covering diverse domains (e.g., citation networks, collaboration networks, online social networks). We provide a description and some statistics of each dataset in Table 2. All the datasets have node features and are undirected. We were unable to find the exact class names and their label correspondence from the dataset documentation.

  • In all the citation network datasets, nodes represent documents, edges represent citation links, and features are a bag-of-words representation of documents. We row-normalize the features to sum to 1, following Fey & Lenssen (2019) (footnote 6: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py). The classification task is to predict the topic of documents.

  • In the collaboration network datasets, nodes represent authors, edges represent coauthorships, and features are embeddings of paper keywords for authors’ papers. The classification task is to predict the most active field of study for authors.

  • In the LastFMAsia network dataset, nodes represent LastFM users from Asia, edges represent friendships between users, and features are embeddings of the artists liked by users. The classification task is to predict the home country of users.

  • In the Twitch network datasets, nodes represent gamers on Twitch, edges represent followerships between them, and features are embeddings of the history of games played by the Twitch users. The classification task is to predict whether or not a gamer streams adult content.
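
The row normalization mentioned for the citation networks above can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in the experiments (which rely on PyTorch Geometric); the function name is hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero.

```python
def row_normalize(features):
    """Scale each row of a bag-of-words matrix so that its entries sum to 1."""
    normalized = []
    for row in features:
        total = sum(row)
        if total == 0:
            # Assumption: an all-zero row is left as-is.
            normalized.append(list(row))
        else:
            normalized.append([value / total for value in row])
    return normalized


# Toy bag-of-words features for two documents (hypothetical values).
x = row_normalize([[1, 1, 2], [0, 0, 0]])
print(x)
```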

We only run experiments on datasets that fit, without node sampling, on a single NVIDIA GeForce GTX Titan Xp graphics card with 12196 MiB of memory. Furthermore, we only consider the three largest datasets (i.e., with the most nodes) from Rozemberczki et al. (2021). We use PyTorch Geometric to load and process all datasets (Fey & Lenssen, 2019).

Table 2: Summary of the datasets used in our experiments.
Name         Domain          # Nodes   # Edges   # Features   # Classes
Cora         citation          19793    126842         8710          70
CiteSeer     citation           4230     10674          602           6
DBLP         citation          17716    105734         1639           4
PubMed       citation          19717     88648          500           3
CS           collaboration     18333    163788         6805          15
Physics      collaboration     34493    495924         8415           5
LastFMAsia   online social      7624     55612          128          18
Twitch-DE    online social      9498    315774          128           2
Twitch-EN    online social      7126     77774          128           2
Twitch-FR    online social      6551    231883          128           2

Appendix D Datasets Used in §6.2

We run experiments on three network datasets: (1) the NBA social network (cf. §D.1), (2) the German credit network (cf. §D.2), and (3) a new DBLP-Fairness citation network that we construct (cf. §D.3). All the datasets have node features and are undirected. We do not pass sensitive attributes as features to the models that we train. For each dataset, we min-max normalize node features to fall in $[-1, 1]$, following Dai & Wang (2021) and Agarwal et al. (2021). Furthermore, for all datasets, $D = 2$.
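
As a minimal sketch of the min-max normalization described above, the following rescales each feature column to $[-1, 1]$. The function name is hypothetical, and two choices are assumptions rather than details from the paper: normalization is applied per feature (column), and a constant column is mapped to the lower bound.

```python
def min_max_normalize(features, low=-1.0, high=1.0):
    """Rescale each column of a feature matrix to the interval [low, high]."""
    n_cols = len(features[0])
    col_min = [min(row[c] for row in features) for c in range(n_cols)]
    col_max = [max(row[c] for row in features) for c in range(n_cols)]
    normalized = []
    for row in features:
        new_row = []
        for c, value in enumerate(row):
            span = col_max[c] - col_min[c]
            if span == 0:
                # Assumption: a constant feature carries no information,
                # so it is mapped to the lower bound.
                new_row.append(low)
            else:
                new_row.append(low + (high - low) * (value - col_min[c]) / span)
        normalized.append(new_row)
    return normalized


# Toy features (hypothetical values): the second column is constant.
x = min_max_normalize([[0, 10], [5, 10], [10, 10]])
print(x)
```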

D.1 NBA Dataset

The NBA network (Dai & Wang, 2021) has 403 nodes representing NBA basketball players who are connected if they follow each other on Twitter. There are 21242 links. Each node has 95 features, and the average degree is $52.71 \pm 35.14$. We consider two sensitive attributes per node:

  • Age $\{S^{(b)}\}_{b \in [B]}$: how old the player is, i.e., Young ($\leq 25$ years) or Old ($> 25$ years).

  • Nationality $\{T^{(d)}\}_{d \in [D]}$: where the player is from, i.e., United States or Overseas.

D.2 German Dataset

The German network (Agarwal et al., 2021) comprises 1000 nodes representing clients in a German bank who are connected if they have similar credit accounts. The German network is not natively a graph dataset; synthetic edges were created by Agarwal et al. There are 44484 links. Each node has 27 features (e.g., loan amount, account-related features), and the average degree is $44.48 \pm 26.52$. We consider two sensitive attributes per node:

  • Foreign worker $\{S^{(b)}\}_{b \in [B]}$: whether the client is a foreign worker, i.e., Yes or No.

  • Gender $\{T^{(d)}\}_{d \in [D]}$: the gender of the client, i.e., Man or Woman.

D.3 DBLP-Fairness Dataset

In this subsection, we detail how we construct the DBLP-Fairness dataset. We build DBLP-Fairness, as there are only a few natively-graph network datasets with sensitive attributes that are appropriate for graph learning (Subramonian et al., 2022).

We begin with the version of the DBLP-Citation-network V12 dataset from Tang et al. (2008) that was processed by Xu et al. (2021). This dataset has 3658127 nodes. Each node represents a paper and each edge represents a citation link. We consider five node features:

  • Team size: the number of authors on the paper.

  • Mean collaborators: the average number of collaborators with whom the authors have previously published.

  • Gini collaborators: the Gini coefficient of the number of collaborators with whom the authors have previously published.

  • Mean productivity: the average number of papers that the authors have previously published.

  • Gini productivity: the Gini coefficient of the number of papers that the authors have previously published.
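
The five features above can be illustrated with a small sketch. It uses the standard mean-absolute-difference definition of the Gini coefficient, which is an assumption here since the exact formula is not stated; the function names, the dictionary keys, and the convention of returning 0 for an all-zero input are likewise hypothetical.

```python
def gini(values):
    """Gini coefficient: sum_i sum_j |x_i - x_j| / (2 * n * sum_i x_i)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # Assumption: no inequality is reported when there is nothing to share.
        return 0.0
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * total)


def paper_features(collaborator_counts, paper_counts):
    """Build the five node features from per-author counts for one paper."""
    team_size = len(collaborator_counts)
    return {
        "team_size": team_size,
        "mean_collaborators": sum(collaborator_counts) / team_size,
        "gini_collaborators": gini(collaborator_counts),
        "mean_productivity": sum(paper_counts) / team_size,
        "gini_productivity": gini(paper_counts),
    }


# Toy paper with two authors (hypothetical counts).
features = paper_features(collaborator_counts=[1, 3], paper_counts=[2, 2])
print(features)
```

A Gini coefficient of 0 means that all authors on the paper have the same count, while values closer to 1 indicate that one author accounts for most of the total.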

We also consider two sensitive attributes per node:

  • Field $\{S^{(b)}\}_{b \in [B]}$: the field to which the paper belongs, i.e., Programming Languages or Databases.

  • Nationality $\{T^{(d)}\}_{d \in [D]}$: the country where most authors reside, i.e., United States or China.

In DBLP-Fairness, we only include papers whose nationality is United States or China; American and Chinese citation networks are known to be stratified (Zhao et al., 2022). We also only include papers whose field is Programming Languages or Databases; we infer the field of a paper from its keywords (i.e., whether they contain “programming language” or “database”), and discard papers whose keywords include both “programming language” and “database”. Furthermore, we filter out all papers from before 2010. We wanted DBLP-Fairness to be of comparable size to the citation networks in §C. Following filtering, we were left with 14537 nodes and 24844 edges.
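
The filtering rules above can be sketched as follows. This is an illustration under stated assumptions, not the construction script: the record schema (`year`, `country`, `keywords`), the function names, and the use of case-insensitive substring matching are all hypothetical.

```python
def infer_field(keywords):
    """Infer the field from keywords; return None if ambiguous or unmatched."""
    lowered = [keyword.lower() for keyword in keywords]
    has_pl = any("programming language" in keyword for keyword in lowered)
    has_db = any("database" in keyword for keyword in lowered)
    if has_pl and has_db:
        return None  # papers matching both fields are discarded
    if has_pl:
        return "Programming Languages"
    if has_db:
        return "Databases"
    return None


def keep_paper(paper):
    """Apply the year, nationality, and field filters to one paper record."""
    return (
        paper["year"] >= 2010
        and paper["country"] in {"United States", "China"}
        and infer_field(paper["keywords"]) is not None
    )


# Toy records (hypothetical).
papers = [
    {"year": 2015, "country": "China", "keywords": ["Database systems"]},
    {"year": 2008, "country": "United States", "keywords": ["database"]},
    {"year": 2012, "country": "United States",
     "keywords": ["programming language", "database"]},
]
kept = [paper for paper in papers if keep_paper(paper)]
print(len(kept))
```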

Appendix E Models

For all experiments, we use GCN encoders (Kipf & Welling, 2017) to get node representations. Each encoder has two layers (128-dimensional hidden layer, 64-dimensional output layer) with a ReLU nonlinearity in between. We only use two layers, as this is common practice in graph deep learning to prevent oversmoothing (Oono & Suzuki, 2020); however, we run experiments with four layers in §G. We do not use any regularization (e.g., Dropout, BatchNorm). The encoders are explicitly trained for LP with the inner-product LP score function in Eqn. 6, binary cross-entropy loss, and the Adam optimizer (Kingma & Ba, 2014) with full-batch gradient descent and a learning rate of 0.01. We use a random link split of 0.85-0.05-0.1 for train-val-test, following the PyTorch Geometric LP example (footnote 7: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py). We train the encoders for 100 epochs, with a new round of negative link sampling during every epoch; we use a 1:1 ratio of positive to negative links. We ultimately select the model parameters with the highest validation ROC-AUC. Although we do not do any hyperparameter tuning, the test ROC-AUC values (displayed in the figures in §6) indicate that the encoders are well-trained. We use PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019) to train all the encoders on a single NVIDIA GeForce GTX Titan Xp graphics card with 12196 MiB of memory.
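
The link split and per-epoch negative sampling described above can be sketched in plain Python. This is a simplified stand-in for the PyTorch Geometric utilities used in the experiments; the function names, the seeding scheme, and the rejection-sampling loop are assumptions made for illustration.

```python
import random


def split_links(edges, val_frac=0.05, test_frac=0.10, seed=0):
    """Randomly split undirected edges into train/val/test (0.85-0.05-0.1)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_frac)
    n_test = round(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def sample_negatives(num_nodes, positive_edges, num_samples, rng):
    """Sample distinct node pairs that are not positive edges.

    Rejection sampling; assumes the graph is sparse enough that
    num_samples non-edges exist.
    """
    existing = {frozenset(edge) for edge in positive_edges}
    negatives = set()
    while len(negatives) < num_samples:
        i, j = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if i != j and frozenset((i, j)) not in existing:
            negatives.add(frozenset((i, j)))
    return [tuple(sorted(pair)) for pair in negatives]


# Toy path graph with 101 nodes and 100 edges (hypothetical).
edges = [(i, i + 1) for i in range(100)]
train, val, test = split_links(edges)

# A fresh 1:1 round of negatives would be drawn at the start of every epoch.
epoch_rng = random.Random(1)
negatives = sample_negatives(101, edges, len(train), epoch_rng)
print(len(train), len(val), len(test), len(negatives))
```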

Appendix F Remaining Plots

Figure 4: Theoretic vs. GCN LP scores for citation network datasets.

Figure 5: Theoretic vs. GCN LP scores for collaboration network datasets.

Figure 6: Theoretic vs. GCN LP scores for online social network datasets.

Appendix G Additional Experiments

G.1 Additional Experiments for §6.1 (4-layer Encoders)

We run the experiments from §6.1 for $\Phi_s$ with the same settings, except we use 4-layer (instead of 2-layer) encoders (128-dimensional hidden layers, 64-dimensional output layer). We run these additional experiments because the error bound for the theoretic LP scores for $\Phi_s$ depends on the number of encoder layers $L$. We find that the experimental results continue to support our theoretical analysis, both qualitatively and quantitatively (cf. Table 3, Figure 7); the NRMSE and PCC values are comparable to or better than those from the experiments with the 2-layer encoders (especially for the EN dataset).
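
For readers who want to reproduce the two agreement measures, the sketch below computes an NRMSE and a Pearson correlation coefficient (PCC) between theoretic and model LP scores. The function names are hypothetical, and normalizing the RMSE by the range of the observed scores is an assumption; the normalization used in the paper is defined in the main text and may differ.

```python
import math


def nrmse(predicted, observed):
    """Root-mean-square error divided by the range of the observed values."""
    n = len(observed)
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    value_range = max(observed) - min(observed)
    return math.sqrt(mse) / value_range


def pcc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy scores (hypothetical values).
theoretic = [1.0, 2.0, 3.0]
model = [1.0, 2.0, 5.0]
print(nrmse(theoretic, model), pcc(theoretic, model))
```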

Table 3: The test AUC of the 4-layer $\Phi_s$ encoders on the real-world network datasets, and the NRMSE and PCC of the theoretic LP scores as predictors of the $\Phi_s$ scores.
Dataset      NRMSE (↓)        PCC (↑)          $\Phi_s$ Test AUC (↑)
CORA         0.044 ± 0.006    0.858 ± 0.026    0.853 ± 0.028
CITESEER     0.057 ± 0.006    0.890 ± 0.017    0.861 ± 0.026
DBLP         0.021 ± 0.002    0.885 ± 0.054    0.887 ± 0.019
PUBMED       0.056 ± 0.009    0.802 ± 0.024    0.900 ± 0.006
CS           0.039 ± 0.006    0.918 ± 0.008    0.949 ± 0.004
PHYSICS      0.030 ± 0.002    0.077 ± 0.013    0.950 ± 0.004
LASTFMASIA   0.040 ± 0.004    0.938 ± 0.005    0.949 ± 0.002
DE           0.014 ± 0.003    0.918 ± 0.025    0.882 ± 0.002
EN           0.034 ± 0.005    0.752 ± 0.036    0.846 ± 0.008
FR           0.019 ± 0.003    0.833 ± 0.038    0.896 ± 0.003
Figure 7: Theoretic LP score vs. 4-layer $\Phi_s$ LP score for all network datasets.

G.2 Additional Experiments for §6.1 (Hadamard Product and MLP LP Score Function)

We also run the experiments from §6.1 for $\Phi_s$ with the same settings, except we use the following LP score function:

$\displaystyle f_{LP}\left(\bm{h}_i^{(L)}, \bm{h}_j^{(L)}\right) = f_{MLP}\left(\bm{h}_i^{(L)} \odot \bm{h}_j^{(L)}\right),$ (81)

where $\odot$ is the Hadamard product and $f_{MLP}$ is a 2-layer MLP with a 64-dimensional hidden layer and ReLU nonlinearity. We run these additional experiments because a Hadamard product and MLP score function is often used in the literature. We find that our theoretical analysis is still relevant to and reasonably supports the experimental results, both qualitatively and quantitatively (cf. Table 4, Figure 8). This could be because MLPs have an inductive bias towards learning simpler, often linear functions (Nakkiran et al., 2019; Valle-Pérez et al., 2019), and our theoretical findings are generalizable to linear LP score functions. Notably, in this setting, $\Phi_s$ makes a higher number of negative link predictions. For a few datasets (e.g., Cora, CiteSeer, LastFMAsia), a handful of theoretic LP scores are negative because the regression (incorrectly) predicts $\overline{\rho}_s^2(b)$ for 1-2 groups $S^{(b)}$ to be negative.
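
The score function in Eqn. 81 can be illustrated with a dependency-free sketch. The helper names, the toy two-dimensional weights, and the choice to return the raw (pre-sigmoid) output are assumptions for illustration; the experiments use a 64-dimensional hidden layer with learned weights.

```python
def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def linear(weights, bias, x):
    """Affine map: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, value) for value in x]


def mlp_score(h_i, h_j, w1, b1, w2, b2):
    """f_MLP(h_i ⊙ h_j) with a 2-layer MLP and a ReLU in between."""
    hidden = relu(linear(w1, b1, hadamard(h_i, h_j)))
    return linear(w2, b2, hidden)[0]


# Toy representations and weights (hypothetical values).
h_i, h_j = [1.0, 2.0], [3.0, -1.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [0.5]
score = mlp_score(h_i, h_j, w1, b1, w2, b2)
print(score)
```

Because the Hadamard product is commutative, a score of this form is symmetric in its two arguments, as is the inner-product score.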

Table 4: The test AUC of the $\Phi_s$ encoders with an $f_{MLP}$ score function on the real-world network datasets, and the NRMSE and PCC of the theoretic LP scores as predictors of the $\Phi_s$ scores.
Dataset      NRMSE (↓)        PCC (↑)          $\Phi_s$ Test AUC (↑)
CORA         0.034 ± 0.004    0.830 ± 0.015    0.915 ± 0.001
CITESEER     0.090 ± 0.014    0.365 ± 0.070    0.913 ± 0.008
DBLP         0.026 ± 0.003    0.652 ± 0.029    0.933 ± 0.004
PUBMED       0.054 ± 0.007    0.813 ± 0.038    0.932 ± 0.003
CS           0.047 ± 0.008    0.677 ± 0.036    0.970 ± 0.001
PHYSICS      0.055 ± 0.007    0.566 ± 0.026    0.976 ± 0.001
LASTFMASIA   0.049 ± 0.008    0.682 ± 0.035    0.960 ± 0.003
DE           0.030 ± 0.008    0.683 ± 0.047    0.935 ± 0.001
EN           0.039 ± 0.006    0.463 ± 0.022    0.905 ± 0.002
FR           0.031 ± 0.006    0.654 ± 0.067    0.935 ± 0.002
Figure 8: Theoretic LP score vs. $\Phi_s$ LP score (with Hadamard product and MLP) for all network datasets.

G.3 Additional Experiments for §6.2

Figure 9: The plots display $\widehat{\Delta}^{(b)}$ vs. $\Delta^{(b)}$ for 4-layer $\Phi_s$ for the NBA, German, and DBLP-Fairness datasets over all $b \in [B]$ and 10 random seeds.

G.4 Additional Experiments for §6.3

Table 5: $\frac{1}{B}\sum_{b \in [B]} \Delta^{(b)}$ and the test AUC for the NBA, German, and DBLP-Fairness datasets with various settings of $\lambda_{\text{fair}}$. The left table corresponds to 4-layer $\Phi_s$, and the right to 4-layer $\Phi_r$.

Left table (4-layer $\Phi_s$):
Dataset        $\lambda_{\text{fair}}$   $\frac{1}{B}\sum_{b \in [B]} \Delta^{(b)}$ (↓)   $\Phi_s$ Test AUC (↑)
NBA            4.0    0.000 ± 0.000    0.752 ± 0.001
NBA            2.0    0.006 ± 0.001    0.752 ± 0.001
NBA            1.0    0.011 ± 0.001    0.753 ± 0.001
NBA            0.0    0.014 ± 0.001    0.753 ± 0.001
DBLPFAIRNESS   4.0    0.090 ± 0.041    0.793 ± 0.009
DBLPFAIRNESS   2.0    0.070 ± 0.015    0.800 ± 0.007
DBLPFAIRNESS   1.0    0.099 ± 0.009    0.804 ± 0.007
DBLPFAIRNESS   0.0    0.122 ± 0.028    0.820 ± 0.009
GERMAN         4.0    0.012 ± 0.008    0.817 ± 0.004
GERMAN         2.0    0.018 ± 0.007    0.827 ± 0.015
GERMAN         1.0    0.018 ± 0.008    0.856 ± 0.025
GERMAN         0.0    0.028 ± 0.007    0.874 ± 0.011

Right table (4-layer $\Phi_r$):
Dataset        $\lambda_{\text{fair}}$   $\frac{1}{B}\sum_{b \in [B]} \Delta^{(b)}$ (↓)   $\Phi_r$ Test AUC (↑)
NBA            4.0    0.000 ± 0.000    0.581 ± 0.029
NBA            2.0    0.000 ± 0.000    0.574 ± 0.021
NBA            1.0    0.000 ± 0.000    0.580 ± 0.025
NBA            0.0    0.000 ± 0.000    0.589 ± 0.031
DBLPFAIRNESS   4.0    0.034 ± 0.012    0.769 ± 0.009
DBLPFAIRNESS   2.0    0.045 ± 0.021    0.788 ± 0.007
DBLPFAIRNESS   1.0    0.074 ± 0.013    0.797 ± 0.006
DBLPFAIRNESS   0.0    0.095 ± 0.015    0.811 ± 0.006
GERMAN         4.0    0.027 ± 0.009    0.765 ± 0.013
GERMAN         2.0    0.023 ± 0.007    0.765 ± 0.011
GERMAN         1.0    0.031 ± 0.010    0.786 ± 0.030
GERMAN         0.0    0.030 ± 0.009    0.838 ± 0.025

Appendix H Theory Pitfalls

To understand the second pitfall from §6.1, we separately investigate the association between the within-group degree product $\left(\widehat{\bm{D}}_{ii}\widehat{\bm{D}}_{jj}\right)$ and the absolute deviation of the theoretic LP scores from the $\Phi_s$ scores, as well as the association between the (transformed) feature similarity $\left(\left\|\sum_{k\in S^{(b)}}\frac{\sqrt{\widehat{\bm{D}}_{kk}}}{\text{vol}(\mathcal{G}^{(b)})}\alpha_k\right\|_2^2\right)$ and the absolute deviation (cf. Figure 10). We observe that the absolute deviation is highest for the node pairs with a relatively small degree product (i.e., node pairs with a low PA score) and low feature similarity.

Figure 10: Associations of absolute deviation with degree product and with feature similarity for CiteSeer.

Appendix I Error Analysis of $\Phi_r$ Theoretic Scores

Figure 11 reveals that the max term $\max_{u,v\in\mathcal{V}}\sqrt{\frac{\widehat{\bm{D}}_{vv}}{\widehat{\bm{D}}_{uu}}}$ is quite large in practice, which causes the theoretic LP scores to generally be poor estimates for the $\Phi_r$ scores. We additionally find in Figure 11 that the relative error (as measured by NRMSE and PCC) of the theoretic LP scores for $\Phi_r$ is not lower for lower values of the max term $\max_{u,v\in\mathcal{V}}\sqrt{\frac{\widehat{\bm{D}}_{vv}}{\widehat{\bm{D}}_{uu}}}$.

Figure 11: Weak associations of max term with NRMSE and PCC of theoretic LP scores for $\Phi_r$ across all datasets described in §C.

Furthermore, Figure 12 reveals that $\Phi_r$ LP scores are not higher for incident nodes with larger degrees.

Figure 12: Weak associations of mean $\Phi_r$ LP scores (over 10 random seeds) with degree of each incident node and product of degrees of both incident nodes. Colors correspond to different groups.

There are intimate connections between Theorem 4.4 and the steady-state probabilities of random walks. The stationary probabilities of random walks are the same regardless of the starting node. This is why $\Phi_r$ produces similar representations for all the nodes in each social group, regardless of the degree of the node; in fact, with a larger number of layers, $\Phi_r$ would oversmooth all the representations to the same vector (Keriven, 2022). Hence, $\Phi_r$ LP scores do not have a degree dependence, theoretically or empirically.
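The claim that stationary probabilities do not depend on the starting node can be checked numerically. The following is a minimal sketch, not the paper's code and not a restatement of Theorem 4.4: it assumes a small, connected toy graph (the hypothetical `ADJ` below) with self-loops added, builds the random-walk matrix $\widehat{\bm{D}}^{-1}(\bm{A}+\bm{I})$, and raises it to a large power. Every row (i.e., every starting node) ends up at the same distribution, which is proportional to degree.

```python
# Illustrative sketch under stated assumptions (toy graph, self-loops added).

def random_walk_matrix(adj):
    """Row-normalize the adjacency matrix with self-loops: D^{-1}(A + I)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return [[x / sum(row) for x in row] for row in with_loops]

def matmul(a, b):
    """Plain dense product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(m, steps):
    """Multiply m by itself `steps` times (steps >= 1)."""
    result = m
    for _ in range(steps - 1):
        result = matmul(result, m)
    return result

# Hypothetical toy graph: node 0 is a hub; node 4 hangs off node 3.
ADJ = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

P = random_walk_matrix(ADJ)
P_long = matrix_power(P, 200)  # row i: distribution of a long walk started at i

degrees = [sum(row) + 1 for row in ADJ]  # degrees including the self-loop
volume = sum(degrees)
stationary = [d / volume for d in degrees]  # degree / volume
```

Because all rows of `P_long` coincide, aggregating features with many such steps gives every node the same degree-weighted average, which is consistent with the oversmoothing remark above.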

Appendix J Preferential Attachment and Motivation

Preferential Attachment

Preferential attachment (PA) describes the propensity of links to form with high-degree nodes (see https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.link_prediction.preferential_attachment.html). Network scientists have studied for decades how links in real-world networks exhibit PA. For example, in the iterative Barabási-Albert model of network formation, each new node $s$ forms links with existing nodes $t$ with probability proportional to the degree of $t$, i.e., $\mathbb{P}((s,t)\in\mathcal{E}) \propto \mathrm{deg}(t)$. In the context of our paper, PA describes how a GCN with an inner-product LP score function often predicts links between nodes $i,j$ with score $\propto \sqrt{\mathrm{deg}(i)\cdot \mathrm{deg}(j)}$ approximately (Theorem 4.3).
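As a concrete illustration, the sketch below computes the classical PA score (the product of the two degrees) and its square root (the geometric mean of the degrees) for candidate node pairs. It is written in plain Python rather than with any library; the edge list, node names, and function names are hypothetical and chosen only for this example.

```python
import math

def degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def pa_score(deg, i, j):
    """Classical preferential attachment score: product of the two degrees."""
    return deg[i] * deg[j]

def sqrt_pa_score(deg, i, j):
    """Geometric mean of the two degrees, sqrt(deg(i) * deg(j))."""
    return math.sqrt(deg[i] * deg[j])

# Hypothetical toy network: "a" is the best-connected node, "e" the least.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
DEG = degrees(EDGES)

# Rank candidate (currently unlinked) pairs by PA score, highest first.
CANDIDATES = [("a", "e"), ("b", "d"), ("c", "e")]
RANKED = sorted(CANDIDATES, key=lambda pair: pa_score(DEG, *pair), reverse=True)
```

A recommender that ranks by either score favors pairs of already well-connected nodes, which is the "rich get richer" dynamic described above.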

Motivation

A wealth of literature in network science and the social sciences has examined the PA properties of real-world networks and how these properties contribute to unfair (non-neural) algorithms (§2). For example, Stoica et al. (2018) find that Instagram accounts run by men have a significantly higher following than those run by women due to gender discrimination; this degree disparity is only amplified by link recommendation algorithms that suggest following high-degree accounts, which makes the rich get richer and reveals that these algorithms have a PA bias. Moreover, many papers outside graph learning have discussed the intersectional unfairness of machine learning (§2).

However, despite the increasing real-world deployment of GNNs for LP, their unfairness has not been studied from the perspectives of PA and intersections of social groups. Our paper fills this gap by providing thorough theoretical and empirical evidence that GCNs (Kipf & Welling, 2017) have a PA bias when predicting links between nodes in the same social group. This finding is nontrivial as GCNs leverage a combination of features and local structural context to make link predictions.

Our research question is challenging from a technical perspective, as it requires uncovering properties of short random walks on graphs (since most GNNs are shallow); in contrast, most random walk results in the literature concern random walks at convergence. Our research question is further important because GNNs with a PA bias can amplify degree disparities, which translates to increased discrimination and disparities in social influence among nodes.

Because this form of unfairness is newly uncovered, the literature offers no existing solutions to it. We propose a training-time regularization-based fairness method that alleviates this unfairness without greatly sacrificing the test AUC of LP. While capping the number of positive link predictions per node is a possible solution, doing so with utility in mind requires identifying a utility-maximizing subset of link predictions. As our theoretical and empirical results reveal, GCN LP scores are often inherently proportional to the geometric mean of the degrees of the incident nodes, which can make them a poor indicator of prediction confidence; from a calibration perspective, GCNs naturally make overconfident predictions for links between high-degree nodes.

While we describe methods for alleviating degree bias in §2, these methods address degraded performance for low-degree nodes, not PA bias. We do not study performance issues but rather how GCNs scale representations of nodes proportionally to (approximately) the square root of their within-group degree, which affects the magnitude of their LP scores (cf. §K).

In summary, we augment the field’s understanding of degree bias beyond performance disparities across nodes. We further lay a foundation to study PA bias and within-group unfairness in GNN LP more broadly (e.g., SOTA contrastive methods for LP), which is a critical and interesting direction of research.

Appendix K Comparison to Prior Research on Degree Bias

Studies concerning degree bias have observed that low-degree nodes experience degraded performance compared to high-degree nodes. They have thus often formulated degree bias from a performance perspective, focusing on equal opportunity. In particular, these studies seek to satisfy $\mathbb{P}(\hat{y}_v = y \mid y_v = y, \mathrm{deg}(v) = d) = \mathbb{P}(\hat{y}_v = y \mid y_v = y, \mathrm{deg}(v) = d')$ for all possible degrees $d, d'$, where $\hat{y}_v$ is the prediction for node $v$ and $y_v$ is its ground-truth label. This fairness criterion treats the degree of a node as a sensitive attribute, requiring that a GNN's accuracy is consistent across nodes with different degrees.

However, in this paper, we seek to ensure that degree disparities in networks are not amplified by GNN LP. We cannot adopt the equal opportunity formulation of degree bias because it is concerned with performance while we are concerned with degree disparity amplification. For example, even if we consistently predict links with the same accuracy across nodes with different degrees, high-degree nodes can still receive higher LP scores than low-degree nodes. In this way, the “degree bias” discussed by other studies is not compatible with our unfairness metric (Eqn. 17). We also cannot simply adopt common LP fairness metrics like dyadic fairness, as they do not capture the new type of unfairness that we uncover.

Roughly, we care that $\mathbb{E}[\hat{y}_{uv} \mid \mathrm{deg}(u) = d] = \mathbb{E}[\hat{y}_{uv} \mid \mathrm{deg}(u) = d']$, where $\hat{y}_{uv}$ is the GNN score for a link prediction between nodes $u,v$. In other words, we do not want GNN LP scores to be higher for high-degree nodes vs. low-degree nodes. This is what motivates our fairness metric (Eqn. 17).
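The difference between the two criteria can be made concrete with a small, entirely made-up example. The sketch below assumes hypothetical records of the form (degree of node $u$, LP score, true label) and a decision threshold of 0.5; none of these values come from the paper's experiments. It reports, per degree, both the accuracy (the performance view) and the mean LP score (the view taken here).

```python
from collections import defaultdict

# Hypothetical records: (degree of node u, LP score in [0, 1], true label).
RECORDS = [
    (1, 0.55, 1), (1, 0.45, 0), (1, 0.60, 1), (1, 0.40, 0),
    (10, 0.95, 1), (10, 0.48, 0), (10, 0.90, 1), (10, 0.45, 0),
]

def per_degree_stats(records, threshold=0.5):
    """Return {degree: (accuracy, mean score)} for thresholded predictions."""
    grouped = defaultdict(list)
    for degree, score, label in records:
        grouped[degree].append((score, label))
    stats = {}
    for degree, pairs in grouped.items():
        # A prediction is correct when the thresholded score matches the label.
        correct = sum(int(score >= threshold) == label for score, label in pairs)
        mean_score = sum(score for score, _ in pairs) / len(pairs)
        stats[degree] = (correct / len(pairs), mean_score)
    return stats

STATS = per_degree_stats(RECORDS)
```

In this toy data both degree groups are classified perfectly, so a performance-based criterion is satisfied, yet the mean score for the degree-10 group is clearly higher than for the degree-1 group. This is the kind of disparity that a performance-based formulation cannot detect.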

Our theoretical analysis (Theorem 4.3) and empirical validation (§6.1) reveal that GCNs often predict links between nodes $i,j$ with score approximately $\propto \sqrt{\mathrm{deg}(i)\cdot \mathrm{deg}(j)}$ because of their symmetric normalized filter. This finding of a preferential attachment bias allows us to express our unfairness metric in terms of degree disparity (Eqn. 22), but this degree disparity is not related to the “degree bias” that has been discussed by other papers; this is a new fairness paradigm.
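To give some intuition for why a symmetric normalized filter produces a $\sqrt{\mathrm{deg}(i)\cdot \mathrm{deg}(j)}$ dependence, the sketch below examines a deliberately simplified limiting case. It is not the setting of Theorem 4.3 (which concerns shallow GCNs and within-group degrees): it assumes a single connected toy graph, no learned weights, no nonlinearity, and far more propagation steps than a practical GCN would use. The graph `ADJ` and features `FEATS` are hypothetical. Under these assumptions, the inner-product score of every node pair divided by $\sqrt{\mathrm{deg}(i)\cdot \mathrm{deg}(j)}$ converges to one shared constant.

```python
import math

def sym_norm_filter(adj):
    """Symmetric normalized filter D^{-1/2}(A + I)D^{-1/2} and its degrees."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    filt = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    return filt, deg

def propagate(filt, feats, layers):
    """Apply the filter `layers` times to the feature rows (no weights, no ReLU)."""
    n, dim = len(feats), len(feats[0])
    h = feats
    for _ in range(layers):
        h = [[sum(filt[i][k] * h[k][c] for k in range(n)) for c in range(dim)]
             for i in range(n)]
    return h

def inner_product_scores(h):
    """LP score for each node pair: inner product of their representations."""
    n = len(h)
    return [[sum(a * b for a, b in zip(h[i], h[j])) for j in range(n)]
            for i in range(n)]

# Hypothetical toy graph and features.
ADJ = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
FEATS = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]

FILT, DEG = sym_norm_filter(ADJ)
H = propagate(FILT, FEATS, layers=200)  # limiting case, not a realistic depth
SCORES = inner_product_scores(H)

# Score divided by sqrt(deg(i) * deg(j)) for every pair of nodes.
RATIOS = [SCORES[i][j] / math.sqrt(DEG[i] * DEG[j])
          for i in range(len(ADJ)) for j in range(len(ADJ))]
```

In this limit the shared constant plays the role of a feature-similarity term, and the only quantity that distinguishes one pair's score from another's is the degree product, so the hub pair (0, 3) scores higher than the low-degree pair (1, 4).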

Appendix L Justification of Assumptions in Lemma 4.1

The independence of path activation probabilities may not always hold true in practice. However, we verify that this assumption is plausible via our extensive experiments on real-world datasets that validate our theoretical analysis (§6.1). This assumption also aligns with findings that deep neural networks have an inductive bias towards learning simpler, often linear, functions (Nakkiran et al., 2019; Valle-Pérez et al., 2019). Furthermore, a variant of our assumption (where $\rho(i)=\rho$ is constant for all nodes) has been used in the literature to simplify theoretical analysis (e.g., Xu et al. (2018); Tang et al. (2020)); our assumption may be more realistic than this variant, as it captures that the probability of paths activating can differ across nodes (e.g., due to differences in features, neighborhood structure).