Networked Inequality: Preferential Attachment Bias in
Graph Neural Network Link Prediction
Abstract
Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group (e.g., queer women) fairness and “rich get richer” dynamics of link prediction remain underexplored. However, these aspects have significant consequences for degree and power imbalances in networks. In this paper, we shed light on how degree bias in networks affects Graph Convolutional Network (GCN) link prediction. In particular, we theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group preferential attachment bias. We validate our theoretical analysis on real-world citation, collaboration, and online social networks. We further bridge GCN’s preferential attachment bias with unfairness in link prediction and propose a new within-group fairness metric. This metric quantifies disparities in link prediction scores within social groups, towards combating the amplification of degree and power disparities. Finally, we propose a simple training-time strategy to alleviate within-group unfairness, and we show that it is effective on citation, social, and credit networks.
1 Introduction
Link prediction (LP) using GNNs is increasingly leveraged to recommend friends in social networks (Fan et al., 2019; Sankar et al., 2021), as well as by scholarly tools to recommend academic literature in citation networks (Xie et al., 2021). In recent years, graph learning researchers have raised concerns about the unfairness of GNN LP (Li et al., 2021; Current et al., 2022; Li et al., 2022). This unfairness is often attributed to graph structure, including the stratification of social groups; for example, online networks are usually segregated by ethnicity (Hofstra et al., 2017). However, most fair GNN LP research has focused on dyadic fairness, i.e., satisfying some notion of parity between inter-group and intra-group link predictions. This formulation neglects: 1) LP dynamics within social groups (Kasy & Abebe, 2021); and 2) the “rich get richer” effect, i.e., the prediction of links at a higher rate with high-degree nodes (Barabási & Albert, 1999). In the context of friend recommendation systems, the “rich get richer” effect can increase the number of links formed with high-degree individuals, which boosts their influence on other individuals in the network, and thus their power (Bashardoust et al., 2022).
In this paper, we shed light on how degree bias in networks affects GCN LP (Kipf & Welling, 2017). We theoretically and empirically find that GCNs with a symmetric normalized graph filter have a within-group preferential attachment (PA) bias in LP. Specifically, GCNs often output LP scores that are approximately proportional to the geometric mean of the (within-group) degrees of the incident nodes when the nodes belong to the same social group. (We elaborate on PA and our motivation in §J.) We focus on GCNs with symmetric and random walk normalized graph filters because they are popular architectures for graph deep learning, and they provide us with a reasonable setting to develop a rigorous theory of PA bias in GNN LP while leveraging tools from spectral graph theory.
Our finding can have significant implications for the fairness of GCN LP. For example, consider links within the CS social group in the toy academic collaboration network in Figure 1. Because men in CS, on average, have a higher within-group degree () than women in CS (), due to gender discrimination, a collaboration recommender system that uses a GCN can suggest men as collaborators at a higher rate. This has the detrimental effect of further concentrating research collaborations among men, thereby reducing the influence of women in CS and reinforcing their marginalization in the field (Yamamoto & Frachtenberg, 2022). Furthermore, considering this marginalization in the context of CS is important, as such marginalization may be less severe or different in Edu.
Our contributions are as follows:
1. We theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group PA bias in LP (§4.1). We validate our theoretical analysis on diverse real-world network datasets (e.g., citation, collaboration, online social networks) of varying size (§6.1). In doing so, we lay a foundation to study this previously-unexplored PA bias in the GNN setting.
2. We bridge GCN’s PA bias with unfairness in LP (§4.2, §6.2). We contribute a new within-group fairness metric for LP, which quantifies disparities in LP scores within social groups, towards combating the amplification of degree and power disparities. To our knowledge, we are the first to study the within-group fairness of GNNs.
3. We propose a simple training-time strategy to alleviate within-group unfairness (§5), and we show that it is effective on citation, online social, and credit networks (§6.3).
2 Related Work
Degree Bias in GNNs
Numerous papers have investigated how GNN performance is degraded for low-degree nodes on node representation learning and classification tasks (Tang et al., 2020; Liu et al., 2021; Kang et al., 2022; Xu et al., 2023; Shomer et al., 2023). Liu et al. (2023) present a generalized notion of degree bias that considers different multi-hop structures around nodes and propose a framework to address it; in contrast to prior work, which focuses on degree equal opportunity (i.e., similar accuracy for nodes with the same degree), Liu et al. (2023) also study degree statistical parity (i.e., similar prediction rates of each class for nodes with the same degree). Beyond node classification, Wang & Derr (2022) find GNN LP performance disparities across nodes with different degrees: low-degree nodes often benefit from higher performance than high-degree nodes. In this paper, we find that GCNs have a PA bias in LP, and present a new fairness metric which quantifies disparities in GNN LP scores within social groups. We focus on group fairness (i.e., parity between groups) rather than individual fairness (i.e., treating similar individuals similarly); this is because producing similar LP scores for similar-degree individuals does not prevent high-degree individuals from unfairly amassing links, and thus power (cf. Figure 1). We further compare our work to prior degree bias works in §K.
Fair Link Prediction
Prior work has investigated the unfairness of GNN LP (Li et al., 2021; Current et al., 2022; Li et al., 2022), often attributing it to graph structure (e.g., stratification of social groups). However, most of this research has focused on dyadic fairness, i.e., satisfying some notion of parity between inter-group and intra-group links. Like Wang & Derr (2022), we examine how degree bias impacts GNN LP; however, rather than focus on performance disparities across nodes with different degrees, we study GCN’s PA bias and LP score disparities across (sub)groups.
Within-Group Fairness
Much previous work has studied within-group fairness, i.e., fairness over social subgroups (e.g., Black women, Indigenous men) defined over multiple axes (e.g., race, gender) (Kearns et al., 2017; Foulds et al., 2020; Ghosh et al., 2021; Wang et al., 2022). The motivation of this work is that classifiers can be fair with respect to two social axes separately, but be unfair to subgroups defined over both these axes. While prior research has termed this phenomenon intersectional unfairness, we opt for within-group unfairness to distinguish it from the critical framework of Intersectionality (Ovalle et al., 2023). We study within-group fairness in the GNN setting. In particular, our theoretical and empirical findings reveal that GCN LP can further marginalize social subgroups; this relates to the “complexity” tenet of Intersectionality, which expresses that the marginalization faced by, e.g., Black women, is non-additive and distinct from the marginalization faced by Black men and white women (Collins & Bilge, 2020).
Bias and Power in Networks
A wealth of literature outside fair graph learning has examined how network structure enables discrimination and disparities in capital (Fish et al., 2019; Stoica et al., 2020; Zhang et al., 2021; Bashardoust et al., 2022). Boyd et al. (2014) describe how an individual’s position in a social network affects their access to jobs and public health information, as well as how they are surveilled. Stoica et al. (2018) observe that high-degree accounts on Instagram overwhelmingly belong to men and recommendation algorithms further boost these accounts; complementarily, the authors find that even a simple, random walk-based recommendation algorithm can amplify degree disparities between social groups in networks modeled by PA dynamics. Similarly, we investigate how GCN LP can amplify degree disparities in networks and further concentrate power among high-degree individuals.
3 Preliminaries
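We have a simple, undirected $n$-node graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with doubly-weighted self-loops. The nodes have features $\{x_i\}_{i \in \mathcal{V}}$, with each $x_i \in \mathbb{R}^{d}$. We denote the adjacency matrix of $\mathcal{G}$ as $A$ and the degree matrix as $D$, with $D_{ii} = \sum_{j \in \mathcal{V}} A_{ij}$. We consider two $L$-layer GCN encoders: (1) $\Phi_s$ (Kipf & Welling, 2017), which uses a symmetric normalized filter, and (2) $\Phi_r$, which uses a random walk normalized filter. $\Phi_s$ and $\Phi_r$ compute node representations as, $\forall i \in \mathcal{V}$:

$$h_{s,i}^{(l)} = \sigma^{(l)}\left(z_{s,i}^{(l)}\right), \qquad h_{r,i}^{(l)} = \sigma^{(l)}\left(z_{r,i}^{(l)}\right), \qquad l = 1, \ldots, L \tag{1}$$

$$z_{s,i}^{(l)} = \sum_{j \in \Gamma(i)} \frac{W_s^{(l)} h_{s,j}^{(l-1)}}{\sqrt{D_{ii} D_{jj}}} \tag{2}$$

$$z_{r,i}^{(l)} = \frac{1}{D_{ii}} \sum_{j \in \Gamma(i)} W_r^{(l)} h_{r,j}^{(l-1)} \tag{3}$$

where $h_{s,i}^{(0)} = h_{r,i}^{(0)} = x_i$; $\Gamma(i)$ is the 1-hop neighborhood of $i$; $W_s^{(l)}$ and $W_r^{(l)}$ are the weight matrices corresponding to layer $l$ of $\Phi_s$ and $\Phi_r$, respectively; for $l < L$, $\sigma^{(l)}$ is a ReLU non-linearity; and $\sigma^{(L)}$ is the identity function. We now consider the first-order Taylor expansions of $\Phi_s$ and $\Phi_r$ around zero features:

$$h_{s,i}^{(L)} = \sum_{j \in \mathcal{V}} \frac{\partial h_{s,i}^{(L)}}{\partial x_j}\, x_j + \xi_{s,i} \tag{4}$$

$$h_{r,i}^{(L)} = \sum_{j \in \mathcal{V}} \frac{\partial h_{r,i}^{(L)}}{\partial x_j}\, x_j + \xi_{r,i} \tag{5}$$

where $\xi_{s,i}$ and $\xi_{r,i}$ are the errors of the first-order approximations. This error is low when the features $x_j$ are close to $0$, which we validate empirically in §6.1. Furthermore, we consider an inner-product LP score function $f_{LP}$:

$$f_{LP}\left(h_i^{(L)}, h_j^{(L)}\right) = \left(h_i^{(L)}\right)^{\top} h_j^{(L)} \tag{6}$$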
where $h_i^{(L)}$ is the last-layer representation for node $i$. While it is common to use a vanilla GCN and inner-product score function for LP (Fey, 2019), researchers have proposed methods to improve the expressivity of node representations for LP by capturing subgraph information (Zhang & Chen, 2018; Li et al., 2020; Chamberlain et al., 2023). Our theoretical findings remain relevant to methods that ultimately use a GCN to predict links (e.g., Zhang & Chen (2018); Li et al. (2020)), as we do not make assumptions about the features passed to the GCN (i.e., they could be distance encodings, SEAL node embeddings, etc.). Our results may also generalize to GNN architectures that use a degree-normalized graph filter, e.g., Graph Attention Networks (Veličković et al., 2018). Studying the fairness of more expressive LP methods is an interesting direction for future research. Furthermore, although we only consider an inner-product LP score function in our theoretical analysis, we also run experiments with a Hadamard product and MLP score function (cf. §G.2), and we find that our theoretical analysis is still relevant to and reasonably supports the experimental results.
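To make the two encoders and the inner-product score concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names (`normalized_filter`, `gcn_encode`, `lp_scores`) are hypothetical, features are stored as rows (so weights multiply on the right), and self-loops are added with weight 2 as one possible reading of "doubly-weighted self-loops".

```python
import numpy as np


def normalized_filter(adj, kind="sym"):
    """Degree-normalize a dense adjacency matrix (self-loops already included)."""
    deg = adj.sum(axis=1)
    if kind == "sym":  # symmetric normalization: D^{-1/2} A D^{-1/2}
        inv_sqrt = 1.0 / np.sqrt(deg)
        return inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    if kind == "rw":  # random walk normalization: D^{-1} A
        return adj / deg[:, None]
    raise ValueError(f"unknown filter kind: {kind}")


def gcn_encode(adj, feats, weights, kind="sym"):
    """Run an L-layer GCN: ReLU after every layer except the last."""
    filt = normalized_filter(adj, kind)
    h = feats
    for layer, w in enumerate(weights):
        z = filt @ h @ w  # aggregate over neighbours, then apply layer weights
        h = z if layer == len(weights) - 1 else np.maximum(z, 0.0)
    return h


def lp_scores(h):
    """Inner-product LP score for every pair of nodes."""
    return h @ h.T


# Toy 4-node path graph. The self-loop weight of 2 is an assumption.
base = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
adj = base + 2.0 * np.eye(4)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))
weights = [rng.normal(size=(3, 5)), rng.normal(size=(5, 2))]  # L = 2 layers

scores_sym = lp_scores(gcn_encode(adj, feats, weights, kind="sym"))
scores_rw = lp_scores(gcn_encode(adj, feats, weights, kind="rw"))
```

The only difference between the two encoders is the filter: the symmetric filter divides each edge by the geometric mean of the endpoint degrees, while the random walk filter divides each row by the degree of the receiving node.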
4 Theoretical Analysis
We leverage spectral graph theory to study how degree bias affects GCN LP. Theoretically, we find that GCNs with a symmetric normalized graph filter have a within-group PA bias (§4.1), but GCNs with a random walk normalized filter may lack such a bias (§4.3). We further bridge GCN’s PA bias with unfairness in GCN LP, proposing a new LP within-group fairness metric (§4.2) and a simple training-time strategy to alleviate unfairness (§5). We empirically validate our theoretical results and fairness strategy in §6. We provide proofs for all theoretical results in §A.
Our ultimate goal is to bound the expected LP scores and for nodes in the same social group in terms of the degrees of . We begin with Lemma 4.1, which expresses GCN representations (in expectation) as a linear combination of the initial node features. In doing so, we decouple the computation of GCN representations from the non-linearities .
Lemma 4.1.
Similarly to Xu et al. (2018), assume that each path from node in the computation graph of is independently activated with probability , and similarly, for (cf. §L). Furthermore, suppose that , where the expectations are taken over the probability distributions of paths activating. We define , and . Then, :
(7) | |||
(8) |
Lemma 4.1 demonstrates that under certain assumptions (which we show to be reasonable in §6.1), the expected GCN representations can be expressed as a linear combination of the node features that depends on a normalized version of the adjacency matrix.
We now introduce social groups in into our analysis. Suppose that can be partitioned into disjoint social groups , such that and . Furthermore, we define as the induced connected subgraph of formed from . (If a group comprises connected components, it can be treated as separate groups.) Let be a within-group adjacency matrix that contains links between nodes in the same group, i.e., contains the link if and only if for some group , . Without loss of generality, we reorder the rows and columns of and such that is a block matrix. Let be the degree matrix of .
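The within-group adjacency matrix described above can be sketched as a simple masking operation. This is an illustrative example, not code from the paper: `within_group_adjacency` is a hypothetical helper, self-loops are omitted for brevity, and the sketch does not split a group into its connected components as the text prescribes.

```python
import numpy as np


def within_group_adjacency(adj, groups):
    """Keep only links whose two endpoints share a group label."""
    groups = np.asarray(groups)
    same_group = groups[:, None] == groups[None, :]
    return adj * same_group


# Toy graph with edges 0-1, 1-2, 2-3, 0-2 and two groups {0, 1} and {2, 3}.
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    adj[i, j] = adj[j, i] = 1.0
groups = [0, 0, 1, 1]

adj_hat = within_group_adjacency(adj, groups)
deg = adj.sum(axis=1)          # overall degrees
deg_hat = adj_hat.sum(axis=1)  # within-group degrees
```

In this toy graph, node 2 has overall degree 3 but within-group degree 1, because two of its links cross the group boundary. When nodes are ordered by group, as in this example, `adj_hat` is block diagonal.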
4.1 Symmetric Normalized Filter
We first focus on analyzing . We introduce the notation for the symmetric normalized adjacency matrix. We further define , which has the form . Each admits the orthonormal spectral decomposition . Let be the eigenvalues of sorted in non-increasing order; the eigenvalues fall in the range . By the spectral properties of , . Following Lovász (2001), we denote the spectral gap of as ; corresponds to the smallest non-zero eigenvalue of the symmetric normalized graph Laplacian. Let . If is highly modular or approximately disconnected, then , albeit with positive and non-positive entries. Finally, we define the volume .
In Lemma 4.2, we present an inequality for the entries of in terms of the spectral properties of . We can then combine this inequality with Lemma 4.1 to bound , and subsequently .
Lemma 4.2.
For :
(9) | |||
(10) |
where is the operator norm. And for , .
The proof of Lemma 4.2 is similar to spectral proofs of random walk convergence. When is small (e.g., 2 for many GCNs (Kipf & Welling, 2017)) and , . Furthermore, with significant stratification between social groups (Hofstra et al., 2017) and high expansion within groups (Malliaros & Megalooikonomou, 2011; Leskovec et al., 2008), . In this case, and for . Combining Lemmas 4.1 and 4.2, can oversmooth the expected representations to (Keriven, 2022; Giovanni et al., 2023). We use this knowledge to bound in terms of the degrees of .
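The random-walk-style convergence behind Lemma 4.2 can be checked numerically in the idealized case of a single connected group with no links leaving it. The sketch below relies on a standard spectral fact: powers of the symmetric normalized adjacency matrix approach the rank-one matrix with entries $\sqrt{D_{ii} D_{jj}} / \mathrm{vol}$. The toy graph, the self-loop weight of 2, and the use of the absolute spectral gap are assumptions of this example and may differ in detail from the paper's definitions.

```python
import numpy as np

# Toy connected graph standing in for one social group (no cross-group links).
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]:
    adj[i, j] = adj[j, i] = 1.0
adj += 2.0 * np.eye(5)  # self-loops with weight 2 (assumption)

deg = adj.sum(axis=1)
vol = deg.sum()
p_sym = adj / np.sqrt(np.outer(deg, deg))   # D^{-1/2} A D^{-1/2}
limit = np.sqrt(np.outer(deg, deg)) / vol   # sqrt(d_i d_j) / vol

eigvals = np.linalg.eigvalsh(p_sym)          # ascending order; the largest is 1
gap = 1.0 - max(abs(eigvals[0]), abs(eigvals[-2]))

results = {}
for num_layers in (2, 4, 8, 16):
    power = np.linalg.matrix_power(p_sym, num_layers)
    max_err = float(np.abs(power - limit).max())
    bound = (1.0 - gap) ** num_layers
    results[num_layers] = (max_err, bound)
    print(f"L={num_layers:2d}  max entry error={max_err:.4f}  (1-gap)^L={bound:.4f}")
```

Each entry of the difference is bounded by its operator norm, which equals $(1 - \text{gap})^L$ here. A larger gap (better expansion within the group) therefore makes even a shallow GCN filter close to the degree-dependent limit.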
Theorem 4.3.
Following a relaxed assumption from Xu et al. (2018), for nodes , we assume that . Then:
(11) | ||||
(12) | ||||
where: | (13) | |||
(14) | ||||
(15) | ||||
(16) |
In simpler terms, Theorem 4.3 states that with social stratification and expansion, the expected LP score is approximately proportional to the geometric mean of the within-group degrees of the incident nodes when these nodes belong to the same social group. This is because, as explained before Theorem 4.3, the error terms are then close to zero, so the RHS of the bound is small. This demonstrates that in LP, GCNs with a symmetric normalized graph filter have a within-group PA bias. If the LP score positively influences the formation of links over time, this PA bias can drive “rich get richer” dynamics within social groups (Stoica et al., 2018). As shown in Figure 1 and §4.2, such “rich get richer” dynamics can engender group unfairness when nodes’ degrees are statistically associated with their group membership. An association between node degree and group membership depends on group size and homophily; in particular, when a group has many nodes and intra-links (i.e., is homophilous), there may be more nodes with a high within-group degree. Beyond fairness, Theorem 4.3 reveals that GCNs do not align with theories that social rank influences link formation, i.e., the likelihood of a link forming between nodes is proportional to their degree difference (Gu et al., 2018).
4.2 Within-Group Fairness
We further investigate the fairness implications of the PA bias of in LP. We first introduce an additional set of social groups. Suppose that can also be partitioned into disjoint social groups ; then, we can consider intersections of and . For example, revisiting Figure 1, may correspond to academic discipline (e.g., CS, Edu) and may correspond to gender (e.g., men, women). For simplicity, we let . We measure the unfairness of LP for group as:
(17) | ||||
(18) | ||||
(19) |
where is a discrete uniform distribution over the input set. quantifies disparities in GCN LP scores within (with respect to and ). In other words, measures differences in how GCNs allocate LP scores across subgroups, i.e., are links with nodes in one subgroup predicted at a higher rate than links with nodes in the other subgroup? Our metric is motivated by how GNN link predictions influence real-world link formation (e.g., GNN-based recommender systems use LP scores to rank suggested social connections), which has consequences for degree and power disparities. Based on Theorem 4.3 and §B.1, when , we can estimate as:
(20) | |||
(21) | |||
(22) |
This suggests that a large disparity in the degree of nodes in vs. can greatly increase the unfairness of LP. For example, in Figure 1, the large degree disparity within CS (between men and women) entails that a GCN collaboration recommender system applied to the network will have a large . We empirically validate these fairness implications on diverse network datasets in §6.2. While we consider pre-activation LP scores in Eqn. 17 (in line with prior work, e.g., Li et al. (2021)), we consider post-sigmoid scores (where is the sigmoid function) in §6.2 and §6.3, as this simulates how LP scores may be processed in practice.
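As an illustration of the metric, the sketch below computes the absolute difference between the mean LP score of pairs whose first node lies in one subgroup and the mean score of pairs whose first node lies in the other subgroup, with the second node ranging over the whole group. This is one reading of the definition, not the authors' code: `within_group_unfairness` is a hypothetical helper, pairs with $i = j$ are included for simplicity, and the toy scores are set to the geometric mean of within-group degrees to mimic the PA bias.

```python
import numpy as np


def within_group_unfairness(scores, in_group, in_t1, in_t2):
    """|mean score over (S ∩ T1) x S  -  mean score over (S ∩ T2) x S|."""
    in_group = np.asarray(in_group, dtype=bool)
    rows_1 = in_group & np.asarray(in_t1, dtype=bool)
    rows_2 = in_group & np.asarray(in_t2, dtype=bool)
    mean_1 = scores[np.ix_(rows_1, in_group)].mean()
    mean_2 = scores[np.ix_(rows_2, in_group)].mean()
    return float(abs(mean_1 - mean_2))


in_group = [True, True, True, True]   # all four nodes belong to the group
in_t1 = [True, True, False, False]    # first subgroup
in_t2 = [False, False, True, True]    # second subgroup

# Scores proportional to the geometric mean of within-group degrees.
skewed_deg = np.array([9.0, 9.0, 1.0, 1.0])
skewed_scores = np.sqrt(np.outer(skewed_deg, skewed_deg))
delta_skewed = within_group_unfairness(skewed_scores, in_group, in_t1, in_t2)

equal_deg = np.array([4.0, 4.0, 4.0, 4.0])
equal_scores = np.sqrt(np.outer(equal_deg, equal_deg))
delta_equal = within_group_unfairness(equal_scores, in_group, in_t1, in_t2)
```

With the skewed degrees, the first subgroup receives a mean score of 6 and the second a mean score of 2, so the disparity is 4; with equal degrees it is 0. This mirrors the claim that degree disparities between subgroups drive the unfairness.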
Ultimately, within-group unfairness is characteristic of all GNN link prediction methods that: (1) predict scores for links with magnitudes that are positively associated with the degrees of their incident nodes, and (2) are applied to graphs where within-group membership is associated with node degree.
4.3 Random Walk Normalized Filter
We now follow similar steps as with to understand how degree bias affects LP scores for . We redefine , , and the remaining notation from §4.1 accordingly for the random walk setting.
Theorem 4.4.
In other words, if the error terms are small, the expected LP score is approximately constant when the incident nodes belong to the same social group. Based on Theorem 4.4 and §B.2, we can estimate the within-group unfairness as approximately zero. Theoretically, this would suggest that a large disparity in the degree of nodes in one subgroup vs. the other does not increase the unfairness of LP. However, we find empirically that this is not the case (§6.1). Even so, we include theoretical results for the random walk filter to be more comprehensive with respect to filter choice, as well as be upfront about the limitations of our analysis in this case. We also seek to provide an example of how to apply our analysis to other filters, for researchers who would like to build on it in the future. For example, findings for the random walk filter could be relevant to the GAT filter (Veličković et al., 2018), which is also a row-stochastic matrix.
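The intuition for the random walk filter can be sketched numerically: powers of the row-stochastic matrix $D^{-1} A$ approach a matrix whose rows are all equal to the stationary distribution, so the mixing weights no longer depend on the receiving node. The example below is a sketch for a single connected group with no outgoing links; the toy graph and the self-loop weight of 2 are assumptions.

```python
import numpy as np

# Toy connected graph standing in for one social group.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]:
    adj[i, j] = adj[j, i] = 1.0
adj += 2.0 * np.eye(5)  # self-loops with weight 2 (assumption)

deg = adj.sum(axis=1)
stationary = deg / deg.sum()   # stationary distribution of the random walk
p_rw = adj / deg[:, None]      # D^{-1} A, each row sums to 1


def max_row_spread(mat):
    """Largest gap between any two rows, measured column by column."""
    return float((mat.max(axis=0) - mat.min(axis=0)).max())


spreads = {
    num_layers: max_row_spread(np.linalg.matrix_power(p_rw, num_layers))
    for num_layers in (2, 8, 32, 400)
}
```

As the number of layers grows, the spread between rows shrinks toward zero. In the linearized view of §4, identical rows give every node in the group the same expected representation, which is why the predicted score is constant rather than degree-dependent.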
In summary, in §4, we build on prior analysis techniques for random walks and GNNs. At a high level, we: (1) simplify the GCN architecture to be a linear function by truncating its Taylor expansion and considering node representations in expectation; (2) analyze the convergence of node representations via a spectral analysis of the convergence of short random walks within subgraphs (corresponding to social groups); and (3) use norm inequalities to estimate link prediction scores. Our analysis comprises numerous novel elements including:
1. Analyzing the convergence of random walks within subgraphs, which requires accounting for the rate at which probability mass escapes from the subgraphs. In contrast, random walk results in the literature usually concern the convergence of random walks over an entire graph.
2. Uncovering properties of short random walks on graphs, since most GNNs are shallow. In contrast, random walk results in the literature often concern the stationary distribution of random walks.
3. Concretely relating theoretical properties of random walks to the fairness of GCN link prediction.
5 Fairness Regularizer
We propose a simple training-time solution to alleviate within-group LP unfairness regardless of graph filter type and GNN architecture. In particular, we can add a fairness regularization term to our original GNN training loss (Kamishima et al., 2011):
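$$\mathcal{L}_{\text{new}} = \mathcal{L}_{\text{orig}} + \frac{\lambda_{\text{fair}}}{B} \sum_{b=1}^{B} \Delta^{(b)} \tag{29}$$

where $\mathcal{L}_{\text{orig}}$ is the original training loss, $\Delta^{(b)}$ is the LP unfairness for social group $b$ (§4.2), $B$ is the number of social groups, and $\lambda_{\text{fair}}$ is a tunable hyperparameter; higher values push the GNN to learn fairer parameters. With our fairness strategy, we empirically observe a significant decrease in the average unfairness across groups without a severe drop in LP performance for GCN (§6.3).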
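A regularized objective of this kind can be sketched as follows. This is a forward-pass illustration in NumPy under several assumptions, not the authors' training code: the helper names are hypothetical, subgroups are binary labels 0 and 1, each sampled edge is assigned to the subgroup of its first endpoint, and only edges with both endpoints in the same group contribute. In practice the same computation would be written in an automatic differentiation framework so that gradients flow through the penalty.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_loss(logits, labels):
    """Binary cross-entropy over sampled positive and negative edges."""
    probs = np.clip(sigmoid(logits), 1e-7, 1.0 - 1e-7)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def mean_unfairness(logits, edges, group, subgroup):
    """Average, over groups, of the post-sigmoid score gap between subgroups."""
    probs = sigmoid(logits)
    src, dst = edges[:, 0], edges[:, 1]
    gaps = []
    for b in np.unique(group):
        within = (group[src] == b) & (group[dst] == b)
        first = within & (subgroup[src] == 0)
        second = within & (subgroup[src] == 1)
        if first.any() and second.any():  # skip groups missing a subgroup
            gaps.append(abs(probs[first].mean() - probs[second].mean()))
    return float(np.mean(gaps)) if gaps else 0.0


def regularized_loss(logits, labels, edges, group, subgroup, lam):
    """Original loss plus lam times the average within-group unfairness."""
    return bce_loss(logits, labels) + lam * mean_unfairness(logits, edges, group, subgroup)


# Toy batch: one group, two subgroups, four sampled edges.
group = np.array([0, 0, 0, 0])
subgroup = np.array([0, 0, 1, 1])
edges = np.array([[0, 1], [1, 0], [2, 3], [3, 2]])
logits = np.array([2.0, 2.0, -2.0, -2.0])
labels = np.array([1.0, 1.0, 1.0, 1.0])

penalty = mean_unfairness(logits, edges, group, subgroup)
loss = regularized_loss(logits, labels, edges, group, subgroup, lam=2.0)
```

Because the penalty reuses LP scores that are already computed for the cross-entropy term, it adds only a grouped averaging step per batch.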
6 Experiments
In this section, we empirically validate our theoretical analysis (§6.1) and the within-group fairness implications of GCN’s LP PA bias (§6.2) on diverse real-world network datasets of varying size. We further find that our simple training-time strategy to alleviate unfairness is effective on citation, online social, and credit networks (§6.3). We release our code and data in our GitHub repository (https://github.com/ArjunSubramonian/link_bias_amplification). We present experimental results with 4-layer GCN encoders and a Hadamard product with MLP LP score function in §G, with similar conclusions.
6.1 Validating Theoretical Analysis
We validate our theoretical analysis on 10 real-world network datasets (e.g., citation, collaboration, online social networks), which we describe in §C. Each dataset is natively intended for node classification; however, we adapt the datasets for LP, treating the connected components within the node classes as the social groups. This design choice is reasonable, as in all the datasets, the classes naturally correspond to socially-relevant groupings of the nodes, or proxies thereof (e.g., in the LastFMAsia dataset, the classes are the home countries of users). Because we adopt the class labels for each dataset as the social group labels, the social groups are largely homophilic; this aligns with our assumptions when interpreting Theorems 4.3 and 4.4 that social groups are stratified in networks.
We train GCN encoders with symmetric and random walk normalized filters for LP over 10 random seeds (cf. §E for more details). In Figure 4, we plot the theoretic LP score that we derive in §4 against the GCN LP score for pairs of test nodes belonging to the same social group (including positive and negative links). (While our theoretic scores result from our theoretical analysis in §4, we reiterate that our results in §4 rely on the assumptions that we state, and the theoretic score is not a ground-truth value.) For the symmetric normalized filter, the theoretic and GCN LP scores follow Theorem 4.3; in contrast, for the random walk normalized filter, they follow Theorem 4.4. For all the datasets, we estimate the unknown factors in the theoretic scores separately for each social group as the slope of the least-squares regression line (through the data from that group) that predicts the GCN score as a function of the theoretic score. Hence, we do not plot any pair of test nodes that is the only pair in its social group, as it is not possible to estimate a slope from a single pair. Further, the test AUC is consistently high, indicating that the GCNs are well-trained. The large range of each color in the plots indicates a diversity of LP scores within each social group.
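The per-group slope estimate and the agreement statistics can be sketched as below. Several details are assumptions of this example rather than facts from the paper: the regression line is fitted through the origin, NRMSE is normalized by the range of the observed scores, and the helper names and toy numbers are hypothetical.

```python
import numpy as np


def fit_slope_through_origin(theoretic, observed):
    """Least-squares slope of observed ~ slope * theoretic (no intercept)."""
    theoretic = np.asarray(theoretic, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(theoretic @ observed / (theoretic @ theoretic))


def agreement_metrics(theoretic, observed):
    """Return (slope, range-normalized RMSE, Pearson correlation)."""
    theoretic = np.asarray(theoretic, dtype=float)
    observed = np.asarray(observed, dtype=float)
    slope = fit_slope_through_origin(theoretic, observed)
    rmse = np.sqrt(np.mean((observed - slope * theoretic) ** 2))
    nrmse = float(rmse / (observed.max() - observed.min()))
    pcc = float(np.corrcoef(theoretic, observed)[0, 1])
    return slope, nrmse, pcc


# Toy data for one social group: unscaled theoretic scores vs. GCN scores.
theoretic = np.array([1.0, 2.0, 3.0, 4.0])
observed = np.array([2.9, 6.2, 8.8, 12.1])
slope, nrmse, pcc = agreement_metrics(theoretic, observed)
```

A group with a single test pair gives one point, from which a slope can be fitted only trivially; this is consistent with excluding such pairs from the plots.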
We visually observe that the theoretic LP scores are strong predictors of the scores for each dataset, validating our theoretical analysis. This strength is further confirmed by the generally low NRMSE and high PCC (except for the EN dataset). However, we observe a few cases in which our theoretical analysis does not line up with our experiments:
1. Our theoretical analysis predicts that the LP score between two nodes that belong to the same social group will always be non-negative; however, the GCN can predict negative scores for pairs of nodes in the same social group. In this case, it appears that the GCN relies more on the dissimilarity of (transformed) features than node degree.
2. For many network datasets (especially from the citation and online social domains), there exist node pairs (near the origin) for which the theoretic LP score underestimates the GCN score. Upon further analysis (cf. Appendix H), we find that the theoretic score is less predictive of the GCN score for nodes when the product of their degrees (i.e., their PA score) or similarity of their features is relatively low.
3. It appears that the theoretic LP score tends to poorly estimate the GCN score when the score is relatively high; this suggests that the GCN may conservatively rely more on the (dis)similarity of node features than node degree when the degree is large.
We do not observe that the theoretic LP scores are strong predictors of the scores, although there is still a moderate association between these variables. This could be because the error bound for the theoretic scores for , unlike for , has an extra dependence on the degrees of the incident nodes (cf. in Theorem 4.4). In contrast, the error bound for the theoretic scores for (cf. in Theorem 4.3) does not depend on this degree ratio. This ratio can be quite large in social networks (e.g., celebrities vs. new users in the Twitter follow network); we further confirm that this ratio is large for our datasets in §I.
6.2 Within-Group Fairness
We now empirically validate the implications of GCN’s PA bias for within-group unfairness in LP. We run experiments on three network datasets: (1) the NBA social network (Dai & Wang, 2021), (2) the German credit network (Agarwal et al., 2021), and (3) a new DBLP-Fairness citation network that we construct. We describe these datasets in §D, including and .
We train 2-layer GCN encoders for LP (cf. §E). In Figure 3, for all the datasets, we plot vs. (cf. Eqns. 17, 22) for each . We qualitatively and quantitatively observe that is moderately predictive of for each dataset. This confirms our theoretical intuition (§4.2) that a large disparity in the degree of nodes in vs. can greatly increase the unfairness of LP; such unfairness can amplify degree disparities, worsening power imbalances in the network. Many points deviate from the line of equality; these deviations can be explained by the reasons in §6.1 and the compounding of errors.
Table 1: Within-group LP unfairness (↓) and test AUC (↑) for the NBA, DBLP-Fairness, and German datasets with the fairness regularization hyperparameter set to 4.0, 2.0, 1.0, and 0.0, reported for each graph filter type.
6.3 Fairness Regularizer
We evaluate our solution to alleviate LP unfairness (§5). In particular, we add our fairness regularization term to the original training loss for the 2-layer GCN encoders with symmetric and random walk normalized filters. During each training epoch, we compute the post-sigmoid unfairness using only the LP scores over the sampled (positive and negative) training edges. In Table 1, we summarize the link prediction fairness and performance (test AUC) for the NBA, German, and DBLP-Fairness datasets with various settings of the regularization hyperparameter.
For both graph filter types, we generally observe a significant decrease in (without a severe drop in test AUC) for over (with the exception of for German); however, the varying magnitudes by which decreases across the datasets suggests that may need to be tuned per dataset. As expected, we mostly observe a tradeoff between and the test AUC as increases. Our experiments reveal that, regardless of graph filter type, even simple regularization approaches can alleviate this new form of unfairness. As this form of unfairness has not been previously explored, we have no baselines.
Our fairness regularizer can be easily integrated into model training, does not require significant additional computation, and directly optimizes for LP fairness. The time complexity of calculating the regularization term is , as we have already computed the LP scores for the cross-entropy loss term and simply need to sum them appropriately with respect to the groups and subgroups. Furthermore, the time complexity of computing gradients for the regularization term is on the same order as backpropagation for the cross-entropy loss term.
However, our fairness regularizer is not applicable in settings where model parameters cannot be retrained or finetuned. Hence, we encourage future research to also explore post-processing fairness strategies. For example, for models, based on our theory (cf. Theorem 4.3), for each pair of nodes , we can decay the influence of GCN’s PA bias by scaling (pre-activation) LP scores by , where is a hyperparameter that can be tuned to achieve a desirable balance between and the test AUC.
Empirical evaluation of our fairness regularizer using existing LP fairness metrics, such as statistical parity and equal opportunity dyadic fairness (Li et al., 2021), or equal opportunity degree bias (Wang & Derr, 2022), is beyond the scope of our paper given that our algorithm and metric are designed to handle a different form of unfairness. For example, inter-group and intra-group links can be predicted at the same rate or with the same accuracy, but these links can be exclusively with high-degree nodes, thereby marginalizing low-degree nodes (cf. §J). Similarly, even if we consistently predict links with the same accuracy across nodes with different degrees, high-degree nodes can still receive higher LP scores than low-degree nodes (cf. §K).
7 Conclusion
We theoretically and empirically show that GCNs can have a PA bias in LP. We analyze how this bias can engender within-group unfairness, and amplify degree and power imbalances in networks. We further propose a simple training-time strategy to alleviate this unfairness. We encourage future work to: (1) explore PA bias in other GNN architectures and directed and heterophilic networks, (2) characterize the “rich get richer” evolution of networks affected by GCN’s PA bias, and (3) propose pre-processing and post-processing strategies for within-group LP unfairness.
Because this unfairness is at the level of dyads, we would like to explore new forms of unfairness that occur at the level of higher-order structures (e.g., prediction disparities between important coalitions of nodes). Moreover, node degree is a local property, and it would be valuable to theoretically and empirically relate higher-order graph properties (e.g., local clustering coefficient, different measures of centrality) to unfairness.
Acknowledgements
We would like to thank the anonymous reviewers for their feedback on this work. This work was partially supported by NSF 2211557, NSF 1937599, NSF 2119643, NSF 2303037, NSF 2312501, NASA, SRC JUMP 2.0 Center, Amazon Research Awards, and Snapchat Gifts.
Impact Statement
Our paper seeks to uncover and combat discrimination, bias, and unfairness in GNNs. Throughout, we tie our analysis back to issues of disparity and power, towards advancing justice in graph learning. While we propose a strategy to alleviate LP unfairness, we emphasize that it is not a ‘silver bullet’ solution; we encourage graph learning practitioners to adopt a sociotechnical approach to fairness and continually adapt their algorithms, datasets, and metrics in response to the everchanging landscape of inequality and power. Furthermore, the fairness of GCN LP should not sidestep concerns about GCN LP being used at all in certain scenarios.
Some datasets that we use contain protected attribute information (detailed in §D). We avoid using datasets that enable carceral technology (e.g., Recidivism (Agarwal et al., 2021)). We release our code and data with an MIT license.
For transparency, we do our best to discuss limitations throughout the paper. For each lemma and theorem (§4), our assumptions are clearly explained and justified either before or in the statement thereof, and we include complete proofs of our theoretical claims in §A and §B.
For reproducibility, we provide all our code and data (including the raw DBLP-Fairness dataset) in our GitHub repository, along with a README. We detail our data processing steps in §D.3. Furthermore, our experiments (§6) are run with 10 random seeds and errors are reported. We provide model implementation details in §E.
References
- Agarwal et al. (2021) Agarwal, C., Lakkaraju, H., and Zitnik, M. Towards a unified framework for fair and stable graph representation learning. In Conference on Uncertainty in Artificial Intelligence, 2021.
- Barabási & Albert (1999) Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999. doi: 10.1126/science.286.5439.509. URL https://www.science.org/doi/abs/10.1126/science.286.5439.509.
- Bashardoust et al. (2022) Bashardoust, A., Friedler, S. A., Scheidegger, C. E., Sullivan, B. D., and Venkatasubramanian, S. Reducing access disparities in networks using edge augmentation. ArXiv, abs/2209.07616, 2022.
- Bojchevski & Günnemann (2018) Bojchevski, A. and Günnemann, S. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=r1ZdKJ-0W.
- Boyd et al. (2014) Boyd, D., Levy, K., and Marwick, A. The networked nature of algorithmic discrimination. Data and Discrimination: Collected Essays. Open Technology Institute, 2014.
- Chamberlain et al. (2023) Chamberlain, B. P., Shirobokov, S., Rossi, E., Frasca, F., Markovich, T., Hammerla, N. Y., Bronstein, M. M., and Hansmire, M. Graph neural networks for link prediction with subgraph sketching. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=m1oqEOAozQU.
- Collins & Bilge (2020) Collins, P. H. and Bilge, S. Intersectionality. John Wiley & Sons, 2020.
- Current et al. (2022) Current, S., He, Y., Gurukar, S., and Parthasarathy, S. FairEGM: Fair link prediction and recommendation via emulated graph modification. In Equity and Access in Algorithms, Mechanisms, and Optimization. ACM, oct 2022. doi: 10.1145/3551624.3555287. URL https://doi.org/10.1145%2F3551624.3555287.
- Dai & Wang (2021) Dai, E. and Wang, S. Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, WSDM ’21, pp. 680–688, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450382977. doi: 10.1145/3437963.3441752. URL https://doi.org/10.1145/3437963.3441752.
- Fan et al. (2019) Fan, W., Ma, Y., Li, Q., He, Y., Zhao, Y. E., Tang, J., and Yin, D. Graph neural networks for social recommendation. The World Wide Web Conference, 2019.
- Fey (2019) Fey, M. link_pred.py, 2019. URL https://github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py.
- Fey & Lenssen (2019) Fey, M. and Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
- Fish et al. (2019) Fish, B., Bashardoust, A., Boyd, D., Friedler, S., Scheidegger, C., and Venkatasubramanian, S. Gaps in information access in social networks? In The World Wide Web Conference, WWW ’19, pp. 480–490, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450366748. doi: 10.1145/3308558.3313680. URL https://doi.org/10.1145/3308558.3313680.
- Foulds et al. (2020) Foulds, J. R., Islam, R., Keya, K. N., and Pan, S. An intersectional definition of fairness. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1918–1921, 2020. doi: 10.1109/ICDE48307.2020.00203.
- Freedman et al. (2007) Freedman, D., Pisani, R., and Purves, R. Statistics: Fourth International Student Edition. Emersion: Emergent Village Resources for Communities of Faith Series. W.W. Norton & Company, 2007. ISBN 9780393930436.
- Ghosh et al. (2021) Ghosh, A., Genuit, L., and Reagan, M. Characterizing intersectional group fairness with worst-case comparisons. In Artificial Intelligence Diversity, Belonging, Equity, and Inclusion, pp. 22–34. PMLR, 2021.
- Giovanni et al. (2023) Giovanni, F. D., Rowbottom, J., Chamberlain, B. P., Markovich, T., and Bronstein, M. M. Understanding convolution on graphs via energies. 2023. URL https://openreview.net/forum?id=v5ew3FPTgb.
- Gu et al. (2018) Gu, Y., Sun, Y., Li, Y., and Yang, Y. Rare: Social rank regulated large-scale network embedding. Proceedings of the 2018 World Wide Web Conference, 2018.
- Hofstra et al. (2017) Hofstra, B., Corten, R., van Tubergen, F., and Ellison, N. B. Sources of segregation in social networks: A novel approach using facebook. American Sociological Review, 82(3):625–656, 2017. doi: 10.1177/0003122417705656. URL https://doi.org/10.1177/0003122417705656.
- Kamishima et al. (2011) Kamishima, T., Akaho, S., and Sakuma, J. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 643–650, 2011. doi: 10.1109/ICDMW.2011.83.
- Kang et al. (2022) Kang, J., Zhu, Y., Xia, Y., Luo, J., and Tong, H. Rawlsgcn: Towards rawlsian difference principle on graph convolutional network. In Proceedings of the ACM Web Conference 2022, pp. 1214–1225, 2022.
- Kasy & Abebe (2021) Kasy, M. and Abebe, R. Fairness, equality, and power in algorithmic decision-making. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pp. 576–586, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383097. doi: 10.1145/3442188.3445919. URL https://doi.org/10.1145/3442188.3445919.
- Kearns et al. (2017) Kearns, M., Neel, S., Roth, A., and Wu, Z. S. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, 2017.
- Keriven (2022) Keriven, N. Not too little, not too much: a theoretical analysis of graph (over)smoothing. ArXiv, abs/2205.12156, 2022.
- Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
- Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
- Leskovec et al. (2008) Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6:123–29, 2008.
- Li et al. (2020) Li, P., Wang, Y., Wang, H., and Leskovec, J. Distance encoding: Design provably more powerful neural networks for graph representation learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.
- Li et al. (2021) Li, P., Wang, Y., Zhao, H., Hong, P., and Liu, H. On dyadic fairness: Exploring and mitigating bias in graph connections. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=xgGS6PmzNq6.
- Li et al. (2022) Li, Y., Wang, X., Ning, Y., and Wang, H. Fairlp: Towards fair link prediction on social network graphs. Proceedings of the International AAAI Conference on Web and Social Media, 16(1):628–639, May 2022. doi: 10.1609/icwsm.v16i1.19321. URL https://ojs.aaai.org/index.php/ICWSM/article/view/19321.
- Liu et al. (2021) Liu, Z., Nguyen, T.-K., and Fang, Y. Tail-gnn: Tail-node graph neural networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, pp. 1109–1119, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383325. doi: 10.1145/3447548.3467276. URL https://doi.org/10.1145/3447548.3467276.
- Liu et al. (2023) Liu, Z., Nguyen, T.-K., and Fang, Y. On generalized degree fairness in graph neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4):4525–4533, Jun. 2023. doi: 10.1609/aaai.v37i4.25574. URL https://ojs.aaai.org/index.php/AAAI/article/view/25574.
- Lovász (2001) Lovász, L. M. Random walks on graphs: A survey. 2001.
- Malliaros & Megalooikonomou (2011) Malliaros, F. D. and Megalooikonomou, V. Expansion properties of large social graphs. In DASFAA Workshops, 2011.
- Nakkiran et al. (2019) Nakkiran, P., Kaplun, G., Kalimeris, D., Yang, T., Edelman, B. L., Zhang, F., and Barak, B. Sgd on neural networks learns functions of increasing complexity, 2019.
- Oono & Suzuki (2020) Oono, K. and Suzuki, T. Graph neural networks exponentially lose expressive power for node classification. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1ldO2EFPr.
- Otto (2019) Otto, S. How to normalize the rmse [blog post], 2019. URL https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/.
- Ovalle et al. (2023) Ovalle, A., Subramonian, A., Gautam, V., Gee, G., and Chang, K.-W. Factoring the matrix of domination: A critical review and reimagination of intersectionality in ai fairness. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’23, pp. 496–511, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400702310. doi: 10.1145/3600211.3604705. URL https://doi.org/10.1145/3600211.3604705.
- Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Curran Associates Inc., Red Hook, NY, USA, 2019.
- Rozemberczki & Sarkar (2020) Rozemberczki, B. and Sarkar, R. Characteristic functions on graphs: Birds of a feather, from statistical descriptors to parametric models. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, pp. 1325–1334, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450368599. doi: 10.1145/3340531.3411866. URL https://doi.org/10.1145/3340531.3411866.
- Rozemberczki et al. (2021) Rozemberczki, B., Allen, C., and Sarkar, R. Multi-Scale attributed node embedding. Journal of Complex Networks, 9(2):cnab014, 05 2021. ISSN 2051-1329. doi: 10.1093/comnet/cnab014. URL https://doi.org/10.1093/comnet/cnab014.
- Sankar et al. (2021) Sankar, A., Liu, Y., Yu, J., and Shah, N. Graph neural networks for friend ranking in large-scale social platforms. Proceedings of the Web Conference 2021, 2021.
- Shchur et al. (2018) Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. Pitfalls of graph neural network evaluation. ArXiv, abs/1811.05868, 2018.
- Shomer et al. (2023) Shomer, H., Jin, W., Wang, W., and Tang, J. Toward degree bias in embedding-based knowledge graph completion. In Proceedings of the ACM Web Conference 2023, WWW ’23, pp. 705–715, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394161. doi: 10.1145/3543507.3583544. URL https://doi.org/10.1145/3543507.3583544.
- Stoica et al. (2018) Stoica, A.-A., Riederer, C., and Chaintreau, A. Algorithmic glass ceiling in social networks: The effects of social recommendations on network diversity. In Proceedings of the 2018 World Wide Web Conference, WWW ’18, pp. 923–932, Republic and Canton of Geneva, CHE, 2018. International World Wide Web Conferences Steering Committee. ISBN 9781450356398. doi: 10.1145/3178876.3186140. URL https://doi.org/10.1145/3178876.3186140.
- Stoica et al. (2020) Stoica, A.-A., Han, J. X., and Chaintreau, A. Seeding network influence in biased networks and the benefits of diversity. In Proceedings of The Web Conference 2020, WWW ’20, pp. 2089–2098, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450370233. doi: 10.1145/3366423.3380275. URL https://doi.org/10.1145/3366423.3380275.
- Subramonian et al. (2022) Subramonian, A., Chang, K.-W., and Sun, Y. On the discrimination risk of mean aggregation feature imputation in graphs. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 32957–32973. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/d4c2f25bf0c33065b7d4fb9be2a9add1-Paper-Conference.pdf.
- Tang et al. (2008) Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., and Su, Z. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pp. 990–998, New York, NY, USA, 2008. Association for Computing Machinery. ISBN 9781605581934. doi: 10.1145/1401890.1402008. URL https://doi.org/10.1145/1401890.1402008.
- Tang et al. (2020) Tang, X., Yao, H., Sun, Y., Wang, Y., Tang, J., Aggarwal, C., Mitra, P., and Wang, S. Investigating and mitigating degree-related biases in graph convoltuional networks. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, pp. 1435–1444, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450368599. doi: 10.1145/3340531.3411872. URL https://doi.org/10.1145/3340531.3411872.
- Valle-Pérez et al. (2019) Valle-Pérez, G., Camargo, C. Q., and Louis, A. A. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2019.
- Veličković et al. (2018) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
- Wang et al. (2022) Wang, A., Ramaswamy, V. V., and Russakovsky, O. Towards intersectionality in machine learning: Including more identities, handling underrepresentation, and performing evaluation. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pp. 336–349, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450393522. doi: 10.1145/3531146.3533101. URL https://doi.org/10.1145/3531146.3533101.
- Wang & Derr (2022) Wang, Y. and Derr, T. Degree-related bias in link prediction. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 757–758, 2022. doi: 10.1109/ICDMW58026.2022.00103.
- Xie et al. (2021) Xie, Q., Zhu, Y., Huang, J., Du, P., and Nie, J.-Y. Graph neural collaborative topic model for citation recommendation. ACM Trans. Inf. Syst., 40(3), nov 2021. ISSN 1046-8188. doi: 10.1145/3473973. URL https://doi.org/10.1145/3473973.
- Xu et al. (2023) Xu, H., Xiang, L., Huang, F., Weng, Y., Xu, R., Wang, X., and Zhou, C. Grace: Graph self-distillation and completion to mitigate degree-related biases. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pp. 2813–2824, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701030. doi: 10.1145/3580305.3599368. URL https://doi.org/10.1145/3580305.3599368.
- Xu et al. (2021) Xu, H.-R., Bu, Y., Liu, M., Zhang, C., Sun, M., Zhang, Y., Meyer, E., Salas, E., and Ding, Y. Team power dynamics and team impact: New perspectives on scientific collaboration using career age as a proxy for team power. Journal of the Association for Information Science and Technology, 73:1489–1505, 2021.
- Xu et al. (2018) Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, 2018.
- Yamamoto & Frachtenberg (2022) Yamamoto, J. and Frachtenberg, E. Gender differences in collaboration patterns in computer science. Publications, 10(1), 2022. ISSN 2304-6775. doi: 10.3390/publications10010010. URL https://www.mdpi.com/2304-6775/10/1/10.
- Zhang & Chen (2018) Zhang, M. and Chen, Y. Link prediction based on graph neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 5171–5181, Red Hook, NY, USA, 2018. Curran Associates Inc.
- Zhang et al. (2021) Zhang, Y., Han, J. X., Mahajan, I., Bengani, P., and Chaintreau, A. Chasm in hegemony: explaining and reproducing disparities in homophilous networks. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(2):1–38, 2021.
- Zhao et al. (2022) Zhao, B., Gu, Y., Forde, J. Z., and Saphra, N. One venue, two conferences: The separation of chinese and american citation networks, 2022.
Supplementary Text
Appendix A Proofs
A.1 Proof of Lemma 4.1
Proof.
Similarly to Xu et al. (2018); Tang et al. (2020), we compute the first-order partial derivatives of and :
[Equations (30)–(31)]
where is the -th node on path in the computation graph of or ( is node and is node ); is the set of all -length random walk paths from node to ; and is pre-activated or .
With our assumption that the path from node in the computation graph of is independently activated with probability , and similarly, for :
[Equations (32)–(33)]
A.2 Proof of Lemma 4.2
Proof.
For , we can re-express . (For simplicity, we abuse notation here: is not the entry at row and column , but rather the entry at the row corresponding to node and column corresponding to node . Similarly, is the standard basis vector with a 1 at the entry corresponding to node .) By the spectral properties of , (Lovász, 2001). Hence:
[Equations (37)–(38)]
Then, by Cauchy-Schwarz:
[Equations (39)–(43)]
Let . Then, by the triangle inequality:
[Equations (44)–(46)]
For , . Then:
[Equations (47)–(48)]
∎
A.3 Proof of Theorem 4.3
A.4 Lemma A.1 and Proof
Lemma A.1.
We introduce the notation . We further define . Fix . Then, for :
[Equation (57)]
And for :
[Equation (58)]
Proof.
Similar to the proof of Lemma 4.2:
[Equation (59)]
Subsequently:
[Equation (60)]
Finally:
[Equation (61)]
For , . Then:
[Equation (62)]
∎
A.5 Proof of Theorem 4.4
Appendix B Approximation of
B.1 Approximation of for
[Equations (69)–(74)]
B.2 Approximation of for
[Equations (75)–(80)]
Appendix C Datasets Used in §6.1
In our experiments in §6.1, we use 10 real-world network datasets from Bojchevski & Günnemann (2018), Shchur et al. (2018), Rozemberczki & Sarkar (2020), and Rozemberczki et al. (2021), covering diverse domains (e.g., citation networks, collaboration networks, online social networks). We provide a description and some statistics of each dataset in Table 2. All the datasets have node features and are undirected. We were unable to find the exact class names and their label correspondence from the dataset documentation.
- In all the citation network datasets, nodes represent documents, edges represent citation links, and features are a bag-of-words representation of documents. We row-normalize the features to sum to 1, following Fey & Lenssen (2019) (https://github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py). The classification task is to predict the topic of documents.
- In the collaboration network datasets, nodes represent authors, edges represent coauthorships, and features are embeddings of paper keywords for authors’ papers. The classification task is to predict the most active field of study for authors.
- In the LastFMAsia network dataset, nodes represent LastFM users from Asia, edges represent friendships between users, and features are embeddings of the artists liked by users. The classification task is to predict the home country of users.
- In the Twitch network datasets, nodes represent gamers on Twitch, edges represent followerships between them, and features are embeddings of the history of games played by the Twitch users. The classification task is to predict whether or not a gamer streams adult content.
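The row normalization described for the citation networks can be sketched as follows. This is a minimal illustration assuming NumPy and a dense, non-negative feature matrix; the function name `row_normalize` and the treatment of all-zero rows are assumptions made for the example.

```python
import numpy as np

def row_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row of a non-negative feature matrix so that it sums to 1."""
    features = np.asarray(features, dtype=float)
    row_sums = features.sum(axis=1, keepdims=True)
    # Assumption: rows that sum to 0 are left as all zeros rather than divided by 0.
    safe_sums = np.where(row_sums == 0.0, 1.0, row_sums)
    return features / safe_sums

# Toy bag-of-words counts for three documents over a three-word vocabulary.
bag_of_words = np.array([[2.0, 0.0, 2.0],
                         [0.0, 0.0, 0.0],
                         [1.0, 3.0, 0.0]])
normalized = row_normalize(bag_of_words)
```

Each non-empty document then has features that sum to 1, so documents of different lengths are placed on a comparable scale before being passed to the encoder.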
We only run experiments on datasets that can fit, without sampling nodes, on a single NVIDIA GeForce GTX Titan Xp graphics card with 12196 MiB of memory. Furthermore, we only consider the three largest datasets (i.e., with the most nodes) from Rozemberczki et al. (2021). We use PyTorch Geometric to load and process all datasets (Fey & Lenssen, 2019).
Name | Domain | # Nodes | # Edges | # Features | # Classes |
---|---|---|---|---|---|
Cora | citation | 19793 | 126842 | 8710 | 70 |
CiteSeer | citation | 4230 | 10674 | 602 | 6 |
DBLP | citation | 17716 | 105734 | 1639 | 4 |
PubMed | citation | 19717 | 88648 | 500 | 3 |
CS | collaboration | 18333 | 163788 | 6805 | 15 |
Physics | collaboration | 34493 | 495924 | 8415 | 5 |
LastFMAsia | online social | 7624 | 55612 | 128 | 18 |
Twitch-DE | online social | 9498 | 315774 | 128 | 2 |
Twitch-EN | online social | 7126 | 77774 | 128 | 2 |
Twitch-FR | online social | 6551 | 231883 | 128 | 2 |
Appendix D Datasets Used in §6.2
We run experiments on three network datasets: (1) the NBA social network (cf. §D.1), (2) the German credit network (cf. §D.2), and (3) a new DBLP-Fairness citation network that we construct (cf. §D.3). All the datasets have node features and are undirected. We do not pass sensitive attributes as features to the models that we train. For each dataset, we min-max normalize node features to fall in , following Dai & Wang (2021) and Agarwal et al. (2021). Furthermore, for all datasets, .
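As a rough sketch of the min-max normalization step, assuming NumPy, per-feature (column-wise) scaling, and a caller-supplied target interval: the helper name `min_max_normalize`, the handling of constant columns, and the interval used in the toy call are all illustrative assumptions.

```python
import numpy as np

def min_max_normalize(features, low, high):
    """Linearly rescale each feature column so that its values span [low, high]."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0, keepdims=True)
    col_max = features.max(axis=0, keepdims=True)
    span = col_max - col_min
    # Assumption: constant columns (span 0) are mapped to `low`.
    safe_span = np.where(span == 0.0, 1.0, span)
    return low + (features - col_min) / safe_span * (high - low)

# Toy feature matrix: 3 nodes, 3 features (the last one is constant).
x = np.array([[1.0, 10.0, 5.0],
              [3.0, 20.0, 5.0],
              [2.0, 40.0, 5.0]])
x_scaled = min_max_normalize(x, low=0.0, high=1.0)  # interval chosen only for illustration
```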
D.1 NBA Dataset
The NBA network (Dai & Wang, 2021) has 403 nodes representing NBA basketball players who are connected if they follow each other on Twitter. There are 21242 links. Each node has 95 features, with an average degree of . We consider two sensitive attributes per node:
- Age : how old the player is, i.e., Young ( years) or Old ( years).
- Nationality : where the player is from, i.e., United States or Overseas.
D.2 German Dataset
The German network (Agarwal et al., 2021) comprises 1000 nodes representing clients in a German bank who are connected if they have similar credit accounts. The German network is not natively a graph dataset; synthetic edges were created by Agarwal et al. There are 44484 links. Each node has 27 features (e.g., loan amount, account-related features), with an average degree of . We consider two sensitive attributes per node:
- Foreign worker : whether the client is a foreign worker, i.e., Yes or No.
- Gender : the gender of the client, i.e., Man or Woman.
D.3 DBLP-Fairness Dataset
In this subsection, we detail how we construct the DBLP-Fairness dataset. We build DBLP-Fairness, as there are only a few natively-graph network datasets with sensitive attributes that are appropriate for graph learning (Subramonian et al., 2022).
We begin with the version of the DBLP-Citation-network V12 dataset from Tang et al. (2008) that was processed by Xu et al. (2021). This dataset has 3658127 nodes. Each node represents a paper and each edge represents a citation link. We consider five node features:
- Team size: the number of authors on the paper.
- Mean collaborators: the average number of collaborators with whom the authors have previously published.
- Gini collaborators: the Gini coefficient of the number of collaborators with whom the authors have previously published.
- Mean productivity: the average number of papers that the authors have previously published.
- Gini productivity: the Gini coefficient of the number of papers that the authors have previously published.
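To make the Gini-based features concrete, the following sketch computes a Gini coefficient for the authors of one paper. It assumes the standard population definition (mean absolute difference divided by twice the mean); the exact estimator used in the processed dataset of Xu et al. (2021) is not specified here, and the function name and toy counts are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0.0:
        return 0.0  # Assumption: define the coefficient as 0 for empty or all-zero input.
    # Rank-weighted form, equivalent to the mean absolute difference over twice the mean.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Hypothetical team of four authors and their numbers of previously published papers.
prior_papers = [1, 1, 1, 9]
mean_productivity = sum(prior_papers) / len(prior_papers)
gini_productivity = gini(prior_papers)
```

A team in which one author accounts for most prior publications yields a larger coefficient than a team whose authors are equally productive, for which the coefficient is 0.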
We also consider two sensitive attributes per node:
- Field : the field to which the paper belongs, i.e., Programming Languages or Databases.
- Nationality : the country where most authors reside, i.e., United States or China.
In DBLP-Fairness, we only include papers whose nationality is United States or China; American and Chinese citation networks are known to be stratified (Zhao et al., 2022). We also only include papers whose field is Programming Languages or Databases; we infer the field of a paper from its keywords (i.e., whether they contain “programming language” or “database”), and discard papers whose keywords include both “programming language” and “database”. Furthermore, we filter out all papers from before 2010. We sought to make DBLP-Fairness comparable in size to the citation networks in §C. Following filtering, we were left with 14537 nodes and 24844 edges.
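The filtering rules above can be sketched as a simple predicate over paper records. The record fields (`id`, `nationality`, `keywords`, `year`), the case-insensitive substring matching, and the helper names are hypothetical and serve only to illustrate the logic.

```python
FIELD_KEYWORDS = {
    "Programming Languages": "programming language",
    "Databases": "database",
}
ALLOWED_NATIONALITIES = {"United States", "China"}

def infer_field(keywords):
    """Return the single field whose phrase occurs in the keywords, else None."""
    lowered = [k.lower() for k in keywords]
    matches = [field for field, phrase in FIELD_KEYWORDS.items()
               if any(phrase in k for k in lowered)]
    # Papers matching neither field or both fields are discarded.
    return matches[0] if len(matches) == 1 else None

def keep_paper(paper, min_year=2010):
    """Apply the nationality, field, and publication-year filters to one record."""
    return (paper["nationality"] in ALLOWED_NATIONALITIES
            and infer_field(paper["keywords"]) is not None
            and paper["year"] >= min_year)

# Hypothetical records for illustration.
papers = [
    {"id": 1, "nationality": "United States", "keywords": ["Database systems"], "year": 2015},
    {"id": 2, "nationality": "China", "keywords": ["programming language", "database"], "year": 2016},
    {"id": 3, "nationality": "United States", "keywords": ["programming languages"], "year": 2008},
    {"id": 4, "nationality": "Germany", "keywords": ["database"], "year": 2012},
    {"id": 5, "nationality": "China", "keywords": ["Programming language design"], "year": 2010},
]
kept_ids = [p["id"] for p in papers if keep_paper(p)]
```

In this toy example, only papers 1 and 5 survive: paper 2 matches both fields, paper 3 predates 2010, and paper 4 has an excluded nationality.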
Appendix E Models
For all experiments, we use GCN encoders (Kipf & Welling, 2017) to get node representations. Each encoder has two layers (128-dimensional hidden layer, 64-dimensional output layer) with a ReLU nonlinearity in between. We only use two layers, as this is common practice in graph deep learning to prevent oversmoothing (Oono & Suzuki, 2020); however, we run experiments with four layers in §G. We do not use any regularization (e.g., Dropout, BatchNorm). The encoders are explicitly trained for LP with the inner-product LP score function in Eqn. 6, binary cross-entropy loss, and the Adam optimizer with full-batch gradient descent and a learning rate of 0.01 (Kingma & Ba, 2014). We use a random link split of 0.85-0.05-0.1 for train-val-test, following the PyTorch Geometric LP example (https://github.com/pyg-team/pytorch_geometric/blob/master/examples/link_pred.py). We train the encoders for 100 epochs, with a new round of negative link sampling during every epoch; we use a 1:1 ratio of positive to negative links. We ultimately select the model parameters with the highest validation ROC-AUC. Although we do not do any hyperparameter tuning, the test ROC-AUC values (displayed in the figures in §6) indicate that the encoders are well-trained. We use PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019) to train all the encoders on a single NVIDIA GeForce GTX Titan Xp graphics card with 12196 MiB of memory.
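The link split and per-epoch negative sampling can be illustrated with a small, framework-free sketch. It is not the PyTorch Geometric routine itself: the function names, the rounding of split sizes, and the rule that negatives avoid every known edge are assumptions made for this example.

```python
import random

def split_links(edges, val_frac=0.05, test_frac=0.10, seed=0):
    """Randomly partition undirected edges into train, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = round(val_frac * len(shuffled))
    n_test = round(test_frac * len(shuffled))
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # remaining ~85% of links
    return train, val, test

def sample_negative_links(num_nodes, known_edges, num_samples, rng):
    """Sample distinct node pairs that are not linked in `known_edges`."""
    existing = {frozenset(e) for e in known_edges}
    negatives = set()
    while len(negatives) < num_samples:
        i, j = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = frozenset((i, j))
        if i != j and pair not in existing:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]

# Toy ring graph with 100 nodes and 100 edges.
edges = [(i, (i + 1) % 100) for i in range(100)]
train, val, test = split_links(edges)
# One round of negative sampling (repeated every epoch), 1:1 with the positive training links.
negatives = sample_negative_links(100, edges, len(train), random.Random(1))
```

Resampling negatives each epoch exposes the encoder to a different set of non-links over training, while the 1:1 ratio keeps the binary cross-entropy loss balanced between classes.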
Appendix F Remaining Plots
Appendix G Additional Experiments
G.1 Additional Experiments for §6.1 (4-layer Encoders)
We run the experiments from §6.1 for with the same settings, except we use 4-layer (instead of 2-layer) encoders (128-dimensional hidden layers, 64-dimensional output layer). We run these additional experiments because the error bound for the theoretic LP scores for depends on the number of encoder layers . We find that the experimental results continue to support our theoretical analysis, both qualitatively and quantitatively (cf. Table 3, Figure 7); the NRMSE and PCC values are comparable to or better than those from the experiments with the 2-layer encoders (especially for the EN dataset).
Dataset | NRMSE | PCC | Test AUC |
---|---|---|---|
CORA | |||
CITESEER | |||
DBLP | |||
PUBMED | |||
CS | |||
PHYSICS | |||
LASTFMASIA | |||
DE | |||
EN | |||
FR |
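For reference, the two agreement measures reported in this comparison can be sketched as follows. The NRMSE here divides the RMSE by the range of the observed scores, which is only one of several normalizations discussed by Otto (2019) and is an assumption of this sketch; the function names and toy values are likewise illustrative.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient (PCC) between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def nrmse(predicted, observed):
    """Root-mean-square error divided by the range of the observed values."""
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    return math.sqrt(mse) / (max(observed) - min(observed))

# Toy theoretic LP scores and the corresponding scores produced by a trained model.
theoretic_scores = [1.0, 2.0, 3.0, 4.0]
model_scores = [1.5, 2.5, 3.5, 4.5]
pcc = pearson_correlation(theoretic_scores, model_scores)
error = nrmse(theoretic_scores, model_scores)
```

The toy example shows why both measures are reported: a constant offset leaves the PCC at 1 while the NRMSE is still nonzero.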
G.2 Additional Experiments for §6.1 (Hadamard Product and MLP LP Score Function)
We also run the experiments from §6.1 for with the same settings, except we use the following LP score function:
[Equation (81)]
where is the Hadamard product and is a 2-layer MLP with a 64-dimensional hidden layer and ReLU nonlinearity. We run these additional experiments because a Hadamard product and MLP score function is often used in the literature. We find that our theoretical analysis is still relevant to and reasonably supports the experimental results, both qualitatively and quantitatively (cf. Table 4, Figure 8). This could be because MLPs have an inductive bias towards learning simpler, often linear functions (Nakkiran et al., 2019; Valle-Pérez et al., 2019), and our theoretical findings are generalizable to linear LP score functions. Notably, in this setting, makes a higher number of negative link predictions. For a few datasets (e.g., Cora, CiteSeer, LastFMAsia), a handful of theoretic LP scores are negative because the regression (incorrectly) predicts for 1-2 groups to be negative.
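A Hadamard-product-and-MLP score function of this kind can be sketched in NumPy as follows. The random initialization, parameter names, and the choice to return an unnormalized score (without a sigmoid) are assumptions of the sketch; only the 64-dimensional input and hidden sizes and the ReLU nonlinearity come from the description above.

```python
import numpy as np

def init_mlp(in_dim=64, hidden_dim=64, seed=0):
    """Randomly initialize a 2-layer MLP (illustrative, untrained parameters)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=hidden_dim ** -0.5, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def hadamard_mlp_score(h_i, h_j, params):
    """Score candidate links as MLP(h_i * h_j), where * is the element-wise product."""
    pair_features = h_i * h_j  # Hadamard product of the two node representations
    hidden = np.maximum(pair_features @ params["w1"] + params["b1"], 0.0)  # ReLU
    return (hidden @ params["w2"] + params["b2"]).squeeze(-1)

params = init_mlp()
# Stand-in for 64-dimensional encoder outputs of five nodes.
h = np.random.default_rng(1).normal(size=(5, 64))
scores = hadamard_mlp_score(h[[0, 1]], h[[2, 3]], params)  # links (0, 2) and (1, 3)
```

Because the Hadamard product is commutative, the score is symmetric in the two nodes, which suits undirected link prediction.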
Dataset | NRMSE | PCC | Test AUC |
---|---|---|---|
CORA | |||
CITESEER | |||
DBLP | |||
PUBMED | |||
CS | |||
PHYSICS | |||
LASTFMASIA | |||
DE | |||
EN | |||
FR |
G.3 Additional Experiments for §6.2
G.4 Additional Experiments for §6.3
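Dataset | | | Test AUC |
---|---|---|---|
NBA | 4.0 | | |
NBA | 2.0 | | |
NBA | 1.0 | | |
NBA | 0.0 | | |
DBLPFAIRNESS | 4.0 | | |
DBLPFAIRNESS | 2.0 | | |
DBLPFAIRNESS | 1.0 | | |
DBLPFAIRNESS | 0.0 | | |
GERMAN | 4.0 | | |
GERMAN | 2.0 | | |
GERMAN | 1.0 | | |
GERMAN | 0.0 | | |

Dataset | | | Test AUC |
---|---|---|---|
NBA | 4.0 | | |
NBA | 2.0 | | |
NBA | 1.0 | | |
NBA | 0.0 | | |
DBLPFAIRNESS | 4.0 | | |
DBLPFAIRNESS | 2.0 | | |
DBLPFAIRNESS | 1.0 | | |
DBLPFAIRNESS | 0.0 | | |
GERMAN | 4.0 | | |
GERMAN | 2.0 | | |
GERMAN | 1.0 | | |
GERMAN | 0.0 | | |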
Appendix H Theory Pitfalls
To understand the second pitfall from §6.1, we separately investigate the association between the within-group degree product and the absolute deviation of the theoretic LP scores from the scores, as well as the association between the (transformed) feature similarity and the absolute deviation (cf. Figure 10). We observe that the absolute deviation is highest for the node pairs with a relatively small degree product (i.e., nodes with a low PA score) and low feature similarity.
Appendix I Error Analysis of Theoretic Scores
Figure 11 reveals that the max term is quite large in practice, which causes the theoretic LP scores to generally be poor estimates for the scores. We additionally find in Figure 11 that the relative error (as measured by NRMSE and PCC) of the theoretic LP scores for is not lower for lower values of the max term .
Furthermore, Figure 12 reveals that LP scores are not higher for incident nodes with larger degrees.
There are intimate connections between Theorem 4.4 and the steady-state probabilities of random walks. The stationary probabilities of random walks are the same regardless of the starting node. This is why produces similar representations for all the nodes in each social group, regardless of the degree of the node; in fact, with a larger number of layers, would oversmooth all the representations to the same vector (Keriven, 2022). Hence, LP scores do not have a degree dependence, theoretically or empirically.
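The claim that stationary probabilities do not depend on the starting node can be checked numerically on a toy graph. The sketch below assumes the random-walk transition matrix P = D^{-1}A on a small connected, non-bipartite graph; the graph and the walk lengths are arbitrary choices for illustration.

```python
import numpy as np

# Toy undirected graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
degrees = adjacency.sum(axis=1)
transition = adjacency / degrees[:, None]  # P = D^{-1} A (each row sums to 1)
stationary = degrees / degrees.sum()       # stationary probability of node j: d_j / (2|E|)

# Row i of P^L is the distribution of an L-step walk started at node i.
walk_2 = np.linalg.matrix_power(transition, 2)
walk_50 = np.linalg.matrix_power(transition, 50)
```

After 50 steps every row of the matrix power is numerically equal to the same stationary vector, whereas after 2 steps the rows still depend strongly on the starting node, which mirrors the contrast between converged and short random walks.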
Appendix J Preferential Attachment and Motivation
Preferential Attachment
Preferential attachment (PA) describes the propensity of links to form with high-degree nodes (see https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.link_prediction.preferential_attachment.html). Network scientists have studied for decades how links in real-world networks exhibit PA. For example, in the iterative Barabási-Albert model of network formation, each new node forms links with existing nodes with probability proportional to the degree of , i.e., . In the context of our paper, PA describes how a GCN with an inner-product LP score function often predicts links between nodes with score approximately (Theorem 4.3).
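As a small, framework-free illustration of these quantities, the sketch below computes the classical PA score (the product of the two node degrees), the geometric mean of the degrees, and Barabási-Albert attachment probabilities on a toy edge list. The helper names and the toy graph are hypothetical.

```python
import math
from collections import defaultdict

def degrees_from_edges(edges):
    """Count the degree of every node in an undirected edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

def preferential_attachment_score(degree, u, v):
    """Classical PA score of a node pair: the product of the two degrees."""
    return degree.get(u, 0) * degree.get(v, 0)

def geometric_mean_degree(degree, u, v):
    """Geometric mean of the two degrees, i.e., the square root of the PA score."""
    return math.sqrt(preferential_attachment_score(degree, u, v))

def attachment_probabilities(degree):
    """Barabási-Albert rule: a new node links to node i with probability d_i / sum_j d_j."""
    total = sum(degree.values())
    return {node: d / total for node, d in degree.items()}

edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
degree = degrees_from_edges(edges)
pa_high = preferential_attachment_score(degree, 0, 1)  # two well-connected nodes
pa_low = preferential_attachment_score(degree, 2, 3)   # includes the degree-1 node
probabilities = attachment_probabilities(degree)
```

In this toy graph, the pair of well-connected nodes receives a PA score three times that of the pair involving the degree-1 node, and the highest-degree node is the most likely target for a newly arriving node.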
Motivation
A wealth of literature in network science and the social sciences has examined the PA properties of real-world networks and how these properties contribute to unfair (non-neural) algorithms (§2). For example, Stoica et al. (2018) find that Instagram accounts run by men have a significantly higher following than those run by women due to gender discrimination; this degree disparity is only amplified by link recommendation algorithms that suggest following high-degree accounts, which makes the rich get richer and reveals that these algorithms have a PA bias. Moreover, many papers outside graph learning have discussed the intersectional unfairness of machine learning (§2).
However, despite the increasing real-world deployment of GNNs for LP, their unfairness has not been studied from the perspectives of PA and intersections of social groups. Our paper fills this gap by providing thorough theoretical and empirical evidence that GCNs (Kipf & Welling, 2017) have a PA bias when predicting links between nodes in the same social group. This finding is nontrivial as GCNs leverage a combination of features and local structural context to make link predictions.
Our research question is challenging from a technical perspective, as it requires uncovering properties of short random walks on graphs (since most GNNs are shallow); in contrast, most random walk results in the literature concern random walks at convergence. Our research question is further important because GNNs with a PA bias can amplify degree disparities, which translates to increased discrimination and disparities in social influence among nodes.
As we uncover this new form of unfairness, there are no existing solutions to this unfairness in the literature. We propose a training-time regularization-based fairness method that alleviates this unfairness without greatly sacrificing the test AUC of LP. While capping the number of positive link predictions per node is a possible solution, doing so with utility in mind requires identifying a utility-maximizing subset of link predictions. As our theoretical and empirical results reveal, GCN LP scores are often inherently proportional to the geometric mean of the degrees of the incident nodes, which can make them a poor indicator of prediction confidence; from a calibration perspective, GCNs naturally make overconfident predictions for links between high-degree nodes.
While we describe methods for alleviating degree bias in §2, these methods address degraded performance for low-degree nodes, not PA bias. We do not study performance issues but rather how GCNs scale representations of nodes proportionally to (approximately) the square root of their within-group degree, which affects the magnitude of their LP scores (cf. §K).
In summary, we augment the field’s understanding of degree bias beyond performance disparities across nodes. We further lay a foundation to study PA bias and within-group unfairness in GNN LP more broadly (e.g., SOTA contrastive methods for LP), which is a critical and interesting direction of research.
Appendix K Comparison to Prior Research on Degree Bias
Studies concerning degree bias have observed that low-degree nodes experience degraded performance compared to high-degree nodes. They have thus often formulated degree bias from a performance perspective, focusing on equal opportunity. In particular, these studies seek to satisfy $P(\hat{y}_v = y \mid y_v = y, \deg(v) = d) = P(\hat{y}_v = y \mid y_v = y, \deg(v) = d')$ for all possible degrees $d, d'$, where $\hat{y}_v$ is the prediction for node $v$ and $y_v$ is its ground-truth label. This fairness criterion treats the degree of a node as a sensitive attribute, requiring that a GNN’s accuracy is consistent across nodes with different degrees.
However, in this paper, we seek to ensure that degree disparities in networks are not amplified by GNN LP. We cannot adopt the equal opportunity formulation of degree bias because it is concerned with performance while we are concerned with degree disparity amplification. For example, even if we consistently predict links with the same accuracy across nodes with different degrees, high-degree nodes can still receive higher LP scores than low-degree nodes. In this way, the “degree bias” discussed by other studies is not compatible with our unfairness metric (Eqn. 17). We also cannot simply adopt common LP fairness metrics like dyadic fairness, as they do not capture the new type of unfairness that we uncover.
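The distinction between equal accuracy and equal scores can be made concrete with a small numerical sketch. The labels, scores, degree groups, and the 0.5 decision threshold below are all hypothetical values chosen for illustration; they are not drawn from our experiments.

```python
import numpy as np

# Hypothetical toy data: 4 candidate links incident to high-degree nodes,
# followed by 4 candidate links incident to low-degree nodes.
labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
scores = np.array([0.95, 0.90, 0.45, 0.40, 0.60, 0.55, 0.10, 0.05])
high_degree = np.array([True, True, True, True, False, False, False, False])

# Assumed decision rule: predict a link when the score is at least 0.5.
preds = (scores >= 0.5).astype(int)


def group_accuracy(mask):
    """Fraction of correct predictions among the links selected by mask."""
    return float((preds[mask] == labels[mask]).mean())


def group_mean_score(mask):
    """Average LP score among the links selected by mask."""
    return float(scores[mask].mean())


acc_high = group_accuracy(high_degree)
acc_low = group_accuracy(~high_degree)
score_gap = group_mean_score(high_degree) - group_mean_score(~high_degree)

print(f"accuracy (high-degree): {acc_high:.2f}")
print(f"accuracy (low-degree):  {acc_low:.2f}")
print(f"mean score gap:         {score_gap:.3f}")
```

In this toy setting, both groups are classified equally accurately, so a performance-based criterion reports no disparity, yet links incident to high-degree nodes still receive systematically higher scores.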
Roughly, we care that $\mathbb{E}[r_{ij} \mid \deg(i) = d] = \mathbb{E}[r_{ij} \mid \deg(i) = d']$ for all degrees $d, d'$, where $r_{ij}$ is the GNN score for a link prediction between nodes $i, j$. In other words, we do not want GNN LP scores to be higher for high-degree nodes vs. low-degree nodes. This is what motivates our fairness metric (Eqn. 17).
Our theoretical analysis (Theorem 4.3) and empirical validation (§6.1) reveal that GCNs often predict links between nodes $i, j$ with a score approximately proportional to $\sqrt{\deg(i)\deg(j)}$, the geometric mean of their degrees, because of their symmetric normalized filter. This finding of a preferential attachment bias allows us to express our unfairness metric in terms of degree disparity (Eqn. 22), but this degree disparity is not related to the “degree bias” that has been discussed by other papers; this is a new fairness paradigm.
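The role of the symmetric normalized filter can be illustrated with a minimal numerical sketch. It is not the setting of Theorem 4.3: it assumes purely linear propagation (no weights or nonlinearities), constant input features, self-loop-augmented degrees, and many propagation steps, whereas our analysis concerns shallow GCNs. The toy graph and all names (`sym_norm_filter`, `edges`) are hypothetical. The sketch checks only that the square-root degree vector is a fixed point of the filter and that repeated propagation yields inner-product scores proportional to the geometric mean of the degrees.

```python
import numpy as np


def sym_norm_filter(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} and the self-loop-augmented degrees."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    return inv_sqrt[:, None] * a_hat * inv_sqrt[None, :], deg


# Hypothetical toy graph: a hub (node 0) with a short path hanging off node 4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6)]
n = 7
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

filt, deg = sym_norm_filter(adj)
sqrt_deg = np.sqrt(deg)

# sqrt(deg) is an eigenvector of the filter with eigenvalue 1.
fixed_point_gap = np.abs(filt @ sqrt_deg - sqrt_deg).max()

# Linear propagation of constant features (limiting case, many steps).
h = np.ones(n)
for _ in range(500):
    h = filt @ h

# Representations scale with sqrt(degree): this ratio is constant across nodes.
ratio = h / sqrt_deg

# Inner-product LP scores are then proportional to sqrt(deg_i * deg_j).
scores = np.outer(h, h)
score_ratio = scores / np.sqrt(np.outer(deg, deg))

print("max |filter @ sqrt(deg) - sqrt(deg)|:", fixed_point_gap)
print("h / sqrt(deg):", np.round(ratio, 6))
```

The limiting case is used here only because it can be verified exactly; how closely a trained, shallow GCN follows this scaling is an empirical question addressed in §6.1.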
Appendix L Justification of Assumptions in Lemma 4.1
The independence of path activation probabilities may not always hold in practice. However, we verify that this assumption is plausible via our extensive experiments on real-world datasets that validate our theoretical analysis (§6.1). This assumption also aligns with findings that deep neural networks have an inductive bias towards learning simpler, often linear, functions (Nakkiran et al., 2019; Valle-Pérez et al., 2019). Furthermore, a variant of our assumption (where the path activation probability is constant for all nodes) has been used in the literature to simplify theoretical analysis (e.g., Xu et al. (2018); Tang et al. (2020)); our assumption may be more realistic than this variant, as it captures that the probability of paths activating can differ across nodes (e.g., due to differences in features or neighborhood structure).