License: arXiv.org perpetual non-exclusive license
arXiv:2210.04442v3 [cs.LG] 15 Mar 2024

DPAR: Decoupled Graph Neural Networks with Node-Level Differential Privacy

Qiuchen Zhang (0002-7054-1983), Emory University, 201 Dowman Dr, Atlanta, GA, USA 30322; Hong kyu Lee, Emory University, 201 Dowman Dr, Atlanta, GA, USA; **g Ma, Emory University, Atlanta, GA, USA; Jian Lou, Zhejiang University, Hangzhou, China; Carl Yang, Emory University, Atlanta, GA, USA; and Li Xiong, Emory University, Atlanta, GA, USA
(2024)
Abstract.

Graph Neural Networks (GNNs) have achieved great success in learning with graph-structured data. Privacy concerns have also been raised for the trained models which could expose the sensitive information of graphs including both node features and the structure information. In this paper, we aim to achieve node-level differential privacy (DP) for training GNNs so that a node and its edges are protected. Node DP is inherently difficult for GNNs because all direct and multi-hop neighbors participate in the calculation of gradients for each node via layer-wise message passing and there is no bound on how many direct and multi-hop neighbors a node can have, so existing DP methods will result in high privacy cost or poor utility due to high node sensitivity. We propose a Decoupled GNN with Differentially Private Approximate Personalized PageRank (DPAR) for training GNNs with an enhanced privacy-utility tradeoff. The key idea is to decouple the feature projection and message passing via a DP PageRank algorithm which learns the structure information and uses the top-$K$ neighbors determined by the PageRank for feature aggregation. By capturing the most important neighbors for each node and avoiding the layer-wise message passing, it bounds the node sensitivity and achieves improved privacy-utility tradeoff compared to layer-wise perturbation based methods. We theoretically analyze the node DP guarantee for the two processes combined together and empirically demonstrate better utilities of DPAR with the same level of node DP compared with state-of-the-art methods.

Differential Privacy; Graph Neural Networks; PageRank
journalyear: 2024
copyright: acmlicensed
conference: Proceedings of the ACM Web Conference 2024; May 13–17, 2024; Singapore, Singapore
booktitle: Proceedings of the ACM Web Conference 2024 (WWW ’24), May 13–17, 2024, Singapore, Singapore
doi: 10.1145/3589334.3645531
isbn: 979-8-4007-0171-9/24/05
ccs: Security and privacy Privacy protections
ccs: Computing methodologies Neural networks

1. Introduction

Graph Neural Networks (GNNs) have shown superior performance in mining graph-structured data and learning graph representations for downstream tasks like node classification, link prediction, and graph classification (Wu et al., 2020; Hamilton et al., 2017; Bojchevski et al., 2020; Liu et al., 2020). Like neural network models trained on private datasets that could expose sensitive training data, GNN models trained on graph data embedded with node features and topology are also vulnerable to various privacy attacks (Wu et al., 2021; Zhang et al., 2021a, 2022).

Differential privacy (DP) has become the standard for neural network training with rigorous protection for training data (Dwork et al., 2014; Abadi et al., 2016). A key method is DP stochastic gradient descent (DP-SGD) (Abadi et al., 2016; Zhang et al., 2020), which introduces calibrated noise into gradients during SGD training. DP ensures a bounded risk for an adversary to deduce from a model whether a record was used in its training. For graph data, where both node features (e.g., personal attributes) and edges (e.g., social relationships) can be sensitive, our objective is to achieve node-level DP, limiting the risk of inferring whether a node and its edges were included in the training.

Challenges. Achieving node DP for GNNs is inherently challenging. Unlike grid-based data such as images, graph data contains both feature vectors for each node and the edges that connect the nodes. During the training of GNN models, all direct and multi-hop neighbors participate in the calculation of gradients for each node via recursive layer-wise message passing (Hamilton et al., 2017; Wu et al., 2020). At each layer, each node aggregates the features (or the latent representations) from its neighbors when generating its own representation. There is no bound on how many direct and multi-hop neighbors a node can have. This means the sensitivity of the gradient due to the presence or absence of a node can be extremely high due to the node itself and its neighbors (or correlations between the nodes), which makes standard DP-SGD based methods (Abadi et al., 2016; Zhang et al., 2021b) infeasible, resulting in either high privacy cost or poor utility due to the large required DP noise.

A few recent works have tackled node DP for training GNNs, mainly by bounding the correlations during training to help bound the sensitivity or privacy cost. Daigavane et al. (Daigavane et al., 2021) sample subgraphs so that each node has a bounded number of neighbors within each subgraph, and limit the occurrences of each node in other subgraphs so that the privacy-by-amplification technique (Kasiviswanathan et al., 2011; Bassily et al., 2014) can be applied to GNNs. Their method is limited to GNNs with only one or two layers. The GAP algorithm (Sajadmanesh et al., 2023) assumes a maximum degree for each node in order to bound the sensitivity of individual nodes. Because its message-passing scheme requires DP noise at each step, it further bounds the sensitivity by bounding the number of hops. This affects model utility, as it may prevent each node from acquiring useful information from higher-hop neighbors. In sum, these approaches make it feasible to train GNNs with node DP but still sacrifice model accuracy due to the restrictions on the number of hops during training.

Contributions. We propose a Decoupled GNN with Differentially Private Approximate Personalized PageRank (DPAR, pronounced “dapper”) for training GNNs with node DP and enhanced privacy-utility tradeoff. The key idea is to decouple the feature aggregation and message passing into two processes: 1) use a DP Approximate Personalized PageRank (APPR) algorithm to learn the structure information, and 2) use the top-$K$ neighbors determined by the APPR for feature aggregation and model learning with DP. In other words, the APPR learns the influence score of all direct and multi-hop neighbors, and the layer-wise message-passing is replaced by neighborhood aggregation based on the APPR.

Our framework is based on the decoupled GNN training frameworks (Klicpera et al., 2019; Bojchevski et al., 2020), which were originally designed to scale up training for large graphs. Our main insight is that this decoupled strategy can be exploited to improve the design of DP algorithms. By capturing the most important neighbors for each node (bounding the node sensitivity) and avoiding the expensive privacy cost accumulation from the layer-wise message passing, our framework achieves an enhanced privacy-utility tradeoff compared to layer-wise perturbation based methods.

Adding DP to this decoupled framework is nontrivial and presents several challenges. First, there are no existing works for computing sparsified APPR with formal node DP. While there exist DP top-$K$ selection algorithms (Durfee and Rogers, 2019), directly applying them can result in poor accuracy due to high sensitivity, since each node (and its edges) can affect all the elements in the APPR matrix. Second, while DP-SGD can be used for feature aggregation, the neighborhood sampling returns a correlated batch of nodes based on the APPR, making the privacy analysis more complex, particularly for quantifying the privacy amplification ratio. To address these challenges, we develop DP-APPR algorithms to compute the top-$K$ sparsified APPR with DP. We then utilize DP-SGD (Abadi et al., 2016) for feature aggregation and model training to protect node features. We analyze the privacy loss caused by the neighborhood sampling and calibrate tighter Gaussian noise for the clipped gradients to provide a rigorous overall privacy guarantee. We summarize our contributions as follows.

  • We propose DPAR, a novel decoupled DP framework with sparsification for training GNNs with rigorous node DP. DPAR decouples message passing from feature aggregation via DP APPR and uses the top-$K$ neighbors determined by APPR for feature aggregation. This captures the most important neighbors for each node, avoids the layer-wise message passing, and achieves a better privacy-utility tradeoff than existing layer-wise perturbation based methods.

  • We develop two DP APPR algorithms based on the exponential mechanism and Gaussian mechanism for selecting top-$K$ elements in the APPR vector with formal node DP. We employ sampling and clipping to address the high sensitivity challenge. We utilize the exponential mechanism (Dwork et al., 2014; Durfee and Rogers, 2019) to select the indices of the top-$K$ elements first, and then compute the corresponding noisy values with additional privacy costs. Alternatively, the Gaussian mechanism directly adds noise to the APPR vector and then selects the top-$K$ from the noisy vectors. We formally analyze the privacy guarantee for both methods.

  • We use DP-SGD for feature aggregation and model learning based on the DP APPR. By using the top-$K$ sparsified DP APPR vectors, we limit the maximum number of nodes one node can affect during gradient computation, which is the maximum column-wise $\ell_0$ norm of the DP APPR matrix. We incorporate additional clipping to ensure a maximum $\ell_1$ norm per column, which determines the sensitivity of each node. We calibrate the Gaussian noise by theoretically analyzing the privacy loss and privacy amplification caused by the neighborhood sampling determined by the DP APPR and provide a rigorous privacy guarantee for DPAR.

  • We conduct extensive experiments on five real-world graph datasets to evaluate the effectiveness of the proposed algorithms. Results show that they achieve better accuracy at the same level of node DP compared to the state-of-the-art algorithms. We also illustrate the privacy protection of the trained models.

2. Background

2.1. GNNs with Personalized PageRank

We consider a graph $G = (\mathrm{V}, \mathrm{E}, \mathrm{X})$, where $\mathrm{V}$ and $\mathrm{E}$ denote the set of vertices and edges, respectively, and $\mathrm{X} \in \mathbb{R}^{|\mathrm{V}| \times d}$ represents the feature matrix, in which each row corresponds to the associated feature vector $X_v \in \mathbb{R}^{d}$ ($v = 1, \dots, |\mathrm{V}|$) of node $v$. Each node is associated with a class (or label) vector $Y_v \in \mathbb{R}^{c}$, such as the one-hot encoding vector, where $c$ is the number of classes. Considering the node classification task as an instance, a GNN model learns a representation function $f$ that generates the node embedding $\mathrm{h}_v$ for each node $v \in \mathrm{V}$ based on the features of the node itself as well as all its neighbors (Wu et al., 2020), and the generated node embeddings are further used to label the class of unlabeled nodes using the softmax classifier with the cross-entropy loss.

GNN models use the recursive message-passing procedure to spread information through a graph, which couples the neighborhood aggregation and feature transformation for node representation learning. This coupling pattern can cause some potential issues in model training, including neighbor explosion and over-smoothing (Bojchevski et al., 2020; Liu et al., 2020). Recent works propose to decouple the neighborhood aggregation process from feature transformation and achieve superior performance (Bojchevski et al., 2020; Dong et al., 2021). Bojchevski et al. (Bojchevski et al., 2020) show that neighborhood aggregation/propagation based on personalized PageRank (Gleich, 2015) can maintain the influence score of all “neighboring” (relevant) nodes that are reachable to the source node in the graph, without the explicit message-passing procedure. They pre-compute a PageRank matrix $\Pi$ and truncate it by keeping only the top $k$ largest entries of each row and setting others to zero to get a sparse matrix $\Pi^{ppr}$, which is then used to aggregate node representations, generated using a neural network, of “neighbors” (most relevant nodes) to get final predictions, expressed as follows:

$$z_v = \operatorname{softmax}\left(\sum_{u \in \mathcal{N}^{k}(v)} \bm{\pi}'(v)_u \, \bm{H}_{u,:}\right), \tag{1}$$

where $\mathcal{N}^{k}(v)$ enumerates indices of the $k$ non-zero entries in $\bm{\pi}'(v)$, which is the $v$-th row of $\Pi^{ppr}$ corresponding to the node $v$’s sparse APPR vector. $\bm{H}_{u,:}$ is the node representation generated by a neural network $f_{\theta}$ using the node feature vector $X_u$ of each node $u$ independently.
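The aggregation in Equation 1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `top_k_sparsify` and `predict`, the toy matrix `H` (standing in for the output of $f_{\theta}$), and the PPR row `pi_v` are all hypothetical and not taken from the paper.

```python
import math

def top_k_sparsify(pi_row, k):
    """Keep the k largest entries of a PPR row; all other entries are dropped (zero)."""
    keep = sorted(range(len(pi_row)), key=lambda j: pi_row[j], reverse=True)[:k]
    return {j: pi_row[j] for j in sorted(keep)}  # sparse row: index -> weight

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(pi_row, H, k):
    """Equation 1: softmax of the PPR-weighted sum of the top-k rows of H."""
    sparse_row = top_k_sparsify(pi_row, k)  # plays the role of N^k(v) and pi'(v)
    num_classes = len(H[0])
    logits = [sum(w * H[u][c] for u, w in sparse_row.items())
              for c in range(num_classes)]
    return softmax(logits)

# Toy example (assumed values): 4 nodes, 2 classes.
H = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 3.0]]
pi_v = [0.6, 0.1, 0.25, 0.05]   # hypothetical PPR row of node v
z_v = predict(pi_v, H, k=2)      # only nodes 0 and 2 contribute
```

Note that each row of `H` is computed from a single node's features, so the only place where graph structure enters the prediction is the sparse PPR row.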

2.2. Differential Privacy (DP)

DP (Dwork et al., 2014; Ma et al., 2019) has demonstrated itself as a strong and rigorous privacy framework for aggregate data analysis in many applications. DP ensures the output distributions of an algorithm are indistinguishable with a certain probability when the input datasets differ in only one record.

Definition (($\epsilon$, $\delta$)-Differential Privacy) (Dwork et al., 2014). Let $\mathcal{D}$ and $\mathcal{D}'$ be two neighboring datasets that differ in at most one entry. A randomized algorithm $\mathcal{A}$ satisfies ($\epsilon$, $\delta$)-differential privacy if for all $\mathcal{S} \subseteq \operatorname{Range}(\mathcal{A})$:

$$\Pr\left[\mathcal{A}(\mathcal{D}) \in \mathcal{S}\right] \leq e^{\epsilon} \Pr\left[\mathcal{A}(\mathcal{D}') \in \mathcal{S}\right] + \delta,$$

where $\mathcal{A}(\mathcal{D})$ represents the output of $\mathcal{A}$ with the input $\mathcal{D}$, $\epsilon$ and $\delta$ are the privacy parameters (or privacy budget), and lower $\epsilon$ and $\delta$ indicate stronger privacy and lower privacy loss.

In this paper, we aim to achieve node-level DP for graph data to protect both the features and edges of a node.

Definition (($\epsilon$, $\delta$)-Node-level Differential Privacy). Let $\mathcal{G}$ and $\mathcal{G}'$ be two neighboring graphs that differ in at most one node including its feature vector and all its connected edges. A randomized algorithm $\mathcal{A}$ satisfies ($\epsilon$, $\delta$)-node-level DP if for all $\mathcal{S} \subseteq \operatorname{Range}(\mathcal{A})$:

$$\Pr\left[\mathcal{A}(\mathcal{G}) \in \mathcal{S}\right] \leq e^{\epsilon} \Pr\left[\mathcal{A}(\mathcal{G}') \in \mathcal{S}\right] + \delta,$$

where $\mathcal{A}(\mathcal{G})$ represents the output of $\mathcal{A}$ with the input graph $\mathcal{G}$.

2.3. DP-SGD and Challenges

A widely used technique for achieving DP for deep learning models is the DP stochastic gradient descent (DP-SGD) algorithm (Abadi et al., 2016; Lee and Kifer, 2018). It first computes the gradient $\mathbf{g}(x_i)$ for each example $x_i$ in the randomly sampled batch with size $B$, and then clips the $\ell_2$ norm of each gradient with a clipping threshold $C$ to bound the sensitivity of $\mathbf{g}(x_i)$ to $C$. The clipped gradients $\overline{\mathbf{g}}(x_i)$ of all examples are summed together and perturbed with the Gaussian noise $\mathcal{N}(0, \sigma^{2} C^{2} \mathbf{I})$ to protect privacy. Finally, the average of the noisy accumulated gradient $\tilde{\mathbf{g}}$ is used to update the model parameters for this step. We express $\tilde{\mathbf{g}}$ as:

$$\tilde{\mathbf{g}} \leftarrow \frac{1}{B}\left(\sum_{i=1}^{B} \overline{\mathbf{g}}(x_i) + \mathcal{N}\left(0, \sigma^{2} C^{2} \mathbf{I}\right)\right). \tag{2}$$
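A minimal sketch of one such noisy gradient computation, assuming the per-example gradients are already available as plain lists of floats. The function names `clip_l2` and `dp_sgd_gradient` are illustrative only; a real implementation would rely on a deep learning framework and a privacy accountant, which are omitted here.

```python
import math
import random

def clip_l2(grad, C):
    """Scale grad so that its l2 norm is at most C (left unchanged if already within C)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / C)
    return [g / scale for g in grad]

def dp_sgd_gradient(per_example_grads, C, sigma, rng):
    """Equation 2: clip each gradient, sum, add N(0, sigma^2 C^2 I), divide by B."""
    B = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_l2(g, C) for g in per_example_grads]
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    # One independent Gaussian draw per coordinate, standard deviation sigma * C.
    noisy = [s + rng.gauss(0.0, sigma * C) for s in summed]
    return [x / B for x in noisy]

# Toy batch of B = 3 two-dimensional gradients (assumed values).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
g_tilde = dp_sgd_gradient(grads, C=1.0, sigma=1.0, rng=rng)
```

The noise scale depends only on $\sigma$ and $C$ because, in this setting, changing one example changes exactly one clipped summand.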

In DP-SGD, each example individually calculates its gradient, e.g., only the features of $x_i$ will be used to compute the gradient $\mathbf{g}(x_i)$ for $x_i$. However, when training GNNs, nodes are no longer independent, and one node’s feature will affect the gradients of other nodes. In a GNN model with $K$ layers, one node has the chance to utilize additional features from all its neighbors up to $K$-hop when calculating its gradient. Rethinking Equation 2, the bound of the sensitivity of $\sum_{i=1}^{B} \overline{\mathbf{g}}(x_i)$ becomes $B \cdot C$, since changing one node could potentially change the gradients of all nodes in the batch. Substituting $B \cdot C$ for $C$ in Equation 2, we get the following equation:

$$\tilde{\mathbf{g}}' \leftarrow \frac{1}{B}\left(\sum_{i=1}^{B} \overline{\mathbf{g}}(x_i) + \mathcal{N}\left(0, \sigma^{2} B^{2} C^{2} \mathbf{I}\right)\right). \tag{3}$$

Comparing Equation 3 to 2, to achieve the same level of privacy at each step during DP-SGD, the standard deviation of the Gaussian noise added to the gradients is scaled up by a factor of the batch size $B$, resulting in poor utility. Existing works (Sajadmanesh et al., 2023; Daigavane et al., 2021) mitigate the high sensitivity by bounding the number of hops and node degrees but also sacrifice the information that can be learned from higher hop neighbors, resulting in limited success in improving accuracy.
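To make the comparison concrete, the per-coordinate standard deviation of the noise in the averaged gradient can be read off directly from Equations 2 and 3. The numbers used afterwards ($B = 256$, $C = 1$, $\sigma = 1$) are illustrative assumptions, not settings taken from the experiments.

```latex
\[
\underbrace{\frac{1}{B}\,\sigma C}_{\text{Equation 2}}
\qquad \text{versus} \qquad
\underbrace{\frac{1}{B}\,\sigma B C \;=\; \sigma C}_{\text{Equation 3}},
\qquad
\frac{\sigma C}{\sigma C / B} \;=\; B .
\]
```

With the assumed values, each coordinate of the averaged gradient carries noise of standard deviation $1/256 \approx 0.0039$ under Equation 2 but $1$ under Equation 3, while the clipped signal per coordinate is at most $C = 1$ in both cases.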

3. DPAR

We present our DPAR framework for training DP GNN models via DP approximate personalized PageRank (APPR). The key idea is to exploit the decoupled framework (Section 2.1) and decouple message passing from feature aggregation into two steps: 1) use a DP APPR algorithm to learn the structure information (Section 3.1), and 2) use the top-$K$ neighbors determined by the APPR for feature aggregation and model learning with DP-SGD (Section 3.2). By capturing the most important neighbors for each node from the APPR and avoiding explicit message passing, it bounds the node sensitivity without sacrificing model accuracy, achieving an improved privacy-utility tradeoff. The overall privacy budget will be split between the two steps, and we theoretically analyze the node DP guarantee for the entire framework in Section 3.2.

3.1. Differentially Private APPR

We develop our DP APPR algorithms based on the ISTA algorithm (Fountoulakis et al., 2019) for computing APPR. Andersen et al. (Andersen et al., 2006) proposed the first approximate personalized PageRank (APPR) algorithm which is adopted in (Klicpera et al., 2019; Bojchevski et al., 2020) to replace the explicit message-passing procedure for GNNs. Most recently, Fountoulakis et al. (Fountoulakis et al., 2019) demonstrated that the APPR algorithm can be characterized as an $\ell_1$-regularized optimization problem, and proposed an iterative shrinkage-thresholding algorithm (ISTA) (Algorithm 3 in (Fountoulakis et al., 2019)) to solve it with a running time independent of the size of the graph. The input of ISTA contains the adjacency matrix of a graph and the one-hot vector corresponding to the index of one node in the graph, and the output is the APPR vector of that node. We develop our DP APPR algorithm based on ISTA due to its status as one of the state-of-the-art APPR algorithms. ISTA provides an excellent balance between scalability and approximation guarantees. Moreover, the resulting sparse APPR matrix can be easily accommodated into the memory, facilitating the subsequent neural network training.
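For readers unfamiliar with personalized PageRank, the sketch below shows what a PPR vector for one source node looks like. It deliberately uses plain power iteration, not the ISTA solver of Fountoulakis et al., and it touches the whole graph in every iteration, so it has none of ISTA's locality or sparsity guarantees. The function name, the interpretation of `alpha` as a teleport probability, and the handling of dangling nodes are assumptions made for illustration.

```python
def personalized_pagerank(adj, source, alpha=0.25, num_iters=50):
    """Approximate the PPR vector of `source` by power iteration.

    adj   : adjacency list, adj[u] is the list of neighbors of node u
    alpha : teleport (restart) probability back to the source node
    """
    n = len(adj)
    p = [0.0] * n
    p[source] = 1.0  # same role as the one-hot input vector
    for _ in range(num_iters):
        nxt = [0.0] * n
        nxt[source] = alpha  # restart mass
        for u, nbrs in enumerate(adj):
            if not nbrs:
                # Assumption: a node without neighbors returns its mass to the source.
                nxt[source] += (1.0 - alpha) * p[u]
                continue
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for j in nbrs:
                nxt[j] += share  # spread mass uniformly over neighbors
        p = nxt
    return p

# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3 (assumed example).
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
p0 = personalized_pagerank(adj, source=0)
```

Entry `p0[j]` plays the role of the relevance score of node `j` to the source node; the source itself receives the largest score and the distant node 3 the smallest.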

Recall that the purpose of calculating APPR vectors is to use them to aggregate representations from relevant nodes for the source node during model training. The index of each entry in an APPR vector indicates the index of a node in the graph, and the value of each entry reflects the importance or relevance of this node to the source node. By reserving the top $K$ largest entries for each APPR vector, the feature aggregation step computes a weighted average of the representations of the $K$ most relevant nodes to the source node (recall Equation 1). The graph structure information is encoded in both the indexes and values of non-zero entries in each sparse APPR vector. Thus, to provide DP protection for the graph structure, we propose two DP APPR algorithms to obtain the top-$K$ indexes and values for each APPR vector.

Input: ISTA hyperparameters: γ,α,ρ𝛾𝛼𝜌\gamma,\alpha,\rhoitalic_γ , italic_α , italic_ρ; privacy parameters: ϵitalic-ϵ\epsilonitalic_ϵ, ϵ2subscriptitalic-ϵ2\epsilon_{2}italic_ϵ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, δ𝛿\deltaitalic_δ; clip bound C2subscript𝐶2C_{2}italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, a graph (V,E)𝑉𝐸(V,E)( italic_V , italic_E ) where V={v1,,vN}𝑉subscript𝑣1subscript𝑣𝑁V=\{v_{1},...,v_{N}\}italic_V = { italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT }, an integer K>0𝐾0K>0italic_K > 0 and an integer M[1,N]𝑀1𝑁M\in[1,N]italic_M ∈ [ 1 , italic_N ].
1 Initialize the APPR matrix 𝚷M×N𝚷superscript𝑀𝑁\bm{\Pi}\in\mathbb{R}^{M\times N}bold_Π ∈ blackboard_R start_POSTSUPERSCRIPT italic_M × italic_N end_POSTSUPERSCRIPT with all zeros.
2 for i=1,,M𝑖1normal-…𝑀i=1,...,Mitalic_i = 1 , … , italic_M do
3       Compute APPR:
4       Compute the APPR vector 𝐩(vi)subscript𝐩subscript𝑣𝑖\mathbf{p}_{(v_{i})}bold_p start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT for node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT using ISTA;
5       Clip Norm:
6       𝐩^(vi):for each entry 𝐩(vi)[j],j[1,,N],set 𝐩(vi)[j]=𝐩(vi)[j]/max(1,𝐩(vi)[j]1C2)\hat{\mathbf{p}}_{(v_{i})}\leftarrow:\text{for each entry }\mathbf{p}_{(v_{i})% }[j],j\in[1,...,N],\text{set }\mathbf{p}_{(v_{i})}[j]=\mathbf{p}_{(v_{i})}[j]/% \max\left(1,\frac{\left\|\mathbf{p}_{(v_{i})}[j]\right\|_{1}}{C_{2}}\right)over^ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ← : for each entry bold_p start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ] , italic_j ∈ [ 1 , … , italic_N ] , set bold_p start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ] = bold_p start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ] / roman_max ( 1 , divide start_ARG ∥ bold_p start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ] ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG ) Add Noise:
7       𝐩~(vi)𝐩^(vi)+𝐺𝑢𝑚𝑏𝑒𝑙(β𝐈)subscript~𝐩subscript𝑣𝑖subscript^𝐩subscript𝑣𝑖𝐺𝑢𝑚𝑏𝑒𝑙𝛽𝐈\tilde{\mathbf{p}}_{(v_{i})}\leftarrow\hat{\mathbf{p}}_{(v_{i})}+\textit{% Gumbel}\left(\beta\mathbf{I}\right)over~ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ← over^ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT + Gumbel ( italic_β bold_I ), where β=C2/ϵ𝛽subscript𝐶2italic-ϵ\beta=C_{2}/\epsilonitalic_β = italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / italic_ϵ;
8       Report Noisy Indexes:
9       𝐍Ksubscript𝐍𝐾absent\mathbf{N}_{K}\leftarrowbold_N start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ←: select the indexes of the top K𝐾Kitalic_K entries with the largest values in 𝐩~(vi)subscript~𝐩subscript𝑣𝑖\tilde{\mathbf{p}}_{(v_{i})}over~ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT;
10       Report Noisy Values:
11       option I: 𝐩~(vi)superscriptsubscript~𝐩subscript𝑣𝑖absent\tilde{\mathbf{p}}_{(v_{i})}^{\prime}\leftarrowover~ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ←: set 𝐩^(vi)[j]subscript^𝐩subscript𝑣𝑖delimited-[]𝑗\hat{\mathbf{p}}_{(v_{i})}[j]over^ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ], j𝐍K𝑗subscript𝐍𝐾j\in\mathbf{N}_{K}italic_j ∈ bold_N start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT, to be 1/K1𝐾1/K1 / italic_K, and other entries to be 0;
12       option II: 𝐩~(vi)superscriptsubscript~𝐩subscript𝑣𝑖absent\tilde{\mathbf{p}}_{(v_{i})}^{\prime}\leftarrowover~ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ←: set 𝐩^(vi)[j]subscript^𝐩subscript𝑣𝑖delimited-[]𝑗\hat{\mathbf{p}}_{(v_{i})}[j]over^ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ], j𝐍K𝑗subscript𝐍𝐾j\in\mathbf{N}_{K}italic_j ∈ bold_N start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT, to be 𝐩^(vi)[j]+𝐿𝑎𝑝𝑙𝑎𝑐𝑒(KC2/ϵ2)subscript^𝐩subscript𝑣𝑖delimited-[]𝑗𝐿𝑎𝑝𝑙𝑎𝑐𝑒𝐾subscript𝐶2subscriptitalic-ϵ2\hat{\mathbf{p}}_{(v_{i})}[j]+\textit{Laplace}(KC_{2}/\epsilon_{2})over^ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ italic_j ] + Laplace ( italic_K italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / italic_ϵ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ), and other entries to be 0;
13       Replace the i𝑖iitalic_i-th row of 𝚷𝚷\bm{\Pi}bold_Π with 𝐩~(vi)superscriptsubscript~𝐩subscript𝑣𝑖\tilde{\mathbf{p}}_{(v_{i})}^{\prime}over~ start_ARG bold_p end_ARG start_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT.
14      
15 end for
return $\bm{\Pi}$ and the overall privacy cost.
Algorithm 1 DP-APPR using the Exponential Mechanism (DP-APPR-EM)

Exponential Mechanism (DP-APPR-EM). We present the DP APPR algorithm using the exponential mechanism. While we can employ a DP top-$K$ selection algorithm based on the exponential mechanism (Durfee and Rogers, 2019), there are several challenges that need to be addressed. First, each node (and its edges) can change an arbitrary number of elements in the APPR vector and lead to significant changes in each element. Second, each node can change an arbitrary number of APPR vectors in the APPR matrix. Both of these mean extremely high sensitivity, making a direct application of the top-$K$ selection algorithm ineffective. To address them, we employ two techniques: 1) clipping each element to bound the sensitivity, and 2) sampling and only computing APPR for a subset of $M$ nodes in the graph to reduce sensitivity. We then employ the exponential mechanism to select the top-$K$ values.

As shown in Algorithm 1, for each of the $M$ sampled nodes, we first compute the APPR vector using the ISTA algorithm (line 4). Then we employ clipping to bound the sensitivity of each element by $C_2$ (line 6). We use the clipped value as its utility score for the exponential mechanism since the magnitude of each entry indicates its importance (utility) and is used as the weight when aggregating the representation of the nodes. We simulate the exponential mechanism by injecting one-shot Gumbel noise into the clipped vector $\hat{\mathbf{p}}_{(v)}$ (line 8) and then select the indexes of the top $K$ largest noisy entries (Durfee and Rogers, 2019) (line 10). We can then either: option I) set the values of all top $K$ entries to be $1/K$ (line 12), which means we consider the top $K$ entries equally important to the source node, or option II) spend additional privacy budget $\epsilon_2$ to obtain the noisy values of the top $K$ entries with DP (line 13). Given the same privacy budget, option I has a better chance of outputting the indexes of the actual top $K$ entries but loses the importance scores. In contrast, option II sacrifices some accuracy in selecting the indexes of the top $K$ entries but additionally provides noisy importance scores.
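The per-node procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dp_topk_row`, its arguments, and the symmetric clipping rule are hypothetical choices. The Gumbel scale $C_2/\epsilon$ and the Laplace scale $KC_2/\epsilon_2$ are taken from the text and from option II of Algorithm 1.

```python
import numpy as np

def dp_topk_row(p, K, C2, eps, rng, eps2=None):
    """Sparsify one APPR vector with a one-shot Gumbel top-K selection.

    p    : 1-D array, APPR vector of one sampled node (assumed given)
    K    : number of entries to keep
    C2   : per-entry clipping bound
    eps  : budget parameter that scales the Gumbel noise
    eps2 : None selects option I (uniform 1/K weights);
           a positive value selects option II (Laplace-noised values)
    """
    p = np.asarray(p, dtype=float)
    # Assumed clipping rule: bound the magnitude of every entry by C2.
    p_hat = np.clip(p, -C2, C2)
    # One-shot Gumbel noise with scale C2 / eps on every entry.
    noisy = p_hat + rng.gumbel(loc=0.0, scale=C2 / eps, size=p_hat.shape)
    # Indexes of the K largest noisy entries (unordered).
    top_idx = np.argpartition(noisy, -K)[-K:]

    out = np.zeros_like(p_hat)
    if eps2 is None:
        # Option I: treat the selected entries as equally important.
        out[top_idx] = 1.0 / K
    else:
        # Option II: release noisy clipped values for the selected entries.
        out[top_idx] = p_hat[top_idx] + rng.laplace(0.0, K * C2 / eps2, size=K)
    return out, top_idx
```

Calling `dp_topk_row(p, K, C2, eps, np.random.default_rng())` once per sampled node and stacking the returned rows would give a sparsified matrix of the shape described for $\bm{\Pi}$; the privacy accounting across rows is not handled by this sketch.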

Privacy Analysis of DP-APPR-EM. We formally analyze the DP guarantee of Algorithm 1 utilizing the following corollary for the exponential mechanism based top-$K$ selection.

Corollary 1.

(Durfee and Rogers, 2019) $\mathcal{M}_{\text{Gumbel}}^{k}(u)$ adds the one-shot $\textit{Gumbel}(\Delta(u)/\epsilon)$ noise to each utility score $u(x,r)$ and outputs the $k$ indices with the largest noisy values. For any $\delta \geq 0$, $\mathcal{M}_{\text{Gumbel}}^{k}(u)$ is $(\epsilon', \delta)$-DP where

$$\epsilon' = 2\cdot\min\left\{k\epsilon,\; k\epsilon\left(\frac{e^{2\epsilon}-1}{e^{2\epsilon}+1}\right)+\epsilon\sqrt{2k\ln(1/\delta)}\right\}$$

The privacy analysis conducted in (Durfee and Rogers, 2019) assumes independent users and the sensitivity $\Delta(u)$ is 1. In our case, each node (and its edges) can modify an arbitrary number of elements in the APPR vector and each element can change at most by $C_2$ due to clipping (line 6). Consequently, the sensitivity $\Delta(u)$ used in Corollary 1 is set to $C_2$ and the noise is calibrated accordingly in our algorithm (line 8). Additionally, since each node can change up to $M$ vectors in the APPR matrix, we use sequential composition to bound the privacy loss for $M$ APPR vectors. With the calibrated noise and composition, we establish the DP guarantee in Theorem 2.
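To make the bound of the corollary concrete, the following sketch evaluates $\epsilon'$ for a given $k$, $\epsilon$, and $\delta$. The function name `gumbel_topk_epsilon` is hypothetical, and the handling of $\delta = 0$ (where the second branch is unbounded, so only $k\epsilon$ applies) is our reading of the formula. The code uses the identity $(e^{2\epsilon}-1)/(e^{2\epsilon}+1) = \tanh(\epsilon)$ to avoid overflow for large $\epsilon$.

```python
import math

def gumbel_topk_epsilon(k, eps, delta):
    """Evaluate epsilon' = 2 * min{k*eps, k*eps*tanh(eps) + eps*sqrt(2k ln(1/delta))}."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta is expected to lie in [0, 1]")
    pure = k * eps
    if delta == 0.0:
        # ln(1/delta) is unbounded, so the minimum is attained by k*eps.
        return 2.0 * pure
    # (e^{2 eps} - 1) / (e^{2 eps} + 1) equals tanh(eps).
    advanced = k * eps * math.tanh(eps) + eps * math.sqrt(2.0 * k * math.log(1.0 / delta))
    return 2.0 * min(pure, advanced)
```

For small $k$ the first branch $k\epsilon$ tends to be the smaller one, while for large $k$ and small $\epsilon$ the second branch can be much tighter.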

Theorem 2.

For any $\epsilon > 0$, $\epsilon_2 > 0$ and $\delta \in (0,1]$, let $\epsilon_1 = 2\cdot\min\left\{K\epsilon,\; K\epsilon\left(\frac{e^{2\epsilon}-1}{e^{2\epsilon}+1}\right)+\epsilon\sqrt{2K\ln(1/\delta)}\right\}$, Algorithm 1 is $(\epsilon_{g_1}, 2M\delta)$-differentially private for option I, and $(\epsilon_{g_2}, 2M\delta)$-differentially private for option II, where $\epsilon_1 = \epsilon_{g_1}/\left(2\sqrt{M\ln\left(e+\epsilon_{g_1}/2M\delta\right)}\right)$ and $\epsilon_1+\epsilon_2 = \epsilon_{g_2}/\left(2\sqrt{M\ln\left(e+\epsilon_{g_2}/2M\delta\right)}\right)$.

Proof.

See Appendix A for the proof. ∎
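As a numerical illustration of Theorem 2, the sketch below maps a target overall budget $\epsilon_g$ for the APPR matrix to the budget available for a single APPR vector. It is only a sketch: the function name `per_vector_epsilon` is hypothetical, and we assume the term $\epsilon_g/2M\delta$ in the theorem is to be read as $\epsilon_g/(2M\delta)$. For option I the returned value plays the role of $\epsilon_1$; for option II it plays the role of $\epsilon_1+\epsilon_2$.

```python
import math

def per_vector_epsilon(eps_g, M, delta):
    """Per-vector budget eps_g / (2 * sqrt(M * ln(e + eps_g / (2 M delta))))."""
    if eps_g <= 0 or M <= 0 or delta <= 0:
        raise ValueError("eps_g, M and delta must be positive")
    # Assumption: the fraction inside the logarithm is eps_g / (2 * M * delta).
    inner = math.e + eps_g / (2.0 * M * delta)
    return eps_g / (2.0 * math.sqrt(M * math.log(inner)))
```

The per-vector budget shrinks roughly like $1/\sqrt{M}$, which is one way to see why only a small subset of $M$ nodes is used when computing DP-APPR.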

Gaussian Mechanism. We explore another DP-APPR algorithm (DP-APPR-GM) based on the Gaussian mechanism (Dwork et al., 2014) and output perturbation. The idea behind DP-APPR-GM is to use the clipping strategy to bound the global sensitivity of each output PageRank vector and add Gaussian noise to each bounded PageRank vector to achieve DP. See Appendix B for more details about DP-APPR-GM.

Input: The graph dataset $\overline{G}$, sampling rate $q'$, randomly sampled training graph $G=(V,E,X)$ from $\overline{G}$ by $q'$ where $V=\{v_1,\ldots,v_N\}$, a sampled subset $V_M \subseteq V$ with size $M$ (for computing APPR), learning rate $\eta_t$, batch size $B$, training steps $T$, noise scale $\sigma$, gradient norm bound $C$, clip bound $\tau$, the DP APPR matrix $\bm{\Pi} \in \mathbb{R}^{M\times N}$ of $V_M$ satisfying $(\epsilon_{pr},\delta_{pr})$-DP.
1 Initialize $\theta_0$ randomly
2 for $j = 1,\ldots,N$ do
3       $\bm{\Pi}_{:,j} \leftarrow \bm{\Pi}_{:,j} / \max\left(1, \frac{\|\bm{\Pi}_{:,j}\|_1}{\tau}\right)$
4 end for
5 for $t = 1,\ldots,T$ do
6       Take a randomly sampled batch $B$ and their $K$ neighbors based on $\bm{\Pi}$ from $V_M$.
7       Compute Gradient:
8       For each $i \in B_t$, compute $\mathbf{g}_t(v_i) \leftarrow \nabla_{\theta_t}\mathcal{L}(\theta_t, v_i)$.
9       Clip Gradient:
10       $\overline{\mathbf{g}}_t(v_i) \leftarrow \mathbf{g}_t(v_i) / \max\left(1, \frac{\|\mathbf{g}_t(v_i)\|_2}{C}\right)$.
11       Add Noise:
12       $\tilde{\mathbf{g}}_t \leftarrow \frac{1}{B}\left(\sum_i \overline{\mathbf{g}}_t(v_i) + \mathcal{N}\left(0, \sigma^2 C^2 \mathbf{I}\right)\right)$.
13       Update Parameters:
14       $\theta_{t+1} \leftarrow \theta_t - \eta_t \tilde{\mathbf{g}}_t$.
15 end for
return $\theta_T$ and the overall privacy cost.
Algorithm 2 Differentially Private GNNs

3.2. Differentially Private GNNs

We show our overall approach for training a DP GNN model in Algorithm 2. The main idea is to use DP APPR for neighborhood sampling and then use DP-SGD to achieve DP for the node features. We employ additional sampling and clipping to reduce the privacy cost.

Given a graph dataset $\overline{G}$, we first use a sampling rate $q'$ to randomly sample nodes from $\overline{G}$ to form a subgraph $G = (V, E, X)$ containing only the sampled nodes and their connected edges, which is used for training in Algorithm 2. This sampling step brings a privacy amplification effect in our privacy guarantee by a factor of $q'$ (Kasiviswanathan et al., 2011; Beimel et al., 2014). Note that this is different from the batch sampling during each iteration of the training process. We further sample $M$ nodes to compute the DP APPR using DP-APPR-EM or DP-APPR-GM and use it as input for Algorithm 2.
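The graph sampling step can be illustrated as follows. This is a hedged sketch rather than the procedure used in the experiments: the text does not state how nodes are drawn, so we assume each node is kept independently with probability $q'$, and the function name `sample_induced_subgraph` and the edge-list representation are hypothetical.

```python
import numpy as np

def sample_induced_subgraph(edges, num_nodes, q_prime, rng):
    """Keep each node with probability q_prime and return the induced subgraph.

    edges : array-like of shape (E, 2) with node ids in [0, num_nodes)
    Returns (kept node ids, edges relabeled to 0..len(kept)-1).
    """
    edges = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    # Assumption: independent (Poisson-style) inclusion of every node.
    keep = rng.random(num_nodes) < q_prime
    nodes = np.flatnonzero(keep)
    # An edge survives only if both of its endpoints were sampled.
    mask = keep[edges[:, 0]] & keep[edges[:, 1]]
    # Map original node ids to compact ids of the subgraph.
    new_id = np.full(num_nodes, -1, dtype=np.int64)
    new_id[nodes] = np.arange(nodes.size)
    return nodes, new_id[edges[mask]]
```

Node features would be subset with the same `nodes` index array, so that row `i` of the sampled feature matrix corresponds to compact node id `i`.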

Utilizing the sparsified DP APPR vectors (each row has only top-$K$ non-zero elements) limits the impact of a node to the gradient computation of at most $B'$ nodes, where $B'$ is the maximum column-wise $\ell_0$ norm of the DP APPR matrix (number of non-zero elements in each column). The exact impact or sensitivity is determined by the maximum column-wise $\ell_1$ norm of the DP APPR matrix (see privacy analysis for more details). Hence, we employ additional clipping on the DP APPR matrix to bound the sensitivity. Given $\bm{\Pi}$ computed using DP-APPR algorithms, each column of $\bm{\Pi}$ is clipped to have a maximum $\ell_1$ norm of $\tau$ to limit privacy loss (line 3).
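The column-wise clipping in line 3 of Algorithm 2 can be written in a few lines of NumPy. The sketch below assumes a dense $M \times N$ array for $\bm{\Pi}$; the function name `clip_columns_l1` is hypothetical.

```python
import numpy as np

def clip_columns_l1(Pi, tau):
    """Rescale every column of Pi so that its l1 norm is at most tau."""
    Pi = np.asarray(Pi, dtype=float)
    # l1 norm of each column, shape (N,).
    col_norms = np.abs(Pi).sum(axis=0)
    # Columns with norm <= tau are divided by 1 and stay unchanged.
    return Pi / np.maximum(1.0, col_norms / tau)
```

Because the divisor has shape `(N,)`, broadcasting applies one scaling factor per column, which matches the update $\bm{\Pi}_{:,j} \leftarrow \bm{\Pi}_{:,j} / \max(1, \|\bm{\Pi}_{:,j}\|_1/\tau)$.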

During each training step, we sample a batch of $B$ nodes and their top-$K$ neighbors (both direct and indirect) using APPR vectors, loading features of up to $B \times K$ nodes for gradient computation (line 6). The loss function $\mathcal{L}(\theta, v_i)$ is the cross-entropy between node $v_i$'s true label and its prediction from Equation 1. Following DP-SGD, we compute each node's gradient, clip it to a maximum $\ell_2$ norm of $C$, and add Gaussian noise calibrated to sensitivity $C$ (lines 7-12). The model is updated with the averaged noisy gradient (line 14).
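The clip, noise, and update steps of one training iteration can be sketched as below. This is an illustration under stated assumptions: the per-node gradients are assumed to be already computed (in the actual model they depend on the neighbors selected through $\bm{\Pi}$, which is not shown here), parameters are flattened into a single vector, and the function name `dp_sgd_step` is hypothetical.

```python
import numpy as np

def dp_sgd_step(theta, per_node_grads, C, sigma, lr, rng):
    """One noisy update: clip each per-node gradient, add Gaussian noise, average.

    theta          : parameter vector, shape (d,)
    per_node_grads : per-node gradients for the batch, shape (B, d)
    """
    g = np.asarray(per_node_grads, dtype=float)
    # Clip every per-node gradient to an l2 norm of at most C.
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g_clipped = g / np.maximum(1.0, norms / C)
    # Gaussian noise with standard deviation sigma * C per coordinate.
    noise = rng.normal(0.0, sigma * C, size=theta.shape)
    # Average the noisy sum over the batch size B.
    g_tilde = (g_clipped.sum(axis=0) + noise) / g.shape[0]
    return theta - lr * g_tilde
```

Setting `sigma=0` recovers plain clipped SGD, which is convenient for checking the clipping logic in isolation.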

Privacy Analysis. Theorem 3 presents the DP analysis of Algorithm 2. An essential distinction between our algorithm and the original DP-SGD is that our neighborhood sampling returns a correlated batch of nodes for gradient computation (i.e., the computation of $\mathbf{g}_t(v_i)$ requires the features of the neighboring nodes of node $v_i$, and node $v_i$ accesses the fixed $K$ nodes based on the DP-APPR vector), while the original DP-SGD uses the much simpler Poisson sampling. As a result, the privacy analysis of our algorithm is more involved, especially in terms of quantifying the privacy amplification ratio under such a neighbor-correlated sampling setting. We prove that the privacy amplification ratio is proportional to the maximum of the column-wise $\ell_1$ norm of the DP-APPR matrix.

For the composition of DP-APPR and DP-SGD, we use the standard composition theorem. Recall that for the privacy composition of multiple DP-APPR vectors for the DP-APPR matrix (Theorem 2 and 3), we used a strong composition theorem. We note that our privacy analysis can always benefit from a more advanced composition theorem to achieve tighter overall privacy, which can be a future work direction.

Theorem 3.

There exist constants $c_1$ and $c_2$ so that given probability $q = B/N$ and the number of steps $T$, for any $\epsilon_{sgd} < c_1 q^2 T$, Algorithm 2 is $q'(\epsilon_{sgd}+\epsilon_{pr}, \delta_{sgd}+\delta_{pr})$-differentially private corresponding to $\overline{G}$, for any $\delta_{sgd} > 0$ if we choose $\sigma \geq c_2\frac{q\tau\sqrt{T\log(1/\delta_{sgd})}}{\epsilon_{sgd}}$.

Proof.

See Appendix C for the proof. ∎

4. Experimental Results

We evaluate our method on five graph datasets with varying sizes and edge density: Cora-ML (Bojchevski and Günnemann, 2018), Microsoft Academic graph (Shchur et al., 2018), CS (Shchur et al., 2018), Physics (Shchur et al., 2018), and Reddit (Hamilton et al., 2017). Appendix D provides the details of each dataset.

Table 1. Privacy budget and test accuracy on each graph dataset

| Dataset | Privacy ($\epsilon$, $\delta$) | GAP | SAGE | Features | DPAR-EM0 | DPAR-EM1 | DPAR-GM | DPARNoDP | GAPNoDP | FeaturesNoDP |
|---|---|---|---|---|---|---|---|---|---|---|
| Cora-ML | (1, $2\times10^{-3}$) | 0.34 | 0.152 | 0.5733 | 0.3421 | 0.2895 | 0.3333 | 0.7076 | 0.8883 | 0.7733 |
| | (8, $2\times10^{-3}$) | 0.5733 | 0.368 | 0.6107 | 0.5965 | 0.6199 | 0.4854 | | | |
| MS Academic | (1, $8\times10^{-4}$) | 0.6563 | 0.013 | 0.83 | 0.8306 | 0.8569 | 0.8225 | 0.955 | 0.9571 | 0.8382 |
| | (8, $8\times10^{-4}$) | 0.8581 | 0.063 | 0.8723 | 0.9054 | 0.9135 | 0.9165 | | | |
| CS | (1, $8\times10^{-4}$) | 0.66 | 0.0917 | 0.8344 | 0.8898 | 0.8921 | 0.8927 | 0.9707 | 0.9571 | 0.9307 |
| | (8, $8\times10^{-4}$) | 0.8537 | 0.7366 | 0.895 | 0.9017 | 0.8994 | 0.9063 | | | |
| Reddit | (1, $1\times10^{-4}$) | 0.7047 | 0.086 | 0.7436 | 0.9167 | 0.9286 | 0.934 | 0.9698 | 0.9949 | 0.8337 |
| | (8, $1\times10^{-4}$) | 0.9161 | 0.82 | 0.777 | 0.9375 | 0.9399 | 0.931 | | | |
| Physics | (1, $1\times10^{-4}$) | 0.8192 | 0.1263 | 0.8412 | 0.8887 | 0.8927 | 0.8948 | 0.9548 | 0.9597 | 0.9504 |
| | (8, $1\times10^{-4}$) | 0.9088 | 0.8919 | 0.9017 | 0.9023 | 0.9020 | 0.9101 | | | |

Setup. To simulate real-world situations where training nodes are assumed to be private and not publicly available, we split the nodes into a training set ($80\%$) and a test set ($20\%$), and adopt the inductive graph learning setting by removing edges between the two sets. The training nodes are inaccessible during inference. We use the same 2-layer feed-forward neural network with a hidden layer size of 32 as in (Bojchevski et al., 2020) for all datasets. The training epochs are fixed at 200, the learning rate at 0.005, and the batch size at 60. The hyperparameters for ISTA are chosen through grid search as $\alpha = 0.25$, $\rho = 10^{-4}$, and $\gamma = 10^{-4}$. In our comparison with baseline methods, we set $K$ to 2 for computing top-$K$ sparsified DP APPR. We also present results on the effect of $K$ with different $K$ values. The graph sampling rate is set to $q' = 9\%$ for all datasets, and $M = 70$ nodes are chosen randomly and uniformly to generate DP-APPR vectors. Experiments are conducted on a server with an Nvidia K80 GPU, a 6-core Intel CPU, and 56 GiB RAM. Results are based on the mean of 10 independent trials. The source code is available at https://github.com/Emory-AIMS/DPAR.

Our Approach and Baselines. Our proposed algorithms using the DP-APPR with exponential mechanism (options I and II in Algorithm 1) are referred to as DPAR-EM0 and DPAR-EM1, respectively, and our algorithm using the DP-APPR with Gaussian mechanism is referred to as DPAR-GM.

We compare our proposed algorithms with two state-of-the-art methods achieving node DP for GNN and one baseline method: 1) SAGE (Daigavane et al., 2021) samples subgraphs of 1-hop neighbors of each node to train 1-layer GNNs with the GraphSAGE (Hamilton et al., 2017) model. 2) GAP (Sajadmanesh et al., 2023) uses aggregation perturbation and MLP-based encoder and classifier with DP-SGD and a bounded node degree and number of hops. 3) Features is a baseline method that only uses node feature as an independent input to train the GNN model and does not consider the structural information of the graph. Features utilizes the original DP-SGD to achieve node DP. Note that it is equal to the case where we use a one-hot vector as each node’s APPR vector in Algorithm 2 (i.e., no correlation with other nodes is used). We included this baseline to help characterize the datasets and calibrate the results, i.e., a good performance of the method may suggest that the topological structure of the particular dataset has limited benefit in training GNN. The models DPARNoDP and GAPNoDP indicate the respective methods (DPAR, GAP) with no DP protection.

Inference Phase. As suggested in (Bojchevski et al., 2020), instead of computing the APPR vectors for all testing nodes and generating predictions based on their APPR vectors, we use power iteration during inference:

$$Q^{(0)} = H, \quad Q^{(p)} = (1-\alpha)D^{-1}AQ^{(p-1)} + \alpha H, \quad p \in [1,\ldots,P]. \tag{4}$$

where $H$ is the representation matrix of testing nodes generated by the trained private model, with the input being the feature matrix of the testing nodes; $D$ and $A$ are the degree matrix and adjacency matrix of the graph containing only testing nodes, respectively. The final output of power iteration $Q^{(P)}$ will be input into a softmax layer to generate the predictions for testing nodes. We set $P = 2$ and the teleportation constant $\alpha = 0.25$ as suggested in (Bojchevski et al., 2020) in our experiments.
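The power iteration of Equation 4 is short enough to sketch directly. The code below is a minimal dense-matrix illustration, not the released implementation: the function names are hypothetical, and giving isolated nodes (degree zero) a zero propagation term is our assumption, since $D^{-1}$ is undefined for them.

```python
import numpy as np

def ppr_power_iteration(H, A, alpha=0.25, P=2):
    """Compute Q^(P) with Q^(0) = H and Q^(p) = (1 - alpha) D^-1 A Q^(p-1) + alpha H."""
    H = np.asarray(H, dtype=float)
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    # Assumption: rows of isolated nodes get inverse degree 0.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    Q = H.copy()
    for _ in range(P):
        Q = (1.0 - alpha) * (inv_deg[:, None] * (A @ Q)) + alpha * H
    return Q

def softmax_rows(Z):
    """Row-wise softmax turning the propagated scores into class probabilities."""
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)
```

Predictions for the testing nodes would then be `softmax_rows(ppr_power_iteration(H, A)).argmax(axis=1)`; for large graphs a sparse adjacency matrix would be the more practical choice.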

4.1. Privacy vs. Accuracy Trade-off

We use the value of the privacy budget $\epsilon$ (with fixed $\delta$ chosen to be roughly equal to the inverse of each dataset's number of training nodes) to represent the level of privacy protection and use the test accuracy for node classification to indicate the model's utility. Table 1 shows the results of our proposed methods and the baselines on all datasets, where the total privacy budget is evenly divided between DP-APPR and DP-SGD. In comparison to GAP and SAGE, our methods show superior test accuracy under the same privacy budget on all datasets. For instance, when $\epsilon = 1$, our methods (DPAR-GM, DPAR-EM0, or DPAR-EM1) achieve the highest test accuracy of 0.3421/0.8569/0.8927/0.934/0.8948 on the Cora-ML/MS Academic/CS/Reddit/Physics datasets, respectively. The best accuracy achieved by the baselines (GAP or SAGE) is 0.34/0.6563/0.66/0.7047/0.8192 on the corresponding datasets, indicating a relative test accuracy improvement of 0.62%/30.6%/35.3%/32.5%/9.23%, respectively. The performance improvement demonstrates our method's superior ability to balance the privacy-utility trade-off on training graph datasets with privacy considerations.
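The reported improvements are relative gains over the best baseline, which the short check below reproduces from the accuracies quoted above for $\epsilon = 1$. The dictionary and function names are ours and serve only this illustration.

```python
# Best accuracy of the proposed methods and of the baselines (GAP or SAGE) at epsilon = 1.
best_ours = {"Cora-ML": 0.3421, "MS Academic": 0.8569, "CS": 0.8927,
             "Reddit": 0.934, "Physics": 0.8948}
best_baseline = {"Cora-ML": 0.34, "MS Academic": 0.6563, "CS": 0.66,
                 "Reddit": 0.7047, "Physics": 0.8192}

def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

improvements = {name: relative_improvement(best_ours[name], best_baseline[name])
                for name in best_ours}

for name, gain in improvements.items():
    print(f"{name}: {gain:.2f}%")
```

The printed values agree with the percentages in the text up to rounding.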

Figure 1. Relationship between privacy budget $\epsilon$ (fixed $\delta = 2\times10^{-3}$) and test accuracy on Cora-ML dataset. Panels: (a) $K=4$, $\epsilon_{sgd}=2.0$; (b) $K=4$, $\epsilon_{sgd}=8.0$; (c) $K=16$, $\epsilon_{sgd}=2.0$; (d) $K=16$, $\epsilon_{sgd}=8.0$.

Existing research in the graph neural network community suggests that features alone, especially for heterophilic graphs, can sometimes result in better-trained node classification models with MLP as the backend architecture compared to state-of-the-art GNN models (Maurya et al., 2022). For the Cora-ML dataset, which has a low edge density, the Features approach outperforms our methods when $\epsilon$ is small (e.g., 1). This is because our methods allocate part of the privacy budget to protect graph structure information, which may not be as critical, while Features uses its entire privacy budget to protect node features without considering graph structure information. However, as $\epsilon$ increases (e.g., 8), our methods outperform Features.

Our proposed methods protect the graph structure and node features independently via the decoupled framework. Different graphs possess unique characteristics, and the relative significance of structure information and node features can differ among them. Accordingly, our methods are able to allocate the total privacy budget differently to protect node features and structures, which leads to more precise and tunable privacy protection for graph data that includes both feature and structural information.

Figure 2. Cora-ML. The privacy budget $\epsilon$ ratio for DP-APPR. Panels: (a) DPAR-GM; (b) DPAR-EM0; (c) DPAR-EM1.
Figure 3. CS. The privacy budget $\epsilon$ ratio for DP-APPR. Panels: (a) DPAR-GM; (b) DPAR-EM0; (c) DPAR-EM1.
Figure 4. MS Academic. The privacy budget $\epsilon$ ratio for DP-APPR. Panels: (a) DPAR-GM; (b) DPAR-EM0; (c) DPAR-EM1.

Ablation Study of Different DP-APPR Methods. To further study the impact of DP-APPR on the model accuracy, in Figure 1, we fix $\epsilon_{sgd}$ (privacy budget for DP-SGD) and use varying $\epsilon_{pr}$ (privacy budget for DP-APPR) as the x-axis. For DPAR-GM and DPAR-EM1, the higher the $\epsilon_{pr}$, the less noise is added when calculating the APPR vector for each training node. This gives each node a better chance to aggregate representations from more important nodes using more precise importance scores. Hence these models have higher test accuracy compared to DPAR-EM0. In contrast, for DPAR-EM0, noise in DP-APPR will only affect the output of the indexes of the top-$K$ most relevant nodes corresponding to the source node, but not their importance scores. DPAR-EM0 achieves better performance than DPAR-GM and DPAR-EM1 when the privacy budget $\epsilon_{pr}$ is small. This is because DPAR-EM0 uses $1/K$ as the importance score for all nodes (considering nodes equally important), which diminishes the negative effect of less important or irrelevant nodes having high importance scores due to the noise in DPAR-GM and DPAR-EM1. Both DPAR-EM0 and DPAR-EM1 are based on the exponential mechanism, which is designed for identifying the indexes of the top-$K$ entries accurately. Therefore, when the privacy budget is small, they outperform DPAR-GM. However, when the privacy budget is large, they all have a good chance to find the indexes of the actual top-$K$ entries, and DPAR-GM gradually becomes better than DPAR-EM0 and DPAR-EM1, as the Gaussian noise has a better privacy loss composition property.

Figure 5. Reddit. The privacy budget $\epsilon$ ratio for DP-APPR. Panels: (a) DPAR-GM, (b) DPAR-EM0, (c) DPAR-EM1.
Figure 6. Physics. The privacy budget $\epsilon$ ratio for DP-APPR. Panels: (a) DPAR-GM, (b) DPAR-EM0, (c) DPAR-EM1.

4.2. Privacy Protection Effectiveness

Privacy Budget Allocation between DP-APPR and DP-SGD. The total privacy budget is divided between DP-APPR and DP-SGD. We compare the impact of the budget allocation by changing the ratio of the total privacy budget used by each of them. Figures 2, 3, 4, 5, and 6 report the model test accuracy with varying ratios of the total privacy budget used for DP-APPR for the five datasets respectively, and they share the same legend as in Figure 2. A lower ratio means a smaller privacy budget is allocated to DP-APPR while more is allocated to DP-SGD. The impact of the ratio on the privacy-utility trade-off is closely aligned with the characteristics of each dataset. In Figure 2, the model achieves better accuracy when the ratio is lower, regardless of the total privacy budget. This reflects the characteristics of the Cora-ML dataset, whose node features are more important than its structure. Interestingly, when the privacy budget is small, Figures 3, 4, 5, and 6 show that information from node features is crucial for all datasets: allocating more privacy budget to DP-SGD lets the model learn more useful information from the node features and improves accuracy. When the privacy budget is large, e.g., $\epsilon = 8$, we find that on the MS Academic and CS datasets the model achieves the best results when the budget is equally divided, suggesting the importance of learning from both the structure information and the features.
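The allocation ratio discussed above can be illustrated with a minimal Python sketch. It assumes that the budgets of the two stages add up to the total (basic sequential composition); the function name `split_privacy_budget` and the argument `ratio_pr` are hypothetical and not taken from the paper's code.

```python
from typing import Tuple


def split_privacy_budget(eps_total: float, ratio_pr: float) -> Tuple[float, float]:
    """Split a total epsilon between DP-APPR and DP-SGD.

    Assumption (not confirmed by the source): the two stages compose
    additively, so eps_total = eps_pr + eps_sgd.
    """
    if not 0.0 < ratio_pr < 1.0:
        raise ValueError("ratio_pr must lie strictly between 0 and 1")
    eps_pr = ratio_pr * eps_total   # share spent on DP-APPR (structure)
    eps_sgd = eps_total - eps_pr    # remainder spent on DP-SGD (features)
    return eps_pr, eps_sgd


# Illustrative sweep over allocation ratios for a total budget of 8.
for ratio in (0.1, 0.5, 0.9):
    eps_pr, eps_sgd = split_privacy_budget(8.0, ratio)
    print(f"ratio={ratio:.1f}: eps_pr={eps_pr:.2f}, eps_sgd={eps_sgd:.2f}")
```

Under this assumption, a low ratio corresponds to the regime in which most of the budget goes to learning from node features.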

4.3. Effects of Privacy Parameters

We use the Cora-ML dataset as an example to demonstrate the effects of the parameters specific to privacy, including the clipping bound in DP-APPR, the number of nodes $M$ in DP-APPR, the number of selected top-$K$ entries in DP-APPR, the batch size in DP-SGD, and the clipping bound in DP-SGD. By default, we set the batch size to 60, the clipping bound $C_1$ in DP-APPR-GM (Algorithm 3 in Appendix) to 0.01, the clipping bound $C_2$ in DP-APPR-EM (Algorithm 1) to 0.001, the gradient norm clipping bound $C$ for DP-SGD to 1, and $M$ to 70. We analyze them individually while keeping the rest constant at the default values.

Clipping Bound in DP-APPR ($C_1$ and $C_2$). Figure 7 shows the effect of the clipping bound in DP-APPR on the model's test accuracy. Given a constant total privacy budget, the standard deviation of the noise added to the APPR vectors is proportional to the clipping bound ($C_1$ in DP-APPR-GM and $C_2$ in DP-APPR-EM). Hence, choosing a smaller clipping bound in general avoids adding too much noise and results in better accuracy. However, a clipping bound that is too small may degrade the accuracy due to the clipping error. In experiments, we set $C_1$ to 0.01 and $C_2$ to 0.001 for all datasets.
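The two kinds of clipping mentioned above can be sketched as follows. This is a minimal illustration, assuming standard norm rescaling for the $\ell_2$ bound $C_1$ and entrywise capping for the bound $C_2$; the exact rules in Algorithms 1 and 3 are not reproduced here, and the function names and the toy vector are hypothetical.

```python
import numpy as np


def clip_l2(p: np.ndarray, c1: float) -> np.ndarray:
    """Rescale p so that its l2 norm is at most c1 (assumed GM-style clipping)."""
    norm = np.linalg.norm(p)
    return p * min(1.0, c1 / norm) if norm > 0 else p.copy()


def clip_elementwise(p: np.ndarray, c2: float) -> np.ndarray:
    """Cap every entry of a nonnegative vector at c2 (assumed EM-style clipping)."""
    return np.minimum(p, c2)


# Toy APPR vector, for illustration only.
p = np.array([0.6, 0.3, 0.08, 0.02])
for bound in (1.0, 0.1, 0.01):
    print(bound, clip_l2(p, bound), clip_elementwise(p, bound))
```

With a very small entrywise bound, every entry of the toy vector is capped to the same value, so the ranking information is lost. This is one way to picture the clipping error that offsets the benefit of a smaller noise scale.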

Figure 7. Cora-ML. Relationship between clipping bound of DP-APPR and model test accuracy. Total privacy $(\epsilon,\delta) = (8, 2\times 10^{-3})$. Panels: (a) $K=4$, (b) $K=16$.
Figure 8. Cora-ML: Relationship between the number of top-$K$ entries in DP-APPR vector and model test accuracy. Panels: (a) $\epsilon = 1.0$, (b) $\epsilon = 8.0$.

Number of Top-$K$ in DP-APPR ($K$). Figure 8 shows the accuracy with respect to varying $K$ (2, 4, 8, 16, 32) for the top-$K$ selection in DP-APPR. The Gaussian mechanism's sensitivity depends on the $\ell_2$ norm of the APPR vector. We use a clipping bound $C_1$ to restrict the $\ell_2$ norm of the APPR vector, so the privacy guarantee is tied to $C_1$, not $K$. $K$ determines the number of non-zero entries in each DP-APPR vector and thereby influences the node feature embeddings. A small $K$ may not capture enough neighbors, while a larger $K$ may include more irrelevant nodes as "neighbors", adversely affecting the aggregated information. For the exponential mechanism, we clip each APPR vector value by $C_2$ to control sensitivity. The privacy guarantee depends on both $C_2$ and $K$: a larger $K$ means more noise for each entry, which affects accuracy. In Figure 8, the DPAR-EM1 results highlight this effect, while DPAR-EM0 mitigates it by assigning a value of $1/K$ without additional noise. In our experiments compared against baselines, we use a fixed $K = 2$ for all datasets.
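To make the difference between the two exponential-mechanism variants concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the one-shot Gumbel noise with scale $C_2/\epsilon$ and the Laplace scale $K C_2/\epsilon_2$ described in the proof in Appendix A. The function name `dp_top_k` and its arguments are hypothetical, and the mapping of the uniform-weight option to DPAR-EM0 and of the noisy-value option to DPAR-EM1 is inferred from the description above.

```python
import numpy as np


def dp_top_k(p, k, c2, eps, eps2=None, rng=None):
    """Select k entries of an APPR vector with one-shot Gumbel noise.

    eps2 is None -> uniform weights 1/k on the selected indices
                    (assumed to correspond to DPAR-EM0).
    eps2 given   -> Laplace-perturbed clipped values on the selected indices
                    (assumed to correspond to DPAR-EM1).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Entrywise clipping bounds the sensitivity of each utility score by c2.
    clipped = np.minimum(np.asarray(p, dtype=float), c2)
    # One-shot Gumbel noise with scale beta = c2 / eps, then keep the k largest.
    noisy = clipped + rng.gumbel(loc=0.0, scale=c2 / eps, size=clipped.shape)
    idx = np.argsort(noisy)[::-1][:k]

    out = np.zeros_like(clipped)
    if eps2 is None:
        out[idx] = 1.0 / k
    else:
        out[idx] = clipped[idx] + rng.laplace(0.0, k * c2 / eps2, size=k)
    return out


rng = np.random.default_rng(0)
p = np.array([0.5, 0.2, 0.2, 0.05, 0.05])  # toy APPR vector, illustrative only
print(dp_top_k(p, k=2, c2=0.001, eps=1.0, rng=rng))
print(dp_top_k(p, k=2, c2=0.001, eps=1.0, eps2=1.0, rng=rng))
```

In the second call the Laplace scale grows linearly with `k`, which mirrors the observation that a larger $K$ means more noise per reported entry.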

We also investigate the impact of batch size in DP-SGD ($B$), the clipping bound in DP-SGD ($C$), and the number of nodes in DP-APPR ($M$). We have included the results in Appendix F.

5. Related Work

Differentially Private Graph Publishing. Works on privacy-preserving graph data publishing aim to release the entire graph (Nguyen et al., 2015; Gao and Li, 2019; Xiao et al., 2014; Jorgensen et al., 2016), or the statistics or properties of the original graph (Ahmed et al., 2019; Lu and Miklau, 2014; Chen et al., 2014; Kasiviswanathan et al., 2013; Zhang et al., 2015; Day et al., 2016), with the DP guarantee. Different from those works, our work focuses on training GNN models on private graph datasets and publishing the model that satisfies node-level DP.

Differentially Private Graph Neural Networks. Yang et al. (Yang et al., 2020) propose using DP-SGD to train a graph generation model with edge-DP, protecting link privacy. Sajadmanesh et al. (Sajadmanesh and Gatica-Perez, 2021) develop a GNN training algorithm based on local DP (LDP) to protect the privacy of node features, but not edge privacy. Zhang et al. (Zhang et al., 2021c) apply LDP and the functional mechanism (Zhang et al., 2012) to secure users' sensitive features in graph embedding models for recommendations. Lin et al. (Lin et al., 2022) suggest a privacy-preserving framework for decentralized graphs, ensuring LDP on edge DP for each user. Epasto et al. (Epasto et al., 2022) introduce a DP Personalized PageRank algorithm with edge-level DP for graph embedding. These efforts do not provide strict node-level DP for features and edges in GNN model training. A few recent works (Daigavane et al., 2021; Sajadmanesh et al., 2023) achieve node-level DP for GNNs, yet compromise model accuracy due to training restrictions on hops or layers. Our results show DPAR outperforms these methods.

6. Conclusion

We addressed private learning for GNN models with a two-stage framework: DP approximate personalized PageRank (DP-APPR) and DP-SGD, safeguarding graph structure and node features respectively. We developed two DP-APPR algorithms using Gaussian and exponential mechanisms to learn PageRank for each node's most relevant neighborhood. DP-APPR protects nodes' edge information and limits sensitivity during DP-SGD training, enhancing nodes' feature information protection. Experiments on real-world graph datasets show our methods outperform existing ones in privacy-utility tradeoff. Future work includes developing DP-APPR algorithms with tighter privacy guarantees and adaptive privacy budget strategies (e.g., between DP-APPR and DP-SGD based on dataset characteristics), as well as generalizing our approach to various types of graphs.

Acknowledgements.
This research was partially supported by the National Science Foundation (NSF) under CNS-2124104, CNS-2125530, CNS-2302968, IIS-2312502, NCS-2319449, and the National Institute of Health (NIH) under R01ES033241, R01LM013712, K25DK135913.

References

  • Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In ACM SIGSAC CCS.
  • Ahmed et al. (2019) Faraz Ahmed, Alex X Liu, and Rong Jin. 2019. Publishing Social Network Graph Eigenspectrum With Privacy Guarantees. IEEE Transactions on Network Science and Engineering 7, 2 (2019), 892–906.
  • Andersen et al. (2006) Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using pagerank vectors. In FOCS 2006. IEEE, 475–486.
  • Bassily et al. (2014) Raef Bassily, Adam Smith, and Abhradeep Thakurta. 2014. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. arXiv:1405.7085 [cs.LG]
  • Beimel et al. (2014) Amos Beimel, Hai Brenner, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. 2014. Bounds on the sample complexity for private learning and private data release. Machine learning 94 (2014), 401–437.
  • Bojchevski and Günnemann (2018) Aleksandar Bojchevski and Stephan Günnemann. 2018. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. ICLR (2018).
  • Bojchevski et al. (2020) Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, and Stephan Günnemann. 2020. Scaling graph neural networks with approximate pagerank. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2464–2473.
  • Chen et al. (2014) Rui Chen, Benjamin CM Fung, Philip S Yu, and Bipin C Desai. 2014. Correlated network data publication via differential privacy. The VLDB Journal 23, 4 (2014), 653–676.
  • Daigavane et al. (2021) Ameya Daigavane, Gagan Madan, Aditya Sinha, Abhradeep Guha Thakurta, Gaurav Aggarwal, and Prateek Jain. 2021. Node-level differentially private graph neural networks. arXiv preprint arXiv:2111.15521 (2021).
  • Day et al. (2016) Wei-Yen Day, Ninghui Li, and Min Lyu. 2016. Publishing graph degree distribution with node differential privacy. In Proceedings of the 2016 International Conference on Management of Data. 123–138.
  • Dong et al. (2021) Hande Dong, Jiawei Chen, Fuli Feng, Xiangnan He, Shuxian Bi, Zhaolin Ding, and Peng Cui. 2021. On the Equivalence of Decoupled Graph Convolution Network and Label Propagation. The World Wide Web Conference (2021).
  • Durfee and Rogers (2019) David Durfee and Ryan M Rogers. 2019. Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. Advances in Neural Information Processing Systems 32 (2019), 3532–3542.
  • Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407.
  • Epasto et al. (2022) Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, and Peilin Zhong. 2022. Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank. arXiv preprint arXiv:2207.06944 (2022).
  • Fountoulakis et al. (2019) Kimon Fountoulakis, Farbod Roosta-Khorasani, Julian Shun, Xiang Cheng, and Michael W Mahoney. 2019. Variational perspective on local graph clustering. Mathematical Programming 174, 1 (2019), 553–573.
  • Fredrikson et al. (2015) Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 1322–1333.
  • Gao and Li (2019) Tianchong Gao and Feng Li. 2019. Sharing social networks using a novel differentially private graph model. In 2019 16th IEEE Annual Consumer Communications & Networking Conference (CCNC). IEEE, 1–4.
  • Gleich (2015) David F Gleich. 2015. PageRank beyond the Web. SIAM Review 57, 3 (2015), 321–363.
  • Hamilton et al. (2017) William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems.
  • Hou et al. (2023) Guanhao Hou, Qintian Guo, Fangyuan Zhang, Sibo Wang, and Zhewei Wei. 2023. Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme. Proceedings of the ACM on Management of Data 1, 1 (2023), 1–26.
  • Jorgensen et al. (2016) Zach Jorgensen, Ting Yu, and Graham Cormode. 2016. Publishing attributed social graphs with formal privacy guarantees. In Proceedings of the 2016 international conference on management of data. 107–122.
  • Kairouz et al. (2017) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2017. The Composition Theorem for Differential Privacy. IEEE Transactions on Information Theory 63, 6 (2017), 4037–4049.
  • Kasiviswanathan et al. (2011) Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2011. What can we learn privately? SIAM J. Comput. 40, 3 (2011), 793–826.
  • Kasiviswanathan et al. (2013) Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2013. Analyzing graphs with node differential privacy. In Theory of Cryptography Conference. Springer, 457–476.
  • Klicpera et al. (2019) Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. 2019. Predict then propagate: Graph neural networks meet personalized pagerank. ICLR (2019).
  • Lee and Kifer (2018) Jaewoo Lee and Daniel Kifer. 2018. Concentrated differentially private gradient descent with adaptive per-iteration privacy budget. In KDD.
  • Li et al. (2020) Kaiyang Li, Guangchun Luo, Yang Ye, Wei Li, Shihao Ji, and Zhipeng Cai. 2020. Adversarial Privacy Preserving Graph Embedding against Inference Attack. IEEE Internet of Things Journal (2020).
  • Li et al. (2023) Yiming Li, Yanyan Shen, Lei Chen, and Mingxuan Yuan. 2023. Zebra: When Temporal Graph Neural Networks Meet Temporal Personalized PageRank. Proceedings of the VLDB Endowment 16, 6 (2023), 1332–1345.
  • Lin et al. (2022) Wanyu Lin, Baochun Li, and Cong Wang. 2022. Towards Private Learning on Decentralized Graphs with Local Differential Privacy. arXiv:2201.09398 (2022).
  • Liu et al. (2020) Meng Liu, Hongyang Gao, and Shuiwang Ji. 2020. Towards deeper graph neural networks. In 26th ACM SIGKDD. 338–348.
  • Liu et al. (2022) Xiyang Liu, Weihao Kong, Prateek Jain, and Sewoong Oh. 2022. DP-PCA: Statistically Optimal and Differentially Private PCA. Advances in Neural Information Processing Systems 35 (2022), 29929–29943.
  • Lu and Miklau (2014) Wentian Lu and Gerome Miklau. 2014. Exponential random graph estimation under differential privacy. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 921–930.
  • Lv et al. (2021) Qingsong Lv, Ming Ding, Qiang Liu, Yuxiang Chen, Wenzheng Feng, Siming He, Chang Zhou, Jianguo Jiang, Yuxiao Dong, and Jie Tang. 2021. Are we really making much progress? revisiting, benchmarking and refining heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 1150–1160.
  • Ma et al. (2019) Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C Ho, Li Xiong, and Xiaoqian Jiang. 2019. Privacy-preserving tensor factorization for collaborative health data analysis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1291–1300.
  • Maurya et al. (2022) Sunil Kumar Maurya, Xin Liu, and Tsuyoshi Murata. 2022. Simplifying approach to node classification in Graph Neural Networks. Journal of Computational Science 62 (2022), 101695.
  • Nguyen et al. (2015) Hiep H Nguyen, Abdessamad Imine, and Michaël Rusinowitch. 2015. Differentially private publication of social graphs at linear cost. In 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 596–599.
  • Sajadmanesh and Gatica-Perez (2021) Sina Sajadmanesh and Daniel Gatica-Perez. 2021. Locally private graph neural networks. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2130–2145.
  • Sajadmanesh et al. (2023) Sina Sajadmanesh, Ali Shahin Shamsabadi, Aurélien Bellet, and Daniel Gatica-Perez. 2023. GAP: Differentially Private Graph Neural Networks with Aggregation Perturbation. In 32nd USENIX Security Symposium.
  • Shchur et al. (2018) Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. 2018. Pitfalls of graph neural network evaluation. Relational Representation Learning Workshop (R2L 2018), NeurIPS (2018).
  • Wu et al. (2021) Bang Wu, Xiangwen Yang, Shirui Pan, and Xingliang Yuan. 2021. Adapting membership inference attacks to GNN for graph classification: approaches and implications. In 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 1421–1426.
  • Wu et al. (2020) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems (2020).
  • Xiao et al. (2014) Qian Xiao, Rui Chen, and Kian-Lee Tan. 2014. Differentially private network data release via structural inference. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 911–920.
  • Yang et al. (2020) Carl Yang, Haonan Wang, Lichao Sun, and Bo Li. 2020. Secure Network Release with Link Privacy. arXiv:2005.00455 (2020).
  • Zhang et al. (2015) Jun Zhang, Graham Cormode, Cecilia M Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2015. Private release of graph statistics using ladder functions. In Proceedings of the 2015 ACM SIGMOD international conference on management of data. 731–745.
  • Zhang et al. (2012) Jun Zhang, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, and Marianne Winslett. 2012. Functional Mechanism: Regression Analysis under Differential Privacy. Proceedings of the VLDB Endowment 5, 11 (2012).
  • Zhang et al. (2021b) Qiuchen Zhang, Jing Ma, Jian Lou, and Li Xiong. 2021b. Private stochastic non-convex optimization with improved utility rates. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence.
  • Zhang et al. (2020) Qiuchen Zhang, Jing Ma, Yonghui Xiao, Jian Lou, and Li Xiong. 2020. Broadening differential privacy for deep learning against model inversion attacks. In 2020 IEEE International Conference on Big Data (Big Data). IEEE, 1061–1070.
  • Zhang et al. (2021c) Shijie Zhang, Hongzhi Yin, Tong Chen, Zi Huang, Lizhen Cui, and Xiangliang Zhang. 2021c. Graph Embedding for Recommendation against Attribute Inference Attacks. arXiv:2101.12549 (2021).
  • Zhang et al. (2022) Zhikun Zhang, Min Chen, Michael Backes, Yun Shen, and Yang Zhang. 2022. Inference attacks against graph neural networks. In 31st USENIX Security Symposium (USENIX Security 22). 4543–4560.
  • Zhang et al. (2021a) Zaixi Zhang, Qi Liu, Zhenya Huang, Hao Wang, Chengqiang Lu, Chuanren Liu, and Enhong Chen. 2021a. Graphmi: Extracting private graph data from graph neural networks. arXiv preprint arXiv:2106.02820 (2021).

Appendix A Proof for Theorem 2

Proof.

We first consider the privacy loss of outputting the noisy APPR vector $\tilde{\mathbf{p}}_{(v_i)}'$ for node $v_i$ in Algorithm 1. For each element in the APPR vector, we use its value as its utility score. Since each element is nonnegative and clipped by the constant $C_2$, the $\ell_1$ sensitivity $\Delta(u)$ of each element is equal to $C_2$. By adding the one-shot Gumbel noise $\textit{Gumbel}(\beta\mathbf{I})$ where $\beta = C_2/\epsilon$ to the clipped APPR vector $\tilde{\mathbf{p}}(v_i)$, option I selects $K$ indices with the largest noisy values and satisfies $(\epsilon_1,\delta)$-DP where $\epsilon_1 = 2\cdot\min\left\{K\epsilon,\; K\epsilon\left(\frac{e^{2\epsilon}-1}{e^{2\epsilon}+1}\right) + \epsilon\sqrt{2K\ln(1/\delta)}\right\}$ according to Corollary 1. Option II uses the Laplace mechanism (Dwork et al., 2014) to report $K$ selected noisy values. By adding Laplace noise $\textit{Laplace}(K C_2/\epsilon_2)$ to each clipped element, option II costs an additional $\epsilon_2$ privacy budget (Dwork et al., 2014) since the $\ell_1$ sensitivity of each element is $C_2$, and satisfies $(\epsilon_1+\epsilon_2,\delta)$-DP.

Now we consider the privacy loss of Algorithm 1, which outputs $M$ noisy APPR vectors. We use the optimal composition theorem in (Kairouz et al., 2017), which states that for $k$ sub-mechanisms, each with an $(\epsilon,\delta)$-DP guarantee, the overall privacy guarantee is $(\epsilon_g,\delta_g)$, where $\epsilon = \epsilon_g/\left(2\sqrt{k\ln(e+\epsilon_g/\delta_g)}\right)$ and $\delta = \delta_g/(2k)$. By substituting $M$ for $k$ and $\epsilon_1$ / $\epsilon_1+\epsilon_2$ (option I/option II) for $\epsilon$, the privacy loss of Algorithm 1 with option I is $(\epsilon_{g_1}, 2M\delta)$, where $\epsilon_1 = \epsilon_{g_1}/\left(2\sqrt{M\ln\left(e+\epsilon_{g_1}/(2M\delta)\right)}\right)$, and the privacy loss of Algorithm 1 with option II is $(\epsilon_{g_2}, 2M\delta)$, where $\epsilon_1+\epsilon_2 = \epsilon_{g_2}/\left(2\sqrt{M\ln\left(e+\epsilon_{g_2}/(2M\delta)\right)}\right)$. ∎
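As a numerical illustration of the accounting in this proof, the sketch below evaluates the per-vector bound $\epsilon_1$ and inverts the composition relation to obtain a per-vector budget from a target global budget. It transcribes the two formulas as stated above and is not the authors' code; the function names are hypothetical. The example values $\epsilon_g = 4$ and $\delta_g = 10^{-3}$ are illustrative and not taken from the experiments, while $M = 70$ is the default stated in Section 4.3.

```python
import math


def em_epsilon_per_vector(eps: float, k: int, delta: float) -> float:
    """epsilon_1 for selecting k indices, as stated in the proof above."""
    frac = (math.exp(2 * eps) - 1) / (math.exp(2 * eps) + 1)
    composed = k * eps * frac + eps * math.sqrt(2 * k * math.log(1 / delta))
    return 2 * min(k * eps, composed)


def per_mechanism_budget(eps_g: float, delta_g: float, k: int):
    """Per-mechanism (eps, delta) so that k mechanisms compose to (eps_g, delta_g)."""
    eps = eps_g / (2 * math.sqrt(k * math.log(math.e + eps_g / delta_g)))
    delta = delta_g / (2 * k)
    return eps, delta


# Illustrative values only: global budget (4, 1e-3) spread over M = 70 vectors.
eps, delta = per_mechanism_budget(eps_g=4.0, delta_g=1e-3, k=70)
print(f"per-vector budget: eps={eps:.4f}, delta={delta:.2e}")
print(f"epsilon_1 for eps=0.1, K=2: {em_epsilon_per_vector(0.1, 2, delta):.4f}")
```

The per-vector $\epsilon$ shrinks roughly like $1/\sqrt{M}$, which gives a sense of how the number of sampled nodes $M$ affects the noise added to each vector.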

Appendix B Gaussian Mechanism (DP-APPR-GM)

We propose another DP APPR algorithm (DP-APPR-GM) based on the Gaussian mechanism (Dwork et al., 2014) and output perturbation. DP-APPR-GM utilizes a similar sampling and clipping strategy to limit the sensitivity of the APPR vector and directly adds Gaussian noise to each element to achieve DP. As shown in Algorithm 3, for each node $v$, we clip the $\ell_2$ norm of its APPR vector $\mathbf{p}_{(v)}$ (line 6) and add the calibrated Gaussian noise to each element in the clipped $\mathbf{p}_{(v)}$ (line 8). We then select the top-$K$ largest entries in $\tilde{\mathbf{p}}_{(v)}$ to get a sparse vector $\tilde{\mathbf{p}}_{(v)}'$ (line 10).
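The three steps described above (clip, perturb, sparsify) can be sketched for a single node as follows. This is a hedged illustration rather than Algorithm 3 itself: the sampling step is omitted, the noise scale uses the classic Gaussian-mechanism calibration $\sigma = C_1\sqrt{2\ln(1.25/\delta)}/\epsilon$ with an assumed $\ell_2$ sensitivity of $C_1$, and the function name `dp_appr_gm_row` is hypothetical.

```python
import math
import numpy as np


def dp_appr_gm_row(p, k, c1, eps, delta, rng=None):
    """Clip, add Gaussian noise, and keep the top-k entries of one APPR vector."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)

    # Step 1: clip the l2 norm to at most c1.
    norm = np.linalg.norm(p)
    clipped = p * min(1.0, c1 / norm) if norm > 0 else p.copy()

    # Step 2: add Gaussian noise to every element.
    # Assumption: classic calibration with l2 sensitivity c1; the sensitivity
    # actually used in Algorithm 3 may differ.
    sigma = c1 * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    noisy = clipped + rng.normal(0.0, sigma, size=clipped.shape)

    # Step 3: keep only the k largest noisy entries (post-processing).
    top = np.argsort(noisy)[::-1][:k]
    sparse = np.zeros_like(noisy)
    sparse[top] = noisy[top]
    return sparse


rng = np.random.default_rng(0)
p = np.array([0.5, 0.2, 0.2, 0.05, 0.05])  # toy APPR vector, illustrative only
print(dp_appr_gm_row(p, k=2, c1=0.01, eps=0.5, delta=1e-5, rng=rng))
```

Because the top-$K$ selection is applied after the noise is added, it acts as post-processing, which is consistent with a privacy guarantee that does not depend on $K$.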

Privacy Analysis of DP-APPR-GM. Using the properties of the Gaussian mechanism and the optimal composition theorem (Kairouz et al., 2017), we establish the overall privacy guarantee for the DP-APPR-GM algorithm. Note that the DP guarantee is independent of $K$, in contrast with DP-APPR-EM.

Theorem 1.

Let $\epsilon > 0$ and $\delta \in (0,1]$. Algorithm 3 is $(\epsilon_g, 2M\delta)$-differentially private, where $\epsilon = \epsilon_g/\left(2\sqrt{M\ln\left(e+\epsilon_g/(2M\delta)\right)}\right)$.

Proof.

We utilize the optimal composition theorem in (Kairouz et al., 2017), which states that for $k$ sub-mechanisms, each with an $(\epsilon,\delta)$-DP guarantee, the overall privacy guarantee is $(\epsilon_g,\delta_g)$-DP, where $\epsilon = \epsilon_g/\left(2\sqrt{k\ln(e+\epsilon_g/\delta_g)}\right)$ and $\delta = \delta_g/(2k)$. In Algorithm 3, the noisy APPR vector for each node independently satisfies $(\epsilon,\delta)$-DP by the Gaussian mechanism. Since the returned APPR matrix contains the noisy APPR vectors of $M$ nodes, the number of components for composition is $M$. Substituting $M$ for $k$ and $2M\delta$ for $\delta_g$ concludes the proof. ∎

Appendix C Proof for Theorem 3

Proof.

Denote by $\mu_0$ the Gaussian distribution with mean $0$ and variance $1$. Assume $\mathbb{D}'$ is the neighboring feature dataset of $\mathbb{D}$, which differs at $i^{\dagger}$ such that $\mathbf{x}_{i^{\dagger}}' \neq \mathbf{x}_{i^{\dagger}}$. Without loss of generality, we assume $\nabla f(\mathbf{x}_i) = \bm{0}$ for any $\mathbf{x}_i \in \mathbb{D}$, while $\nabla f(\mathbf{x}_{i^{\dagger}}') = \bm{e}_1$. Recall that the DP-APPR matrix is $\bm{\Pi}$, where $\bm{\Pi}_{i:}$ is the $i$-th row and the DP-APPR vector for node $i$, while $\bm{\Pi}_{:j}$ is the $j$-th column of $\bm{\Pi}$. In addition, we can assume that $\|\bm{\Pi}_{:j}\|_1 \leq \tau$ due to the clipping in line 3, for all $j = 1,\dots,N$, and denote by $\mu_{\tau}$ the Gaussian distribution with mean $\tau$ and variance 1. Then, we have $\mathbb{E}[\mathcal{G}(\mathbb{D})]$ and $\mathbb{E}\left[\mathcal{G}\left(\mathbb{D}'\right)\right]$ below,

(5)

\[
\begin{split}
\mathbb{E}[\mathcal{G}(\mathbb{D})] ={}& \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\notin\mathcal{N}(i^\dagger)} G_j\right] + \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\in\mathcal{N}(i^\dagger)} G_j\right] + \left[\frac{|\mathcal{B}|}{N} G_{i^\dagger}\right] \\
={}& \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\notin\mathcal{N}(i^\dagger)} \sum_{k\in\mathcal{N}(j)} \bm{\Pi}_{jk}\nabla f(\mathbf{x}_k)\right] \\
&+ \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\in\mathcal{N}(i^\dagger)} \left(\sum_{k\in\mathcal{N}(j)\backslash i^\dagger} \bm{\Pi}_{jk}\nabla f(\mathbf{x}_k) + \bm{\Pi}_{ji^\dagger}\nabla f(\mathbf{x}_{i^\dagger})\right)\right] \\
&+ \left[\frac{|\mathcal{B}|}{N}\left(\sum_{k\in\mathcal{N}(i^\dagger)\backslash i^\dagger} \bm{\Pi}_{i^\dagger k}\nabla f(\mathbf{x}_k) + \bm{\Pi}_{i^\dagger i^\dagger}\nabla f(\mathbf{x}_{i^\dagger})\right)\right],
\end{split}
\]

which indicates $\mathcal{G}(\mathbb{D}) \sim \mu_0$.

(6)

\[
\begin{split}
\mathbb{E}[\mathcal{G}(\mathbb{D}')] ={}& \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\notin\mathcal{N}(i^\dagger)} G_j\right] + \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\in\mathcal{N}(i^\dagger)} G_j'\right] + \left[\frac{|\mathcal{B}|}{N} G_{i^\dagger}'\right] \\
={}& \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\notin\mathcal{N}(i^\dagger)} \sum_{k\in\mathcal{N}(j)} \bm{\Pi}_{jk}\nabla f(\mathbf{x}_k)\right] \\
&+ \left[\frac{|\mathcal{B}|}{N}\sum_{j\neq i^\dagger,\, j\in\mathcal{N}(i^\dagger)} \left(\sum_{k\in\mathcal{N}(j)\backslash i^\dagger} \bm{\Pi}_{jk}\nabla f(\mathbf{x}_k) + \bm{\Pi}_{ji^\dagger}\nabla f(\mathbf{x}_{i^\dagger}')\right)\right] \\
&+ \left[\frac{|\mathcal{B}|}{N}\left(\sum_{k\in\mathcal{N}(i^\dagger)\backslash i^\dagger} \bm{\Pi}_{i^\dagger k}\nabla f(\mathbf{x}_k) + \bm{\Pi}_{i^\dagger i^\dagger}\nabla f(\mathbf{x}_{i^\dagger}')\right)\right] \\
={}& \mathbb{E}[\mathcal{G}(\mathbb{D})] + \frac{|\mathcal{B}|}{N}\sum_{j=1}^{N} \bm{\Pi}_{ji^\dagger}\left(\nabla f(\mathbf{x}_{i^\dagger}') - \nabla f(\mathbf{x}_{i^\dagger})\right) \\
={}& \mathbb{E}[\mathcal{G}(\mathbb{D})] + \frac{|\mathcal{B}|}{N}\left\|\bm{\Pi}_{:i^\dagger}\right\|_1 \leq \mathbb{E}[\mathcal{G}(\mathbb{D})] + \frac{|\mathcal{B}|}{N}\tau,
\end{split}
\]

which indicates $\mathcal{G}(\mathbb{D}') \sim \mu_0 + \frac{|\mathcal{B}|}{N}\mu_\tau$. In what follows, $\mu$ denotes this distribution and $q = |\mathcal{B}|/N$.

In the following, we quantify the divergence between $\mathcal{G}$ and $\mathcal{G}'$ by following the moments accountant (Abadi et al., 2016), where we show that $\mathbb{E}\left[\left(\frac{\mu(z)}{\mu_0(z)}\right)^{\lambda}\right] \leq \alpha$ and $\mathbb{E}\left[\left(\frac{\mu_0(z)}{\mu(z)}\right)^{\lambda}\right] \leq \alpha$ for some explicit $\alpha$. To do so, the following is to be bounded for $v_0$ and $v_1$.

(7)

\[
\mathbb{E}_{z\sim v_0}\left[\left(\frac{v_0(z)}{v_1(z)}\right)^{\lambda}\right] = \mathbb{E}_{z\sim v_1}\left[\left(\frac{v_0(z)}{v_1(z)}\right)^{\lambda+1}\right]
\]

Following (Abadi et al., 2016), the above can be expanded with the binomial expansion, which gives

(8)

\[
\begin{array}{l}
\mathbb{E}_{z\sim v_1}\left[\left(\frac{v_0(z)}{v_1(z)}\right)^{\lambda+1}\right] = \sum_{t=0}^{\lambda+1}\binom{\lambda+1}{t}\,\mathbb{E}_{z\sim v_1}\left[\left(\frac{v_0(z)-v_1(z)}{v_1(z)}\right)^{t}\right] \\
= 1 + 0 + T_3 + T_4 + \ldots
\end{array}
\]

Next, we bound $T_3$ by substituting the pairs $v_0=\mu_0, v_1=\mu$ and $v_0=\mu, v_1=\mu_0$, and upper bounding each case in turn.

For $T_3$, with $v_0=\mu_0, v_1=\mu$, we have

(9)

\[
\begin{aligned}
T_3 &= \frac{(\lambda+1)\lambda}{2}\,\mathbb{E}_{z\sim\mu}\left[\left(\frac{\mu_0(z)-\mu(z)}{\mu(z)}\right)^{2}\right] = \frac{(\lambda+1)\lambda}{2}\,\mathbb{E}_{z\sim\mu}\left[\left(\frac{q\mu_\tau(z)}{\mu(z)}\right)^{2}\right] \\
&= \frac{q^2(\lambda+1)\lambda}{2}\int_{-\infty}^{+\infty}\frac{\left(\mu_\tau(z)\right)^2}{\mu_0(z)+q\mu_\tau(z)}\,dz \leq \frac{q^2(\lambda+1)\lambda}{2}\int_{-\infty}^{+\infty}\frac{\left(\mu_\tau(z)\right)^2}{\mu_0(z)}\,dz \\
&= \frac{q^2(\lambda+1)\lambda}{2}\,\mathbb{E}_{z\sim\mu_0}\left[\left(\frac{\mu_\tau(z)}{\mu_0(z)}\right)^{2}\right] = \frac{q^2(\lambda+1)\lambda}{2}\exp\left(\frac{\tau^2}{\sigma^2}\right) \\
&\leq \frac{q^2(\lambda+1)\lambda}{2}\left(\frac{\tau^2}{\sigma^2}+1\right) \leq \frac{q^2\tau^2(\lambda+1)\lambda}{\sigma^2},
\end{aligned}
\]

where in the last inequality, we assume $\frac{\tau^2}{\sigma^2}+1 \leq 2\frac{\tau^2}{\sigma^2}$, i.e., $\frac{\tau^2}{\sigma^2} \geq 1$. Thus, it requires $\sigma \leq \tau$.
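The closed form used in Eq. (9), $\mathbb{E}_{z\sim\mu_0}[(\mu_\tau(z)/\mu_0(z))^2] = \exp(\tau^2/\sigma^2)$, can be checked numerically. The sketch below is only an illustration: it assumes that both $\mu_0$ and $\mu_\tau$ are Gaussians with standard deviation $\sigma$ (means $0$ and $\tau$), and the function names are hypothetical.

```python
import math


def log_gaussian_pdf(z: float, mean: float, sigma: float) -> float:
    """Log-density of N(mean, sigma^2) at z."""
    return -((z - mean) ** 2) / (2.0 * sigma ** 2) - math.log(sigma * math.sqrt(2.0 * math.pi))


def second_moment_ratio(tau: float, sigma: float, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of mu_tau(z)^2 / mu_0(z) over z.

    Assumption: mu_0 = N(0, sigma^2) and mu_tau = N(tau, sigma^2).
    The integrand is a scaled Gaussian bump centred at 2 * tau, so the
    integration window is placed around that point.
    """
    lo = 2.0 * tau - 12.0 * sigma
    hi = 2.0 * tau + 12.0 * sigma
    dz = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        # Work in log space so that tiny densities do not underflow to zero.
        log_integrand = 2.0 * log_gaussian_pdf(z, tau, sigma) - log_gaussian_pdf(z, 0.0, sigma)
        total += math.exp(log_integrand)
    return total * dz


if __name__ == "__main__":
    tau, sigma = 1.0, 2.0
    print(second_moment_ratio(tau, sigma), math.exp(tau ** 2 / sigma ** 2))
```

Both printed values should agree up to numerical integration error for any reasonable choice of `tau` and `sigma`.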

As a result,

(10) \[ \alpha_{\mathcal{G}}(\lambda) \leq \frac{q^2\tau^2(\lambda+1)\lambda}{\sigma^2} + O\left(q^3\lambda^3/\sigma^3\right). \]

To satisfy $T\frac{q^2\tau^2\lambda^2}{\sigma^2} \leq \frac{\lambda\epsilon_{sgd}}{2}$ and $\exp\left(-\frac{\lambda\epsilon_{sgd}}{2}\right) \leq \delta_{sgd}$, we set

(11) \[ \epsilon_{sgd} = c_1 q^2\tau^2 T, \]
(12) \[ \sigma = c_2\frac{q\tau\sqrt{T\log(1/\delta_{sgd})}}{\epsilon_{sgd}}. \]

Given that the input DP-APPR matrix costs an additional $(\epsilon_{pr},\delta_{pr})$ privacy budget, by the standard composition theorem of DP, the total privacy budget for the sampled graph $G$ is $(\epsilon_{sgd}+\epsilon_{pr},\delta_{sgd}+\delta_{pr})$. Since $G$ is randomly sampled from the graph dataset $\overline{G}$, we can conclude the proof with the privacy amplification theorem of DP (Kasiviswanathan et al., 2011; Beimel et al., 2014). ∎
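As a small worked illustration of Eq. (12) and of the final composition step, the following sketch evaluates the noise scale and adds up the two budgets. The constant `c2` is left unspecified in the proof, so the default value of 1.0 is a placeholder assumption, and the function names are hypothetical.

```python
import math
from typing import Tuple


def sgd_noise_std(q: float, tau: float, num_steps: int,
                  eps_sgd: float, delta_sgd: float, c2: float = 1.0) -> float:
    """Noise scale from Eq. (12): c2 * q * tau * sqrt(T * log(1/delta)) / eps.

    c2 = 1.0 is a placeholder; the proof only states that some constant exists.
    """
    return c2 * q * tau * math.sqrt(num_steps * math.log(1.0 / delta_sgd)) / eps_sgd


def total_budget(eps_sgd: float, delta_sgd: float,
                 eps_pr: float, delta_pr: float) -> Tuple[float, float]:
    """Standard (basic) composition of the DP-SGD and DP-APPR budgets."""
    return eps_sgd + eps_pr, delta_sgd + delta_pr


if __name__ == "__main__":
    sigma = sgd_noise_std(q=0.1, tau=1.0, num_steps=100, eps_sgd=1.0, delta_sgd=1e-3)
    eps, delta = total_budget(1.0, 1e-3, 0.5, 1e-3)
    print(f"sigma = {sigma:.4f}, total budget = ({eps}, {delta})")
```

The sketch stops at the composed budget for the sampled graph; the amplification step that concludes the proof is not modeled here.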

Input: ISTA hyperparameters $\gamma, \alpha, \rho$; privacy parameters $\epsilon$, $\delta$; clip bound $C_1$; a graph $(V,E)$ where $V=\{v_1,\dots,v_N\}$; an integer $K>0$; and an integer $M\in[1,N]$.
Initialize the APPR matrix $\bm{\Pi}\in\mathbb{R}^{M\times N}$ with all zeros.
for $i=1,\dots,M$ do
      Compute APPR Vector:
      Compute the APPR vector $\mathbf{p}_{(v_i)}$ for node $v_i$ using ISTA;
      Clip Norm:
      $\hat{\mathbf{p}}_{(v_i)} \leftarrow \mathbf{p}_{(v_i)} / \max\left(1, \frac{\|\mathbf{p}_{(v_i)}\|_2}{C_1}\right)$;
      Add Noise:
      $\tilde{\mathbf{p}}_{(v_i)} \leftarrow \hat{\mathbf{p}}_{(v_i)} + \mathcal{N}(0,\sigma^2\mathbf{I})$, where $\sigma=\sqrt{2\ln(1.25/\delta)}\,C_1/\epsilon$;
      Sparsification:
      $\tilde{\mathbf{p}}_{(v_i)}' \leftarrow$ keep the top-$K$ largest entries of $\tilde{\mathbf{p}}_{(v_i)}$ and set all other entries to zero;
      Replace the $i$-th row of $\bm{\Pi}$ with $\tilde{\mathbf{p}}_{(v_i)}'$.
end for
return $\bm{\Pi}$ and compute the overall privacy cost using the optimal composition theorem.
Algorithm 3 DP-APPR using the Gaussian Mechanism (DP-APPR-GM)
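The per-node steps of Algorithm 3 (clip, add Gaussian noise, keep the top-$K$ entries) can be sketched as follows. This is a minimal illustration rather than the reference implementation: it assumes the APPR vector has already been computed (the ISTA solver is not reproduced here), and the function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional


def dp_appr_gm_row(p: List[float], clip_bound: float, eps: float, delta: float,
                   top_k: int, rng: Optional[random.Random] = None) -> List[float]:
    """Privatize one precomputed APPR vector following the steps of Algorithm 3.

    Assumption: `p` is the APPR vector of one node, obtained elsewhere
    (for example with ISTA); only clipping, noising and sparsification happen here.
    """
    rng = rng or random.Random()

    # Clip Norm: rescale so that the L2 norm is at most clip_bound.
    norm = math.sqrt(sum(x * x for x in p))
    scale = max(1.0, norm / clip_bound)
    clipped = [x / scale for x in p]

    # Add Noise: Gaussian mechanism with sigma = sqrt(2 ln(1.25/delta)) * C1 / eps.
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * clip_bound / eps
    noisy = [x + rng.gauss(0.0, sigma) for x in clipped]

    # Sparsification: keep the top-K largest entries, zero out the rest.
    order = sorted(range(len(noisy)), key=lambda j: noisy[j], reverse=True)
    keep = set(order[:top_k])
    return [v if j in keep else 0.0 for j, v in enumerate(noisy)]


if __name__ == "__main__":
    row = dp_appr_gm_row([0.5, 0.3, 0.1, 0.1, 0.0], clip_bound=1.0,
                         eps=1.0, delta=1e-3, top_k=2, rng=random.Random(0))
    print(row)
```

Calling this function for each of the $M$ sampled nodes and stacking the results gives the rows of $\bm{\Pi}$; the overall privacy accounting is left out of the sketch.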

Appendix D Datasets

We evaluate our method on five graph datasets: Cora-ML (Bojchevski and Günnemann, 2018), which consists of academic research papers from various machine learning conferences and their citation relationships; Microsoft Academic graph (Shchur et al., 2018), which contains scholarly data from various sources and the relationships between them; CS and Physics (Shchur et al., 2018), which are co-authorship graphs; and Reddit (Hamilton et al., 2017), which is constructed from Reddit posts, where an edge connects two posts when the same user commented on both. Table 2 shows the statistics of the five datasets.

Table 2. Dataset statistics
Dataset     Cora-ML    MS Academic    CS         Reddit        Physics
Classes     7          15             15         8             8
Features    2,879      6,805          6,805      602           8,415
Nodes       2,995      18,333         18,333     116,713       34,493
Edges       8,416      81,894         327,576    46,233,380    495,924

Appendix E Illustration of Privacy Protection

To provide an intuitive illustration of the privacy protection provided by the DP trained models using our methods, we visualize the t-SNE clustering of training nodes’ embeddings generated by the private models with varying $\epsilon$ values in Figure 9 for the Cora-ML dataset. We omit the results for other datasets as they display a similar pattern leading to the same conclusion. The color of each node corresponds to the label of the node. We can observe that when the privacy budget is small ($\epsilon=1$), the model achieves strong privacy protection, thus it becomes hard to distinguish the training nodes belonging to different classes from each other. Meanwhile, when the privacy guarantee becomes weak ($\epsilon$ becomes larger), embeddings of nodes with the same class label are less obfuscated, hence gradually forming a cluster. This observation demonstrates that the privacy budget used in our proposed methods is correlated with the model’s ability to generate private node embeddings, and therefore also associated with the privacy protection effectiveness against adversaries utilizing the generated embeddings to carry out privacy attacks (Fredrikson et al., 2015; Li et al., 2020).

[Figure 9 panels: (a) DPAR-GM, $\epsilon=1$; (b) DPAR-GM, $\epsilon=4$; (c) DPAR-GM, $\epsilon=16$; (d) DPAR-EM1, $\epsilon=1$; (e) DPAR-EM1, $\epsilon=4$; (f) DPAR-EM1, $\epsilon=16$]
Figure 9. Cora-ML. Clustering of training nodes’ embeddings generated by private models with different privacy guarantees $\epsilon$ (fixed $\delta=2\times 10^{-3}$) and training methods.

Appendix F More Results on Effects of Privacy Parameters

Batch Size in DP-SGD ($B$). Figure 10 shows the impact of the batch size on the model’s test accuracy. According to Theorem 3, with a fixed privacy budget and number of epochs, the standard deviation of the Gaussian noise scales with the square root of the batch size, so larger batches incur more gradient noise. However, larger batches may also provide more accurate updates because they cover more nodes and more of the correlations among them. The two effects offset each other, so the curve remains relatively flat as long as the batch size is not too small.

Clipping Bound in DP-SGD ($C$). Figure 11 shows the effect of the gradient norm clipping bound $C$ in DP-SGD on the model’s test accuracy. The clipping bound affects the scale of the noise added to the gradients (linearly) as well as the optimization direction of the model parameters. A large clipping bound may add too much noise to the gradients, while a small clipping bound may undermine the gradients’ ability to serve as unbiased estimates. The result verifies this phenomenon. We use $C = 1$ for all datasets in our experiments.
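The role of $C$ can be illustrated with a minimal sketch of a generic DP-SGD gradient step (per-example clipping followed by Gaussian noise). This is not the paper's Algorithm 2: the function names, the `noise_multiplier` parameter, and the use of plain Python lists for gradients are assumptions made for illustration, and the sketch does not model how DPAR bounds node sensitivity.

```python
import math
import random


def clip_gradient(grad, clip_bound):
    """Rescale grad so that its L2 norm is at most clip_bound."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_bound / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_bound, noise_multiplier, rng):
    """Return the noisy average of clipped per-example gradients.

    The noise added to the summed gradient has standard deviation
    noise_multiplier * clip_bound, so it grows linearly with clip_bound.
    (noise_multiplier is a hypothetical parameter, not taken from the paper.)
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_bound)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * clip_bound
    batch_size = len(per_example_grads)
    return [(total[i] + rng.gauss(0.0, sigma)) / batch_size for i in range(dim)]


# Usage: two toy gradients, C = 1, and a fixed seed for reproducibility.
rng = random.Random(0)
noisy_grad = dp_sgd_gradient([[3.0, 4.0], [0.3, 0.4]], 1.0, 1.0, rng)
```

In this sketch, a gradient whose norm exceeds `clip_bound` is shrunk (which biases the update), while raising `clip_bound` raises the noise standard deviation proportionally, mirroring the trade-off described above.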

Number of Nodes in DP-APPR ($M$). During the DP-APPR algorithm, a subset of $M$ nodes is randomly sampled from the input training graph. Figure 12 illustrates the relationship between $M$ and test accuracy under different total privacy budgets ($\epsilon = 1$ and $\epsilon = 8$, with $\delta = 2 \times 10^{-3}$). As $M$ increases, the privacy budget allocated for calculating each DP-APPR vector decreases. This leads to more noise in each DP-APPR vector, which can adversely affect its utility and result in lower accuracy, as observed. However, too small an $M$ will also degrade the performance, since the sampled vectors will not contain enough information about the graph structure. In our experiments, we set $M = 70$ for all datasets.
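The trend that a larger $M$ leaves less budget, and hence more noise, per vector can be sketched with a back-of-the-envelope calculation. The sketch assumes the simplest sequential composition (an even split of $\epsilon$ and $\delta$ over $M$ mechanisms) and the classic Gaussian-mechanism noise scale, which is valid only for $\epsilon < 1$; the paper's actual privacy accounting may be tighter, and the budget of 0.5 assigned to DP-APPR here is a hypothetical value chosen for illustration.

```python
import math


def per_vector_budget(eps_total, delta_total, m):
    """Split a budget evenly over m mechanisms (basic sequential composition)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return eps_total / m, delta_total / m


def gaussian_sigma(eps, delta, sensitivity=1.0):
    """Classic Gaussian-mechanism noise scale; the bound holds only for 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("the classic bound requires 0 < eps < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


# Hypothetical DP-APPR budget of (0.5, 2e-3); unit sensitivity is also an assumption.
for m in (10, 70, 200):
    eps_i, delta_i = per_vector_budget(0.5, 2e-3, m)
    print(m, eps_i, gaussian_sigma(eps_i, delta_i))
```

Under these assumptions the per-vector noise scale grows faster than linearly in $M$, which is consistent with the accuracy drop reported for large $M$.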

Figure 10. Cora-ML. Batch size vs. model test accuracy. Fix total privacy budget $(\epsilon, \delta) = (8, 2 \times 10^{-3})$. $K = 4$ (left), $K = 16$ (right).

Appendix G Generalization to Various Types of Graphs

DPAR proposed in this paper focuses on homogeneous graphs, including both homophilous and non-homophilous graphs, and can be applied in various domains such as social networks, recommendation systems, knowledge graphs, drug discovery, and traffic network analysis. Additionally, DPAR holds the potential for generalization to diverse graph types, including dynamic graphs, heterogeneous graphs, and those with high-dimensional features. For instance, in dynamic graphs, DPAR’s decoupling strategy is well-regarded for its efficiency in addressing the high computational complexity often encountered in dynamic graph learning (Li et al., 2023; Hou et al., 2023). Consequently, we can adapt the existing framework of DPAR by integrating established temporal differential privacy mechanisms (Lv et al., 2021; Liu et al., 2022), which effectively manage specific challenges like temporal correlations among identical nodes across varying graph snapshots. In the context of heterogeneous graphs, prior research (Lv et al., 2021) demonstrates that homogeneous GNNs, like GCN and GAT, can process heterogeneous graphs by simply disregarding node and edge types. This finding suggests that extending DPAR to accommodate heterogeneous graphs, while concurrently implementing additional privacy safeguards for type information during type embedding learning, could yield favorable outcomes.

Appendix H Complexity of DPAR

DPAR has computational complexity that is linear in the number of nodes and the node feature dimension. We elaborate as follows. In DP-APPR (Algorithm 1 and Algorithm 3), we calculate the APPR vector using ISTA (Fountoulakis et al., 2019). Based on Theorem 3 in (Fountoulakis et al., 2019), the time complexity of ISTA for calculating the APPR vector depends only on the number of non-zeros of the calculated APPR vector, rather than on the size of the entire graph. For each APPR vector, the steps of clipping the norm, adding noise, and reporting noisy indexes have a worst-case time complexity that is linear in the number of nodes in the input graph. Since we calculate $M$ DP-APPR vectors, the overall time complexity of the DP-APPR algorithms is $O(MN) = O(N)$ for $N \gg M$, where $N$ is the number of nodes, which indicates linear time complexity. In Algorithm 2, where we train the DP-GNN models using the node feature vectors and the DP-APPR matrix, the model is a 2-layer MLP with each layer’s size equal to 32. Therefore, the time complexity of each iteration is mainly bounded by the node feature dimension $D$ ($D \gg 32$). In conclusion, the overall time complexity of DPAR is $O(N + D)$, which is linear in the number of nodes and the node feature dimension.
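The per-vector steps named above (clip the norm, add noise, report noisy indexes) can be sketched as follows. This is a simplified illustration and not Algorithm 1 or Algorithm 3: the choice of the L2 norm for clipping, the function name `noisy_top_k`, and the `sigma` parameter are assumptions, and a dense Python list stands in for the APPR vector.

```python
import heapq
import math
import random


def noisy_top_k(appr, k, clip_bound, sigma, rng):
    """Clip a vector, add Gaussian noise, and return the indexes of the k largest noisy entries."""
    # Clip the norm: one pass over the N entries (L2 norm is an assumption).
    norm = math.sqrt(sum(v * v for v in appr))
    scale = min(1.0, clip_bound / norm) if norm > 0 else 1.0
    # Add noise: a second pass over the N entries.
    noisy = [v * scale + rng.gauss(0.0, sigma) for v in appr]
    # Report the indexes: heapq.nlargest runs in O(N log k), near-linear for small k.
    return heapq.nlargest(k, range(len(noisy)), key=noisy.__getitem__)


# Usage: a toy 6-node vector with a fixed seed.
rng = random.Random(0)
top = noisy_top_k([0.1, 0.5, 0.05, 0.3, 0.0, 0.05], 2, 1.0, 0.01, rng)
```

Each step touches every entry a constant number of times, so one vector costs time proportional to $N$ (up to the $\log K$ factor of the selection), and repeating it for $M$ vectors gives the $O(MN)$ total stated above.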

(a) $K = 4$
(b) $K = 16$
Figure 11. Cora-ML. Relationship between the clipping bound of DP-SGD and model test accuracy. Fix total privacy budget $(\epsilon, \delta) = (8, 2 \times 10^{-3})$.

(a) $\epsilon = 1.0$
(b) $\epsilon = 8.0$
Figure 12. Cora-ML. Relationship between the number of nodes $M$ in DP-APPR vector calculation and model test accuracy. $K = 2$.