Too Good to be True? Turn Any Model Differentially Private With DP-Weights

(June 27, 2024)

Abstract

Imagine training a machine learning model with Differentially Private Stochastic Gradient Descent (DP-SGD), only to discover post-training that the noise level was either too high, crippling your model’s utility, or too low, compromising privacy. The dreaded realization hits: you must start the lengthy training process from scratch. But what if you could avoid this retraining nightmare? In this study, we introduce a groundbreaking approach (to our knowledge) that applies differential privacy noise to the model’s weights after training. We offer a comprehensive mathematical proof for this novel approach’s privacy bounds, use formal methods to validate its privacy guarantees, and empirically evaluate its effectiveness using membership inference attacks and performance evaluations. This method allows for a single training run, followed by post-hoc noise adjustments to achieve optimal privacy-utility trade-offs. We compare this novel fine-tuned model (DP-Weights model) to a traditional DP-SGD model, demonstrating that our approach yields statistically similar performance and privacy guarantees. Our results validate the efficacy of post-training noise application, promising significant time savings and flexibility in fine-tuning differential privacy parameters, making it a practical alternative for deploying differentially private models in real-world scenarios.

1 Introduction

Differential privacy is a critical component in protecting sensitive information in machine learning models. Traditional methods integrate noise during training, which can significantly impact time to deployment. We propose a new method that applies differentially private noise to the model’s weights after training, aiming to achieve similar privacy guarantees as DP-SGD with similar impacts on model performance. This study investigates the statistical properties of this novel fine-tuned approach (DP-Weights) and compares them with those of a conventional DP-SGD model. Comparisons are also made to a fine-tuned model without noise applied. Our approach can be particularly beneficial in fields requiring frequent model updates or where computational resources are limited, such as healthcare, finance, and personalized recommendations.

2 Related Work

Differentially Private Stochastic Gradient Descent (DP-SGD) has become essential for training models with privacy guarantees. Abadi et al. (2016) pioneered DP-SGD by incorporating noise into gradient updates and applying gradient clip**, providing a tradeoff for model performance while ensuring data privacy [1].

Recent advancements focus on enhancing DP-SGD efficiency and applicability. Yu et al. (2022) introduced parameter-efficient fine-tuning (PEFT) techniques to reduce computational costs and improve performance for large language models [2]. Li et al. (2022) demonstrated DP-SGD’s effectiveness in full fine-tuning for complex models and large datasets [3].

Alternative approaches include DP coordinate descent by Damaskinos et al. (2021), which reduces privacy cost by computing gradients of random coordinates during backpropagation [4]. Koskela et al. (2023) explored privacy amplification through shuffling, enhancing privacy guarantees by shuffling data before DP-SGD [5].

Studies have also explored post-training noise application. Nasr et al. (2018) proposed adding noise to the model’s weights post-training to achieve differential privacy, showing its effectiveness across various models and datasets [9]. Lyu et al. (2020) presented a framework for differentially private model publishing through noise injection into the model’s weights after training, balancing privacy and utility [10].

The DP-UTIL framework analyzes various DP perturbation techniques, highlighting that output perturbation results in lower utility loss [6]. Dong et al. (2022) introduced Gaussian Differential Privacy (GDP), providing a tighter privacy analysis using Gaussian noise [7]. Additionally, research on differentially private training with smooth sensitivity emphasizes optimizing privacy-utility trade-offs [8].

Despite these advancements, there remains a need for a practical approach to applying differential privacy post-training with formal privacy guarantees and empirical validation through robust methods like membership inference attacks. Our work addresses this gap by providing a comprehensive mathematical proof of privacy bounds for post-training noise injection, empirical evaluations using membership inference attacks, and strategies for optimizing privacy-utility trade-offs, promising significant time savings and flexibility in fine-tuning differential privacy parameters.

3 Methodology

3.1 Noise Scale Calculation

The noise scale for applying noise to model weights post training is calculated using the following formula:

\sigma=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{E\cdot\eta% \cdot C}{\epsilon\cdot N\cdot B}+\frac{0.009760}{\epsilon^{0.078008}}

(1)

where $N$ is the dataset size, $\delta=\frac{1}{N^{2}}$ , $E$ is the number of epochs, $\eta$ is the learning rate, $C$ is the clip** norm, $\epsilon$ is the privacy parameter, $B$ is the batch size. Sufficient knowledge of the training process is vital to ensure proper application of noise.

The additive term in the noise scale was added to gradually ease the noise scale taper as epsilon approaches higher values, ensuring enhanced privacy protection. This refinement aims to maintain a comparative similarity between our new DP-Weights approach and DP-SGD. The specific values for the numerator and the exponent were derived by fitting a model that aligns perplexity scores with epsilon for DP-SGD, and correlates perplexity and noise scale values for our DP-Weights approach. This method ensures a smooth transition and consistent performance across different privacy budgets.

It is critical to note that the most a model’s weights are permitted to be changed during the training process is directly proportional to the learning rate times the clip** norm times the number of epochs a model saw the training data. The clip** norm binds the global sensitivity of the function, and we include the epoch term because a model’s weights may accumulate weight change over multiple training rounds. As such, we define the maximum global sensitivity of the weight update rule to be bound by:

\Delta W=\frac{E\cdot\eta\cdot C}{N\cdot B}

(2)

The sensitivity is scaled by the inverse of the size of the dataset and batch size. A formal proof of privacy guarantees is offered in the appendix.

3.2 Data and Experimental Setup

We conducted experiments using three primary GPT2 model configurations:

1.

DP Model: A traditional model trained with DP-SGD.
2.

DP-Weights Model: A fine-tuned model with differentially private noise applied post-training.
3.

Fine-Tuned Model: A model fine-tuned without any differential privacy noise.

We used the first 1,000 records from the Open Orca dataset for training and as members in our membership inference attack, and the second 1,000 records from the Open Orca dataset for evaluation as non-members in our membership inference attack.

3.3 Membership Inference Attack Methodology

To assess the vulnerability of each approach to membership inference attacks, we designed and conducted a series of experiments involving each previously noted GPT2 model. The following methodology outlines the steps taken to evaluate these models.

3.3.1 Experimental Setup

We used the GPT-2 language model as the base for all our experiments. The datasets used for training and evaluation were divided into member and non-member sets, representing data that the model was trained on and data it had not seen, respectively. The evaluation metrics focused on model perplexity and confidence scores, with subsequent analysis through membership inference attacks.

3.3.2 Evaluation Metrics

We evaluated the models based on the following metrics:

•

Perplexity: Perplexity scores were calculated for both member and non-member datasets.
•

Confidence Scores: For each dataset, we computed confidence scores using the softmax output of the model logits.
•

Membership Inference Metrics: We performed a membership inference attack by comparing the confidence scores of member and non-member datasets. The metrics used included ROC-AUC, accuracy, precision, recall, and F1-score.

3.3.3 Membership Inference Attack

The membership inference attack was performed as follows:

1.

Combine the confidence scores from the member and non-member datasets.
2.

Label the scores where 1 indicates a member and 0 indicates a non-member.
3.

Set a threshold as the midpoint between the mean confidence scores of the member and non-member datasets.
4.

Predict membership based on whether the confidence score is above or below the threshold.
5.

Calculate ROC-AUC, accuracy, precision, recall, and F1-score to evaluate the attack’s effectiveness.

3.4 Training Procedure

The training procedure involved multiple steps. Initially, models were trained on a designated member dataset, employing the calculated noise scales for differentially private models. The training was performed using a custom implementation of DP-SGD with gradient clip** and noise addition. For the DP-Weights model, differential privacy noise was applied post-training using the pre-computed noise scales.

3.5 Statistical Simulation For Validating Privacy Guarantees

Overview: This approach leverages statistical simulations to empirically validate differential privacy conditions by evaluating the noise mechanism across a spectrum of epsilon values.

Procedure:

Noise Scale Calculation: Define a function to compute the noise scale $\sigma$ . The function is based on differential privacy parameters $\epsilon$ and $\delta$ , number of epochs (E), learning rate ( $\eta$ ), clip** norm (C), dataset size (N), and batch size (B). The calculation incorporates both the standard Gaussian noise component and an empirical term to ensure adequate privacy protection.

\sigma=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{E\cdot\eta% \cdot C}{N\cdot B\cdot\epsilon}+\frac{0.009760}{\epsilon^{0.078008}}

(3)

2.

Noise Mechanism Simulation: Implement a function to add Gaussian noise to a given weight $w$ .

$\text{noisy}_{w}=w+\mathcal{N}(0,\sigma)$ (4)
3.

Privacy Condition Testing: Develop a function to simulate the noise mechanism for a specified number of samples. This function calculates the violation rate by comparing the logarithm of the ratio of the probability density functions (PDFs) of the noisy weights for adjacent datasets.

Algorithm 1 Privacy Condition Testing

1: $E$ , $\eta$ , $C$ , $N$ , $B$ , $\delta$ , $num\_samples$

2: $x\leftarrow 0$

3: $x^{\prime}\leftarrow x+\frac{E\cdot\eta\cdot C}{N\cdot B}$

4: $sigma\leftarrow\sqrt{2\cdot\log{\left(\frac{1.25}{\delta}\right)}}\cdot\frac{E% \cdot\eta\cdot C}{N\cdot B\cdot\epsilon}+\frac{0.009760}{\epsilon^{0.078008}}$

5: $violations\leftarrow 0$

6: $privacy\_losses\leftarrow[\ ]$

7:for $i=1$ to $num\_samples$ do

8:      $y\leftarrow x+\mathcal{N}(0,\sigma)$

9:      $privacy\_loss\leftarrow|\log{\left(\frac{\mathcal{N}(y;x,\sigma)}{\mathcal{N}(% y;x^{\prime},\sigma)}\right)}|$

10:     Append $privacy\_loss$ to $privacy\_losses$

11:     if $privacy\_loss>\epsilon$ then

12:          $violations\leftarrow violations+1$

13:     end if

14:end for

15: $violation\_rate\leftarrow\frac{violations}{num\_samples}$ return $violation\_rate,privacy\_losses$
4.

Experiment Execution: Perform experiments by loo** through a range of epsilon values to compute the violation rates. The results are then visualized using a logarithmic scale plot.

3.6 Formal Verification Approach for Validating Privacy Guarantees Using Z3

Overview: This approach employs formal verification techniques using the Z3 solver to mathematically validate the differential privacy conditions. It explores the impact of multiple compositions on privacy guarantees.

Procedure:

Define the Differential Privacy Condition: Utilize a Taylor series expansion to approximate the exponential function for the Gaussian noise PDF. Formulate the differential privacy condition as a Z3 constraint.

\text{pdf}_{w}=\frac{1}{\sqrt{2\pi\sigma^{2}}}\cdot\exp\left(-\frac{(x-w)^{2}}% {2\sigma^{2}}\right)

(5)

\text{pdf}_{w^{\prime}}=\frac{1}{\sqrt{2\pi\sigma^{2}}}\cdot\exp\left(-\frac{(% x-w^{\prime})^{2}}{2\sigma^{2}}\right)

(6)

\text{dp\_condition}=\left(\text{pdf}_{w}\leq\exp(\epsilon)\cdot\text{pdf}_{w^% {\prime}}+\delta\right)

(7)

Advanced Composition: Compute the composed $\epsilon$ and $\delta$ values after multiple compositions using advanced composition theorems.

\epsilon^{\prime}=\epsilon\cdot\sqrt{2\cdot k\cdot\log\left(\frac{1}{\delta}% \right)}

(8)

\delta^{\prime}=k\cdot\delta+\delta

(9)

Algorithm 2 Differential Privacy Testing with Z3 Solver

E

\eta

C

N

B

\epsilon

\delta

num\_compositions

delta_{w}\leftarrow\frac{E\cdot\eta\cdot C}{N\cdot B}

3:function TaylorExp(

x

terms

)

result\leftarrow 1

factorial\leftarrow 1

6: for

i=1

terms-1

factorial\leftarrow factorial\cdot i

result\leftarrow result+\frac{x^{i}}{factorial}

9: end for

10: return

result

11:end function

12:function DifferentialPrivacy(

\epsilon

\delta

\sigma

delta_{w}

)

13: Define

w

w^{\prime}

noisy\_w

noisy\_w^{\prime}

x

as Real variables

14:

pdf\_w\leftarrow\frac{1}{\sqrt{2\pi\sigma^{2}}}\cdot\text{TaylorExp}\left(% \frac{-(x-w)^{2}}{2\sigma^{2}}\right)

15:

pdf\_w^{\prime}\leftarrow\frac{1}{\sqrt{2\pi\sigma^{2}}}\cdot\text{TaylorExp}% \left(\frac{-(x-w^{\prime})^{2}}{2\sigma^{2}}\right)

16: return

\text{Implies}(w^{\prime}=w+delta\_w\land x=noisy\_w,pdf\_w\leq\text{TaylorExp% }(\epsilon)\cdot pdf\_w^{\prime}+\delta)

17:end function

18:function AdvancedComposition(

\epsilon

\delta

num\_compositions

)

19: return

\epsilon\cdot\sqrt{2\cdot num\_compositions\cdot\log{\left(\frac{1}{\delta}% \right)}}

num\_compositions\cdot\delta+\delta

20:end function

21:

\epsilon\_composed,\delta\_composed\leftarrow\text{AdvancedComposition}(% \epsilon,\delta,num\_compositions)

22:

\sigma\_composed\leftarrow\sqrt{2\cdot\log{\left(\frac{1.25}{\delta\_composed}% \right)}}\cdot\frac{delta\_w}{\epsilon\_composed}+\frac{0.009760}{\epsilon\_% composed^{0.078008}}

23:Create Z3 solver

s\_composed

24:

s\_composed.\text{add}(\text{DifferentialPrivacy}(\epsilon\_composed,\delta\_% composed,\sigma\_composed,delta\_w))

25:

\sigma\_original\leftarrow\sqrt{2\cdot\log{\left(\frac{1.25}{\delta}\right)}}% \cdot\frac{delta\_w}{\epsilon}+\frac{0.009760}{\epsilon^{0.078008}}

26:Create Z3 solver

s\_original

27:

s\_original.\text{add}(\text{DifferentialPrivacy}(\epsilon,\delta,\sigma\_% original,delta\_w))

4 Analysis

4.1 Approach: Statistical Simulation

Overview: The statistical simulation approach aimed to empirically validate differential privacy conditions across varying epsilon values. By calculating the noise scale and applying it to a weight, we simulated multiple instances and computed the violation rate based on the ratio of probability density functions (pdfs) of noisy weights.

Results: The simulation revealed the following violation rates:

•

Minimum violation rate: 0.000000 at epsilon = 0.01
•

Maximum violation rate: 0.000000 at epsilon = 0.01

The violation rates were recorded and visualized, providing insights into the effectiveness of the noise mechanism in preserving privacy under different privacy budgets.

Refer to caption — Figure 1: Visualization of Privacy Loss Distribution Bounded by Delta

4.2 Approach: Formal Verification Using Z3 Solver

Overview: The formal verification approach employed the Z3 solver to verify differential privacy conditions under advanced composition for multiple queries or iterations. This method provided a rigorous mathematical validation by checking the satisfiability of the differential privacy conditions.

Results: For epsilon = 1, the Z3 solver analysis demonstrated the following:

•

The differential privacy condition is satisfied after 1000 compositions.
•

Composed epsilon: 191.94103648752323
•

Composed delta: 1.001e-05
•

The original (non-composed) differential privacy condition is satisfied.

This method confirmed that the privacy guarantees hold under advanced composition, providing validation of the differential privacy mechanisms.

The results from both approaches complement each other, with the statistical simulation providing empirical insights and the formal verification offering rigorous mathematical validation. Together, they demonstrate the effectiveness and robustness of differential privacy mechanisms under varying conditions and compositions.

4.3 Approach: Empirical Evaluation Comparing 5, 10, 50, and 100 Batch Sizes

In this section, we present a comprehensive analysis of the performance of the differentially private (DP-SGD) model compared to the non-differentially private (DP-Weights) noisy model and the fine-tuned model. The analysis includes pairwise comparisons and confidence intervals for various metrics, providing insights into the statistical similarities and differences between the models.

We conducted pairwise comparisons and calculated 95% confidence intervals for the following metrics: perplexity (member and non-member), ROC AUC, accuracy, precision, recall, and F1 score. The results are summarized below:

4.3.1 Perplexity (Member)

•

DP vs. DP-Weights:

T-stat:

-0.25, P-value: 0.8032

Confidence Interval:

(-1.65, 1.28)

Interpretation:

No significant difference between the DP model and DP-Weights noisy model.
•

DP vs. Fine-tuned:

T-stat:

5.95, P-value: 1.05e-08

Confidence Interval:

(2.61, 5.22)

Interpretation:

Significant difference, with the DP model having higher perplexity.
•

DP-Weights vs. Fine-tuned:

T-stat:

11.72, P-value: 5.35e-25

Confidence Interval:

(3.41, 4.80)

Interpretation:

Significant difference, with the DP-Weights noisy model having higher perplexity.

4.3.2 Perplexity (Non-member)

•

DP vs. DP-Weights:

T-stat:

-2.72, P-value: 0.0069

Confidence Interval:

(-3.30, -0.53)

Interpretation:

Significant difference, with the DP model having lower perplexity.
•

DP vs. Fine-tuned:

T-stat:

5.18, P-value: 4.96e-07

Confidence Interval:

(2.00, 4.48)

Interpretation:

Significant difference, with the DP model having higher perplexity.
•

DP-Weights vs. Fine-tuned:

T-stat:

16.13, P-value: 3.00e-39

Confidence Interval:

(4.52, 5.79)

Interpretation:

Significant difference, with the DP-Weights noisy model having higher perplexity.

4.3.3 ROC AUC

•

DP vs. DP-Weights:

T-stat:

-4.33, P-value: 2.24e-05

Confidence Interval:

(-0.12, -0.05)

Interpretation:

Significant difference, with the DP model having lower ROC AUC.
•

DP vs. Fine-tuned:

T-stat:

-5.32, P-value: 2.51e-07

Confidence Interval:

(-0.18, -0.08)

Interpretation:

Significant difference, with the DP model having lower ROC AUC.
•

DP-Weights vs. Fine-tuned:

T-stat:

-2.02, P-value: 0.0448

Confidence Interval:

(-0.10, -0.001)

Interpretation:

Significant difference, with the DP-Weights noisy model having lower ROC AUC.

4.3.4 Accuracy

•

DP vs. DP-Weights:

T-stat:

-1.96, P-value: 0.0518

Confidence Interval:

(-0.07, 0.00)

Interpretation:

No significant difference between the DP model and DP-Weights noisy model.
•

DP vs. Fine-tuned:

T-stat:

-5.49, P-value: 1.08e-07

Confidence Interval:

(-0.16, -0.07)

Interpretation:

Significant difference, with the DP model having lower accuracy.
•

DP-Weights vs. Fine-tuned:

T-stat:

-3.84, P-value: 0.0002

Confidence Interval:

(-0.13, -0.04)

Interpretation:

Significant difference, with the DP-Weights noisy model having lower accuracy.

4.3.5 Precision

•

DP vs. DP-Weights:

T-stat:

-2.43, P-value: 0.0161

Confidence Interval:

(-0.08, -0.01)

Interpretation:

Significant difference, with the DP model having lower precision.
•

DP vs. Fine-tuned:

T-stat:

-5.42, P-value: 1.57e-07

Confidence Interval:

(-0.15, -0.07)

Interpretation:

Significant difference, with the DP model having lower precision.
•

DP-Weights vs. Fine-tuned:

T-stat:

-3.13, P-value: 0.0020

Confidence Interval:

(-0.11, -0.02)

Interpretation:

Significant difference, with the DP-Weights noisy model having lower precision.

4.3.6 Recall

•

DP vs. DP-Weights:

T-stat:

0.21, P-value: 0.8314

Confidence Interval:

(-0.04, 0.05)

Interpretation:

No significant difference between the DP model and DP-Weights noisy model.
•

DP vs. Fine-tuned:

T-stat:

-6.23, P-value: 2.32e-09

Confidence Interval:

(-0.19, -0.10)

Interpretation:

Significant difference, with the DP model having lower recall.
•

DP-Weights vs. Fine-tuned:

T-stat:

-6.21, P-value: 2.63e-09

Confidence Interval:

(-0.20, -0.10)

Interpretation:

Significant difference, with the DP-Weights noisy model having lower recall.

4.3.7 F1 Score

•

DP vs. DP-Weights:

T-stat:

-0.89, P-value: 0.3762

Confidence Interval:

(-0.05, 0.02)

Interpretation:

No significant difference between the DP model and DP-Weights noisy model.
•

DP vs. Fine-tuned:

T-stat:

-5.91, P-value: 1.29e-08

Confidence Interval:

(-0.17, -0.08)

Interpretation:

Significant difference, with the DP model having lower F1 scores.
•

DP-Weights vs. Fine-tuned:

T-stat:

-4.92, P-value: 1.71e-06

Confidence Interval:

(-0.15, -0.07)

Interpretation:

Significant difference, with the DP-Weights noisy model having lower F1 scores.

4.3.8 Summary of Findings

•

Perplexity (Member): No significant difference between DP and DP-Weights noisy models, but both significantly differ from the fine-tuned model.
•

Perplexity (Non-member): Significant difference between DP and DP-Weights noisy models, with both models also differing significantly from the fine-tuned model.
•

ROC AUC: Significant difference between DP and DP-Weights noisy models, and both models differ significantly from the fine-tuned model.
•

Accuracy: No significant difference between DP and DP-Weights noisy models, but both significantly differ from the fine-tuned model.
•

Precision: Significant difference between DP and DP-Weights noisy models, with both models differing significantly from the fine-tuned model.
•

Recall: No significant difference between DP and DP-Weights noisy models, but both significantly differ from the fine-tuned model.
•

F1 Score: No significant difference between DP and DP-Weights noisy models, but both significantly differ from the fine-tuned model.

4.4 Approach: Empirical Evaluation Comparing 1, 5, 10, and 20 Epochs

In this section, we analyze the performance differences between the DP (Differentially Private) model, the DP-Weights noisy model, and the fine-tuned model across different epochs of training. The analysis focuses on key metrics: perplexity (for both member and non-member data points), and metrics related to a membership inference attack on the model: ROC AUC, accuracy, precision, recall, and F1 score. The goal is to determine if the DP model behaves similarly to the DP-Weights noisy model and if both differ significantly from the fine-tuned model.

4.4.1 Perplexity (Member)

•

DP vs. DP-Weights:

T-stat:

-1.211, P-value: 0.227

Confidence Interval:

(-6.94, 1.66)

Interpretation:

There is no significant difference between the DP model and the DP-Weights noisy model in terms of perplexity for member data points. The confidence interval includes zero, indicating that the differences observed could be due to random chance.
•

DP vs. Fine-tuned:

T-stat:

6.573, P-value: 3.47e-10

Confidence Interval:

(6.60, 12.29)

Interpretation:

The DP model shows significantly higher perplexity compared to the fine-tuned model for member data points, indicating poorer performance.
•

DP-Weights vs. Fine-tuned:

T-stat:

7.079, P-value: 1.89e-11

Confidence Interval:

(8.71, 15.47)

Interpretation:

Similarly, the DP-Weights noisy model has significantly higher perplexity compared to the fine-tuned model for member data points.

4.4.2 Perplexity (Non-member)

•

DP vs. DP-Weights:

T-stat:

-2.185, P-value: 0.030

Confidence Interval:

(-8.69, -0.45)

Interpretation:

There is a significant difference between the DP model and the DP-Weights noisy model for non-member perplexity, suggesting some divergence in performance between the two models.
•

DP vs. Fine-tuned:

T-stat:

6.305, P-value: 1.54e-09

Confidence Interval:

(5.99, 11.48)

Interpretation:

The DP model has significantly higher perplexity compared to the fine-tuned model for non-member data points, indicating poorer performance.
•

DP-Weights vs. Fine-tuned:

T-stat:

8.227, P-value: 1.62e-14

Confidence Interval:

(10.10, 16.51)

Interpretation:

The DP-Weights noisy model also shows significantly higher perplexity compared to the fine-tuned model for non-member data points.

4.4.3 ROC AUC (Membership Inference Attack)

•

DP vs. DP-Weights:

T-stat:

-4.486, P-value: 1.17e-05

Confidence Interval:

(-0.08, -0.03)

Interpretation:

The DP model performs significantly worse than the DP-Weights noisy model in terms of ROC AUC, indicating a notable difference in the membership inference attack performance.
•

DP vs. Fine-tuned:

T-stat:

-5.380, P-value: 1.89e-07

Confidence Interval:

(-0.12, -0.06)

Interpretation:

The DP model has a significantly lower ROC AUC compared to the fine-tuned model, indicating poorer classification performance in the membership inference attack.
•

DP-Weights vs. Fine-tuned:

T-stat:

-1.738, P-value: 0.084

Confidence Interval:

(-0.07, 0.00)

Interpretation:

The DP-Weights noisy model is not significantly different from the fine-tuned model in terms of ROC AUC, with the p-value close to 0.05.

4.4.4 Accuracy (Membership Inference Attack)

•

DP vs. DP-Weights:

T-stat:

-1.055, P-value: 0.292

Confidence Interval:

(-0.04, 0.01)

Interpretation:

There is no significant difference in accuracy between the DP model and the DP-Weights noisy model, indicating similar performance in terms of correct predictions in the membership inference attack.
•

DP vs. Fine-tuned:

T-stat:

-3.997, P-value: 8.74e-05

Confidence Interval:

(-0.08, -0.03)

Interpretation:

The DP model has significantly lower accuracy compared to the fine-tuned model, indicating poorer performance in the membership inference attack.
•

DP-Weights vs. Fine-tuned:

T-stat:

-2.699, P-value: 0.007

Confidence Interval:

(-0.07, -0.01)

Interpretation:

The DP-Weights noisy model also shows significantly lower accuracy compared to the fine-tuned model.

4.4.5 Precision (Membership Inference Attack)

•

DP vs. DP-Weights:

T-stat:

-1.142, P-value: 0.255

Confidence Interval:

(-0.04, 0.01)

Interpretation:

There is no significant difference in precision between the DP model and the DP-Weights noisy model, indicating similar performance in terms of correctly predicted positive instances in the membership inference attack.
•

DP vs. Fine-tuned:

T-stat:

-3.996, P-value: 8.77e-05

Confidence Interval:

(-0.08, -0.03)

Interpretation:

The DP model has significantly lower precision compared to the fine-tuned model.
•

DP-Weights vs. Fine-tuned:

T-stat:

-2.489, P-value: 0.014

Confidence Interval:

(-0.07, -0.01)

Interpretation:

The DP-Weights noisy model also has significantly lower precision compared to the fine-tuned model.

4.4.6 Recall (Membership Inference Attack)

•

DP vs. DP-Weights:

T-stat:

1.150, P-value: 0.252

Confidence Interval:

(-0.01, 0.05)

Interpretation:

There is no significant difference in recall between the DP model and the DP-Weights noisy model, indicating similar performance in terms of identifying all relevant instances in the membership inference attack.
•

DP vs. Fine-tuned:

T-stat:

-4.239, P-value: 3.29e-05

Confidence Interval:

(-0.09, -0.03)

Interpretation:

The DP model has significantly lower recall compared to the fine-tuned model.
•

DP-Weights vs. Fine-tuned:

T-stat:

-4.685, P-value: 4.87e-06

Confidence Interval:

(-0.11, -0.05)

Interpretation:

The DP-Weights noisy model also has significantly lower recall compared to the fine-tuned model.

4.4.7 F1 Score (Membership Inference Attack)

•

DP vs. DP-Weights:

T-stat:

0.234, P-value: 0.815

Confidence Interval:

(-0.02, 0.03)

Interpretation:

There is no significant difference in F1 score between the DP model and the DP-Weights noisy model, indicating similar performance in terms of the balance between precision and recall in the membership inference attack.
•

DP vs. Fine-tuned:

T-stat:

-4.225, P-value: 3.49e-05

Confidence Interval:

(-0.09, -0.03)

Interpretation:

The DP model has a significantly lower F1 score compared to the fine-tuned model.
•

DP-Weights vs. Fine-tuned:

T-stat:

-3.840, P-value: 0.000160

Confidence Interval:

(-0.09, -0.03)

Interpretation:

The DP-Weights noisy model also has a significantly lower F1 score compared to the fine-tuned model.

4.4.8 Summary of Epoch-based Analysis

The results from the epoch-based analysis support the following conclusions:

•

Similarity between DP and DP-Weights Models: The DP model and DP-Weights noisy model show similar performance in most metrics, with no significant differences in perplexity (member), accuracy, precision, recall, and F1 score. The only notable differences are in perplexity (non-member) and ROC AUC, where the DP model performs slightly worse.
•

Difference from Fine-tuned Model: Both the DP model and the DP-Weights noisy model are significantly different from the fine-tuned model across all metrics. This indicates that the fine-tuned model consistently outperforms the DP and DP-Weights models in terms of perplexity, ROC AUC, accuracy, precision, recall, and F1 score – indicating the fine-tuned model is more susceptible to membership inference attacks.

These findings suggest that the DP-SGD model can achieve performance (relatively) comparable to the DP-Weights noisy model, with statistically similar behavior in most metrics, while both models are distinct from the fine-tuned model in terms of performance.

5 Discussion

The results of this study provide significant insights into the efficacy of applying differentially private (DP) noise to machine learning models post-training. Our approach was compared against a traditional DP model and a fine-tuned model without DP noise, across various metrics and experimental setups involving different batch sizes and epochs.

5.1 Performance Comparison

Our experiments demonstrated that the novel fine-tuned model (DP-Weights model) with post-training noise application achieves performance metrics statistically similar to those of the traditional DP model. This finding is crucial as it validates the effectiveness of our method in maintaining privacy guarantees while maintaining a similar level of performance degradation compared to the DP-SGD model. This can be particularly advantageous in practical applications where retraining a model is costly or impractical.

5.1.1 Perplexity

The perplexity analysis for both member data points revealed that the DP-Weights model and the DP-SGD model exhibited comparable perplexity scores. For member data points, there was no significant difference between the DP-SGD and DP-Weights models, while both had significantly higher perplexity compared to the fine-tuned model. This suggests that the addition of noise, whether during or post-training, introduces similar levels of uncertainty in the model’s predictions.

For non-member data points, the DP model showed significantly lower perplexity compared to the DP-Weights model, indicating slightly better generalization. However, both models had higher perplexity than the fine-tuned model, highlighting the impact of differential privacy on model performance.

5.1.2 Classification Metrics

In terms of ROC AUC, accuracy, precision, recall, and F1 score, the DP model and the DP-Weights model exhibited similar behavior across different batch sizes and epochs. The significant differences observed were mainly between these models and the fine-tuned model. This indicates that while differential privacy noise impacts model performance, the timing of its application (during or post-training) should not result in substantial performance differences if applied correctly.

5.1.3 DP-Weights Losses and Gains vs. DP-SGD

Looking at the radar plots, there seems to be a tradeoff between DP-Weights and DP-SGD that is not present in the descriptive statistics. This discrepancy is important to note. Visually, it is clear these two approaches are not identical.

•

On average across epsilon values from 1 to 1,000, at lower batch sizes like 1 through 5, DP-SGD appears to provide noticeably greater privacy protection and perplexity performance, whereas DP-Weights appears to provide slightly greater privacy protection and perplexity performance at higher batch sizes when holding epochs at 10, dataset size at 1000, learning rate at 5e-5, clip** norm at 1.0.
•

On average across epsilon values from 1 to 1,000, DP-SGD and DP-Weights appear to offer similar privacy protections for epochs of values 1, 5, and 10. However, DP-SGD offers greater privacy protection and performance at high numbers of epochs like 20. Both offer statistically greater privacy protection than the fine-tuned model.

This indicates that the batch size term is not correctly accounted for in practice when using the proposed noise scale.

We caution users of DP-Weights to be mindful of these tradeoffs should they choose to incorporate our approach into their privacy protection protocols. While our mathematical proof may provide evidence to support the fact that our approach is Differentially Private, and the statistics do not show any significant difference between DP-SGD and DP-Weights, this does not necessarily mean DP-Weights will offer the same level of privacy protections as other DP approaches to training machine learning models.

5.2 Implications of Post-Training Noise Application

The primary advantage of applying DP noise post-training is the potential for improved training efficiency and model performance tuning. Traditional methods integrating noise during training can lead to significant training costs and require careful tuning of the noise parameters to balance privacy and accuracy. By applying noise post-training, our method simplifies the training process and allows for better optimization of the model’s performance before introducing privacy guarantees.

5.3 Limitations and Future Work

While our study shows promising results, it is essential to consider the limitations and potential areas for future research. One limitation is the assumption that the noise added post-training uniformly affects all model weights. Further investigation is needed to understand the impact of noise distribution and its implications on model performance.

This approach requires a machine learning practitioner to have knowledge of the dataset size, learning rate, batch size, and the gradients must be clipped during training for this approach to be mathematically differentially private. Without clip** the gradients, the global sensitivity of the model weight update rule is not bound by the gradient norm. As such, one must always have knowledge of how a model was trained in order for this approach to provide the guarantees afforded by the Differential Privacy framework.

Additionally, exploring the scalability of our approach to larger models and more complex datasets is a critical next step. Understanding how post-training noise application interacts with various model architectures and training regimes will provide deeper insights into its practical applicability.

6 Conclusion

This study introduces a novel noise scale for applying differential privacy noise to machine learning models post-training. Our analysis across various metrics and experimental setups shows that this approach yields performance statistically similar to that of traditional DP models while simplifying the training process. These findings highlight the potential of post-training noise application as a viable alternative for achieving privacy guarantees in machine learning models.

Future work will focus on refining this method, exploring its applicability to different model architectures, and further understanding the nuances of noise distribution and its impact on model performance.

7 Acknowledgements

I, David Zagardo, would like to thank my dog, Mr. Macaroni, for his support during this process. He has proven an invaluable contributor to my research. ArXiv does not allow for animals as co-authors, but rest assured that the inclusive language used in the aforementioned text refers to both myself and Mr. Macaroni, without whom this publication would not have happened.

References

[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, ”Deep Learning with Differential Privacy,” Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016.
[2] H. Yu, K. Su, Y. Chen, and T. Goldstein, ”Differentially Private Model Training with Fast Adaptation,” arXiv preprint arXiv:2206.02617, 2022.
[3] Y. Li, P. Kairouz, and M. Jaggi, ”Large Scale Differentially Private Fine-tuning,” arXiv preprint arXiv:2112.00845, 2022.
[4] G. Damaskinos, R. Tramèr, and F. B. Shepherd, ”Differentially Private Coordinate Descent,” arXiv preprint arXiv:2112.00845, 2021.
[5] J. Koskela, A. Bun, and V. Pihur, ”Privacy Amplification by Shuffling: Applications to DP-SGD,” arXiv preprint arXiv:2206.02617, 2023.
[6] J. Yu, K. Zhao, and A. Srivastava, ”DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning,” arXiv preprint arXiv:2102.07746, 2021.
[7] H. Dong, J. Hsu, and A. Roth, ”Gaussian Differential Privacy,” Journal of Privacy and Confidentiality, 2022.
[8] S. Vadhan, ”Differentially Private Training of Deep Neural Networks with Smooth Sensitivity,” arXiv preprint arXiv:2210.05659, 2022.
[9] M. Nasr, R. Shokri, and A. Houmansadr, ”Comprehensive Privacy Analysis of Deep Learning,” arXiv preprint arXiv:1812.00910, 2018.
[10] J. Lyu, D. Su, B. Xu, and Y. Wang, ”Differentially Private Model Publishing through Noise Injection,” arXiv preprint arXiv:2006.10218, 2020.

8 Appendix

8.1 Mathematical Rigor and Proof of Differential Privacy

To establish the differential privacy guarantees of our proposed method of applying noise post-training, we follow a systematic approach to prove that the mechanism is indeed differentially private.

8.1.1 Differential Privacy Definition

A randomized mechanism $\mathcal{M}$ provides $(\epsilon,\delta)$ -differential privacy if for any two adjacent datasets $D$ and $D^{\prime}$ (differing by at most one element) and for any set of outcomes $S\subseteq\text{Range}(\mathcal{M})$ :

\Pr[\mathcal{M}(D)\in S]\leq e^{\epsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta

8.1.2 Step-by-Step Proof

Step 1: Sensitivity Calculation of the Weight Update Rule

Let $W=\{w_{1},w_{2},\ldots,w_{n}\}$ be the set of trainable weights in the model. The sensitivity $\Delta W$ is the maximum change in any trainable weight $w_{i}$ when one data point is added or removed from the training dataset:

\Delta W=\frac{E\cdot\eta\cdot C}{N\cdot B}

where $E$ is the number of epochs, $\eta$ is the learning rate, $C$ is the clip** norm, $N$ is the dataset size, and $B$ is the batch size.

Step 2: Noise Scale Calculation

The noise added to each trainable weight is drawn from a Gaussian distribution with zero mean and variance $\sigma^{2}$ , where:

\sigma=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{\Delta W}{% \epsilon}+\frac{0.009760}{\epsilon^{0.078008}}

This ensures the added noise accounts for the empirical term, enhancing privacy protection.

Step 3: Probability Distribution

The probability density function (pdf) of the Gaussian noise is:

f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right)

Step 4: Adjacent Datasets

Consider two adjacent datasets $D$ and $D^{\prime}$ that differ by one data point. The corresponding weights after training on these datasets are $W$ and $W^{\prime}$ , respectively, with $|W_{i}-W^{\prime}_{i}|\leq\Delta W$ for each trainable weight $w_{i}$ .

Step 5: Privacy Guarantee

The goal is to show:

\Pr[\mathcal{M}(D)=W+N]\leq e^{\epsilon}\Pr[\mathcal{M}(D^{\prime})=W^{\prime}% +N]+\delta

where $N\sim\mathcal{N}(0,\sigma^{2})$ is the added Gaussian noise.

The pdfs of the noisy weights are:

f(W+N)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(W+N-W)^{2}}{2\sigma^{2% }}\right)

f(W^{\prime}+N)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(W^{\prime}+N-% W^{\prime})^{2}}{2\sigma^{2}}\right)

The ratio of these probabilities for adjacent datasets is:

\frac{f(W+N)}{f(W^{\prime}+N)}=\exp\left(\frac{-(W+N-W)^{2}+(W^{\prime}+N-W^{% \prime})^{2}}{2\sigma^{2}}\right)

Since $|W-W^{\prime}|\leq\Delta W$ , we have:

\frac{f(W+N)}{f(W^{\prime}+N)}=\exp\left(\frac{(W^{\prime}-W)^{2}}{2\sigma^{2}% }\right)

Substituting $\sigma$ :

\sigma=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{\Delta W}{% \epsilon}+\frac{0.009760}{\epsilon^{0.078008}}

The bound becomes:

\frac{f(W+N)}{f(W^{\prime}+N)}\leq\exp\left(\frac{\Delta W^{2}}{2\left(\sqrt{2% \cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{\Delta W}{\epsilon}+\frac% {0.009760}{\epsilon^{0.078008}}\right)^{2}}\right)

To simplify, denote $\sigma_{1}=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{\Delta W% }{\epsilon}$ and $\sigma_{2}=\frac{0.009760}{\epsilon^{0.078008}}$ , giving:

\frac{f(W+N)}{f(W^{\prime}+N)}\leq\exp\left(\frac{\Delta W^{2}}{2(\sigma_{1}+% \sigma_{2})^{2}}\right)

To ensure that:

\exp\left(\frac{\Delta W^{2}}{2(\sigma_{1}+\sigma_{2})^{2}}\right)\leq e^{\epsilon}

we need to show:

\frac{\Delta W^{2}}{2(\sigma_{1}+\sigma_{2})^{2}}\leq\epsilon

Substituting $\sigma_{1}$ and $\sigma_{2}$ :

\sigma_{1}=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{\Delta W% }{\epsilon}

\sigma_{2}=\frac{0.009760}{\epsilon^{0.078008}}

Simplifying the bound:

(\sigma_{1}+\sigma_{2})^{2}=\left(\sqrt{2\cdot\log\left(\frac{1.25}{\delta}% \right)}\cdot\frac{\Delta W}{\epsilon}+\frac{0.009760}{\epsilon^{0.078008}}% \right)^{2}

Assuming $\sigma_{2}$ is relatively small compared to $\sigma_{1}$ :

\sigma_{1}^{2}\approx 2\cdot\log\left(\frac{1.25}{\delta}\right)\cdot\frac{% \Delta W^{2}}{\epsilon^{2}}

Thus:

\frac{\Delta W^{2}}{2\sigma_{1}^{2}}=\frac{\Delta W^{2}}{2\cdot 2\cdot\log% \left(\frac{1.25}{\delta}\right)\cdot\frac{\Delta W^{2}}{\epsilon^{2}}}=\frac{% \epsilon^{2}}{4\cdot\log\left(\frac{1.25}{\delta}\right)}

Ensuring:

\frac{\epsilon^{2}}{4\cdot\log\left(\frac{1.25}{\delta}\right)}\leq\epsilon

Solving for $\epsilon$ :

\epsilon\leq 4\cdot\log\left(\frac{1.25}{\delta}\right)

Therefore, our mechanism satisfies $(\epsilon,\delta)$ -differential privacy if $\epsilon$ is chosen such that:

\epsilon\leq 4\cdot\log\left(\frac{1.25}{\delta}\right)

The inclusion of the empirical term $\sigma_{2}=\frac{0.009760}{\epsilon^{0.078008}}$ provides an additional conservative noise buffer, ensuring robustness of the privacy guarantee. By combining $\sigma_{1}$ and $\sigma_{2}$ , the overall noise scale maintains the differential privacy requirements.

However, if $\sigma_{2}$ is not relatively small compared to $\sigma_{1}$ , we need to reassess the bound more rigorously.

Considering $\sigma_{2}$ is not relatively small compared to $\sigma_{1}$ :

\sigma_{1}=\sqrt{2\cdot\log\left(\frac{1.25}{\delta}\right)}\cdot\frac{\Delta W% }{\epsilon}

\sigma_{2}=\frac{0.009760}{\epsilon^{0.078008}}

The noise scale is:

\sigma=\sigma_{1}+\sigma_{2}

The bound on the ratio of probabilities for adjacent datasets becomes:

\frac{f(W+N)}{f(W^{\prime}+N)}\leq\exp\left(\frac{\Delta W^{2}}{2(\sigma_{1}+% \sigma_{2})^{2}}\right)

Ensuring:

\exp\left(\frac{\Delta W^{2}}{2(\sigma_{1}+\sigma_{2})^{2}}\right)\leq e^{\epsilon}

Thus:

\frac{\Delta W^{2}}{2(\sigma_{1}+\sigma_{2})^{2}}\leq\epsilon

Substituting $\sigma_{1}$ and $\sigma_{2}$ :

(\sigma_{1}+\sigma_{2})^{2}=\left(\sqrt{2\cdot\log\left(\frac{1.25}{\delta}% \right)}\cdot\frac{\Delta W}{\epsilon}+\frac{0.009760}{\epsilon^{0.078008}}% \right)^{2}

Expanding the square:

(\sigma_{1}+\sigma_{2})^{2}=2\cdot\log\left(\frac{1.25}{\delta}\right)\cdot% \frac{\Delta W^{2}}{\epsilon^{2}}+2\cdot\sqrt{2\cdot\log\left(\frac{1.25}{% \delta}\right)}\cdot\frac{\Delta W\cdot 0.009760}{\epsilon\cdot\epsilon^{0.078% 008}}+\left(\frac{0.009760}{\epsilon^{0.078008}}\right)^{2}

To ensure the bound:

\frac{\Delta W^{2}}{2\left(2\cdot\log\left(\frac{1.25}{\delta}\right)\cdot% \frac{\Delta W^{2}}{\epsilon^{2}}+2\cdot\sqrt{2\cdot\log\left(\frac{1.25}{% \delta}\right)}\cdot\frac{\Delta W\cdot 0.009760}{\epsilon\cdot\epsilon^{0.078% 008}}+\left(\frac{0.009760}{\epsilon^{0.078008}}\right)^{2}\right)}\leq\epsilon

Simplify further:

\frac{\Delta W^{2}\cdot\epsilon^{2}}{2\left(2\cdot\log\left(\frac{1.25}{\delta% }\right)\cdot\Delta W^{2}+2\cdot\sqrt{2\cdot\log\left(\frac{1.25}{\delta}% \right)}\cdot\Delta W\cdot 0.009760\cdot\epsilon^{1-0.078008}+0.009760^{2}% \cdot\epsilon^{2-0.156016}\right)}\leq\epsilon^{3}

Rewriting the inequality:

\frac{\Delta W^{2}\cdot\epsilon}{2\left(2\cdot\log\left(\frac{1.25}{\delta}% \right)\cdot\Delta W^{2}+2\cdot\sqrt{2\cdot\ log\left(\frac{1.25}{\delta}% \right)}\cdot\Delta W\cdot 0.009760\cdot\epsilon^{0.921992}+0.009760^{2}\cdot% \epsilon^{1.843984}\right)}\leq\epsilon

\frac{\Delta W^{2}}{2\left(2\cdot\log\left(\frac{1.25}{\delta}\right)\cdot% \Delta W^{2}+2\cdot\sqrt{2\cdot\ log\left(\frac{1.25}{\delta}\right)}\cdot% \Delta W\cdot 0.009760\cdot\epsilon^{0.921992}+0.009760^{2}\cdot\epsilon^{1.84% 3984}\right)}\leq 1

\Delta W^{2}\leq 2\left(2\cdot\log\left(\frac{1.25}{\delta}\right)\cdot\Delta W% ^{2}+2\cdot\sqrt{2\cdot\ log\left(\frac{1.25}{\delta}\right)}\cdot\Delta W% \cdot 0.009760\cdot\epsilon^{0.921992}+0.009760^{2}\cdot\epsilon^{1.843984}\right)

Dividing both sides by 2:

\frac{\Delta W^{2}}{2}\leq 2\cdot\log\left(\frac{1.25}{\delta}\right)\cdot% \Delta W^{2}+2\cdot\sqrt{2\cdot\ log\left(\frac{1.25}{\delta}\right)}\cdot% \Delta W\cdot 0.009760\cdot\epsilon^{0.921992}+0.009760^{2}\cdot\epsilon^{1.84% 3984}

Solving for $\epsilon$ involves ensuring that the terms on the right-hand side appropriately bound $\Delta W^{2}$ . This guarantees $(\epsilon,\delta)$ -differential privacy if:

\epsilon\leq\min\left(4\cdot\log\left(\frac{1.25}{\delta}\right),\left(\frac{% \Delta W^{2}}{4\cdot\ log\left(\frac{1.25}{\delta}\right)\cdot\Delta W^{2}+0.0% 384\cdot\Delta W\cdot\sqrt{\log\left(\frac{1.25}{\delta}\right)}}\right)^{1/0.% 922},\left(\frac{\Delta W^{2}}{0.000191}\right)^{1/1.844}\right)

8.1.3 Rényi Differential Privacy and Subsampling Amplification

Rényi Differential Privacy Definition

Define $\alpha$ -Rényi divergence between two probability distributions $P$ and $Q$ :

D_{\alpha}(P\parallel Q)=\frac{1}{\alpha-1}\log\mathbb{E}_{Q}\left[\left(\frac% {P(x)}{Q(x)}\right)^{\alpha}\right]

A mechanism $M$ is $(\alpha,\epsilon)$ -RDP if for all adjacent datasets $D$ and $D^{\prime}$ :

D_{\alpha}(M(D)\parallel M(D^{\prime}))\leq\epsilon

Our $L_{2}$ -sensitivity remains:

\Delta W=\frac{E\cdot\eta\cdot C}{N\cdot B}

For a Gaussian mechanism with noise $N(0,\sigma^{2}I)$ , the RDP guarantee is:

\epsilon_{RDP}(\alpha)=\frac{\alpha(\Delta W)^{2}}{2\sigma^{2}}

With a sampling rate $q=\frac{B}{N}$ (batch size / dataset size), we can use the following bound (Wang et al., 2019):

\epsilon_{RDP,\text{subsampled}}(\alpha)\leq\frac{1}{\alpha-1}\log\left(1+q^{2% }\binom{\alpha}{2}\left(e^{(\alpha-1)\epsilon_{RDP}(\alpha)}-1\right)\right)

We need to calibrate $\sigma$ to achieve the desired RDP guarantee. Let’s set:

\sigma^{2}=\frac{(\Delta W)^{2}\alpha}{2\epsilon_{RDP}(\alpha)}

For $E$ epochs, we use RDP composition:

\epsilon_{RDP,\text{total}}(\alpha)=E\cdot\epsilon_{RDP,\text{subsampled}}(\alpha)

Use the conversion theorem (Canonne et al., 2020) to convert RDP to $(\epsilon,\delta)$ -DP: For any $\delta>0$ , the mechanism satisfies $(\epsilon,\delta)$ -DP where:

\epsilon=\min_{\alpha}\left\{\epsilon_{RDP,\text{total}}(\alpha)+\frac{\log(1/% \delta)}{\alpha-1}\right\}

To account for the empirical term in our noise formula, we add it to our $\sigma$ :

\sigma_{\text{total}}^{2}=\sigma^{2}+\left(\frac{0.009760}{\epsilon^{0.078008}% }\right)^{2}

Final Privacy Guarantee

The mechanism $M$ satisfies $(\epsilon,\delta)$ -DP where:

\epsilon=\min_{\alpha}\left\{\frac{\alpha(\Delta W)^{2}}{2\sigma_{\text{total}% }^{2}}\cdot E\cdot\frac{1}{\alpha-1}\log\left(1+q^{2}\binom{\alpha}{2}\left(e^% {(\alpha-1)\frac{\alpha(\Delta W)^{2}}{2\sigma_{\text{total}}^{2}}}-1\right)% \right)+\frac{\log(1/\delta)}{\alpha-1}\right\}

By leveraging RDP and subsampling amplification, our mechanism achieves tighter privacy bounds, enhancing the overall privacy guarantee.

Thus, the mechanism $\mathcal{M}$ with post-training noise addition conclusively satisfies $(\epsilon,\delta)$ -differential privacy, providing both theoretical and empirical noise components for enhanced privacy protection. This ensures robustness even when $\sigma_{2}$ is not relatively small compared to $\sigma_{1}$ .

Step 6: Explicitly Accounting for $\delta$

The above derivation shows the bound in terms of $\epsilon$ . To fully account for $\delta$ , we need to ensure that the probability mass beyond the bound $\epsilon$ is captured by $\delta$ . By designing the noise scale $\sigma$ to include the term $\log(1.25/\delta)$ , we ensure that the probability mass beyond $e^{\epsilon}$ is at most $\delta$ .

8.1.4 Considerations and Assumptions

•

Composition: If the method is applied multiple times or combined with other privacy-preserving techniques, the privacy guarantees compose according to the composition theorems of differential privacy.
•

Assumptions: The analysis assumes the independence of noise added to different weights.

8.1.5 Proof Summary

By adding Gaussian noise to the model weights post-training with the calculated noise scale, the proposed method ensures differential privacy. This proof demonstrates that the noise addition mechanism adheres to the definition of $(\epsilon,\delta)$ -differential privacy, providing a rigorous foundation for the approach.

8.1.6 Further Validation

Future work would consider the following:

•

Empirical Validation: Conduct experiments with more datasets and more models.
•

Simulations: Use synthetic datasets for analytical comparisons.
•

Peer Review: Submit for feedback from privacy and machine learning experts.

The comprehensive proof and additional validation steps ensure a robust and reliable method for achieving differential privacy through post-training noise addition.

8.2 Tables and Figures

Batch Size	Model	Perplex. (Mem.)	Perplex. (Non-Mem.)	ROC AUC	Accuracy	Prec.	Recall	F1 Score
		Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std
5	dp_model	$2.79\pm 0.96$	$3.86\pm 0.46$	$0.69\pm 0.21$	$0.67\pm 0.19$	$0.66\pm 0.18$	$0.69\pm 0.22$	$0.67\pm 0.20$
5	fine_tuned_model	$1.79\pm 0.00$	$4.04\pm 0.00$	$0.97\pm 0.00$	$0.95\pm 0.00$	$0.91\pm 0.00$	$1.00\pm 0.00$	$0.95\pm 0.00$
5	non_dp_model	$3.84\pm 1.11$	$7.86\pm 2.07$	$0.83\pm 0.05$	$0.78\pm 0.05$	$0.78\pm 0.04$	$0.77\pm 0.09$	$0.77\pm 0.06$
10	dp_model	$3.36\pm 1.39$	$4.04\pm 1.05$	$0.60\pm 0.14$	$0.59\pm 0.11$	$0.58\pm 0.12$	$0.59\pm 0.13$	$0.58\pm 0.12$
10	fine_tuned_model	$2.13\pm 0.00$	$3.75\pm 0.00$	$0.84\pm 0.00$	$0.80\pm 0.00$	$0.80\pm 0.00$	$0.80\pm 0.00$	$0.80\pm 0.00$
10	non_dp_model	$4.82\pm 1.43$	$7.69\pm 2.14$	$0.71\pm 0.04$	$0.68\pm 0.07$	$0.70\pm 0.07$	$0.63\pm 0.09$	$0.66\pm 0.08$
50	dp_model	$7.42\pm 5.80$	$7.59\pm 5.66$	$0.49\pm 0.02$	$0.55\pm 0.05$	$0.55\pm 0.05$	$0.55\pm 0.06$	$0.55\pm 0.06$
50	fine_tuned_model	$2.98\pm 0.00$	$3.52\pm 0.00$	$0.50\pm 0.00$	$0.55\pm 0.00$	$0.55\pm 0.00$	$0.60\pm 0.00$	$0.57\pm 0.00$
50	non_dp_model	$8.54\pm 3.56$	$9.89\pm 4.07$	$0.54\pm 0.06$	$0.53\pm 0.07$	$0.53\pm 0.07$	$0.49\pm 0.10$	$0.51\pm 0.08$
100	dp_model	$12.47\pm 9.96$	$12.50\pm 9.78$	$0.49\pm 0.03$	$0.53\pm 0.07$	$0.53\pm 0.07$	$0.50\pm 0.07$	$0.51\pm 0.07$
100	fine_tuned_model	$3.47\pm 0.00$	$3.72\pm 0.00$	$0.48\pm 0.00$	$0.50\pm 0.00$	$0.50\pm 0.00$	$0.50\pm 0.00$	$0.50\pm 0.00$
100	non_dp_model	$9.60\pm 3.78$	$10.20\pm 4.00$	$0.52\pm 0.06$	$0.49\pm 0.07$	$0.48\pm 0.09$	$0.43\pm 0.11$	$0.45\pm 0.10$

Table 1: Descriptive Statistics for Batch Sizes and Models

Epoch	Model	Perplex. (Mem.)	Perplex. (Non-Mem.)	ROC AUC	Accuracy	Prec.	Recall	F1 Score
		Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std	Mean $\pm$ Std
1	dp_model	$32.62\pm 15.91$	$32.20\pm 15.48$	$0.48\pm 0.04$	$0.52\pm 0.04$	$0.52\pm 0.04$	$0.51\pm 0.07$	$0.51\pm 0.05$
1	fine_tuned_model	$10.68\pm 0.00$	$10.73\pm 0.00$	$0.52\pm 0.00$	$0.55\pm 0.00$	$0.56\pm 0.00$	$0.50\pm 0.00$	$0.53\pm 0.00$
1	non_dp_model	$41.46\pm 20.20$	$41.32\pm 19.90$	$0.48\pm 0.07$	$0.52\pm 0.07$	$0.52\pm 0.08$	$0.48\pm 0.10$	$0.49\pm 0.09$
5	dp_model	$12.47\pm 9.96$	$12.50\pm 9.78$	$0.49\pm 0.03$	$0.53\pm 0.07$	$0.53\pm 0.07$	$0.50\pm 0.07$	$0.51\pm 0.07$
5	fine_tuned_model	$3.47\pm 0.00$	$3.72\pm 0.00$	$0.48\pm 0.00$	$0.50\pm 0.00$	$0.50\pm 0.00$	$0.50\pm 0.00$	$0.50\pm 0.00$
5	non_dp_model	$9.60\pm 3.78$	$10.20\pm 4.00$	$0.52\pm 0.06$	$0.49\pm 0.07$	$0.48\pm 0.09$	$0.43\pm 0.11$	$0.45\pm 0.10$
10	dp_model	$7.24\pm 5.79$	$7.45\pm 5.63$	$0.50\pm 0.03$	$0.53\pm 0.07$	$0.53\pm 0.07$	$0.53\pm 0.09$	$0.53\pm 0.08$
10	fine_tuned_model	$2.82\pm 0.00$	$3.54\pm 0.00$	$0.54\pm 0.00$	$0.55\pm 0.00$	$0.55\pm 0.00$	$0.60\pm 0.00$	$0.57\pm 0.00$
10	non_dp_model	$8.86\pm 3.97$	$10.86\pm 4.80$	$0.57\pm 0.07$	$0.55\pm 0.08$	$0.55\pm 0.08$	$0.52\pm 0.11$	$0.53\pm 0.09$
20	dp_model	$4.37\pm 2.98$	$5.07\pm 2.60$	$0.59\pm 0.13$	$0.61\pm 0.10$	$0.60\pm 0.10$	$0.61\pm 0.11$	$0.61\pm 0.10$
20	fine_tuned_model	$1.96\pm 0.00$	$4.29\pm 0.00$	$0.86\pm 0.00$	$0.80\pm 0.00$	$0.80\pm 0.00$	$0.80\pm 0.00$	$0.80\pm 0.00$
20	non_dp_model	$7.36\pm 3.45$	$13.11\pm 5.52$	$0.71\pm 0.05$	$0.69\pm 0.06$	$0.70\pm 0.06$	$0.66\pm 0.09$	$0.68\pm 0.07$

Table 2: Descriptive Statistics for Epochs and Models

Table 3: Results for Epoch = 1

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	12.505	26.940	36.620
RMSE for perplexity_non_member	12.577	26.305	36.298
RMSE for roc_auc	0.072	0.058	0.078
RMSE for accuracy	0.082	0.048	0.077
RMSE for precision	0.087	0.051	0.088
RMSE for recall	0.125	0.073	0.098
RMSE for f1	0.102	0.050	0.092
Pearson correlation for perplexity_member	0.902	nan	nan
Pearson correlation for perplexity_non_member	0.905	nan	nan
Pearson correlation for roc_auc	0.091	nan	nan
Pearson correlation for accuracy	-0.061	nan	nan
Pearson correlation for precision	0.006	nan	nan
Pearson correlation for recall	-0.013	nan	nan
Pearson correlation for f1	-0.037	nan	nan
MAE for perplexity_member	9.420	21.949	30.785
MAE for perplexity_non_member	9.581	21.468	30.590
MAE for roc_auc	0.052	0.046	0.055
MAE for accuracy	0.066	0.032	0.063
MAE for precision	0.071	0.036	0.069
MAE for recall	0.086	0.039	0.068
MAE for f1	0.079	0.035	0.069
R² score for perplexity_member	0.359	-1.974	-2.409
R² score for perplexity_non_member	0.315	-1.995	-2.451
R² score for roc_auc	-2.455	-1.231	-0.485
R² score for accuracy	-3.449	-0.542	-0.238
R² score for precision	-4.036	-0.750	-0.269
R² score for recall	-1.998	-0.022	-0.069
R² score for f1	-3.393	-0.059	-0.153
MSE for perplexity_member	156.363	725.739	1341.057
MSE for perplexity_non_member	158.187	691.943	1317.525
MSE for roc_auc	0.005	0.003	0.006
MSE for accuracy	0.007	0.002	0.006
MSE for precision	0.008	0.003	0.008
MSE for recall	0.016	0.005	0.010
MSE for f1	0.010	0.003	0.008
MedAE for perplexity_member	7.455	17.503	23.244
MedAE for perplexity_non_member	8.031	17.199	23.265
MedAE for roc_auc	0.040	0.055	0.030
MedAE for accuracy	0.050	0.025	0.050
MedAE for precision	0.056	0.022	0.056
MedAE for recall	0.100	0.000	0.100
MedAE for f1	0.074	0.026	0.074
Coefficient of Variation for perplexity_member	0.488	0.487	0.0
Coefficient of Variation for perplexity_non_member	0.481	0.481	1.685e-16
Coefficient of Variation for roc_auc	0.082	0.137	2.174e-16
Coefficient of Variation for accuracy	0.076	0.137	2.056e-16
Coefficient of Variation for precision	0.076	0.154	0.0
Coefficient of Variation for recall	0.144	0.204	0.0
Coefficient of Variation for f1	0.096	0.176	2.148e-16
MAPE for perplexity_member	0.331	0.576	0.686
MAPE for perplexity_non_member	0.338	0.572	0.684
MAPE for roc_auc	0.110	0.104	0.136
MAPE for accuracy	0.128	0.067	0.133
MAPE for precision	0.138	0.074	0.152
MAPE for recall	0.164	0.074	0.176
MAPE for f1	0.152	0.071	0.164
Wilcoxon test for perplexity_member	12.0, $5.215e-07$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	9.0, $2.459e-07$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	162.0, $0.732$	20.5, $0.0001251$	55.0, $0.002160$
Wilcoxon test for accuracy	114.5, $0.473$	1.0, $0.0008325$	34.0, $0.002487$
Wilcoxon test for precision	115.5, $0.493$	5.0, $0.0003717$	42.0, $0.005869$
Wilcoxon test for recall	36.5, $0.0963$	16.0, $0.417$	36.0, $0.1524$
Wilcoxon test for f1	106.0, $0.3302$	60.0, $0.2632$	73.0, $0.0816$

Table 4: Results for Epoch = 5

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	7.123	13.296	7.165
RMSE for perplexity_non_member	6.595	13.011	7.575
RMSE for roc_auc	0.079	0.033	0.076
RMSE for accuracy	0.109	0.070	0.074
RMSE for precision	0.122	0.072	0.088
RMSE for recall	0.167	0.073	0.135
RMSE for f1	0.145	0.070	0.112
Pearson correlation for perplexity_member	0.922	nan	nan
Pearson correlation for perplexity_non_member	0.922	nan	nan
Pearson correlation for roc_auc	0.018	nan	nan
Pearson correlation for accuracy	-0.100	nan	nan
Pearson correlation for precision	-0.082	nan	nan
Pearson correlation for recall	-0.229	nan	nan
Pearson correlation for f1	-0.168	nan	nan
MAE for perplexity_member	4.356	9.001	6.126
MAE for perplexity_non_member	4.126	8.774	6.479
MAE for roc_auc	0.061	0.030	0.060
MAE for accuracy	0.093	0.063	0.059
MAE for precision	0.103	0.065	0.069
MAE for recall	0.136	0.054	0.111
MAE for f1	0.119	0.059	0.091
R² score for perplexity_member	0.470	-0.846	-2.717
R² score for perplexity_non_member	0.529	-0.834	-2.726
R² score for roc_auc	-4.865	-0.043	-0.450
R² score for accuracy	-1.853	-0.171	-0.030
R² score for precision	-2.402	-0.181	-0.052
R² score for recall	-4.212	-0.002	-0.447
R² score for f1	-3.573	-0.048	-0.268
MSE for perplexity_member	50.742	176.777	51.335
MSE for perplexity_non_member	43.490	169.299	57.378
MSE for roc_auc	0.006	0.001	0.006
MSE for accuracy	0.012	0.005	0.005
MSE for precision	0.015	0.005	0.008
MSE for recall	0.028	0.005	0.018
MSE for f1	0.021	0.005	0.012
MedAE for perplexity_member	2.159	4.802	4.431
MedAE for perplexity_non_member	2.356	4.613	4.718
MedAE for roc_auc	0.045	0.030	0.065
MedAE for accuracy	0.100	0.050	0.050
MedAE for precision	0.100	0.056	0.056
MedAE for recall	0.100	0.100	0.100
MedAE for f1	0.103	0.067	0.079
Coefficient of Variation for perplexity_member	0.799	0.394	1.303e-16
Coefficient of Variation for perplexity_non_member	0.783	0.392	1.215e-16
Coefficient of Variation for roc_auc	0.068	0.123	1.178e-16
Coefficient of Variation for accuracy	0.125	0.152	0.0
Coefficient of Variation for precision	0.127	0.181	0.0
Coefficient of Variation for recall	0.148	0.269	0.0
Coefficient of Variation for f1	0.134	0.225	0.0
MAPE for perplexity_member	0.327	0.527	0.587
MAPE for perplexity_non_member	0.329	0.506	0.583
MAPE for roc_auc	0.131	0.061	0.111
MAPE for accuracy	0.176	0.119	0.125
MAPE for precision	0.194	0.123	0.157
MAPE for recall	0.261	0.110	0.339
MAPE for f1	0.225	0.116	0.244
Wilcoxon test for perplexity_member	149.0, $0.227$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	159.0, $0.327$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	94.5, $0.0231$	126.0, $0.2059$	62.5, $0.0024$
Wilcoxon test for accuracy	91.5, $0.0323$	62.0, $0.0061$	103.5, $0.6718$
Wilcoxon test for precision	90.5, $0.0305$	80.0, $0.0229$	89.0, $0.3517$
Wilcoxon test for recall	49.0, $0.0107$	56.0, $0.7963$	45.0, $0.0029$
Wilcoxon test for f1	79.5, $0.0147$	108.0, $0.0837$	82.0, $0.0509$

Table 5: Results for Epochs = 10

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	7.123	13.296	7.165
RMSE for perplexity_non_member	6.595	13.011	7.575
RMSE for roc_auc	0.079	0.033	0.076
RMSE for accuracy	0.109	0.070	0.074
RMSE for precision	0.122	0.072	0.088
RMSE for recall	0.167	0.073	0.135
RMSE for f1	0.145	0.070	0.112
Pearson correlation for perplexity_member	0.922	nan	nan
Pearson correlation for perplexity_non_member	0.922	nan	nan
Pearson correlation for roc_auc	0.018	nan	nan
Pearson correlation for accuracy	-0.100	nan	nan
Pearson correlation for precision	-0.082	nan	nan
Pearson correlation for recall	-0.229	nan	nan
Pearson correlation for f1	-0.168	nan	nan
MAE for perplexity_member	4.356	9.001	6.126
MAE for perplexity_non_member	4.126	8.774	6.479
MAE for roc_auc	0.061	0.030	0.060
MAE for accuracy	0.093	0.063	0.059
MAE for precision	0.103	0.065	0.069
MAE for recall	0.136	0.054	0.111
MAE for f1	0.119	0.059	0.091
R² score for perplexity_member	0.470	-0.846	-2.717
R² score for perplexity_non_member	0.529	-0.834	-2.726
R² score for roc_auc	-4.865	-0.043	-0.450
R² score for accuracy	-1.853	-0.171	-0.030
R² score for precision	-2.402	-0.181	-0.052
R² score for recall	-4.212	-0.002	-0.447
R² score for f1	-3.573	-0.048	-0.268
MSE for perplexity_member	50.742	176.777	51.335
MSE for perplexity_non_member	43.490	169.299	57.378
MSE for roc_auc	0.006	0.001	0.006
MSE for accuracy	0.012	0.005	0.005
MSE for precision	0.015	0.005	0.008
MSE for recall	0.028	0.005	0.018
MSE for f1	0.021	0.005	0.012
MedAE for perplexity_member	2.159	4.802	4.431
MedAE for perplexity_non_member	2.356	4.613	4.718
MedAE for roc_auc	0.045	0.030	0.065
MedAE for accuracy	0.100	0.050	0.050
MedAE for precision	0.100	0.056	0.056
MedAE for recall	0.100	0.100	0.100
MedAE for f1	0.103	0.067	0.079
Coefficient of Variation for perplexity_member	0.799	0.394	1.303e-16
Coefficient of Variation for perplexity_non_member	0.783	0.392	1.215e-16
Coefficient of Variation for roc_auc	0.068	0.123	1.178e-16
Coefficient of Variation for accuracy	0.125	0.152	0.0
Coefficient of Variation for precision	0.127	0.181	0.0
Coefficient of Variation for recall	0.148	0.269	0.0
Coefficient of Variation for f1	0.134	0.225	0.0
MAPE for perplexity_member	0.327	0.527	0.587
MAPE for perplexity_non_member	0.329	0.506	0.583
MAPE for roc_auc	0.131	0.061	0.111
MAPE for accuracy	0.176	0.119	0.125
MAPE for precision	0.194	0.123	0.157
MAPE for recall	0.261	0.110	0.339
MAPE for f1	0.225	0.116	0.244
Wilcoxon test for perplexity_member	149.0, 0.227	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	159.0, 0.327	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	94.5, 0.023	126.0, 0.206	62.5, 0.002
Wilcoxon test for accuracy	91.5, 0.032	62.0, 0.006	103.5, 0.672
Wilcoxon test for precision	90.5, 0.030	80.0, 0.023	89.0, 0.352
Wilcoxon test for recall	49.0, 0.011	56.0, 0.796	45.0, 0.003
Wilcoxon test for f1	79.5, 0.015	108.0, 0.084	82.0, 0.051

Table 6: Results for Epochs = 20

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	3.197	7.225	6.567
RMSE for perplexity_non_member	3.577	6.896	7.528
RMSE for roc_auc	0.081	0.025	0.073
RMSE for accuracy	0.081	0.053	0.070
RMSE for precision	0.088	0.053	0.073
RMSE for recall	0.115	0.080	0.145
RMSE for f1	0.090	0.061	0.099
Pearson correlation for perplexity_member	0.896	nan	nan
Pearson correlation for perplexity_non_member	0.887	nan	nan
Pearson correlation for roc_auc	0.003	nan	nan
Pearson correlation for accuracy	0.154	nan	nan
Pearson correlation for precision	0.073	nan	nan
Pearson correlation for recall	0.273	nan	nan
Pearson correlation for f1	0.283	nan	nan
MAE for perplexity_member	2.573	4.447	5.560
MAE for perplexity_non_member	3.169	4.076	6.378
MAE for roc_auc	0.070	0.016	0.062
MAE for accuracy	0.059	0.039	0.055
MAE for precision	0.067	0.041	0.061
MAE for recall	0.082	0.050	0.118
MAE for f1	0.066	0.044	0.081
R² score for perplexity_member	0.685	-0.610	-2.532
R² score for perplexity_non_member	0.587	-0.537	-2.543
R² score for roc_auc	-10.510	-0.098	-0.456
R² score for accuracy	-1.291	-0.004	-0.123
R² score for precision	-1.680	-0.0002	-0.039
R² score for recall	-2.364	-0.636	-1.391
R² score for f1	-1.601	-0.184	-0.769
MSE for perplexity_member	10.222	52.200	43.127
MSE for perplexity_non_member	12.792	47.557	56.675
MSE for roc_auc	0.007	0.001	0.005
MSE for accuracy	0.007	0.003	0.005
MSE for precision	0.008	0.003	0.005
MSE for recall	0.013	0.006	0.021
MSE for f1	0.008	0.004	0.010
MedAE for perplexity_member	2.284	1.442	3.925
MedAE for perplexity_non_member	2.885	1.023	4.564
MedAE for roc_auc	0.065	0.010	0.050
MedAE for accuracy	0.050	0.050	0.050
MedAE for precision	0.055	0.045	0.050
MedAE for recall	0.100	0.000	0.100
MedAE for f1	0.048	0.029	0.071
Coefficient of Variation for perplexity_member	0.781	0.417	0.0
Coefficient of Variation for perplexity_non_member	0.746	0.412	0.0
Coefficient of Variation for roc_auc	0.050	0.115	0.0
Coefficient of Variation for accuracy	0.099	0.128	2.056e-16
Coefficient of Variation for precision	0.100	0.137	2.073e-16
Coefficient of Variation for recall	0.116	0.195	0.0
Coefficient of Variation for f1	0.104	0.151	0.0
MAPE for perplexity_member	0.466	0.399	0.596
MAPE for perplexity_non_member	0.570	0.337	0.589
MAPE for roc_auc	0.145	0.034	0.112
MAPE for accuracy	0.112	0.077	0.114
MAPE for precision	0.127	0.080	0.123
MAPE for recall	0.147	0.107	0.287
MAPE for f1	0.122	0.090	0.181
Wilcoxon test for perplexity_member	77.0, $0.003$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	49.0, $0.0002$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	61.0, $0.0007$	81.0, $0.132$	52.5, $0.0018$
Wilcoxon test for accuracy	51.0, $0.226$	55.0, $0.174$	42.0, $0.010$
Wilcoxon test for precision	103.5, $0.455$	98.0, $0.349$	138.0, $0.340$
Wilcoxon test for recall	19.5, $0.0049$	0.0, $0.0011$	7.0, $6.554e-05$
Wilcoxon test for f1	57.5, $0.0143$	55.0, $0.0187$	34.0, $0.0003$

Table 7: Results for Batch Size = 5

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	1.105	1.375	2.320
RMSE for perplexity_non_member	4.430	0.483	4.327
RMSE for roc_auc	0.222	0.348	0.151
RMSE for accuracy	0.202	0.334	0.181
RMSE for precision	0.202	0.304	0.135
RMSE for recall	0.218	0.380	0.247
RMSE for f1	0.206	0.341	0.189
Pearson correlation for perplexity_member	0.948	nan	nan
Pearson correlation for perplexity_non_member	0.362	nan	nan
Pearson correlation for roc_auc	0.706	nan	nan
Pearson correlation for accuracy	0.386	nan	nan
Pearson correlation for precision	0.427	nan	nan
Pearson correlation for recall	0.330	nan	nan
Pearson correlation for f1	0.382	nan	nan
MAE for perplexity_member	1.045	1.001	2.046
MAE for perplexity_non_member	3.994	0.334	3.819
MAE for roc_auc	0.194	0.282	0.144
MAE for accuracy	0.182	0.277	0.173
MAE for precision	0.181	0.248	0.129
MAE for recall	0.182	0.314	0.232
MAE for f1	0.181	0.280	0.180
R² score for perplexity_member	-0.375	-1.129	-3.498
R² score for perplexity_non_member	-95.737	-0.151	-3.526
R² score for roc_auc	-0.196	-1.928	-9.677
R² score for accuracy	-0.171	-2.195	-11.602
R² score for precision	-0.313	-1.974	-10.019
R² score for recall	-0.044	-2.170	-7.504
R² score for f1	-0.129	-2.091	-8.940
MSE for perplexity_member	1.221	1.890	5.380
MSE for perplexity_non_member	19.629	0.233	18.723
MSE for roc_auc	0.049	0.121	0.023
MSE for accuracy	0.041	0.112	0.033
MSE for precision	0.041	0.093	0.018
MSE for recall	0.048	0.144	0.061
MSE for f1	0.042	0.116	0.036
MedAE for perplexity_member	0.943	0.774	1.623
MedAE for perplexity_non_member	3.481	0.332	3.001
MedAE for roc_auc	0.165	0.320	0.135
MedAE for accuracy	0.200	0.325	0.150
MedAE for precision	0.164	0.291	0.109
MedAE for recall	0.200	0.300	0.200
MedAE for f1	0.183	0.301	0.152
Coefficient of Variation for perplexity_member	0.344	0.290	0.0
Coefficient of Variation for perplexity_non_member	0.119	0.264	0.0
Coefficient of Variation for roc_auc	0.301	0.057	1.166e-16
Coefficient of Variation for accuracy	0.283	0.067	1.190e-16
Coefficient of Variation for precision	0.272	0.053	1.244e-16
Coefficient of Variation for recall	0.317	0.112	0.0
Coefficient of Variation for f1	0.294	0.079	0.0
MAPE for perplexity_member	0.397	0.293	0.498
MAPE for perplexity_non_member	1.040	0.084	0.453
MAPE for roc_auc	0.352	0.537	0.178
MAPE for accuracy	0.318	0.520	0.229
MAPE for precision	0.327	0.476	0.168
MAPE for recall	0.331	0.621	0.321
MAPE for f1	0.327	0.545	0.241
Wilcoxon test for perplexity_member	0.0, $7.451e-09$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	0.0, $7.451e-09$	43.0, $9.319e-05$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	61.0, $0.0007$	0.0, $3.991e-05$	0.0, $7.451e-09$
Wilcoxon test for accuracy	64.5, $0.0027$	0.0, $3.835e-05$	0.0, $7.451e-09$
Wilcoxon test for precision	57.0, $0.0015$	0.0, $3.876e-05$	0.0, $7.451e-09$
Wilcoxon test for recall	79.0, $0.0417$	0.0, $3.574e-05$	0.0, $7.451e-09$
Wilcoxon test for f1	78.0, $0.0076$	0.0, $3.876e-05$	0.0, $7.451e-09$

Table 8: Results for Batch Size = 10

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	1.602	1.833	3.032
RMSE for perplexity_non_member	3.979	1.067	4.467
RMSE for roc_auc	0.172	0.280	0.138
RMSE for accuracy	0.150	0.240	0.138
RMSE for precision	0.175	0.245	0.122
RMSE for recall	0.138	0.249	0.195
RMSE for f1	0.147	0.247	0.160
Pearson correlation for perplexity_member	0.885	nan	nan
Pearson correlation for perplexity_non_member	0.688	nan	nan
Pearson correlation for roc_auc	0.335	nan	nan
Pearson correlation for accuracy	0.157	nan	nan
Pearson correlation for precision	0.030	nan	nan
Pearson correlation for recall	0.290	nan	nan
Pearson correlation for f1	0.203	nan	nan
MAE for perplexity_member	1.493	1.228	2.685
MAE for perplexity_non_member	3.649	0.482	3.939
MAE for roc_auc	0.155	0.243	0.133
MAE for accuracy	0.107	0.214	0.121
MAE for precision	0.128	0.216	0.100
MAE for recall	0.089	0.214	0.175
MAE for f1	0.095	0.216	0.141
R² score for perplexity_member	-0.386	-0.815	-3.644
R² score for perplexity_non_member	-14.029	-0.080	-3.499
R² score for roc_auc	-0.511	-3.026	-14.274
R² score for accuracy	-0.913	-3.905	-3.380
R² score for precision	-1.337	-3.545	-2.143
R² score for recall	-0.167	-2.830	-4.035
R² score for f1	-0.547	-3.334	-3.526
MSE for perplexity_member	2.565	3.359	9.190
MSE for perplexity_non_member	15.831	1.138	19.954
MSE for roc_auc	0.029	0.078	0.019
MSE for accuracy	0.022	0.058	0.019
MSE for precision	0.031	0.060	0.015
MSE for recall	0.019	0.062	0.038
MSE for f1	0.022	0.061	0.026
MedAE for perplexity_member	1.279	0.839	2.118
MedAE for perplexity_non_member	3.251	0.187	3.027
MedAE for roc_auc	0.155	0.310	0.135
MedAE for accuracy	0.050	0.250	0.150
MedAE for precision	0.083	0.244	0.117
MedAE for recall	0.100	0.200	0.200
MedAE for f1	0.058	0.229	0.168
Coefficient of Variation for perplexity_member	0.412	0.297	2.123e-16
Coefficient of Variation for perplexity_non_member	0.259	0.279	1.206e-16
Coefficient of Variation for roc_auc	0.238	0.051	0.0
Coefficient of Variation for accuracy	0.189	0.099	1.413e-16
Coefficient of Variation for precision	0.200	0.100	1.413e-16
Coefficient of Variation for recall	0.221	0.142	1.413e-16
Coefficient of Variation for f1	0.207	0.116	1.413e-16
MAPE for perplexity_member	0.480	0.290	0.522
MAPE for perplexity_non_member	0.914	0.089	0.477
MAPE for roc_auc	0.294	0.476	0.191
MAPE for accuracy	0.221	0.417	0.190
MAPE for precision	0.269	0.429	0.155
MAPE for recall	0.203	0.453	0.305
MAPE for f1	0.213	0.441	0.230
Wilcoxon test for perplexity_member	1.0, $1.490e-08$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	0.0, $7.451e-09$	192.0, $0.813923$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	41.5, $0.000393$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for accuracy	20.5, $0.000334$	0.0, $5.103e-06$	0.0, $6.731e-06$
Wilcoxon test for precision	14.0, $0.000162$	0.0, $5.383e-06$	0.0, $7.326e-06$
Wilcoxon test for recall	42.0, $0.162486$	0.0, $4.571e-06$	0.0, $6.002e-06$
Wilcoxon test for f1	48.0, $0.006188$	0.0, $5.383e-06$	0.0, $7.326e-06$

Table 9: Results for Batch Size = 50

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	3.197	7.225	6.567
RMSE for perplexity_non_member	3.577	6.896	7.528
RMSE for roc_auc	0.081	0.025	0.073
RMSE for accuracy	0.081	0.053	0.070
RMSE for precision	0.088	0.053	0.073
RMSE for recall	0.115	0.080	0.145
RMSE for f1	0.090	0.061	0.099
Pearson correlation for perplexity_member	0.896	nan	nan
Pearson correlation for perplexity_non_member	0.887	nan	nan
Pearson correlation for roc_auc	0.003	nan	nan
Pearson correlation for accuracy	0.154	nan	nan
Pearson correlation for precision	0.073	nan	nan
Pearson correlation for recall	0.273	nan	nan
Pearson correlation for f1	0.283	nan	nan
MAE for perplexity_member	2.573	4.447	5.560
MAE for perplexity_non_member	3.169	4.076	6.378
MAE for roc_auc	0.070	0.016	0.062
MAE for accuracy	0.059	0.039	0.055
MAE for precision	0.067	0.041	0.061
MAE for recall	0.082	0.050	0.118
MAE for f1	0.066	0.044	0.081
R² score for perplexity_member	0.685	-0.610	-2.532
R² score for perplexity_non_member	0.587	-0.537	-2.543
R² score for roc_auc	-10.510	-0.098	-0.456
R² score for accuracy	-1.291	-0.004	-0.123
R² score for precision	-1.680	-0.000	-0.039
R² score for recall	-2.364	-0.636	-1.391
R² score for f1	-1.601	-0.184	-0.769
MSE for perplexity_member	10.222	52.200	43.127
MSE for perplexity_non_member	12.792	47.557	56.675
MSE for roc_auc	0.007	0.001	0.005
MSE for accuracy	0.007	0.003	0.005
MSE for precision	0.008	0.003	0.005
MSE for recall	0.013	0.006	0.021
MSE for f1	0.008	0.004	0.010
MedAE for perplexity_member	2.284	1.442	3.925
MedAE for perplexity_non_member	2.885	1.023	4.564
MedAE for roc_auc	0.065	0.010	0.050
MedAE for accuracy	0.050	0.050	0.050
MedAE for precision	0.055	0.045	0.050
MedAE for recall	0.100	0.000	0.100
MedAE for f1	0.048	0.029	0.071
Coefficient of Variation for perplexity_member	0.781	0.417	0.0
Coefficient of Variation for perplexity_non_member	0.746	0.412	0.0
Coefficient of Variation for roc_auc	0.050	0.115	0.0
Coefficient of Variation for accuracy	0.099	0.128	2.056e-16
Coefficient of Variation for precision	0.100	0.137	2.073e-16
Coefficient of Variation for recall	0.116	0.195	0.0
Coefficient of Variation for f1	0.104	0.151	0.0
MAPE for perplexity_member	0.466	0.399	0.596
MAPE for perplexity_non_member	0.570	0.337	0.589
MAPE for roc_auc	0.145	0.034	0.112
MAPE for accuracy	0.112	0.077	0.114
MAPE for precision	0.127	0.080	0.123
MAPE for recall	0.147	0.107	0.287
MAPE for f1	0.122	0.090	0.181
Wilcoxon test for perplexity_member	77.0, $0.003$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	49.0, $0.0001937$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	61.0, $0.0007162$	81.0, $0.132$	52.5, $0.001772$
Wilcoxon test for accuracy	51.0, $0.226$	55.0, $0.174$	42.0, $0.010$
Wilcoxon test for precision	103.5, $0.455$	98.0, $0.349$	138.0, $0.340$
Wilcoxon test for recall	19.5, $0.004877$	0.0, $0.001054$	7.0, $6.554e-05$
Wilcoxon test for f1	57.5, $0.0143$	55.0, $0.0187$	34.0, $0.000321$

Table 10: Results for Batch Size = 100

Metric	DP vs DP-Weights	DP vs Fine-Tuned	DP-Weights vs Fine-Tuned
RMSE for perplexity_member	7.123	13.296	7.165
RMSE for perplexity_non_member	6.595	13.011	7.575
RMSE for roc_auc	0.079	0.033	0.076
RMSE for accuracy	0.109	0.070	0.074
RMSE for precision	0.122	0.072	0.088
RMSE for recall	0.167	0.073	0.135
RMSE for f1	0.145	0.070	0.112
Pearson correlation for perplexity_member	0.922	nan	nan
Pearson correlation for perplexity_non_member	0.922	nan	nan
Pearson correlation for roc_auc	0.018	nan	nan
Pearson correlation for accuracy	-0.100	nan	nan
Pearson correlation for precision	-0.082	nan	nan
Pearson correlation for recall	-0.229	nan	nan
Pearson correlation for f1	-0.168	nan	nan
MAE for perplexity_member	4.356	9.001	6.126
MAE for perplexity_non_member	4.126	8.774	6.479
MAE for roc_auc	0.061	0.030	0.060
MAE for accuracy	0.093	0.063	0.059
MAE for precision	0.103	0.065	0.069
MAE for recall	0.136	0.054	0.111
MAE for f1	0.119	0.059	0.091
R² score for perplexity_member	0.470	-0.846	-2.717
R² score for perplexity_non_member	0.529	-0.834	-2.726
R² score for roc_auc	-4.865	-0.043	-0.450
R² score for accuracy	-1.853	-0.171	-0.030
R² score for precision	-2.402	-0.181	-0.052
R² score for recall	-4.212	-0.002	-0.447
R² score for f1	-3.573	-0.048	-0.268
MSE for perplexity_member	50.742	176.777	51.335
MSE for perplexity_non_member	43.490	169.299	57.378
MSE for roc_auc	0.006	0.001	0.006
MSE for accuracy	0.012	0.005	0.005
MSE for precision	0.015	0.005	0.008
MSE for recall	0.028	0.005	0.018
MSE for f1	0.021	0.005	0.012
MedAE for perplexity_member	2.159	4.802	4.431
MedAE for perplexity_non_member	2.356	4.613	4.718
MedAE for roc_auc	0.045	0.030	0.065
MedAE for accuracy	0.100	0.050	0.050
MedAE for precision	0.100	0.056	0.056
MedAE for recall	0.100	0.100	0.100
MedAE for f1	0.103	0.067	0.079
Coefficient of Variation for perplexity_member	0.799	0.394	1.303e-16
Coefficient of Variation for perplexity_non_member	0.783	0.392	1.215e-16
Coefficient of Variation for roc_auc	0.068	0.123	1.178e-16
Coefficient of Variation for accuracy	0.125	0.152	0.0
Coefficient of Variation for precision	0.127	0.181	0.0
Coefficient of Variation for recall	0.148	0.269	0.0
Coefficient of Variation for f1	0.134	0.225	0.0
MAPE for perplexity_member	0.327	0.527	0.587
MAPE for perplexity_non_member	0.329	0.506	0.583
MAPE for roc_auc	0.131	0.061	0.111
MAPE for accuracy	0.176	0.119	0.125
MAPE for precision	0.194	0.123	0.157
MAPE for recall	0.261	0.110	0.339
MAPE for f1	0.225	0.116	0.244
Wilcoxon test for perplexity_member	149.0, $0.227$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for perplexity_non_member	159.0, $0.327$	0.0, $7.451e-09$	0.0, $7.451e-09$
Wilcoxon test for roc_auc	94.5, $0.023$	126.0, $0.206$	62.5, $0.002$
Wilcoxon test for accuracy	91.5, $0.032$	62.0, $0.006$	103.5, $0.672$
Wilcoxon test for precision	90.5, $0.030$	80.0, $0.023$	89.0, $0.352$
Wilcoxon test for recall	49.0, $0.011$	56.0, $0.796$	45.0, $0.003$
Wilcoxon test for f1	79.5, $0.015$	108.0, $0.084$	82.0, $0.051$